43 files changed, 639 insertions, 429 deletions
diff --git a/CHANGELOG.md b/CHANGELOG.md index fb2b953ad..7ae8754e0 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -1,5 +1,33 @@ # Changelog +## [v1.29.2](https://github.com/netdata/netdata/tree/v1.29.2) (2021-02-18) + +[Full Changelog](https://github.com/netdata/netdata/compare/v1.29.1...v1.29.2) + +**Merged pull requests:** + +- Fix the context filtering on the data query endpoint [\#10652](https://github.com/netdata/netdata/pull/10652) ([stelfrag](https://github.com/stelfrag)) +- fix container/host detection in system-info script [\#10647](https://github.com/netdata/netdata/pull/10647) ([ilyam8](https://github.com/ilyam8)) +- Enable apps.plugin aggregation debug messages [\#10645](https://github.com/netdata/netdata/pull/10645) ([vlvkobal](https://github.com/vlvkobal)) +- add small delay to the ipv4\_tcp\_resets alarms [\#10644](https://github.com/netdata/netdata/pull/10644) ([ilyam8](https://github.com/ilyam8)) +- collectors/proc: fix collecting operstate for virtual network interfaces [\#10633](https://github.com/netdata/netdata/pull/10633) ([ilyam8](https://github.com/ilyam8)) +- fix sendmail unrecognized option F error [\#10631](https://github.com/netdata/netdata/pull/10631) ([ilyam8](https://github.com/ilyam8)) +- Fix typo in web/gui/readme.md [\#10623](https://github.com/netdata/netdata/pull/10623) ([OdysLam](https://github.com/OdysLam)) +- add freeswitch to apps\_groups [\#10621](https://github.com/netdata/netdata/pull/10621) ([fayak](https://github.com/fayak)) +- Add ACLK proxy setting as host label [\#10619](https://github.com/netdata/netdata/pull/10619) ([underhood](https://github.com/underhood)) +- dashboard@v2.13.6 [\#10618](https://github.com/netdata/netdata/pull/10618) ([jacekkolasa](https://github.com/jacekkolasa)) +- Disable stock alarms [\#10617](https://github.com/netdata/netdata/pull/10617) ([thiagoftsm](https://github.com/thiagoftsm)) +- Fixes \#10597 raw binary data should never be printed 
[\#10603](https://github.com/netdata/netdata/pull/10603) ([rda0](https://github.com/rda0)) +- collectors/proc: change ksm mem chart type to stacked [\#10598](https://github.com/netdata/netdata/pull/10598) ([ilyam8](https://github.com/ilyam8)) +- ACLK reduce excessive logging [\#10596](https://github.com/netdata/netdata/pull/10596) ([underhood](https://github.com/underhood)) +- add k8s\_cluster\_id host label [\#10588](https://github.com/netdata/netdata/pull/10588) ([ilyam8](https://github.com/ilyam8)) +- add resetting CapabilityBoundingSet workaround to the python.d collectors \(that use `sudo`\) readmes [\#10587](https://github.com/netdata/netdata/pull/10587) ([ilyam8](https://github.com/ilyam8)) +- collectors/elasticsearch: document `scheme` option [\#10572](https://github.com/netdata/netdata/pull/10572) ([vjt](https://github.com/vjt)) +- Update claiming docs for Docker containers. [\#10570](https://github.com/netdata/netdata/pull/10570) ([Ferroin](https://github.com/Ferroin)) +- health: make Opsgenie API URL configurable [\#10561](https://github.com/netdata/netdata/pull/10561) ([tinyhammers](https://github.com/tinyhammers)) +- Allow the REMOVED alarm status via ACLK if the previous status was WARN/CRIT [\#10533](https://github.com/netdata/netdata/pull/10533) ([stelfrag](https://github.com/stelfrag)) +- Change eBPF plugin internal [\#10442](https://github.com/netdata/netdata/pull/10442) ([thiagoftsm](https://github.com/thiagoftsm)) + ## [v1.29.1](https://github.com/netdata/netdata/tree/v1.29.1) (2021-02-09) [Full Changelog](https://github.com/netdata/netdata/compare/v1.29.0...v1.29.1) @@ -217,22 +245,6 @@ - Add docsv2 project to master branch [\#10000](https://github.com/netdata/netdata/pull/10000) ([joelhans](https://github.com/joelhans)) - Allow connecting to arbitrary MQTT WSS broker for devs [\#9999](https://github.com/netdata/netdata/pull/9999) ([underhood](https://github.com/underhood)) - minor - removes leading whitespace before JSON in ACLK 
[\#9998](https://github.com/netdata/netdata/pull/9998) ([underhood](https://github.com/underhood)) -- Fixed typos in installer functions. [\#9992](https://github.com/netdata/netdata/pull/9992) ([Ferroin](https://github.com/Ferroin)) -- Fixed locking order to address CID\_362348 [\#9991](https://github.com/netdata/netdata/pull/9991) ([stelfrag](https://github.com/stelfrag)) -- Switched to our installer's bundling code for libJudy in static installs. [\#9988](https://github.com/netdata/netdata/pull/9988) ([Ferroin](https://github.com/Ferroin)) -- Fix cleanup of obsolete charts [\#9985](https://github.com/netdata/netdata/pull/9985) ([mfundul](https://github.com/mfundul)) -- Added more stringent check for C99 support in configure script. [\#9982](https://github.com/netdata/netdata/pull/9982) ([Ferroin](https://github.com/Ferroin)) -- Improved the data query when using the context parameter [\#9978](https://github.com/netdata/netdata/pull/9978) ([stelfrag](https://github.com/stelfrag)) -- Fix missing libelf-dev dependency. [\#9974](https://github.com/netdata/netdata/pull/9974) ([roedie](https://github.com/roedie)) -- allow using LWS without SOCKS5 [\#9973](https://github.com/netdata/netdata/pull/9973) ([underhood](https://github.com/underhood)) -- Cleanup CODEOWNERS [\#9971](https://github.com/netdata/netdata/pull/9971) ([prologic](https://github.com/prologic)) -- Fix Stackpulse doc [\#9968](https://github.com/netdata/netdata/pull/9968) ([thiagoftsm](https://github.com/thiagoftsm)) -- Fix setting for disabling eBPF-apps.plugin integration [\#9967](https://github.com/netdata/netdata/pull/9967) ([joelhans](https://github.com/joelhans)) -- Added improved auto-update support. 
[\#9966](https://github.com/netdata/netdata/pull/9966) ([Ferroin](https://github.com/Ferroin)) -- Stackpulse integration [\#9965](https://github.com/netdata/netdata/pull/9965) ([thiagoftsm](https://github.com/thiagoftsm)) -- Fix typo inside netdata-installer.sh [\#9962](https://github.com/netdata/netdata/pull/9962) ([thiagoftsm](https://github.com/thiagoftsm)) -- Add missing period in netdata dashboard [\#9960](https://github.com/netdata/netdata/pull/9960) ([hydrogen-mvm](https://github.com/hydrogen-mvm)) -- Fixed chart's last accessed time during context queries [\#9952](https://github.com/netdata/netdata/pull/9952) ([stelfrag](https://github.com/stelfrag)) ## [before_rebase](https://github.com/netdata/netdata/tree/before_rebase) (2020-09-24) diff --git a/aclk/legacy/aclk_common.c b/aclk/legacy/aclk_common.c index 7c8421a93..d7188b1f0 100644 --- a/aclk/legacy/aclk_common.c +++ b/aclk/legacy/aclk_common.c @@ -234,3 +234,26 @@ int aclk_decode_base_url(char *url, char **aclk_hostname, int *aclk_port) info("Setting ACLK target host=%s port=%d from %s", *aclk_hostname, *aclk_port, url); return 0; } + +struct label *add_aclk_host_labels(struct label *label) { +#ifdef ENABLE_ACLK + ACLK_PROXY_TYPE aclk_proxy; + char *proxy_str; + aclk_get_proxy(&aclk_proxy); + + switch(aclk_proxy) { + case PROXY_TYPE_SOCKS5: + proxy_str = "SOCKS5"; + break; + case PROXY_TYPE_HTTP: + proxy_str = "HTTP"; + break; + default: + proxy_str = "none"; + break; + } + return add_label_to_list(label, "_aclk_proxy", proxy_str, LABEL_SOURCE_AUTO); +#else + return label; +#endif +} diff --git a/aclk/legacy/aclk_common.h b/aclk/legacy/aclk_common.h index 2dc0aa553..eedb5b51c 100644 --- a/aclk/legacy/aclk_common.h +++ b/aclk/legacy/aclk_common.h @@ -67,4 +67,6 @@ void safe_log_proxy_censor(char *proxy); int aclk_decode_base_url(char *url, char **aclk_hostname, int *aclk_port); const char *aclk_get_proxy(ACLK_PROXY_TYPE *type); +struct label *add_aclk_host_labels(struct label *label); + #endif 
//ACLK_COMMON_H diff --git a/aclk/legacy/aclk_lws_wss_client.c b/aclk/legacy/aclk_lws_wss_client.c index 2e6fd4ec8..f06df3f42 100644 --- a/aclk/legacy/aclk_lws_wss_client.c +++ b/aclk/legacy/aclk_lws_wss_client.c @@ -377,7 +377,9 @@ static const char *aclk_lws_callback_name(enum lws_callback_reasons reason) return "LWS_CALLBACK_EVENT_WAIT_CANCELLED"; default: // Not using an internal buffer here for thread-safety with unknown calling context. +#ifdef ACLK_TRP_DEBUG_VERBOSE error("Unknown LWS callback %u", reason); +#endif return "unknown"; } } @@ -489,7 +491,9 @@ static int aclk_lws_wss_callback(struct lws *wsi, enum lws_callback_reasons reas case LWS_CALLBACK_EVENT_WAIT_CANCELLED: case LWS_CALLBACK_OPENSSL_PERFORM_SERVER_CERT_VERIFICATION: // Expected and safe to ignore. +#ifdef ACLK_TRP_DEBUG_VERBOSE debug(D_ACLK, "Ignoring expected callback from LWS: %s", aclk_lws_callback_name(reason)); +#endif return retval; default: @@ -497,7 +501,9 @@ static int aclk_lws_wss_callback(struct lws *wsi, enum lws_callback_reasons reas break; } // Log to info - volume is proportional to connection attempts. +#ifdef ACLK_TRP_DEBUG_VERBOSE info("Processing callback %s", aclk_lws_callback_name(reason)); +#endif switch (reason) { case LWS_CALLBACK_PROTOCOL_INIT: aclk_lws_wss_connect(engine_instance->host, engine_instance->port); // Makes the outgoing connection @@ -531,7 +537,9 @@ static int aclk_lws_wss_callback(struct lws *wsi, enum lws_callback_reasons reas break; default: +#ifdef ACLK_TRP_DEBUG_VERBOSE error("Unexpected callback from libwebsockets %s", aclk_lws_callback_name(reason)); +#endif break; } return retval; //0-OK, other connection should be closed! 
diff --git a/claim/README.md b/claim/README.md index ade6a221f..1d0d6eebe 100644 --- a/claim/README.md +++ b/claim/README.md @@ -80,29 +80,30 @@ you don't see the node in your Space after 60 seconds, see the [troubleshooting ### Claim an Agent running in Docker -The claiming process works with Agents running inside of Docker containers. You can use `docker exec` to run the -claiming script on containers already running, or append the claiming script to `docker run` to create a new container -and immediately claim it. +To claim an instance of the Netdata Agent running inside of a Docker container, either set claiming environment +variables in the container to have it automatically claimed on startup or restart, or use `docker exec` to manually +claim an already running container. -#### Running Agent containers +For claiming to work, the contents of `/var/lib/netdata` _must_ be preserved across container +restarts using a persistent volume. See our [recommended `docker run` and Docker Compose +examples](/packaging/docker/README.md#create-a-new-netdata-agent-container) for details. -Claim a _running Agent container_ by appending the script offered by Cloud to a `docker exec ...` command, replacing -`netdata` with the name of your running container: - -```bash -docker exec -it netdata netdata-claim.sh -token=TOKEN -rooms=ROOM1,ROOM2 -url=https://app.netdata.cloud -``` +#### Using environment variables -The script should return `Agent was successfully claimed.`. If the claiming script returns errors, or if -you don't see the node in your Space after 60 seconds, see the [troubleshooting information](#troubleshooting). +The Netdata Docker container looks for the following environment variables on startup: -#### New/ephemeral Agent containers +- `NETDATA_CLAIM_TOKEN` +- `NETDATA_CLAIM_URL` +- `NETDATA_CLAIM_ROOMS` +- `NETDATA_CLAIM_PROXY` -Claim a newly-created container with `docker run ...`. 
+If the token and URL are specified in their corresponding variables _and_ the container is not already claimed, +it will use these values to attempt to claim the container, automatically adding the node to the specified War +Rooms. If a proxy is specified, it will be used for the claiming process and for connecting to Netdata Cloud. -In the example below, the last line calls the [daemon binary](/daemon/README.md), sets essential variables, and then -executes claiming using the information after `-W "claim... `. You should copy the relevant token, rooms, and URL from -Cloud. +These variables can be specified using any mechanism supported by your container tooling for setting environment +variables inside containers. For example, when creating a new Netdata container using `docker run`, the following +modified version of the command can be used to set the variables: ```bash docker run -d --name=netdata \ @@ -114,24 +115,32 @@ docker run -d --name=netdata \ -v /proc:/host/proc:ro \ -v /sys:/host/sys:ro \ -v /etc/os-release:/host/etc/os-release:ro \ + -e NETDATA_CLAIM_TOKEN=TOKEN \ + -e NETDATA_CLAIM_URL="https://app.netdata.cloud" \ + -e NETDATA_CLAIM_ROOMS=ROOM1,ROOM2 \ --restart unless-stopped \ --cap-add SYS_PTRACE \ --security-opt apparmor=unconfined \ - netdata/netdata \ - -W set2 cloud global enabled true -W set2 cloud global "cloud base url" "https://app.netdata.cloud" -W "claim \ - -token=TOKEN \ - -rooms=ROOM1,ROOM2 \ - -url=https://app.netdata.cloud" + netdata/netdata ``` -The container runs in detached mode, so you won't see any output. If the node does not appear in your Space, you can run -the following to find any error output and use that to guide your [troubleshooting](#troubleshooting). Replace `netdata` -with the name of your container if different. +Output that would be seen from the claiming script when using other methods will be present in the container logs. 
+ +Using the environment variables like this to handle claiming is the preferred method of claiming Docker containers +as it works in the widest variety of situations and simplifies configuration management. + +#### Using docker exec + +Claim a _running Netdata Agent container_ by appending the script offered by Cloud to a `docker exec ...` command, replacing +`netdata` with the name of your running container: ```bash -docker logs netdata 2>&1 | grep -E --line-buffered 'ACLK|claim|cloud' +docker exec -it netdata netdata-claim.sh -token=TOKEN -rooms=ROOM1,ROOM2 -url=https://app.netdata.cloud ``` +The script should return `Agent was successfully claimed.`. If the claiming script returns errors, or if +you don't see the node in your Space after 60 seconds, see the [troubleshooting information](#troubleshooting). + ### Claim a Kubernetes cluster's parent Netdata pod Read our [Kubernetes installation](/packaging/installer/methods/kubernetes.md#claim-a-kubernetes-clusters-parent-pod) diff --git a/collectors/apps.plugin/apps_groups.conf b/collectors/apps.plugin/apps_groups.conf index 9bf928712..7242ed30d 100644 --- a/collectors/apps.plugin/apps_groups.conf +++ b/collectors/apps.plugin/apps_groups.conf @@ -312,3 +312,5 @@ factorio: factorio p4: p4* git-services: gitea gitlab-runner + +freeswitch: freeswitch* diff --git a/collectors/apps.plugin/apps_plugin.c b/collectors/apps.plugin/apps_plugin.c index 0cfeeacd4..7cbbb075c 100644 --- a/collectors/apps.plugin/apps_plugin.c +++ b/collectors/apps.plugin/apps_plugin.c @@ -3879,9 +3879,8 @@ static void parse_args(int argc, char **argv) } if(strcmp("debug", argv[i]) == 0) { -#ifdef NETDATA_INTERNAL_CHECKS debug_enabled = 1; -#else +#ifndef NETDATA_INTERNAL_CHECKS fprintf(stderr, "apps.plugin has been compiled without debugging\n"); #endif continue; diff --git a/collectors/ebpf.plugin/ebpf.c b/collectors/ebpf.plugin/ebpf.c index 56e084e97..26bcfcf17 100644 --- a/collectors/ebpf.plugin/ebpf.c +++ b/collectors/ebpf.plugin/ebpf.c @@ 
-56,6 +56,7 @@ char *ebpf_user_config_dir = CONFIG_DIR; char *ebpf_stock_config_dir = LIBCONFIG_DIR; static char *ebpf_configured_log_dir = LOG_DIR; +char *ebpf_algorithms[] = {"absolute", "incremental"}; int update_every = 1; static int thread_finished = 0; int close_ebpf_plugin = 0; @@ -100,6 +101,19 @@ ebpf_network_viewer_options_t network_viewer_opt; *****************************************************************/ /** + * Cleanup publish syscall + * + * @param nps list of structures to clean + */ +void ebpf_cleanup_publish_syscall(netdata_publish_syscall_t *nps) +{ + while (nps) { + freez(nps->algorithm); + nps = nps->next; + } +} + +/** * Clean port Structure * * Clean the allocated list. @@ -307,17 +321,21 @@ void write_err_chart(char *name, char *family, netdata_publish_syscall_t *move, /** * Call the necessary functions to create a chart. * + * @param chart the chart name * @param family the chart family - * @param move the pointer with the values that will be published + * @param dwrite the dimension name + * @param vwrite the value for previous dimension + * @param dread the dimension name + * @param vread the value for previous dimension * * @return It returns a variable tha maps the charts that did not have zero values. 
*/ -void write_io_chart(char *chart, char *family, char *dwrite, char *dread, netdata_publish_vfs_common_t *pvc) +void write_io_chart(char *chart, char *family, char *dwrite, long long vwrite, char *dread, long long vread) { write_begin_chart(family, chart); - write_chart_dimension(dwrite, (long long)pvc->write); - write_chart_dimension(dread, (long long)pvc->read); + write_chart_dimension(dwrite, vwrite); + write_chart_dimension(dread, vread); write_end_chart(); } @@ -349,12 +367,13 @@ void ebpf_write_chart_cmd(char *type, char *id, char *title, char *units, char * /** * Write the dimension command on standard output * - * @param n the dimension name - * @param d the dimension information + * @param name the dimension name + * @param id the dimension id + * @param algo the dimension algorithm */ -void ebpf_write_global_dimension(char *n, char *d) +void ebpf_write_global_dimension(char *name, char *id, char *algorithm) { - printf("DIMENSION %s %s absolute 1 1\n", n, d); + printf("DIMENSION %s %s %s 1 1\n", name, id, algorithm); } /** @@ -369,7 +388,7 @@ void ebpf_create_global_dimension(void *ptr, int end) int i = 0; while (move && i < end) { - ebpf_write_global_dimension(move->name, move->dimension); + ebpf_write_global_dimension(move->name, move->dimension, move->algorithm); move = move->next; i++; @@ -411,16 +430,18 @@ void ebpf_create_chart(char *type, * @param units the value displayed on vertical axis. * @param family Submenu that the chart will be attached on dashboard. * @param order the chart order + * @param algorithm the algorithm used by dimension * @param root structure used to create the dimensions. 
*/ -void ebpf_create_charts_on_apps(char *id, char *title, char *units, char *family, int order, struct target *root) +void ebpf_create_charts_on_apps(char *id, char *title, char *units, char *family, int order, + char *algorithm, struct target *root) { struct target *w; ebpf_write_chart_cmd(NETDATA_APPS_FAMILY, id, title, units, family, "stacked", order); for (w = root; w; w = w->next) { if (unlikely(w->exposed)) - fprintf(stdout, "DIMENSION %s '' absolute 1 1\n", w->name); + fprintf(stdout, "DIMENSION %s '' %s 1 1\n", w->name, algorithm); } } @@ -437,9 +458,11 @@ void ebpf_create_charts_on_apps(char *id, char *title, char *units, char *family * @param pio structure used to generate charts. * @param dim a pointer for the dimensions name * @param name a pointer for the tensor with the name of the functions. + * @param algorithm a vector with the algorithms used to make the charts * @param end the number of elements in the previous 4 arguments. */ -void ebpf_global_labels(netdata_syscall_stat_t *is, netdata_publish_syscall_t *pio, char **dim, char **name, int end) +void ebpf_global_labels(netdata_syscall_stat_t *is, netdata_publish_syscall_t *pio, char **dim, + char **name, int *algorithm, int end) { int i; @@ -453,6 +476,7 @@ void ebpf_global_labels(netdata_syscall_stat_t *is, netdata_publish_syscall_t *p pio[i].dimension = dim[i]; pio[i].name = name[i]; + pio[i].algorithm = strdupz(ebpf_algorithms[algorithm[i]]); if (publish_prev) { publish_prev->next = &pio[i]; } diff --git a/collectors/ebpf.plugin/ebpf.h b/collectors/ebpf.plugin/ebpf.h index 1f5822951..35013c2b2 100644 --- a/collectors/ebpf.plugin/ebpf.h +++ b/collectors/ebpf.plugin/ebpf.h @@ -43,6 +43,7 @@ typedef uint64_t netdata_idx_t; typedef struct netdata_publish_syscall { char *dimension; char *name; + char *algorithm; unsigned long nbyte; unsigned long pbyte; uint64_t ncall; @@ -98,6 +99,11 @@ extern ebpf_module_t ebpf_modules[]; #define EBPF_SYS_CLONE_IDX 11 #define EBPF_MAX_MAPS 32 +enum 
ebpf_algorithms_list { + NETDATA_EBPF_ABSOLUTE_IDX, + NETDATA_EBPF_INCREMENTAL_IDX +}; + // Threads extern void *ebpf_process_thread(void *ptr); extern void *ebpf_socket_thread(void *ptr); @@ -118,6 +124,7 @@ extern void ebpf_global_labels(netdata_syscall_stat_t *is, netdata_publish_syscall_t *pio, char **dim, char **name, + int *algorithm, int end); extern void ebpf_write_chart_cmd(char *type, @@ -128,7 +135,7 @@ extern void ebpf_write_chart_cmd(char *type, char *charttype, int order); -extern void ebpf_write_global_dimension(char *n, char *d); +extern void ebpf_write_global_dimension(char *name, char *id, char *algorithm); extern void ebpf_create_global_dimension(void *ptr, int end); @@ -150,7 +157,8 @@ extern void write_count_chart(char *name, char *family, netdata_publish_syscall_ extern void write_err_chart(char *name, char *family, netdata_publish_syscall_t *move, int end); -extern void write_io_chart(char *chart, char *family, char *dwrite, char *dread, netdata_publish_vfs_common_t *pvc); +extern void write_io_chart(char *chart, char *family, char *dwrite, long long vwrite, + char *dread, long long vread); extern void fill_ebpf_data(ebpf_data_t *ef); @@ -159,17 +167,21 @@ extern void ebpf_create_charts_on_apps(char *name, char *units, char *family, int order, + char *algorithm, struct target *root); extern void write_end_chart(); +extern void ebpf_cleanup_publish_syscall(netdata_publish_syscall_t *nps); + #define EBPF_GLOBAL_SECTION "global" #define EBPF_PROGRAMS_SECTION "ebpf programs" #define EBPF_NETWORK_VIEWER_SECTION "network connections" #define EBPF_SERVICE_NAME_SECTION "service name" #define EBPF_COMMON_DIMENSION_CALL "calls/s" -#define EBPF_COMMON_DIMENSION_BYTESS "bytes/s" +#define EBPF_COMMON_DIMENSION_BITS "kilobits/s" +#define EBPF_COMMON_DIMENSION_BYTES "bytes/s" #define EBPF_COMMON_DIMENSION_DIFFERENCE "difference" #define EBPF_COMMON_DIMENSION_PACKETS "packets" @@ -178,6 +190,7 @@ extern char *ebpf_user_config_dir; extern char 
*ebpf_stock_config_dir; extern int debug_enabled; extern struct pid_stat *root_of_pids; +extern char *ebpf_algorithms[]; // Socket functions and variables // Common functions diff --git a/collectors/ebpf.plugin/ebpf_apps.c b/collectors/ebpf.plugin/ebpf_apps.c index 062c9a4e4..844ce23b8 100644 --- a/collectors/ebpf.plugin/ebpf_apps.c +++ b/collectors/ebpf.plugin/ebpf_apps.c @@ -931,13 +931,11 @@ void cleanup_exited_pids() freez(current_apps_data[r]); current_apps_data[r] = NULL; - prev_apps_data[r] = NULL; // Clean socket structures if (socket_bandwidth_curr) { freez(socket_bandwidth_curr[r]); socket_bandwidth_curr[r] = NULL; - socket_bandwidth_prev[r] = NULL; } } else { if (unlikely(p->keep)) @@ -1055,13 +1053,11 @@ void collect_data_for_all_processes(int tbl_pid_stats_fd) freez(current_apps_data[key]); current_apps_data[key] = NULL; - prev_apps_data[key] = NULL; // Clean socket structures if (socket_bandwidth_curr) { freez(socket_bandwidth_curr[key]); socket_bandwidth_curr[key] = NULL; - socket_bandwidth_prev[key] = NULL; } pids = pids->next; diff --git a/collectors/ebpf.plugin/ebpf_apps.h b/collectors/ebpf.plugin/ebpf_apps.h index 46d36966e..f8cb7ac72 100644 --- a/collectors/ebpf.plugin/ebpf_apps.h +++ b/collectors/ebpf.plugin/ebpf_apps.h @@ -426,6 +426,5 @@ extern void collect_data_for_all_processes(int tbl_pid_stats_fd); extern ebpf_process_stat_t **global_process_stats; extern ebpf_process_publish_apps_t **current_apps_data; -extern ebpf_process_publish_apps_t **prev_apps_data; #endif /* NETDATA_EBPF_APPS_H */ diff --git a/collectors/ebpf.plugin/ebpf_process.c b/collectors/ebpf.plugin/ebpf_process.c index 9a1d69c06..27e39d1a5 100644 --- a/collectors/ebpf.plugin/ebpf_process.c +++ b/collectors/ebpf.plugin/ebpf_process.c @@ -11,9 +11,9 @@ * *****************************************************************/ -static char *process_dimension_names[NETDATA_MAX_MONITOR_VECTOR] = { "open", "close", "delete", "read", "write", +static char 
*process_dimension_names[NETDATA_KEY_PUBLISH_PROCESS_END] = { "open", "close", "delete", "read", "write", "process", "task", "process", "thread" }; -static char *process_id_names[NETDATA_MAX_MONITOR_VECTOR] = { "do_sys_open", "__close_fd", "vfs_unlink", +static char *process_id_names[NETDATA_KEY_PUBLISH_PROCESS_END] = { "do_sys_open", "__close_fd", "vfs_unlink", "vfs_read", "vfs_write", "do_exit", "release_task", "_do_fork", "sys_clone" }; static char *status[] = { "process", "zombie" }; @@ -26,7 +26,6 @@ static ebpf_data_t process_data; ebpf_process_stat_t **global_process_stats = NULL; ebpf_process_publish_apps_t **current_apps_data = NULL; -ebpf_process_publish_apps_t **prev_apps_data = NULL; int process_enabled = 0; @@ -51,67 +50,36 @@ static void ebpf_update_global_publish( netdata_publish_syscall_t *publish, netdata_publish_vfs_common_t *pvc, netdata_syscall_stat_t *input) { netdata_publish_syscall_t *move = publish; + int selector = NETDATA_KEY_PUBLISH_PROCESS_OPEN; while (move) { - if (input->call != move->pcall) { - //This condition happens to avoid initial values with dimensions higher than normal values. - if (move->pcall) { - move->ncall = (input->call > move->pcall) ? input->call - move->pcall : move->pcall - input->call; - move->nbyte = (input->bytes > move->pbyte) ? input->bytes - move->pbyte : move->pbyte - input->bytes; - move->nerr = (input->ecall > move->nerr) ? input->ecall - move->perr : move->perr - input->ecall; - } else { - move->ncall = 0; - move->nbyte = 0; - move->nerr = 0; - } + // Until NETDATA_KEY_PUBLISH_PROCESS_READ we are creating accumulators, so it is possible + // to use incremental charts, but after this we will do some math with the values, so we are storing + // absolute values + if (selector < NETDATA_KEY_PUBLISH_PROCESS_READ) { + move->ncall = input->call; + move->nbyte = input->bytes; + move->nerr = input->ecall; + } else { + move->ncall = (input->call > move->pcall) ? 
input->call - move->pcall : move->pcall - input->call; + move->nbyte = (input->bytes > move->pbyte) ? input->bytes - move->pbyte : move->pbyte - input->bytes; + move->nerr = (input->ecall > move->nerr) ? input->ecall - move->perr : move->perr - input->ecall; move->pcall = input->call; move->pbyte = input->bytes; move->perr = input->ecall; - } else { - move->ncall = 0; - move->nbyte = 0; - move->nerr = 0; } input = input->next; move = move->next; + selector++; } - pvc->write = -((long)publish[2].nbyte); - pvc->read = (long)publish[3].nbyte; - - pvc->running = (long)publish[7].ncall - (long)publish[8].ncall; - publish[6].ncall = -publish[6].ncall; // release - pvc->zombie = (long)publish[5].ncall + (long)publish[6].ncall; -} - -/** - * Update apps dimension to publish. - * - * @param curr Last values read from memory. - * @param prev Previous values read from memory. - * @param first was it allocated now? - */ -static void -ebpf_process_update_apps_publish(ebpf_process_publish_apps_t *curr, ebpf_process_publish_apps_t *prev, int first) -{ - if (first) - return; + pvc->write = -((long)publish[NETDATA_KEY_PUBLISH_PROCESS_WRITE].nbyte); + pvc->read = (long)publish[NETDATA_KEY_PUBLISH_PROCESS_READ].nbyte; - curr->publish_open = curr->call_sys_open - prev->call_sys_open; - curr->publish_closed = curr->call_close_fd - prev->call_close_fd; - curr->publish_deleted = curr->call_vfs_unlink - prev->call_vfs_unlink; - curr->publish_write_call = curr->call_write - prev->call_write; - curr->publish_write_bytes = curr->bytes_written - prev->bytes_written; - curr->publish_read_call = curr->call_read - prev->call_read; - curr->publish_read_bytes = curr->bytes_read - prev->bytes_read; - curr->publish_process = curr->call_do_fork - prev->call_do_fork; - curr->publish_thread = curr->call_sys_clone - prev->call_sys_clone; - curr->publish_task = curr->call_release_task - prev->call_release_task; - curr->publish_open_error = curr->ecall_sys_open - prev->ecall_sys_open; - 
curr->publish_close_error = curr->ecall_close_fd - prev->ecall_close_fd; - curr->publish_write_error = curr->ecall_write - prev->ecall_write; - curr->publish_read_error = curr->ecall_read - prev->ecall_read; + pvc->running = (long)publish[NETDATA_KEY_PUBLISH_PROCESS_FORK].ncall - (long)publish[NETDATA_KEY_PUBLISH_PROCESS_CLONE].ncall; + publish[NETDATA_KEY_PUBLISH_PROCESS_RELEASE_TASK].ncall = -publish[NETDATA_KEY_PUBLISH_PROCESS_RELEASE_TASK].ncall; + pvc->zombie = (long)publish[NETDATA_KEY_PUBLISH_PROCESS_EXIT].ncall + (long)publish[NETDATA_KEY_PUBLISH_PROCESS_RELEASE_TASK].ncall; } /** @@ -164,7 +132,9 @@ static void ebpf_process_send_data(ebpf_module_t *em) NETDATA_PROCESS_ERROR_NAME, NETDATA_EBPF_FAMILY, &process_publish_aggregated[NETDATA_PROCESS_START], 2); } - write_io_chart(NETDATA_VFS_IO_FILE_BYTES, NETDATA_EBPF_FAMILY, process_id_names[3], process_id_names[4], &pvc); + write_io_chart(NETDATA_VFS_IO_FILE_BYTES, NETDATA_EBPF_FAMILY, + process_id_names[NETDATA_KEY_PUBLISH_PROCESS_WRITE], (long long) pvc.write, + process_id_names[NETDATA_KEY_PUBLISH_PROCESS_READ], (long long)pvc.read); } /** @@ -230,7 +200,7 @@ void ebpf_process_send_apps_data(ebpf_module_t *em, struct target *root) write_begin_chart(NETDATA_APPS_FAMILY, NETDATA_SYSCALL_APPS_FILE_OPEN); for (w = root; w; w = w->next) { if (unlikely(w->exposed && w->processes)) { - value = ebpf_process_sum_values_for_pids(w->root_pid, offsetof(ebpf_process_publish_apps_t, publish_open)); + value = ebpf_process_sum_values_for_pids(w->root_pid, offsetof(ebpf_process_publish_apps_t, call_sys_open)); write_chart_dimension(w->name, value); } } @@ -241,7 +211,7 @@ void ebpf_process_send_apps_data(ebpf_module_t *em, struct target *root) for (w = root; w; w = w->next) { if (unlikely(w->exposed && w->processes)) { value = ebpf_process_sum_values_for_pids( - w->root_pid, offsetof(ebpf_process_publish_apps_t, publish_open_error)); + w->root_pid, offsetof(ebpf_process_publish_apps_t, ecall_sys_open)); 
write_chart_dimension(w->name, value); } } @@ -252,7 +222,7 @@ void ebpf_process_send_apps_data(ebpf_module_t *em, struct target *root) for (w = root; w; w = w->next) { if (unlikely(w->exposed && w->processes)) { value = - ebpf_process_sum_values_for_pids(w->root_pid, offsetof(ebpf_process_publish_apps_t, publish_closed)); + ebpf_process_sum_values_for_pids(w->root_pid, offsetof(ebpf_process_publish_apps_t, call_close_fd)); write_chart_dimension(w->name, value); } } @@ -263,7 +233,7 @@ void ebpf_process_send_apps_data(ebpf_module_t *em, struct target *root) for (w = root; w; w = w->next) { if (unlikely(w->exposed && w->processes)) { value = ebpf_process_sum_values_for_pids( - w->root_pid, offsetof(ebpf_process_publish_apps_t, publish_close_error)); + w->root_pid, offsetof(ebpf_process_publish_apps_t, ecall_close_fd)); write_chart_dimension(w->name, value); } } @@ -274,7 +244,7 @@ void ebpf_process_send_apps_data(ebpf_module_t *em, struct target *root) for (w = root; w; w = w->next) { if (unlikely(w->exposed && w->processes)) { value = - ebpf_process_sum_values_for_pids(w->root_pid, offsetof(ebpf_process_publish_apps_t, publish_deleted)); + ebpf_process_sum_values_for_pids(w->root_pid, offsetof(ebpf_process_publish_apps_t, call_vfs_unlink)); write_chart_dimension(w->name, value); } } @@ -284,7 +254,7 @@ void ebpf_process_send_apps_data(ebpf_module_t *em, struct target *root) for (w = root; w; w = w->next) { if (unlikely(w->exposed && w->processes)) { value = ebpf_process_sum_values_for_pids( - w->root_pid, offsetof(ebpf_process_publish_apps_t, publish_write_call)); + w->root_pid, offsetof(ebpf_process_publish_apps_t, call_write)); write_chart_dimension(w->name, value); } } @@ -295,7 +265,7 @@ void ebpf_process_send_apps_data(ebpf_module_t *em, struct target *root) for (w = root; w; w = w->next) { if (unlikely(w->exposed && w->processes)) { value = ebpf_process_sum_values_for_pids( - w->root_pid, offsetof(ebpf_process_publish_apps_t, publish_write_error)); + 
w->root_pid, offsetof(ebpf_process_publish_apps_t, ecall_write)); write_chart_dimension(w->name, value); } } @@ -306,7 +276,7 @@ void ebpf_process_send_apps_data(ebpf_module_t *em, struct target *root) for (w = root; w; w = w->next) { if (unlikely(w->exposed && w->processes)) { value = - ebpf_process_sum_values_for_pids(w->root_pid, offsetof(ebpf_process_publish_apps_t, publish_read_call)); + ebpf_process_sum_values_for_pids(w->root_pid, offsetof(ebpf_process_publish_apps_t, call_read)); write_chart_dimension(w->name, value); } } @@ -317,7 +287,7 @@ void ebpf_process_send_apps_data(ebpf_module_t *em, struct target *root) for (w = root; w; w = w->next) { if (unlikely(w->exposed && w->processes)) { value = ebpf_process_sum_values_for_pids( - w->root_pid, offsetof(ebpf_process_publish_apps_t, publish_read_error)); + w->root_pid, offsetof(ebpf_process_publish_apps_t, ecall_read)); write_chart_dimension(w->name, value); } } @@ -328,7 +298,7 @@ void ebpf_process_send_apps_data(ebpf_module_t *em, struct target *root) for (w = root; w; w = w->next) { if (unlikely(w->exposed && w->processes)) { value = ebpf_process_sum_values_for_pids( - w->root_pid, offsetof(ebpf_process_publish_apps_t, publish_write_bytes)); + w->root_pid, offsetof(ebpf_process_publish_apps_t, bytes_written)); write_chart_dimension(w->name, value); } } @@ -338,7 +308,7 @@ void ebpf_process_send_apps_data(ebpf_module_t *em, struct target *root) for (w = root; w; w = w->next) { if (unlikely(w->exposed && w->processes)) { value = ebpf_process_sum_values_for_pids( - w->root_pid, offsetof(ebpf_process_publish_apps_t, publish_read_bytes)); + w->root_pid, offsetof(ebpf_process_publish_apps_t, bytes_read)); write_chart_dimension(w->name, value); } } @@ -348,7 +318,7 @@ void ebpf_process_send_apps_data(ebpf_module_t *em, struct target *root) for (w = root; w; w = w->next) { if (unlikely(w->exposed && w->processes)) { value = - ebpf_process_sum_values_for_pids(w->root_pid, offsetof(ebpf_process_publish_apps_t, 
publish_process)); + ebpf_process_sum_values_for_pids(w->root_pid, offsetof(ebpf_process_publish_apps_t, call_do_fork)); write_chart_dimension(w->name, value); } } @@ -358,7 +328,7 @@ void ebpf_process_send_apps_data(ebpf_module_t *em, struct target *root) for (w = root; w; w = w->next) { if (unlikely(w->exposed && w->processes)) { value = - ebpf_process_sum_values_for_pids(w->root_pid, offsetof(ebpf_process_publish_apps_t, publish_thread)); + ebpf_process_sum_values_for_pids(w->root_pid, offsetof(ebpf_process_publish_apps_t, call_sys_clone)); write_chart_dimension(w->name, value); } } @@ -367,7 +337,8 @@ void ebpf_process_send_apps_data(ebpf_module_t *em, struct target *root) write_begin_chart(NETDATA_APPS_FAMILY, NETDATA_SYSCALL_APPS_TASK_CLOSE); for (w = root; w; w = w->next) { if (unlikely(w->exposed && w->processes)) { - value = ebpf_process_sum_values_for_pids(w->root_pid, offsetof(ebpf_process_publish_apps_t, publish_task)); + value = ebpf_process_sum_values_for_pids(w->root_pid, offsetof(ebpf_process_publish_apps_t, + call_release_task)); write_chart_dimension(w->name, value); } } @@ -405,27 +376,27 @@ static void read_hash_global_tables() } } - process_aggregated_data[0].call = res[NETDATA_KEY_CALLS_DO_SYS_OPEN]; - process_aggregated_data[1].call = res[NETDATA_KEY_CALLS_CLOSE_FD]; - process_aggregated_data[2].call = res[NETDATA_KEY_CALLS_VFS_UNLINK]; - process_aggregated_data[3].call = res[NETDATA_KEY_CALLS_VFS_READ] + res[NETDATA_KEY_CALLS_VFS_READV]; - process_aggregated_data[4].call = res[NETDATA_KEY_CALLS_VFS_WRITE] + res[NETDATA_KEY_CALLS_VFS_WRITEV]; - process_aggregated_data[5].call = res[NETDATA_KEY_CALLS_DO_EXIT]; - process_aggregated_data[6].call = res[NETDATA_KEY_CALLS_RELEASE_TASK]; - process_aggregated_data[7].call = res[NETDATA_KEY_CALLS_DO_FORK]; - process_aggregated_data[8].call = res[NETDATA_KEY_CALLS_SYS_CLONE]; - - process_aggregated_data[0].ecall = res[NETDATA_KEY_ERROR_DO_SYS_OPEN]; - process_aggregated_data[1].ecall = 
res[NETDATA_KEY_ERROR_CLOSE_FD]; - process_aggregated_data[2].ecall = res[NETDATA_KEY_ERROR_VFS_UNLINK]; - process_aggregated_data[3].ecall = res[NETDATA_KEY_ERROR_VFS_READ] + res[NETDATA_KEY_ERROR_VFS_READV]; - process_aggregated_data[4].ecall = res[NETDATA_KEY_ERROR_VFS_WRITE] + res[NETDATA_KEY_ERROR_VFS_WRITEV]; - process_aggregated_data[7].ecall = res[NETDATA_KEY_ERROR_DO_FORK]; - process_aggregated_data[8].ecall = res[NETDATA_KEY_ERROR_SYS_CLONE]; - - process_aggregated_data[2].bytes = (uint64_t)res[NETDATA_KEY_BYTES_VFS_WRITE] + + process_aggregated_data[NETDATA_KEY_PUBLISH_PROCESS_OPEN].call = res[NETDATA_KEY_CALLS_DO_SYS_OPEN]; + process_aggregated_data[NETDATA_KEY_PUBLISH_PROCESS_CLOSE].call = res[NETDATA_KEY_CALLS_CLOSE_FD]; + process_aggregated_data[NETDATA_KEY_PUBLISH_PROCESS_UNLINK].call = res[NETDATA_KEY_CALLS_VFS_UNLINK]; + process_aggregated_data[NETDATA_KEY_PUBLISH_PROCESS_READ].call = res[NETDATA_KEY_CALLS_VFS_READ] + res[NETDATA_KEY_CALLS_VFS_READV]; + process_aggregated_data[NETDATA_KEY_PUBLISH_PROCESS_WRITE].call = res[NETDATA_KEY_CALLS_VFS_WRITE] + res[NETDATA_KEY_CALLS_VFS_WRITEV]; + process_aggregated_data[NETDATA_KEY_PUBLISH_PROCESS_EXIT].call = res[NETDATA_KEY_CALLS_DO_EXIT]; + process_aggregated_data[NETDATA_KEY_PUBLISH_PROCESS_RELEASE_TASK].call = res[NETDATA_KEY_CALLS_RELEASE_TASK]; + process_aggregated_data[NETDATA_KEY_PUBLISH_PROCESS_FORK].call = res[NETDATA_KEY_CALLS_DO_FORK]; + process_aggregated_data[NETDATA_KEY_PUBLISH_PROCESS_CLONE].call = res[NETDATA_KEY_CALLS_SYS_CLONE]; + + process_aggregated_data[NETDATA_KEY_PUBLISH_PROCESS_OPEN].ecall = res[NETDATA_KEY_ERROR_DO_SYS_OPEN]; + process_aggregated_data[NETDATA_KEY_PUBLISH_PROCESS_CLOSE].ecall = res[NETDATA_KEY_ERROR_CLOSE_FD]; + process_aggregated_data[NETDATA_KEY_PUBLISH_PROCESS_UNLINK].ecall = res[NETDATA_KEY_ERROR_VFS_UNLINK]; + process_aggregated_data[NETDATA_KEY_PUBLISH_PROCESS_READ].ecall = res[NETDATA_KEY_ERROR_VFS_READ] + res[NETDATA_KEY_ERROR_VFS_READV]; + 
process_aggregated_data[NETDATA_KEY_PUBLISH_PROCESS_WRITE].ecall = res[NETDATA_KEY_ERROR_VFS_WRITE] + res[NETDATA_KEY_ERROR_VFS_WRITEV]; + process_aggregated_data[NETDATA_KEY_PUBLISH_PROCESS_FORK].ecall = res[NETDATA_KEY_ERROR_DO_FORK]; + process_aggregated_data[NETDATA_KEY_PUBLISH_PROCESS_CLONE].ecall = res[NETDATA_KEY_ERROR_SYS_CLONE]; + + process_aggregated_data[NETDATA_KEY_PUBLISH_PROCESS_WRITE].bytes = (uint64_t)res[NETDATA_KEY_BYTES_VFS_WRITE] + (uint64_t)res[NETDATA_KEY_BYTES_VFS_WRITEV]; - process_aggregated_data[3].bytes = (uint64_t)res[NETDATA_KEY_BYTES_VFS_READ] + + process_aggregated_data[NETDATA_KEY_PUBLISH_PROCESS_READ].bytes = (uint64_t)res[NETDATA_KEY_BYTES_VFS_READ] + (uint64_t)res[NETDATA_KEY_BYTES_VFS_READV]; } @@ -444,18 +415,9 @@ static void ebpf_process_update_apps_data() } ebpf_process_publish_apps_t *cad = current_apps_data[current_pid]; - ebpf_process_publish_apps_t *pad = prev_apps_data[current_pid]; - int lstatus; if (!cad) { - ebpf_process_publish_apps_t *ptr = callocz(2, sizeof(ebpf_process_publish_apps_t)); - cad = &ptr[0]; + cad = callocz(1, sizeof(ebpf_process_publish_apps_t)); current_apps_data[current_pid] = cad; - pad = &ptr[1]; - prev_apps_data[current_pid] = pad; - lstatus = 1; - } else { - memcpy(pad, cad, sizeof(ebpf_process_publish_apps_t)); - lstatus = 0; } //Read data @@ -480,8 +442,6 @@ static void ebpf_process_update_apps_data() cad->bytes_written = (uint64_t)ps->write_bytes + (uint64_t)ps->write_bytes; cad->bytes_read = (uint64_t)ps->read_bytes + (uint64_t)ps->readv_bytes; - ebpf_process_update_apps_publish(cad, pad, lstatus); - pids = pids->next; } } @@ -500,8 +460,9 @@ static void ebpf_process_update_apps_data() * @param axis the axis label * @param web the group name used to attach the chart on dashaboard * @param order the order number of the specified chart + * @param algorithm the algorithm used to make the charts. 
*/ -static void ebpf_create_io_chart(char *family, char *name, char *axis, char *web, int order) +static void ebpf_create_io_chart(char *family, char *name, char *axis, char *web, int order, int algorithm) { printf("CHART %s.%s '' 'Bytes written and read' '%s' '%s' '' line %d %d\n", family, @@ -511,8 +472,14 @@ static void ebpf_create_io_chart(char *family, char *name, char *axis, char *web order, update_every); - printf("DIMENSION %s %s absolute 1 1\n", process_id_names[3], NETDATA_VFS_DIM_OUT_FILE_BYTES); - printf("DIMENSION %s %s absolute 1 1\n", process_id_names[4], NETDATA_VFS_DIM_IN_FILE_BYTES); + printf("DIMENSION %s %s %s 1 1\n", + process_id_names[NETDATA_KEY_PUBLISH_PROCESS_READ], + process_dimension_names[NETDATA_KEY_PUBLISH_PROCESS_READ], + ebpf_algorithms[algorithm]); + printf("DIMENSION %s %s %s 1 1\n", + process_id_names[NETDATA_KEY_PUBLISH_PROCESS_WRITE], + process_dimension_names[NETDATA_KEY_PUBLISH_PROCESS_WRITE], + ebpf_algorithms[algorithm]); } /** @@ -524,7 +491,8 @@ static void ebpf_create_io_chart(char *family, char *name, char *axis, char *web * @param web the group name used to attach the chart on dashaboard * @param order the order number of the specified chart */ -static void ebpf_process_status_chart(char *family, char *name, char *axis, char *web, int order) +static void ebpf_process_status_chart(char *family, char *name, char *axis, + char *web, char *algorithm, int order) { printf("CHART %s.%s '' 'Process not closed' '%s' '%s' '' line %d %d ''\n", family, @@ -534,8 +502,8 @@ static void ebpf_process_status_chart(char *family, char *name, char *axis, char order, update_every); - printf("DIMENSION %s '' absolute 1 1\n", status[0]); - printf("DIMENSION %s '' absolute 1 1\n", status[1]); + printf("DIMENSION %s '' %s 1 1\n", status[0], algorithm); + printf("DIMENSION %s '' %s 1 1\n", status[1], algorithm); } /** @@ -590,10 +558,10 @@ static void ebpf_create_global_charts(ebpf_module_t *em) 2); ebpf_create_io_chart(NETDATA_EBPF_FAMILY, - 
NETDATA_VFS_IO_FILE_BYTES, - EBPF_COMMON_DIMENSION_BYTESS, + NETDATA_VFS_IO_FILE_BYTES, EBPF_COMMON_DIMENSION_BYTES, NETDATA_VFS_GROUP, - 21004); + 21004, + NETDATA_EBPF_ABSOLUTE_IDX); if (em->mode < MODE_ENTRY) { ebpf_create_chart(NETDATA_EBPF_FAMILY, @@ -631,6 +599,7 @@ static void ebpf_create_global_charts(ebpf_module_t *em) NETDATA_PROCESS_STATUS_NAME, EBPF_COMMON_DIMENSION_DIFFERENCE, NETDATA_PROCESS_GROUP, + ebpf_algorithms[NETDATA_EBPF_ABSOLUTE_IDX], 21008); if (em->mode < MODE_ENTRY) { @@ -661,6 +630,7 @@ static void ebpf_process_create_apps_charts(ebpf_module_t *em, struct target *ro EBPF_COMMON_DIMENSION_CALL, NETDATA_APPS_FILE_GROUP, 20061, + ebpf_algorithms[NETDATA_EBPF_INCREMENTAL_IDX], root); if (em->mode < MODE_ENTRY) { @@ -669,6 +639,7 @@ static void ebpf_process_create_apps_charts(ebpf_module_t *em, struct target *ro EBPF_COMMON_DIMENSION_CALL, NETDATA_APPS_FILE_GROUP, 20062, + ebpf_algorithms[NETDATA_EBPF_INCREMENTAL_IDX], root); } @@ -677,6 +648,7 @@ static void ebpf_process_create_apps_charts(ebpf_module_t *em, struct target *ro EBPF_COMMON_DIMENSION_CALL, NETDATA_APPS_FILE_GROUP, 20063, + ebpf_algorithms[NETDATA_EBPF_INCREMENTAL_IDX], root); if (em->mode < MODE_ENTRY) { @@ -685,6 +657,7 @@ static void ebpf_process_create_apps_charts(ebpf_module_t *em, struct target *ro EBPF_COMMON_DIMENSION_CALL, NETDATA_APPS_FILE_GROUP, 20064, + ebpf_algorithms[NETDATA_EBPF_INCREMENTAL_IDX], root); } @@ -693,6 +666,7 @@ static void ebpf_process_create_apps_charts(ebpf_module_t *em, struct target *ro EBPF_COMMON_DIMENSION_CALL, NETDATA_APPS_VFS_GROUP, 20065, + ebpf_algorithms[NETDATA_EBPF_INCREMENTAL_IDX], root); ebpf_create_charts_on_apps(NETDATA_SYSCALL_APPS_VFS_WRITE_CALLS, @@ -700,6 +674,7 @@ static void ebpf_process_create_apps_charts(ebpf_module_t *em, struct target *ro EBPF_COMMON_DIMENSION_CALL, NETDATA_APPS_VFS_GROUP, 20066, + ebpf_algorithms[NETDATA_EBPF_INCREMENTAL_IDX], apps_groups_root_target); if (em->mode < MODE_ENTRY) { @@ -708,6 +683,7 @@ 
static void ebpf_process_create_apps_charts(ebpf_module_t *em, struct target *ro EBPF_COMMON_DIMENSION_CALL, NETDATA_APPS_VFS_GROUP, 20067, + ebpf_algorithms[NETDATA_EBPF_INCREMENTAL_IDX], root); } @@ -716,6 +692,7 @@ static void ebpf_process_create_apps_charts(ebpf_module_t *em, struct target *ro EBPF_COMMON_DIMENSION_CALL, NETDATA_APPS_VFS_GROUP, 20068, + ebpf_algorithms[NETDATA_EBPF_INCREMENTAL_IDX], root); if (em->mode < MODE_ENTRY) { @@ -724,21 +701,22 @@ static void ebpf_process_create_apps_charts(ebpf_module_t *em, struct target *ro EBPF_COMMON_DIMENSION_CALL, NETDATA_APPS_VFS_GROUP, 20069, + ebpf_algorithms[NETDATA_EBPF_INCREMENTAL_IDX], root); } ebpf_create_charts_on_apps(NETDATA_SYSCALL_APPS_VFS_WRITE_BYTES, - "Bytes written on disk", - EBPF_COMMON_DIMENSION_BYTESS, + "Bytes written on disk", EBPF_COMMON_DIMENSION_BYTES, NETDATA_APPS_VFS_GROUP, 20070, + ebpf_algorithms[NETDATA_EBPF_INCREMENTAL_IDX], root); ebpf_create_charts_on_apps(NETDATA_SYSCALL_APPS_VFS_READ_BYTES, - "Bytes read from disk", - EBPF_COMMON_DIMENSION_BYTESS, + "Bytes read from disk", EBPF_COMMON_DIMENSION_BYTES, NETDATA_APPS_VFS_GROUP, 20071, + ebpf_algorithms[NETDATA_EBPF_INCREMENTAL_IDX], root); ebpf_create_charts_on_apps(NETDATA_SYSCALL_APPS_TASK_PROCESS, @@ -746,6 +724,7 @@ static void ebpf_process_create_apps_charts(ebpf_module_t *em, struct target *ro EBPF_COMMON_DIMENSION_CALL, NETDATA_APPS_PROCESS_GROUP, 20072, + ebpf_algorithms[NETDATA_EBPF_ABSOLUTE_IDX], root); ebpf_create_charts_on_apps(NETDATA_SYSCALL_APPS_TASK_THREAD, @@ -753,6 +732,7 @@ static void ebpf_process_create_apps_charts(ebpf_module_t *em, struct target *ro EBPF_COMMON_DIMENSION_CALL, NETDATA_APPS_PROCESS_GROUP, 20073, + ebpf_algorithms[NETDATA_EBPF_ABSOLUTE_IDX], root); ebpf_create_charts_on_apps(NETDATA_SYSCALL_APPS_TASK_CLOSE, @@ -760,6 +740,7 @@ static void ebpf_process_create_apps_charts(ebpf_module_t *em, struct target *ro EBPF_COMMON_DIMENSION_CALL, NETDATA_APPS_PROCESS_GROUP, 20074, + 
ebpf_algorithms[NETDATA_EBPF_ABSOLUTE_IDX], root); } @@ -917,20 +898,20 @@ static void ebpf_process_cleanup(void *ptr) heartbeat_t hb; heartbeat_init(&hb); - uint32_t tick = 200*USEC_PER_MS; + uint32_t tick = 50*USEC_PER_MS; while (!finalized_threads) { usec_t dt = heartbeat_next(&hb, tick); UNUSED(dt); } freez(process_aggregated_data); + ebpf_cleanup_publish_syscall(process_publish_aggregated); freez(process_publish_aggregated); freez(process_hash_values); clean_global_memory(); freez(global_process_stats); freez(current_apps_data); - freez(prev_apps_data); clean_apps_structures(apps_groups_root_target); freez(process_data.map_fd); @@ -965,7 +946,6 @@ static void ebpf_process_allocate_global_vectors(size_t length) global_process_stats = callocz((size_t)pid_max, sizeof(ebpf_process_stat_t *)); current_apps_data = callocz((size_t)pid_max, sizeof(ebpf_process_publish_apps_t *)); - prev_apps_data = callocz((size_t)pid_max, sizeof(ebpf_process_publish_apps_t *)); } static void change_syscalls() @@ -1052,9 +1032,15 @@ void *ebpf_process_thread(void *ptr) goto endprocess; } + int algorithms[NETDATA_KEY_PUBLISH_PROCESS_END] = { + NETDATA_EBPF_INCREMENTAL_IDX, NETDATA_EBPF_INCREMENTAL_IDX,NETDATA_EBPF_INCREMENTAL_IDX, //open, close, unlink + NETDATA_EBPF_ABSOLUTE_IDX, NETDATA_EBPF_ABSOLUTE_IDX, NETDATA_EBPF_ABSOLUTE_IDX, + NETDATA_EBPF_ABSOLUTE_IDX, NETDATA_EBPF_ABSOLUTE_IDX, NETDATA_EBPF_ABSOLUTE_IDX + }; + ebpf_global_labels( process_aggregated_data, process_publish_aggregated, process_dimension_names, process_id_names, - NETDATA_MAX_MONITOR_VECTOR); + algorithms, NETDATA_MAX_MONITOR_VECTOR); if (process_enabled) { ebpf_create_global_charts(em); diff --git a/collectors/ebpf.plugin/ebpf_process.h b/collectors/ebpf.plugin/ebpf_process.h index 9553434b0..aa6ed66d6 100644 --- a/collectors/ebpf.plugin/ebpf_process.h +++ b/collectors/ebpf.plugin/ebpf_process.h @@ -32,8 +32,6 @@ #define NETDATA_PROCESS_STATUS_NAME "process_status" #define NETDATA_VFS_IO_FILE_BYTES "io_bytes" 
-#define NETDATA_VFS_DIM_IN_FILE_BYTES "write" -#define NETDATA_VFS_DIM_OUT_FILE_BYTES "read" // Charts created on Apps submenu #define NETDATA_SYSCALL_APPS_FILE_OPEN "file_open" @@ -93,6 +91,25 @@ typedef enum ebpf_process_index { } ebpf_process_index_t; +// This enum acts as an index for publish vector. +// Do not change the enum order because we use +// different algorithms to make charts with incremental +// values (the three initial positions) and absolute values +// (the remaining charts). +typedef enum netdata_publish_process { + NETDATA_KEY_PUBLISH_PROCESS_OPEN, + NETDATA_KEY_PUBLISH_PROCESS_CLOSE, + NETDATA_KEY_PUBLISH_PROCESS_UNLINK, + NETDATA_KEY_PUBLISH_PROCESS_READ, + NETDATA_KEY_PUBLISH_PROCESS_WRITE, + NETDATA_KEY_PUBLISH_PROCESS_EXIT, + NETDATA_KEY_PUBLISH_PROCESS_RELEASE_TASK, + NETDATA_KEY_PUBLISH_PROCESS_FORK, + NETDATA_KEY_PUBLISH_PROCESS_CLONE, + + NETDATA_KEY_PUBLISH_PROCESS_END +} netdata_publish_process_t; + typedef struct ebpf_process_publish_apps { // Number of calls during the last read uint64_t call_sys_open; @@ -117,22 +134,6 @@ typedef struct ebpf_process_publish_apps { // Number of bytes during the last read uint64_t bytes_written; uint64_t bytes_read; - - // Dimensions sent to chart - uint64_t publish_open; - uint64_t publish_closed; - uint64_t publish_deleted; - uint64_t publish_write_call; - uint64_t publish_write_bytes; - uint64_t publish_read_call; - uint64_t publish_read_bytes; - uint64_t publish_process; - uint64_t publish_thread; - uint64_t publish_task; - uint64_t publish_open_error; - uint64_t publish_close_error; - uint64_t publish_write_error; - uint64_t publish_read_error; } ebpf_process_publish_apps_t; #endif /* NETDATA_EBPF_PROCESS_H */ diff --git a/collectors/ebpf.plugin/ebpf_socket.c b/collectors/ebpf.plugin/ebpf_socket.c index 2f73cf4dd..7fbc24421 100644 --- a/collectors/ebpf.plugin/ebpf_socket.c +++ b/collectors/ebpf.plugin/ebpf_socket.c @@ -23,7 +23,6 @@ static netdata_publish_syscall_t *socket_publish_aggregated = 
NULL; static ebpf_data_t socket_data; ebpf_socket_publish_apps_t **socket_bandwidth_curr = NULL; -ebpf_socket_publish_apps_t **socket_bandwidth_prev = NULL; static ebpf_bandwidth_t *bandwidth_vector = NULL; static int socket_apps_created = 0; @@ -86,10 +85,10 @@ static void ebpf_update_global_publish( move = move->next; } - tcp->write = -((long)publish[0].nbyte); + tcp->write = -(long)publish[0].nbyte; tcp->read = (long)publish[1].nbyte; - udp->write = -((long)publish[3].nbyte); + udp->write = -(long)publish[3].nbyte; udp->read = (long)publish[4].nbyte; } @@ -257,24 +256,6 @@ static void ebpf_socket_send_nv_data(netdata_vector_plot_t *ptr) } } - -/** - * Update the publish strctures to create the dimenssions - * - * @param curr Last values read from memory. - * @param prev Previous values read from memory. - */ -static void ebpf_socket_update_apps_publish(ebpf_socket_publish_apps_t *curr, ebpf_socket_publish_apps_t *prev) -{ - curr->publish_received_bytes = curr->bytes_received - prev->bytes_received; - curr->publish_sent_bytes = curr->bytes_sent - prev->bytes_sent; - curr->publish_tcp_sent = curr->call_tcp_sent - prev->call_tcp_sent; - curr->publish_tcp_received = curr->call_tcp_received - prev->call_tcp_received; - curr->publish_retransmit = curr->retransmit - prev->retransmit; - curr->publish_udp_sent = curr->call_udp_sent - prev->call_udp_sent; - curr->publish_udp_received = curr->call_udp_received - prev->call_udp_received; -} - /** * Send data to Netdata calling auxiliar functions. * @@ -286,10 +267,13 @@ static void ebpf_socket_send_data(ebpf_module_t *em) netdata_publish_vfs_common_t common_udp; ebpf_update_global_publish(socket_publish_aggregated, &common_tcp, &common_udp, socket_aggregated_data); + // We read bytes from function arguments, but bandwidth is given in bits, + // so we need to multiply by 8 to convert for the final value.
write_count_chart( NETDATA_TCP_FUNCTION_COUNT, NETDATA_EBPF_FAMILY, socket_publish_aggregated, 3); write_io_chart( - NETDATA_TCP_FUNCTION_BYTES, NETDATA_EBPF_FAMILY, socket_id_names[0], socket_id_names[1], &common_tcp); + NETDATA_TCP_FUNCTION_BITS, NETDATA_EBPF_FAMILY, socket_id_names[0], common_tcp.write*8/1000, + socket_id_names[1], common_tcp.read*8/1000); if (em->mode < MODE_ENTRY) { write_err_chart( NETDATA_TCP_FUNCTION_ERROR, NETDATA_EBPF_FAMILY, socket_publish_aggregated, 2); @@ -300,7 +284,9 @@ static void ebpf_socket_send_data(ebpf_module_t *em) write_count_chart( NETDATA_UDP_FUNCTION_COUNT, NETDATA_EBPF_FAMILY, &socket_publish_aggregated[NETDATA_UDP_START], 2); write_io_chart( - NETDATA_UDP_FUNCTION_BYTES, NETDATA_EBPF_FAMILY, socket_id_names[3], socket_id_names[4], &common_udp); + NETDATA_UDP_FUNCTION_BITS, NETDATA_EBPF_FAMILY, + socket_id_names[3], (long long)common_udp.write*8/1000, + socket_id_names[4], (long long)common_udp.read*8/1000); if (em->mode < MODE_ENTRY) { write_err_chart( NETDATA_UDP_FUNCTION_ERROR, NETDATA_EBPF_FAMILY, &socket_publish_aggregated[NETDATA_UDP_START], 2); @@ -351,8 +337,9 @@ void ebpf_socket_send_apps_data(ebpf_module_t *em, struct target *root) for (w = root; w; w = w->next) { if (unlikely(w->exposed && w->processes)) { value = ebpf_socket_sum_values_for_pids(w->root_pid, offsetof(ebpf_socket_publish_apps_t, - publish_sent_bytes)); - write_chart_dimension(w->name, value); + bytes_sent)); + // We multiply by 0.008, because we read bytes, but we display bits + write_chart_dimension(w->name, ((value)*8)/1000); } } write_end_chart(); @@ -361,8 +348,9 @@ void ebpf_socket_send_apps_data(ebpf_module_t *em, struct target *root) for (w = root; w; w = w->next) { if (unlikely(w->exposed && w->processes)) { value = ebpf_socket_sum_values_for_pids(w->root_pid, offsetof(ebpf_socket_publish_apps_t, - publish_received_bytes)); - write_chart_dimension(w->name, value); + bytes_received)); + // We multiply by 0.008, because we read bytes, but
we display bits + write_chart_dimension(w->name, ((value)*8)/1000); } } write_end_chart(); @@ -371,7 +359,7 @@ void ebpf_socket_send_apps_data(ebpf_module_t *em, struct target *root) for (w = root; w; w = w->next) { if (unlikely(w->exposed && w->processes)) { value = ebpf_socket_sum_values_for_pids(w->root_pid, offsetof(ebpf_socket_publish_apps_t, - publish_tcp_sent)); + call_tcp_sent)); write_chart_dimension(w->name, value); } } @@ -381,7 +369,7 @@ void ebpf_socket_send_apps_data(ebpf_module_t *em, struct target *root) for (w = root; w; w = w->next) { if (unlikely(w->exposed && w->processes)) { value = ebpf_socket_sum_values_for_pids(w->root_pid, offsetof(ebpf_socket_publish_apps_t, - publish_tcp_received)); + call_tcp_received)); write_chart_dimension(w->name, value); } } @@ -391,7 +379,7 @@ void ebpf_socket_send_apps_data(ebpf_module_t *em, struct target *root) for (w = root; w; w = w->next) { if (unlikely(w->exposed && w->processes)) { value = ebpf_socket_sum_values_for_pids(w->root_pid, offsetof(ebpf_socket_publish_apps_t, - publish_retransmit)); + retransmit)); write_chart_dimension(w->name, value); } } @@ -401,7 +389,7 @@ void ebpf_socket_send_apps_data(ebpf_module_t *em, struct target *root) for (w = root; w; w = w->next) { if (unlikely(w->exposed && w->processes)) { value = ebpf_socket_sum_values_for_pids(w->root_pid, offsetof(ebpf_socket_publish_apps_t, - publish_udp_sent)); + call_udp_sent)); write_chart_dimension(w->name, value); } } @@ -411,7 +399,7 @@ void ebpf_socket_send_apps_data(ebpf_module_t *em, struct target *root) for (w = root; w; w = w->next) { if (unlikely(w->exposed && w->processes)) { value = ebpf_socket_sum_values_for_pids(w->root_pid, offsetof(ebpf_socket_publish_apps_t, - publish_udp_received)); + call_udp_received)); write_chart_dimension(w->name, value); } } @@ -444,10 +432,8 @@ static void ebpf_create_global_charts(ebpf_module_t *em) socket_publish_aggregated, 3); - ebpf_create_chart(NETDATA_EBPF_FAMILY, - 
NETDATA_TCP_FUNCTION_BYTES, - "TCP bandwidth", - EBPF_COMMON_DIMENSION_BYTESS, + ebpf_create_chart(NETDATA_EBPF_FAMILY, NETDATA_TCP_FUNCTION_BITS, + "TCP bandwidth", EBPF_COMMON_DIMENSION_BITS, NETDATA_SOCKET_GROUP, 21071, ebpf_create_global_dimension, @@ -486,10 +472,8 @@ static void ebpf_create_global_charts(ebpf_module_t *em) &socket_publish_aggregated[NETDATA_UDP_START], 2); - ebpf_create_chart(NETDATA_EBPF_FAMILY, - NETDATA_UDP_FUNCTION_BYTES, - "UDP bandwidth", - EBPF_COMMON_DIMENSION_BYTESS, + ebpf_create_chart(NETDATA_EBPF_FAMILY, NETDATA_UDP_FUNCTION_BITS, + "UDP bandwidth", EBPF_COMMON_DIMENSION_BITS, NETDATA_SOCKET_GROUP, 21075, ebpf_create_global_dimension, @@ -520,17 +504,17 @@ void ebpf_socket_create_apps_charts(ebpf_module_t *em, struct target *root) { UNUSED(em); ebpf_create_charts_on_apps(NETDATA_NET_APPS_BANDWIDTH_SENT, - "Bytes sent", - EBPF_COMMON_DIMENSION_BYTESS, + "Bytes sent", EBPF_COMMON_DIMENSION_BITS, NETDATA_APPS_NET_GROUP, 20080, + ebpf_algorithms[NETDATA_EBPF_INCREMENTAL_IDX], root); ebpf_create_charts_on_apps(NETDATA_NET_APPS_BANDWIDTH_RECV, - "bytes received", - EBPF_COMMON_DIMENSION_BYTESS, + "bytes received", EBPF_COMMON_DIMENSION_BITS, NETDATA_APPS_NET_GROUP, 20081, + ebpf_algorithms[NETDATA_EBPF_INCREMENTAL_IDX], root); ebpf_create_charts_on_apps(NETDATA_NET_APPS_BANDWIDTH_TCP_SEND_CALLS, @@ -538,6 +522,7 @@ void ebpf_socket_create_apps_charts(ebpf_module_t *em, struct target *root) EBPF_COMMON_DIMENSION_CALL, NETDATA_APPS_NET_GROUP, 20082, + ebpf_algorithms[NETDATA_EBPF_INCREMENTAL_IDX], root); ebpf_create_charts_on_apps(NETDATA_NET_APPS_BANDWIDTH_TCP_RECV_CALLS, @@ -545,6 +530,7 @@ void ebpf_socket_create_apps_charts(ebpf_module_t *em, struct target *root) EBPF_COMMON_DIMENSION_CALL, NETDATA_APPS_NET_GROUP, 20083, + ebpf_algorithms[NETDATA_EBPF_INCREMENTAL_IDX], root); ebpf_create_charts_on_apps(NETDATA_NET_APPS_BANDWIDTH_TCP_RETRANSMIT, @@ -552,6 +538,7 @@ void ebpf_socket_create_apps_charts(ebpf_module_t *em, struct target 
*root) EBPF_COMMON_DIMENSION_CALL, NETDATA_APPS_NET_GROUP, 20084, + ebpf_algorithms[NETDATA_EBPF_INCREMENTAL_IDX], root); ebpf_create_charts_on_apps(NETDATA_NET_APPS_BANDWIDTH_UDP_SEND_CALLS, @@ -559,6 +546,7 @@ void ebpf_socket_create_apps_charts(ebpf_module_t *em, struct target *root) EBPF_COMMON_DIMENSION_CALL, NETDATA_APPS_NET_GROUP, 20085, + ebpf_algorithms[NETDATA_EBPF_INCREMENTAL_IDX], root); ebpf_create_charts_on_apps(NETDATA_NET_APPS_BANDWIDTH_UDP_RECV_CALLS, @@ -566,6 +554,7 @@ void ebpf_socket_create_apps_charts(ebpf_module_t *em, struct target *root) EBPF_COMMON_DIMENSION_CALL, NETDATA_APPS_NET_GROUP, 20086, + ebpf_algorithms[NETDATA_EBPF_INCREMENTAL_IDX], root); socket_apps_created = 1; @@ -658,8 +647,7 @@ static void ebpf_socket_create_nv_charts(netdata_vector_plot_t *ptr) if (ptr == (netdata_vector_plot_t *)&outbound_vectors) { ebpf_socket_create_nv_chart(NETDATA_NV_OUTBOUND_BYTES, - "Outbound connections (bytes).", - EBPF_COMMON_DIMENSION_BYTESS, + "Outbound connections (bytes).", EBPF_COMMON_DIMENSION_BYTES, NETDATA_NETWORK_CONNECTIONS_GROUP, 21080, ptr); @@ -679,8 +667,7 @@ static void ebpf_socket_create_nv_charts(netdata_vector_plot_t *ptr) ptr); } else { ebpf_socket_create_nv_chart(NETDATA_NV_INBOUND_BYTES, - "Inbound connections (bytes)", - EBPF_COMMON_DIMENSION_BYTESS, + "Inbound connections (bytes)", EBPF_COMMON_DIMENSION_BYTES, NETDATA_NETWORK_CONNECTIONS_GROUP, 21084, ptr); @@ -1511,15 +1498,9 @@ static void read_hash_global_tables() void ebpf_socket_fill_publish_apps(uint32_t current_pid, ebpf_bandwidth_t *eb) { ebpf_socket_publish_apps_t *curr = socket_bandwidth_curr[current_pid]; - ebpf_socket_publish_apps_t *prev = socket_bandwidth_prev[current_pid]; if (!curr) { - ebpf_socket_publish_apps_t *ptr = callocz(2, sizeof(ebpf_socket_publish_apps_t)); - curr = &ptr[0]; + curr = callocz(1, sizeof(ebpf_socket_publish_apps_t)); socket_bandwidth_curr[current_pid] = curr; - prev = &ptr[1]; - socket_bandwidth_prev[current_pid] = prev; - } else { - 
memcpy(prev, curr, sizeof(ebpf_socket_publish_apps_t)); } curr->bytes_sent = eb->bytes_sent; @@ -1529,8 +1510,6 @@ void ebpf_socket_fill_publish_apps(uint32_t current_pid, ebpf_bandwidth_t *eb) curr->retransmit = eb->retransmit; curr->call_udp_sent = eb->call_udp_sent; curr->call_udp_received = eb->call_udp_received; - - ebpf_socket_update_apps_publish(curr, prev); } /** @@ -1778,19 +1757,19 @@ static void ebpf_socket_cleanup(void *ptr) heartbeat_t hb; heartbeat_init(&hb); - uint32_t tick = 200*USEC_PER_MS; + uint32_t tick = 2*USEC_PER_MS; while (!read_thread_closed) { usec_t dt = heartbeat_next(&hb, tick); UNUSED(dt); } freez(socket_aggregated_data); + ebpf_cleanup_publish_syscall(socket_publish_aggregated); freez(socket_publish_aggregated); freez(socket_hash_values); clean_thread_structures(); freez(socket_bandwidth_curr); - freez(socket_bandwidth_prev); freez(bandwidth_vector); freez(socket_values); @@ -1843,7 +1822,6 @@ static void ebpf_socket_allocate_global_vectors(size_t length) socket_hash_values = callocz(ebpf_nprocs, sizeof(netdata_idx_t)); socket_bandwidth_curr = callocz((size_t)pid_max, sizeof(ebpf_socket_publish_apps_t *)); - socket_bandwidth_prev = callocz((size_t)pid_max, sizeof(ebpf_socket_publish_apps_t *)); bandwidth_vector = callocz((size_t)ebpf_nprocs, sizeof(ebpf_bandwidth_t)); socket_values = callocz((size_t)ebpf_nprocs, sizeof(netdata_socket_t)); @@ -1921,9 +1899,13 @@ void *ebpf_socket_thread(void *ptr) goto endsocket; } + int algorithms[NETDATA_MAX_SOCKET_VECTOR] = { + NETDATA_EBPF_ABSOLUTE_IDX, NETDATA_EBPF_ABSOLUTE_IDX, NETDATA_EBPF_ABSOLUTE_IDX, + NETDATA_EBPF_ABSOLUTE_IDX, NETDATA_EBPF_ABSOLUTE_IDX, NETDATA_EBPF_ABSOLUTE_IDX + }; ebpf_global_labels( socket_aggregated_data, socket_publish_aggregated, socket_dimension_names, socket_id_names, - NETDATA_MAX_SOCKET_VECTOR); + algorithms, NETDATA_MAX_SOCKET_VECTOR); ebpf_create_global_charts(em); diff --git a/collectors/ebpf.plugin/ebpf_socket.h b/collectors/ebpf.plugin/ebpf_socket.h index 
0e19f80e8..1316c003a 100644 --- a/collectors/ebpf.plugin/ebpf_socket.h +++ b/collectors/ebpf.plugin/ebpf_socket.h @@ -46,16 +46,16 @@ typedef enum ebpf_socket_idx { // Global chart name #define NETDATA_TCP_FUNCTION_COUNT "tcp_functions" -#define NETDATA_TCP_FUNCTION_BYTES "tcp_bandwidth" +#define NETDATA_TCP_FUNCTION_BITS "total_tcp_bandwidth" #define NETDATA_TCP_FUNCTION_ERROR "tcp_error" #define NETDATA_TCP_RETRANSMIT "tcp_retransmit" #define NETDATA_UDP_FUNCTION_COUNT "udp_functions" -#define NETDATA_UDP_FUNCTION_BYTES "udp_bandwidth" +#define NETDATA_UDP_FUNCTION_BITS "total_udp_bandwidth" #define NETDATA_UDP_FUNCTION_ERROR "udp_error" // Charts created on Apps submenu -#define NETDATA_NET_APPS_BANDWIDTH_SENT "bandwidth_sent" -#define NETDATA_NET_APPS_BANDWIDTH_RECV "bandwidth_recv" +#define NETDATA_NET_APPS_BANDWIDTH_SENT "total_bandwidth_sent" +#define NETDATA_NET_APPS_BANDWIDTH_RECV "total_bandwidth_recv" #define NETDATA_NET_APPS_BANDWIDTH_TCP_SEND_CALLS "bandwidth_tcp_send" #define NETDATA_NET_APPS_BANDWIDTH_TCP_RECV_CALLS "bandwidth_tcp_recv" #define NETDATA_NET_APPS_BANDWIDTH_TCP_RETRANSMIT "bandwidth_tcp_retransmit" @@ -271,6 +271,5 @@ extern ebpf_network_viewer_port_list_t *listen_ports; extern void update_listen_table(uint16_t value, uint8_t proto); extern ebpf_socket_publish_apps_t **socket_bandwidth_curr; -extern ebpf_socket_publish_apps_t **socket_bandwidth_prev; #endif diff --git a/collectors/proc.plugin/proc_net_dev.c b/collectors/proc.plugin/proc_net_dev.c index a90e3c3ee..5355077f8 100644 --- a/collectors/proc.plugin/proc_net_dev.c +++ b/collectors/proc.plugin/proc_net_dev.c @@ -590,10 +590,9 @@ int do_proc_net_dev(int update_every, usec_t dt) { snprintfz(buffer, FILENAME_MAX, path_to_sys_class_net_duplex, d->name); d->filename_duplex = strdupz(buffer); - - snprintfz(buffer, FILENAME_MAX, path_to_sys_class_net_operstate, d->name); - d->filename_operstate = strdupz(buffer); } + snprintfz(buffer, FILENAME_MAX, path_to_sys_class_net_operstate, 
d->name); + d->filename_operstate = strdupz(buffer); snprintfz(buffer, FILENAME_MAX, "plugin:proc:/proc/net/dev:%s", d->name); d->enabled = config_get_boolean_ondemand(buffer, "enabled", d->enabled); diff --git a/collectors/proc.plugin/sys_kernel_mm_ksm.c b/collectors/proc.plugin/sys_kernel_mm_ksm.c index 0a93f54ee..a0e5690fe 100644 --- a/collectors/proc.plugin/sys_kernel_mm_ksm.c +++ b/collectors/proc.plugin/sys_kernel_mm_ksm.c @@ -110,7 +110,7 @@ int do_sys_kernel_mm_ksm(int update_every, usec_t dt) { , PLUGIN_PROC_MODULE_KSM_NAME , NETDATA_CHART_PRIO_MEM_KSM , update_every - , RRDSET_TYPE_AREA + , RRDSET_TYPE_STACKED ); rd_shared = rrddim_add(st_mem_ksm, "shared", NULL, 1, 1024 * 1024, RRD_ALGORITHM_ABSOLUTE); diff --git a/collectors/python.d.plugin/adaptec_raid/README.md b/collectors/python.d.plugin/adaptec_raid/README.md index d35ccecbc..b14e8f9ba 100644 --- a/collectors/python.d.plugin/adaptec_raid/README.md +++ b/collectors/python.d.plugin/adaptec_raid/README.md @@ -6,53 +6,73 @@ sidebar_label: "Adaptec RAID" # Adaptec RAID controller monitoring with Netdata -Collects logical and physical devices metrics. +Collects logical and physical devices metrics using `arcconf` command-line utility. + +Executed commands: + +- `sudo -n arcconf GETCONFIG 1 LD` +- `sudo -n arcconf GETCONFIG 1 PD` ## Requirements -The module uses `arcconf`, which can only be executed by root. It uses -`sudo` and assumes that it is configured such that the `netdata` user can -execute `arcconf` as root without password. +The module uses `arcconf`, which can only be executed by `root`. It uses +`sudo` and assumes that it is configured such that the `netdata` user can execute `arcconf` as root without a password. -Add to `sudoers`: +- Add to your `/etc/sudoers` file: -``` +`which arcconf` shows the full path to the binary. 
+ +```bash netdata ALL=(root) NOPASSWD: /path/to/arcconf ``` -To grab stats it executes: +- Reset Netdata's systemd + unit [CapabilityBoundingSet](https://www.freedesktop.org/software/systemd/man/systemd.exec.html#Capabilities) (Linux + distributions with systemd) -- `sudo -n arcconf GETCONFIG 1 LD` -- `sudo -n arcconf GETCONFIG 1 PD` +The default CapabilityBoundingSet doesn't allow using `sudo`, and is quite strict in general. Resetting is not optimal, but a next-best solution given the inability to execute `arcconf` using `sudo`. -It produces: -1. **Logical Device Status** +As the `root` user, do the following: -2. **Physical Device State** +```cmd +mkdir /etc/systemd/system/netdata.service.d +echo -e '[Service]\nCapabilityBoundingSet=~' | tee /etc/systemd/system/netdata.service.d/unset-capability-bounding-set.conf +systemctl daemon-reload +systemctl restart netdata.service +``` -3. **Physical Device S.M.A.R.T warnings** +## Charts -4. **Physical Device Temperature** +- Logical Device Status +- Physical Device State +- Physical Device S.M.A.R.T warnings +- Physical Device Temperature -## Configuration +## Enable the collector -**adaptec_raid** is disabled by default. Should be explicitly enabled in `python.d.conf`. +The `adaptec_raid` collector is disabled by default. To enable it, use `edit-config` from the +Netdata [config directory](/docs/configure/nodes.md), which is typically at `/etc/netdata`, to edit the `python.d.conf` +file. -```yaml -adaptec_raid: yes +```bash +cd /etc/netdata # Replace this path with your Netdata config directory, if different +sudo ./edit-config python.d.conf ``` -Edit the `python.d/adaptec_raid.conf` configuration file using `edit-config` from the Netdata [config -directory](/docs/configure/nodes.md), which is typically at `/etc/netdata`. +Change the value of the `adaptec_raid` setting to `yes`. Save the file and restart the Netdata Agent +with `sudo systemctl restart netdata`, or the appropriate method for your system. 
+ +## Configuration + +Edit the `python.d/adaptec_raid.conf` configuration file using `edit-config` from the +Netdata [config directory](/docs/configure/nodes.md), which is typically at `/etc/netdata`. ```bash cd /etc/netdata # Replace this path with your Netdata config directory, if different sudo ./edit-config python.d/adaptec_raid.conf ``` - - ![image](https://user-images.githubusercontent.com/22274335/47278133-6d306680-d601-11e8-87c2-cc9c0f42d686.png) --- diff --git a/collectors/python.d.plugin/elasticsearch/README.md b/collectors/python.d.plugin/elasticsearch/README.md index d8d7581bc..cf1834c5a 100644 --- a/collectors/python.d.plugin/elasticsearch/README.md +++ b/collectors/python.d.plugin/elasticsearch/README.md @@ -80,6 +80,7 @@ Sample: local: host : 'ipaddress' # Elasticsearch server ip address or hostname. port : 'port' # Port on which elasticsearch listens. + scheme : 'http' # URL scheme. Use 'https' if your elasticsearch uses TLS. node_status : yes/no # Get metrics from "/_nodes/_local/stats". Enabled by default. cluster_health : yes/no # Get metrics from "/_cluster/health". Enabled by default. cluster_stats : yes/no # Get metrics from "'/_cluster/stats". Enabled by default. diff --git a/collectors/python.d.plugin/hpssa/README.md b/collectors/python.d.plugin/hpssa/README.md index 2079ff2ad..af8c4378e 100644 --- a/collectors/python.d.plugin/hpssa/README.md +++ b/collectors/python.d.plugin/hpssa/README.md @@ -8,44 +8,64 @@ sidebar_label: "HP Smart Storage Arrays" Monitors controller, cache module, logical and physical drive state and temperature using `ssacli` tool. +Executed commands: + +- `sudo -n ssacli ctrl all show config detail` + ## Requirements: This module uses `ssacli`, which can only be executed by root. It uses -`sudo` and assumes that it is configured such that the `netdata` user can -execute `ssacli` as root without password. 
+`sudo` and assumes that it is configured such that the `netdata` user can execute `ssacli` as root without a password. -Add to `sudoers`: +- Add to your `/etc/sudoers` file: -``` +`which ssacli` shows the full path to the binary. + +```bash netdata ALL=(root) NOPASSWD: /path/to/ssacli ``` -To collect metrics, the module executes: `sudo -n ssacli ctrl all show config detail` +- Reset Netdata's systemd + unit [CapabilityBoundingSet](https://www.freedesktop.org/software/systemd/man/systemd.exec.html#Capabilities) (Linux + distributions with systemd) + +The default CapabilityBoundingSet doesn't allow using `sudo`, and is quite strict in general. Resetting is not optimal, but a next-best solution given the inability to execute `ssacli` using `sudo`. + +As the `root` user, do the following: + +```cmd +mkdir /etc/systemd/system/netdata.service.d +echo -e '[Service]\nCapabilityBoundingSet=~' | tee /etc/systemd/system/netdata.service.d/unset-capability-bounding-set.conf +systemctl daemon-reload +systemctl restart netdata.service +``` -This module produces: +## Charts -1. Controller state and temperature -2. Cache module state and temperature -3. Logical drive state -4. Physical drive state and temperature +- Controller status +- Controller temperature +- Logical drive status +- Physical drive status +- Physical drive temperature ## Enable the collector -The `hpssa` collector is disabled by default. To enable it, use `edit-config` from the Netdata [config -directory](/docs/configure/nodes.md), which is typically at `/etc/netdata`, to edit the `python.d.conf` file. +The `hpssa` collector is disabled by default. To enable it, use `edit-config` from the +Netdata [config directory](/docs/configure/nodes.md), which is typically at `/etc/netdata`, to edit the `python.d.conf` +file. ```bash cd /etc/netdata # Replace this path with your Netdata config directory, if different sudo ./edit-config python.d.conf ``` -Change the value of the `hpssa` setting to `yes`. 
Save the file and restart the Netdata Agent with `sudo systemctl -restart netdata`, or the appropriate method for your system, to finish enabling the `hpssa` collector. +Change the value of the `hpssa` setting to `yes`. Save the file and restart the Netdata Agent +with `sudo systemctl restart netdata`, or the appropriate method for your system. ## Configuration -Edit the `python.d/hpssa.conf` configuration file using `edit-config` from the Netdata [config -directory](/docs/configure/nodes.md), which is typically at `/etc/netdata`. +Edit the `python.d/hpssa.conf` configuration file using `edit-config` from the +Netdata [config directory](/docs/configure/nodes.md), which is typically at `/etc/netdata`. ```bash cd /etc/netdata # Replace this path with your Netdata config directory, if different diff --git a/collectors/python.d.plugin/megacli/README.md b/collectors/python.d.plugin/megacli/README.md index 400a45973..4fb7eb1c2 100644 --- a/collectors/python.d.plugin/megacli/README.md +++ b/collectors/python.d.plugin/megacli/README.md @@ -6,50 +6,68 @@ sidebar_label: "MegaRAID controllers" # MegaRAID controller monitoring with Netdata -Collects adapter, physical drives and battery stats. +Collects adapter, physical drives and battery stats using `megacli` command-line tool. + +Executed commands: + +- `sudo -n megacli -LDPDInfo -aAll` +- `sudo -n megacli -AdpBbuCmd -a0` ## Requirements -Uses the `megacli` program, which can only be executed by root. It uses -`sudo` and assumes that it is configured such that the `netdata` user can -execute `megacli` as root without password. +The module uses `megacli`, which can only be executed by `root`. It uses +`sudo` and assumes that it is configured such that the `netdata` user can execute `megacli` as root without a password. -Add to `sudoers`: +- Add to your `/etc/sudoers` file: -``` +`which megacli` shows the full path to the binary. 
+ +```bash netdata ALL=(root) NOPASSWD: /path/to/megacli ``` +- Reset Netdata's systemd + unit [CapabilityBoundingSet](https://www.freedesktop.org/software/systemd/man/systemd.exec.html#Capabilities) (Linux + distributions with systemd) -To grab stats it executes: +The default CapabilityBoundingSet doesn't allow using `sudo`, and is quite strict in general. Resetting is not optimal, but a next-best solution given the inability to execute `megacli` using `sudo`. -- `sudo -n megacli -LDPDInfo -aAll` -- `sudo -n megacli -AdpBbuCmd -a0` -It produces: +As the `root` user, do the following: -1. **Adapter State** +```cmd +mkdir /etc/systemd/system/netdata.service.d +echo -e '[Service]\nCapabilityBoundingSet=~' | tee /etc/systemd/system/netdata.service.d/unset-capability-bounding-set.conf +systemctl daemon-reload +systemctl restart netdata.service +``` -2. **Physical Drives Media Errors** +## Charts -3. **Physical Drives Predictive Failures** +- Adapter State +- Physical Drives Media Errors +- Physical Drives Predictive Failures +- Battery Relative State of Charge +- Battery Cycle Count -4. **Battery Relative State of Charge** +## Enable the collector -5. **Battery Cycle Count** +The `megacli` collector is disabled by default. To enable it, use `edit-config` from the +Netdata [config directory](/docs/configure/nodes.md), which is typically at `/etc/netdata`, to edit the `python.d.conf` +file. +```bash +cd /etc/netdata # Replace this path with your Netdata config directory, if different +sudo ./edit-config python.d.conf +``` +Change the value of the `megacli` setting to `yes`. Save the file and restart the Netdata Agent +with `sudo systemctl restart netdata`, or the appropriate method for your system. ## Configuration -**megacli** is disabled by default. Should be explicitly enabled in `python.d.conf`. 
- -```yaml -megacli: yes -``` - -Edit the `python.d/megacli.conf` configuration file using `edit-config` from the Netdata [config -directory](/docs/configure/nodes.md), which is typically at `/etc/netdata`. +Edit the `python.d/megacli.conf` configuration file using `edit-config` from the +Netdata [config directory](/docs/configure/nodes.md), which is typically at `/etc/netdata`. ```bash cd /etc/netdata # Replace this path with your Netdata config directory, if different diff --git a/collectors/python.d.plugin/python_modules/bases/FrameworkServices/SocketService.py b/collectors/python.d.plugin/python_modules/bases/FrameworkServices/SocketService.py index bef3792da..d6c755058 100644 --- a/collectors/python.d.plugin/python_modules/bases/FrameworkServices/SocketService.py +++ b/collectors/python.d.plugin/python_modules/bases/FrameworkServices/SocketService.py @@ -247,7 +247,7 @@ class SocketService(SimpleService): if self._check_raw_data(data): break - self.debug(u'final response: {0}'.format(data)) + self.debug(u'final response: {0}'.format(data if not raw else u'binary data')) return data def _get_raw_data(self, raw=False, request=None): diff --git a/collectors/python.d.plugin/samba/README.md b/collectors/python.d.plugin/samba/README.md index ed26d2871..a5126510f 100644 --- a/collectors/python.d.plugin/samba/README.md +++ b/collectors/python.d.plugin/samba/README.md @@ -6,83 +6,110 @@ sidebar_label: "Samba" # Samba monitoring with Netdata -Monitors the performance metrics of Samba file sharing. +Monitors the performance metrics of Samba file sharing using `smbstatus` command-line tool. 
+ +Executed commands: + +- `sudo -n smbstatus -P` ## Requirements -- `smbstatus` program -- `sudo` program -- `smbd` must be compiled with profiling enabled -- `smbd` must be started either with the `-P 1` option or inside `smb.conf` using `smbd profiling level` -- `netdata` user needs to be able to sudo the `smbstatus` program without password +- `smbstatus` program +- `sudo` program +- `smbd` must be compiled with profiling enabled +- `smbd` must be started either with the `-P 1` option or inside `smb.conf` using `smbd profiling level` -It produces the following charts: +The module uses `smbstatus`, which can only be executed by `root`. It uses +`sudo` and assumes that it is configured such that the `netdata` user can execute `smbstatus` as root without a +password. -1. **Syscall R/Ws** in kilobytes/s +- Add to your `/etc/sudoers` file: - - sendfile - - recvfile +`which smbstatus` shows the full path to the binary. -2. **Smb2 R/Ws** in kilobytes/s +```bash +netdata ALL=(root) NOPASSWD: /path/to/smbstatus +``` - - readout - - writein - - readin - - writeout +- Reset Netdata's systemd + unit [CapabilityBoundingSet](https://www.freedesktop.org/software/systemd/man/systemd.exec.html#Capabilities) (Linux + distributions with systemd) -3. **Smb2 Create/Close** in operations/s +The default CapabilityBoundingSet doesn't allow using `sudo`, and is quite strict in general. Resetting is not optimal, but a next-best solution given the inability to execute `smbstatus` using `sudo`. - - create - - close -4. **Smb2 Info** in operations/s +As the `root` user, do the following: - - getinfo - - setinfo +```cmd +mkdir /etc/systemd/system/netdata.service.d +echo -e '[Service]\nCapabilityBoundingSet=~' | tee /etc/systemd/system/netdata.service.d/unset-capability-bounding-set.conf +systemctl daemon-reload +systemctl restart netdata.service +``` -5. **Smb2 Find** in operations/s +## Charts - - find +1. **Syscall R/Ws** in kilobytes/s -6. 
**Smb2 Notify** in operations/s + - sendfile + - recvfile - - notify +2. **Smb2 R/Ws** in kilobytes/s -7. **Smb2 Lesser Ops** as counters + - readout + - writein + - readin + - writeout - - tcon - - negprot - - tdis - - cancel - - logoff - - flush - - lock - - keepalive - - break - - sessetup +3. **Smb2 Create/Close** in operations/s -## prerequisite + - create + - close -This module uses `smbstatus` which can only be executed by root. It uses -`sudo` and assumes that it is configured such that the `netdata` user can -execute `smbstatus` as root without password. +4. **Smb2 Info** in operations/s -Add to `sudoers`: + - getinfo + - setinfo -``` -netdata ALL=(root) NOPASSWD: /path/to/smbstatus -``` +5. **Smb2 Find** in operations/s -## Configuration + - find -**samba** is disabled by default. Should be explicitly enabled in `python.d.conf`. +6. **Smb2 Notify** in operations/s -```yaml -samba: yes + - notify + +7. **Smb2 Lesser Ops** as counters + + - tcon + - negprot + - tdis + - cancel + - logoff + - flush + - lock + - keepalive + - break + - sessetup + +## Enable the collector + +The `samba` collector is disabled by default. To enable it, use `edit-config` from the +Netdata [config directory](/docs/configure/nodes.md), which is typically at `/etc/netdata`, to edit the `python.d.conf` +file. + +```bash +cd /etc/netdata # Replace this path with your Netdata config directory, if different +sudo ./edit-config python.d.conf ``` -Edit the `python.d/samba.conf` configuration file using `edit-config` from the Netdata [config -directory](/docs/configure/nodes.md), which is typically at `/etc/netdata`. +Change the value of the `samba` setting to `yes`. Save the file and restart the Netdata Agent +with `sudo systemctl restart netdata`, or the appropriate method for your system. + +## Configuration + +Edit the `python.d/samba.conf` configuration file using `edit-config` from the +Netdata [config directory](/docs/configure/nodes.md), which is typically at `/etc/netdata`. 
```bash cd /etc/netdata # Replace this path with your Netdata config directory, if different diff --git a/daemon/get-kubernetes-labels.sh.in b/daemon/get-kubernetes-labels.sh.in index 805d027b8..5aa89ab9d 100644 --- a/daemon/get-kubernetes-labels.sh.in +++ b/daemon/get-kubernetes-labels.sh.in @@ -2,17 +2,40 @@ # Checks if netdata is running in a kubernetes pod and fetches that pod's labels -if [ -n "${KUBERNETES_SERVICE_HOST}" ] && [ -n "${KUBERNETES_PORT_443_TCP_PORT}" ] && [ -n "${MY_POD_NAMESPACE}" ] && [ -n "${MY_POD_NAME}" ]; then - if command -v jq >/dev/null 2>&1; then - KUBE_TOKEN="$(</var/run/secrets/kubernetes.io/serviceaccount/token)" - URL="https://$KUBERNETES_SERVICE_HOST:$KUBERNETES_PORT_443_TCP_PORT/api/v1/namespaces/$MY_POD_NAMESPACE/pods/$MY_POD_NAME" - curl -sSk -H "Authorization: Bearer $KUBE_TOKEN" "$URL" | - jq -r '.metadata.labels' | grep ':' | tr -d '," ' - exit 0 - else - echo "jq command not available. Please install jq to get host labels for kubernetes pods." - exit 1 - fi -else - exit 0 +if [ -z "${KUBERNETES_SERVICE_HOST}" ] || [ -z "${KUBERNETES_PORT_443_TCP_PORT}" ] || [ -z "${MY_POD_NAMESPACE}" ] || [ -z "${MY_POD_NAME}" ]; then + exit 0 fi + +if ! command -v jq > /dev/null 2>&1; then + echo "jq command not available. Please install jq to get host labels for kubernetes pods." + exit 1 +fi + +TOKEN="$(< /var/run/secrets/kubernetes.io/serviceaccount/token)" +HEADER="Authorization: Bearer $TOKEN" +HOST="$KUBERNETES_SERVICE_HOST:$KUBERNETES_PORT_443_TCP_PORT" + +URL="https://$HOST/api/v1/namespaces/$MY_POD_NAMESPACE/pods/$MY_POD_NAME" +if ! POD_DATA=$(curl -sSk -H "$HEADER" "$URL" 2>&1); then + echo "error on curl '${URL}': ${POD_DATA}." + exit 1 +fi + +URL="https://$HOST/api/v1/namespaces/kube-system" +if ! KUBE_SYSTEM_NS_DATA=$(curl -sSk -H "$HEADER" "$URL" 2>&1); then + echo "error on curl '${URL}': ${KUBE_SYSTEM_NS_DATA}." + exit 1 +fi + +if ! 
POD_LABELS=$(jq -r '.metadata.labels' <<< "$POD_DATA" | grep ':' | tr -d '," ' 2>&1); then + echo "error on 'jq' parse pod data: ${POD_LABELS}." + exit 1 +fi + +if ! KUBE_SYSTEM_NS_UID=$(jq -r '.metadata.uid' <<< "$KUBE_SYSTEM_NS_DATA" 2>&1); then + echo "error on 'jq' parse kube_system_ns: ${KUBE_SYSTEM_NS_UID}." + exit 1 +fi + +echo -e "$POD_LABELS\nk8s_cluster_id:$KUBE_SYSTEM_NS_UID" +exit 0 diff --git a/daemon/system-info.sh b/daemon/system-info.sh index 80eb82f86..05d8667c2 100755 --- a/daemon/system-info.sh +++ b/daemon/system-info.sh @@ -108,9 +108,9 @@ else fi # shellcheck disable=SC2153 - if [ "${NAME}" = "unknown" ] || [ "${VERSION}" = "unknown" ] || [ "${ID}" = "unknown" ]; then + if [ "${CONTAINER_NAME}" = "unknown" ] || [ "${CONTAINER_VERSION}" = "unknown" ] || [ "${CONTAINER_ID}" = "unknown" ]; then if [ -f "/etc/lsb-release" ]; then - if [ "${OS_DETECTION}" = "unknown" ]; then + if [ "${CONTAINER_OS_DETECTION}" = "unknown" ]; then CONTAINER_OS_DETECTION="/etc/lsb-release" else CONTAINER_OS_DETECTION="Mixed" @@ -119,19 +119,19 @@ else DISTRIB_RELEASE="unknown" DISTRIB_CODENAME="unknown" eval "$(grep -E "^(DISTRIB_ID|DISTRIB_RELEASE|DISTRIB_CODENAME)=" < /etc/lsb-release)" - if [ "${NAME}" = "unknown" ]; then CONTAINER_NAME="${DISTRIB_ID}"; fi - if [ "${VERSION}" = "unknown" ]; then CONTAINER_VERSION="${DISTRIB_RELEASE}"; fi - if [ "${ID}" = "unknown" ]; then CONTAINER_ID="${DISTRIB_CODENAME}"; fi + if [ "${CONTAINER_NAME}" = "unknown" ]; then CONTAINER_NAME="${DISTRIB_ID}"; fi + if [ "${CONTAINER_VERSION}" = "unknown" ]; then CONTAINER_VERSION="${DISTRIB_RELEASE}"; fi + if [ "${CONTAINER_ID}" = "unknown" ]; then CONTAINER_ID="${DISTRIB_CODENAME}"; fi fi if [ -n "$(command -v lsb_release 2> /dev/null)" ]; then - if [ "${OS_DETECTION}" = "unknown" ]; then + if [ "${CONTAINER_OS_DETECTION}" = "unknown" ]; then CONTAINER_OS_DETECTION="lsb_release" else CONTAINER_OS_DETECTION="Mixed" fi - if [ "${NAME}" = "unknown" ]; then CONTAINER_NAME="$(lsb_release -is 
2> /dev/null)"; fi - if [ "${VERSION}" = "unknown" ]; then CONTAINER_VERSION="$(lsb_release -rs 2> /dev/null)"; fi - if [ "${ID}" = "unknown" ]; then CONTAINER_ID="$(lsb_release -cs 2> /dev/null)"; fi + if [ "${CONTAINER_NAME}" = "unknown" ]; then CONTAINER_NAME="$(lsb_release -is 2> /dev/null)"; fi + if [ "${CONTAINER_VERSION}" = "unknown" ]; then CONTAINER_VERSION="$(lsb_release -rs 2> /dev/null)"; fi + if [ "${CONTAINER_ID}" = "unknown" ]; then CONTAINER_ID="$(lsb_release -cs 2> /dev/null)"; fi fi fi fi @@ -143,7 +143,9 @@ HOST_VERSION="unknown" HOST_VERSION_ID="unknown" HOST_ID="unknown" HOST_ID_LIKE="unknown" -if [ "${CONTAINER}" = "unknown" ]; then + +# 'systemd-detect-virt' returns 'none' if there is no hardware/container virtualization. +if [ "${CONTAINER}" = "unknown" ] || [ "${CONTAINER}" = "none" ]; then for v in NAME ID ID_LIKE VERSION VERSION_ID OS_DETECTION; do eval "HOST_$v=\$CONTAINER_$v; CONTAINER_$v=none" done diff --git a/database/rrdhost.c b/database/rrdhost.c index 1af27114d..45c314602 100644 --- a/database/rrdhost.c +++ b/database/rrdhost.c @@ -986,6 +986,8 @@ static struct label *rrdhost_load_auto_labels(void) label_list = add_label_to_list(label_list, "_is_k8s_node", localhost->system_info->is_k8s_node, LABEL_SOURCE_AUTO); + label_list = add_aclk_host_labels(label_list); + label_list = add_label_to_list( label_list, "_is_parent", (localhost->next || configured_as_parent()) ? "true" : "false", LABEL_SOURCE_AUTO); diff --git a/health/health.d/tcp_resets.conf b/health/health.d/tcp_resets.conf index 91dad3c6a..36a550a5d 100644 --- a/health/health.d/tcp_resets.conf +++ b/health/health.d/tcp_resets.conf @@ -36,7 +36,7 @@ units: tcp resets/s every: 10s warn: $this > ((($1m_ipv4_tcp_resets_sent < 5)?(5):($1m_ipv4_tcp_resets_sent)) * (($status >= $WARNING) ? 
(1) : (20))) - delay: up 0 down 60m multiplier 1.2 max 2h + delay: up 20s down 60m multiplier 1.2 max 2h options: no-clear-notification info: average TCP RESETS this host is sending, over the last 10 seconds (this can be an indication that a port scan is made, or that a service running on this host has crashed; clear notification for this alarm will not be sent) to: sysadmin @@ -61,7 +61,7 @@ units: tcp resets/s every: 10s warn: $this > ((($1m_ipv4_tcp_resets_received < 5)?(5):($1m_ipv4_tcp_resets_received)) * (($status >= $WARNING) ? (1) : (10))) - delay: up 0 down 60m multiplier 1.2 max 2h + delay: up 20s down 60m multiplier 1.2 max 2h options: no-clear-notification info: average TCP RESETS this host is receiving, over the last 10 seconds (this can be an indication that a service this host needs, has crashed; clear notification for this alarm will not be sent) to: sysadmin diff --git a/health/health_config.c b/health/health_config.c index a200a0dbf..1acf36933 100644 --- a/health/health_config.c +++ b/health/health_config.c @@ -1023,5 +1023,13 @@ void health_readdir(RRDHOST *host, const char *user_path, const char *stock_path return; } + int stock_enabled = (int)config_get_boolean(CONFIG_SECTION_HEALTH, "enable stock health configuration", + CONFIG_BOOLEAN_YES); + + if (!stock_enabled) { + info("Netdata will not load stock alarms."); + stock_path = user_path; + } + recursive_config_double_dir_load(user_path, stock_path, subpath, health_readfile, (void *) host, 0); } diff --git a/health/health_json.c b/health/health_json.c index d068b5427..7b5a1e3cb 100644 --- a/health/health_json.c +++ b/health/health_json.c @@ -352,14 +352,15 @@ void health_active_log_alarms_2json(RRDHOST *host, BUFFER *wb) { unsigned int count = 0; ALARM_ENTRY *ae; for(ae = host->health_log.alarms; ae && count < max ; ae = ae->next) { - - if(likely(!((ae->new_status == RRDCALC_STATUS_WARNING || ae->new_status == RRDCALC_STATUS_CRITICAL) - && !ae->updated_by_id))) - continue; - - 
if(likely(count)) buffer_strcat(wb, ","); + if (!ae->updated_by_id && + ((ae->new_status == RRDCALC_STATUS_WARNING || ae->new_status == RRDCALC_STATUS_CRITICAL) || + ((ae->old_status == RRDCALC_STATUS_WARNING || ae->old_status == RRDCALC_STATUS_CRITICAL) && + ae->new_status == RRDCALC_STATUS_REMOVED))) { + if (likely(count)) + buffer_strcat(wb, ","); health_alarm_entry2json_nolock(wb, ae, host); - count++; + count++; + } } buffer_strcat(wb, "]"); diff --git a/health/notifications/alarm-notify.sh.in b/health/notifications/alarm-notify.sh.in index 456e20cc5..3bf8db5f6 100755 --- a/health/notifications/alarm-notify.sh.in +++ b/health/notifications/alarm-notify.sh.in @@ -411,6 +411,8 @@ else done fi +OPSGENIE_API_URL=${OPSGENIE_API_URL:-"https://api.opsgenie.com"} + # If we didn't autodetect the character set for e-mail and it wasn't # set by the user, we need to set it to a reasonable default. UTF-8 # should be correct for almost all modern UNIX systems. @@ -853,7 +855,7 @@ send_email() { fi [ -n "${sender_email}" ] && opts+=(-f "${sender_email}") - [ -n "${sender_name}" ] && opts+=(-F "${sender_name}") + [ -n "${sender_name}" ] && sendmail --help 2>&1 | grep -q "\-F " && opts+=(-F "${sender_name}") if [ "${debug}" = "1" ]; then echo >&2 "--- BEGIN sendmail command ---" @@ -2052,7 +2054,7 @@ send_dynatrace() { local dynatrace_url="${DYNATRACE_SERVER}/e/${DYNATRACE_SPACE}/api/v1/events" local description="NetData Notification for: ${host} ${chart}.${name} is ${status}" local payload="" - + payload=$(cat <<EOF { "title": "NetData Alarm from ${host}", @@ -2179,7 +2181,7 @@ send_opsgenie() { EOF ) - httpcode=$(docurl -X POST -H "Content-Type: application/json" -d "${payload}" "https://api.opsgenie.com/v1/json/integrations/webhooks/netdata?apiKey=${OPSGENIE_API_KEY}") + httpcode=$(docurl -X POST -H "Content-Type: application/json" -d "${payload}" "${OPSGENIE_API_URL}/v1/json/integrations/webhooks/netdata?apiKey=${OPSGENIE_API_KEY}") # 
https://docs.opsgenie.com/docs/alert-api#create-alert if [ "${httpcode}" = "200" ]; then info "sent opsgenie notification for: ${host} ${chart}.${name} is ${status}" diff --git a/health/notifications/health_alarm_notify.conf b/health/notifications/health_alarm_notify.conf index 827a47d99..be669e135 100755 --- a/health/notifications/health_alarm_notify.conf +++ b/health/notifications/health_alarm_notify.conf @@ -284,6 +284,7 @@ SEND_OPSGENIE="YES" # Api key OPSGENIE_API_KEY="" +OPSGENIE_API_URL="" DEFAULT_RECIPIENT_OPSGENIE="" diff --git a/health/notifications/opsgenie/README.md b/health/notifications/opsgenie/README.md index aeb315489..7ae409df4 100644 --- a/health/notifications/opsgenie/README.md +++ b/health/notifications/opsgenie/README.md @@ -20,14 +20,16 @@ directory](/docs/configure/nodes.md): ./edit-config health_alarm_notify.conf ``` -Change the variable `OPSGENIE_API_KEY` with the API key you got from Opsgenie. +Change the variable `OPSGENIE_API_KEY` with the API key you got from Opsgenie. +`OPSGENIE_API_URL` defaults to https://api.opsgenie.com, however there are region specific API URLs such as https://eu.api.opsgenie.com, so set this if required. ``` SEND_OPSGENIE="YES" # Api key -# Default Opsgenie APi +# Default Opsgenie API OPSGENIE_API_KEY="11111111-2222-3333-4444-555555555555" +OPSGENIE_API_URL="" ``` Changes to `health_alarm_notify.conf` do not require a Netdata restart. 
You can test your Opsgenie notifications diff --git a/libnetdata/libnetdata.c b/libnetdata/libnetdata.c index c9b7ab198..325df3f76 100644 --- a/libnetdata/libnetdata.c +++ b/libnetdata/libnetdata.c @@ -1406,7 +1406,7 @@ void recursive_config_double_dir_load(const char *user_path, const char *stock_p if (!dir) { error("CONFIG cannot open stock config directory '%s'.", sdir); } - else { + else if (strcmp(udir, sdir)) { struct dirent *de = NULL; while((de = readdir(dir))) { if(de->d_type == DT_DIR || de->d_type == DT_LNK) { diff --git a/packaging/dashboard.checksums b/packaging/dashboard.checksums index 6b12cd41c..f343ef57e 100644 --- a/packaging/dashboard.checksums +++ b/packaging/dashboard.checksums @@ -1 +1 @@ -fa70b08877061d939e16efefe779ce9674624d483215d0cbea6d0b96056f36f3 dashboard.tar.gz +804e4610477ab64726f62cf6093613197be3ccf0140959364426065871075309 dashboard.tar.gz diff --git a/packaging/dashboard.version b/packaging/dashboard.version index c757c9824..3ef97df69 100644 --- a/packaging/dashboard.version +++ b/packaging/dashboard.version @@ -1 +1 @@ -v2.13.0 +v2.13.6 diff --git a/packaging/ebpf.checksums b/packaging/ebpf.checksums index 9db8e7e4b..e35fa1f17 100644 --- a/packaging/ebpf.checksums +++ b/packaging/ebpf.checksums @@ -1,3 +1,3 @@ -bcc2e38754f277e84aefdb2760d7de2b32611576718234e1cecdb70a87e93497 netdata-kernel-collector-glibc-v0.5.4.tar.xz -912675155f438c9fdccc1e91c1423fa4bb914a9c7e2d7b843f551e053f4374eb netdata-kernel-collector-musl-v0.5.4.tar.xz -dd0f63895305c38669b512f9e95a75057340f04ea999c3ea3540cb18a893dc52 netdata-kernel-collector-static-v0.5.4.tar.xz +d9c1c81fe3a8b9af7fc1174a28c16ddb24e2f3ff79e6beb1b2eb184bf0d2e8c0 netdata-kernel-collector-glibc-v0.5.5.tar.xz +0e1dd5e12a58dda53576b2dab963cd26fa26fe2084d84c51becb9238d1055fc1 netdata-kernel-collector-musl-v0.5.5.tar.xz +d6d65e5f40a83880aa7dd740829a7ffe6a0805637e1616805aebdff088a3fcb0 netdata-kernel-collector-static-v0.5.5.tar.xz diff --git a/packaging/ebpf.version b/packaging/ebpf.version index 
8ea9cc1eb..12aa8c541 100644 --- a/packaging/ebpf.version +++ b/packaging/ebpf.version @@ -1 +1 @@ -v0.5.4 +v0.5.5 diff --git a/packaging/version b/packaging/version index d52647a80..11d6ae8ef 100644 --- a/packaging/version +++ b/packaging/version @@ -1 +1 @@ -v1.29.1 +v1.29.2 diff --git a/web/api/formatters/rrd2json.c b/web/api/formatters/rrd2json.c index 9168f76eb..d8e248066 100644 --- a/web/api/formatters/rrd2json.c +++ b/web/api/formatters/rrd2json.c @@ -32,6 +32,30 @@ void free_context_param_list(struct context_param **param_list) *param_list = NULL; } +void rebuild_context_param_list(struct context_param *context_param_list, time_t after_requested) +{ + RRDDIM *temp_rd = context_param_list->rd; + RRDDIM *new_rd_list = NULL, *t; + while (temp_rd) { + t = temp_rd->next; + if (rrdset_last_entry_t(temp_rd->rrdset) >= after_requested) { + temp_rd->next = new_rd_list; + new_rd_list = temp_rd; + } else { + freez((char *)temp_rd->id); + freez((char *)temp_rd->name); +#ifdef ENABLE_DBENGINE + if (temp_rd->rrd_memory_mode == RRD_MEMORY_MODE_DBENGINE) + freez(temp_rd->state->metric_uuid); +#endif + freez(temp_rd->state); + freez(temp_rd); + } + temp_rd = t; + } + context_param_list->rd = new_rd_list; +}; + void build_context_param_list(struct context_param **param_list, RRDSET *st) { if (unlikely(!param_list || !st)) @@ -193,7 +217,6 @@ int rrdset2anything_api_v1( time_t last_accessed_time = now_realtime_sec(); st->last_accessed_time = last_accessed_time; - RRDDIM *temp_rd = context_param_list ? context_param_list->rd : NULL; RRDR *r = rrd2rrdr(st, points, after, before, group_method, group_time, options, dimensions?buffer_tostring(dimensions):NULL, context_param_list); if(!r) { @@ -201,6 +224,8 @@ int rrdset2anything_api_v1( return HTTP_RESP_INTERNAL_SERVER_ERROR; } + RRDDIM *temp_rd = context_param_list ? 
context_param_list->rd : NULL; + if(r->result_options & RRDR_RESULT_OPTION_RELATIVE) buffer_no_cacheable(wb); else if(r->result_options & RRDR_RESULT_OPTION_ABSOLUTE) diff --git a/web/api/formatters/rrd2json.h b/web/api/formatters/rrd2json.h index 1f929c494..3dc598973 100644 --- a/web/api/formatters/rrd2json.h +++ b/web/api/formatters/rrd2json.h @@ -86,6 +86,7 @@ extern int rrdset2value_api_v1( ); extern void build_context_param_list(struct context_param **param_list, RRDSET *st); +extern void rebuild_context_param_list(struct context_param *context_param_list, time_t after_requested); extern void free_context_param_list(struct context_param **param_list); #endif /* NETDATA_RRD2JSON_H */ diff --git a/web/api/queries/query.c b/web/api/queries/query.c index 3b9077cd6..663e4bd14 100644 --- a/web/api/queries/query.c +++ b/web/api/queries/query.c @@ -1591,6 +1591,9 @@ RRDR *rrd2rrdr( if (first_entry_t > after_requested) first_entry_t = after_requested; + if (context_param_list) + rebuild_context_param_list(context_param_list, after_requested); + #ifdef ENABLE_DBENGINE if (st->rrd_memory_mode == RRD_MEMORY_MODE_DBENGINE) { struct rrdeng_region_info *region_info_array; diff --git a/web/api/queries/rrdr.c b/web/api/queries/rrdr.c index 6cd0c0b14..ef237fa02 100644 --- a/web/api/queries/rrdr.c +++ b/web/api/queries/rrdr.c @@ -130,8 +130,8 @@ RRDR *rrdr_create(struct rrdset *st, long n, struct context_param *context_param // set the hidden flag on hidden dimensions int c; - for(c = 0, rd = temp_rd?temp_rd:st->dimensions ; rd ; c++, rd = rd->next) { - if(unlikely(rrddim_flag_check(rd, RRDDIM_FLAG_HIDDEN))) + for (c = 0, rd = temp_rd ? 
temp_rd : st->dimensions; rd; c++, rd = rd->next) { + if (unlikely(rrddim_flag_check(rd, RRDDIM_FLAG_HIDDEN))) r->od[c] = RRDR_DIMENSION_HIDDEN; else r->od[c] = RRDR_DIMENSION_DEFAULT; diff --git a/web/gui/README.md b/web/gui/README.md index c13f3d6cb..166cea7b2 100644 --- a/web/gui/README.md +++ b/web/gui/README.md @@ -29,7 +29,7 @@ behind an [Nginx proxy](https://learn.netdata.cloud/docs/agent/running-behind-ng Beyond charts, the local dashboard can be broken down into three key areas: 1. [**Sections**](#sections) -2. [**Time & date picker](#time--date-picker) +2. [**Time & date picker**](#time--date-picker) 3. [**Metrics menus/submenus**](#metrics-menus) 4. [**Netdata Cloud menus: Spaces, War Rooms, and Visited nodes)**](#cloud-menus-spaces-war-rooms-and-visited-nodes)