Diffstat (limited to 'src/daemon')
31 files changed, 681 insertions, 880 deletions
diff --git a/src/daemon/README.md b/src/daemon/README.md index bc2ec7757..da70f41e3 100644 --- a/src/daemon/README.md +++ b/src/daemon/README.md @@ -1,8 +1,8 @@ # Netdata daemon -The Netdata daemon is practically a synonym for the Netdata Agent, as it controls its -entire operation. We support various methods to -[start, stop, or restart the daemon](/packaging/installer/README.md#maintaining-a-netdata-agent-installation). +The Netdata daemon is practically a synonym for the Netdata Agent, as it controls its +entire operation. We support various methods to +[start, stop, or restart the daemon](/docs/netdata-agent/start-stop-restart.md). This document provides some basic information on the command line options, log files, and how to debug and troubleshoot @@ -104,9 +104,6 @@ The command line options of the Netdata 1.10.0 version are the following: -W simple-pattern pattern string Check if string matches pattern and exit. - -W "claim -token=TOKEN -rooms=ROOM1,ROOM2 url=https://app.netdata.cloud" - Connect the agent to the workspace Rooms pointed to by TOKEN and ROOM*. - Signals netdata handles: - HUP Close and reopen log files. @@ -119,10 +116,10 @@ You can send commands during runtime via [netdatacli](/src/cli/README.md). Netdata uses 4 log files: -1. `error.log` -2. `collector.log` -3. `access.log` -4. `debug.log` +1. `error.log` +2. `collector.log` +3. `access.log` +4. `debug.log` Any of them can be disabled by setting it to `/dev/null` or `none` in `netdata.conf`. By default `error.log`, `collector.log`, and `access.log` are enabled. `debug.log` is only enabled if debugging/tracing is also enabled @@ -136,8 +133,8 @@ The `error.log` is the `stderr` of the `netdata` daemon . 
For most Netdata programs (including standard external plugins shipped by netdata), the following lines may appear: -| tag | description | -|:-:|:----------| +| tag | description | +|:-------:|:--------------------------------------------------------------------------------------------------------------------------| | `INFO` | Something important the user should know. | | `ERROR` | Something that might disable a part of netdata.<br/>The log line includes `errno` (if it is not zero). | | `FATAL` | Something prevented a program from running.<br/>The log line includes `errno` (if it is not zero) and the program exited. | @@ -163,21 +160,21 @@ Data stored inside this file follows pattern already described for `error.log`. The `access.log` logs web requests. The format is: -```txt +```text DATE: ID: (sent/all = SENT_BYTES/ALL_BYTES bytes PERCENT_COMPRESSION%, prep/sent/total PREP_TIME/SENT_TIME/TOTAL_TIME ms): ACTION CODE URL ``` where: -- `ID` is the client ID. Client IDs are auto-incremented every time a client connects to netdata. -- `SENT_BYTES` is the number of bytes sent to the client, without the HTTP response header. -- `ALL_BYTES` is the number of bytes of the response, before compression. -- `PERCENT_COMPRESSION` is the percentage of traffic saved due to compression. -- `PREP_TIME` is the time in milliseconds needed to prepared the response. -- `SENT_TIME` is the time in milliseconds needed to sent the response to the client. -- `TOTAL_TIME` is the total time the request was inside Netdata (from the first byte of the request to the last byte +- `ID` is the client ID. Client IDs are auto-incremented every time a client connects to netdata. +- `SENT_BYTES` is the number of bytes sent to the client, without the HTTP response header. +- `ALL_BYTES` is the number of bytes of the response, before compression. +- `PERCENT_COMPRESSION` is the percentage of traffic saved due to compression. +- `PREP_TIME` is the time in milliseconds needed to prepared the response. 
+- `SENT_TIME` is the time in milliseconds needed to sent the response to the client. +- `TOTAL_TIME` is the total time the request was inside Netdata (from the first byte of the request to the last byte of the response). -- `ACTION` can be `filecopy`, `options` (used in CORS), `data` (API call). +- `ACTION` can be `filecopy`, `options` (used in CORS), `data` (API call). ### debug.log @@ -194,20 +191,20 @@ issues with gaps in charts on busy systems while still keeping the impact on the You can set Netdata scheduling policy in `netdata.conf`, like this: -```conf +```text [global] process scheduling policy = idle ``` You can use the following: -| policy | description | -| :-----------------------: | :---------- | -| `idle` | use CPU only when there is spare - this is lower than nice 19 - it is the default for Netdata and it is so low that Netdata will run in "slow motion" under extreme system load, resulting in short (1-2 seconds) gaps at the charts. | +| policy | description | +|:-------------------------:|:-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| +| `idle` | use CPU only when there is spare - this is lower than nice 19 - it is the default for Netdata and it is so low that Netdata will run in "slow motion" under extreme system load, resulting in short (1-2 seconds) gaps at the charts. | | `other`<br/>or<br/>`nice` | this is the default policy for all processes under Linux. It provides dynamic priorities based on the `nice` level of each process. Check below for setting this `nice` level for netdata. 
| -| `batch` | This policy is similar to `other` in that it schedules the thread according to its dynamic priority (based on the `nice` value). The difference is that this policy will cause the scheduler to always assume that the thread is CPU-intensive. Consequently, the scheduler will apply a small scheduling penalty with respect to wake-up behavior, so that this thread is mildly disfavored in scheduling decisions. | -| `fifo` | `fifo` can be used only with static priorities higher than 0, which means that when a `fifo` threads becomes runnable, it will always immediately preempt any currently running `other`, `batch`, or `idle` thread. `fifo` is a simple scheduling algorithm without time slicing. | -| `rr` | a simple enhancement of `fifo`. Everything described above for `fifo` also applies to `rr`, except that each thread is allowed to run only for a maximum time quantum. | +| `batch` | This policy is similar to `other` in that it schedules the thread according to its dynamic priority (based on the `nice` value). The difference is that this policy will cause the scheduler to always assume that the thread is CPU-intensive. Consequently, the scheduler will apply a small scheduling penalty with respect to wake-up behavior, so that this thread is mildly disfavored in scheduling decisions. | +| `fifo` | `fifo` can be used only with static priorities higher than 0, which means that when a `fifo` threads becomes runnable, it will always immediately preempt any currently running `other`, `batch`, or `idle` thread. `fifo` is a simple scheduling algorithm without time slicing. | +| `rr` | a simple enhancement of `fifo`. Everything described above for `fifo` also applies to `rr`, except that each thread is allowed to run only for a maximum time quantum. | | `keep`<br/>or<br/>`none` | do not set scheduling policy, priority or nice level - i.e. keep running with whatever it is set already (e.g. by systemd). | For more information see `man sched`. 
@@ -216,7 +213,7 @@ For more information see `man sched`. Once the policy is set to one of `rr` or `fifo`, the following will appear: -```conf +```text [global] process scheduling priority = 0 ``` @@ -228,7 +225,7 @@ important. When the policy is set to `other`, `nice`, or `batch`, the following will appear: -```conf +```text [global] process nice level = 19 ``` @@ -262,7 +259,7 @@ Run `systemctl daemon-reload` to reload these changes. Now, tell Netdata to keep these settings, as set by systemd, by editing `netdata.conf` and setting: -```conf +```text [global] process scheduling policy = keep ``` @@ -275,24 +272,20 @@ will be maintained by netdata. On a system that is not based on systemd, to make Netdata run with nice level -1 (a little bit higher to the default for all programs), edit `netdata.conf` and set: -```conf +```text [global] process scheduling policy = other process nice level = -1 ``` -then execute this to [restart Netdata](/packaging/installer/README.md#maintaining-a-netdata-agent-installation): - -```sh -sudo systemctl restart netdata -``` +then [restart Netdata](/docs/netdata-agent/start-stop-restart.md): #### Example 2: Netdata with nice -1 on systemd systems On a system that is based on systemd, to make Netdata run with nice level -1 (a little bit higher to the default for all programs), edit `netdata.conf` and set: -```conf +```text [global] process scheduling policy = keep ``` @@ -335,7 +328,7 @@ will roughly get the number of threads running. The system does this for speed. Having a separate memory arena for each thread, allows the threads to run in parallel in multi-core systems, without any locks between them. -This behaviour is system specific. For example, the chart above when running +This behavior is system specific. 
For example, the chart above when running Netdata on Alpine Linux (that uses **musl** instead of **glibc**) is this: ![image](https://cloud.githubusercontent.com/assets/2662304/19013807/7cf5878e-87e4-11e6-9651-082e68701eab.png) @@ -367,9 +360,9 @@ accounts the whole pages, even if parts of them are actually used). When you compile Netdata with debugging: -1. compiler optimizations for your CPU are disabled (Netdata will run somewhat slower) +1. compiler optimizations for your CPU are disabled (Netdata will run somewhat slower) -2. a lot of code is added all over netdata, to log debug messages to `/var/log/netdata/debug.log`. However, nothing is +2. a lot of code is added all over netdata, to log debug messages to `/var/log/netdata/debug.log`. However, nothing is printed by default. Netdata allows you to select which sections of Netdata you want to trace. Tracing is activated via the config option `debug flags`. It accepts a hex number, to enable or disable specific sections. You can find the options supported at [log.h](https://raw.githubusercontent.com/netdata/netdata/master/src/libnetdata/log/log.h). @@ -407,9 +400,9 @@ To provide stack traces, **you need to have Netdata compiled with debugging**. T Then you need to be in one of the following 2 cases: -1. Netdata crashes and you have a core dump +1. Netdata crashes and you have a core dump -2. you can reproduce the crash +2. you can reproduce the crash If you are not on these cases, you need to find a way to be (i.e. if your system does not produce core dumps, check your distro documentation to enable them). 
diff --git a/src/daemon/analytics.c b/src/daemon/analytics.c index 0e5c221c4..cebfdeb70 100644 --- a/src/daemon/analytics.c +++ b/src/daemon/analytics.c @@ -334,7 +334,7 @@ void analytics_alarms_notifications(void) if (instance) { char line[200 + 1]; - while (fgets(line, 200, instance->child_stdout_fp) != NULL) { + while (fgets(line, 200, spawn_popen_stdout(instance)) != NULL) { char *end = line; while (*end && *end != '\n') end++; @@ -375,7 +375,6 @@ static void analytics_get_install_type(struct rrdhost_system_info *system_info) void analytics_https(void) { BUFFER *b = buffer_create(30, NULL); -#ifdef ENABLE_HTTPS analytics_exporting_connectors_ssl(b); buffer_strcat(b, netdata_ssl_streaming_sender_ctx && @@ -383,9 +382,6 @@ void analytics_https(void) SSL_connection(&localhost->sender->ssl) ? "streaming|" : "|"); buffer_strcat(b, netdata_ssl_web_server_ctx ? "web" : ""); -#else - buffer_strcat(b, "||"); -#endif analytics_set_data_str(&analytics_data.netdata_config_https_available, (char *)buffer_tostring(b)); buffer_free(b); @@ -468,13 +464,8 @@ void analytics_alarms(void) */ void analytics_misc(void) { -#ifdef ENABLE_ACLK analytics_set_data(&analytics_data.netdata_host_cloud_available, "true"); analytics_set_data_str(&analytics_data.netdata_host_aclk_implementation, "Next Generation"); -#else - analytics_set_data(&analytics_data.netdata_host_cloud_available, "false"); - analytics_set_data_str(&analytics_data.netdata_host_aclk_implementation, ""); -#endif analytics_data.exporting_enabled = appconfig_get_boolean(&exporting_config, CONFIG_SECTION_EXPORTING, "enabled", CONFIG_BOOLEAN_NO); analytics_set_data(&analytics_data.netdata_config_exporting_enabled, analytics_data.exporting_enabled ? 
"true" : "false"); @@ -495,13 +486,11 @@ void analytics_misc(void) void analytics_aclk(void) { -#ifdef ENABLE_ACLK - if (aclk_connected) { + if (aclk_online()) { analytics_set_data(&analytics_data.netdata_host_aclk_available, "true"); analytics_set_data_str(&analytics_data.netdata_host_aclk_protocol, "New"); } else -#endif analytics_set_data(&analytics_data.netdata_host_aclk_available, "false"); } @@ -533,11 +522,9 @@ void analytics_gather_mutable_meta_data(void) analytics_alarms_notifications(); analytics_set_data( - &analytics_data.netdata_config_is_parent, (rrdhost_hosts_available() > 1 || configured_as_parent()) ? "true" : "false"); + &analytics_data.netdata_config_is_parent, (rrdhost_hosts_available() > 1 || stream_conf_configured_as_parent()) ? "true" : "false"); - char *claim_id = get_agent_claimid(); - analytics_set_data(&analytics_data.netdata_host_agent_claimed, claim_id ? "true" : "false"); - freez(claim_id); + analytics_set_data(&analytics_data.netdata_host_agent_claimed, is_agent_claimed() ? 
"true" : "false"); { char b[21]; @@ -582,14 +569,13 @@ void *analytics_main(void *ptr) CLEANUP_FUNCTION_REGISTER(analytics_main_cleanup) cleanup_ptr = ptr; unsigned int sec = 0; heartbeat_t hb; - heartbeat_init(&hb); - usec_t step_ut = USEC_PER_SEC; + heartbeat_init(&hb, USEC_PER_SEC); netdata_log_debug(D_ANALYTICS, "Analytics thread starts"); - //first delay after agent start + // first delay after agent start while (service_running(SERVICE_ANALYTICS) && likely(sec <= ANALYTICS_INIT_SLEEP_SEC)) { - heartbeat_next(&hb, step_ut); + heartbeat_next(&hb); sec++; } @@ -605,8 +591,8 @@ void *analytics_main(void *ptr) sec = 0; while (1) { - heartbeat_next(&hb, step_ut * 2); - sec += 2; + heartbeat_next(&hb); + sec++; if (unlikely(!service_running(SERVICE_ANALYTICS))) break; @@ -627,46 +613,15 @@ cleanup: return NULL; } -static const char *verify_required_directory(const char *dir) -{ - if (chdir(dir) == -1) - fatal("Cannot change directory to '%s'", dir); - - DIR *d = opendir(dir); - if (!d) - fatal("Cannot examine the contents of directory '%s'", dir); - closedir(d); - - return dir; -} - -static const char *verify_or_create_required_directory(const char *dir) { - int result; - - result = mkdir(dir, 0755); - - if (result != 0 && errno != EEXIST) - fatal("Cannot create required directory '%s'", dir); - - return verify_required_directory(dir); -} - /* * This is called after the rrdinit * These values will be sent on the START event */ -void set_late_global_environment(struct rrdhost_system_info *system_info) +void set_late_analytics_variables(struct rrdhost_system_info *system_info) { - analytics_set_data(&analytics_data.netdata_config_stream_enabled, default_rrdpush_enabled ? "true" : "false"); + analytics_set_data(&analytics_data.netdata_config_stream_enabled, stream_conf_send_enabled ? 
"true" : "false"); analytics_set_data_str(&analytics_data.netdata_config_memory_mode, (char *)rrd_memory_mode_name(default_rrd_memory_mode)); - -#ifdef DISABLE_CLOUD - analytics_set_data(&analytics_data.netdata_host_cloud_enabled, "false"); -#else - analytics_set_data( - &analytics_data.netdata_host_cloud_enabled, - appconfig_get_boolean_ondemand(&cloud_config, CONFIG_SECTION_GLOBAL, "enabled", netdata_cloud_enabled) ? "true" : "false"); -#endif + analytics_set_data(&analytics_data.netdata_host_cloud_enabled, "true"); #ifdef ENABLE_DBENGINE { @@ -679,11 +634,7 @@ void set_late_global_environment(struct rrdhost_system_info *system_info) } #endif -#ifdef ENABLE_HTTPS analytics_set_data(&analytics_data.netdata_config_https_enabled, "true"); -#else - analytics_set_data(&analytics_data.netdata_config_https_enabled, "false"); -#endif if (web_server_mode == WEB_SERVER_MODE_NONE) analytics_set_data(&analytics_data.netdata_config_web_enabled, "false"); @@ -831,119 +782,6 @@ void get_system_timezone(void) } } -void set_global_environment() { - { - char b[16]; - snprintfz(b, sizeof(b) - 1, "%d", default_rrd_update_every); - setenv("NETDATA_UPDATE_EVERY", b, 1); - } - - setenv("NETDATA_VERSION", NETDATA_VERSION, 1); - setenv("NETDATA_HOSTNAME", netdata_configured_hostname, 1); - setenv("NETDATA_CONFIG_DIR", verify_required_directory(netdata_configured_user_config_dir), 1); - setenv("NETDATA_USER_CONFIG_DIR", verify_required_directory(netdata_configured_user_config_dir), 1); - setenv("NETDATA_STOCK_CONFIG_DIR", verify_required_directory(netdata_configured_stock_config_dir), 1); - setenv("NETDATA_PLUGINS_DIR", verify_required_directory(netdata_configured_primary_plugins_dir), 1); - setenv("NETDATA_WEB_DIR", verify_required_directory(netdata_configured_web_dir), 1); - setenv("NETDATA_CACHE_DIR", verify_or_create_required_directory(netdata_configured_cache_dir), 1); - setenv("NETDATA_LIB_DIR", verify_or_create_required_directory(netdata_configured_varlib_dir), 1); - 
setenv("NETDATA_LOCK_DIR", verify_or_create_required_directory(netdata_configured_lock_dir), 1); - setenv("NETDATA_LOG_DIR", verify_or_create_required_directory(netdata_configured_log_dir), 1); - setenv("NETDATA_HOST_PREFIX", netdata_configured_host_prefix, 1); - - { - BUFFER *user_plugins_dirs = buffer_create(FILENAME_MAX, NULL); - - for (size_t i = 1; i < PLUGINSD_MAX_DIRECTORIES && plugin_directories[i]; i++) { - if (i > 1) - buffer_strcat(user_plugins_dirs, " "); - buffer_strcat(user_plugins_dirs, plugin_directories[i]); - } - - setenv("NETDATA_USER_PLUGINS_DIRS", buffer_tostring(user_plugins_dirs), 1); - - buffer_free(user_plugins_dirs); - } - - analytics_data.data_length = 0; - analytics_set_data(&analytics_data.netdata_config_stream_enabled, "null"); - analytics_set_data(&analytics_data.netdata_config_memory_mode, "null"); - analytics_set_data(&analytics_data.netdata_config_exporting_enabled, "null"); - analytics_set_data(&analytics_data.netdata_exporting_connectors, "null"); - analytics_set_data(&analytics_data.netdata_allmetrics_prometheus_used, "null"); - analytics_set_data(&analytics_data.netdata_allmetrics_shell_used, "null"); - analytics_set_data(&analytics_data.netdata_allmetrics_json_used, "null"); - analytics_set_data(&analytics_data.netdata_dashboard_used, "null"); - analytics_set_data(&analytics_data.netdata_collectors, "null"); - analytics_set_data(&analytics_data.netdata_collectors_count, "null"); - analytics_set_data(&analytics_data.netdata_buildinfo, "null"); - analytics_set_data(&analytics_data.netdata_config_page_cache_size, "null"); - analytics_set_data(&analytics_data.netdata_config_multidb_disk_quota, "null"); - analytics_set_data(&analytics_data.netdata_config_https_enabled, "null"); - analytics_set_data(&analytics_data.netdata_config_web_enabled, "null"); - analytics_set_data(&analytics_data.netdata_config_release_channel, "null"); - analytics_set_data(&analytics_data.netdata_mirrored_host_count, "null"); - 
analytics_set_data(&analytics_data.netdata_mirrored_hosts_reachable, "null"); - analytics_set_data(&analytics_data.netdata_mirrored_hosts_unreachable, "null"); - analytics_set_data(&analytics_data.netdata_notification_methods, "null"); - analytics_set_data(&analytics_data.netdata_alarms_normal, "null"); - analytics_set_data(&analytics_data.netdata_alarms_warning, "null"); - analytics_set_data(&analytics_data.netdata_alarms_critical, "null"); - analytics_set_data(&analytics_data.netdata_charts_count, "null"); - analytics_set_data(&analytics_data.netdata_metrics_count, "null"); - analytics_set_data(&analytics_data.netdata_config_is_parent, "null"); - analytics_set_data(&analytics_data.netdata_config_hosts_available, "null"); - analytics_set_data(&analytics_data.netdata_host_cloud_available, "null"); - analytics_set_data(&analytics_data.netdata_host_aclk_implementation, "null"); - analytics_set_data(&analytics_data.netdata_host_aclk_available, "null"); - analytics_set_data(&analytics_data.netdata_host_aclk_protocol, "null"); - analytics_set_data(&analytics_data.netdata_host_agent_claimed, "null"); - analytics_set_data(&analytics_data.netdata_host_cloud_enabled, "null"); - analytics_set_data(&analytics_data.netdata_config_https_available, "null"); - analytics_set_data(&analytics_data.netdata_install_type, "null"); - analytics_set_data(&analytics_data.netdata_config_is_private_registry, "null"); - analytics_set_data(&analytics_data.netdata_config_use_private_registry, "null"); - analytics_set_data(&analytics_data.netdata_config_oom_score, "null"); - analytics_set_data(&analytics_data.netdata_prebuilt_distro, "null"); - analytics_set_data(&analytics_data.netdata_fail_reason, "null"); - - analytics_data.prometheus_hits = 0; - analytics_data.shell_hits = 0; - analytics_data.json_hits = 0; - analytics_data.dashboard_hits = 0; - analytics_data.charts_count = 0; - analytics_data.metrics_count = 0; - analytics_data.exporting_enabled = false; - - char *default_port = 
appconfig_get(&netdata_config, CONFIG_SECTION_WEB, "default port", NULL); - int clean = 0; - if (!default_port) { - default_port = strdupz("19999"); - clean = 1; - } - - setenv("NETDATA_LISTEN_PORT", default_port, 1); - if (clean) - freez(default_port); - - // set the path we need - char path[4096], *p = getenv("PATH"); - if (!p) p = "/bin:/usr/bin"; - snprintfz(path, sizeof(path), "%s:%s", p, "/sbin:/usr/sbin:/usr/local/bin:/usr/local/sbin"); - setenv("PATH", config_get(CONFIG_SECTION_ENV_VARS, "PATH", path), 1); - - // python options - p = getenv("PYTHONPATH"); - if (!p) p = ""; - setenv("PYTHONPATH", config_get(CONFIG_SECTION_ENV_VARS, "PYTHONPATH", p), 1); - - // disable buffering for python plugins - setenv("PYTHONUNBUFFERED", "1", 1); - - // switch to standard locale for plugins - setenv("LC_ALL", "C", 1); -} - void analytics_statistic_send(const analytics_statistic_t *statistic) { if (!statistic) return; @@ -1053,7 +891,7 @@ void analytics_statistic_send(const analytics_statistic_t *statistic) { POPEN_INSTANCE *instance = spawn_popen_run(command_to_run); if (instance) { char buffer[4 + 1]; - char *s = fgets(buffer, 4, instance->child_stdout_fp); + char *s = fgets(buffer, 4, spawn_popen_stdout(instance)); int exit_code = spawn_popen_wait(instance); if (exit_code) @@ -1075,6 +913,58 @@ void analytics_statistic_send(const analytics_statistic_t *statistic) { freez(command_to_run); } +void analytics_reset(void) { + analytics_data.data_length = 0; + analytics_set_data(&analytics_data.netdata_config_stream_enabled, "null"); + analytics_set_data(&analytics_data.netdata_config_memory_mode, "null"); + analytics_set_data(&analytics_data.netdata_config_exporting_enabled, "null"); + analytics_set_data(&analytics_data.netdata_exporting_connectors, "null"); + analytics_set_data(&analytics_data.netdata_allmetrics_prometheus_used, "null"); + analytics_set_data(&analytics_data.netdata_allmetrics_shell_used, "null"); + 
analytics_set_data(&analytics_data.netdata_allmetrics_json_used, "null"); + analytics_set_data(&analytics_data.netdata_dashboard_used, "null"); + analytics_set_data(&analytics_data.netdata_collectors, "null"); + analytics_set_data(&analytics_data.netdata_collectors_count, "null"); + analytics_set_data(&analytics_data.netdata_buildinfo, "null"); + analytics_set_data(&analytics_data.netdata_config_page_cache_size, "null"); + analytics_set_data(&analytics_data.netdata_config_multidb_disk_quota, "null"); + analytics_set_data(&analytics_data.netdata_config_https_enabled, "null"); + analytics_set_data(&analytics_data.netdata_config_web_enabled, "null"); + analytics_set_data(&analytics_data.netdata_config_release_channel, "null"); + analytics_set_data(&analytics_data.netdata_mirrored_host_count, "null"); + analytics_set_data(&analytics_data.netdata_mirrored_hosts_reachable, "null"); + analytics_set_data(&analytics_data.netdata_mirrored_hosts_unreachable, "null"); + analytics_set_data(&analytics_data.netdata_notification_methods, "null"); + analytics_set_data(&analytics_data.netdata_alarms_normal, "null"); + analytics_set_data(&analytics_data.netdata_alarms_warning, "null"); + analytics_set_data(&analytics_data.netdata_alarms_critical, "null"); + analytics_set_data(&analytics_data.netdata_charts_count, "null"); + analytics_set_data(&analytics_data.netdata_metrics_count, "null"); + analytics_set_data(&analytics_data.netdata_config_is_parent, "null"); + analytics_set_data(&analytics_data.netdata_config_hosts_available, "null"); + analytics_set_data(&analytics_data.netdata_host_cloud_available, "null"); + analytics_set_data(&analytics_data.netdata_host_aclk_implementation, "null"); + analytics_set_data(&analytics_data.netdata_host_aclk_available, "null"); + analytics_set_data(&analytics_data.netdata_host_aclk_protocol, "null"); + analytics_set_data(&analytics_data.netdata_host_agent_claimed, "null"); + analytics_set_data(&analytics_data.netdata_host_cloud_enabled, "null"); + 
analytics_set_data(&analytics_data.netdata_config_https_available, "null"); + analytics_set_data(&analytics_data.netdata_install_type, "null"); + analytics_set_data(&analytics_data.netdata_config_is_private_registry, "null"); + analytics_set_data(&analytics_data.netdata_config_use_private_registry, "null"); + analytics_set_data(&analytics_data.netdata_config_oom_score, "null"); + analytics_set_data(&analytics_data.netdata_prebuilt_distro, "null"); + analytics_set_data(&analytics_data.netdata_fail_reason, "null"); + + analytics_data.prometheus_hits = 0; + analytics_data.shell_hits = 0; + analytics_data.json_hits = 0; + analytics_data.dashboard_hits = 0; + analytics_data.charts_count = 0; + analytics_data.metrics_count = 0; + analytics_data.exporting_enabled = false; +} + void analytics_init(void) { spinlock_init(&analytics_data.spinlock); diff --git a/src/daemon/analytics.h b/src/daemon/analytics.h index 747cf6070..b1d3c1386 100644 --- a/src/daemon/analytics.h +++ b/src/daemon/analytics.h @@ -76,9 +76,9 @@ struct analytics_data { bool exporting_enabled; }; -void set_late_global_environment(struct rrdhost_system_info *system_info); +struct rrdhost_system_info; +void set_late_analytics_variables(struct rrdhost_system_info *system_info); void analytics_free_data(void); -void set_global_environment(void); void analytics_log_shell(void); void analytics_log_json(void); void analytics_log_prometheus(void); @@ -86,6 +86,7 @@ void analytics_log_dashboard(void); void analytics_gather_mutable_meta_data(void); void analytics_report_oom_score(long long int score); void get_system_timezone(void); +void analytics_reset(void); void analytics_init(void); typedef struct { diff --git a/src/daemon/buildinfo.c b/src/daemon/buildinfo.c index ace96199a..3cbbe9035 100644 --- a/src/daemon/buildinfo.c +++ b/src/daemon/buildinfo.c @@ -1069,18 +1069,8 @@ __attribute__((constructor)) void initialize_build_info(void) { #endif #endif -#ifdef ENABLE_ACLK build_info_set_status(BIB_FEATURE_CLOUD, 
true); build_info_set_status(BIB_CONNECTIVITY_ACLK, true); -#else - build_info_set_status(BIB_FEATURE_CLOUD, false); -#ifdef DISABLE_CLOUD - build_info_set_value(BIB_FEATURE_CLOUD, "disabled"); -#else - build_info_set_value(BIB_FEATURE_CLOUD, "unavailable"); -#endif -#endif - build_info_set_status(BIB_FEATURE_HEALTH, true); build_info_set_status(BIB_FEATURE_STREAMING, true); build_info_set_status(BIB_FEATURE_BACKFILLING, true); @@ -1126,9 +1116,7 @@ __attribute__((constructor)) void initialize_build_info(void) { #ifdef ENABLE_WEBRTC build_info_set_status(BIB_CONNECTIVITY_WEBRTC, true); #endif -#ifdef ENABLE_HTTPS build_info_set_status(BIB_CONNECTIVITY_NATIVE_HTTPS, true); -#endif #if defined(HAVE_X509_VERIFY_PARAM_set1_host) && HAVE_X509_VERIFY_PARAM_set1_host == 1 build_info_set_status(BIB_CONNECTIVITY_TLS_HOST_VERIFY, true); #endif @@ -1162,9 +1150,7 @@ __attribute__((constructor)) void initialize_build_info(void) { #ifdef HAVE_LIBDATACHANNEL build_info_set_status(BIB_LIB_LIBDATACHANNEL, true); #endif -#ifdef ENABLE_OPENSSL build_info_set_status(BIB_LIB_OPENSSL, true); -#endif #ifdef ENABLE_JSONC build_info_set_status(BIB_LIB_JSONC, true); #endif @@ -1345,7 +1331,8 @@ char *get_value_from_key(char *buffer, char *key) { return s; } -void get_install_type(char **install_type, char **prebuilt_arch, char **prebuilt_dist) { +void get_install_type(char **install_type, char **prebuilt_arch __maybe_unused, char **prebuilt_dist __maybe_unused) { +#ifndef OS_WINDOWS char *install_type_filename; int install_type_filename_len = (strlen(netdata_configured_user_config_dir) + strlen(".install-type") + 3); @@ -1368,6 +1355,9 @@ void get_install_type(char **install_type, char **prebuilt_arch, char **prebuilt fclose(fp); } freez(install_type_filename); +#else + *install_type = strdupz("netdata_installer.exe"); +#endif } static struct { diff --git a/src/daemon/commands.c b/src/daemon/commands.c index f0637ad31..9d716d932 100644 --- a/src/daemon/commands.c +++ 
b/src/daemon/commands.c @@ -47,9 +47,7 @@ static cmd_status_t cmd_ping_execute(char *args, char **message); static cmd_status_t cmd_aclk_state(char *args, char **message); static cmd_status_t cmd_version(char *args, char **message); static cmd_status_t cmd_dumpconfig(char *args, char **message); -#ifdef ENABLE_ACLK static cmd_status_t cmd_remove_node(char *args, char **message); -#endif static command_info_t command_info_array[] = { {"help", cmd_help_execute, CMD_TYPE_HIGH_PRIORITY}, // show help menu @@ -65,9 +63,7 @@ static command_info_t command_info_array[] = { {"aclk-state", cmd_aclk_state, CMD_TYPE_ORTHOGONAL}, {"version", cmd_version, CMD_TYPE_ORTHOGONAL}, {"dumpconfig", cmd_dumpconfig, CMD_TYPE_ORTHOGONAL}, -#ifdef ENABLE_ACLK {"remove-stale-node", cmd_remove_node, CMD_TYPE_ORTHOGONAL} -#endif }; /* Mutexes for commands of type CMD_TYPE_ORTHOGONAL */ @@ -119,8 +115,6 @@ static cmd_status_t cmd_help_execute(char *args, char **message) " Reload health configuration.\n\n" "reload-labels\n" " Reload all labels.\n\n" - "save-database\n" - " Save internal DB to disk for memory mode save.\n\n" "reopen-logs\n" " Close and reopen log files.\n\n" "shutdown-agent\n" @@ -135,10 +129,8 @@ static cmd_status_t cmd_help_execute(char *args, char **message) " Returns current state of ACLK and Cloud connection. 
(optionally in json).\n\n" "dumpconfig\n" " Returns the current netdata.conf on stdout.\n\n" -#ifdef ENABLE_ACLK "remove-stale-node <node_id | machine_guid | hostname | ALL_NODES>\n" " Unregisters and removes a node from the cloud.\n\n" -#endif "version\n" " Returns the netdata version.\n", MAX_COMMAND_LENGTH - 1); @@ -193,17 +185,42 @@ static cmd_status_t cmd_fatal_execute(char *args, char **message) return CMD_STATUS_SUCCESS; } -static cmd_status_t cmd_reload_claiming_state_execute(char *args, char **message) -{ - (void)args; - (void)message; -#if defined(DISABLE_CLOUD) || !defined(ENABLE_ACLK) - netdata_log_info("The claiming feature has been explicitly disabled"); - *message = strdupz("This agent cannot be claimed, it was built without support for Cloud"); - return CMD_STATUS_FAILURE; -#endif - netdata_log_info("COMMAND: Reloading Agent Claiming configuration."); - claim_reload_all(); +static cmd_status_t cmd_reload_claiming_state_execute(char *args __maybe_unused, char **message) { + char msg[1024]; + + CLOUD_STATUS status = claim_reload_and_wait_online(); + switch(status) { + case CLOUD_STATUS_ONLINE: + snprintfz(msg, sizeof(msg), + "Netdata Agent is claimed to Netdata Cloud and is currently online."); + break; + + case CLOUD_STATUS_BANNED: + snprintfz(msg, sizeof(msg), + "Netdata Agent is claimed to Netdata Cloud, but it is banned."); + break; + + default: + case CLOUD_STATUS_AVAILABLE: + snprintfz(msg, sizeof(msg), + "Netdata Agent is not claimed to Netdata Cloud: %s", + claim_agent_failure_reason_get()); + break; + + case CLOUD_STATUS_OFFLINE: + snprintfz(msg, sizeof(msg), + "Netdata Agent is claimed to Netdata Cloud, but it is currently offline: %s", + cloud_status_aclk_offline_reason()); + break; + + case CLOUD_STATUS_INDIRECT: + snprintfz(msg, sizeof(msg), + "Netdata Agent is not claimed to Netdata Cloud, but it is currently online via parent."); + break; + } + + *message = strdupz(msg); + return CMD_STATUS_SUCCESS; } @@ -242,9 +259,8 @@ static 
cmd_status_t cmd_read_config_execute(char *args, char **message) const char *conf_file = temp; /* "cloud" is cloud.conf, otherwise netdata.conf */ struct config *tmp_config = strcmp(conf_file, "cloud") ? &netdata_config : &cloud_config; - char *value = appconfig_get(tmp_config, temp + offset + 1, temp + offset2 + 1, NULL); - if (value == NULL) - { + const char *value = appconfig_get(tmp_config, temp + offset + 1, temp + offset2 + 1, NULL); + if (value == NULL) { netdata_log_error("Cannot execute read-config conf_file=%s section=%s / key=%s because no value set", conf_file, temp + offset + 1, @@ -252,13 +268,11 @@ static cmd_status_t cmd_read_config_execute(char *args, char **message) freez(temp); return CMD_STATUS_FAILURE; } - else - { + else { (*message) = strdupz(value); freez(temp); return CMD_STATUS_SUCCESS; } - } static cmd_status_t cmd_write_config_execute(char *args, char **message) @@ -306,17 +320,10 @@ static cmd_status_t cmd_ping_execute(char *args, char **message) static cmd_status_t cmd_aclk_state(char *args, char **message) { netdata_log_info("COMMAND: Reopening aclk/cloud state."); -#ifdef ENABLE_ACLK if (strstr(args, "json")) *message = aclk_state_json(); else *message = aclk_state(); -#else - if (strstr(args, "json")) - *message = strdupz("{\"aclk-available\":false}"); - else - *message = strdupz("ACLK Available: No"); -#endif return CMD_STATUS_SUCCESS; } @@ -338,14 +345,12 @@ static cmd_status_t cmd_dumpconfig(char *args, char **message) (void)args; BUFFER *wb = buffer_create(1024, NULL); - config_generate(wb, 0); + netdata_conf_generate(wb, 0); *message = strdupz(buffer_tostring(wb)); buffer_free(wb); return CMD_STATUS_SUCCESS; } -#ifdef ENABLE_ACLK - static int remove_ephemeral_host(BUFFER *wb, RRDHOST *host, bool report_error) { if (host == localhost) { @@ -362,11 +367,10 @@ static int remove_ephemeral_host(BUFFER *wb, RRDHOST *host, bool report_error) if (!rrdhost_option_check(host, RRDHOST_OPTION_EPHEMERAL_HOST)) { rrdhost_option_set(host, 
RRDHOST_OPTION_EPHEMERAL_HOST); - sql_set_host_label(&host->host_uuid, "_is_ephemeral", "true"); + sql_set_host_label(&host->host_id.uuid, "_is_ephemeral", "true"); aclk_host_state_update(host, 0, 0); unregister_node(host->machine_guid); - freez(host->node_id); - host->node_id = NULL; + host->node_id = UUID_ZERO; buffer_sprintf(wb, "Unregistering node with machine guid %s, hostname = %s", host->machine_guid, rrdhost_hostname(host)); rrd_wrlock(); rrdhost_free___while_having_rrd_wrlock(host, true); @@ -438,7 +442,6 @@ done: buffer_free(wb); return CMD_STATUS_SUCCESS; } -#endif static void cmd_lock_exclusive(unsigned index) { @@ -509,7 +512,7 @@ static void pipe_write_cb(uv_write_t* req, int status) static inline void add_char_to_command_reply(BUFFER *reply_string, unsigned *reply_string_size, char character) { - buffer_fast_charcat(reply_string, character); + buffer_putc(reply_string, character); *reply_string_size +=1; } diff --git a/src/daemon/commands.h b/src/daemon/commands.h index 14c2ec49e..8327d28d2 100644 --- a/src/daemon/commands.h +++ b/src/daemon/commands.h @@ -20,9 +20,7 @@ typedef enum cmd { CMD_ACLK_STATE, CMD_VERSION, CMD_DUMPCONFIG, -#ifdef ENABLE_ACLK CMD_REMOVE_NODE, -#endif CMD_TOTAL_COMMANDS } cmd_t; diff --git a/src/daemon/common.c b/src/daemon/common.c deleted file mode 100644 index 6c824eec6..000000000 --- a/src/daemon/common.c +++ /dev/null @@ -1,197 +0,0 @@ -// SPDX-License-Identifier: GPL-3.0-or-later - -#include "common.h" - -char *netdata_configured_hostname = NULL; -char *netdata_configured_user_config_dir = CONFIG_DIR; -char *netdata_configured_stock_config_dir = LIBCONFIG_DIR; -char *netdata_configured_log_dir = LOG_DIR; -char *netdata_configured_primary_plugins_dir = PLUGINS_DIR; -char *netdata_configured_web_dir = WEB_DIR; -char *netdata_configured_cache_dir = CACHE_DIR; -char *netdata_configured_varlib_dir = VARLIB_DIR; -char *netdata_configured_lock_dir = VARLIB_DIR "/lock"; -char *netdata_configured_home_dir = VARLIB_DIR; -char 
*netdata_configured_host_prefix = NULL; -char *netdata_configured_timezone = NULL; -char *netdata_configured_abbrev_timezone = NULL; -int32_t netdata_configured_utc_offset = 0; - -bool netdata_ready = false; - -#if defined( DISABLE_CLOUD ) || !defined( ENABLE_ACLK ) -int netdata_cloud_enabled = CONFIG_BOOLEAN_NO; -#else -int netdata_cloud_enabled = CONFIG_BOOLEAN_AUTO; -#endif - -long get_netdata_cpus(void) { - static long processors = 0; - - if(processors) - return processors; - - long cores_proc_stat = os_get_system_cpus_cached(false, true); - long cores_cpuset_v1 = (long)os_read_cpuset_cpus("/sys/fs/cgroup/cpuset/cpuset.cpus", cores_proc_stat); - long cores_cpuset_v2 = (long)os_read_cpuset_cpus("/sys/fs/cgroup/cpuset.cpus", cores_proc_stat); - - if(cores_cpuset_v2) - processors = cores_cpuset_v2; - else if(cores_cpuset_v1) - processors = cores_cpuset_v1; - else - processors = cores_proc_stat; - - long cores_user_configured = config_get_number(CONFIG_SECTION_GLOBAL, "cpu cores", processors); - - errno_clear(); - internal_error(true, - "System CPUs: %ld, (" - "system: %ld, cgroups cpuset v1: %ld, cgroups cpuset v2: %ld, netdata.conf: %ld" - ")" - , processors - , cores_proc_stat - , cores_cpuset_v1 - , cores_cpuset_v2 - , cores_user_configured - ); - - processors = cores_user_configured; - - if(processors < 1) - processors = 1; - - return processors; -} - -const char *cloud_status_to_string(CLOUD_STATUS status) { - switch(status) { - default: - case CLOUD_STATUS_UNAVAILABLE: - return "unavailable"; - - case CLOUD_STATUS_AVAILABLE: - return "available"; - - case CLOUD_STATUS_DISABLED: - return "disabled"; - - case CLOUD_STATUS_BANNED: - return "banned"; - - case CLOUD_STATUS_OFFLINE: - return "offline"; - - case CLOUD_STATUS_ONLINE: - return "online"; - } -} - -CLOUD_STATUS cloud_status(void) { -#ifdef ENABLE_ACLK - if(aclk_disable_runtime) - return CLOUD_STATUS_BANNED; - - if(aclk_connected) - return CLOUD_STATUS_ONLINE; - - if(netdata_cloud_enabled == 
CONFIG_BOOLEAN_YES) { - char *agent_id = get_agent_claimid(); - bool claimed = agent_id != NULL; - freez(agent_id); - - if(claimed) - return CLOUD_STATUS_OFFLINE; - } - - if(netdata_cloud_enabled != CONFIG_BOOLEAN_NO) - return CLOUD_STATUS_AVAILABLE; - - return CLOUD_STATUS_DISABLED; -#else - return CLOUD_STATUS_UNAVAILABLE; -#endif -} - -time_t cloud_last_change(void) { -#ifdef ENABLE_ACLK - time_t ret = MAX(last_conn_time_mqtt, last_disconnect_time); - if(!ret) ret = netdata_start_time; - return ret; -#else - return netdata_start_time; -#endif -} - -time_t cloud_next_connection_attempt(void) { -#ifdef ENABLE_ACLK - return next_connection_attempt; -#else - return 0; -#endif -} - -size_t cloud_connection_id(void) { -#ifdef ENABLE_ACLK - return aclk_connection_counter; -#else - return 0; -#endif -} - -const char *cloud_offline_reason() { -#ifdef ENABLE_ACLK - if(!netdata_cloud_enabled) - return "disabled"; - - if(aclk_disable_runtime) - return "banned"; - - return aclk_status_to_string(); -#else - return "disabled"; -#endif -} - -const char *cloud_base_url() { -#ifdef ENABLE_ACLK - return aclk_cloud_base_url; -#else - return NULL; -#endif -} - -CLOUD_STATUS buffer_json_cloud_status(BUFFER *wb, time_t now_s) { - CLOUD_STATUS status = cloud_status(); - - buffer_json_member_add_object(wb, "cloud"); - { - size_t id = cloud_connection_id(); - time_t last_change = cloud_last_change(); - time_t next_connect = cloud_next_connection_attempt(); - buffer_json_member_add_uint64(wb, "id", id); - buffer_json_member_add_string(wb, "status", cloud_status_to_string(status)); - buffer_json_member_add_time_t(wb, "since", last_change); - buffer_json_member_add_time_t(wb, "age", now_s - last_change); - - if (status != CLOUD_STATUS_ONLINE) - buffer_json_member_add_string(wb, "reason", cloud_offline_reason()); - - if (status == CLOUD_STATUS_OFFLINE && next_connect > now_s) { - buffer_json_member_add_time_t(wb, "next_check", next_connect); - buffer_json_member_add_time_t(wb, "next_in", 
next_connect - now_s); - } - - if (cloud_base_url()) - buffer_json_member_add_string(wb, "url", cloud_base_url()); - - char *claim_id = get_agent_claimid(); - if(claim_id) { - buffer_json_member_add_string(wb, "claim_id", claim_id); - freez(claim_id); - } - } - buffer_json_object_close(wb); // cloud - - return status; -} diff --git a/src/daemon/common.h b/src/daemon/common.h index 1dea19c5b..9f6efa3ef 100644 --- a/src/daemon/common.h +++ b/src/daemon/common.h @@ -4,36 +4,13 @@ #define NETDATA_COMMON_H 1 #include "libnetdata/libnetdata.h" -#include "event_loop.h" - -// ---------------------------------------------------------------------------- -// shortcuts for the default netdata configuration - -#define config_load(filename, overwrite_used, section) appconfig_load(&netdata_config, filename, overwrite_used, section) -#define config_get(section, name, default_value) appconfig_get(&netdata_config, section, name, default_value) -#define config_get_number(section, name, value) appconfig_get_number(&netdata_config, section, name, value) -#define config_get_float(section, name, value) appconfig_get_float(&netdata_config, section, name, value) -#define config_get_boolean(section, name, value) appconfig_get_boolean(&netdata_config, section, name, value) -#define config_get_boolean_ondemand(section, name, value) appconfig_get_boolean_ondemand(&netdata_config, section, name, value) -#define config_get_duration(section, name, value) appconfig_get_duration(&netdata_config, section, name, value) - -#define config_set(section, name, default_value) appconfig_set(&netdata_config, section, name, default_value) -#define config_set_default(section, name, value) appconfig_set_default(&netdata_config, section, name, value) -#define config_set_number(section, name, value) appconfig_set_number(&netdata_config, section, name, value) -#define config_set_float(section, name, value) appconfig_set_float(&netdata_config, section, name, value) -#define config_set_boolean(section, name, value) 
appconfig_set_boolean(&netdata_config, section, name, value) - -#define config_exists(section, name) appconfig_exists(&netdata_config, section, name) -#define config_move(section_old, name_old, section_new, name_new) appconfig_move(&netdata_config, section_old, name_old, section_new, name_new) - -#define config_generate(buffer, only_changed) appconfig_generate(&netdata_config, buffer, only_changed) - -#define config_section_destroy(section) appconfig_section_destroy_non_loaded(&netdata_config, section) -#define config_section_option_destroy(section, name) appconfig_section_option_destroy_non_loaded(&netdata_config, section, name) +#include "libuv_workers.h" // ---------------------------------------------------------------------------- // netdata include files +#include "web/api/maps/maps.h" + #include "daemon/config/dyncfg.h" #include "global_statistics.h" @@ -55,7 +32,6 @@ // streaming metrics between netdata servers #include "streaming/rrdpush.h" - // anomaly detection #include "ml/ml.h" @@ -94,45 +70,28 @@ #include "analytics.h" // global netdata daemon variables -extern char *netdata_configured_hostname; -extern char *netdata_configured_user_config_dir; -extern char *netdata_configured_stock_config_dir; -extern char *netdata_configured_log_dir; -extern char *netdata_configured_primary_plugins_dir; -extern char *netdata_configured_web_dir; -extern char *netdata_configured_cache_dir; -extern char *netdata_configured_varlib_dir; -extern char *netdata_configured_lock_dir; -extern char *netdata_configured_home_dir; -extern char *netdata_configured_host_prefix; -extern char *netdata_configured_timezone; -extern char *netdata_configured_abbrev_timezone; +extern const char *netdata_configured_hostname; +extern const char *netdata_configured_user_config_dir; +extern const char *netdata_configured_stock_config_dir; +extern const char *netdata_configured_log_dir; +extern const char *netdata_configured_primary_plugins_dir; +extern const char *netdata_configured_web_dir; 
+extern const char *netdata_configured_cache_dir; +extern const char *netdata_configured_varlib_dir; +extern const char *netdata_configured_lock_dir; +extern const char *netdata_configured_cloud_dir; +extern const char *netdata_configured_home_dir; +extern const char *netdata_configured_host_prefix; +extern const char *netdata_configured_timezone; +extern const char *netdata_configured_abbrev_timezone; extern int32_t netdata_configured_utc_offset; extern int netdata_anonymous_statistics_enabled; extern bool netdata_ready; -extern int netdata_cloud_enabled; - extern time_t netdata_start_time; long get_netdata_cpus(void); -typedef enum __attribute__((packed)) { - CLOUD_STATUS_UNAVAILABLE = 0, // cloud and aclk functionality is not available on this agent - CLOUD_STATUS_AVAILABLE, // cloud and aclk functionality is available, but the agent is not claimed - CLOUD_STATUS_DISABLED, // cloud and aclk functionality is available, but it is disabled - CLOUD_STATUS_BANNED, // the agent has been banned from cloud - CLOUD_STATUS_OFFLINE, // the agent tries to connect to cloud, but cannot do it - CLOUD_STATUS_ONLINE, // the agent is connected to cloud -} CLOUD_STATUS; - -const char *cloud_status_to_string(CLOUD_STATUS status); -CLOUD_STATUS cloud_status(void); -time_t cloud_last_change(void); -time_t cloud_next_connection_attempt(void); -size_t cloud_connection_id(void); -const char *cloud_offline_reason(void); -const char *cloud_base_url(void); -CLOUD_STATUS buffer_json_cloud_status(BUFFER *wb, time_t now_s); +void set_environment_for_plugins_and_scripts(void); #endif /* NETDATA_COMMON_H */ diff --git a/src/daemon/config/README.md b/src/daemon/config/README.md index 3c0912fba..7217ec4ea 100644 --- a/src/daemon/config/README.md +++ b/src/daemon/config/README.md @@ -1,13 +1,3 @@ -<!-- -title: "Daemon configuration" -description: "The Netdata Agent's daemon is installed preconfigured to collect thousands of metrics every second, but is highly configurable for real-world 
workloads." -custom_edit_url: "https://github.com/netdata/netdata/edit/master/src/daemon/config/README.md" -sidebar_label: "Daemon" -learn_status: "Published" -learn_rel_path: "Configuration" -learn_doc_purpose: "Explain the daemon options, the log files, the process scheduling, virtual memory, explain how the netdata.conf is used and backlink to the netdata.conf file reference" ---> - # Daemon configuration <details> @@ -53,7 +43,7 @@ comment on settings it does not currently use. ## Applying changes -After `netdata.conf` has been modified, Netdata needs to be [restarted](/packaging/installer/README.md#maintaining-a-netdata-agent-installation) for +After `netdata.conf` has been modified, Netdata needs to be [restarted](/docs/netdata-agent/start-stop-restart.md) for changes to apply: ```bash @@ -86,24 +76,22 @@ Please note that your data history will be lost if you have modified `history` p ### [db] section options -| setting | default | info | -|:---------------------------------------------:|:----------:|:---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| -| mode | `dbengine` | `dbengine`: The default for long-term metrics storage with efficient RAM and disk usage. 
Can be extended with `dbengine page cache size MB` and `dbengine disk space MB`. <br />`ram`: The round-robin database will be temporary and it will be lost when Netdata exits. <br />`alloc`: Similar to `ram`, but can significantly reduce memory usage, when combined with a low retention and does not support KSM. <br />`none`: Disables the database at this host, and disables health monitoring entirely, as that requires a database of metrics. Not to be used together with streaming. | -| retention | `3600` | Used with `mode = ram/alloc`, not the default `mode = dbengine`. This number reflects the number of entries the `netdata` daemon will by default keep in memory for each chart dimension. Check [Memory Requirements](/src/database/README.md) for more information. | -| storage tiers | `3` | The number of storage tiers you want to have in your dbengine. Check the tiering mechanism in the [dbengine's reference](/src/database/engine/README.md#tiering). You can have up to 5 tiers of data (including the _Tier 0_). This number ranges between 1 and 5. | -| dbengine page cache size MB | `32` | Determines the amount of RAM in MiB that is dedicated to caching for _Tier 0_ Netdata metric values. | -| dbengine tier **`N`** page cache size MB | `32` | Determines the amount of RAM in MiB that is dedicated for caching Netdata metric values of the **`N`** tier. <br /> `N belongs to [1..4]` | -| dbengine disk space MB | `256` | Determines the amount of disk space in MiB that is dedicated to storing _Tier 0_ Netdata metric values and all related metadata describing them. This option is available **only for legacy configuration** (`Agent v1.23.2 and prior`). | -| dbengine multihost disk space MB | `256` | Same functionality as `dbengine disk space MB`, but includes support for storing metrics streamed to a parent node by its children. Can be used in single-node environments as well. This setting is only for _Tier 0_ metrics. 
| -| dbengine tier **`N`** multihost disk space MB | `256` | Same functionality as `dbengine multihost disk space MB`, but stores metrics of the **`N`** tier (both parent node and its children). Can be used in single-node environments as well. <br /> `N belongs to [1..4]` | -| update every | `1` | The frequency in seconds, for data collection. For more information see the [performance guide](/docs/netdata-agent/configuration/optimize-the-netdata-agents-performance.md). These metrics stored as _Tier 0_ data. Explore the tiering mechanism in the [dbengine's reference](/src/database/engine/README.md#tiering). | -| dbengine tier **`N`** update every iterations | `60` | The down sampling value of each tier from the previous one. For each Tier, the greater by one Tier has N (equal to 60 by default) less data points of any metric it collects. This setting can take values from `2` up to `255`. <br /> `N belongs to [1..4]` | -| dbengine tier **`N`** back fill | `New` | Specifies the strategy of recreating missing data on each Tier from the exact lower Tier. <br /> `New`: Sees the latest point on each Tier and save new points to it only if the exact lower Tier has available points for it's observation window (`dbengine tier N update every iterations` window). <br /> `none`: No back filling is applied. <br /> `N belongs to [1..4]` | -| memory deduplication (ksm) | `yes` | When set to `yes`, Netdata will offer its in-memory round robin database and the dbengine page cache to kernel same page merging (KSM) for deduplication. 
For more information check [Memory Deduplication - Kernel Same Page Merging - KSM](/src/database/README.md#ksm) | -| cleanup obsolete charts after secs | `3600` | See [monitoring ephemeral containers](/src/collectors/cgroups.plugin/README.md#monitoring-ephemeral-containers), also sets the timeout for cleaning up obsolete dimensions | -| gap when lost iterations above | `1` | | -| cleanup orphan hosts after secs | `3600` | How long to wait until automatically removing from the DB a remote Netdata host (child) that is no longer sending data. | -| enable zero metrics | `no` | Set to `yes` to show charts when all their metrics are zero. | +| setting | default | info | +|:---------------------------------------------:|:-------------------------------:|:---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| +| mode | `dbengine` | `dbengine`: The default for long-term metrics storage with efficient RAM and disk usage. Can be extended with `dbengine page cache size` and `dbengine tier X retention size`. <br />`ram`: The round-robin database will be temporary and it will be lost when Netdata exits. <br />`alloc`: Similar to `ram`, but can significantly reduce memory usage, when combined with a low retention and does not support KSM. <br />`none`: Disables the database at this host, and disables health monitoring entirely, as that requires a database of metrics. Not to be used together with streaming. 
| +| retention | `3600` | Used with `mode = ram/alloc`, not the default `mode = dbengine`. This number reflects the number of entries the `netdata` daemon will by default keep in memory for each chart dimension. Check [Memory Requirements](/src/database/README.md) for more information. | +| storage tiers | `3` | The number of storage tiers you want to have in your dbengine. Check the tiering mechanism in the [dbengine's reference](/src/database/engine/README.md#tiering). You can have up to 5 tiers of data (including the _Tier 0_). This number ranges between 1 and 5. | +| dbengine page cache size | `32MiB` | Determines the amount of RAM in MiB that is dedicated to caching for _Tier 0_ Netdata metric values. | +| dbengine tier **`N`** retention size | `1GiB` | The disk space dedicated to metrics storage, per tier. Can be used in single-node environments as well. <br /> `N belongs to [1..4]` | +| dbengine tier **`N`** retention time | `14d`, `3mo`, `1y`, `1y`, `1y` | The database retention, expressed in time. Can be used in single-node environments as well. <br /> `N belongs to [1..4]` | +| update every | `1` | The frequency in seconds, for data collection. For more information see the [performance guide](/docs/netdata-agent/configuration/optimize-the-netdata-agents-performance.md). These metrics stored as _Tier 0_ data. Explore the tiering mechanism in the [dbengine's reference](/src/database/engine/README.md#tiering). | +| dbengine tier **`N`** update every iterations | `60` | The down sampling value of each tier from the previous one. For each Tier, the greater by one Tier has N (equal to 60 by default) less data points of any metric it collects. This setting can take values from `2` up to `255`. 
<br /> `N belongs to [1..4]` | +| dbengine tier back fill | `new` | Specifies the strategy of recreating missing data on higher database Tiers.<br /> `new`: Sees the latest point on each Tier and save new points to it only if the exact lower Tier has available points for it's observation window (`dbengine tier N update every iterations` window). <br /> `none`: No back filling is applied. <br /> `N belongs to [1..4]` | +| memory deduplication (ksm) | `yes` | When set to `yes`, Netdata will offer its in-memory round robin database and the dbengine page cache to kernel same page merging (KSM) for deduplication. For more information check [Memory Deduplication - Kernel Same Page Merging - KSM](/src/database/README.md#ksm) | +| cleanup obsolete charts after | `1h` | See [monitoring ephemeral containers](/src/collectors/cgroups.plugin/README.md#monitoring-ephemeral-containers), also sets the timeout for cleaning up obsolete dimensions | +| gap when lost iterations above | `1` | | +| cleanup orphan hosts after | `1h` | How long to wait until automatically removing from the DB a remote Netdata host (child) that is no longer sending data. | +| enable zero metrics | `no` | Set to `yes` to show charts when all their metrics are zero. | > ### Info > @@ -140,7 +128,7 @@ There are additional configuration options for the logs. For more info, see [Net | health | `journal` | The filename to save the log of Netdata health collectors. You can also set it to `syslog` to send the access log to syslog, or `off` to disable this log. Defaults to `Journal` if using systemd. | | daemon | `journal` | The filename to save the log of Netdata daemon. You can also set it to `syslog` to send the access log to syslog, or `off` to disable this log. Defaults to `Journal` if using systemd. | | facility | `daemon` | A facility keyword is used to specify the type of system that is logging the message. 
| -| logs flood protection period | `60` | Length of period (in sec) during which the number of errors should not exceed the `errors to trigger flood protection`. | +| logs flood protection period | `1m` | Length of period during which the number of errors should not exceed the `errors to trigger flood protection`. | | logs to trigger flood protection | `1000` | Number of errors written to the log in `errors flood protection period` sec before flood protection is activated. | | level | `info` | Controls which log messages are logged, with error being the most important. Supported values: `info` and `error`. | @@ -172,15 +160,15 @@ monitoring](/src/health/README.md). [Alert notifications](/src/health/notifications/README.md) are configured in `health_alarm_notify.conf`. -| setting | default | info | -|:----------------------------------------------:|:------------------------------------------------:|:------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| -| enabled | `yes` | Set to `no` to disable all alerts and notifications | -| in memory max health log entries | 1000 | Size of the alert history held in RAM | -| script to execute on alarm | `/usr/libexec/netdata/plugins.d/alarm-notify.sh` | The script that sends alert notifications. Note that in versions before 1.16, the plugins.d directory may be installed in a different location in certain OSs (e.g. under `/usr/lib/netdata`). | -| run at least every seconds | `10` | Controls how often all alert conditions should be evaluated. | -| postpone alarms during hibernation for seconds | `60` | Prevents false alerts. May need to be increased if you get alerts during hibernation. 
| -| health log history | `432000` | Specifies the history of alert events (in seconds) kept in the agent's sqlite database. | -| enabled alarms | * | Defines which alerts to load from both user and stock directories. This is a [simple pattern](/src/libnetdata/simple_pattern/README.md) list of alert or template names. Can be used to disable specific alerts. For example, `enabled alarms = !oom_kill *` will load all alerts except `oom_kill`. | +| setting | default | info | +|:--------------------------------------:|:------------------------------------------------:|:------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| +| enabled | `yes` | Set to `no` to disable all alerts and notifications | +| in memory max health log entries | 1000 | Size of the alert history held in RAM | +| script to execute on alarm | `/usr/libexec/netdata/plugins.d/alarm-notify.sh` | The script that sends alert notifications. Note that in versions before 1.16, the plugins.d directory may be installed in a different location in certain OSs (e.g. under `/usr/lib/netdata`). | +| run at least every | `10s` | Controls how often all alert conditions should be evaluated. | +| postpone alarms during hibernation for | `1m` | Prevents false alerts. May need to be increased if you get alerts during hibernation. | +| health log retention | `5d` | Specifies the history of alert events (in seconds) kept in the agent's sqlite database. | +| enabled alarms | * | Defines which alerts to load from both user and stock directories. This is a [simple pattern](/src/libnetdata/simple_pattern/README.md) list of alert or template names. Can be used to disable specific alerts. 
For example, `enabled alarms = !oom_kill *` will load all alerts except `oom_kill`. | ### [web] section options diff --git a/src/daemon/config/dyncfg-echo.c b/src/daemon/config/dyncfg-echo.c index 95d40a025..f6eb48c35 100644 --- a/src/daemon/config/dyncfg-echo.c +++ b/src/daemon/config/dyncfg-echo.c @@ -96,7 +96,7 @@ void dyncfg_echo(const DICTIONARY_ITEM *item, DYNCFG *df, const char *id __maybe dyncfg_echo_cb, e, NULL, NULL, NULL, NULL, - NULL, string2str(df->dyncfg.source)); + NULL, string2str(df->dyncfg.source), false); } // ---------------------------------------------------------------------------- @@ -129,7 +129,7 @@ void dyncfg_echo_update(const DICTIONARY_ITEM *item, DYNCFG *df, const char *id) dyncfg_echo_cb, e, NULL, NULL, NULL, NULL, - df->dyncfg.payload, string2str(df->dyncfg.source)); + df->dyncfg.payload, string2str(df->dyncfg.source), false); } // ---------------------------------------------------------------------------- @@ -164,7 +164,7 @@ static void dyncfg_echo_payload_add(const DICTIONARY_ITEM *item_template __maybe dyncfg_echo_cb, e, NULL, NULL, NULL, NULL, - df_job->dyncfg.payload, string2str(df_job->dyncfg.source)); + df_job->dyncfg.payload, string2str(df_job->dyncfg.source), false); } void dyncfg_echo_add(const DICTIONARY_ITEM *item_template, const DICTIONARY_ITEM *item_job, DYNCFG *df_template, DYNCFG *df_job, const char *template_id, const char *job_name) { diff --git a/src/daemon/config/dyncfg-intercept.c b/src/daemon/config/dyncfg-intercept.c index 65f8383ed..b302d72aa 100644 --- a/src/daemon/config/dyncfg-intercept.c +++ b/src/daemon/config/dyncfg-intercept.c @@ -216,7 +216,7 @@ int dyncfg_function_intercept_cb(struct rrd_function_execute *rfe, void *data __ memcpy(buf, rfe->function, sizeof(buf)); char *words[20]; - size_t num_words = quoted_strings_splitter_pluginsd(buf, words, 20); + size_t num_words = quoted_strings_splitter_whitespace(buf, words, 20); size_t i = 0; char *config = get_word(words, num_words, i++); diff --git 
a/src/daemon/config/dyncfg-tree.c b/src/daemon/config/dyncfg-tree.c index 77d031fa0..4bad2f30f 100644 --- a/src/daemon/config/dyncfg-tree.c +++ b/src/daemon/config/dyncfg-tree.c @@ -71,12 +71,10 @@ static void dyncfg_tree_for_host(RRDHOST *host, BUFFER *wb, const char *path, co if(id && *id) template = string_strdupz(id); - ND_UUID host_uuid = uuid2UUID(host->host_uuid); - size_t path_len = strlen(path); DYNCFG *df; dfe_start_read(dyncfg_globals.nodes, df) { - if(!UUIDeq(df->host_uuid, host_uuid)) + if(!UUIDeq(df->host_uuid, host->host_id)) continue; if(strncmp(string2str(df->path), path, path_len) != 0) @@ -162,7 +160,7 @@ static int dyncfg_config_execute_cb(struct rrd_function_execute *rfe, void *data memcpy(buf, rfe->function, sizeof(buf)); char *words[MAX_FUNCTION_PARAMETERS]; // an array of pointers for the words in this line - size_t num_words = quoted_strings_splitter_pluginsd(buf, words, MAX_FUNCTION_PARAMETERS); + size_t num_words = quoted_strings_splitter_whitespace(buf, words, MAX_FUNCTION_PARAMETERS); const char *config = get_word(words, num_words, 0); const char *action = get_word(words, num_words, 1); @@ -266,7 +264,7 @@ static int dyncfg_config_execute_cb(struct rrd_function_execute *rfe, void *data rrd_call_function_error( rfe->result.wb, - "unknown config id given", code); + "Unknown config id given.", code); } cleanup: @@ -286,7 +284,7 @@ void dyncfg_host_init(RRDHOST *host) { // This function needs to be async, although it is internal. // The reason is that it can call by itself another function that may or may not be internal (sync). 
- rrd_function_add(host, NULL, PLUGINSD_FUNCTION_CONFIG, 120, - 1000, "Dynamic configuration", "config", HTTP_ACCESS_ANONYMOUS_DATA, + rrd_function_add(host, NULL, PLUGINSD_FUNCTION_CONFIG, 120, 1000, DYNCFG_FUNCTIONS_VERSION, + "Dynamic configuration", "config", HTTP_ACCESS_ANONYMOUS_DATA, false, dyncfg_config_execute_cb, host); } diff --git a/src/daemon/config/dyncfg-unittest.c b/src/daemon/config/dyncfg-unittest.c index 775dc7cbd..763451501 100644 --- a/src/daemon/config/dyncfg-unittest.c +++ b/src/daemon/config/dyncfg-unittest.c @@ -195,7 +195,7 @@ static int dyncfg_unittest_execute_cb(struct rrd_function_execute *rfe, void *da memcpy(buf, rfe->function, sizeof(buf)); char *words[MAX_FUNCTION_PARAMETERS]; // an array of pointers for the words in this line - size_t num_words = quoted_strings_splitter_pluginsd(buf, words, MAX_FUNCTION_PARAMETERS); + size_t num_words = quoted_strings_splitter_whitespace(buf, words, MAX_FUNCTION_PARAMETERS); const char *config = get_word(words, num_words, 0); const char *id = get_word(words, num_words, 1); @@ -426,7 +426,7 @@ static int dyncfg_unittest_run(const char *cmd, BUFFER *wb, const char *payload, memcpy(buf, cmd, sizeof(buf)); char *words[MAX_FUNCTION_PARAMETERS]; // an array of pointers for the words in this line - size_t num_words = quoted_strings_splitter_pluginsd(buf, words, MAX_FUNCTION_PARAMETERS); + size_t num_words = quoted_strings_splitter_whitespace(buf, words, MAX_FUNCTION_PARAMETERS); // const char *config = get_word(words, num_words, 0); const char *id = get_word(words, num_words, 1); @@ -473,7 +473,7 @@ static int dyncfg_unittest_run(const char *cmd, BUFFER *wb, const char *payload, NULL, NULL, NULL, NULL, NULL, NULL, - pld, source); + pld, source, false); if(!DYNCFG_RESP_SUCCESS(rc)) { nd_log(NDLS_DAEMON, NDLP_ERR, "DYNCFG UNITTEST: failed to run: %s; returned code %d", cmd, rc); dyncfg_unittest_register_error(NULL, NULL); diff --git a/src/daemon/config/dyncfg.c b/src/daemon/config/dyncfg.c index 
2f484d1ed..e6c1768cc 100644 --- a/src/daemon/config/dyncfg.c +++ b/src/daemon/config/dyncfg.c @@ -192,7 +192,7 @@ const DICTIONARY_ITEM *dyncfg_add_internal(RRDHOST *host, const char *id, const rrd_function_execute_cb_t execute_cb, void *execute_cb_data, bool overwrite_cb) { DYNCFG tmp = { - .host_uuid = uuid2UUID(host->host_uuid), + .host_uuid = host->host_id, .path = string_strdupz(path), .cmds = cmds, .type = type, @@ -358,6 +358,7 @@ bool dyncfg_add_low_level(RRDHOST *host, const char *id, const char *path, string2str(df->function), 120, 1000, + DYNCFG_FUNCTIONS_VERSION, "Dynamic configuration", "config", (view_access & edit_access), diff --git a/src/daemon/config/dyncfg.h b/src/daemon/config/dyncfg.h index 539eddbfb..84fab07d2 100644 --- a/src/daemon/config/dyncfg.h +++ b/src/daemon/config/dyncfg.h @@ -7,6 +7,8 @@ #include "database/rrd.h" #include "database/rrdfunctions.h" +#define DYNCFG_FUNCTIONS_VERSION 0 + void dyncfg_add_streaming(BUFFER *wb); bool dyncfg_available_for_rrdhost(RRDHOST *host); void dyncfg_host_init(RRDHOST *host); diff --git a/src/daemon/daemon.c b/src/daemon/daemon.c index 2392d4cc1..d3ddf027d 100644 --- a/src/daemon/daemon.c +++ b/src/daemon/daemon.c @@ -3,34 +3,24 @@ #include "common.h" #include <sched.h> -char pidfile[FILENAME_MAX + 1] = ""; -char claiming_directory[FILENAME_MAX + 1]; -char netdata_exe_path[FILENAME_MAX + 1]; -char netdata_exe_file[FILENAME_MAX + 1]; +char *pidfile = NULL; +char *netdata_exe_path = NULL; void get_netdata_execution_path(void) { - int ret; - size_t exepath_size = 0; - struct passwd *passwd = NULL; - char *user = NULL; - - passwd = getpwuid(getuid()); - user = (passwd && passwd->pw_name) ? 
passwd->pw_name : ""; - - exepath_size = sizeof(netdata_exe_file) - 1; - ret = uv_exepath(netdata_exe_file, &exepath_size); - if (0 != ret) { - netdata_log_error("uv_exepath(\"%s\", %u) (user: %s) failed (%s).", netdata_exe_file, (unsigned)exepath_size, user, - uv_strerror(ret)); - fatal("Cannot start netdata without getting execution path."); + struct passwd *passwd = getpwuid(getuid()); + char *user = (passwd && passwd->pw_name) ? passwd->pw_name : ""; + + char b[FILENAME_MAX + 1]; + size_t b_size = sizeof(b) - 1; + int ret = uv_exepath(b, &b_size); + if (ret != 0) { + fatal("Cannot start netdata without getting execution path. " + "(uv_exepath(\"%s\", %zu), user: '%s', failed: %s).", + b, b_size, user, uv_strerror(ret)); } + b[b_size] = '\0'; - netdata_exe_file[exepath_size] = '\0'; - - // macOS's dirname(3) does not modify passed string - char *tmpdir = strdupz(netdata_exe_file); - strcpy(netdata_exe_path, dirname(tmpdir)); - freez(tmpdir); + netdata_exe_path = strdupz(b); } static void fix_directory_file_permissions(const char *dirname, uid_t uid, gid_t gid, bool recursive) @@ -68,7 +58,7 @@ static void change_dir_ownership(const char *dir, uid_t uid, gid_t gid, bool rec fix_directory_file_permissions(dir, uid, gid, recursive); } -static void clean_directory(char *dirname) +static void clean_directory(const char *dirname) { DIR *dir = opendir(dirname); if(!dir) return; @@ -89,7 +79,7 @@ static void prepare_required_directories(uid_t uid, gid_t gid) { change_dir_ownership(netdata_configured_varlib_dir, uid, gid, false); change_dir_ownership(netdata_configured_lock_dir, uid, gid, false); change_dir_ownership(netdata_configured_log_dir, uid, gid, false); - change_dir_ownership(claiming_directory, uid, gid, false); + change_dir_ownership(netdata_configured_cloud_dir, uid, gid, false); char filename[FILENAME_MAX + 1]; snprintfz(filename, FILENAME_MAX, "%s/registry", netdata_configured_varlib_dir); @@ -112,7 +102,7 @@ static int become_user(const char *username, int 
pid_fd) { prepare_required_directories(uid, gid); - if(pidfile[0]) { + if(pidfile && *pidfile) { if(chown(pidfile, uid, gid) == -1) netdata_log_error("Cannot chown '%s' to %u:%u", pidfile, (unsigned int)uid, (unsigned int)gid); } @@ -198,7 +188,7 @@ static void oom_score_adj(void) { } // check the environment - char *s = getenv("OOMScoreAdjust"); + const char *s = getenv("OOMScoreAdjust"); if(!s || !*s) { snprintfz(buf, sizeof(buf) - 1, "%d", (int)wanted_score); s = buf; @@ -442,9 +432,8 @@ int become_daemon(int dont_fork, const char *user) perror("cannot fork"); exit(1); } - if(i != 0) { - exit(0); // the parent - } + if(i != 0) exit(0); // the parent + gettid_uncached(); // become session leader if (setsid() < 0) { @@ -458,14 +447,13 @@ int become_daemon(int dont_fork, const char *user) perror("cannot fork"); exit(1); } - if(i != 0) { - exit(0); // the parent - } + if(i != 0) exit(0); // the parent + gettid_uncached(); } // generate our pid file int pidfd = -1; - if(pidfile[0]) { + if(pidfile && *pidfile) { pidfd = open(pidfile, O_WRONLY | O_CREAT | O_CLOEXEC, 0644); if(pidfd >= 0) { if(ftruncate(pidfd, 0) != 0) @@ -490,9 +478,6 @@ int become_daemon(int dont_fork, const char *user) // never become a problem sched_setscheduler_set(); - // Set claiming directory based on user config directory with correct ownership - snprintfz(claiming_directory, FILENAME_MAX, "%s/cloud.d", netdata_configured_varlib_dir); - if(user && *user) { if(become_user(user, pidfd) != 0) { netdata_log_error("Cannot become user '%s'. 
Continuing as we are.", user); diff --git a/src/daemon/daemon.h b/src/daemon/daemon.h index 1f8837fd6..13ef1f647 100644 --- a/src/daemon/daemon.h +++ b/src/daemon/daemon.h @@ -9,8 +9,7 @@ void netdata_cleanup_and_exit(int ret, const char *action, const char *action_re void get_netdata_execution_path(void); -extern char pidfile[]; -extern char netdata_exe_file[]; -extern char netdata_exe_path[]; +extern char *pidfile; +extern char *netdata_exe_path; #endif /* NETDATA_DAEMON_H */ diff --git a/src/daemon/environment.c b/src/daemon/environment.c new file mode 100644 index 000000000..2822278d3 --- /dev/null +++ b/src/daemon/environment.c @@ -0,0 +1,99 @@ +// SPDX-License-Identifier: GPL-3.0-or-later + +#include "common.h" + +static const char *verify_required_directory(const char *dir) +{ + if (chdir(dir) == -1) + fatal("Cannot change directory to '%s'", dir); + + DIR *d = opendir(dir); + if (!d) + fatal("Cannot examine the contents of directory '%s'", dir); + closedir(d); + + return dir; +} + +static const char *verify_or_create_required_directory(const char *dir) { + errno_clear(); + + if (mkdir(dir, 0755) != 0 && errno != EEXIST) + fatal("Cannot create required directory '%s'", dir); + + return verify_required_directory(dir); +} + +static const char *verify_or_create_required_private_directory(const char *dir) { + errno_clear(); + + if (mkdir(dir, 0770) != 0 && errno != EEXIST) + fatal("Cannot create required directory '%s'", dir); + + return verify_required_directory(dir); +} + +void set_environment_for_plugins_and_scripts(void) { + { + char b[16]; + snprintfz(b, sizeof(b) - 1, "%d", default_rrd_update_every); + nd_setenv("NETDATA_UPDATE_EVERY", b, 1); + } + + nd_setenv("NETDATA_VERSION", NETDATA_VERSION, 1); + nd_setenv("NETDATA_HOSTNAME", netdata_configured_hostname, 1); + nd_setenv("NETDATA_CONFIG_DIR", verify_required_directory(netdata_configured_user_config_dir), 1); + nd_setenv("NETDATA_USER_CONFIG_DIR", 
verify_required_directory(netdata_configured_user_config_dir), 1); + nd_setenv("NETDATA_STOCK_CONFIG_DIR", verify_required_directory(netdata_configured_stock_config_dir), 1); + nd_setenv("NETDATA_PLUGINS_DIR", verify_required_directory(netdata_configured_primary_plugins_dir), 1); + nd_setenv("NETDATA_WEB_DIR", verify_required_directory(netdata_configured_web_dir), 1); + nd_setenv("NETDATA_CACHE_DIR", verify_or_create_required_directory(netdata_configured_cache_dir), 1); + nd_setenv("NETDATA_LIB_DIR", verify_or_create_required_directory(netdata_configured_varlib_dir), 1); + nd_setenv("NETDATA_LOCK_DIR", verify_or_create_required_directory(netdata_configured_lock_dir), 1); + nd_setenv("NETDATA_LOG_DIR", verify_or_create_required_directory(netdata_configured_log_dir), 1); + nd_setenv("NETDATA_HOST_PREFIX", netdata_configured_host_prefix, 1); + + nd_setenv("CLAIMING_DIR", verify_or_create_required_private_directory(netdata_configured_cloud_dir), 1); + + { + BUFFER *user_plugins_dirs = buffer_create(FILENAME_MAX, NULL); + + for (size_t i = 1; i < PLUGINSD_MAX_DIRECTORIES && plugin_directories[i]; i++) { + if (i > 1) + buffer_strcat(user_plugins_dirs, " "); + buffer_strcat(user_plugins_dirs, plugin_directories[i]); + } + + nd_setenv("NETDATA_USER_PLUGINS_DIRS", buffer_tostring(user_plugins_dirs), 1); + + buffer_free(user_plugins_dirs); + } + + const char *default_port = appconfig_get(&netdata_config, CONFIG_SECTION_WEB, "default port", NULL); + int clean = 0; + if (!default_port) { + default_port = strdupz("19999"); + clean = 1; + } + + nd_setenv("NETDATA_LISTEN_PORT", default_port, 1); + if (clean) + freez((char *)default_port); + + // set the path we need + char path[4096], *p = getenv("PATH"); + if (!p) p = "/bin:/usr/bin"; + snprintfz(path, sizeof(path), "%s:%s", p, "/sbin:/usr/sbin:/usr/local/bin:/usr/local/sbin"); + setenv("PATH", config_get(CONFIG_SECTION_ENV_VARS, "PATH", path), 1); + + // python options + p = getenv("PYTHONPATH"); + if (!p) p = ""; + 
setenv("PYTHONPATH", config_get(CONFIG_SECTION_ENV_VARS, "PYTHONPATH", p), 1); + + // disable buffering for python plugins + setenv("PYTHONUNBUFFERED", "1", 1); + + // switch to standard locale for plugins + setenv("LC_ALL", "C", 1); +} diff --git a/src/daemon/global_statistics.c b/src/daemon/global_statistics.c index 17fd53761..236298a59 100644 --- a/src/daemon/global_statistics.c +++ b/src/daemon/global_statistics.c @@ -3502,8 +3502,7 @@ static struct worker_utilization all_workers_utilization[] = { { .name = "DBENGINE", .family = "workers dbengine instances", .priority = 1000000 }, { .name = "LIBUV", .family = "workers libuv threadpool", .priority = 1000000 }, { .name = "WEB", .family = "workers web server", .priority = 1000000 }, - { .name = "ACLKQUERY", .family = "workers aclk query", .priority = 1000000 }, - { .name = "ACLKSYNC", .family = "workers aclk host sync", .priority = 1000000 }, + { .name = "ACLKSYNC", .family = "workers aclk sync", .priority = 1000000 }, { .name = "METASYNC", .family = "workers metadata sync", .priority = 1000000 }, { .name = "PLUGINSD", .family = "workers plugins.d", .priority = 1000000 }, { .name = "STATSD", .family = "workers plugin statsd", .priority = 1000000 }, @@ -4222,13 +4221,15 @@ void *global_statistics_main(void *ptr) global_statistics_register_workers(); int update_every = - (int)config_get_number(CONFIG_SECTION_GLOBAL_STATISTICS, "update every", localhost->rrd_update_every); - if (update_every < localhost->rrd_update_every) + (int)config_get_duration_seconds(CONFIG_SECTION_GLOBAL_STATISTICS, "update every", localhost->rrd_update_every); + if (update_every < localhost->rrd_update_every) { update_every = localhost->rrd_update_every; + config_set_duration_seconds(CONFIG_SECTION_GLOBAL_STATISTICS, "update every", update_every); + } usec_t step = update_every * USEC_PER_SEC; heartbeat_t hb; - heartbeat_init(&hb); + heartbeat_init(&hb, USEC_PER_SEC); usec_t real_step = USEC_PER_SEC; // keep the randomness at zero @@ -4237,7 
+4238,7 @@ void *global_statistics_main(void *ptr) while (service_running(SERVICE_COLLECTORS)) { worker_is_idle(); - heartbeat_next(&hb, USEC_PER_SEC); + heartbeat_next(&hb); if (real_step < step) { real_step += USEC_PER_SEC; continue; @@ -4278,18 +4279,20 @@ void *global_statistics_extended_main(void *ptr) global_statistics_register_workers(); int update_every = - (int)config_get_number(CONFIG_SECTION_GLOBAL_STATISTICS, "update every", localhost->rrd_update_every); - if (update_every < localhost->rrd_update_every) + (int)config_get_duration_seconds(CONFIG_SECTION_GLOBAL_STATISTICS, "update every", localhost->rrd_update_every); + if (update_every < localhost->rrd_update_every) { update_every = localhost->rrd_update_every; + config_set_duration_seconds(CONFIG_SECTION_GLOBAL_STATISTICS, "update every", update_every); + } usec_t step = update_every * USEC_PER_SEC; heartbeat_t hb; - heartbeat_init(&hb); + heartbeat_init(&hb, USEC_PER_SEC); usec_t real_step = USEC_PER_SEC; while (service_running(SERVICE_COLLECTORS)) { worker_is_idle(); - heartbeat_next(&hb, USEC_PER_SEC); + heartbeat_next(&hb); if (real_step < step) { real_step += USEC_PER_SEC; continue; diff --git a/src/daemon/h2o-common.c b/src/daemon/h2o-common.c new file mode 100644 index 000000000..aa7a3c581 --- /dev/null +++ b/src/daemon/h2o-common.c @@ -0,0 +1,60 @@ +// SPDX-License-Identifier: GPL-3.0-or-later + +#include "common.h" + +const char *netdata_configured_hostname = NULL; +const char *netdata_configured_user_config_dir = CONFIG_DIR; +const char *netdata_configured_stock_config_dir = LIBCONFIG_DIR; +const char *netdata_configured_log_dir = LOG_DIR; +const char *netdata_configured_primary_plugins_dir = PLUGINS_DIR; +const char *netdata_configured_web_dir = WEB_DIR; +const char *netdata_configured_cache_dir = CACHE_DIR; +const char *netdata_configured_varlib_dir = VARLIB_DIR; +const char *netdata_configured_lock_dir = VARLIB_DIR "/lock"; +const char *netdata_configured_cloud_dir = VARLIB_DIR "/cloud.d"; 
+const char *netdata_configured_home_dir = VARLIB_DIR; +const char *netdata_configured_host_prefix = NULL; +const char *netdata_configured_timezone = NULL; +const char *netdata_configured_abbrev_timezone = NULL; +int32_t netdata_configured_utc_offset = 0; + +bool netdata_ready = false; + +long get_netdata_cpus(void) { + static long processors = 0; + + if(processors) + return processors; + + long cores_proc_stat = os_get_system_cpus_cached(false, true); + long cores_cpuset_v1 = (long)os_read_cpuset_cpus("/sys/fs/cgroup/cpuset/cpuset.cpus", cores_proc_stat); + long cores_cpuset_v2 = (long)os_read_cpuset_cpus("/sys/fs/cgroup/cpuset.cpus", cores_proc_stat); + + if(cores_cpuset_v2) + processors = cores_cpuset_v2; + else if(cores_cpuset_v1) + processors = cores_cpuset_v1; + else + processors = cores_proc_stat; + + long cores_user_configured = config_get_number(CONFIG_SECTION_GLOBAL, "cpu cores", processors); + + errno_clear(); + internal_error(true, + "System CPUs: %ld, (" + "system: %ld, cgroups cpuset v1: %ld, cgroups cpuset v2: %ld, netdata.conf: %ld" + ")" + , processors + , cores_proc_stat + , cores_cpuset_v1 + , cores_cpuset_v2 + , cores_user_configured + ); + + processors = cores_user_configured; + + if(processors < 1) + processors = 1; + + return processors; +} diff --git a/src/daemon/event_loop.c b/src/daemon/libuv_workers.c index d1908ec15..441002d06 100644 --- a/src/daemon/event_loop.c +++ b/src/daemon/libuv_workers.c @@ -1,7 +1,7 @@ // SPDX-License-Identifier: GPL-3.0-or-later #include <daemon/main.h> -#include "event_loop.h" +#include "libuv_workers.h" // Register workers void register_libuv_worker_jobs() { diff --git a/src/daemon/event_loop.h b/src/daemon/libuv_workers.h index c1821c646..c1821c646 100644 --- a/src/daemon/event_loop.h +++ b/src/daemon/libuv_workers.h diff --git a/src/daemon/main.c b/src/daemon/main.c index 17fef8449..03ae7e003 100644 --- a/src/daemon/main.c +++ b/src/daemon/main.c @@ -6,6 +6,7 @@ #include "static_threads.h" #include 
"database/engine/page_test.h" +#include <curl/curl.h> #ifdef OS_WINDOWS #include "win_system-info.h" @@ -27,18 +28,7 @@ bool ieee754_doubles = false; time_t netdata_start_time = 0; struct netdata_static_thread *static_threads; -struct config netdata_config = { - .first_section = NULL, - .last_section = NULL, - .mutex = NETDATA_MUTEX_INITIALIZER, - .index = { - .avl_tree = { - .root = NULL, - .compar = appconfig_section_compare - }, - .rwlock = AVL_LOCK_INITIALIZER - } -}; +struct config netdata_config = APPCONFIG_INITIALIZER; typedef struct service_thread { pid_t tid; @@ -326,6 +316,7 @@ void web_client_cache_destroy(void); void netdata_cleanup_and_exit(int ret, const char *action, const char *action_result, const char *action_data) { netdata_exit = 1; + usec_t shutdown_start_time = now_monotonic_usec(); watcher_shutdown_begin(); nd_log_limits_unlimited(); @@ -361,7 +352,7 @@ void netdata_cleanup_and_exit(int ret, const char *action, const char *action_re watcher_step_complete(WATCHER_STEP_ID_CLOSE_WEBRTC_CONNECTIONS); service_signal_exit(SERVICE_MAINTENANCE | ABILITY_DATA_QUERIES | ABILITY_WEB_REQUESTS | - ABILITY_STREAMING_CONNECTIONS | SERVICE_ACLK | SERVICE_ACLKSYNC); + ABILITY_STREAMING_CONNECTIONS | SERVICE_ACLK); watcher_step_complete(WATCHER_STEP_ID_DISABLE_MAINTENANCE_NEW_QUERIES_NEW_WEB_REQUESTS_NEW_STREAMING_CONNECTIONS_AND_ACLK); service_wait_exit(SERVICE_MAINTENANCE, 3 * USEC_PER_SEC); @@ -474,21 +465,22 @@ void netdata_cleanup_and_exit(int ret, const char *action, const char *action_re #endif } + // Don't register a shutdown event if we crashed + if (!ret) + add_agent_event(EVENT_AGENT_SHUTDOWN_TIME, (int64_t)(now_monotonic_usec() - shutdown_start_time)); sqlite_close_databases(); watcher_step_complete(WATCHER_STEP_ID_CLOSE_SQL_DATABASES); sqlite_library_shutdown(); // unlink the pid - if(pidfile[0]) { + if(pidfile && *pidfile) { if(unlink(pidfile) != 0) netdata_log_error("EXIT: cannot unlink pidfile '%s'.", pidfile); } 
watcher_step_complete(WATCHER_STEP_ID_REMOVE_PID_FILE); -#ifdef ENABLE_HTTPS netdata_ssl_cleanup(); -#endif watcher_step_complete(WATCHER_STEP_ID_FREE_OPENSSL_STRUCTURES); (void) unlink(agent_incomplete_shutdown_file); @@ -496,6 +488,7 @@ void netdata_cleanup_and_exit(int ret, const char *action, const char *action_re watcher_shutdown_end(); watcher_thread_stop(); + curl_global_cleanup(); #ifdef OS_WINDOWS return; @@ -527,12 +520,12 @@ void web_server_threading_selection(void) { int make_dns_decision(const char *section_name, const char *config_name, const char *default_value, SIMPLE_PATTERN *p) { - char *value = config_get(section_name,config_name,default_value); + const char *value = config_get(section_name,config_name,default_value); if(!strcmp("yes",value)) return 1; if(!strcmp("no",value)) return 0; - if(strcmp("heuristic",value)) + if(strcmp("heuristic",value) != 0) netdata_log_error("Invalid configuration option '%s' for '%s'/'%s'. Valid options are 'yes', 'no' and 'heuristic'. Proceeding with 'heuristic'", value, section_name, config_name); @@ -542,11 +535,13 @@ int make_dns_decision(const char *section_name, const char *config_name, const c void web_server_config_options(void) { web_client_timeout = - (int)config_get_number(CONFIG_SECTION_WEB, "disconnect idle clients after seconds", web_client_timeout); + (int)config_get_duration_seconds(CONFIG_SECTION_WEB, "disconnect idle clients after", web_client_timeout); + web_client_first_request_timeout = - (int)config_get_number(CONFIG_SECTION_WEB, "timeout for first request", web_client_first_request_timeout); + (int)config_get_duration_seconds(CONFIG_SECTION_WEB, "timeout for first request", web_client_first_request_timeout); + web_client_streaming_rate_t = - config_get_number(CONFIG_SECTION_WEB, "accept a streaming request every seconds", web_client_streaming_rate_t); + config_get_duration_seconds(CONFIG_SECTION_WEB, "accept a streaming request every", web_client_streaming_rate_t); 
respect_web_browser_do_not_track_policy = config_get_boolean(CONFIG_SECTION_WEB, "respect do not track policy", respect_web_browser_do_not_track_policy); @@ -595,7 +590,7 @@ void web_server_config_options(void) web_enable_gzip = config_get_boolean(CONFIG_SECTION_WEB, "enable gzip compression", web_enable_gzip); - char *s = config_get(CONFIG_SECTION_WEB, "gzip compression strategy", "default"); + const char *s = config_get(CONFIG_SECTION_WEB, "gzip compression strategy", "default"); if(!strcmp(s, "default")) web_gzip_strategy = Z_DEFAULT_STRATEGY; else if(!strcmp(s, "filtered")) @@ -807,8 +802,6 @@ int help(int exitcode) { " are enabled or not, in JSON format.\n\n" " -W simple-pattern pattern string\n" " Check if string matches pattern and exit.\n\n" - " -W \"claim -token=TOKEN -rooms=ROOM1,ROOM2\"\n" - " Claim the agent to the workspace rooms pointed to by TOKEN and ROOM*.\n\n" #ifdef OS_WINDOWS " -W perflibdump [key]\n" " Dump the Windows Performance Counters Registry in JSON.\n\n" @@ -825,7 +818,6 @@ int help(int exitcode) { return exitcode; } -#ifdef ENABLE_HTTPS static void security_init(){ char filename[FILENAME_MAX + 1]; snprintfz(filename, FILENAME_MAX, "%s/ssl/key.pem",netdata_configured_user_config_dir); @@ -839,14 +831,13 @@ static void security_init(){ netdata_ssl_initialize_openssl(); } -#endif static void log_init(void) { nd_log_set_facility(config_get(CONFIG_SECTION_LOGS, "facility", "daemon")); time_t period = ND_LOG_DEFAULT_THROTTLE_PERIOD; size_t logs = ND_LOG_DEFAULT_THROTTLE_LOGS; - period = config_get_number(CONFIG_SECTION_LOGS, "logs flood protection period", period); + period = config_get_duration_seconds(CONFIG_SECTION_LOGS, "logs flood protection period", period); logs = (unsigned long)config_get_number(CONFIG_SECTION_LOGS, "logs to trigger flood protection", (long long int)logs); nd_log_set_flood_protection(logs, period); @@ -856,50 +847,75 @@ static void log_init(void) { nd_log_set_priority_level(config_get(CONFIG_SECTION_LOGS, "level", 
netdata_log_level)); char filename[FILENAME_MAX + 1]; + char* os_default_method = NULL; +#if defined(OS_LINUX) + os_default_method = is_stderr_connected_to_journal() /* || nd_log_journal_socket_available() */ ? "journal" : NULL; +#elif defined(OS_WINDOWS) +#if defined(HAVE_ETW) + os_default_method = "etw"; +#elif defined(HAVE_WEL) + os_default_method = "wel"; +#endif +#endif + +#if defined(OS_WINDOWS) + // on windows, debug log goes to windows events + snprintfz(filename, FILENAME_MAX, "%s", os_default_method); +#else snprintfz(filename, FILENAME_MAX, "%s/debug.log", netdata_configured_log_dir); +#endif + nd_log_set_user_settings(NDLS_DEBUG, config_get(CONFIG_SECTION_LOGS, "debug", filename)); - bool with_journal = is_stderr_connected_to_journal() /* || nd_log_journal_socket_available() */; - if(with_journal) - snprintfz(filename, FILENAME_MAX, "journal"); + if(os_default_method) + snprintfz(filename, FILENAME_MAX, "%s", os_default_method); else snprintfz(filename, FILENAME_MAX, "%s/daemon.log", netdata_configured_log_dir); nd_log_set_user_settings(NDLS_DAEMON, config_get(CONFIG_SECTION_LOGS, "daemon", filename)); - if(with_journal) - snprintfz(filename, FILENAME_MAX, "journal"); + if(os_default_method) + snprintfz(filename, FILENAME_MAX, "%s", os_default_method); else snprintfz(filename, FILENAME_MAX, "%s/collector.log", netdata_configured_log_dir); nd_log_set_user_settings(NDLS_COLLECTORS, config_get(CONFIG_SECTION_LOGS, "collector", filename)); +#if defined(OS_WINDOWS) + // on windows, access log goes to windows events + snprintfz(filename, FILENAME_MAX, "%s", os_default_method); +#else snprintfz(filename, FILENAME_MAX, "%s/access.log", netdata_configured_log_dir); +#endif nd_log_set_user_settings(NDLS_ACCESS, config_get(CONFIG_SECTION_LOGS, "access", filename)); - if(with_journal) - snprintfz(filename, FILENAME_MAX, "journal"); + if(os_default_method) + snprintfz(filename, FILENAME_MAX, "%s", os_default_method); else snprintfz(filename, FILENAME_MAX, 
"%s/health.log", netdata_configured_log_dir); nd_log_set_user_settings(NDLS_HEALTH, config_get(CONFIG_SECTION_LOGS, "health", filename)); -#ifdef ENABLE_ACLK aclklog_enabled = config_get_boolean(CONFIG_SECTION_CLOUD, "conversation log", CONFIG_BOOLEAN_NO); if (aclklog_enabled) { +#if defined(OS_WINDOWS) + // on windows, aclk log goes to windows events + snprintfz(filename, FILENAME_MAX, "%s", os_default_method); +#else snprintfz(filename, FILENAME_MAX, "%s/aclk.log", netdata_configured_log_dir); +#endif nd_log_set_user_settings(NDLS_ACLK, config_get(CONFIG_SECTION_CLOUD, "conversation log file", filename)); } -#endif + + aclk_config_get_query_scope(); } -char *initialize_lock_directory_path(char *prefix) -{ +static const char *get_varlib_subdir_from_config(const char *prefix, const char *dir) { char filename[FILENAME_MAX + 1]; - snprintfz(filename, FILENAME_MAX, "%s/lock", prefix); - - return config_get(CONFIG_SECTION_DIRECTORIES, "lock", filename); + snprintfz(filename, FILENAME_MAX, "%s/%s", prefix, dir); + return config_get(CONFIG_SECTION_DIRECTORIES, dir, filename); } static void backwards_compatible_config() { // move [global] options to the [web] section + config_move(CONFIG_SECTION_GLOBAL, "http port listen backlog", CONFIG_SECTION_WEB, "listen backlog"); @@ -1003,7 +1019,10 @@ static void backwards_compatible_config() { CONFIG_SECTION_PLUGINS, "statsd"); config_move(CONFIG_SECTION_GLOBAL, "memory mode", - CONFIG_SECTION_DB, "mode"); + CONFIG_SECTION_DB, "db"); + + config_move(CONFIG_SECTION_DB, "mode", + CONFIG_SECTION_DB, "db"); config_move(CONFIG_SECTION_GLOBAL, "history", CONFIG_SECTION_DB, "retention"); @@ -1012,7 +1031,13 @@ static void backwards_compatible_config() { CONFIG_SECTION_DB, "update every"); config_move(CONFIG_SECTION_GLOBAL, "page cache size", - CONFIG_SECTION_DB, "dbengine page cache size MB"); + CONFIG_SECTION_DB, "dbengine page cache size"); + + config_move(CONFIG_SECTION_DB, "dbengine page cache size MB", + CONFIG_SECTION_DB, "dbengine 
page cache size"); + + config_move(CONFIG_SECTION_DB, "dbengine extent cache size MB", + CONFIG_SECTION_DB, "dbengine extent cache size"); config_move(CONFIG_SECTION_DB, "page cache size", CONFIG_SECTION_DB, "dbengine page cache size MB"); @@ -1023,30 +1048,6 @@ static void backwards_compatible_config() { config_move(CONFIG_SECTION_DB, "page cache with malloc", CONFIG_SECTION_DB, "dbengine page cache with malloc"); - config_move(CONFIG_SECTION_GLOBAL, "dbengine disk space", - CONFIG_SECTION_DB, "dbengine disk space MB"); - - config_move(CONFIG_SECTION_GLOBAL, "dbengine multihost disk space", - CONFIG_SECTION_DB, "dbengine multihost disk space MB"); - - config_move(CONFIG_SECTION_DB, "dbengine disk space MB", - CONFIG_SECTION_DB, "dbengine multihost disk space MB"); - - config_move(CONFIG_SECTION_DB, "dbengine multihost disk space MB", - CONFIG_SECTION_DB, "dbengine tier 0 disk space MB"); - - config_move(CONFIG_SECTION_DB, "dbengine tier 1 multihost disk space MB", - CONFIG_SECTION_DB, "dbengine tier 1 disk space MB"); - - config_move(CONFIG_SECTION_DB, "dbengine tier 2 multihost disk space MB", - CONFIG_SECTION_DB, "dbengine tier 2 disk space MB"); - - config_move(CONFIG_SECTION_DB, "dbengine tier 3 multihost disk space MB", - CONFIG_SECTION_DB, "dbengine tier 3 disk space MB"); - - config_move(CONFIG_SECTION_DB, "dbengine tier 4 multihost disk space MB", - CONFIG_SECTION_DB, "dbengine tier 4 disk space MB"); - config_move(CONFIG_SECTION_GLOBAL, "memory deduplication (ksm)", CONFIG_SECTION_DB, "memory deduplication (ksm)"); @@ -1060,17 +1061,67 @@ static void backwards_compatible_config() { CONFIG_SECTION_DB, "dbengine pages per extent"); config_move(CONFIG_SECTION_GLOBAL, "cleanup obsolete charts after seconds", - CONFIG_SECTION_DB, "cleanup obsolete charts after secs"); + CONFIG_SECTION_DB, "cleanup obsolete charts after"); + + config_move(CONFIG_SECTION_DB, "cleanup obsolete charts after secs", + CONFIG_SECTION_DB, "cleanup obsolete charts after"); 
 config_move(CONFIG_SECTION_GLOBAL, "gap when lost iterations above",
 CONFIG_SECTION_DB, "gap when lost iterations above");
 config_move(CONFIG_SECTION_GLOBAL, "cleanup orphan hosts after seconds",
- CONFIG_SECTION_DB, "cleanup orphan hosts after secs");
+ CONFIG_SECTION_DB, "cleanup orphan hosts after");
+
+ config_move(CONFIG_SECTION_DB, "cleanup orphan hosts after secs",
+ CONFIG_SECTION_DB, "cleanup orphan hosts after");
+
+ config_move(CONFIG_SECTION_DB, "cleanup ephemeral hosts after secs",
+ CONFIG_SECTION_DB, "cleanup ephemeral hosts after");
+
+ config_move(CONFIG_SECTION_DB, "seconds to replicate",
+ CONFIG_SECTION_DB, "replication period");
+
+ config_move(CONFIG_SECTION_DB, "seconds per replication step",
+ CONFIG_SECTION_DB, "replication step");
 config_move(CONFIG_SECTION_GLOBAL, "enable zero metrics",
 CONFIG_SECTION_DB, "enable zero metrics");
+ // ----------------------------------------------------------------------------------------------------------------
+
+ config_move(CONFIG_SECTION_GLOBAL, "dbengine disk space",
+ CONFIG_SECTION_DB, "dbengine tier 0 retention size");
+
+ config_move(CONFIG_SECTION_GLOBAL, "dbengine multihost disk space",
+ CONFIG_SECTION_DB, "dbengine tier 0 retention size");
+
+ config_move(CONFIG_SECTION_DB, "dbengine disk space MB",
+ CONFIG_SECTION_DB, "dbengine tier 0 retention size");
+
+ for(size_t tier = 0; tier < RRD_STORAGE_TIERS ;tier++) {
+ char old_config[128], new_config[128];
+
+ snprintfz(old_config, sizeof(old_config), "dbengine tier %zu retention days", tier);
+ snprintfz(new_config, sizeof(new_config), "dbengine tier %zu retention time", tier);
+ config_move(CONFIG_SECTION_DB, old_config,
+ CONFIG_SECTION_DB, new_config);
+
+ if(tier == 0)
+ snprintfz(old_config, sizeof(old_config), "dbengine multihost disk space MB");
+ else
+ snprintfz(old_config, sizeof(old_config), "dbengine tier %zu multihost disk space MB", tier);
+ snprintfz(new_config, sizeof(new_config), "dbengine tier %zu retention size", tier);
+ config_move(CONFIG_SECTION_DB, old_config,
+ CONFIG_SECTION_DB, new_config);
+
+ snprintfz(old_config, sizeof(old_config), "dbengine tier %zu disk space MB", tier);
+ snprintfz(new_config, sizeof(new_config), "dbengine tier %zu retention size", tier);
+ config_move(CONFIG_SECTION_DB, old_config,
+ CONFIG_SECTION_DB, new_config);
+ }
+
+ // ----------------------------------------------------------------------------------------------------------------
+
 config_move(CONFIG_SECTION_LOGS, "error",
 CONFIG_SECTION_LOGS, "daemon");
@@ -1082,11 +1133,42 @@ static void backwards_compatible_config() {
 config_move(CONFIG_SECTION_LOGS, "errors flood protection period",
 CONFIG_SECTION_LOGS, "logs flood protection period");
+
 config_move(CONFIG_SECTION_HEALTH, "is ephemeral",
 CONFIG_SECTION_GLOBAL, "is ephemeral node");
 config_move(CONFIG_SECTION_HEALTH, "has unstable connection",
 CONFIG_SECTION_GLOBAL, "has unstable connection");
+
+ config_move(CONFIG_SECTION_HEALTH, "run at least every seconds",
+ CONFIG_SECTION_HEALTH, "run at least every");
+
+ config_move(CONFIG_SECTION_HEALTH, "postpone alarms during hibernation for seconds",
+ CONFIG_SECTION_HEALTH, "postpone alarms during hibernation for");
+
+ config_move(CONFIG_SECTION_HEALTH, "health log history",
+ CONFIG_SECTION_HEALTH, "health log retention");
+
+ config_move(CONFIG_SECTION_REGISTRY, "registry expire idle persons days",
+ CONFIG_SECTION_REGISTRY, "registry expire idle persons");
+
+ config_move(CONFIG_SECTION_WEB, "disconnect idle clients after seconds",
+ CONFIG_SECTION_WEB, "disconnect idle clients after");
+
+ config_move(CONFIG_SECTION_WEB, "accept a streaming request every seconds",
+ CONFIG_SECTION_WEB, "accept a streaming request every");
+
+ config_move(CONFIG_SECTION_STATSD, "set charts as obsolete after secs",
+ CONFIG_SECTION_STATSD, "set charts as obsolete after");
+
+ config_move(CONFIG_SECTION_STATSD, "disconnect idle tcp clients after seconds",
+ CONFIG_SECTION_STATSD, "disconnect idle tcp clients after");
+
+ config_move("plugin:idlejitter", "loop time in ms",
+ "plugin:idlejitter", "loop time");
+
+ config_move("plugin:proc:/sys/class/infiniband", "refresh ports state every seconds",
+ "plugin:proc:/sys/class/infiniband", "refresh ports state every");
 }
 static int get_hostname(char *buf, size_t buf_size) {
@@ -1119,7 +1201,7 @@ static void get_netdata_configured_variables()
 // get the hostname
 netdata_configured_host_prefix = config_get(CONFIG_SECTION_GLOBAL, "host access prefix", "");
- verify_netdata_host_prefix(true);
+ (void) verify_netdata_host_prefix(true);
 char buf[HOSTNAME_MAX + 1];
 if (get_hostname(buf, HOSTNAME_MAX))
@@ -1131,22 +1213,22 @@ static void get_netdata_configured_variables()
 // ------------------------------------------------------------------------
 // get default database update frequency
- default_rrd_update_every = (int) config_get_number(CONFIG_SECTION_DB, "update every", UPDATE_EVERY);
+ default_rrd_update_every = (int) config_get_duration_seconds(CONFIG_SECTION_DB, "update every", UPDATE_EVERY);
 if(default_rrd_update_every < 1 || default_rrd_update_every > 600) {
 netdata_log_error("Invalid data collection frequency (update every) %d given. Defaulting to %d.", default_rrd_update_every, UPDATE_EVERY);
 default_rrd_update_every = UPDATE_EVERY;
- config_set_number(CONFIG_SECTION_DB, "update every", default_rrd_update_every);
+ config_set_duration_seconds(CONFIG_SECTION_DB, "update every", default_rrd_update_every);
 }
 // ------------------------------------------------------------------------
- // get default memory mode for the database
+ // get the database selection
 {
- const char *mode = config_get(CONFIG_SECTION_DB, "mode", rrd_memory_mode_name(default_rrd_memory_mode));
+ const char *mode = config_get(CONFIG_SECTION_DB, "db", rrd_memory_mode_name(default_rrd_memory_mode));
 default_rrd_memory_mode = rrd_memory_mode_id(mode);
 if(strcmp(mode, rrd_memory_mode_name(default_rrd_memory_mode)) != 0) {
 netdata_log_error("Invalid memory mode '%s' given. Using '%s'", mode, rrd_memory_mode_name(default_rrd_memory_mode));
- config_set(CONFIG_SECTION_DB, "mode", rrd_memory_mode_name(default_rrd_memory_mode));
+ config_set(CONFIG_SECTION_DB, "db", rrd_memory_mode_name(default_rrd_memory_mode));
 }
 }
@@ -1175,7 +1257,8 @@ static void get_netdata_configured_variables()
 netdata_configured_cache_dir = config_get(CONFIG_SECTION_DIRECTORIES, "cache", netdata_configured_cache_dir);
 netdata_configured_varlib_dir = config_get(CONFIG_SECTION_DIRECTORIES, "lib", netdata_configured_varlib_dir);
- netdata_configured_lock_dir = initialize_lock_directory_path(netdata_configured_varlib_dir);
+ netdata_configured_lock_dir = get_varlib_subdir_from_config(netdata_configured_varlib_dir, "lock");
+ netdata_configured_cloud_dir = get_varlib_subdir_from_config(netdata_configured_varlib_dir, "cloud.d");
 {
 pluginsd_initialize_plugin_directories();
@@ -1199,17 +1282,19 @@ static void get_netdata_configured_variables()
 // ------------------------------------------------------------------------
 // get default Database Engine page cache size in MiB
- default_rrdeng_page_cache_mb = (int) config_get_number(CONFIG_SECTION_DB, "dbengine page cache size MB", default_rrdeng_page_cache_mb);
- default_rrdeng_extent_cache_mb = (int) config_get_number(CONFIG_SECTION_DB, "dbengine extent cache size MB", default_rrdeng_extent_cache_mb);
+ default_rrdeng_page_cache_mb = (int) config_get_size_mb(CONFIG_SECTION_DB, "dbengine page cache size", default_rrdeng_page_cache_mb);
+ default_rrdeng_extent_cache_mb = (int) config_get_size_mb(CONFIG_SECTION_DB, "dbengine extent cache size", default_rrdeng_extent_cache_mb);
 db_engine_journal_check = config_get_boolean(CONFIG_SECTION_DB, "dbengine enable journal integrity check", CONFIG_BOOLEAN_NO);
- if(default_rrdeng_extent_cache_mb < 0)
+ if(default_rrdeng_extent_cache_mb < 0) {
 default_rrdeng_extent_cache_mb = 0;
+ config_set_size_mb(CONFIG_SECTION_DB, "dbengine extent cache size", default_rrdeng_extent_cache_mb);
+ }
 if(default_rrdeng_page_cache_mb < RRDENG_MIN_PAGE_CACHE_SIZE_MB) {
 netdata_log_error("Invalid page cache size %d given. Defaulting to %d.", default_rrdeng_page_cache_mb, RRDENG_MIN_PAGE_CACHE_SIZE_MB);
 default_rrdeng_page_cache_mb = RRDENG_MIN_PAGE_CACHE_SIZE_MB;
- config_set_number(CONFIG_SECTION_DB, "dbengine page cache size MB", default_rrdeng_page_cache_mb);
+ config_set_size_mb(CONFIG_SECTION_DB, "dbengine page cache size", default_rrdeng_page_cache_mb);
 }
 // ------------------------------------------------------------------------
@@ -1242,28 +1327,24 @@ static void get_netdata_configured_variables()
 // get KSM settings
 #ifdef MADV_MERGEABLE
- enable_ksm = config_get_boolean(CONFIG_SECTION_DB, "memory deduplication (ksm)", enable_ksm);
+ enable_ksm = config_get_boolean_ondemand(CONFIG_SECTION_DB, "memory deduplication (ksm)", enable_ksm);
 #endif
 // --------------------------------------------------------------------
- // metric correlations
- enable_metric_correlations = config_get_boolean(CONFIG_SECTION_GLOBAL, "enable metric correlations", enable_metric_correlations);
- default_metric_correlations_method = weights_string_to_method(config_get(
- CONFIG_SECTION_GLOBAL, "metric correlations method",
- weights_method_to_string(default_metric_correlations_method)));
+ rrdhost_free_ephemeral_time_s =
+ config_get_duration_seconds(CONFIG_SECTION_DB, "cleanup ephemeral hosts after", rrdhost_free_ephemeral_time_s);
- // --------------------------------------------------------------------
+ rrdset_free_obsolete_time_s =
+ config_get_duration_seconds(CONFIG_SECTION_DB, "cleanup obsolete charts after", rrdset_free_obsolete_time_s);
- rrdset_free_obsolete_time_s = config_get_number(CONFIG_SECTION_DB, "cleanup obsolete charts after secs", rrdset_free_obsolete_time_s);
- rrdhost_free_ephemeral_time_s = config_get_number(CONFIG_SECTION_DB, "cleanup ephemeral hosts after secs", rrdhost_free_ephemeral_time_s);
 // Current chart locking and invalidation scheme doesn't prevent Netdata from segmentation faults if a short
 // cleanup delay is set. Extensive stress tests showed that 10 seconds is quite a safe delay. Look at
 // https://github.com/netdata/netdata/pull/11222#issuecomment-868367920 for more information.
 if (rrdset_free_obsolete_time_s < 10) {
 rrdset_free_obsolete_time_s = 10;
- netdata_log_info("The \"cleanup obsolete charts after seconds\" option was set to 10 seconds.");
- config_set_number(CONFIG_SECTION_DB, "cleanup obsolete charts after secs", rrdset_free_obsolete_time_s);
+ netdata_log_info("The \"cleanup obsolete charts after\" option was set to 10 seconds.");
+ config_set_duration_seconds(CONFIG_SECTION_DB, "cleanup obsolete charts after", rrdset_free_obsolete_time_s);
 }
 gap_when_lost_iterations_above = (int)config_get_number(CONFIG_SECTION_DB, "gap when lost iterations above", gap_when_lost_iterations_above);
@@ -1276,14 +1357,13 @@ static void get_netdata_configured_variables()
 // --------------------------------------------------------------------
 // get various system parameters
- os_get_system_HZ();
 os_get_system_cpus_uncached();
 os_get_system_pid_max();
 }
-static void post_conf_load(char **user)
+static void post_conf_load(const char **user)
 {
 // --------------------------------------------------------------------
 // get the user we should run
@@ -1298,7 +1378,7 @@ static void post_conf_load(char **user)
 }
 }
-static bool load_netdata_conf(char *filename, char overwrite_used, char **user) {
+static bool load_netdata_conf(char *filename, char overwrite_used, const char **user) {
 errno_clear();
 int ret = 0;
@@ -1309,14 +1389,14 @@ static bool load_netdata_conf(char *filename, char overwrite_used, char **user)
 netdata_log_error("CONFIG: cannot load config file '%s'.", filename);
 }
 else {
- filename = strdupz_path_subpath(netdata_configured_user_config_dir, "netdata.conf");
+ filename = filename_from_path_entry_strdupz(netdata_configured_user_config_dir, "netdata.conf");
 ret = config_load(filename, overwrite_used, NULL);
 if(!ret) {
 netdata_log_info("CONFIG: cannot load user config '%s'. Will try the stock version.", filename);
 freez(filename);
- filename = strdupz_path_subpath(netdata_configured_stock_config_dir, "netdata.conf");
+ filename = filename_from_path_entry_strdupz(netdata_configured_stock_config_dir, "netdata.conf");
 ret = config_load(filename, overwrite_used, NULL);
 if(!ret)
 netdata_log_info("CONFIG: cannot load stock config '%s'. Running with internal defaults.", filename);
@@ -1351,7 +1431,7 @@ int get_system_info(struct rrdhost_system_info *system_info) {
 char line[200 + 1];
 // Removed the double strlens, if the Coverity tainted string warning reappears I'll revert.
 // One time init code, but I'm curious about the warning...
- while (fgets(line, 200, instance->child_stdout_fp) != NULL) {
+ while (fgets(line, 200, spawn_popen_stdout(instance)) != NULL) {
 char *value=line;
 while (*value && *value != '=') value++;
 if (*value=='=') {
@@ -1366,7 +1446,7 @@ int get_system_info(struct rrdhost_system_info *system_info) {
 if(unlikely(rrdhost_set_system_info_variable(system_info, line, value))) {
 netdata_log_error("Unexpected environment variable %s=%s", line, value);
 } else {
- setenv(line, value, 1);
+ nd_setenv(line, value, 1);
 }
 }
 }
@@ -1405,12 +1485,13 @@ int unittest_rrdpush_compressions(void);
 int uuid_unittest(void);
 int progress_unittest(void);
 int dyncfg_unittest(void);
+bool netdata_random_session_id_generate(void);
 #ifdef OS_WINDOWS
 int windows_perflib_dump(const char *key);
 #endif
-int unittest_prepare_rrd(char **user) {
+int unittest_prepare_rrd(const char **user) {
 post_conf_load(user);
 get_netdata_configured_variables();
 default_rrd_update_every = 1;
@@ -1422,13 +1503,12 @@ int unittest_prepare_rrd(char **user) {
 fprintf(stderr, "rrd_init failed for unittest\n");
 return 1;
 }
- default_rrdpush_enabled = 0;
+ stream_conf_send_enabled = 0;
 return 0;
 }
 int netdata_main(int argc, char **argv) {
- clocks_init();
 string_init();
 analytics_init();
@@ -1441,7 +1521,7 @@ int netdata_main(int argc, char **argv) {
 int config_loaded = 0;
 bool close_open_fds = true;
 size_t default_stacksize;
- char *user = NULL;
+ const char *user = NULL;
 #ifdef OS_WINDOWS
 int dont_fork = 1;
@@ -1455,6 +1535,8 @@ int netdata_main(int argc, char **argv) {
 // set the name for logging
 program_name = "netdata";
+ curl_global_init(CURL_GLOBAL_ALL);
+
 // parse options
 {
 int num_opts = sizeof(option_definitions) / sizeof(struct option_def);
@@ -1483,7 +1565,7 @@ int netdata_main(int argc, char **argv) {
 }
 else {
 netdata_log_debug(D_OPTIONS, "Configuration loaded from %s.", optarg);
- load_cloud_conf(1);
+ cloud_conf_load(1);
 config_loaded = 1;
 }
 break;
@@ -1499,8 +1581,7 @@ int netdata_main(int argc, char **argv) {
 config_set(CONFIG_SECTION_WEB, "bind to", optarg);
 break;
 case 'P':
- strncpy(pidfile, optarg, FILENAME_MAX);
- pidfile[FILENAME_MAX] = '\0';
+ pidfile = strdupz(optarg);
 break;
 case 'p':
 config_set(CONFIG_SECTION_GLOBAL, "default port", optarg);
@@ -1522,7 +1603,6 @@ int netdata_main(int argc, char **argv) {
 {
 char* stacksize_string = "stacksize=";
 char* debug_flags_string = "debug_flags=";
- char* claim_string = "claim";
 #ifdef ENABLE_DBENGINE
 char* createdataset_string = "createdataset=";
 char* stresstest_string = "stresstest=";
@@ -1791,7 +1871,7 @@ int netdata_main(int argc, char **argv) {
 // so the caller can use -c netdata.conf before or
 // after this parameter to prevent or allow overwriting
 // variables at netdata.conf
- config_set_default(section, key, value);
+ config_set_default_raw_value(section, key, value);
 // fprintf(stderr, "SET section '%s', key '%s', value '%s'\n", section, key, value);
 }
@@ -1824,7 +1904,7 @@ int netdata_main(int argc, char **argv) {
 // so the caller can use -c netdata.conf before or
 // after this parameter to prevent or allow overwriting
 // variables at netdata.conf
- appconfig_set_default(tmp_config, section, key, value);
+ appconfig_set_default_raw_value(tmp_config, section, key, value);
 // fprintf(stderr, "SET section '%s', key '%s', value '%s'\n", section, key, value);
 }
@@ -1870,7 +1950,7 @@ int netdata_main(int argc, char **argv) {
 if(!config_loaded) {
 fprintf(stderr, "warning: no configuration file has been loaded. Use -c CONFIG_FILE, before -W get. Using default config.\n");
 load_netdata_conf(NULL, 0, &user);
- load_cloud_conf(1);
+ cloud_conf_load(1);
 }
 get_netdata_configured_variables();
@@ -1884,10 +1964,6 @@ int netdata_main(int argc, char **argv) {
 printf("%s\n", value);
 return 0;
 }
- else if(strncmp(optarg, claim_string, strlen(claim_string)) == 0) {
- /* will trigger a claiming attempt when the agent is initialized */
- claiming_pending_arguments = optarg + strlen(claim_string);
- }
 else if(strcmp(optarg, "buildinfo") == 0) {
 print_build_info();
 return 0;
@@ -1919,18 +1995,18 @@ int netdata_main(int argc, char **argv) {
 if (close_open_fds == true) {
 // close all open file descriptors, except the standard ones
 // the caller may have left open files (lxc-attach has this issue)
- os_close_all_non_std_open_fds_except(NULL, 0);
+ os_close_all_non_std_open_fds_except(NULL, 0, 0);
 }
 if(!config_loaded) {
 load_netdata_conf(NULL, 0, &user);
- load_cloud_conf(0);
+ cloud_conf_load(0);
 }
 // ------------------------------------------------------------------------
 // initialize netdata
 {
- char *pmax = config_get(CONFIG_SECTION_GLOBAL, "glibc malloc arena max for plugins", "1");
+ const char *pmax = config_get(CONFIG_SECTION_GLOBAL, "glibc malloc arena max for plugins", "1");
 if(pmax && *pmax)
 setenv("MALLOC_ARENA_MAX", pmax, 1);
@@ -1970,7 +2046,8 @@ int netdata_main(int argc, char **argv) {
 // prepare configuration environment variables for the plugins
 get_netdata_configured_variables();
- set_global_environment();
+ set_environment_for_plugins_and_scripts();
+ analytics_reset();
 // work while we are cd into config_dir
 // to allow the plugins refer to their config
@@ -1986,8 +2063,8 @@ int netdata_main(int argc, char **argv) {
 // --------------------------------------------------------------------
 // get the debugging flags from the configuration file
- char *flags = config_get(CONFIG_SECTION_LOGS, "debug flags", "0x0000000000000000");
- setenv("NETDATA_DEBUG_FLAGS", flags, 1);
+ const char *flags = config_get(CONFIG_SECTION_LOGS, "debug flags", "0x0000000000000000");
+ nd_setenv("NETDATA_DEBUG_FLAGS", flags, 1);
 debug_flags = strtoull(flags, NULL, 0);
 netdata_log_debug(D_OPTIONS, "Debug flags set to '0x%" PRIX64 "'.", debug_flags);
@@ -2013,16 +2090,10 @@ int netdata_main(int argc, char **argv) {
 nd_log_initialize();
 netdata_log_info("Netdata agent version '%s' is starting", NETDATA_VERSION);
- ieee754_doubles = is_system_ieee754_double();
- if(!ieee754_doubles)
- globally_disabled_capabilities |= STREAM_CAP_IEEE754;
-
- aral_judy_init();
+ check_local_streaming_capabilities();
 get_system_timezone();
- bearer_tokens_init();
-
 replication_initialize();
 rrd_functions_inflight_init();
@@ -2030,9 +2101,7 @@ int netdata_main(int argc, char **argv) {
 // --------------------------------------------------------------------
 // get the certificate and start security
-#ifdef ENABLE_HTTPS
 security_init();
-#endif
 // --------------------------------------------------------------------
 // This is the safest place to start the SILENCERS structure
@@ -2040,12 +2109,6 @@ int netdata_main(int argc, char **argv) {
 health_set_silencers_filename();
 health_initialize_global_silencers();
-// // --------------------------------------------------------------------
-// // Initialize ML configuration
-//
-// delta_startup_time("initialize ML");
-// ml_init();
-
 // --------------------------------------------------------------------
 // setup process signals
@@ -2053,8 +2116,7 @@ int netdata_main(int argc, char **argv) {
 // this causes the threads to block signals.
 delta_startup_time("initialize signals");
- signals_block();
- signals_init(); // setup the signals we want to use
+ nd_initialize_signals(); // setup the signals we want to use
 // --------------------------------------------------------------------
 // check which threads are enabled and initialize them
@@ -2086,7 +2148,7 @@ int netdata_main(int argc, char **argv) {
 st->init_routine();
 if(st->env_name)
- setenv(st->env_name, st->enabled?"YES":"NO", 1);
+ nd_setenv(st->env_name, st->enabled?"YES":"NO", 1);
 if(st->global_variable)
 *st->global_variable = (st->enabled) ? true : false;
@@ -2097,7 +2159,7 @@ int netdata_main(int argc, char **argv) {
 delta_startup_time("initialize web server");
- web_client_api_v1_init();
+ nd_web_api_init();
 web_server_threading_selection();
 if(web_server_mode != WEB_SERVER_MODE_NONE) {
@@ -2165,7 +2227,7 @@ int netdata_main(int argc, char **argv) {
 netdata_configured_home_dir = config_get(CONFIG_SECTION_DIRECTORIES, "home", pw->pw_dir);
 }
- setenv("HOME", netdata_configured_home_dir, 1);
+ nd_setenv("HOME", netdata_configured_home_dir, 1);
 dyncfg_init(true);
@@ -2173,11 +2235,12 @@ int netdata_main(int argc, char **argv) {
 delta_startup_time("initialize threads after fork");
- netdata_threads_init_after_fork((size_t)config_get_number(CONFIG_SECTION_GLOBAL, "pthread stack size", (long)default_stacksize));
+ netdata_threads_init_after_fork((size_t)config_get_size_bytes(CONFIG_SECTION_GLOBAL, "pthread stack size", default_stacksize));
 // initialize internal registry
 delta_startup_time("initialize registry");
 registry_init();
+ cloud_conf_init_after_registry();
 netdata_random_session_id_generate();
 // ------------------------------------------------------------------------
@@ -2203,7 +2266,7 @@ int netdata_main(int argc, char **argv) {
 delta_startup_time("initialize RRD structures");
 if(rrd_init(netdata_configured_hostname, system_info, false)) {
- set_late_global_environment(system_info);
+ set_late_analytics_variables(system_info);
 fatal("Cannot initialize localhost instance with name '%s'.", netdata_configured_hostname);
 }
@@ -2219,15 +2282,10 @@ int netdata_main(int argc, char **argv) {
 if (fd >= 0)
 close(fd);
-
 // ------------------------------------------------------------------------
 // Claim netdata agent to a cloud endpoint
 delta_startup_time("collect claiming info");
-
- if (claiming_pending_arguments)
- claim_agent(claiming_pending_arguments, false, NULL);
-
 load_claiming_state();
 // ------------------------------------------------------------------------
@@ -2242,11 +2300,13 @@ int netdata_main(int argc, char **argv) {
 // ------------------------------------------------------------------------
 // spawn the threads
+ bearer_tokens_init();
+
 delta_startup_time("start the static threads");
 web_server_config_options();
- set_late_global_environment(system_info);
+ set_late_analytics_variables(system_info);
 for (i = 0; static_threads[i].name != NULL ; i++) {
 struct netdata_static_thread *st = &static_threads[i];
@@ -2269,7 +2329,13 @@ int netdata_main(int argc, char **argv) {
 delta_startup_time("ready");
 usec_t ready_ut = now_monotonic_usec();
- netdata_log_info("NETDATA STARTUP: completed in %llu ms. Enjoy real-time performance monitoring!", (ready_ut - started_ut) / USEC_PER_MS);
+ add_agent_event(EVENT_AGENT_START_TIME, (int64_t ) (ready_ut - started_ut));
+ usec_t median_start_time = get_agent_event_time_median(EVENT_AGENT_START_TIME);
+ netdata_log_info(
+ "NETDATA STARTUP: completed in %llu ms (median start up time is %llu ms). Enjoy real-time performance monitoring!",
+ (ready_ut - started_ut) / USEC_PER_MS, median_start_time / USEC_PER_MS);
+
+ cleanup_agent_event_log();
 netdata_ready = true;
 analytics_statistic_t start_statistic = { "START", "-", "-" };
@@ -2295,28 +2361,7 @@ int netdata_main(int argc, char **argv) {
 }
 }
- // ------------------------------------------------------------------------
- // Report ACLK build failure
-#ifndef ENABLE_ACLK
- netdata_log_error("This agent doesn't have ACLK.");
- char filename[FILENAME_MAX + 1];
- snprintfz(filename, FILENAME_MAX, "%s/.aclk_report_sent", netdata_configured_varlib_dir);
- if (netdata_anonymous_statistics_enabled > 0 && access(filename, F_OK)) { // -1 -> not initialized
- analytics_statistic_t statistic = { "ACLK_DISABLED", "-", "-" };
- analytics_statistic_send(&statistic);
-
- int fd = open(filename, O_WRONLY | O_CREAT | O_TRUNC, 444);
- if (fd == -1)
- netdata_log_error("Cannot create file '%s'. Please fix this.", filename);
- else
- close(fd);
- }
-#endif
-
 webrtc_initialize();
-
- signals_unblock();
-
 return 10;
 }
@@ -2327,7 +2372,7 @@ int main(int argc, char *argv[])
 if (rc != 10)
 return rc;
- signals_handle();
+ nd_process_signals();
 return 1;
 }
 #endif
diff --git a/src/daemon/main.h b/src/daemon/main.h
index 3188623b6..f5da3feb6 100644
--- a/src/daemon/main.h
+++ b/src/daemon/main.h
@@ -23,8 +23,7 @@ typedef enum {
 SERVICE_CONTEXT = (1 << 10),
 SERVICE_ANALYTICS = (1 << 11),
 SERVICE_EXPORTERS = (1 << 12),
- SERVICE_ACLKSYNC = (1 << 13),
- SERVICE_HTTPD = (1 << 14)
+ SERVICE_HTTPD = (1 << 13)
 } SERVICE_TYPE;
 typedef enum {
diff --git a/src/daemon/service.c b/src/daemon/service.c
index ead633445..f209cb470 100644
--- a/src/daemon/service.c
+++ b/src/daemon/service.c
@@ -203,7 +203,7 @@ static void svc_rrd_cleanup_obsolete_charts_from_all_hosts() {
 if (host == localhost)
 continue;
- netdata_mutex_lock(&host->receiver_lock);
+ spinlock_lock(&host->receiver_lock);
 time_t now = now_realtime_sec();
@@ -215,7 +215,7 @@ static void svc_rrd_cleanup_obsolete_charts_from_all_hosts() {
 host->trigger_chart_obsoletion_check = 0;
 }
- netdata_mutex_unlock(&host->receiver_lock);
+ spinlock_unlock(&host->receiver_lock);
 }
 rrd_rdunlock();
@@ -247,14 +247,12 @@ restart_after_removal:
 }
 worker_is_busy(WORKER_JOB_FREE_HOST);
-#ifdef ENABLE_ACLK
 // in case we have cloud connection we inform cloud
 // a child disconnected
- if (netdata_cloud_enabled && force) {
+ if (force) {
 aclk_host_state_update(host, 0, 0);
 unregister_node(host->machine_guid);
 }
-#endif
 rrdhost_free___while_having_rrd_wrlock(host, force);
 goto restart_after_removal;
 }
@@ -299,7 +297,7 @@ void *service_main(void *ptr)
 CLEANUP_FUNCTION_REGISTER(service_main_cleanup) cleanup_ptr = ptr;
 heartbeat_t hb;
- heartbeat_init(&hb);
+ heartbeat_init(&hb, USEC_PER_SEC);
 usec_t step = USEC_PER_SEC * SERVICE_HEARTBEAT;
 usec_t real_step = USEC_PER_SEC;
@@ -307,7 +305,7 @@ void *service_main(void *ptr)
 while (service_running(SERVICE_MAINTENANCE)) {
 worker_is_idle();
- heartbeat_next(&hb, USEC_PER_SEC);
+ heartbeat_next(&hb);
 if (real_step < step) {
 real_step += USEC_PER_SEC;
 continue;
diff --git a/src/daemon/signals.c b/src/daemon/signals.c
index 4e4d7c4d4..163f92ad8 100644
--- a/src/daemon/signals.c
+++ b/src/daemon/signals.c
@@ -2,12 +2,6 @@
 #include "common.h"
-/*
- * IMPORTANT: Libuv uv_spawn() uses SIGCHLD internally:
- * https://github.com/libuv/libuv/blob/cc51217a317e96510fbb284721d5e6bc2af31e33/src/unix/process.c#L485
- * Extreme care is needed when mixing and matching POSIX and libuv.
- */
-
 typedef enum signal_action {
 NETDATA_SIGNAL_END_OF_LIST,
 NETDATA_SIGNAL_IGNORE,
@@ -56,24 +50,33 @@ static void signal_handler(int signo) {
 }
 }
-void signals_block(void) {
+// Mask all signals, to ensure they will only be unmasked at the threads that can handle them.
+// This means that all third party libraries (including libuv) cannot use signals anymore.
+// The signals they are interested must be unblocked at their corresponding event loops.
+static void posix_mask_all_signals(void) {
 sigset_t sigset;
 sigfillset(&sigset);
- if(pthread_sigmask(SIG_BLOCK, &sigset, NULL) == -1)
- netdata_log_error("SIGNAL: Could not block signals for threads");
+ if(pthread_sigmask(SIG_BLOCK, &sigset, NULL) != 0)
+ netdata_log_error("SIGNAL: cannot mask all signals");
 }
-void signals_unblock(void) {
+// Unmask all signals the netdata main signal handler uses.
+// All other signals remain masked.
+static void posix_unmask_my_signals(void) {
 sigset_t sigset;
- sigfillset(&sigset);
+ sigemptyset(&sigset);
- if(pthread_sigmask(SIG_UNBLOCK, &sigset, NULL) == -1) {
- netdata_log_error("SIGNAL: Could not unblock signals for threads");
- }
+ for (int i = 0; signals_waiting[i].action != NETDATA_SIGNAL_END_OF_LIST; i++)
+ sigaddset(&sigset, signals_waiting[i].signo);
+
+ if (pthread_sigmask(SIG_UNBLOCK, &sigset, NULL) != 0)
+ netdata_log_error("SIGNAL: cannot unmask netdata signals");
 }
-void signals_init(void) {
+void nd_initialize_signals(void) {
+ posix_mask_all_signals(); // block all signals for all threads
+
 // Catch signals which we want to use
 struct sigaction sa;
 sa.sa_flags = 0;
@@ -97,22 +100,10 @@ void signals_init(void) {
 }
 }
-void signals_reset(void) {
- struct sigaction sa;
- sigemptyset(&sa.sa_mask);
- sa.sa_handler = SIG_DFL;
- sa.sa_flags = 0;
-
- int i;
- for (i = 0; signals_waiting[i].action != NETDATA_SIGNAL_END_OF_LIST; i++) {
- if(sigaction(signals_waiting[i].signo, &sa, NULL) == -1)
- netdata_log_error("SIGNAL: Failed to reset signal handler for: %s", signals_waiting[i].name);
- }
-}
+void nd_process_signals(void) {
+ posix_unmask_my_signals();
-void signals_handle(void) {
 while(1) {
-
 // pause() causes the calling process (or thread) to sleep until a signal
 // is delivered that either terminates the process or causes the invocation
 // of a signal-catching function.
diff --git a/src/daemon/signals.h b/src/daemon/signals.h
index 26dbc6dcd..897b2b7f0 100644
--- a/src/daemon/signals.h
+++ b/src/daemon/signals.h
@@ -3,10 +3,7 @@
 #ifndef NETDATA_SIGNALS_H
 #define NETDATA_SIGNALS_H 1
-void signals_init(void);
-void signals_block(void);
-void signals_unblock(void);
-void signals_reset(void);
-void signals_handle(void) NORETURN;
+void nd_initialize_signals(void);
+void nd_process_signals(void) NORETURN;
 #endif //NETDATA_SIGNALS_H
diff --git a/src/daemon/static_threads.c b/src/daemon/static_threads.c
index c6ec79956..3e5b7e350 100644
--- a/src/daemon/static_threads.c
+++ b/src/daemon/static_threads.c
@@ -133,7 +133,6 @@ const struct netdata_static_thread static_threads_common[] = {
 },
 #endif
-#ifdef ENABLE_ACLK
 {
 .name = "ACLK_MAIN",
 .config_section = NULL,
@@ -143,7 +142,6 @@ const struct netdata_static_thread static_threads_common[] = {
 .init_routine = NULL,
 .start_routine = aclk_main
 },
-#endif
 {
 .name = "RRDCONTEXT",
diff --git a/src/daemon/unit_test.c b/src/daemon/unit_test.c
index 0f15f67d7..46166d673 100644
--- a/src/daemon/unit_test.c
+++ b/src/daemon/unit_test.c
@@ -1437,8 +1437,8 @@ int check_strdupz_path_subpath() {
 size_t i;
 for(i = 0; checks[i].result ; i++) {
- char *s = strdupz_path_subpath(checks[i].path, checks[i].subpath);
- fprintf(stderr, "strdupz_path_subpath(\"%s\", \"%s\") = \"%s\": ", checks[i].path, checks[i].subpath, s);
+ char *s = filename_from_path_entry_strdupz(checks[i].path, checks[i].subpath);
+ fprintf(stderr, "filename_from_path_entry_strdupz(\"%s\", \"%s\") = \"%s\": ", checks[i].path, checks[i].subpath, s);
 if(!s || strcmp(s, checks[i].result) != 0) {
 freez(s);
 fprintf(stderr, "FAILED\n");
diff --git a/src/daemon/win_system-info.c b/src/daemon/win_system-info.c
index 2d67862fb..517692dff 100644
--- a/src/daemon/win_system-info.c
+++ b/src/daemon/win_system-info.c
@@ -108,10 +108,11 @@ static void netdata_windows_get_mem(struct rrdhost_system_info *systemInfo)
 {
 ULONGLONG size;
 char memSize[256];
+ // The amount of physically installed RAM, in kilobytes.
 if (!GetPhysicallyInstalledSystemMemory(&size))
 size = 0;
 else
- (void)snprintf(memSize, 255, "%llu", size);
+ (void)snprintf(memSize, 255, "%llu", size * 1024); // to bytes
 (void)rrdhost_set_system_info_variable(systemInfo,
 "NETDATA_SYSTEM_TOTAL_RAM",
@@ -220,32 +221,25 @@ static void netdata_windows_discover_os_version(char *os, size_t length, DWORD b
 }
 // We are not testing older, because it is not supported anymore by Microsoft
- (void)snprintf(os, length, "Microsoft Windows Version %s, Build %d (Name: Windows %s)", versionName, build, version);
+ (void)snprintf(os, length, "Microsoft Windows Version %s, Build %d", version, build);
 }
-static void netdata_windows_os_version(char *out, DWORD length)
+static void netdata_windows_os_kernel_version(char *out, DWORD length, DWORD build)
 {
- if (netdata_registry_get_string(out,
- length,
+ DWORD major, minor;
+ if (!netdata_registry_get_dword(&major,
 HKEY_LOCAL_MACHINE,
 "SOFTWARE\\Microsoft\\Windows NT\\CurrentVersion",
- "ProductName"))
- return;
+ "CurrentMajorVersionNumber"))
+ major = 0;
- (void)snprintf(out, length, "%s", NETDATA_DEFAULT_SYSTEM_INFO_VALUE_UNKNOWN);
-}
-
-static void netdata_windows_os_kernel_version(char *out, DWORD length, DWORD build)
-{
- char version[8];
- if (!netdata_registry_get_string(version,
- 7,
+ if (!netdata_registry_get_dword(&minor,
 HKEY_LOCAL_MACHINE,
 "SOFTWARE\\Microsoft\\Windows NT\\CurrentVersion",
- "CurrentVersion"))
- version[0] = '\0';
+ "CurrentMinorVersionNumber"))
+ minor = 0;
- (void)snprintf(out, length, "%s (build: %u)", version, build);
+ (void)snprintf(out, length, "Windows %u.%u.%u Build: %u", major, minor, build, build);
 }
 static void netdata_windows_host(struct rrdhost_system_info *systemInfo)
@@ -261,7 +255,6 @@ static void netdata_windows_host(struct rrdhost_system_info *systemInfo)
 (void)rrdhost_set_system_info_variable(
 systemInfo, "NETDATA_HOST_OS_ID_LIKE", NETDATA_DEFAULT_SYSTEM_INFO_VALUE_UNKNOWN);
- netdata_windows_os_version(osVersion, 4095);
 (void)rrdhost_set_system_info_variable(systemInfo, "NETDATA_HOST_OS_VERSION", osVersion);
 (void)rrdhost_set_system_info_variable(systemInfo, "NETDATA_HOST_OS_VERSION_ID", osVersion);
@@ -306,6 +299,11 @@ static void netdata_windows_container(struct rrdhost_system_info *systemInfo)
 systemInfo, "NETDATA_CONTAINER_IS_OFFICIAL_IMAGE", NETDATA_DEFAULT_SYSTEM_INFO_VALUE_FALSE);
 }
+static void netdata_windows_install_type(struct rrdhost_system_info *systemInfo)
+{
+ (void)rrdhost_set_system_info_variable(systemInfo, "NETDATA_INSTALL_TYPE", "netdata-installer.exe");
+}
+
 void netdata_windows_get_system_info(struct rrdhost_system_info *systemInfo)
 {
 netdata_windows_cloud(systemInfo);
@@ -314,5 +312,6 @@ void netdata_windows_get_system_info(struct rrdhost_system_info *systemInfo)
 netdata_windows_get_cpu(systemInfo);
 netdata_windows_get_mem(systemInfo);
 netdata_windows_get_total_disk_size(systemInfo);
+ netdata_windows_install_type(systemInfo);
 }
 #endif
diff --git a/src/daemon/winsvc.cc b/src/daemon/winsvc.cc
index 9c5eb49ff..a56f5eb7c 100644
--- a/src/daemon/winsvc.cc
+++ b/src/daemon/winsvc.cc
@@ -4,12 +4,10 @@ extern "C" {
 #include "libnetdata/libnetdata.h"
 int netdata_main(int argc, char *argv[]);
-void signals_handle(void);
+void nd_process_signals(void);
 }
-#include <windows.h>
-
 __attribute__((format(printf, 1, 2)))
 static void netdata_service_log(const char *fmt, ...)
 {
@@ -74,7 +72,7 @@ static HANDLE CreateEventHandle(const char *msg)
 if (!h)
 {
- netdata_service_log(msg);
+ netdata_service_log("%s", msg);
 if (!ReportSvcStatus(SERVICE_STOPPED, GetLastError(), 1000, 0))
 {
@@ -219,7 +217,11 @@ static bool update_path() {
 int main(int argc, char *argv[])
 {
+#if defined(OS_WINDOWS) && defined(RUN_UNDER_CLION)
+ bool tty = true;
+#else
 bool tty = isatty(fileno(stdin)) == 1;
+#endif
 if (!update_path()) {
 return 1;
@@ -231,7 +233,7 @@ int main(int argc, char *argv[])
 if (rc != 10)
 return rc;
- signals_handle();
+ nd_process_signals();
 return 1;
 }
 else