diff options
Diffstat (limited to 'claim')
-rw-r--r-- | claim/Makefile.am | 21 | ||||
-rw-r--r-- | claim/README.md | 394 | ||||
-rw-r--r-- | claim/claim.c | 194 | ||||
-rw-r--r-- | claim/claim.h | 16 | ||||
-rwxr-xr-x | claim/netdata-claim.sh.in | 401 |
5 files changed, 1026 insertions, 0 deletions
diff --git a/claim/Makefile.am b/claim/Makefile.am new file mode 100644 index 00000000..c838db9b --- /dev/null +++ b/claim/Makefile.am @@ -0,0 +1,21 @@ +# SPDX-License-Identifier: GPL-3.0-or-later + +AUTOMAKE_OPTIONS = subdir-objects +MAINTAINERCLEANFILES = $(srcdir)/Makefile.in + +CLEANFILES = \ + netdata-claim.sh \ + $(NULL) + +include $(top_srcdir)/build/subst.inc +SUFFIXES = .in + +sbin_SCRIPTS = \ + netdata-claim.sh \ + $(NULL) + +dist_noinst_DATA = \ + netdata-claim.sh.in \ + README.md \ + $(NULL) + diff --git a/claim/README.md b/claim/README.md new file mode 100644 index 00000000..ade6a221 --- /dev/null +++ b/claim/README.md @@ -0,0 +1,394 @@ +<!-- +title: "Agent claiming" +description: "Agent claiming allows a Netdata Agent, running on a distributed node, to securely connect to Netdata Cloud via the encrypted Agent-Cloud link (ACLK)." +custom_edit_url: https://github.com/netdata/netdata/edit/master/claim/README.md +--> + +# Agent claiming + +Agent claiming allows a Netdata Agent, running on a distributed node, to securely connect to Netdata Cloud. A Space's +administrator creates a **claiming token**, which is used to add an Agent to their Space via the [Agent-Cloud link +(ACLK)](/aclk/README.md). + +Are you just starting out with Netdata Cloud? See our [get started with +Cloud](https://learn.netdata.cloud/docs/cloud/get-started) guide for a walkthrough of the process and simplified +instructions. + +Claiming nodes is a security feature in Netdata Cloud. Through the process of claiming, you demonstrate in a few ways +that you have administrative access to that node and the configuration settings for its Agent. By logging into the node, +you prove you have access, and by using the claiming script or the Netdata command line, you prove you have write access +and administrative privileges. + +Only the administrators of a Space in Netdata Cloud can view the claiming token and accompanying script generated by +Netdata Cloud. + +> The claiming process ensures no third party can add your node, and then view your node's metrics, in a Cloud account, +> Space, or War Room that you did not authorize. + +By claiming a node, you opt-in to sending data from your Agent to Netdata Cloud via the [ACLK](/aclk/README.md). This +data is encrypted by TLS while it is in transit. We use the RSA keypair created during claiming to authenticate the +identity of the Agent when it connects to the Cloud. While the data does flow through Netdata Cloud servers on its way +from Agents to the browser, we do not store or log it. + +You can claim a node during the Cloud onboarding process, or after you created a Space by clicking on the **USER's +Space** dropdown, then **Manage claimed nodes**. + +There are two important notes regarding claiming: + +- _You can only claim any given node in a single Space_. You can, however, add that claimed node to multiple War Rooms + within that one Space. +- You must repeat the claiming process on every node you want to add to Netdata Cloud. + +## How to claim a node + +To claim a node, select which War Rooms you want to add this node to with the dropdown, then copy and paste the script +given by Cloud into your node's terminal. Hit **Enter**. + +```bash +sudo netdata-claim.sh -token=TOKEN -rooms=ROOM1,ROOM2 -url=https://app.netdata.cloud +``` + +The script should return `Agent was successfully claimed.`. If the claiming script returns errors, or if you don't see +the node in your Space after 60 seconds, see the [troubleshooting information](#troubleshooting). If you prefer not to +use root privileges via `sudo` to run the claiming script, see the next section. + +Repeat this process with every node you want to add to Cloud during onboarding. You can also add more nodes once you've +finished onboarding. + +### Claim an agent without root privileges + +If you don't want to run the claiming script with root privileges, you can discover which user is running the Agent, +switch to that user, and run the claiming script. + +Use `grep` to search your `netdata.conf` file, which is typically located at `/etc/netdata/netdata.conf`, for the `run +as user` setting. For example: + +```bash +grep "run as user" /etc/netdata/netdata.conf + # run as user = netdata +``` + +The default user is `netdata`. Yours may be different, so pay attention to the output from `grep`. Switch to that user +and run the claiming script. + +```bash +netdata-claim.sh -token=TOKEN -rooms=ROOM1,ROOM2 -url=https://app.netdata.cloud +``` + +Hit **Enter**. The script should return `Agent was successfully claimed.`. If the claiming script returns errors, or if +you don't see the node in your Space after 60 seconds, see the [troubleshooting information](#troubleshooting). + +### Claim an Agent running in Docker + +The claiming process works with Agents running inside of Docker containers. You can use `docker exec` to run the +claiming script on containers already running, or append the claiming script to `docker run` to create a new container +and immediately claim it. + +#### Running Agent containers + +Claim a _running Agent container_ by appending the script offered by Cloud to a `docker exec ...` command, replacing +`netdata` with the name of your running container: + +```bash +docker exec -it netdata netdata-claim.sh -token=TOKEN -rooms=ROOM1,ROOM2 -url=https://app.netdata.cloud +``` + +The script should return `Agent was successfully claimed.`. If the claiming script returns errors, or if +you don't see the node in your Space after 60 seconds, see the [troubleshooting information](#troubleshooting). + +#### New/ephemeral Agent containers + +Claim a newly-created container with `docker run ...`. + +In the example below, the last line calls the [daemon binary](/daemon/README.md), sets essential variables, and then +executes claiming using the information after `-W "claim... `. You should copy the relevant token, rooms, and URL from +Cloud. + +```bash +docker run -d --name=netdata \ + -p 19999:19999 \ + -v netdatalib:/var/lib/netdata \ + -v netdatacache:/var/cache/netdata \ + -v /etc/passwd:/host/etc/passwd:ro \ + -v /etc/group:/host/etc/group:ro \ + -v /proc:/host/proc:ro \ + -v /sys:/host/sys:ro \ + -v /etc/os-release:/host/etc/os-release:ro \ + --restart unless-stopped \ + --cap-add SYS_PTRACE \ + --security-opt apparmor=unconfined \ + netdata/netdata \ + -W set2 cloud global enabled true -W set2 cloud global "cloud base url" "https://app.netdata.cloud" -W "claim \ + -token=TOKEN \ + -rooms=ROOM1,ROOM2 \ + -url=https://app.netdata.cloud" +``` + +The container runs in detached mode, so you won't see any output. If the node does not appear in your Space, you can run +the following to find any error output and use that to guide your [troubleshooting](#troubleshooting). Replace `netdata` +with the name of your container if different. + +```bash +docker logs netdata 2>&1 | grep -E --line-buffered 'ACLK|claim|cloud' +``` + +### Claim a Kubernetes cluster's parent Netdata pod + +Read our [Kubernetes installation](/packaging/installer/methods/kubernetes.md#claim-a-kubernetes-clusters-parent-pod) +for details on claiming a parent Netdata pod. + +### Claim through a proxy + +A Space's administrator can claim a node through a SOCKS5 or HTTP(S) proxy. + +You should first configure the proxy in the `[cloud]` section of `netdata.conf`. The proxy settings you specify here +will also be used to tunnel the ACLK. The default `proxy` setting is `none`. + +```conf +[cloud] + proxy = none +``` + +The `proxy` setting can take one of the following values: + +- `none`: Do not use a proxy, even if the system configured otherwise. +- `env`: Try to read proxy settings from set environment variables `http_proxy`/`socks_proxy`. +- `socks5[h]://[user:pass@]host:ip`: The ACLK and claiming will use the specified SOCKS5 proxy. +- `http://[user:pass@]host:ip`: The ACLK and claiming will use the specified HTTP(S) proxy. + +For example, a SOCKS5 proxy setting may look like the following: + +```conf +[cloud] + proxy = socks5h://203.0.113.0:1080 # With an IP address + proxy = socks5h://proxy.example.com:1080 # With a URL +``` + +You can now move on to claiming. When you claim with the `netdata-claim.sh` script, add the `-proxy=` parameter and +append the same proxy setting you added to `netdata.conf`. + +```bash +sudo netdata-claim.sh -token=MYTOKEN1234567 -rooms=room1,room2 -url=https://app.netdata.cloud -proxy=socks5h://203.0.113.0:1080 +``` + +Hit **Enter**. The script should return `Agent was successfully claimed.`. If the claiming script returns errors, or if +you don't see the node in your Space after 60 seconds, see the [troubleshooting information](#troubleshooting). + +### Troubleshooting + +If you're having trouble claiming a node, this may be because the [ACLK](/aclk/README.md) cannot connect to Cloud. + +With the Netdata Agent running, visit `http://NODE:19999/api/v1/info` in your browser, replacing `NODE` with the IP +address or hostname of your Agent. The returned JSON contains four keys that will be helpful to diagnose any issues you +might be having with the ACLK or claiming process. + +```json + "cloud-enabled" + "cloud-available" + "agent-claimed" + "aclk-available" +``` + +Use these keys and the information below to troubleshoot the ACLK. + +#### bash: netdata-claim.sh: command not found + +If you run the claiming script and see a `command not found` error, you either installed Netdata in a non-standard +location or are using an unsupported package. If you installed Netdata in a non-standard path using the `--install` +option, you need to update your `$PATH` or run `netdata-claim.sh` using the full path. For example, if you installed +Netdata to `/opt/netdata`, use `/opt/netdata/bin/netdata-claim.sh` to run the claiming script. + +If you are using an unsupported package, such as a third-party `.deb`/`.rpm` package provided by your distribution, +please remove that package and reinstall using our [recommended kickstart +script](/docs/get/README.md#install-the-netdata-agent). + +#### Claiming on older distributions (Ubuntu 14.04, Debian 8, CentOS 6) + +If you're running an older Linux distribution or one that has reached EOL, such as Ubuntu 14.04 LTS, Debian 8, or CentOS +6, your Agent may not be able to securely connect to Netdata Cloud due to an outdated version of OpenSSL. These old +versions of OpenSSL cannot perform [hostname validation](https://wiki.openssl.org/index.php/Hostname_validation), which +helps securely encrypt SSL connections. + +We recommend you reinstall Netdata with a [static build](/packaging/installer/methods/kickstart-64.md), which uses an +up-to-date version of OpenSSL with hostname validation enabled. + +If you choose to continue using the outdated version of OpenSSL, your node will still connect to Netdata Cloud, albeit +with hostname verification disabled. Without verification, your Netdata Cloud connection could be vulnerable to +man-in-the-middle attacks. + +#### cloud-enabled is false + +If `cloud-enabled` is `false`, you probably ran the installer with `--disable-cloud` option. + +Additionally, check that the `enabled` setting in `var/lib/netdata/cloud.d/cloud.conf` is set to `true`: + +```conf +[global] + enabled = true +``` + +To fix this issue, reinstall Netdata using your [preferred method](/packaging/installer/README.md) and do not add the +`--disable-cloud` option. + +#### cloud-available is false + +If `cloud-available` is `false` after you verified Cloud is enabled in the previous step, the most likely issue is that +Cloud features failed to build during installation. + +If Cloud features fail to build, the installer continues and finishes the process without Cloud functionality as opposed +to failing the installation altogether. We do this to ensure the Agent will always finish installing. + +If you can't see an explicit error in the installer's output, you can run the installer with the `--require-cloud` +option. This option causes the installation to fail if Cloud functionality can't be built and enabled, and the +installer's output should give you more error details. + +You may see one of the following error messages during installation: + +- Failed to build libmosquitto. The install process will continue, but you will not be able to connect this node to + Netdata Cloud. +- Unable to fetch sources for libmosquitto. The install process will continue, but you will not be able to connect + this node to Netdata Cloud. +- Failed to build libwebsockets. The install process will continue, but you may not be able to connect this node to + Netdata Cloud. +- Unable to fetch sources for libwebsockets. The install process will continue, but you may not be able to connect + this node to Netdata Cloud. +- Could not find cmake, which is required to build libwebsockets. The install process will continue, but you may not + be able to connect this node to Netdata Cloud. +- Could not find cmake, which is required to build JSON-C. The install process will continue, but Netdata Cloud + support will be disabled. +- Failed to build JSON-C. Netdata Cloud support will be disabled. +- Unable to fetch sources for JSON-C. Netdata Cloud support will be disabled. + +One common cause of the installer failing to build Cloud features is not having one of the following dependencies on +your system: `cmake` and OpenSSL, including the `devel` package. + +You can also look for error messages in `/var/log/netdata/error.log`. Try one of the following two commands to search +for ACLK-related errors. + +```bash +less /var/log/netdata/error.log +grep -i ACLK /var/log/netdata/error.log +``` + +If the installer's output does not help you enable Cloud features, contact us by [creating an issue on +GitHub](https://github.com/netdata/netdata/issues/new?labels=bug%2C+needs+triage%2C+ACLK&template=bug_report.md&title=The+installer+failed+to+prepare+the+required+dependencies+for+Netdata+Cloud+functionality) +with details about your system and relevant output from `error.log`. + +#### agent-claimed is false + +You must [claim your node](#how-to-claim-a-node). + +#### aclk-available is false + +If `aclk-available` is `false` and all other keys are `true`, your Agent is having trouble connecting to the Cloud +through the ACLK. Please check your system's firewall. + +If your Agent needs to use a proxy to access the internet, you must [set up a proxy for +claiming](#claim-through-a-proxy). + +If you are certain firewall and proxy settings are not the issue, you should consult the Agent's `error.log` at +`/var/log/netdata/error.log` and contact us by [creating an issue on +GitHub](https://github.com/netdata/netdata/issues/new?labels=bug%2C+needs+triage%2C+ACLK&template=bug_report.md&title=ACLK-available-is-false) +with details about your system and relevant output from `error.log`. + +### Remove and reclaim a node + +To remove a node from your Space in Netdata Cloud, delete the `cloud.d/` directory in your Netdata library directory. + +```bash +cd /var/lib/netdata # Replace with your Netdata library directory, if not /var/lib/netdata/ +sudo rm -rf cloud.d/ +``` + +This node no longer has access to the credentials it was claimed with and cannot connect to Netdata Cloud via the ACLK. +You will still be able to see this node in your War Rooms in an **unreachable** state. + +If you want to reclaim this node into a different Space, you need to create a new identity by adding `-id=$(uuidgen)` to +the claiming script parameters. Make sure that you have the `uuidgen-runtime` package installed, as it is used to run the command `uuidgen`. For example, using the default claiming script: + +```bash +sudo netdata-claim.sh -token=TOKEN -rooms=ROOM1,ROOM2 -url=https://app.netdata.cloud -id=$(uuidgen) +``` + +The agent _must be restarted_ after this change. + +## Claiming reference + +In the sections below, you can find reference material for the claiming script, claiming via the Agent's command line +tool, and details about the files found in `cloud.d`. + +### The `cloud.conf` file + +This section defines how and whether your Agent connects to [Netdata Cloud](https://learn.netdata.cloud/docs/cloud/) +using the [ACLK](/aclk/README.md). + +| setting | default | info | +|:-------------- |:------------------------- |:-------------------------------------------------------------------------------------------------------------------------------------- | +| cloud base url | https://app.netdata.cloud | The URL for the Netdata Cloud web application. You should not change this. If you want to disable Cloud, change the `enabled` setting. | +| enabled | yes | The runtime option to disable the [Agent-Cloud link](/aclk/README.md) and prevent your Agent from connecting to Netdata Cloud. | + +### Claiming script + +A Space's administrator can claim an Agent by directly calling the `netdata-claim.sh` script either with root privileges +using `sudo`, or as the user running the Agent (typically `netdata`), and passing the following arguments: + +```sh +-token=TOKEN + where TOKEN is the Space's claiming token. +-rooms=ROOM1,ROOM2,... + where ROOMX is the War Room this node should be added to. This list is optional. +-url=URL_BASE + where URL_BASE is the Netdata Cloud endpoint base URL. By default, this is https://app.netdata.cloud. +-id=AGENT_ID + where AGENT_ID is the unique identifier of the Agent. This is the Agent's MACHINE_GUID by default. +-hostname=HOSTNAME + where HOSTNAME is the result of the hostname command by default. +-proxy=PROXY_URL + where PROXY_URL is the endpoint of a SOCKS5 proxy. +``` + +For example, the following command claims an Agent and adds it to rooms `room1` and `room2`: + +```sh +netdata-claim.sh -token=MYTOKEN1234567 -rooms=room1,room2 +``` + +You should then update the `netdata` service about the result with `netdatacli`: + +```sh +netdatacli reload-claiming-state +``` + +This reloads the Agent claiming state from disk. + +### Netdata Agent command line + +If a Netdata Agent is running, the Space's administrator can claim a node using the `netdata` service binary with +additional command line parameters: + +```sh +-W "claim -token=TOKEN -rooms=ROOM1,ROOM2" +``` + +For example: + +```sh +/usr/sbin/netdata -D -W "claim -token=MYTOKEN1234567 -rooms=room1,room2" +``` + +If need be, the user can override the Agent's defaults by providing additional arguments like those described +[here](#claiming-script). + +### Claiming directory + +Netdata stores the Agent's claiming-related state in the Netdata library directory under `cloud.d`. For a default +installation, this directory exists at `/var/lib/netdata/cloud.d`. The directory and its files should be owned by the +user that runs the Agent, which is typically the `netdata` user. + +The `cloud.d/token` file should contain the claiming-token and the `cloud.d/rooms` file should contain the list of War +Rooms you added that node to. + +The user can also put the Cloud endpoint's full certificate chain in `cloud.d/cloud_fullchain.pem` so that the Agent +can trust the endpoint if necessary. + +[![analytics](https://www.google-analytics.com/collect?v=1&aip=1&t=pageview&_s=1&ds=github&dr=https%3A%2F%2Fgithub.com%2Fnetdata%2Fnetdata&dl=https%3A%2F%2Fmy-netdata.io%2Fgithub%2Fclaim%2FREADME&_u=MAC~&cid=5792dfd7-8dc4-476b-af31-da2fdb9f93d2&tid=UA-64295674-3)](<>) diff --git a/claim/claim.c b/claim/claim.c new file mode 100644 index 00000000..bb931608 --- /dev/null +++ b/claim/claim.c @@ -0,0 +1,194 @@ +// SPDX-License-Identifier: GPL-3.0-or-later + +#include "claim.h" +#include "../registry/registry_internals.h" +#include "../aclk/legacy/aclk_common.h" + +char *claiming_pending_arguments = NULL; + +static char *claiming_errors[] = { + "Agent claimed successfully", // 0 + "Unknown argument", // 1 + "Problems with claiming working directory", // 2 + "Missing dependencies", // 3 + "Failure to connect to endpoint", // 4 + "The CLI didn't work", // 5 + "Wrong user", // 6 + "Unknown HTTP error message", // 7 + "invalid node id", // 8 + "invalid node name", // 9 + "invalid room id", // 10 + "invalid public key", // 11 + "token expired/token not found/invalid token", // 12 + "already claimed", // 13 + "processing claiming", // 14 + "Internal Server Error", // 15 + "Gateway Timeout", // 16 + "Service Unavailable", // 17 + "Agent Unique Id Not Readable" // 18 +}; + +/* Retrieve the claim id for the agent. + * Caller owns the string. +*/ +char *is_agent_claimed() +{ + char *result; + rrdhost_aclk_state_lock(localhost); + result = (localhost->aclk_state.claimed_id == NULL) ? NULL : strdupz(localhost->aclk_state.claimed_id); + rrdhost_aclk_state_unlock(localhost); + return result; +} + +#define CLAIMING_COMMAND_LENGTH 16384 +#define CLAIMING_PROXY_LENGTH CLAIMING_COMMAND_LENGTH/4 + +extern struct registry registry; + +/* rrd_init() and post_conf_load() must have been called before this function */ +void claim_agent(char *claiming_arguments) +{ + if (!netdata_cloud_setting) { + error("Refusing to claim agent -> cloud functionality has been disabled"); + return; + } + +#ifndef DISABLE_CLOUD + int exit_code; + pid_t command_pid; + char command_buffer[CLAIMING_COMMAND_LENGTH + 1]; + FILE *fp; + + // This is guaranteed to be set early in main via post_conf_load() + char *cloud_base_url = appconfig_get(&cloud_config, CONFIG_SECTION_GLOBAL, "cloud base url", NULL); + if (cloud_base_url == NULL) + fatal("Do not move the cloud base url out of post_conf_load!!"); + const char *proxy_str; + ACLK_PROXY_TYPE proxy_type; + char proxy_flag[CLAIMING_PROXY_LENGTH] = "-noproxy"; + + proxy_str = aclk_get_proxy(&proxy_type); + + if (proxy_type == PROXY_TYPE_SOCKS5 || proxy_type == PROXY_TYPE_HTTP) + snprintf(proxy_flag, CLAIMING_PROXY_LENGTH, "-proxy=\"%s\"", proxy_str); + + snprintfz(command_buffer, + CLAIMING_COMMAND_LENGTH, + "exec netdata-claim.sh %s -hostname=%s -id=%s -url=%s -noreload %s", + + proxy_flag, + netdata_configured_hostname, + localhost->machine_guid, + cloud_base_url, + claiming_arguments); + + info("Executing agent claiming command 'netdata-claim.sh'"); + fp = mypopen(command_buffer, &command_pid); + if(!fp) { + error("Cannot popen(\"%s\").", command_buffer); + return; + } + info("Waiting for claiming command to finish."); + while (fgets(command_buffer, CLAIMING_COMMAND_LENGTH, fp) != NULL) {;} + exit_code = mypclose(fp, command_pid); + info("Agent claiming command returned with code %d", exit_code); + if (0 == exit_code) { + load_claiming_state(); + return; + } + if (exit_code < 0) { + error("Agent claiming command failed to complete its run."); + return; + } + errno = 0; + unsigned maximum_known_exit_code = sizeof(claiming_errors) / sizeof(claiming_errors[0]) - 1; + + if ((unsigned)exit_code > maximum_known_exit_code) { + error("Agent failed to be claimed with an unknown error."); + return; + } + error("Agent failed to be claimed with the following error message:"); + error("\"%s\"", claiming_errors[exit_code]); +#else + UNUSED(claiming_arguments); + UNUSED(claiming_errors); +#endif +} + +#ifdef ENABLE_ACLK +extern int aclk_connected, aclk_kill_link, aclk_disable_runtime; +#endif + +/* Change the claimed state of the agent. + * + * This only happens when the user has explicitly requested it: + * - via the cli tool by reloading the claiming state + * - after spawning the claim because of a command-line argument + * If this happens with the ACLK active under an old claim then we MUST KILL THE LINK + */ +void load_claiming_state(void) +{ + // -------------------------------------------------------------------- + // Check if the cloud is enabled +#if defined( DISABLE_CLOUD ) || !defined( ENABLE_ACLK ) + netdata_cloud_setting = 0; +#else + uuid_t uuid; + rrdhost_aclk_state_lock(localhost); + if (localhost->aclk_state.claimed_id) { + freez(localhost->aclk_state.claimed_id); + localhost->aclk_state.claimed_id = NULL; + } + if (aclk_connected) + { + info("Agent was already connected to Cloud - forcing reconnection under new credentials"); + aclk_kill_link = 1; + } + aclk_disable_runtime = 0; + + // Propagate into aclk and registry. Be kind of atomic... + appconfig_get(&cloud_config, CONFIG_SECTION_GLOBAL, "cloud base url", DEFAULT_CLOUD_BASE_URL); + + char filename[FILENAME_MAX + 1]; + snprintfz(filename, FILENAME_MAX, "%s/cloud.d/claimed_id", netdata_configured_varlib_dir); + + long bytes_read; + char *claimed_id = read_by_filename(filename, &bytes_read); + if(claimed_id && uuid_parse(claimed_id, uuid)) { + error("claimed_id \"%s\" doesn't look like valid UUID", claimed_id); + freez(claimed_id); + claimed_id = NULL; + } + localhost->aclk_state.claimed_id = claimed_id; + rrdhost_aclk_state_unlock(localhost); + if (!claimed_id) { + info("Unable to load '%s', setting state to AGENT_UNCLAIMED", filename); + return; + } + + info("File '%s' was found. Setting state to AGENT_CLAIMED.", filename); + netdata_cloud_setting = appconfig_get_boolean(&cloud_config, CONFIG_SECTION_GLOBAL, "enabled", 1); +#endif +} + +struct config cloud_config = { .first_section = NULL, + .last_section = NULL, + .mutex = NETDATA_MUTEX_INITIALIZER, + .index = { .avl_tree = { .root = NULL, .compar = appconfig_section_compare }, + .rwlock = AVL_LOCK_INITIALIZER } }; + +void load_cloud_conf(int silent) +{ + char *filename; + errno = 0; + + int ret = 0; + + filename = strdupz_path_subpath(netdata_configured_varlib_dir, "cloud.d/cloud.conf"); + + ret = appconfig_load(&cloud_config, filename, 1, NULL); + if(!ret && !silent) { + info("CONFIG: cannot load cloud config '%s'. Running with internal defaults.", filename); + } + freez(filename); +} diff --git a/claim/claim.h b/claim/claim.h new file mode 100644 index 00000000..2fd8d3e9 --- /dev/null +++ b/claim/claim.h @@ -0,0 +1,16 @@ +// SPDX-License-Identifier: GPL-3.0-or-later + +#ifndef NETDATA_CLAIM_H +#define NETDATA_CLAIM_H 1 + +#include "../daemon/common.h" + +extern char *claiming_pending_arguments; +extern struct config cloud_config; + +void claim_agent(char *claiming_arguments); +char *is_agent_claimed(void); +void load_claiming_state(void); +void load_cloud_conf(int silent); + +#endif //NETDATA_CLAIM_H diff --git a/claim/netdata-claim.sh.in b/claim/netdata-claim.sh.in new file mode 100755 index 00000000..7ac91d7f --- /dev/null +++ b/claim/netdata-claim.sh.in @@ -0,0 +1,401 @@ +#!/usr/bin/env bash +# netdata +# real-time performance and health monitoring, done right! +# (C) 2017 Costa Tsaousis <costa@tsaousis.gr> +# SPDX-License-Identifier: GPL-3.0-or-later + +# Exit code: 0 - Success +# Exit code: 1 - Unknown argument +# Exit code: 2 - Problems with claiming working directory +# Exit code: 3 - Missing dependencies +# Exit code: 4 - Failure to connect to endpoint +# Exit code: 5 - The CLI didn't work +# Exit code: 6 - Wrong user +# Exit code: 7 - Unknown HTTP error message +# +# OK: Agent claimed successfully +# HTTP Status code: 204 +# Exit code: 0 +# +# Unknown HTTP error message +# HTTP Status code: 422 +# Exit code: 7 +ERROR_KEYS[7]="None" +ERROR_MESSAGES[7]="Unknown HTTP error message" + +# Error: The agent id is invalid; it does not fulfill the constraints +# HTTP Status code: 422 +# Exit code: 8 +ERROR_KEYS[8]="ErrInvalidNodeID" +ERROR_MESSAGES[8]="invalid node id" + +# Error: The agent hostname is invalid; it does not fulfill the constraints +# HTTP Status code: 422 +# Exit code: 9 +ERROR_KEYS[9]="ErrInvalidNodeName" +ERROR_MESSAGES[9]="invalid node name" + +# Error: At least one of the given rooms ids is invalid; it does not fulfill the constraints +# HTTP Status code: 422 +# Exit code: 10 +ERROR_KEYS[10]="ErrInvalidRoomID" +ERROR_MESSAGES[10]="invalid room id" + +# Error: Invalid public key; the public key is empty or not present +# HTTP Status code: 422 +# Exit code: 11 +ERROR_KEYS[11]="ErrInvalidPublicKey" +ERROR_MESSAGES[11]="invalid public key" +# +# Error: Expired, missing or invalid token +# HTTP Status code: 403 +# Exit code: 12 +ERROR_KEYS[12]="ErrForbidden" +ERROR_MESSAGES[12]="token expired/token not found/invalid token" + +# Error: Duplicate agent id; an agent with the same id is already registered in the cloud +# HTTP Status code: 409 +# Exit code: 13 +ERROR_KEYS[13]="ErrAlreadyClaimed" +ERROR_MESSAGES[13]="already claimed" + +# Error: The node claiming process is still in progress. +# HTTP Status code: 102 +# Exit code: 14 +ERROR_KEYS[14]="ErrProcessingClaim" +ERROR_MESSAGES[14]="processing claiming" + +# Error: Internal server error. Any other unexpected error (DB problems, etc.) +# HTTP Status code: 500 +# Exit code: 15 +ERROR_KEYS[15]="ErrInternalServerError" +ERROR_MESSAGES[15]="Internal Server Error" + +# Error: There was a timout processing the claim. +# HTTP Status code: 504 +# Exit code: 16 +ERROR_KEYS[16]="ErrGatewayTimeout" +ERROR_MESSAGES[16]="Gateway Timeout" + +# Error: The service cannot handle the claiming request at this time. +# HTTP Status code: 503 +# Exit code: 17 +ERROR_KEYS[17]="ErrServiceUnavailable" +ERROR_MESSAGES[17]="Service Unavailable" + +# Exit code: 18 - Agent unique id is not generated yet. + +get_config_value() { + conf_file="${1}" + section="${2}" + key_name="${3}" + config_result=$(@sbindir_POST@/netdatacli 2>/dev/null read-config "$conf_file|$section|$key_name"; exit $?) + # shellcheck disable=SC2181 + if [ "$?" != "0" ]; then + echo >&2 "cli failed, assume netdata is not running and query the on-disk config" + config_result=$(@sbindir_POST@/netdata 2>/dev/null -W get2 "$conf_file" "$section" "$key_name" unknown_default) + fi + echo "$config_result" +} +if command -v curl >/dev/null 2>&1 ; then + URLTOOL="curl" +elif command -v wget >/dev/null 2>&1 ; then + URLTOOL="wget" +else + echo >&2 "I need curl or wget to proceed, but neither is available on this system." + exit 3 +fi +if ! command -v openssl >/dev/null 2>&1 ; then + echo >&2 "I need openssl to proceed, but it is not available on this system." + exit 3 +fi + +# shellcheck disable=SC2050 +if [ "@enable_cloud_POST@" = "no" ]; then + echo >&2 "This agent was built with --disable-cloud and cannot be claimed" + exit 3 +fi +# shellcheck disable=SC2050 +if [ "@can_enable_aclk_POST@" != "yes" ]; then + echo >&2 "This agent was built without the dependencies for Cloud and cannot be claimed" + exit 3 +fi + +# ----------------------------------------------------------------------------- +# defaults to allow running this script by hand + +[ -z "${NETDATA_VARLIB_DIR}" ] && NETDATA_VARLIB_DIR="@varlibdir_POST@" +MACHINE_GUID_FILE="@registrydir_POST@/netdata.public.unique.id" +CLAIMING_DIR="${NETDATA_VARLIB_DIR}/cloud.d" +TOKEN="unknown" +URL_BASE=$(get_config_value cloud global "cloud base url") +[ -z "$URL_BASE" ] && URL_BASE="https://app.netdata.cloud" # Cover post-install with --dont-start +ID="unknown" +ROOMS="" +[ -z "$HOSTNAME" ] && HOSTNAME=$(hostname) +CLOUD_CERTIFICATE_FILE="${CLAIMING_DIR}/cloud_fullchain.pem" +VERBOSE=0 +INSECURE=0 +RELOAD=1 +NETDATA_USER=$(get_config_value netdata global "run as user") +[ -z "$EUID" ] && EUID="$(id -u)" + + +# get the MACHINE_GUID by default +if [ -r "${MACHINE_GUID_FILE}" ]; then + ID="$(cat "${MACHINE_GUID_FILE}")" + MGUID=$ID +else + echo >&2 "netdata.public.unique.id is not generated yet or not readable. Please run agent at least once before attempting to claim. Agent generates this file on first startup. If the ID is generated already make sure you have rights to read it (Filename: ${MACHINE_GUID_FILE})." + exit 18 +fi + +# get token from file +if [ -r "${CLAIMING_DIR}/token" ]; then + TOKEN="$(cat "${CLAIMING_DIR}/token")" +fi + +# get rooms from file +if [ -r "${CLAIMING_DIR}/rooms" ]; then + ROOMS="$(cat "${CLAIMING_DIR}/rooms")" +fi + +for arg in "$@" +do + case $arg in + -token=*) TOKEN=${arg:7} ;; + -url=*) URL_BASE=${arg:5} ;; + -id=*) ID=${arg:4} ;; + -rooms=*) ROOMS=${arg:7} ;; + -hostname=*) HOSTNAME=${arg:10} ;; + -verbose) VERBOSE=1 ;; + -insecure) INSECURE=1 ;; + -proxy=*) PROXY=${arg:7} ;; + -noproxy) NOPROXY=yes ;; + -noreload) RELOAD=0 ;; + -user=*) NETDATA_USER=${arg:6} ;; + *) echo >&2 "Unknown argument ${arg}" + exit 1 ;; + esac + shift 1 +done + +if [ "$EUID" != "0" ] && [ "$(whoami)" != "$NETDATA_USER" ]; then + echo >&2 "This script must be run by the $NETDATA_USER user account" + exit 6 +fi + +# if curl not installed give warning SOCKS can't be used +if [[ "${URLTOOL}" != "curl" && "${PROXY:0:5}" = socks ]] ; then + echo >&2 "wget doesn't support SOCKS. Please install curl or disable SOCKS proxy." + exit 1 +fi + +echo >&2 "Token: ****************" +echo >&2 "Base URL: $URL_BASE" +echo >&2 "Id: $ID" +echo >&2 "Rooms: $ROOMS" +echo >&2 "Hostname: $HOSTNAME" +echo >&2 "Proxy: $PROXY" +echo >&2 "Netdata user: $NETDATA_USER" + +# create the claiming directory for this user +if [ ! -d "${CLAIMING_DIR}" ] ; then + mkdir -p "${CLAIMING_DIR}" && chmod 0770 "${CLAIMING_DIR}" +# shellcheck disable=SC2181 + if [ $? -ne 0 ] ; then + echo >&2 "Failed to create claiming working directory ${CLAIMING_DIR}" + exit 2 + fi +fi +if [ ! -w "${CLAIMING_DIR}" ] ; then + echo >&2 "No write permission in claiming working directory ${CLAIMING_DIR}" + exit 2 +fi + +if [ ! -f "${CLAIMING_DIR}/private.pem" ] ; then + echo >&2 "Generating private/public key for the first time." + if ! openssl genrsa -out "${CLAIMING_DIR}/private.pem" 2048 ; then + echo >&2 "Failed to generate private/public key pair." + exit 2 + fi +fi +if [ ! -f "${CLAIMING_DIR}/public.pem" ] ; then + echo >&2 "Extracting public key from private key." + if ! openssl rsa -in "${CLAIMING_DIR}/private.pem" -outform PEM -pubout -out "${CLAIMING_DIR}/public.pem" ; then + echo >&2 "Failed to extract public key." + exit 2 + fi +fi + +TARGET_URL="${URL_BASE%/}/api/v1/spaces/nodes/${ID}" +# shellcheck disable=SC2002 +KEY=$(cat "${CLAIMING_DIR}/public.pem" | tr '\n' '!' | sed -e 's/!/\\n/g') +# shellcheck disable=SC2001 +[ -n "$ROOMS" ] && ROOMS=\"$(echo "$ROOMS" | sed s'/,/", "/g')\" + +cat > "${CLAIMING_DIR}/tmpin.txt" <<EMBED_JSON +{ + "node": { + "id": "$ID", + "hostname": "$HOSTNAME" + }, + "token": "$TOKEN", + "rooms" : [ $ROOMS ], + "publicKey" : "$KEY", + "mGUID" : "$MGUID" +} +EMBED_JSON + +if [ "${VERBOSE}" == 1 ] ; then + echo "Request to server:" + cat "${CLAIMING_DIR}/tmpin.txt" +fi + + +if [ "${URLTOOL}" = "curl" ] ; then + URLCOMMAND="curl --connect-timeout 5 --retry 0 -s -i -X PUT -d \"@${CLAIMING_DIR}/tmpin.txt\"" + if [ "${NOPROXY}" = "yes" ] ; then + URLCOMMAND="${URLCOMMAND} -x \"\"" + elif [ -n "${PROXY}" ] ; then + URLCOMMAND="${URLCOMMAND} -x \"${PROXY}\"" + fi +else + URLCOMMAND="wget -T 15 -O - -q --save-headers --content-on-error=on --method=PUT \ + --body-file=\"${CLAIMING_DIR}/tmpin.txt\"" + if [ "${NOPROXY}" = "yes" ] ; then + URLCOMMAND="${URLCOMMAND} --no-proxy" + elif [ "${PROXY:0:4}" = http ] ; then + URLCOMMAND="export http_proxy=${PROXY}; ${URLCOMMAND}" + fi +fi + +if [ "${INSECURE}" == 1 ] ; then + if [ "${URLTOOL}" = "curl" ] ; then + URLCOMMAND="${URLCOMMAND} --insecure" + else + URLCOMMAND="${URLCOMMAND} --no-check-certificate" + fi +fi + +if [ -r "${CLOUD_CERTIFICATE_FILE}" ] ; then + if [ "${URLTOOL}" = "curl" ] ; then + URLCOMMAND="${URLCOMMAND} --cacert \"${CLOUD_CERTIFICATE_FILE}\"" + else + URLCOMMAND="${URLCOMMAND} --ca-certificate \"${CLOUD_CERTIFICATE_FILE}\"" + fi +fi + +if [ "${VERBOSE}" == 1 ]; then + echo "${URLCOMMAND} \"${TARGET_URL}\"" +fi + +attempt_contact () { + eval "${URLCOMMAND} \"${TARGET_URL}\"" >"${CLAIMING_DIR}/tmpout.txt" + URLCOMMAND_EXIT_CODE=$? + if [ "${URLTOOL}" = "wget" ] && [ "${URLCOMMAND_EXIT_CODE}" -eq 8 ] ; then + # We consider the server issuing an error response a successful attempt at communicating + URLCOMMAND_EXIT_CODE=0 + fi + + # Check if URLCOMMAND connected and received reply + if [ "${URLCOMMAND_EXIT_CODE}" -ne 0 ] ; then + echo >&2 "Failed to connect to ${URL_BASE}, return code ${URLCOMMAND_EXIT_CODE}" + rm -f "${CLAIMING_DIR}/tmpout.txt" + return 4 + fi + + if [ "${VERBOSE}" == 1 ] ; then + echo "Response from server:" + cat "${CLAIMING_DIR}/tmpout.txt" + fi + + return 0 +} + +for i in {1..5} +do + if attempt_contact ; then + echo "Connection attempt $i successful" + break + fi + echo "Connection attempt $i failed. Retry in ${i}s." + if [ "$i" -eq 5 ] ; then + rm -f "${CLAIMING_DIR}/tmpin.txt" + exit 4 + fi + sleep "$i" +done + +rm -f "${CLAIMING_DIR}/tmpin.txt" + +ERROR_KEY=$(grep "\"errorMsgKey\":" "${CLAIMING_DIR}/tmpout.txt" | awk -F "errorMsgKey\":\"" '{print $2}' | awk -F "\"" '{print $1}') +case ${ERROR_KEY} in + "ErrInvalidNodeID") EXIT_CODE=8 ;; + "ErrInvalidNodeName") EXIT_CODE=9 ;; + "ErrInvalidRoomID") EXIT_CODE=10 ;; + "ErrInvalidPublicKey") EXIT_CODE=11 ;; + "ErrForbidden") EXIT_CODE=12 ;; + "ErrAlreadyClaimed") EXIT_CODE=13 ;; + "ErrProcessingClaim") EXIT_CODE=14 ;; + "ErrInternalServerError") EXIT_CODE=15 ;; + "ErrGatewayTimeout") EXIT_CODE=16 ;; + "ErrServiceUnavailable") EXIT_CODE=17 ;; + *) EXIT_CODE=7 ;; +esac + +HTTP_STATUS_CODE=$(grep "HTTP" "${CLAIMING_DIR}/tmpout.txt" | awk -F " " '{print $2}') +if [ "${HTTP_STATUS_CODE}" = "204" ] ; then + EXIT_CODE=0 +fi + +if [ "${HTTP_STATUS_CODE}" = "204" ] || [ "${ERROR_KEY}" = "ErrAlreadyClaimed" ] ; then + rm -f "${CLAIMING_DIR}/tmpout.txt" + echo -n "${ID}" >"${CLAIMING_DIR}/claimed_id" || (echo >&2 "Claiming failed"; set -e; exit 2) + rm -f "${CLAIMING_DIR}/token" || (echo >&2 "Claiming failed"; set -e; exit 2) + + # Rewrite the cloud.conf on the disk + cat > "$CLAIMING_DIR/cloud.conf" <<HERE_DOC +[global] + enabled = yes + cloud base url = $URL_BASE +HERE_DOC + if [ "$EUID" == "0" ]; then + chown -R "${NETDATA_USER}:${NETDATA_USER}" ${CLAIMING_DIR} || (echo >&2 "Claiming failed"; set -e; exit 2) + fi + if [ "${RELOAD}" == "0" ] ; then + exit $EXIT_CODE + fi + + if [ -z "${PROXY}" ]; then + PROXYMSG="" + else + PROXYMSG="You have attempted to claim this node through a proxy - please update your the proxy setting in your netdata.conf to ${PROXY}. " + fi + # Update cloud.conf in the agent memory + @sbindir_POST@/netdatacli write-config 'cloud|global|enabled|yes' && \ + @sbindir_POST@/netdatacli write-config "cloud|global|cloud base url|$URL_BASE" && \ + @sbindir_POST@/netdatacli reload-claiming-state && \ + if [ "${HTTP_STATUS_CODE}" = "204" ] ; then + echo >&2 "${PROXYMSG}Node was successfully claimed." + else + echo >&2 "The agent cloud base url is set to the url provided." + echo >&2 "The cloud may have different credentials already registered for this agent ID and it cannot be reclaimed under different credentials for security reasons. If you are unable to connect use -id=\$(uuidgen) to overwrite this agent ID with a fresh value if the original credentials cannot be restored." + echo >&2 "${PROXYMSG}Failed to claim node with the following error message:\"${ERROR_MESSAGES[$EXIT_CODE]}\"" + fi && exit $EXIT_CODE + + if [ "${ERROR_KEY}" = "ErrAlreadyClaimed" ] ; then + echo >&2 "The cloud may have different credentials already registered for this agent ID and it cannot be reclaimed under different credentials for security reasons. If you are unable to connect use -id=\$(uuidgen) to overwrite this agent ID with a fresh value if the original credentials cannot be restored." + echo >&2 "${PROXYMSG}Failed to claim node with the following error message:\"${ERROR_MESSAGES[$EXIT_CODE]}\"" + exit $EXIT_CODE + fi + echo >&2 "${PROXYMSG}The claim was successful but the agent could not be notified ($?)- it requires a restart to connect to the cloud." + exit 5 +fi + +echo >&2 "Failed to claim node with the following error message:\"${ERROR_MESSAGES[$EXIT_CODE]}\"" +if [ "${VERBOSE}" == 1 ]; then + echo >&2 "Error key was:\"${ERROR_KEYS[$EXIT_CODE]}\"" +fi +rm -f "${CLAIMING_DIR}/tmpout.txt" +exit $EXIT_CODE |