author | Daniel Baumann <daniel.baumann@progress-linux.org> | 2024-05-04 14:31:17 +0000 |
---|---|---|
committer | Daniel Baumann <daniel.baumann@progress-linux.org> | 2024-05-04 14:31:17 +0000 |
commit | 8020f71afd34d7696d7933659df2d763ab05542f (patch) | |
tree | 2fdf1b5447ffd8bdd61e702ca183e814afdcb4fc /claim | |
parent | Initial commit. (diff) | |
Adding upstream version 1.37.1. upstream/1.37.1 upstream
Signed-off-by: Daniel Baumann <daniel.baumann@progress-linux.org>
Diffstat (limited to 'claim')
-rw-r--r-- | claim/Makefile.am | 21 | ||||
-rw-r--r-- | claim/README.md | 609 | ||||
-rw-r--r-- | claim/claim.c | 208 | ||||
-rw-r--r-- | claim/claim.h | 16 | ||||
-rwxr-xr-x | claim/netdata-claim.sh.in | 442 |
5 files changed, 1296 insertions, 0 deletions
diff --git a/claim/Makefile.am b/claim/Makefile.am new file mode 100644 index 0000000..c838db9 --- /dev/null +++ b/claim/Makefile.am @@ -0,0 +1,21 @@ +# SPDX-License-Identifier: GPL-3.0-or-later + +AUTOMAKE_OPTIONS = subdir-objects +MAINTAINERCLEANFILES = $(srcdir)/Makefile.in + +CLEANFILES = \ + netdata-claim.sh \ + $(NULL) + +include $(top_srcdir)/build/subst.inc +SUFFIXES = .in + +sbin_SCRIPTS = \ + netdata-claim.sh \ + $(NULL) + +dist_noinst_DATA = \ + netdata-claim.sh.in \ + README.md \ + $(NULL) + diff --git a/claim/README.md b/claim/README.md new file mode 100644 index 0000000..3731d20 --- /dev/null +++ b/claim/README.md @@ -0,0 +1,609 @@ +<!-- +title: "Connect Agent to Cloud" +description: "Connecting a Netdata Agent, running on a distributed node, to Netdata Cloud securely via the encrypted Agent-Cloud link (ACLK)." +custom_edit_url: https://github.com/netdata/netdata/edit/master/claim/README.md +--> + +# Connect Agent to Cloud + +You can securely connect a Netdata Agent, running on a distributed node, to Netdata Cloud. A Space's +administrator creates a **claiming token**, which is used to add an Agent to their Space via the [Agent-Cloud link +(ACLK)](/aclk/README.md). + +Are you just starting out with Netdata Cloud? See our [get started with +Cloud](https://learn.netdata.cloud/docs/cloud/get-started) guide for a walkthrough of the process and simplified +instructions. + +When connecting an agent (also referred to as a node) to Netdata Cloud, you must complete a verification process that proves you have some level of authorization to manage the node itself. This verification is a security feature that helps prevent unauthorized users from seeing the data on your node. + +Only the administrators of a Space in Netdata Cloud can view the claiming token and accompanying script generated by +Netdata Cloud. 
+ +> The connection process ensures no third party can add your node, and then view your node's metrics, in a Cloud account, +> Space, or War Room that you did not authorize. + +By connecting a node, you opt-in to sending data from your Agent to Netdata Cloud via the [ACLK](/aclk/README.md). This +data is encrypted by TLS while it is in transit. We use the RSA keypair created during the connection process to authenticate the +identity of the Netdata Agent when it connects to the Cloud. While the data does flow through Netdata Cloud servers on its way +from Agents to the browser, we do not store or log it. + +You can connect a node during the Netdata Cloud onboarding process, or after you created a Space by clicking on **Connect +Nodes** in the [Spaces management area](https://learn.netdata.cloud/docs/cloud/spaces#manage-spaces). + +There are two important notes regarding connecting nodes: + +- _You can only connect any given node in a single Space_. You can, however, add that connected node to multiple War Rooms + within that one Space. +- You must repeat the connection process on every node you want to add to Netdata Cloud. + +## How to connect a node + +There will be three main flows from where you might want to connect a node to Netdata Cloud. +* when you are on an [ +War Room](#empty-war-room) and you want to connect your first node +* when you are at the [Manage Space](#manage-space-or-war-room) area and you select **Connect Nodes** to connect a node, coming from Manage Space or Manage War Room +* when you are on the [Nodes view page](https://learn.netdata.cloud/docs/cloud/visualize/nodes) and want to connect a node - this process falls into the [Manage Space](#manage-space-or-war-room) flow + +Please note that only the administrators of a Space in Netdata Cloud can view the claiming token and accompanying script, generated by Netdata Cloud, to trigger the connection process. 
+ +### Empty War Room + +Either at your first sign in or following ones, when you enter Netdata Cloud and are at a War Room that doesn’t have any node added to it, you will be able to: +* connect a new node to Netdata Cloud and add it to the War Room +* add a previously connected node to the War Room + +If your case is to connect a new node and add it to the War Room, you will need to tell us what environment the node is running on (Linux, Docker, macOS, Kubernetes) and then we will provide you with a script to initiate the connection process. You just will need to copy and paste it into your node's terminal. See one of the following sections depending on your case: +* [Linux](#connect-an-agent-running-in-linux) +* [Docker](#connect-an-agent-running-in-docker) +* [macOS](#connect-an-agent-running-in-macos) +* [Kubernetes](#connect-a-kubernetes-clusters-parent-netdata-pod) + +Repeat this process with every node you want to add to Netdata Cloud during onboarding. You can also add more nodes once you've +finished onboarding. + +### Manage Space or War Room + +To connect a node, select which War Rooms you want to add this node to with the dropdown, then copy and paste the script +given by Netdata Cloud into your node's terminal. + +When coming from [Nodes view page](https://learn.netdata.cloud/docs/cloud/visualize/nodes) the room parameter is already defined to current War Room. + +### Connect an agent running in Linux + +If you want to connect a node that is running on a Linux environment, the script that will be provided to you by Netdata Cloud is the [kickstart](/packaging/installer/README.md#automatic-one-line-installation-script) which will install the Netdata Agent on your node, if it isn't already installed, and connect the node to Netdata Cloud. 
It should be similar to: + +``` +wget -O /tmp/netdata-kickstart.sh https://my-netdata.io/kickstart.sh && sh /tmp/netdata-kickstart.sh --claim-token TOKEN --claim-rooms ROOM1,ROOM2 --claim-url https://api.netdata.cloud +``` +The script should return `Agent was successfully claimed.`. If the connecting to Netdata Cloud process returns errors, or if you don't see +the node in your Space after 60 seconds, see the [troubleshooting information](#troubleshooting). + +Please note that to run it you will either need to have root privileges or run it with the user that is running the agent, more details on the [Connect an agent without root privileges](#connect-an-agent-without-root-privileges) section. + +For more details on what are the extra parameters `claim-token`, `claim-rooms` and `claim-url` please refer to [Connect node to Netdata Cloud during installation](/packaging/installer/methods/kickstart.md#connect-node-to-netdata-cloud-during-installation). + +### Connect an agent without root privileges + +If you don't want to run the installation script to connect your nodes to Netdata Cloud with root privileges, you can discover which user is running the Agent, +switch to that user, and run the script. + +Use `grep` to search your `netdata.conf` file, which is typically located at `/etc/netdata/netdata.conf`, for the `run +as user` setting. For example: +To connect a node, select which War Rooms you want to add this node to with the dropdown, then copy and paste the script +given by Netdata Cloud into your node's terminal. + +```bash +grep "run as user" /etc/netdata/netdata.conf + # run as user = netdata +``` + +The default user is `netdata`. Yours may be different, so pay attention to the output from `grep`. Switch to that user +and run the script. 
+ +```bash +wget -O /tmp/netdata-kickstart.sh https://my-netdata.io/kickstart.sh && sh /tmp/netdata-kickstart.sh --claim-token TOKEN --claim-rooms ROOM1,ROOM2 --claim-url https://api.netdata.cloud +``` +### Connect an agent running in Docker + +To connect an instance of the Netdata Agent running inside of a Docker container, it is recommended that you follow +the instructions and use the commands provided either in the `Nodes` tab of an [empty War Room](#empty-war-room) on Netdata Cloud or +in the shelf that appears when you click **Connect Nodes** and select **Docker**. + +However, users can also claim a new node by claiming environment variables in the container to have it automatically +connected on startup or restart. + +For the connection process to work, the contents of `/var/lib/netdata` _must_ be preserved across container +restarts using a persistent volume. See our [recommended `docker run` and Docker Compose +examples](/packaging/docker/README.md#create-a-new-netdata-agent-container) for details. + +#### Known issues on older hosts with seccomp enabled + +The nodes running on the following hosts **cannot be claimed**: + +- `libseccomp` version less than v2.3.3. +- Docker version less than v18.04.0-ce. +- The kernel is configured with CONFIG_SECCOMP enabled. + +To check if your kernel supports `seccomp`: + +```cmd +# grep CONFIG_SECCOMP= /boot/config-$(uname -r) 2>/dev/null || zgrep CONFIG_SECCOMP /proc/config.gz 2>/dev/null +CONFIG_SECCOMP=y +``` + +To resolve the issue, do one of the following actions: + +- Update to a newer version of Docker and `libseccomp` (recommended). +- Create a custom profile and pass it for the container. +- Run [without the default seccomp profile](https://docs.docker.com/engine/security/seccomp/#run-without-the-default-seccomp-profile) (unsafe, not recommended). + +<details> +<summary>See how to create a custom profile</summary> + +1. 
Download the moby default seccomp profile and change `defaultAction` to `SCMP_ACT_TRACE` on line 2. + + ```cmd + sudo wget https://raw.githubusercontent.com/moby/moby/master/profiles/seccomp/default.json -O /etc/docker/seccomp.json + sudo sed -i '2s/SCMP_ACT_ERRNO/SCMP_ACT_TRACE/' /etc/docker/seccomp.json + ``` + +2. Specify the new policy for the container explicitly. + + - When using `docker run`: + + ```cmd + docker run -d --name=netdata \ + --security-opt=seccomp=/etc/docker/seccomp.json \ + ... + ``` + + - When using `docker-compose`: + + > :warning: The security_opt option is ignored when deploying a stack in swarm mode. + + ```yaml + version: '3' + services: + netdata: + security_opt: + - seccomp:/etc/docker/seccomp.json + ... + ``` + + - When using `docker stack deploy`: + + Change the default profile globally by adding `--seccomp-profile=/etc/docker/seccomp.json` to the options passed to + dockerd on startup. + +</details> + +#### Using environment variables + +The Netdata Docker container looks for the following environment variables on startup: + +- `NETDATA_CLAIM_TOKEN` +- `NETDATA_CLAIM_URL` +- `NETDATA_CLAIM_ROOMS` +- `NETDATA_CLAIM_PROXY` + +If the token and URL are specified in their corresponding variables _and_ the container is not already connected, +it will use these values to attempt to connect the container, automatically adding the node to the specified War +Rooms. If a proxy is specified, it will be used for the connection process and for connecting to Netdata Cloud. + +These variables can be specified using any mechanism supported by your container tooling for setting environment +variables inside containers. + +When using the `docker run` command, if you have an agent container already running, it is important to know that there will be a short period of downtime. This is due to the process of recreating the new agent container. 
+ +The command to connect a new node to Netdata Cloud is: + +```bash +docker run -d --name=netdata \ + -p 19999:19999 \ + -v netdataconfig:/etc/netdata \ + -v netdatalib:/var/lib/netdata \ + -v netdatacache:/var/cache/netdata \ + -v /etc/passwd:/host/etc/passwd:ro \ + -v /etc/group:/host/etc/group:ro \ + -v /proc:/host/proc:ro \ + -v /sys:/host/sys:ro \ + -v /etc/os-release:/host/etc/os-release:ro \ + --restart unless-stopped \ + --cap-add SYS_PTRACE \ + --security-opt apparmor=unconfined \ + -e NETDATA_CLAIM_TOKEN=TOKEN \ + -e NETDATA_CLAIM_URL="https://api.netdata.cloud" \ + -e NETDATA_CLAIM_ROOMS=ROOM1,ROOM2 \ + -e NETDATA_CLAIM_PROXY=PROXY \ + netdata/netdata +``` +>Note: This command is suggested for connecting a new container. Using this command for an existing container recreates the container, though data +and configuration of the old container may be preserved. If you are claiming an existing container that can not be recreated, +you can add the container by going to Netdata Cloud, clicking the **Nodes** tab, clicking **Connect Nodes**, selecting **Docker**, and following +the instructions and commands provided or by following the instructions in an [empty War Room](#empty-war-room). + +The output that would be seen from the connection process when using other methods will be present in the container logs. + +Using the environment variables like this to handle the connection process is the preferred method of connecting Docker containers +as it works in the widest variety of situations and simplifies configuration management. 
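The automatic connection described above only happens when both the token and the URL are set. As a minimal sketch, a pre-flight check along these lines could be run in the shell that launches the container. `claim_env_ready` is a hypothetical helper name used for illustration; it is not part of Netdata or its Docker image.

```bash
# Hypothetical pre-flight helper (not shipped with Netdata): verifies that the
# two variables the container needs for automatic claiming are non-empty.
claim_env_ready() {
    _missing=0
    for _name in NETDATA_CLAIM_TOKEN NETDATA_CLAIM_URL; do
        # Indirectly read the variable whose name is stored in $_name.
        eval "_value=\${$_name:-}"
        if [ -z "$_value" ]; then
            echo "missing: $_name" >&2
            _missing=1
        fi
    done
    return "$_missing"
}
```

For example, `claim_env_ready && docker run ...` would only start the container once both values are present in the calling shell.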
+ +#### Using Docker compose + +If you use `docker compose`, you can copy the config provided by Netdata Cloud, which should be same as the one below: + +```bash +version: '3' +services: + netdata: + image: netdata/netdata + container_name: netdata + hostname: example.com # set to fqdn of host + ports: + - 19999:19999 + restart: unless-stopped + cap_add: + - SYS_PTRACE + security_opt: + - apparmor:unconfined + volumes: + - netdataconfig:/etc/netdata + - netdatalib:/var/lib/netdata + - netdatacache:/var/cache/netdata + - /etc/passwd:/host/etc/passwd:ro + - /etc/group:/host/etc/group:ro + - /proc:/host/proc:ro + - /sys:/host/sys:ro + - /etc/os-release:/host/etc/os-release:ro + environment: + - NETDATA_CLAIM_TOKEN=TOKEN + - NETDATA_CLAIM_URL="https://api.netdata.cloud" + - NETDATA_CLAIM_ROOMS=ROOM1,ROOM2 + +volumes: + netdataconfig: + netdatalib: + netdatacache: +``` + +Then run the following command in the same directory as the `docker-compose.yml` file to start the container. + +```bash +docker-compose up -d +``` +#### Using docker exec + +Connect a _running Netdata Agent container_, where you don't want to recreate the existing container, append the script offered by Netdata Cloud to a `docker exec ...` command, replacing +`netdata` with the name of your running container: + +```bash +docker exec -it netdata netdata-claim.sh -token=TOKEN -rooms=ROOM1,ROOM2 -url=https://api.netdata.cloud +``` +The values for `ROOM1,ROOM2` can be found by by going to Netdata Cloud, clicking the **Nodes** tab, clicking **Connect Nodes**, selecting **Docker**, and copying the `rooms=` value in the command provided. + +The script should return `Agent was successfully claimed.`. If the connection process returns errors, or if +you don't see the node in your Space after 60 seconds, see the [troubleshooting information](#troubleshooting). 
+ +### Connect an agent running in macOS + +To connect a node that is running on a macOS environment the script that will be provided to you by Netdata Cloud is the [kickstart](/packaging/installer/methods/macos.md#install-netdata-with-our-automatic-one-line-installation-script) which will install the Netdata Agent on your node, if it isn't already installed, and connect the node to Netdata Cloud. It should be similar to: + +```bash +curl https://my-netdata.io/kickstart.sh > /tmp/netdata-kickstart.sh && sh /tmp/netdata-kickstart.sh --install /usr/local/ --claim-token TOKEN --claim-rooms ROOM1,ROOM2 --claim-url https://api.netdata.cloud +``` +The script should return `Agent was successfully claimed.`. If the connecting to Netdata Cloud process returns errors, or if you don't see +the node in your Space after 60 seconds, see the [troubleshooting information](#troubleshooting). + +### Connect a Kubernetes cluster's parent Netdata pod + +Read our [Kubernetes installation](/packaging/installer/methods/kubernetes.md#connect-your-kubernetes-cluster-to-netdata-cloud) +for details on connecting a parent Netdata pod. + +### Connect through a proxy + +A Space's administrator can connect a node through HTTP(S) proxy. + +You should first configure the proxy in the `[cloud]` section of `netdata.conf`. The proxy settings you specify here +will also be used to tunnel the ACLK. The default `proxy` setting is `none`. + +```conf +[cloud] + proxy = none +``` + +The `proxy` setting can take one of the following values: + +- `none`: Do not use a proxy, even if the system configured otherwise. +- `env`: Try to read proxy settings from set environment variables `http_proxy`. +- `http://[user:pass@]host:ip`: The ACLK and connection process will use the specified HTTP(S) proxy. 
+ +For example, a HTTP proxy setting may look like the following: + +```conf +[cloud] + proxy = http://203.0.113.0:1080 # With an IP address + proxy = http://proxy.example.com:1080 # With a URL +``` + +You can now move on to connecting. When you connect with the [kickstart](/packaging/installer/README.md#automatic-one-line-installation-script) script, add the `--claim-proxy=` parameter and +append the same proxy setting you added to `netdata.conf`. + +```bash +wget -O /tmp/netdata-kickstart.sh https://my-netdata.io/kickstart.sh && sh /tmp/netdata-kickstart.sh --claim-token TOKEN --claim-rooms ROOM1,ROOM2 --claim-url https://api.netdata.cloud --claim-proxy http://[user:pass@]host:ip +``` + +Hit **Enter**. The script should return `Agent was successfully claimed.`. If the connecting to Netdata Cloud process returns errors, or if +you don't see the node in your Space after 60 seconds, see the [troubleshooting information](#troubleshooting). + +### Troubleshooting + +If you're having trouble connecting a node, this may be because the [ACLK](/aclk/README.md) cannot connect to Cloud. + +With the Netdata Agent running, visit `http://NODE:19999/api/v1/info` in your browser, replacing `NODE` with the IP +address or hostname of your Agent. The returned JSON contains four keys that will be helpful to diagnose any issues you +might be having with the ACLK or connection process. + +```json + "cloud-enabled" + "cloud-available" + "agent-claimed" + "aclk-available" +``` + +On Netdata agent version `1.32` (`netdata -v` to find your version) and newer, the `netdata -W aclk-state` command can be used to get some diagnostic information about ACLK. Sample output: + +``` +ACLK Available: Yes +ACLK Implementation: Next Generation +New Cloud Protocol Support: Yes +Claimed: Yes +Claimed Id: 53aa76c2-8af5-448f-849a-b16872cc4ba1 +Online: Yes +Used Cloud Protocol: New +``` + +Use these keys and the information below to troubleshoot the ACLK. 
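The four keys listed above can also be checked from a shell instead of a browser. The following is a rough sketch only: `check_cloud_keys` is a hypothetical helper, and it assumes the keys appear in the JSON as plain booleans (for example `"cloud-enabled": true`).

```bash
# Hypothetical helper (not part of Netdata): reads the JSON returned by
# /api/v1/info on stdin and reports each of the four Cloud-related keys.
# Assumes the keys appear as plain booleans, e.g. "cloud-enabled": true.
check_cloud_keys() {
    _info="$(cat)"
    _status=0
    for _key in cloud-enabled cloud-available agent-claimed aclk-available; do
        if printf '%s\n' "$_info" | grep -Eq "\"$_key\"[[:space:]]*:[[:space:]]*true"; then
            echo "$_key: true"
        else
            echo "$_key: not true"
            _status=1
        fi
    done
    # Non-zero when at least one key is missing or not true.
    return "$_status"
}
```

A possible invocation, with `NODE` replaced by the IP address or hostname of your Agent, is `curl -s "http://NODE:19999/api/v1/info" | check_cloud_keys`. The first key reported as `not true` points to the matching troubleshooting section.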
+ +#### kickstart: unsupported Netdata installation + +If you run the kickstart script and get the following error `Existing install appears to be handled manually or through the system package manager.` you most probably installed Netdata using an unsupported package. + +If you are using an unsupported package, such as a third-party `.deb`/`.rpm` package provided by your distribution, +please remove that package and reinstall using our [recommended kickstart +script](/docs/get-started.mdx#install-on-linux-with-one-line-installer). + +#### kickstart: Failed to write new machine GUID + +If you run the kickstart script but don't have privileges required for the actions done on the connecting to Netdata Cloud process you will get the following error: + +```bash +Failed to write new machine GUID. Please make sure you have rights to write to /var/lib/netdata/registry/netdata.public.unique.id. +``` +For a successful execution you will need to run the script with root privileges or run it with the user that is running the agent, more details on the [Connect an agent without root privileges](#connect-an-agent-without-root-privileges) section. + +#### bash: netdata-claim.sh: command not found + +If you run the claiming script and see a `command not found` error, you either installed Netdata in a non-standard +location or are using an unsupported package. If you installed Netdata in a non-standard path using the `--install` +option, you need to update your `$PATH` or run `netdata-claim.sh` using the full path. For example, if you installed +Netdata to `/opt/netdata`, use `/opt/netdata/bin/netdata-claim.sh` to run the claiming script. + +If you are using an unsupported package, such as a third-party `.deb`/`.rpm` package provided by your distribution, +please remove that package and reinstall using our [recommended kickstart +script](/docs/get-started.mdx#install-on-linux-with-one-line-installer). 
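To tell a `$PATH` problem apart from a missing script, a small lookup like the one below may help. It is a sketch under assumptions: `find_claim_script` is a hypothetical helper, and the install prefixes passed to it (such as `/opt/netdata`) are whatever you used with `--install`.

```bash
# Hypothetical helper (not part of Netdata): prints the first usable path to
# netdata-claim.sh, checking $PATH first and then any install prefixes given
# as arguments (for example /opt/netdata).
find_claim_script() {
    if command -v netdata-claim.sh >/dev/null 2>&1; then
        command -v netdata-claim.sh
        return 0
    fi
    for _prefix in "$@"; do
        if [ -x "$_prefix/bin/netdata-claim.sh" ]; then
            echo "$_prefix/bin/netdata-claim.sh"
            return 0
        fi
    done
    # Nothing found on $PATH or under the given prefixes.
    return 1
}
```

For instance, `find_claim_script /opt/netdata` prints the full path to use when the script exists under that prefix but is not on `$PATH`, and returns a non-zero status when it cannot be found at all.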
+ +#### Connecting on older distributions (Ubuntu 14.04, Debian 8, CentOS 6) + +If you're running an older Linux distribution or one that has reached EOL, such as Ubuntu 14.04 LTS, Debian 8, or CentOS +6, your Agent may not be able to securely connect to Netdata Cloud due to an outdated version of OpenSSL. These old +versions of OpenSSL cannot perform [hostname validation](https://wiki.openssl.org/index.php/Hostname_validation), which +helps securely encrypt SSL connections. + +We recommend you reinstall Netdata with a [static build](/packaging/installer/methods/kickstart.md#static-builds), which uses an +up-to-date version of OpenSSL with hostname validation enabled. + +If you choose to continue using the outdated version of OpenSSL, your node will still connect to Netdata Cloud, albeit +with hostname verification disabled. Without verification, your Netdata Cloud connection could be vulnerable to +man-in-the-middle attacks. + +#### cloud-enabled is false + +If `cloud-enabled` is `false`, you probably ran the installer with `--disable-cloud` option. + +Additionally, check that the `enabled` setting in `var/lib/netdata/cloud.d/cloud.conf` is set to `true`: + +```conf +[global] + enabled = true +``` + +To fix this issue, reinstall Netdata using your [preferred method](/packaging/installer/README.md) and do not add the +`--disable-cloud` option. + +#### cloud-available is false / ACLK Available: No + +If `cloud-available` is `false` after you verified Cloud is enabled in the previous step, the most likely issue is that +Cloud features failed to build during installation. + +If Cloud features fail to build, the installer continues and finishes the process without Cloud functionality as opposed +to failing the installation altogether. We do this to ensure the Agent will always finish installing. + +If you can't see an explicit error in the installer's output, you can run the installer with the `--require-cloud` +option. 
This option causes the installation to fail if Cloud functionality can't be built and enabled, and the +installer's output should give you more error details. + +You may see one of the following error messages during installation: + +- Failed to build libmosquitto. The install process will continue, but you will not be able to connect this node to + Netdata Cloud. +- Unable to fetch sources for libmosquitto. The install process will continue, but you will not be able to connect + this node to Netdata Cloud. +- Failed to build libwebsockets. The install process will continue, but you may not be able to connect this node to + Netdata Cloud. +- Unable to fetch sources for libwebsockets. The install process will continue, but you may not be able to connect + this node to Netdata Cloud. +- Could not find cmake, which is required to build libwebsockets. The install process will continue, but you may not + be able to connect this node to Netdata Cloud. +- Could not find cmake, which is required to build JSON-C. The install process will continue, but Netdata Cloud + support will be disabled. +- Failed to build JSON-C. Netdata Cloud support will be disabled. +- Unable to fetch sources for JSON-C. Netdata Cloud support will be disabled. + +One common cause of the installer failing to build Cloud features is not having one of the following dependencies on +your system: `cmake`, `json-c` and `OpenSSL`, including corresponding `devel` packages. + +You can also look for error messages in `/var/log/netdata/error.log`. Try one of the following two commands to search +for ACLK-related errors. 
+ +```bash +less /var/log/netdata/error.log +grep -i ACLK /var/log/netdata/error.log +``` + +If the installer's output does not help you enable Cloud features, contact us by [creating an issue on +GitHub](https://github.com/netdata/netdata/issues/new?assignees=&labels=bug%2Cneeds+triage&template=BUG_REPORT.yml&title=The+installer+failed+to+prepare+the+required+dependencies+for+Netdata+Cloud+functionality) +with details about your system and relevant output from `error.log`. + +#### agent-claimed is false / Claimed: No + +You must [connect your node](#how-to-connect-a-node). + +#### aclk-available is false / Online: No + +If `aclk-available` is `false` and all other keys are `true`, your Agent is having trouble connecting to the Cloud +through the ACLK. Please check your system's firewall. + +If your Agent needs to use a proxy to access the internet, you must [set up a proxy for +connecting](#connect-through-a-proxy). + +If you are certain firewall and proxy settings are not the issue, you should consult the Agent's `error.log` at +`/var/log/netdata/error.log` and contact us by [creating an issue on +GitHub](https://github.com/netdata/netdata/issues/new?assignees=&labels=bug%2Cneeds+triage&template=BUG_REPORT.yml&title=ACLK-available-is-false) +with details about your system and relevant output from `error.log`. + +### Remove and reconnect a node + +To remove a node from your Space in Netdata Cloud, delete the `cloud.d/` directory in your Netdata library directory. + +```bash +cd /var/lib/netdata # Replace with your Netdata library directory, if not /var/lib/netdata/ +sudo rm -rf cloud.d/ +``` + +This node no longer has access to the credentials it was used when connecting to Netdata Cloud via the ACLK. +You will still be able to see this node in your War Rooms in an **unreachable** state. + +If you want to reconnect this node, you need to: +1. Ensure that the `/var/lib/netdata/cloud.d` directory doesn't exist. 
In some installations, the path is `/opt/netdata/var/lib/netdata/cloud.d`. +2. Stop the agent. +3. Ensure that the `uuidgen-runtime` package is installed. Run ```echo "$(uuidgen)"``` and validate you get back a UUID. +4. Copy the kickstart.sh command to add a node from your space and add to the end of it `--claim-id "$(uuidgen)"`. Run the command and look for the message `Node was successfully claimed.` +5. Start the agent + + +## Connecting reference +In the sections below, you can find reference material for the kickstart script, claiming script, connecting via the Agent's command line +tool, and details about the files found in `cloud.d`. + +### The `cloud.conf` file + +This section defines how and whether your Agent connects to [Netdata Cloud](https://learn.netdata.cloud/docs/cloud/) +using the [ACLK](/aclk/README.md). + +| setting | default | info | +|:-------------- |:------------------------- |:-------------------------------------------------------------------------------------------------------------------------------------- | +| cloud base url | https://api.netdata.cloud | The URL for the Netdata Cloud web application. You should not change this. If you want to disable Cloud, change the `enabled` setting. | +| enabled | yes | The runtime option to disable the [Agent-Cloud link](/aclk/README.md) and prevent your Agent from connecting to Netdata Cloud. | + +### kickstart script + +The best way to install Netdata and connect your nodes to Netdata Cloud is with our automatic one-line installation script, [kickstart](/packaging/installer/README.md#automatic-one-line-installation-script). This script will install the Netdata Agent, in case it isn't already installed, and connect your node to Netdata Cloud. 
+ +This works with: +* most Linux distributions, see [Netdata's platform support policy](/packaging/PLATFORM_SUPPORT.md) +* macOS + +For details on how to run this script please check [How to connect a node](#how-to-connect-a-node) and choose your environment. + +In case Netdata Agent is already installed and you run this script to connect a node to Netdata Cloud it will not upgrade your agent automatically. If you also want to upgrade the Agent installation you'll need to run the script again without the connection options. + +Our suggestion is to first run kickstart to upgrade your agent by running the command below and the run the [How to connect a node] +(#how-to-connect-a-node). + +**Linux** + +```bash +wget -O /tmp/netdata-kickstart.sh https://my-netdata.io/kickstart.sh && sh /tmp/netdata-kickstart.sh +``` + +**macOS** + +```bash +curl https://my-netdata.io/kickstart.sh > /tmp/netdata-kickstart.sh && sh /tmp/netdata-kickstart.sh --install /usr/local/ +``` +### Claiming script + +A Space's administrator can also connect an Agent by directly calling the `netdata-claim.sh` script either with root privileges +using `sudo`, or as the user running the Agent (typically `netdata`), and passing the following arguments: + +```sh +-token=TOKEN + where TOKEN is the Space's claiming token. +-rooms=ROOM1,ROOM2,... + where ROOMX is the War Room this node should be added to. This list is optional. +-url=URL_BASE + where URL_BASE is the Netdata Cloud endpoint base URL. By default, this is https://api.netdata.cloud. +-id=AGENT_ID + where AGENT_ID is the unique identifier of the Agent. This is the Agent's MACHINE_GUID by default. +-hostname=HOSTNAME + where HOSTNAME is the result of the hostname command by default. +-proxy=PROXY_URL + where PROXY_URL is the endpoint of a HTTP or HTTPS proxy. 
+``` + +For example, the following command connects an Agent and adds it to rooms `room1` and `room2`: + +```sh +netdata-claim.sh -token=MYTOKEN1234567 -rooms=room1,room2 +``` + +You should then update the `netdata` service about the result with `netdatacli`: + +```sh +netdatacli reload-claiming-state +``` + +This reloads the Agent connection state from disk. + +Our recommendation is to trigger the connection process using the [kickstart](/packaging/installer/README.md#automatic-one-line-installation-script) whenever possible. + +### Netdata Agent command line + +If a Netdata Agent is running, the Space's administrator can connect a node using the `netdata` service binary with +additional command line parameters: + +```sh +-W "claim -token=TOKEN -rooms=ROOM1,ROOM2" +``` + +For example: + +```sh +/usr/sbin/netdata -D -W "claim -token=MYTOKEN1234567 -rooms=room1,room2" +``` + +If need be, the user can override the Agent's defaults by providing additional arguments like those described +[here](#claiming-script). + +### Connection directory + +Netdata stores the Agent's connection-related state in the Netdata library directory under `cloud.d`. For a default +installation, this directory exists at `/var/lib/netdata/cloud.d`. The directory and its files should be owned by the +user that runs the Agent, which is typically the `netdata` user. + +The `cloud.d/token` file should contain the claiming-token and the `cloud.d/rooms` file should contain the list of War +Rooms you added that node to. + +The user can also put the Cloud endpoint's full certificate chain in `cloud.d/cloud_fullchain.pem` so that the Agent +can trust the endpoint if necessary. 
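As an illustration of the layout described above, the sketch below reports which of the mentioned files exist in a `cloud.d` directory. `inspect_cloud_dir` is a hypothetical helper, not a Netdata command, and the default path assumes a default installation.

```bash
# Hypothetical helper (not part of Netdata): lists which of the files
# mentioned above exist in a cloud.d directory. Defaults to the path of a
# default installation; pass another path as the first argument if needed.
inspect_cloud_dir() {
    _dir="${1:-/var/lib/netdata/cloud.d}"
    if [ ! -d "$_dir" ]; then
        echo "$_dir: no such directory"
        return 1
    fi
    # cloud_fullchain.pem is optional, so "absent" is not an error for it.
    for _file in token rooms cloud_fullchain.pem; do
        if [ -e "$_dir/$_file" ]; then
            echo "$_file: present"
        else
            echo "$_file: absent"
        fi
    done
}
```

On installations that use a different prefix, pass the directory explicitly, for example `inspect_cloud_dir /opt/netdata/var/lib/netdata/cloud.d`.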
+ + diff --git a/claim/claim.c b/claim/claim.c new file mode 100644 index 0000000..d997fc8 --- /dev/null +++ b/claim/claim.c @@ -0,0 +1,208 @@ +// SPDX-License-Identifier: GPL-3.0-or-later + +#include "claim.h" +#include "registry/registry_internals.h" +#include "aclk/aclk.h" +#include "aclk/aclk_proxy.h" + +char *claiming_pending_arguments = NULL; + +static char *claiming_errors[] = { + "Agent claimed successfully", // 0 + "Unknown argument", // 1 + "Problems with claiming working directory", // 2 + "Missing dependencies", // 3 + "Failure to connect to endpoint", // 4 + "The CLI didn't work", // 5 + "Wrong user", // 6 + "Unknown HTTP error message", // 7 + "invalid node id", // 8 + "invalid node name", // 9 + "invalid room id", // 10 + "invalid public key", // 11 + "token expired/token not found/invalid token", // 12 + "already claimed", // 13 + "processing claiming", // 14 + "Internal Server Error", // 15 + "Gateway Timeout", // 16 + "Service Unavailable", // 17 + "Agent Unique Id Not Readable" // 18 +}; + +/* Retrieve the claim id for the agent. + * Caller owns the string. +*/ +char *get_agent_claimid() +{ + char *result; + rrdhost_aclk_state_lock(localhost); + result = (localhost->aclk_state.claimed_id == NULL) ? 
NULL : strdupz(localhost->aclk_state.claimed_id); + rrdhost_aclk_state_unlock(localhost); + return result; +} + +#define CLAIMING_COMMAND_LENGTH 16384 +#define CLAIMING_PROXY_LENGTH CLAIMING_COMMAND_LENGTH/4 + +extern struct registry registry; + +/* rrd_init() and post_conf_load() must have been called before this function */ +void claim_agent(char *claiming_arguments) +{ + if (!netdata_cloud_setting) { + error("Refusing to claim agent -> cloud functionality has been disabled"); + return; + } + +#ifndef DISABLE_CLOUD + int exit_code; + pid_t command_pid; + char command_buffer[CLAIMING_COMMAND_LENGTH + 1]; + FILE *fp_child_output, *fp_child_input; + + // This is guaranteed to be set early in main via post_conf_load() + char *cloud_base_url = appconfig_get(&cloud_config, CONFIG_SECTION_GLOBAL, "cloud base url", NULL); + if (cloud_base_url == NULL) + fatal("Do not move the cloud base url out of post_conf_load!!"); + const char *proxy_str; + ACLK_PROXY_TYPE proxy_type; + char proxy_flag[CLAIMING_PROXY_LENGTH] = "-noproxy"; + + proxy_str = aclk_get_proxy(&proxy_type); + + if (proxy_type == PROXY_TYPE_SOCKS5 || proxy_type == PROXY_TYPE_HTTP) + snprintf(proxy_flag, CLAIMING_PROXY_LENGTH, "-proxy=\"%s\"", proxy_str); + + snprintfz(command_buffer, + CLAIMING_COMMAND_LENGTH, + "exec netdata-claim.sh %s -hostname=%s -id=%s -url=%s -noreload %s", + + proxy_flag, + netdata_configured_hostname, + localhost->machine_guid, + cloud_base_url, + claiming_arguments); + + info("Executing agent claiming command 'netdata-claim.sh'"); + fp_child_output = netdata_popen(command_buffer, &command_pid, &fp_child_input); + if(!fp_child_output) { + error("Cannot popen(\"%s\").", command_buffer); + return; + } + info("Waiting for claiming command to finish."); + while (fgets(command_buffer, CLAIMING_COMMAND_LENGTH, fp_child_output) != NULL) {;} + exit_code = netdata_pclose(fp_child_input, fp_child_output, command_pid); + info("Agent claiming command returned with code %d", exit_code); + if (0 == 
exit_code) { + load_claiming_state(); + return; + } + if (exit_code < 0) { + error("Agent claiming command failed to complete its run."); + return; + } + errno = 0; + unsigned maximum_known_exit_code = sizeof(claiming_errors) / sizeof(claiming_errors[0]) - 1; + + if ((unsigned)exit_code > maximum_known_exit_code) { + error("Agent failed to be claimed with an unknown error."); + return; + } + error("Agent failed to be claimed with the following error message:"); + error("\"%s\"", claiming_errors[exit_code]); +#else + UNUSED(claiming_arguments); + UNUSED(claiming_errors); +#endif +} + +#ifdef ENABLE_ACLK +extern int aclk_connected, aclk_kill_link, aclk_disable_runtime; +#endif + +/* Change the claimed state of the agent. + * + * This only happens when the user has explicitly requested it: + * - via the cli tool by reloading the claiming state + * - after spawning the claim because of a command-line argument + * If this happens with the ACLK active under an old claim then we MUST KILL THE LINK + */ +void load_claiming_state(void) +{ + // -------------------------------------------------------------------- + // Check if the cloud is enabled +#if defined( DISABLE_CLOUD ) || !defined( ENABLE_ACLK ) + netdata_cloud_setting = 0; +#else + uuid_t uuid; + + // Propagate into aclk and registry. Be kind of atomic... 
+ appconfig_get(&cloud_config, CONFIG_SECTION_GLOBAL, "cloud base url", DEFAULT_CLOUD_BASE_URL); + + rrdhost_aclk_state_lock(localhost); + if (localhost->aclk_state.claimed_id) { + if (aclk_connected) + localhost->aclk_state.prev_claimed_id = strdupz(localhost->aclk_state.claimed_id); + freez(localhost->aclk_state.claimed_id); + localhost->aclk_state.claimed_id = NULL; + } + if (aclk_connected) + { + info("Agent was already connected to Cloud - forcing reconnection under new credentials"); + aclk_kill_link = 1; + } + aclk_disable_runtime = 0; + + char filename[FILENAME_MAX + 1]; + snprintfz(filename, FILENAME_MAX, "%s/cloud.d/claimed_id", netdata_configured_varlib_dir); + + long bytes_read; + char *claimed_id = read_by_filename(filename, &bytes_read); + if(claimed_id && uuid_parse(claimed_id, uuid)) { + error("claimed_id \"%s\" doesn't look like valid UUID", claimed_id); + freez(claimed_id); + claimed_id = NULL; + } + + if(claimed_id) { + localhost->aclk_state.claimed_id = mallocz(UUID_STR_LEN); + uuid_unparse_lower(uuid, localhost->aclk_state.claimed_id); + } + + invalidate_node_instances(&localhost->host_uuid, claimed_id ? &uuid : NULL); + metaqueue_store_claim_id(&localhost->host_uuid, claimed_id ? &uuid : NULL); + + rrdhost_aclk_state_unlock(localhost); + if (!claimed_id) { + info("Unable to load '%s', setting state to AGENT_UNCLAIMED", filename); + return; + } + + freez(claimed_id); + + info("File '%s' was found. 
Setting state to AGENT_CLAIMED.", filename); + netdata_cloud_setting = appconfig_get_boolean(&cloud_config, CONFIG_SECTION_GLOBAL, "enabled", 1); +#endif +} + +struct config cloud_config = { .first_section = NULL, + .last_section = NULL, + .mutex = NETDATA_MUTEX_INITIALIZER, + .index = { .avl_tree = { .root = NULL, .compar = appconfig_section_compare }, + .rwlock = AVL_LOCK_INITIALIZER } }; + +void load_cloud_conf(int silent) +{ + char *filename; + errno = 0; + + int ret = 0; + + filename = strdupz_path_subpath(netdata_configured_varlib_dir, "cloud.d/cloud.conf"); + + ret = appconfig_load(&cloud_config, filename, 1, NULL); + if(!ret && !silent) { + info("CONFIG: cannot load cloud config '%s'. Running with internal defaults.", filename); + } + freez(filename); +} diff --git a/claim/claim.h b/claim/claim.h new file mode 100644 index 0000000..fc76037 --- /dev/null +++ b/claim/claim.h @@ -0,0 +1,16 @@ +// SPDX-License-Identifier: GPL-3.0-or-later + +#ifndef NETDATA_CLAIM_H +#define NETDATA_CLAIM_H 1 + +#include "daemon/common.h" + +extern char *claiming_pending_arguments; +extern struct config cloud_config; + +void claim_agent(char *claiming_arguments); +char *get_agent_claimid(void); +void load_claiming_state(void); +void load_cloud_conf(int silent); + +#endif //NETDATA_CLAIM_H diff --git a/claim/netdata-claim.sh.in b/claim/netdata-claim.sh.in new file mode 100755 index 0000000..f87fbc2 --- /dev/null +++ b/claim/netdata-claim.sh.in @@ -0,0 +1,442 @@ +#!/usr/bin/env bash +# netdata +# real-time performance and health monitoring, done right! 
+# (C) 2017 Costa Tsaousis <costa@tsaousis.gr> +# SPDX-License-Identifier: GPL-3.0-or-later + +# Exit code: 0 - Success +# Exit code: 1 - Unknown argument +# Exit code: 2 - Problems with claiming working directory +# Exit code: 3 - Missing dependencies +# Exit code: 4 - Failure to connect to endpoint +# Exit code: 5 - The CLI didn't work +# Exit code: 6 - Wrong user +# Exit code: 7 - Unknown HTTP error message +# +# OK: Agent claimed successfully +# HTTP Status code: 204 +# Exit code: 0 +# +# Unknown HTTP error message +# HTTP Status code: 422 +# Exit code: 7 +ERROR_KEYS[7]="None" +ERROR_MESSAGES[7]="Unknown HTTP error message" + +# Error: The agent id is invalid; it does not fulfill the constraints +# HTTP Status code: 422 +# Exit code: 8 +ERROR_KEYS[8]="ErrInvalidNodeID" +ERROR_MESSAGES[8]="invalid node id" + +# Error: The agent hostname is invalid; it does not fulfill the constraints +# HTTP Status code: 422 +# Exit code: 9 +ERROR_KEYS[9]="ErrInvalidNodeName" +ERROR_MESSAGES[9]="invalid node name" + +# Error: At least one of the given rooms ids is invalid; it does not fulfill the constraints +# HTTP Status code: 422 +# Exit code: 10 +ERROR_KEYS[10]="ErrInvalidRoomID" +ERROR_MESSAGES[10]="invalid room id" + +# Error: Invalid public key; the public key is empty or not present +# HTTP Status code: 422 +# Exit code: 11 +ERROR_KEYS[11]="ErrInvalidPublicKey" +ERROR_MESSAGES[11]="invalid public key" +# +# Error: Expired, missing or invalid token +# HTTP Status code: 403 +# Exit code: 12 +ERROR_KEYS[12]="ErrForbidden" +ERROR_MESSAGES[12]="token expired/token not found/invalid token" + +# Error: Duplicate agent id; an agent with the same id is already registered in the cloud +# HTTP Status code: 409 +# Exit code: 13 +ERROR_KEYS[13]="ErrAlreadyClaimed" +ERROR_MESSAGES[13]="already claimed" + +# Error: The node claiming process is still in progress. 
+# HTTP Status code: 102 +# Exit code: 14 +ERROR_KEYS[14]="ErrProcessingClaim" +ERROR_MESSAGES[14]="processing claiming" + +# Error: Internal server error. Any other unexpected error (DB problems, etc.) +# HTTP Status code: 500 +# Exit code: 15 +ERROR_KEYS[15]="ErrInternalServerError" +ERROR_MESSAGES[15]="Internal Server Error" + +# Error: There was a timeout processing the claim. +# HTTP Status code: 504 +# Exit code: 16 +ERROR_KEYS[16]="ErrGatewayTimeout" +ERROR_MESSAGES[16]="Gateway Timeout" + +# Error: The service cannot handle the claiming request at this time. +# HTTP Status code: 503 +# Exit code: 17 +ERROR_KEYS[17]="ErrServiceUnavailable" +ERROR_MESSAGES[17]="Service Unavailable" + +# Exit code: 18 - Agent unique id is not generated yet. + +NETDATA_RUNNING=1 + +get_config_value() { + conf_file="${1}" + section="${2}" + key_name="${3}" + if [ "${NETDATA_RUNNING}" -eq 1 ]; then + config_result=$(@sbindir_POST@/netdatacli 2>/dev/null read-config "$conf_file|$section|$key_name"; exit $?) + result="$?" + if [ "${result}" -ne 0 ]; then + echo >&2 "Unable to communicate with Netdata daemon, querying config from disk instead." + NETDATA_RUNNING=0 + fi + fi + if [ "${NETDATA_RUNNING}" -eq 0 ]; then + config_result=$(@sbindir_POST@/netdata 2>/dev/null -W get2 "$conf_file" "$section" "$key_name" unknown_default) + fi + echo "$config_result" +} +if command -v curl >/dev/null 2>&1 ; then + URLTOOL="curl" +elif command -v wget >/dev/null 2>&1 ; then + URLTOOL="wget" +else + echo >&2 "I need curl or wget to proceed, but neither is available on this system." + exit 3 +fi +if ! command -v openssl >/dev/null 2>&1 ; then + echo >&2 "I need openssl to proceed, but it is not available on this system." 
+ exit 3 +fi + +# shellcheck disable=SC2050 +if [ "@enable_cloud_POST@" = "no" ]; then + echo >&2 "This agent was built with --disable-cloud and cannot be claimed" + exit 3 +fi +# shellcheck disable=SC2050 +if [ "@enable_aclk_POST@" != "yes" ]; then + echo >&2 "This agent was built without the dependencies for Cloud and cannot be claimed" + exit 3 +fi + +# ----------------------------------------------------------------------------- +# defaults to allow running this script by hand + +[ -z "${NETDATA_VARLIB_DIR}" ] && NETDATA_VARLIB_DIR="@varlibdir_POST@" +MACHINE_GUID_FILE="@registrydir_POST@/netdata.public.unique.id" +CLAIMING_DIR="${NETDATA_VARLIB_DIR}/cloud.d" +TOKEN="unknown" +URL_BASE=$(get_config_value cloud global "cloud base url") +[ -z "$URL_BASE" ] && URL_BASE="https://api.netdata.cloud" # Cover post-install with --dont-start +ID="unknown" +ROOMS="" +[ -z "$HOSTNAME" ] && HOSTNAME=$(hostname) +CLOUD_CERTIFICATE_FILE="${CLAIMING_DIR}/cloud_fullchain.pem" +VERBOSE=0 +INSECURE=0 +RELOAD=1 +NETDATA_USER=$(get_config_value netdata global "run as user") +[ -z "$EUID" ] && EUID="$(id -u)" + + +gen_id() { + local id + + if command -v uuidgen > /dev/null 2>&1; then + id="$(uuidgen | tr '[:upper:]' '[:lower:]')" + elif [ -r /proc/sys/kernel/random/uuid ]; then + id="$(cat /proc/sys/kernel/random/uuid)" + else + echo >&2 "Unable to generate machine ID." + exit 18 + fi + + if [ "${id}" = "8a795b0c-2311-11e6-8563-000c295076a6" ] || [ "${id}" = "4aed1458-1c3e-11e6-a53f-000c290fc8f5" ]; then + gen_id + else + echo "${id}" + fi +} + +# get the MACHINE_GUID by default +if [ -r "${MACHINE_GUID_FILE}" ]; then + ID="$(cat "${MACHINE_GUID_FILE}")" + MGUID=$ID +elif [ -f "${MACHINE_GUID_FILE}" ]; then + echo >&2 "netdata.public.unique.id is not readable. Please make sure you have rights to read it (Filename: ${MACHINE_GUID_FILE})." 
+ exit 18 +else + if mkdir -p "${MACHINE_GUID_FILE%/*}" && /bin/echo -n "$(gen_id)" > "${MACHINE_GUID_FILE}"; then + ID="$(cat "${MACHINE_GUID_FILE}")" + MGUID=$ID + else + echo >&2 "Failed to write new machine GUID. Please make sure you have rights to write to ${MACHINE_GUID_FILE}." + exit 18 + fi +fi + +# get token from file +if [ -r "${CLAIMING_DIR}/token" ]; then + TOKEN="$(cat "${CLAIMING_DIR}/token")" +fi + +# get rooms from file +if [ -r "${CLAIMING_DIR}/rooms" ]; then + ROOMS="$(cat "${CLAIMING_DIR}/rooms")" +fi + +for arg in "$@" +do + case $arg in + -token=*) TOKEN=${arg:7} ;; + -url=*) [ -n "${arg:5}" ] && URL_BASE=${arg:5} ;; + -id=*) ID=$(echo "${arg:4}" | tr '[:upper:]' '[:lower:]');; + -rooms=*) ROOMS=${arg:7} ;; + -hostname=*) HOSTNAME=${arg:10} ;; + -verbose) VERBOSE=1 ;; + -insecure) INSECURE=1 ;; + -proxy=*) PROXY=${arg:7} ;; + -noproxy) NOPROXY=yes ;; + -noreload) RELOAD=0 ;; + -user=*) NETDATA_USER=${arg:6} ;; + -daemon-not-running) NETDATA_RUNNING=0 ;; + *) echo >&2 "Unknown argument ${arg}" + exit 1 ;; + esac + shift 1 +done + +if [ "$EUID" != "0" ] && [ "$(whoami)" != "$NETDATA_USER" ]; then + echo >&2 "This script must be run by the $NETDATA_USER user account" + exit 6 +fi + +# if curl not installed give warning SOCKS can't be used +if [[ "${URLTOOL}" != "curl" && "${PROXY:0:5}" = socks ]] ; then + echo >&2 "wget doesn't support SOCKS. Please install curl or disable SOCKS proxy." + exit 1 +fi + +echo >&2 "Token: ****************" +echo >&2 "Base URL: $URL_BASE" +echo >&2 "Id: $ID" +echo >&2 "Rooms: $ROOMS" +echo >&2 "Hostname: $HOSTNAME" +echo >&2 "Proxy: $PROXY" +echo >&2 "Netdata user: $NETDATA_USER" + +# create the claiming directory for this user +if [ ! -d "${CLAIMING_DIR}" ] ; then + mkdir -p "${CLAIMING_DIR}" && chmod 0770 "${CLAIMING_DIR}" +# shellcheck disable=SC2181 + if [ $? -ne 0 ] ; then + echo >&2 "Failed to create claiming working directory ${CLAIMING_DIR}" + exit 2 + fi +fi +if [ ! 
-w "${CLAIMING_DIR}" ] ; then + echo >&2 "No write permission in claiming working directory ${CLAIMING_DIR}" + exit 2 +fi + +if [ ! -f "${CLAIMING_DIR}/private.pem" ] ; then + echo >&2 "Generating private/public key for the first time." + if ! openssl genrsa -out "${CLAIMING_DIR}/private.pem" 2048 ; then + echo >&2 "Failed to generate private/public key pair." + exit 2 + fi +fi +if [ ! -f "${CLAIMING_DIR}/public.pem" ] ; then + echo >&2 "Extracting public key from private key." + if ! openssl rsa -in "${CLAIMING_DIR}/private.pem" -outform PEM -pubout -out "${CLAIMING_DIR}/public.pem" ; then + echo >&2 "Failed to extract public key." + exit 2 + fi +fi + +TARGET_URL="${URL_BASE%/}/api/v1/spaces/nodes/${ID}" +# shellcheck disable=SC2002 +KEY=$(cat "${CLAIMING_DIR}/public.pem" | tr '\n' '!' | sed -e 's/!/\\n/g') +# shellcheck disable=SC2001 +[ -n "$ROOMS" ] && ROOMS=\"$(echo "$ROOMS" | sed s'/,/", "/g')\" + +cat > "${CLAIMING_DIR}/tmpin.txt" <<EMBED_JSON +{ + "node": { + "id": "$ID", + "hostname": "$HOSTNAME" + }, + "token": "$TOKEN", + "rooms" : [ $ROOMS ], + "publicKey" : "$KEY", + "mGUID" : "$MGUID" +} +EMBED_JSON + +if [ "${VERBOSE}" == 1 ] ; then + echo "Request to server:" + cat "${CLAIMING_DIR}/tmpin.txt" +fi + + +if [ "${URLTOOL}" = "curl" ] ; then + URLCOMMAND="curl --connect-timeout 30 --retry 0 -s -i -X PUT -d \"@${CLAIMING_DIR}/tmpin.txt\"" + if [ "${NOPROXY}" = "yes" ] ; then + URLCOMMAND="${URLCOMMAND} -x \"\"" + elif [ -n "${PROXY}" ] ; then + URLCOMMAND="${URLCOMMAND} -x \"${PROXY}\"" + fi +else + URLCOMMAND="wget -T 15 -O - -q --server-response --content-on-error=on --method=PUT \ + --body-file=\"${CLAIMING_DIR}/tmpin.txt\"" + if [ "${NOPROXY}" = "yes" ] ; then + URLCOMMAND="${URLCOMMAND} --no-proxy" + elif [ "${PROXY:0:4}" = http ] ; then + URLCOMMAND="export http_proxy=${PROXY}; ${URLCOMMAND}" + fi +fi + +if [ "${INSECURE}" == 1 ] ; then + if [ "${URLTOOL}" = "curl" ] ; then + URLCOMMAND="${URLCOMMAND} --insecure" + else + URLCOMMAND="${URLCOMMAND} 
--no-check-certificate" + fi +fi + +if [ -r "${CLOUD_CERTIFICATE_FILE}" ] ; then + if [ "${URLTOOL}" = "curl" ] ; then + URLCOMMAND="${URLCOMMAND} --cacert \"${CLOUD_CERTIFICATE_FILE}\"" + else + URLCOMMAND="${URLCOMMAND} --ca-certificate \"${CLOUD_CERTIFICATE_FILE}\"" + fi +fi + +if [ "${VERBOSE}" == 1 ]; then + echo "${URLCOMMAND} \"${TARGET_URL}\"" +fi + +attempt_contact () { + if [ "${URLTOOL}" = "curl" ] ; then + eval "${URLCOMMAND} \"${TARGET_URL}\"" >"${CLAIMING_DIR}/tmpout.txt" + else + eval "${URLCOMMAND} \"${TARGET_URL}\"" >"${CLAIMING_DIR}/tmpout.txt" 2>&1 + fi + URLCOMMAND_EXIT_CODE=$? + if [ "${URLTOOL}" = "wget" ] && [ "${URLCOMMAND_EXIT_CODE}" -eq 8 ] ; then + # We consider the server issuing an error response a successful attempt at communicating + URLCOMMAND_EXIT_CODE=0 + fi + + # Check if URLCOMMAND connected and received reply + if [ "${URLCOMMAND_EXIT_CODE}" -ne 0 ] ; then + echo >&2 "Failed to connect to ${URL_BASE}, return code ${URLCOMMAND_EXIT_CODE}" + rm -f "${CLAIMING_DIR}/tmpout.txt" + return 4 + fi + + if [ "${VERBOSE}" == 1 ] ; then + echo "Response from server:" + cat "${CLAIMING_DIR}/tmpout.txt" + fi + + return 0 +} + +for i in {1..3} +do + if attempt_contact ; then + echo "Connection attempt $i successful" + break + fi + echo "Connection attempt $i failed. Retry in ${i}s." 
+ if [ "$i" -eq 3 ] ; then + rm -f "${CLAIMING_DIR}/tmpin.txt" + exit 4 + fi + sleep "$i" +done + +rm -f "${CLAIMING_DIR}/tmpin.txt" + +ERROR_KEY=$(grep "\"errorMsgKey\":" "${CLAIMING_DIR}/tmpout.txt" | awk -F "errorMsgKey\":\"" '{print $2}' | awk -F "\"" '{print $1}') +case ${ERROR_KEY} in + "ErrInvalidNodeID") EXIT_CODE=8 ;; + "ErrInvalidNodeName") EXIT_CODE=9 ;; + "ErrInvalidRoomID") EXIT_CODE=10 ;; + "ErrInvalidPublicKey") EXIT_CODE=11 ;; + "ErrForbidden") EXIT_CODE=12 ;; + "ErrAlreadyClaimed") EXIT_CODE=13 ;; + "ErrProcessingClaim") EXIT_CODE=14 ;; + "ErrInternalServerError") EXIT_CODE=15 ;; + "ErrGatewayTimeout") EXIT_CODE=16 ;; + "ErrServiceUnavailable") EXIT_CODE=17 ;; + *) EXIT_CODE=7 ;; +esac + +HTTP_STATUS_CODE=$(grep "HTTP" "${CLAIMING_DIR}/tmpout.txt" | tail -1 | awk -F " " '{print $2}') +if [ "${HTTP_STATUS_CODE}" = "204" ] ; then + EXIT_CODE=0 +fi + +if [ "${HTTP_STATUS_CODE}" = "204" ] || [ "${ERROR_KEY}" = "ErrAlreadyClaimed" ] ; then + rm -f "${CLAIMING_DIR}/tmpout.txt" + if [ "${HTTP_STATUS_CODE}" = "204" ] ; then + echo -n "${ID}" >"${CLAIMING_DIR}/claimed_id" || (echo >&2 "Claiming failed"; set -e; exit 2) + fi + rm -f "${CLAIMING_DIR}/token" || (echo >&2 "Claiming failed"; set -e; exit 2) + + # Rewrite the cloud.conf on the disk + cat > "$CLAIMING_DIR/cloud.conf" <<HERE_DOC +[global] + enabled = yes + cloud base url = $URL_BASE +HERE_DOC + if [ "$EUID" == "0" ]; then + chown -R "${NETDATA_USER}:${NETDATA_USER}" ${CLAIMING_DIR} || (echo >&2 "Claiming failed"; set -e; exit 2) + fi + if [ "${RELOAD}" == "0" ] ; then + exit $EXIT_CODE + fi + + if [ -z "${PROXY}" ]; then + PROXYMSG="" + else + PROXYMSG="You have attempted to claim this node through a proxy - please update the proxy setting in your netdata.conf to ${PROXY}. 
" + fi + # Update cloud.conf in the agent memory + @sbindir_POST@/netdatacli write-config 'cloud|global|enabled|yes' && \ + @sbindir_POST@/netdatacli write-config "cloud|global|cloud base url|$URL_BASE" && \ + @sbindir_POST@/netdatacli reload-claiming-state && \ + if [ "${HTTP_STATUS_CODE}" = "204" ] ; then + echo >&2 "${PROXYMSG}Node was successfully claimed." + else + echo >&2 "The agent cloud base url is set to the url provided." + echo >&2 "The cloud may have different credentials already registered for this agent ID and it cannot be reclaimed under different credentials for security reasons. If you are unable to connect use -id=\$(uuidgen) to overwrite this agent ID with a fresh value if the original credentials cannot be restored." + echo >&2 "${PROXYMSG}Failed to claim node with the following error message:\"${ERROR_MESSAGES[$EXIT_CODE]}\"" + fi && exit $EXIT_CODE + + if [ "${ERROR_KEY}" = "ErrAlreadyClaimed" ] ; then + echo >&2 "The cloud may have different credentials already registered for this agent ID and it cannot be reclaimed under different credentials for security reasons. If you are unable to connect use -id=\$(uuidgen) to overwrite this agent ID with a fresh value if the original credentials cannot be restored." + echo >&2 "${PROXYMSG}Failed to claim node with the following error message:\"${ERROR_MESSAGES[$EXIT_CODE]}\"" + exit $EXIT_CODE + fi + echo >&2 "${PROXYMSG}The claim was successful but the agent could not be notified ($?)- it requires a restart to connect to the cloud." + [ "$NETDATA_RUNNING" -eq 0 ] && exit 0 || exit 5 +fi + +echo >&2 "Failed to claim node with the following error message:\"${ERROR_MESSAGES[$EXIT_CODE]}\"" +if [ "${VERBOSE}" == 1 ]; then + echo >&2 "Error key was:\"${ERROR_KEYS[$EXIT_CODE]}\"" +fi +rm -f "${CLAIMING_DIR}/tmpout.txt" +exit $EXIT_CODE |