author | Daniel Baumann <daniel.baumann@progress-linux.org> | 2024-04-27 11:08:07 +0000 |
---|---|---|
committer | Daniel Baumann <daniel.baumann@progress-linux.org> | 2024-04-27 11:08:07 +0000 |
commit | c69cb8cc094cc916adbc516b09e944cd3d137c01 (patch) | |
tree | f2878ec41fb6d0e3613906c6722fc02b934eeb80 /registry/README.md | |
parent | Initial commit. (diff) | |
Adding upstream version 1.29.3. (upstream/1.29.3, upstream)
Signed-off-by: Daniel Baumann <daniel.baumann@progress-linux.org>
Diffstat (limited to 'registry/README.md')
-rw-r--r-- | registry/README.md | 203 |
1 files changed, 203 insertions, 0 deletions
diff --git a/registry/README.md b/registry/README.md
new file mode 100644
index 0000000..968292c
--- /dev/null
+++ b/registry/README.md
@@ -0,0 +1,203 @@

<!--
title: "Registry"
description: "Netdata utilizes a central registry of machines/person GUIDs, URLs, and opt-in account information to provide unified cross-server dashboards."
custom_edit_url: https://github.com/netdata/netdata/edit/master/registry/README.md
-->

# Registry

Netdata provides distributed monitoring.

Traditional monitoring solutions centralize all the data to provide unified dashboards across all servers. Before
Netdata, this was the standard practice. However, it has a few issues:

1. Due to the resources required, the number of metrics collected is limited.
2. For the same reason, the data collection frequency is not that high: at best once every 10 or 15 seconds, at worst
   every 5 or 10 minutes.
3. The central monitoring solution needs dedicated resources, thus becoming "another bottleneck" in the whole
   ecosystem. It also requires maintenance, administration, etc.
4. Most centralized monitoring solutions are usually only good for presenting _statistics of past performance_ (i.e.
   they cannot be used for real-time performance troubleshooting).

Netdata follows a different approach:

1. Data collection happens per second.
2. Thousands of metrics per server are collected.
3. Data do not leave the server where they are collected.
4. Netdata servers do not talk to each other.
5. Your browser connects all the Netdata servers.

Using Netdata, your monitoring infrastructure is embedded on each server, significantly limiting the need for
additional resources. Netdata is blazingly fast, very resource efficient and utilizes server resources that already
exist and are spare (on each server). This allows **scaling out** the monitoring infrastructure.

However, the Netdata approach introduces a few new issues that need to be addressed, one being **the list of Netdata
installations we have**, i.e. the URLs our Netdata servers are listening on.

To solve this, Netdata utilizes a **central registry**. This registry, together with certain browser features, allows
Netdata to provide unified cross-server dashboards. For example, when you jump from server to server using the node
menu, several session settings (like the currently viewed charts, the current zoom and pan operations on the charts,
etc.) are propagated to the new server, so that the new dashboard will come with exactly the same view.

## What data does the registry store?

The registry keeps track of 4 entities:

1. **machines**: i.e. the Netdata installations (a random GUID generated by each Netdata the first time it starts; we
   call this **machine_guid**)

   For each Netdata installation (each `machine_guid`) the registry keeps track of the different URLs it has been
   accessed through.

2. **persons**: i.e. the web browsers accessing the Netdata installations (a random GUID generated by the registry the
   first time it sees a new web browser; we call this **person_guid**)

   For each person, the registry keeps track of the Netdata installations it has accessed and their URLs.

3. **URLs** of Netdata installations (as seen by the web browsers)

   For each URL, the registry keeps the URL and nothing more. Each URL is linked to _persons_ and _machines_. The only
   way to find a URL is to know its **machine_guid** or to have a **person_guid** that is linked to it.

4. **accounts**: i.e. the information used to sign in via one of the available sign-in methods. Depending on the
   method, this may include an email, or an email and a profile picture or avatar.

For _persons_/_accounts_ and _machines_, the registry keeps links to _URLs_, each link with 2 timestamps (first time
seen, last time seen) and a counter (number of times it has been seen). _machines_, _persons_ and timestamps are stored
in the Netdata registry regardless of whether you sign in or not.

## Who talks to the registry?

Your web browser **only**! If sending this information is against your policies, you can [run your own
registry](#run-your-own-registry).

Your Netdata servers do not talk to the registry. This is a UML diagram of its operation:

![registry](https://cloud.githubusercontent.com/assets/2662304/19448565/11a70632-94ab-11e6-9d80-f410b4acb797.png)

## Which is the default registry?

`https://registry.my-netdata.io`, which is currently served by `https://london.my-netdata.io`. This registry listens to
both HTTP and HTTPS requests, but the default is HTTPS.

### Can this registry handle the global load of Netdata installations?

Yes! The registry can handle 50,000 to 100,000 requests **per second per core** (depending on the type of CPU, the
computer's memory bandwidth, etc.). The 50,000 figure is on a J1900 (Celeron, 2 GHz).

We believe it can do it.

## Run your own registry

**Every Netdata can be a registry**. Just pick one and configure it.

**To turn any Netdata into a registry**, edit `/etc/netdata/netdata.conf` and set:

```conf
[registry]
    enabled = yes
    registry to announce = http://your.registry:19999
```

Restart your Netdata to activate it.

Then, you need to tell **all your other Netdata servers to advertise your registry** instead of the default. To do
this, on each of your Netdata servers, edit `/etc/netdata/netdata.conf` and set:

```conf
[registry]
    enabled = no
    registry to announce = http://your.registry:19999
```

Note that we have not enabled the registry on the other servers. Only one Netdata (the registry) needs
`[registry].enabled = yes`.

That is it. You have your own registry now.

You may also want to give your servers different names in the node menu (e.g. to have them sorted or grouped). You can
change a server's registry name by setting, on each Netdata server:

```conf
[registry]
    registry hostname = Group1 - Master DB
```

This server will then appear in the node menu as `Group1 - Master DB`. The maximum name length is 50 characters.

### Limiting access to the registry

Netdata v1.9+ supports limiting access to the registry to given IPs, like this:

```conf
[registry]
    allow from = *
```

`allow from` settings are [Netdata simple patterns](/libnetdata/simple_pattern/README.md): string matches that use `*`
as a wildcard (any number of times) and a `!` prefix for a negative match. So `allow from = !10.1.2.3 10.*` will allow
all IPs in `10.*` except `10.1.2.3`. The order is important: patterns are evaluated left to right, and the first
positive or negative match is used.

Keep in mind that connections to Netdata API ports are filtered by `[web].allow connections from`. So, IPs allowed by
`[registry].allow from` should also be allowed by `[web].allow connections from`.

The patterns can be matched against the IP address or the FQDN of the host. In order to check the FQDN of the
connection without opening the Netdata agent to DNS spoofing, a reverse DNS record must be set up for the connecting
host. At connection time the reverse DNS of the peer IP address is resolved, and a forward DNS resolution is made to
validate the IP address against the name pattern.

Please note that this process can be expensive on a machine that is serving many connections. The behaviour of the
pattern matching can be controlled with the following setting:

```conf
[registry]
    allow by dns = heuristic
```

The settings are:

- `yes` allows the pattern to match DNS names.
- `no` disables DNS matching for the patterns (they only match IP addresses).
- `heuristic` will estimate if the patterns should match FQDNs by the presence or absence of `:`s or alpha-characters.

### Where is the registry database stored?

`/var/lib/netdata/registry/*.db`

There can be up to 2 files:

- `registry-log.db`, the transaction log

  All incoming requests that affect the registry are saved in this file in real time.

- `registry.db`, the database

  Every `[registry].registry save db every new entries` entries in `registry-log.db`, Netdata will save its database
  to `registry.db` and empty `registry-log.db`.

Both files are machine-readable text files.

## The future

The registry opens a whole world of new possibilities for Netdata. Check here what we think:
<https://github.com/netdata/netdata/issues/416>

## Troubleshooting the registry

The registry URL should be set to the URL of a Netdata dashboard. This server has to have `[registry].enabled = yes`.
So, accessing the registry URL directly with your web browser should present the dashboard of the Netdata operating the
registry.

To use the registry, your web browser needs to support **third party cookies**, since the cookies are set by the
registry while you are browsing the dashboard of another Netdata server. The first time the registry sees a new web
browser, it tries to figure out whether the browser has cookies enabled. It does this by setting a cookie and
redirecting the browser back to itself, hoping that it will receive the cookie. If it does not receive the cookie, the
registry will keep redirecting your web browser back to itself, which after a few redirects will fail with an error like
this:

```conf
ERROR 409: Cannot ACCESS netdata registry: https://registry.my-netdata.io responded with: {"status":"redirect","registry":"https://registry.my-netdata.io"}
```

This error is printed on your web browser console (press F12 on your browser to see it).
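
The cookie check described in this section can be illustrated with a small simulation. This is only a sketch of the
idea, not Netdata's code: the `Browser` class, the cookie name `registry_cookie`, the `"ok"` and `"failed"` status
values and the limit of 3 redirects are all assumptions made for the example. Only the `"redirect"` status appears in
the error message in this section.

```python
import uuid
from dataclasses import dataclass, field

MAX_REDIRECTS = 3  # assumption: the text only says "a few redirects"
COOKIE_NAME = "registry_cookie"  # hypothetical cookie name


@dataclass
class Browser:
    """Toy browser: it either keeps third-party cookies or silently drops them."""
    accepts_third_party_cookies: bool
    cookies: dict = field(default_factory=dict)

    def store(self, name: str, value: str) -> None:
        if self.accepts_third_party_cookies:
            self.cookies[name] = value


def registry_access(browser: Browser) -> dict:
    """One request to the (simulated) registry.

    If the cookie came back, the browser is known. Otherwise the registry
    sets the cookie and asks the browser to come back.
    """
    if COOKIE_NAME in browser.cookies:
        return {"status": "ok", "person_guid": browser.cookies[COOKIE_NAME]}
    browser.store(COOKIE_NAME, str(uuid.uuid4()))
    return {"status": "redirect"}


def access_with_redirects(browser: Browser) -> dict:
    """Follow redirects until the registry accepts the browser or the limit is hit."""
    for _ in range(MAX_REDIRECTS):
        response = registry_access(browser)
        if response["status"] != "redirect":
            return response
    return {"status": "failed", "reason": "cookie never received"}


if __name__ == "__main__":
    print(access_with_redirects(Browser(accepts_third_party_cookies=True))["status"])
    print(access_with_redirects(Browser(accepts_third_party_cookies=False))["status"])
```

A browser that keeps the cookie is accepted on its second request, while a browser that blocks third-party cookies is
redirected until the limit is reached, which corresponds to the `"status":"redirect"` response in the error above.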