Diffstat (limited to '')
-rw-r--r-- docs/developer-and-contributor-corner/monitor-debug-applications-ebpf.md (renamed from docs/guides/troubleshoot/monitor-debug-applications-ebpf.md) | 24
-rw-r--r-- docs/guides/troubleshoot/troubleshooting-agent-with-cloud-connection.md | 147
2 files changed, 12 insertions, 159 deletions
diff --git a/docs/guides/troubleshoot/monitor-debug-applications-ebpf.md b/docs/developer-and-contributor-corner/monitor-debug-applications-ebpf.md
index 728606c83..91d2a2ef2 100644
--- a/docs/guides/troubleshoot/monitor-debug-applications-ebpf.md
+++ b/docs/developer-and-contributor-corner/monitor-debug-applications-ebpf.md
@@ -12,7 +12,7 @@ learn_rel_path: "Operations"
When trying to troubleshoot or debug a finicky application, there's no such thing as too much information. At Netdata,
we developed programs that connect to the [_extended Berkeley Packet Filter_ (eBPF) virtual
-machine](https://github.com/netdata/netdata/blob/master/src/collectors/ebpf.plugin/README.md) to help you see exactly how specific applications are interacting with the
+machine](/src/collectors/ebpf.plugin/README.md) to help you see exactly how specific applications are interacting with the
Linux kernel. With these charts, you can root out bugs, discover optimizations, diagnose memory leaks, and much more.
This means you can see exactly how often, and in what volume, the application creates processes, opens files, writes to
@@ -29,7 +29,7 @@ To start troubleshooting an application with eBPF metrics, you need to ensure yo
displays those metrics independent from any other process.
You can use the `apps_groups.conf` file to configure which applications appear in charts generated by
-[`apps.plugin`](https://github.com/netdata/netdata/blob/master/src/collectors/apps.plugin/README.md). Once you edit this file and create a new group for the application
+[`apps.plugin`](/src/collectors/apps.plugin/README.md). Once you edit this file and create a new group for the application
you want to monitor, you can see how it's interacting with the Linux kernel via real-time eBPF metrics.
Let's assume you have an application that runs on the process `custom-app`. To monitor eBPF metrics for that application
@@ -61,12 +61,12 @@ dev: custom-app
```
Restart Netdata with `sudo systemctl restart netdata`, or the [appropriate
-method](https://github.com/netdata/netdata/blob/master/packaging/installer/README.md#maintaining-a-netdata-agent-installation) for your system, to begin seeing metrics for this particular
+method](/packaging/installer/README.md#maintaining-a-netdata-agent-installation) for your system, to begin seeing metrics for this particular
group+process. You can also add additional processes to the same group.
You can set up `apps_groups.conf` to show more precise eBPF metrics for any application or service running on your
system, even if it's a standard package like Redis, Apache, or any other [application/service Netdata collects
-from](https://github.com/netdata/netdata/blob/master/src/collectors/COLLECTORS.md).
+from](/src/collectors/COLLECTORS.md).
```conf
# -----------------------------------------------------------------------------
@@ -86,7 +86,7 @@ to show other charts that will help you debug and troubleshoot how it interacts
## Configure the eBPF collector to monitor errors
-The eBPF collector has [two possible modes](https://github.com/netdata/netdata/blob/master/src/collectors/ebpf.plugin/README.md#ebpf-load-mode): `entry` and `return`. The default
+The eBPF collector has [two possible modes](/src/collectors/ebpf.plugin/README.md#ebpf-load-mode): `entry` and `return`. The default
is `entry`, and only monitors calls to kernel functions, but the `return` also monitors and charts _whether these calls
return in error_.
@@ -110,7 +110,7 @@ Replace `entry` with `return`:
```
Restart Netdata with `sudo systemctl restart netdata`, or the [appropriate
-method](https://github.com/netdata/netdata/blob/master/packaging/installer/README.md#maintaining-a-netdata-agent-installation) for your system.
+method](/packaging/installer/README.md#maintaining-a-netdata-agent-installation) for your system.
## Get familiar with per-application eBPF metrics and charts
@@ -122,7 +122,7 @@ Pay particular attention to the charts in the **ebpf file**, **ebpf syscall**, *
sub-sections. These charts are populated by low-level Linux kernel metrics thanks to eBPF, and showcase the volume of
calls to open/close files, call functions like `do_fork`, IO activity on the VFS, and much more.
-See the [eBPF collector documentation](https://github.com/netdata/netdata/blob/master/src/collectors/ebpf.plugin/README.md#integration-with-appsplugin) for the full list
+See the [eBPF collector documentation](/src/collectors/ebpf.plugin/README.md#integration-with-appsplugin) for the full list
of per-application charts.
Let's show some examples of how you can first identify normal eBPF patterns, then use that knowledge to identify
@@ -239,16 +239,16 @@ same application on multiple systems and want to correlate how it performs on ea
findings with someone else on your team.
If you don't already have a Netdata Cloud account, go [sign in](https://app.netdata.cloud) and get started for free.
-You can also read how to [monitor your infrastructure with Netdata Cloud](https://github.com/netdata/netdata/blob/master/docs/quickstart/infrastructure.md) to understand the key features that it has to offer.
+You can also read how to [monitor your infrastructure with Netdata Cloud](/docs/netdata-cloud/organize-your-infrastructure-invite-your-team.md) to understand the key features that it has to offer.
-Once you've added one or more nodes to a Space in Netdata Cloud, you can see aggregated eBPF metrics in the [Overview
-dashboard](https://github.com/netdata/netdata/blob/master/docs/visualize/overview-infrastructure.md) under the same **Applications** or **eBPF** sections that you
-find on the local Agent dashboard. Or, [create new dashboards](https://github.com/netdata/netdata/blob/master/docs/cloud/visualize/dashboards.md) using eBPF metrics
+Once you've added one or more nodes to a Space in Netdata Cloud, you can see aggregated eBPF metrics in the Overview
+dashboard under the same **Applications** or **eBPF** sections that you
+find on the local Agent dashboard. Or, [create new dashboards](/docs/dashboards-and-charts/dashboards-tab.md) using eBPF metrics
from any number of distributed nodes to see how your application interacts with multiple Linux kernels on multiple Linux
systems.
Now that you can see eBPF metrics in Netdata Cloud, you can [invite your
-team](https://github.com/netdata/netdata/blob/master/docs/cloud/manage/organize-your-infrastrucutre-invite-your-team.md#invite-your-team) and share your findings with others.
+team](/docs/netdata-cloud/organize-your-infrastructure-invite-your-team.md#invite-your-team) and share your findings with others.
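The load-mode change described in the renamed guide above (replace `entry` with `return`, then restart Netdata) can be sketched as a small shell helper. This is a minimal sketch, not Netdata's own tooling: the function name `set_ebpf_return_mode` is hypothetical, and it assumes the setting appears in the collector's configuration file as a line of the form `ebpf load mode = entry`, which the diff context does not show.

```shell
#!/bin/sh
# Sketch: switch the eBPF collector's load mode from `entry` to `return`.
# Assumption: the setting is written as "ebpf load mode = entry"; check
# your own configuration file before relying on this pattern.
set_ebpf_return_mode() {
    conf_file="$1"
    tmp_file="${conf_file}.tmp.$$"
    # Rewrite only lines that set the key to `entry`; all other lines pass through.
    sed 's/^\([[:space:]]*ebpf load mode[[:space:]]*=[[:space:]]*\)entry[[:space:]]*$/\1return/' \
        "$conf_file" > "$tmp_file" || return 1
    # Copy the result back in place so the original file keeps its permissions.
    cat "$tmp_file" > "$conf_file" && rm -f "$tmp_file"
}
```

A call might look like `set_ebpf_return_mode /etc/netdata/ebpf.d.conf`, followed by `sudo systemctl restart netdata` as the guide describes. The path is an assumption and depends on how Netdata was installed.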
diff --git a/docs/guides/troubleshoot/troubleshooting-agent-with-cloud-connection.md b/docs/guides/troubleshoot/troubleshooting-agent-with-cloud-connection.md
deleted file mode 100644
index 0c9962ba2..000000000
--- a/docs/guides/troubleshoot/troubleshooting-agent-with-cloud-connection.md
+++ /dev/null
@@ -1,147 +0,0 @@
-# Troubleshoot Agent-Cloud connectivity issues
-
-Learn how to troubleshoot connectivity issues leading to agents not appearing at all in Netdata Cloud, or
-appearing with a status other than `live`.
-
-After installing an agent with the claiming token provided by Netdata Cloud, you should see charts from that node on
-Netdata Cloud within seconds. If you don't see charts, check if the node appears in the list of nodes
-(Nodes tab, top right Node filter, or Manage Nodes screen). If your node does not appear in the list, or it does appear with a status other than "Live", this guide will help you troubleshoot what's happening.
-
-The most common causes of connectivity issues fall into one of the following three categories:
-
-- If the node does not appear at all in Netdata Cloud, [the claiming process was unsuccessful](#the-claiming-process-was-unsuccessful).
-- If the node appears in Netdata Cloud, but is in the "Unseen" state, [the Agent was claimed but can not connect](#the-agent-was-claimed-but-can-not-connect).
-- If the node appears in Netdata Cloud as "Offline" or "Stale", it is a [previously connected agent that can no longer connect](#previously-connected-agent-that-can-no-longer-connect).
-
-## The claiming process was unsuccessful
-
-If the claiming process fails, the node will not appear at all in Netdata Cloud.
-
-First ensure that:
-- You use the newest possible stable or nightly version of the agent (at least v1.32).
-- Your node can successfully issue an HTTPS request to https://app.netdata.cloud
-
-Other possible causes differ between kickstart installations and Docker installations.
-
-### Verify your node can access Netdata Cloud
-
-If you run either `curl` or `wget` to do an HTTPS request to https://app.netdata.cloud, you should get
-back a 404 response. If you do not, check your network connectivity, domain resolution,
-and firewall settings for outbound connections.
-
-If your firewall is configured to completely prevent outbound connections, you need to whitelist `app.netdata.cloud` and `mqtt.netdata.cloud`. If you can't whitelist domains in your firewall, you can whitelist the IPs that the hostnames resolve to, but keep in mind that they can change without any notice.
-
-If you use an outbound proxy, you need to [take some extra steps](https://github.com/netdata/netdata/blob/master/src/claim/README.md#connect-through-a-proxy).
-
-### Troubleshoot claiming with kickstart.sh
-
-Claiming is done by executing `netdata-claim.sh`, a script that is usually located under `${INSTALL_PREFIX}/netdata/usr/sbin/netdata-claim.sh`. Possible error conditions we have identified are:
-- No script found at all in any of our search paths.
-- The path where the claiming script should be does not exist.
-- The path exists, but is not a file.
-- The path is a file, but is not executable.
-Check the output of the kickstart script for any reported claiming errors, and verify that the claiming script exists
-and can be executed.
-
-### Troubleshoot claiming with Docker
-
-First verify that the NETDATA_CLAIM_TOKEN parameter is correctly configured and then check for any errors during
-initialization of the container.
-
-The most common issue we have seen claiming nodes in Docker is [running on older hosts with seccomp enabled](https://github.com/netdata/netdata/blob/master/src/claim/README.md#known-issues-on-older-hosts-with-seccomp-enabled).
-
-## The Agent was claimed but can not connect
-
-Agents that appear on the cloud with state "Unseen" have successfully been claimed, but have never
-been able to successfully establish an ACLK connection.
-
-Agents that appear with state "Offline" or "Stale" were able to connect at some point, but are currently not
-connected. The difference between the two is that "Stale" nodes had some of their data replicated to a
-parent node that is still connected.
-
-### Verify that the agent is running
-
-#### Troubleshoot connection establishment with kickstart.sh
-
-The kickstart script will install/update your Agent and then try to claim the node to the Cloud
-(if tokens are provided). To complete the second part, the Agent must be running. On some platforms,
-the Netdata service cannot be enabled by default and you must do it manually, using the following steps:
-
-1. Check if the Agent is running:
-
- ```bash
- systemctl status netdata
- ```
-
- The expected output should contain info like this:
-
- ```bash
- Active: active (running) since Wed 2022-07-06 12:25:02 EEST; 1h 40min ago
- ```
-
-2. Enable and start the Netdata Service.
-
- ```bash
- systemctl enable netdata
- systemctl start netdata
- ```
-
-3. Retry the kickstart claiming process.
-
-> ### Note
->
-> In some cases a simple restart of the Agent can fix the issue.
-> Read more about [Starting, Stopping and Restarting the Agent](https://github.com/netdata/netdata/blob/master/packaging/installer/README.md#maintaining-a-netdata-agent-installation).
-
-#### Troubleshoot connection establishment with Docker
-
-If a Netdata container exits or is killed before it properly starts, it may be able to complete the claiming
-process, but not have enough time to establish the ACLK connection.
-
-### Verify that your firewall allows websockets
-
-The agent initiates an SSL connection to `app.netdata.cloud` and then upgrades that connection to use secure
-websockets. Some firewalls completely prevent the use of websockets, even for outbound connections.
-
-## Previously connected agent that can no longer connect
-
-The states "Offline" and "Stale" suggest that the agent was able to connect at some point in the past, but
-that it is currently not connected.
-
-### Verify that network connectivity is still possible
-
-Verify that you can still issue HTTPS requests to app.netdata.cloud and that no firewall or proxy changes were made.
-
-### Verify that the claiming info is persisted
-
-If you use Docker, verify that the contents of `/var/lib/netdata` are preserved across container restarts, using a persistent volume.
-
-### Verify that the claiming info is not cloned
-
-A relatively common case we have seen especially with VMs is two or more nodes sharing the same credentials.
-This happens if you claim a node in a VM and then create an image based on that node. Netdata can't properly
-work this way, as we have unique node identification information under `/var/lib/netdata`.
-
-### Verify that your IP is not blocked by Netdata Cloud
-
-Most of the nodes change IPs dynamically. It is possible that your current IP has been restricted from accessing `app.netdata.cloud` due to security concerns, usually because it was spamming Netdata Cloud with too many
-failed requests (old versions of the agent).
-
-To verify this:
-
-1. Check the Agent's `aclk-state`.
-
- ```bash
- sudo netdatacli aclk-state | grep "Banned By Cloud"
- ```
-
- The output will contain a line indicating if the IP is banned from `app.netdata.cloud`:
-
- ```bash
- Banned By Cloud: yes
- ```
-
-2. If your node's IP is banned, you can:
-
- - Contact our team to whitelist your IP by submitting a ticket in the [Netdata forum](https://community.netdata.cloud/)
- - Change your node's IP
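The triage logic of the removed guide can be summarized in a short shell sketch. It is illustrative only: the function names `suggest_section` and `is_banned_by_cloud` and the `missing` label (for a node that does not appear in the node list at all) are hypothetical. The status names and the `Banned By Cloud: yes` line are taken from the guide text above.

```shell
#!/bin/sh
# Map a node's status in Netdata Cloud to the guide section that covers it.
# "missing" is a hypothetical label for a node absent from the node list.
suggest_section() {
    case "$1" in
        missing)       echo "The claiming process was unsuccessful" ;;
        Unseen)        echo "The Agent was claimed but can not connect" ;;
        Offline|Stale) echo "Previously connected agent that can no longer connect" ;;
        Live)          echo "No troubleshooting needed" ;;
        *)             echo "Unknown status: $1" >&2; return 1 ;;
    esac
}

# Read `netdatacli aclk-state` output on stdin and succeed only if it
# reports that the node's IP is banned by Netdata Cloud.
is_banned_by_cloud() {
    grep -q "Banned By Cloud: yes"
}
```

On a real node, the second helper could be fed with `sudo netdatacli aclk-state | is_banned_by_cloud && echo "IP is banned"`, mirroring the `grep` command shown in the guide.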