summaryrefslogtreecommitdiffstats
path: root/collectors/tc.plugin/README.md
diff options
context:
space:
mode:
authorDaniel Baumann <daniel.baumann@progress-linux.org>2018-11-07 12:19:29 +0000
committerDaniel Baumann <daniel.baumann@progress-linux.org>2018-11-07 12:20:17 +0000
commita64a253794ac64cb40befee54db53bde17dd0d49 (patch)
treec1024acc5f6e508814b944d99f112259bb28b1be /collectors/tc.plugin/README.md
parentNew upstream version 1.10.0+dfsg (diff)
downloadnetdata-a64a253794ac64cb40befee54db53bde17dd0d49.tar.xz
netdata-a64a253794ac64cb40befee54db53bde17dd0d49.zip
New upstream version 1.11.0+dfsgupstream/1.11.0+dfsg
Signed-off-by: Daniel Baumann <daniel.baumann@progress-linux.org>
Diffstat (limited to '')
-rw-r--r--collectors/tc.plugin/README.md183
1 files changed, 183 insertions, 0 deletions
diff --git a/collectors/tc.plugin/README.md b/collectors/tc.plugin/README.md
new file mode 100644
index 000000000..6670c491f
--- /dev/null
+++ b/collectors/tc.plugin/README.md
@@ -0,0 +1,183 @@
+## tc.plugin
+
+Live demo - **[see it in action here](https://registry.my-netdata.io/#menu_tc)** !
+
+![qos](https://cloud.githubusercontent.com/assets/2662304/14439411/b7f36254-0033-11e6-93f0-c739bb6a1c3a.gif)
+
+Netdata monitors `tc` QoS classes for all interfaces.
+
+If you also use [FireQOS](http://firehol.org/tutorial/fireqos-new-user/) it will collect
+interface and class names.
+
+There is a [shell helper](tc-qos-helper.sh.in) for this (all parsing is done by the plugin
+in `C` code - this shell script is just a configuration for the command to run to get `tc` output).
+
+The source of the tc plugin is [here](plugin_tc.c). It is somewhat complex, because a state
+machine was needed to keep track of all the `tc` classes, including the pseudo classes tc
+dynamically creates.
+
+## Motivation
+
+One category of metrics missing in Linux monitoring, is bandwidth consumption for each open
+socket (inbound and outbound traffic). So, you cannot tell how much bandwidth your web server,
+your database server, your backup, your ssh sessions, etc are using.
+
+To solve this problem, the most *adventurous* Linux monitoring tools install kernel modules to
+capture all traffic, analyze it and provide reports per application. A lot of work, CPU intensive
+and with a great degree of risk (due to the kernel modules involved which might affect the
+stability of the whole system). Not to mention that such solutions are probably better suited
+for a core linux router in your network.
+
+Others use NFACCT, the netfilter accounting module which is already part of the Linux firewall.
+However, this would require configuring a firewall on every system you want to measure bandwidth.
+
+QoS monitoring attempts to solve this in a much cleaner way.
+
+## Introduction to QoS
+
+One of the features the Linux kernel has, but it is rarely used, is its ability to
+**apply QoS on traffic**. Even most interesting is that it can apply QoS to **both inbound and
+outbound traffic**.
+
+QoS is about 2 features:
+
+1. **Classify traffic**
+
+ Classification is the process of organizing traffic in groups, called **classes**.
+ Classification can evaluate every aspect of network packets, like source and destination ports,
+ source and destination IPs, netfilter marks, etc.
+
+ When you classify traffic, you just assign a label to it. For example **I call `web server`
+ traffic, the traffic from my server's tcp/80, tcp/443 and to my server's tcp/80, tcp/443,
+ while I call `web surfing` all other tcp/80 and tcp/443 traffic**. You can use any combinations
+ you like. There is no limit.
+
+2. **Apply traffic shaping rules to these classes**
+
+ Traffic shaping is used to control how network interface bandwidth should be shared among the
+ classes. Of course we are not interested for this feature to just monitor the traffic.
+ Classification will be enough for monitoring everything.
+
+The key reasons of applying QoS on all servers (even cloud ones) are:
+
+ - **ensure administrative tasks (like ssh, dns, etc) will always have a small but guaranteed
+ bandwidth.** QoS can guarantee that services like ssh, dns, ntp, etc will always have a small
+ supply of bandwidth. So, no matter what happens, you will be able to ssh to your server and
+ DNS will always work.
+
+ - **ensure other administrative tasks will not monopolize all the available bandwidth.**
+ Services like backups, file copies, database dumps, etc can easily monopolize all the
+ available bandwidth. It is common for example a nightly backup, or a huge file transfer
+ to negatively influence the end-user experience. QoS can fix that.
+
+ - **ensure each end-user connection will get a fair cut of the available bandwidth.**
+ Several QoS queuing disciplines in Linux do this automatically, without any configuration from you.
+ The result is that new sockets are favored over older ones, so that users will get a snappier
+ experience, while others are transferring large amounts of traffic.
+
+ - **protect the servers from DDoS attacks.**
+ When your system is under a DDoS attack, it will get a lot more bandwidth compared to the one it
+ can handle and probably your applications will crash. Setting a limit on the inbound traffic using
+ QoS, will protect your servers (throttle the requests) and depending on the size of the attack may
+ allow your legitimate users to access the server, while the attack is taking place.
+
+
+Once **traffic classification** is applied, netdata can visualize the bandwidth consumption per
+class in real-time (no configuration is needed for netdata - it will figure it out).
+
+QoS, is extremely light. You will configure it once, and this is it. It will not bother you again
+and it will not use any noticeable CPU resources, especially on application and database servers.
+
+## QoS in Linux? Have you lost your mind?
+
+Yes I know... but no, I have not!
+
+Of course, `tc` is probably **the most undocumented, complicated and unfriendly** command in Linux.
+
+For example, for matching a simple port range in `tc`, e.g. all the high ports, from 1025 to 65535
+inclusive, you have to match these:
+
+```
+1025/0xffff 1026/0xfffe 1028/0xfffc 1032/0xfff8 1040/0xfff0
+1056/0xffe0 1088/0xffc0 1152/0xff80 1280/0xff00 1536/0xfe00
+2048/0xf800 4096/0xf000 8192/0xe000 16384/0xc000 32768/0x8000
+```
+
+I know what you are thinking right now! **And I agree!**
+
+This is why I wrote **[FireQOS](https://firehol.org/tutorial/fireqos-new-user/)**, a tool to
+simplify QoS management in Linux.
+
+The **[FireHOL](https://firehol.org/)** package already distributes **[FireQOS](https://firehol.org/tutorial/fireqos-new-user/)**.
+Check the **[FireQOS tutorial](https://firehol.org/tutorial/fireqos-new-user/)**
+to learn how to write your own QoS configuration.
+
+With **[FireQOS](https://firehol.org/tutorial/fireqos-new-user/)**, it is **really simple for everyone
+to use QoS in Linux**. Just install the package `firehol`. It should already be available for your
+distribution. If not, check the **[FireHOL Installation Guide](https://firehol.org/installing/)**.
+After that, you will have the `fireqos` command.
+
+This is the file `/etc/firehol/fireqos.conf` we use at the netdata demo site:
+
+```sh
+ # configure the netdata ports
+ server_netdata_ports="tcp/19999"
+
+ interface eth0 world bidirectional ethernet balanced rate 50Mbit
+ class arp
+ match arp
+
+ class icmp
+ match icmp
+
+ class dns commit 1Mbit
+ server dns
+ client dns
+
+ class ntp
+ server ntp
+ client ntp
+
+ class ssh commit 2Mbit
+ server ssh
+ client ssh
+
+ class rsync commit 2Mbit max 10Mbit
+ server rsync
+ client rsync
+
+ class web_server commit 40Mbit
+ server http
+ server netdata
+
+ class client
+ client surfing
+
+ class nms commit 1Mbit
+ match input src 10.2.3.5
+```
+
+Nothing more is needed. You just run `fireqos start` to apply this configuration, restart netdata
+and you have real-time visualization of the bandwidth consumption of your applications. FireQOS is
+not a daemon. It will just convert the configuration to `tc` commands. It will run them and it will
+exit.
+
+**IMPORTANT**: If you copy this configuration to apply it to your system, please adapt the
+speeds - experiment in non-production environments to learn the tool, before applying it on
+your servers.
+
+And this is what you are going to get:
+
+![image](https://cloud.githubusercontent.com/assets/2662304/14436322/c91d90a4-0024-11e6-9fb1-57cdef1580df.png)
+
+---
+
+## More examples:
+
+This is QoS from a linux router. Check these features:
+
+1. It is real-time (per second updates)
+2. QoS really works in Linux - check that the `background` traffic is squeezed when `surfing` needs it.
+
+![test2](https://cloud.githubusercontent.com/assets/2662304/14093004/68966020-f553-11e5-98fe-ffee2086fafd.gif)
+