Diffstat (limited to 'doc')
-rw-r--r-- | doc/actions/actions-general | 256
-rw-r--r-- | doc/actions/gact-usage | 78
-rw-r--r-- | doc/actions/ifb-README | 125
-rw-r--r-- | doc/actions/mirred-usage | 164
4 files changed, 623 insertions, 0 deletions
diff --git a/doc/actions/actions-general b/doc/actions/actions-general
new file mode 100644
index 0000000..a0074a5
--- /dev/null
+++ b/doc/actions/actions-general
@@ -0,0 +1,256 @@
+
+This document is slightly dated but should give you an idea of how things
+work.
+
+What is it?
+-----------
+
+An extension to the filtering/classification architecture of Linux Traffic
+Control.
+Up to 2.6.8 the only action that could be "attached" to a filter was policing.
+i.e. you could say something like:
+
+-----
+tc filter add dev lo parent ffff: protocol ip prio 10 u32 match ip src \
+127.0.0.1/32 flowid 1:1 police mtu 4000 rate 1500kbit burst 90k
+-----
+
+which implies "if a packet is seen on the ingress of the lo device with
+a source IP address of 127.0.0.1/32 we give it a classification id of 1:1 and
+we execute a policing action which rate limits its bandwidth utilization
+to 1.5Mbps".
+
+The new extensions allow for more than just policing actions to be added.
+They are also fully backward compatible. If you have a kernel that doesn't
+understand them, then the effect is null, i.e. if you have a newer tc
+but older kernel, the actions are not installed. Likewise if you
+have a newer kernel but older tc, obviously the tc will use current
+syntax which will work fine. Of course to get the required effect you need
+both newer tc and kernel. If you are reading this you have the
+right tc ;->
+
+A side effect is that we can now get stateless firewalling to work with tc.
+Essentially this is now an alternative to iptables.
+I won't go into details of my dislike for iptables at times, but
+scalability is one of the main issues; however, if you need stateful
+classification - use netfilter (for now).
+
+This stuff works on both ingress and egress qdiscs.
+
+Features
+--------
+
+1) new additional syntax and actions enabled. Note old syntax is still valid.
+
+Essentially this is still the same syntax as tc with a new construct
+"action".
+The syntax is of the form:
+tc filter add <DEVICE> parent 1:0 protocol ip prio 10 <Filter description>
+flowid 1:1 action <ACTION description>*
+
+You can have as many actions as you want (within reason).
+
+In the past the only real action was the policer; i.e. you could do something
+along the lines of:
+tc filter add dev lo parent ffff: protocol ip prio 10 u32 \
+match ip src 127.0.0.1/32 flowid 1:1 \
+police mtu 4000 rate 1500kbit burst 90k
+
+Although you can still use the same syntax, now you can say:
+
+tc filter add dev lo parent 1:0 protocol ip prio 10 u32 \
+match ip src 127.0.0.1/32 flowid 1:1 \
+action police mtu 4000 rate 1500kbit burst 90k
+
+"generic actions" (gact) at the moment are:
+{ drop, pass, reclassify, continue }
+(If you have others not listed here, give me a reason and we will add them)
++drop says to drop the packet
++pass and ok (which are equivalent) say to accept it
++reclassify requests reclassification of the packet
++continue requests the next lookup to match
+
+2) In order to take advantage of some of the targets written by the
+iptables people, a classifier can have a packet massaged by an
+iptables target. I have only tested with mangle-table targets up to now.
+(in fact anything that is not in the mangle table is disabled right now)
+
+In terms of hooks:
+*ingress is mapped to pre-routing hook
+*egress is mapped to post-routing hook
+I don't see much value in the other hooks; if you do, email me good
+reasons and the addition is trivial.
+
+Example syntax for iptables targets usage becomes:
+tc filter add ..... u32 <u32 syntax> action ipt -j <iptables target syntax>
+
+example:
+tc filter add dev lo parent ffff: protocol ip prio 8 u32 \
+match ip dst 127.0.0.8/32 flowid 1:12 \
+action ipt -j MARK --set-mark 2
+
+NOTE: flowid 1:12 is parsed as flowid 0x1:0x12. If you want
+decimal 12, use flowid 1:c.
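The hex parsing in the NOTE above is easy to trip over, so here is a minimal sketch of the conversion in plain shell. The helper name `flowid_for` is hypothetical and is not part of tc; it only formats decimal class numbers the way the NOTE says tc reads them.

```shell
# flowid_for MAJOR MINOR: print a flowid for DECIMAL class numbers.
# (hypothetical helper for illustration, not a tc feature)
flowid_for() {
    printf '%x:%x\n' "$1" "$2"
}

flowid_for 1 12    # prints 1:c
flowid_for 1 18    # prints 1:12

# Going the other way: the minor number in "flowid 1:12" is hex 12.
echo $((0x12))     # prints 18
```

So the example rule above, which uses flowid 1:12, actually refers to class 18 in decimal.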
+
+3) A feature I call pipe
+The motivation is derived from the Unix pipe mechanism but applied to packets.
+Essentially take a matching packet and pass it through
+action1 | action2 | action3 etc.
+You could do something similar to this with the tc policer and the "continue"
+operator but this rather restricts it to just the policer and requires
+multiple rules (and lookups, hence quite inefficient);
+
+as an example -- and please note that this is just an example _not_ The
+Word You've Been Waiting For (yes I have had problems giving examples
+which ended up becoming dogma in documents and people modifying them a little
+to look clever);
+
+I selected the metering rates to be small so that I can show better how
+things work.
+
+The script below does the following:
+- an incoming packet from 10.0.0.21 is first given a firewall mark of 1.
+
+- It is then metered to make sure it does not exceed its allocated rate of
+1Kbps. If it doesn't exceed rate, this is where we terminate action execution.
+
+- If it does exceed its rate, its "color" changes to a mark of 2 and it is
+then passed through a second meter.
+
+-The second meter is shared across all flows on that device [I am surprised
+that this seems to be not a well known feature of the policer; Bert was telling
+me that someone was writing a qdisc just to do sharing across multiple devices;
+it must be the summer heat again; we've had someone doing that every year around
+summer -- the key to sharing is to use an operator "index" in your policer
+rules (example "index 20"). All your rules have to use the same index to
+share.]
+
+-If the second meter is exceeded the color of the flow changes further to 3.
+
+-We then pass the packet to another meter which is shared across all devices
+in the system. If this meter is exceeded we drop the packet.
+
+Note the mark can be used further up the system to do things like policy
+or more interesting things on the egress.
+ +------------------ cut here ------------------------------- +# +# Add an ingress qdisc on eth0 +tc qdisc add dev eth0 ingress +# +#if you see an incoming packet from 10.0.0.21 +tc filter add dev eth0 parent ffff: protocol ip prio 1 \ +u32 match ip src 10.0.0.21/32 flowid 1:15 \ +# +# first give it a mark of 1 +action ipt -j mark --set-mark 1 index 2 \ +# +# then pass it through a policer which allows 1kbps; if the flow +# doesn't exceed that rate, this is where we stop, if it exceeds we +# pipe the packet to the next action +action police rate 1kbit burst 9k pipe \ +# +# which marks the packet fwmark as 2 and pipes +action ipt -j mark --set-mark 2 \ +# +# next attempt to borrow b/width from a meter +# used across all flows incoming on eth0("index 30") +# and if that is exceeded we pipe to the next action +action police index 30 mtu 5000 rate 1kbit burst 10k pipe \ +# mark it as fwmark 3 if exceeded +action ipt -j mark --set-mark 3 \ +# and then attempt to borrow from a meter used by all devices in the +# system. Should this be exceeded, drop the packet on the floor. 
+action police index 20 mtu 5000 rate 1kbit burst 90k drop +--------------------------------- + +Now lets see the actions installed with +"tc filter show parent ffff: dev eth0" + +-------- output ----------- +jroot# tc filter show parent ffff: dev eth0 +filter protocol ip pref 1 u32 +filter protocol ip pref 1 u32 fh 800: ht divisor 1 +filter protocol ip pref 1 u32 fh 800::800 order 2048 key ht 800 bkt 0 flowid 1:15 + + action order 1: tablename: mangle hook: NF_IP_PRE_ROUTING + target MARK set 0x1 index 2 + + action order 2: police 1 action pipe rate 1Kbit burst 9Kb mtu 2Kb + + action order 3: tablename: mangle hook: NF_IP_PRE_ROUTING + target MARK set 0x2 index 1 + + action order 4: police 30 action pipe rate 1Kbit burst 10Kb mtu 5000b + + action order 5: tablename: mangle hook: NF_IP_PRE_ROUTING + target MARK set 0x3 index 3 + + action order 6: police 20 action drop rate 1Kbit burst 90Kb mtu 5000b + + match 0a000015/ffffffff at 12 +------------------------------- + +Note the ordering of the actions is based on the order in which we entered +them. In the future i will add explicit priorities. + +Now lets run a ping -f from 10.0.0.21 to this host; stop the ping after +you see a few lines of dots + +---- +[root@jzny hadi]# ping -f 10.0.0.22 +PING 10.0.0.22 (10.0.0.22): 56 data bytes +.................................................................................................................................................................................................................................................................................................................................................................................................................................................... 
+--- 10.0.0.22 ping statistics ---
+2248 packets transmitted, 1811 packets received, 19% packet loss
+round-trip min/avg/max = 0.7/9.3/20.1 ms
+-----------------------------
+
+Now let's take a look at the stats with "tc -s filter show parent ffff: dev eth0"
+
+--------------
+jroot# tc -s filter show parent ffff: dev eth0
+filter protocol ip pref 1 u32
+filter protocol ip pref 1 u32 fh 800: ht divisor 1
+filter protocol ip pref 1 u32 fh 800::800 order 2048 key ht 800 bkt 0 flowid 1:15
+
+   action order 1: tablename: mangle hook: NF_IP_PRE_ROUTING
+        target MARK set 0x1 index 2
+        Sent 188832 bytes 2248 pkts (dropped 0, overlimits 0)
+
+   action order 2: police 1 action pipe rate 1Kbit burst 9Kb mtu 2Kb
+        Sent 188832 bytes 2248 pkts (dropped 0, overlimits 2122)
+
+   action order 3: tablename: mangle hook: NF_IP_PRE_ROUTING
+        target MARK set 0x2 index 1
+        Sent 178248 bytes 2122 pkts (dropped 0, overlimits 0)
+
+   action order 4: police 30 action pipe rate 1Kbit burst 10Kb mtu 5000b
+        Sent 178248 bytes 2122 pkts (dropped 0, overlimits 1945)
+
+   action order 5: tablename: mangle hook: NF_IP_PRE_ROUTING
+        target MARK set 0x3 index 3
+        Sent 163380 bytes 1945 pkts (dropped 0, overlimits 0)
+
+   action order 6: police 20 action drop rate 1Kbit burst 90Kb mtu 5000b
+        Sent 163380 bytes 1945 pkts (dropped 0, overlimits 437)
+
+  match 0a000015/ffffffff at 12
+-------------------------------
+
+Neat, eh?
+
+
+Want to write an action module?
+------------------------------
+It's easy. Either look at the code or send me email. I will document at
+some point; will also accept documentation.
+
+TODO
+----
+
+Lotsa goodies/features coming. Requests also being accepted.
+At the moment the focus has been on getting the architecture in place.
+Expect new things in the spare time I have to work on this
+(particularly around end of year when I typically get time off
+from work).
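Going back to the statistics of the pipe example: the counters can be cross-checked with a little arithmetic. The sketch below only re-uses figures copied from the "tc -s filter show" and ping output above; the variable names are made up for illustration.

```shell
# Figures copied from the outputs above.
sent=2248        # packets seen by the first mark and first policer
bytes=188832     # bytes seen by the first mark and first policer
dropped=437      # overlimits of "police 20", whose exceed action is drop

per_pkt=$((bytes / sent))            # bytes accounted per packet
survived=$((sent - dropped))         # packets that made it through
loss=$((dropped * 100 / sent))       # integer percentage

echo "bytes per packet:  $per_pkt"   # 84
echo "packets surviving: $survived"  # 1811
echo "loss percent:      $loss"      # 19
```

The last two figures match the "1811 packets received, 19% packet loss" that ping reported. The 84 bytes per packet would be consistent with 56 data bytes plus 8 bytes of ICMP header and 20 bytes of IP header. Note also that each policer's overlimits count (2122, then 1945) is exactly the packet count seen by the action after it, which is the pipe behaviour described earlier.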
diff --git a/doc/actions/gact-usage b/doc/actions/gact-usage
new file mode 100644
index 0000000..7cf48ab
--- /dev/null
+++ b/doc/actions/gact-usage
@@ -0,0 +1,78 @@
+
+gact <ACTION> [RAND] [INDEX]
+
+Where:
+	ACTION := reclassify | drop | continue | pass | ok
+	RAND := random <RANDTYPE> <ACTION> <VAL>
+	RANDTYPE := netrand | determ
+	VAL := value not exceeding 10000
+	INDEX := index value used
+
+ACTION semantics
+- pass and ok are equivalent to accept
+- continue moves on to the next classification lookup
+- drop drops packets
+- reclassify restarts classification of the packet
+
+randomization
+--------------
+
+At the moment there are only two algorithms. One is deterministic
+and the other uses internal kernel netrand.
+
+Examples:
+
+Rules can be installed on both ingress and egress - this shows ingress
+only
+
+tc qdisc add dev eth0 ingress
+
+# example 1
+tc filter add dev eth0 parent ffff: protocol ip prio 6 u32 match ip src \
+10.0.0.9/32 flowid 1:16 action drop
+
+ping -c 20 10.0.0.9
+
+--
+filter u32
+filter u32 fh 800: ht divisor 1
+filter u32 fh 800::800 order 2048 key ht 800 bkt 0 flowid 1:16 (rule hit 32 success 20)
+  match 0a000009/ffffffff at 12 (success 20 )
+        action order 1: gact action drop
+         random type none pass val 0
+         index 1 ref 1 bind 1 installed 59 sec used 35 sec
+         Sent 1680 bytes 20 pkts (dropped 20, overlimits 0 )
+
+----
+
+# example 2
+# allow 1 out of 10 randomly using the netrand generator
+tc filter add dev eth0 parent ffff: protocol ip prio 6 u32 match ip src \
+10.0.0.9/32 flowid 1:16 action drop random netrand ok 10
+
+ping -c 20 10.0.0.9
+
+----
+filter protocol ip pref 6 u32
+filter protocol ip pref 6 u32 fh 800: ht divisor 1
+filter protocol ip pref 6 u32 fh 800::800 order 2048 key ht 800 bkt 0 flowid 1:16 (rule hit 20 success 20)
+  match 0a000009/ffffffff at 12 (success 20 )
+        action order 1: gact action drop
+         random type netrand pass val 10
+         index 5 ref 1 bind 1 installed 49 sec used 25 sec
+         Sent 1680 bytes 20 pkts (dropped 16, overlimits 0 )
+
+--------
+# alternative: deterministically accept every second packet
+tc filter add dev eth0 parent ffff: protocol ip prio 6 u32 match ip src \
+10.0.0.9/32 flowid 1:16 action drop random determ ok 2
+
+ping -c 20 10.0.0.9
+
+tc -s filter show parent ffff: dev eth0
+-----
+filter protocol ip pref 6 u32
+filter protocol ip pref 6 u32 fh 800: ht divisor 1
+filter protocol ip pref 6 u32 fh 800::800 order 2048 key ht 800 bkt 0 flowid 1:16 (rule hit 20 success 20)
+  match 0a000009/ffffffff at 12 (success 20 )
+        action order 1: gact action drop
+         random type determ pass val 2
+         index 4 ref 1 bind 1 installed 118 sec used 82 sec
+         Sent 1680 bytes 20 pkts (dropped 10, overlimits 0 )
+-----
diff --git a/doc/actions/ifb-README b/doc/actions/ifb-README
new file mode 100644
index 0000000..5fe9171
--- /dev/null
+++ b/doc/actions/ifb-README
@@ -0,0 +1,125 @@
+
+IFB is intended to replace IMQ.
+Advantage over current IMQ: cleaner, in particular on SMP,
+with a _lot_ less code.
+
+Known IMQ/IFB USES
+------------------
+
+As far as I know the reasons listed below are why people use IMQ.
+It would be nice to know of anything else that I missed.
+
+1) qdiscs/policies that are per device as opposed to system wide.
+IFB allows for sharing.
+
+2) Allows for queueing incoming traffic for shaping instead of
+dropping. I am not aware of any study that shows policing is
+worse than shaping in achieving the end goal of rate control.
+I would be interested if anyone is experimenting.
+
+3) Very interesting use: if you are serving p2p you may want to give
+preference to your own locally originated traffic (when responses come back)
+vs someone using your system to do BitTorrent. So QoSing based on state
+comes in as the solution. What people did to achieve this was stick
+the IMQ somewhere on the prelocal hook.
+I think this is a pretty neat feature to have in Linux in general.
+(i.e. not just for IMQ).
+But I won't go back to putting netfilter hooks in the device to satisfy
+this. I also don't think it's worth hacking ifb some more to be
+aware of say L3 info and play ip rule tricks to achieve this.
+--> Instead the plan is to have a conntrack related action. This action will
+selectively either query/create conntrack state on incoming packets.
+Packets could then be redirected to ifb based on what happens -> e.g.
+on incoming packets; if we find they are of known state we could send to
+a different queue than one which didn't have existing state. This
+all however is dependent on whatever rules the admin enters.
+
+At the moment this 3rd function does not exist yet. I have decided that
+instead of sitting on the patch for another year, to release it and then
+if there is pressure I will add this feature.
+
+An example, to provide functionality that most people use IMQ for, is below:
+
+--------
+export TC="/sbin/tc"
+
+$TC qdisc add dev ifb0 root handle 1: prio
+$TC qdisc add dev ifb0 parent 1:1 handle 10: sfq
+$TC qdisc add dev ifb0 parent 1:2 handle 20: tbf rate 20kbit buffer 1600 limit 3000
+$TC qdisc add dev ifb0 parent 1:3 handle 30: sfq
+$TC filter add dev ifb0 protocol ip pref 1 parent 1: handle 1 fw classid 1:1
+$TC filter add dev ifb0 protocol ip pref 2 parent 1: handle 2 fw classid 1:2
+
+ifconfig ifb0 up
+
+$TC qdisc add dev eth0 ingress
+
+# redirect all IP packets arriving in eth0 to ifb0
+# use mark 1 --> puts them onto class 1:1
+$TC filter add dev eth0 parent ffff: protocol ip prio 10 u32 \
+match u32 0 0 flowid 1:1 \
+action ipt -j MARK --set-mark 1 \
+action mirred egress redirect dev ifb0
+
+--------
+
+
+Run A Little test:
+
+from another machine ping so that you have packets going into the box:
+-----
+[root@jzny action-tests]# ping 10.22
+PING 10.22 (10.0.0.22): 56 data bytes
+64 bytes from 10.0.0.22: icmp_seq=0 ttl=64 time=2.8 ms
+64 bytes from 10.0.0.22: icmp_seq=1 ttl=64 time=0.6 ms
+64 bytes from 10.0.0.22: icmp_seq=2 ttl=64 time=0.6 ms
+
+--- 10.22 ping statistics ---
+3 packets transmitted, 3 packets received, 0% packet loss
+round-trip min/avg/max = 0.6/1.3/2.8 ms
+[root@jzny action-tests]#
+-----
+Now look at some stats:
+
+---
+[root@jmandrake]:~# $TC -s filter show parent ffff: dev eth0
+filter protocol ip pref 10 u32
+filter protocol ip pref 10 u32 fh 800: ht divisor 1
+filter protocol ip pref 10 u32 fh 800::800 order 2048 key ht 800 bkt 0 flowid 1:1
+  match 00000000/00000000 at 0
+        action order 1: tablename: mangle hook: NF_IP_PRE_ROUTING
+        target MARK set 0x1
+        index 1 ref 1 bind 1 installed 4195sec used 27sec
+        Sent 252 bytes 3 pkts (dropped 0, overlimits 0)
+
+        action order 2: mirred (Egress Redirect to device ifb0) stolen
+        index 1 ref 1 bind 1 installed 165 sec used 27 sec
+        Sent 252 bytes 3 pkts (dropped 0, overlimits 0)
+
+[root@jmandrake]:~# $TC -s qdisc
+qdisc sfq 30: dev ifb0 limit 128p quantum 1514b
+ Sent 0 bytes 0 pkts (dropped 0, overlimits 0)
+qdisc tbf 20: dev ifb0 rate 20Kbit burst 1575b lat 2147.5s
+ Sent 210 bytes 3 pkts (dropped 0, overlimits 0)
+qdisc sfq 10: dev ifb0 limit 128p quantum 1514b
+ Sent 294 bytes 3 pkts (dropped 0, overlimits 0)
+qdisc prio 1: dev ifb0 bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
+ Sent 504 bytes 6 pkts (dropped 0, overlimits 0)
+qdisc ingress ffff: dev eth0 ----------------
+ Sent 308 bytes 5 pkts (dropped 0, overlimits 0)
+
+[root@jmandrake]:~# ifconfig ifb0
+ifb0 Link encap:Ethernet HWaddr 00:00:00:00:00:00
+     inet6 addr: fe80::200:ff:fe00:0/64 Scope:Link
+     UP BROADCAST RUNNING NOARP MTU:1500 Metric:1
+     RX packets:6 errors:0 dropped:3 overruns:0 frame:0
+     TX packets:3 errors:0 dropped:0 overruns:0 carrier:0
+     collisions:0 txqueuelen:32
+     RX bytes:504 (504.0 b) TX bytes:252 (252.0 b)
+-----
+
+If you send it any packet not originating from the actions, it will drop it.
+[In this case the three dropped packets were ipv6 ndisc].
+
+cheers,
+jamal
diff --git a/doc/actions/mirred-usage b/doc/actions/mirred-usage
new file mode 100644
index 0000000..482ff66
--- /dev/null
+++ b/doc/actions/mirred-usage
@@ -0,0 +1,164 @@
+
+Very funky action. I do plan to add a few more things to it.
+This is the basic stuff. Idea borrowed from the way ethernet switches
+mirror and redirect packets. The main difference with say a vanilla
+ethernet switch is that you can use u32 classifier to select a
+flow to be mirrored. High end switches typically can select based
+on more than just a port (e.g. a 5 tuple classifier). They may also be
+capable of redirecting.
+
+Usage:
+
+mirred <DIRECTION> <ACTION> [index INDEX] <dev DEVICENAME>
+where:
+DIRECTION := <ingress | egress>
+ACTION := <mirror | redirect>
+INDEX is the specific policy instance id
+DEVICENAME is the devicename
+
+Direction:
+- Ingress is not supported at the moment. It will be in the
+future as well as mirror/redirecting to a socket.
+
+Action:
+- Mirror takes a copy of the packet and sends it to specified
+dev ("port" in ethernet switch/bridging terminology)
+- redirect
+steals the packet and redirects to specified destination dev.
+
+What NOT to do if you don't want your machine to crash:
+------------------------------------------------------
+
+Do not create loops!
+Loops are not hard to create in the egress qdiscs.
+
+Here are simple rules to follow if you don't want to get
+hurt:
+A) Do not have the same packet go to same netdevice twice
+in a single graph of policies. Your machine will just hang!
+This is design intent _not a bug_ to teach you some lessons.
+
+In the future if there are easy ways to do this in the kernel
+without affecting other packets not interested in this feature
+I will add them. At the moment that is not clear.
+
+Some examples of bad things NOT to do:
+1) redirecting eth0 to eth0
+2) eth0->eth1-> eth0
+3) eth0->lo-> eth1-> eth0
+
+B) Do not redirect from one IFB device to another.
+Remember that IFB is a very specialized case of packet redirecting
+device. Instead of redirecting it puts packets at the exact spot
+on the stack where it found them.
+Redirecting from ifbX->ifbY will actually not crash your machine but your
+packets will all be dropped (this is much simpler to detect
+and resolve and is only affecting users of ifb as opposed to the
+whole stack).
+
+In the case of A) the problem has to do with a recursive contention
+for the device's queue lock and in the second case for the transmit lock.
+
+Some examples:
+-------------
+
+1) Mirror all packets arriving on eth0 to be sent out on eth1.
+You may have a sniffer or some accounting box hooked up on eth1.
+
+---
+tc qdisc add dev eth0 ingress
+tc filter add dev eth0 parent ffff: protocol ip prio 10 u32 \
+match u32 0 0 flowid 1:2 action mirred egress mirror dev eth1
+---
+
+If you replace "mirror" with "redirect" then not a copy but rather
+the original packet is sent to eth1.
+
+2) Host A is hooked up to us on eth0
+
+# redirect all packets arriving on ingress of lo to eth0
+---
+tc qdisc add dev lo ingress
+tc filter add dev lo parent ffff: protocol ip prio 10 u32 \
+match u32 0 0 flowid 1:2 action mirred egress redirect dev eth0
+---
+
+On host A start a tcpdump on the interface connecting to us.
+
+On our host: ping -c 2 127.0.0.1
+
+Ping would fail since all packets are heading out eth0;
+tcpdump on host A would show them.
+
+If you substitute the redirect with mirror above as in:
+tc filter add dev lo parent ffff: protocol ip prio 10 u32 \
+match u32 0 0 flowid 1:2 action mirred egress mirror dev eth0
+
+Then you should see the packets on both host A and the local
+stack (i.e. ping would work).
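Example 2 is safe only because nothing sends the packet back to lo. Before installing a longer redirect chain it can help to check it against rule A on paper. The sketch below is one way to do that in plain shell; `has_loop` is a hypothetical helper that only inspects a chain written in the "dev1->dev2" notation used in the list of bad examples, and it knows nothing about the rules actually installed in the kernel.

```shell
# has_loop CHAIN: exit 0 if any device appears twice in a redirect chain
# written as "dev1->dev2->...", exit 1 otherwise.
# (hypothetical helper for illustration, not part of tc)
has_loop() {
    seen=" "
    # turn the "->" arrows into spaces so the shell splits out the devices
    for dev in $(printf '%s' "$1" | sed 's/->/ /g'); do
        case "$seen" in
            *" $dev "*) return 0 ;;   # device already visited: a loop
        esac
        seen="$seen$dev "
    done
    return 1
}

for chain in "eth0->eth0" "eth0->lo-> eth1-> eth0" "lo->eth0"; do
    if has_loop "$chain"; then
        echo "$chain: loop, do NOT install"
    else
        echo "$chain: no device repeats"
    fi
done
```

The first two chains are taken from the list of bad examples and are flagged; "lo->eth0" is the chain from example 2 and passes.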
+
+3) Even more funky example:
+
+#
+# allow 1 out of 10 packets on ingress of lo to make it to
+# host A (this uses the determ generator; use netrand for random selection)
+#
+---
+tc filter add dev lo parent ffff: protocol ip prio 10 u32 \
+match u32 0 0 flowid 1:2 \
+action drop random determ ok 10 \
+action mirred egress mirror dev eth0
+---
+
+4)
+# for packets from 10.0.0.9 going out on eth0 (could be local
+# IP or something we are forwarding) -
+# if exceeding a 100Kbps rate, then mirror to eth1
+#
+
+---
+tc qdisc add dev eth0 handle 1:0 root prio
+tc filter add dev eth0 parent 1:0 protocol ip prio 6 u32 \
+match ip src 10.0.0.9/32 flowid 1:16 \
+action police rate 100kbit burst 90k ok \
+action mirred egress mirror dev eth1
+---
+
+A more interesting example is when you mirror flows to a dummy device
+so you could tcpdump them (dummy by default drops all packets it sees).
+This is a very useful debug feature.
+
+Let's say you are policing packets from alias 192.168.200.200/32 and
+you don't want those to exceed 100kbps going out.
+
+---
+tc qdisc add dev eth0 handle 1:0 root prio
+tc filter add dev eth0 parent 1: protocol ip prio 10 u32 \
+match ip src 192.168.200.200/32 flowid 1:2 \
+action police rate 100kbit burst 90k drop
+---
+
+If you run tcpdump on eth0 you will see all packets going out
+with src 192.168.200.200/32, dropped or not (since tcpdump shows
+all packets being egressed).
+Extend the rule a little to see only the packets making it out.
+
+---
+tc qdisc add dev eth0 handle 1:0 root prio
+tc filter add dev eth0 parent 1: protocol ip prio 10 u32 \
+match ip src 192.168.200.200/32 flowid 1:2 \
+action police rate 100kbit burst 90k drop \
+action mirred egress mirror dev dummy0
+---
+
+Now fire tcpdump on dummy0 to see only those packets ..
+tcpdump -n -i dummy0 -x -e -t
+
+Essentially a good debugging/logging interface (sort of like what
+BSD's specialized log device does, without needing one).
+
+If you replace mirror with redirect, those packets will be
+blackholed and will never make it out.
+
+cheers,
+jamal