.TH CAKE 8 "19 July 2018" "iproute2" "Linux"
.SH NAME
CAKE \- Common Applications Kept Enhanced (CAKE)
.SH SYNOPSIS
.B tc qdisc ... cake
.br
[
.BR bandwidth
RATE |
.BR unlimited*
|
.BR autorate-ingress
]
.br
[
.BR rtt
TIME |
.BR datacentre
|
.BR lan
|
.BR metro
|
.BR regional
|
.BR internet*
|
.BR oceanic
|
.BR satellite
|
.BR interplanetary
]
.br
[
.BR besteffort
|
.BR diffserv8
|
.BR diffserv4
|
.BR diffserv3*
]
.br
[
.BR flowblind
|
.BR srchost
|
.BR dsthost
|
.BR hosts
|
.BR flows
|
.BR dual-srchost
|
.BR dual-dsthost
|
.BR triple-isolate*
]
.br
[
.BR nat
|
.BR nonat*
]
.br
[
.BR wash
|
.BR nowash*
]
.br
[
.BR split-gso*
|
.BR no-split-gso
]
.br
[
.BR ack-filter
|
.BR ack-filter-aggressive
|
.BR no-ack-filter*
]
.br
[
.BR memlimit
LIMIT ]
.br
[
.BR fwmark
MASK ]
.br
[
.BR ptm
|
.BR atm
|
.BR noatm*
]
.br
[
.BR overhead
N |
.BR conservative
|
.BR raw*
]
.br
[
.BR mpu
N ]
.br
[
.BR ingress
|
.BR egress*
]
.br
(* marks defaults)


.SH DESCRIPTION
CAKE (Common Applications Kept Enhanced) is a shaping-capable queue discipline
which uses both AQM and FQ.  It combines COBALT, which is an AQM algorithm
combining Codel and BLUE, a shaper which operates in deficit mode, and a variant
of DRR++ for flow isolation.  8-way set-associative hashing is used to virtually
eliminate hash collisions.  Priority queuing is available through a simplified
diffserv implementation.  Overhead compensation for various encapsulation
schemes is tightly integrated.

All settings are optional; the default settings are chosen to be sensible in
most common deployments.  Most people will only need to set the
.B bandwidth
parameter to get useful results, but reading the
.B Overhead Compensation
and
.B Round Trip Time
sections is strongly encouraged.

.SH SHAPER PARAMETERS
CAKE uses a deficit-mode shaper, which does not exhibit the initial burst
typical of token-bucket shapers.  It will automatically burst precisely as much
as required to maintain the configured throughput.  As such, it is very
straightforward to configure.
.PP
.B unlimited
(default)
.br
	No limit on the bandwidth.
.PP
.B bandwidth
RATE
.br
	Set the shaper bandwidth.  See
.BR tc(8)
or examples below for details of the RATE value.
.PP
.B autorate-ingress
.br
	Automatic capacity estimation based on traffic arriving at this qdisc.
This is most likely to be useful with cellular links, which tend to change
quality randomly.  A
.B bandwidth
parameter can be used in conjunction to specify an initial estimate.  The shaper
will periodically be set to a bandwidth slightly below the estimated rate.  This
estimator cannot estimate the bandwidth of links downstream of itself.
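.PP
As a rough illustration of the pacing described at the top of this section, the
following shell sketch assumes a simplified model in which each packet pushes
the next permitted send time forward by its own serialisation time.  The
function name and the model are illustrative assumptions and are not taken from
the CAKE implementation.
.PP
.nf
```shell
#!/bin/sh
# Simplified model (an assumption): sending LEN bytes at RATE kbit/s delays
# the next send by LEN * 8 * 1000 / RATE microseconds.
# Usage: send_times RATE_KBIT LEN...
send_times() {
    rate=$1
    shift
    t=0
    for len in "$@"; do
        echo "$t"                            # earliest send time, in microseconds
        t=$(( t + len * 8 * 1000 / rate ))   # schedule the following packet
    done
}

send_times 12000 1500 1500 1500
```
.fi
.PP
In this model, three 1500-byte packets at 12Mbit are spaced 1000 microseconds
apart, with no burst allowance accumulated beforehand.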

.SH OVERHEAD COMPENSATION PARAMETERS
The size of each packet on the wire may differ from that seen by Linux.  The
following parameters allow CAKE to compensate for this difference by internally
considering each packet to be bigger than Linux informs it.  To assist users who
are not expert network engineers, keywords have been provided to represent a
number of common link technologies.

.SS	Manual Overhead Specification
.B overhead
BYTES
.br
	Adds BYTES to the size of each packet.  BYTES may be negative; values
between -64 and 256 (inclusive) are accepted.
.PP
.B mpu
BYTES
.br
	Rounds each packet (including overhead) up to a minimum length
BYTES. BYTES may not be negative; values between 0 and 256 (inclusive)
are accepted.
.PP
.B atm
.br
	Compensates for ATM cell framing, which is normally found on ADSL links.
This is performed after the
.B overhead
parameter above.  ATM uses fixed 53-byte cells, each of which can carry 48 bytes
payload.
.PP
.B ptm
.br
	Compensates for PTM encoding, which is normally found on VDSL2 links and
uses a 64b/65b encoding scheme.  An even more efficient alternative is to
derate the specified shaper bandwidth by a factor of 64/65 (approximately
0.984) instead.  See
ITU G.992.3 Annex N and IEEE 802.3 Section 61.3 for details.
.PP
.B noatm
.br
	Disables ATM and PTM compensation.
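.PP
The manual parameters above can be combined into a single size calculation.
The following shell sketch is a hedged illustration only: it assumes that the
minimum length is applied before link framing, and it models PTM as one extra
byte per 64 bytes or part thereof, which is a conservative approximation of
64b/65b encoding.  The function name is hypothetical.
.PP
.nf
```shell
#!/bin/sh
# Usage: cake_wire_size LENGTH OVERHEAD MPU FRAMING
# FRAMING is one of: noatm, atm, ptm.
cake_wire_size() {
    len=$(( $1 + $2 ))                  # add per-packet overhead (may be negative)
    if [ "$len" -lt "$3" ]; then
        len=$3                          # round up to the minimum length
    fi
    case "$4" in
        atm) len=$(( (len + 47) / 48 * 53 )) ;;    # whole 53-byte cells, 48 bytes payload each
        ptm) len=$(( len + (len + 63) / 64 )) ;;   # assumed: one byte per 64 bytes or part thereof
    esac
    echo "$len"
}

cake_wire_size 1500 32 0 atm
cake_wire_size 1500 30 0 ptm
cake_wire_size 28 38 84 noatm
```
.fi
.PP
With these inputs a 1500-byte packet occupies 32 ATM cells (1696 bytes), and a
28-byte packet with 38 bytes of overhead is rounded up to the 84-byte minimum.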

.SS	Failsafe Overhead Keywords
These two keywords are provided for quick-and-dirty setup.  Use them if you
can't be bothered to read the rest of this section.
.PP
.B raw
(default)
.br
	Turns off all overhead compensation in CAKE.  The packet size reported
by Linux will be used directly.
.PP
	Other overhead keywords may be added after "raw".  The effect of this is
to make the overhead compensation operate relative to the reported packet size,
not the underlying IP packet size.
.PP
.B conservative
.br
	Compensates for more overhead than is likely to occur on any
widely-deployed link technology.
.br
	Equivalent to
.B overhead 48 atm.

.SS ADSL Overhead Keywords
Most ADSL modems have a way to check which framing scheme is in use.  Often this
is also specified in the settings document provided by the ISP.  The keywords in
this section are intended to correspond with these sources of information.  All
of them implicitly set the
.B atm
flag.
.PP
.B pppoa-vcmux
.br
	Equivalent to
.B overhead 10 atm
.PP
.B pppoa-llc
.br
	Equivalent to
.B overhead 14 atm
.PP
.B pppoe-vcmux
.br
	Equivalent to
.B overhead 32 atm
.PP
.B pppoe-llcsnap
.br
	Equivalent to
.B overhead 40 atm
.PP
.B bridged-vcmux
.br
	Equivalent to
.B overhead 24 atm
.PP
.B bridged-llcsnap
.br
	Equivalent to
.B overhead 32 atm
.PP
.B ipoa-vcmux
.br
	Equivalent to
.B overhead 8 atm
.PP
.B ipoa-llcsnap
.br
	Equivalent to
.B overhead 16 atm
.PP
See also the Ethernet Overhead Keywords section below.

.SS VDSL2 Overhead Keywords
ATM was dropped from VDSL2 in favour of PTM, which is a much more
straightforward framing scheme.  Some ISPs retained PPPoE for compatibility with
their existing back-end systems.
.PP
.B pppoe-ptm
.br
	Equivalent to
.B overhead 30 ptm

.br
	PPPoE: 2B PPP + 6B PPPoE +
.br
	ETHERNET: 6B dest MAC + 6B src MAC + 2B ethertype + 4B Frame Check Sequence +
.br
	PTM: 1B Start of Frame (S) + 1B End of Frame (Ck) + 2B TC-CRC (PTM-FCS)
.br
.PP
.B bridged-ptm
.br
	Equivalent to
.B overhead 22 ptm
.br
	ETHERNET: 6B dest MAC + 6B src MAC + 2B ethertype + 4B Frame Check Sequence +
.br
	PTM: 1B Start of Frame (S) + 1B End of Frame (Ck) + 2B TC-CRC (PTM-FCS)
.br
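.PP
The two overhead values above are simply the sums of the itemised header sizes.
The following shell sketch repeats that arithmetic; the variable names are
illustrative only.
.PP
.nf
```shell
#!/bin/sh
# Header sizes in bytes, as itemised above.
ppp=2 pppoe=6
dst_mac=6 src_mac=6 ethertype=2 fcs=4
ptm_start=1 ptm_end=1 ptm_crc=2

ethernet=$(( dst_mac + src_mac + ethertype + fcs ))
ptm=$(( ptm_start + ptm_end + ptm_crc ))

bridged_ptm=$(( ethernet + ptm ))
pppoe_ptm=$(( ppp + pppoe + ethernet + ptm ))

echo "bridged-ptm overhead: $bridged_ptm"
echo "pppoe-ptm overhead: $pppoe_ptm"
```
.fi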
.PP
See also the Ethernet Overhead Keywords section below.

.SS DOCSIS Cable Overhead Keyword
DOCSIS is the universal standard for providing Internet service over cable-TV
infrastructure.

In this case, the actual on-wire overhead is less important than the packet size
the head-end equipment uses for shaping and metering.  This is specified to be
an Ethernet frame including the CRC (aka FCS).
.PP
.B docsis
.br
	Equivalent to
.B overhead 18 mpu 64 noatm

.SS Ethernet Overhead Keywords
.PP
.B ethernet
.br
	Accounts for Ethernet's preamble, inter-frame gap, and Frame Check
Sequence.  Use this keyword when the bottleneck being shaped for is an
actual Ethernet cable.
.br
	Equivalent to
.B overhead 38 mpu 84 noatm
.PP
.B ether-vlan
.br
	Adds 4 bytes to the overhead compensation, accounting for an IEEE 802.1Q
VLAN header appended to the Ethernet frame header.  NB: Some ISPs use one or
even two of these within PPPoE; this keyword may be repeated as necessary to
express this.
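.PP
The figure of 38 bytes can be reconstructed from the standard Ethernet field
sizes.  These per-field sizes are not stated on this page and are supplied here
as an assumption based on common Ethernet framing; the function name is
hypothetical.
.PP
.nf
```shell
#!/bin/sh
# Usage: ethernet_overhead VLAN_TAGS
ethernet_overhead() {
    header=14            # destination MAC + source MAC + ethertype
    fcs=4                # Frame Check Sequence
    preamble=8           # preamble including the start-of-frame delimiter
    ifg=12               # inter-frame gap
    vlan=$(( 4 * $1 ))   # one 802.1Q tag per ether-vlan keyword
    echo $(( header + fcs + preamble + ifg + vlan ))
}

ethernet_overhead 0
ethernet_overhead 2
```
.fi
.PP
With no VLAN tags this yields 38, and each repetition of ether-vlan adds 4.
Under the same assumption, the mpu of 84 corresponds to a 64-byte minimum frame
plus the preamble and inter-frame gap.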

.SH ROUND TRIP TIME PARAMETERS
Active Queue Management (AQM) consists of embedding congestion signals in the
packet flow, which receivers use to instruct senders to slow down when the queue
is persistently occupied.  CAKE uses ECN signalling when available, and packet
drops otherwise, according to a combination of the Codel and BLUE AQM algorithms
called COBALT.

Very short latencies require a very rapid AQM response to adequately control
latency.  However, such a rapid response tends to impair throughput when the
actual RTT is relatively long.  CAKE allows specifying the RTT it assumes for
tuning various parameters.  Actual RTTs within an order of magnitude of this
will generally work well for both throughput and latency management.

At the 'lan' setting and below, the time constants are similar in magnitude to
the jitter in the Linux kernel itself, so congestion might be signalled
prematurely. The flows will then become sparse and total throughput reduced,
leaving little or no back-pressure for the fairness logic to work against. Use
the "metro" setting for local LANs unless you have a custom kernel.
.PP
.B rtt
TIME
.br
	Manually specify an RTT.
.PP
.B datacentre
.br
	For extremely high-performance 10GigE+ networks only.  Equivalent to
.B rtt 100us.
.PP
.B lan
.br
	For pure Ethernet (not Wi-Fi) networks, at home or in the office.  Don't
use this when shaping for an Internet access link.  Equivalent to
.B rtt 1ms.
.PP
.B metro
.br
	For traffic mostly within a single city.  Equivalent to
.B rtt 10ms.
.PP
.B regional
.br
	For traffic mostly within a European-sized country.  Equivalent to
.B rtt 30ms.
.PP
.B internet
(default)
.br
	This is suitable for most Internet traffic.  Equivalent to
.B rtt 100ms.
.PP
.B oceanic
.br
	For Internet traffic with generally above-average latency, such as that
suffered by Australasian residents.  Equivalent to
.B rtt 300ms.
.PP
.B satellite
.br
	For traffic via geostationary satellites.  Equivalent to
.B rtt 1000ms.
.PP
.B interplanetary
.br
	So named because Jupiter is about 1 light-hour from Earth.  Use this to
(almost) completely disable AQM actions.  Equivalent to
.B rtt 3600s.
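.PP
To check a measured path RTT against a keyword, the following shell sketch
looks up the RTT each keyword stands for and tests whether the measurement lies
within a factor of ten of it.  Reading "an order of magnitude" as a factor of
ten in either direction is an interpretation, and both function names are
hypothetical.
.PP
.nf
```shell
#!/bin/sh
# RTT assumed by each keyword, in microseconds (values from the list above).
rtt_preset_us() {
    case "$1" in
        datacentre)     echo 100 ;;
        lan)            echo 1000 ;;
        metro)          echo 10000 ;;
        regional)       echo 30000 ;;
        internet)       echo 100000 ;;
        oceanic)        echo 300000 ;;
        satellite)      echo 1000000 ;;
        interplanetary) echo 3600000000 ;;
        *)              return 1 ;;
    esac
}

# Usage: rtt_fits MEASURED_US KEYWORD
# Succeeds when MEASURED_US is within a factor of ten of the preset.
rtt_fits() {
    preset=$(rtt_preset_us "$2") || return 2
    [ "$1" -le $(( preset * 10 )) ] && [ $(( $1 * 10 )) -ge "$preset" ]
}

rtt_fits 25000 internet && echo "internet fits a 25ms path"
```
.fi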

.SH FLOW ISOLATION PARAMETERS
With flow isolation enabled, CAKE places packets from different flows into
different queues, each of which carries its own AQM state.  Packets from each
queue are then delivered fairly, according to a DRR++ algorithm which minimizes
latency for "sparse" flows.  CAKE uses a set-associative hashing algorithm to
minimize flow collisions.

These keywords specify whether fairness based on source address, destination
address, individual flows, or any combination of those is desired.
.PP
.B flowblind
.br
	Disables flow isolation; all traffic passes through a single queue for
each tin.
.PP
.B srchost
.br
	Flows are defined only by source address.  Could be useful on the egress
path of an ISP backhaul.
.PP
.B dsthost
.br
	Flows are defined only by destination address.  Could be useful on the
ingress path of an ISP backhaul.
.PP
.B hosts
.br
	Flows are defined by source-destination host pairs.  This is host
isolation, rather than flow isolation.
.PP
.B flows
.br
	Flows are defined by the entire 5-tuple of source address, destination
address, transport protocol, source port and destination port.  This is the type
of flow isolation performed by SFQ and fq_codel.
.PP
.B dual-srchost
.br
	Flows are defined by the 5-tuple, and fairness is applied first over
source addresses, then over individual flows.  Good for use on egress traffic
from a LAN to the internet, where it'll prevent any one LAN host from
monopolising the uplink, regardless of the number of flows they use.
.PP
.B dual-dsthost
.br
	Flows are defined by the 5-tuple, and fairness is applied first over
destination addresses, then over individual flows.  Good for use on ingress
traffic to a LAN from the internet, where it'll prevent any one LAN host from
monopolising the downlink, regardless of the number of flows they use.
.PP
.B triple-isolate
(default)
.br
	Flows are defined by the 5-tuple, and fairness is applied over source
*and* destination addresses intelligently (i.e. not merely by host-pairs), and
also over individual flows.  Use this if you're not certain whether to use
dual-srchost or dual-dsthost; it'll do both jobs at once, preventing any one
host on *either* side of the link from monopolising it with a large number of
flows.
.PP
.B nat
.br
	Instructs Cake to perform a NAT lookup before applying flow-isolation
rules, to determine the true addresses and port numbers of the packet, to
improve fairness between hosts "inside" the NAT.  This has no practical effect
in "flowblind" or "flows" modes, or if NAT is performed on a different host.
.PP
.B nonat
(default)
.br
	Cake will not perform a NAT lookup.  Flow isolation will be performed
using the addresses and port numbers directly visible to the interface Cake is
attached to.
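.PP
The difference between the modes is easiest to see by listing which packet
fields define a flow in each one.  The following shell sketch does only that;
it does not model the additional per-host fairness applied by the dual and
triple modes, and the function name is hypothetical.
.PP
.nf
```shell
#!/bin/sh
# Usage: flow_key MODE SRC DST PROTO SPORT DPORT
# Prints the packet fields that define a flow in the given mode.
flow_key() {
    case "$1" in
        flowblind) echo "" ;;            # one queue per tin, no key
        srchost)   echo "$2" ;;
        dsthost)   echo "$3" ;;
        hosts)     echo "$2 $3" ;;
        flows|dual-srchost|dual-dsthost|triple-isolate)
                   echo "$2 $3 $4 $5 $6" ;;   # full 5-tuple
        *)         return 1 ;;
    esac
}

flow_key srchost 192.0.2.10 198.51.100.7 tcp 40000 443
flow_key flows 192.0.2.10 198.51.100.7 tcp 40000 443
```
.fi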

.SH PRIORITY QUEUE PARAMETERS
CAKE can divide traffic into "tins" based on the Diffserv field.  Each tin has
its own independent set of flow-isolation queues, and is serviced based on a WRR
algorithm.  To avoid perverse Diffserv marking incentives, tin weights have a
"priority sharing" value when bandwidth used by that tin is below a threshold,
and a lower "bandwidth sharing" value when above.  Bandwidth is compared against
the threshold using the same algorithm as the deficit-mode shaper.

Detailed customisation of tin parameters is not provided.  The following presets
perform all necessary tuning, relative to the current shaper bandwidth and RTT
settings.
.PP
.B besteffort
.br
	Disables priority queuing by placing all traffic in one tin.
.PP
.B precedence
.br
	Enables the legacy interpretation of the TOS "Precedence" field.  Use of this
preset on the modern Internet is firmly discouraged.
.PP
.B diffserv4
.br
	Provides a general-purpose Diffserv implementation with four tins:
.br
		Bulk (CS1, LE in kernel v5.9+), 6.25% threshold, generally low priority.
.br
		Best Effort (general), 100% threshold.
.br
		Video (AF4x, AF3x, CS3, AF2x, CS2, TOS4, TOS1), 50% threshold.
.br
		Voice (CS7, CS6, EF, VA, CS5, CS4), 25% threshold.
.PP
.B diffserv3
(default)
.br
	Provides a simple, general-purpose Diffserv implementation with three tins:
.br
		Bulk (CS1, LE in kernel v5.9+), 6.25% threshold, generally low priority.
.br
		Best Effort (general), 100% threshold.
.br
		Voice (CS7, CS6, EF, VA, TOS4), 25% threshold, reduced Codel interval.
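.PP
The percentage thresholds listed for diffserv3 and diffserv4 are fractions of
the shaper bandwidth.  The following shell sketch computes them for a given
bandwidth in Kbit; the function name and output format are illustrative only.
.PP
.nf
```shell
#!/bin/sh
# Usage: tin_thresholds PRESET BANDWIDTH_KBIT
# PRESET is diffserv3 or diffserv4.
tin_thresholds() {
    bw=$2
    echo "Bulk $(( bw / 16 ))"          # 6.25% threshold
    echo "BestEffort $bw"               # 100% threshold
    if [ "$1" = diffserv4 ]; then
        echo "Video $(( bw / 2 ))"      # 50% threshold
    fi
    echo "Voice $(( bw / 4 ))"          # 25% threshold
}

tin_thresholds diffserv3 100000
```
.fi
.PP
For a 100Mbit shaper this gives 6250, 100000 and 25000 Kbit, which agrees with
the thresh row shown in the EXAMPLES section.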

.PP
.B fwmark
MASK
.br
	This option turns on fwmark-based overriding of CAKE's tin selection.
If set, the option specifies a bitmask that will be applied to the fwmark
associated with each packet. If the result of this masking is non-zero, the
result will be right-shifted by the number of least-significant unset bits in
the mask value, and the result will be used as the tin number for that packet.
This can be used to set policies in a firewall script that will override CAKE's
built-in tin selection.
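.PP
The mask-and-shift rule described for fwmark can be sketched in shell
arithmetic as follows.  The function name is hypothetical, and the word "none"
is merely a placeholder chosen for this sketch to indicate that no override
applies.
.PP
.nf
```shell
#!/bin/sh
# Usage: fwmark_tin FWMARK MASK
# Prints the value used as the tin number, or "none" when the masked mark
# is zero and CAKE's built-in tin selection would be used instead.
fwmark_tin() {
    mask=$(( $2 ))
    val=$(( $1 & mask ))
    if [ "$val" -eq 0 ]; then
        echo none
        return 0
    fi
    # Count the unset least-significant bits of the mask.
    shift_bits=0
    while [ $(( (mask >> shift_bits) & 1 )) -eq 0 ]; do
        shift_bits=$(( shift_bits + 1 ))
    done
    echo $(( val >> shift_bits ))
}

fwmark_tin 0x20 0xf0
fwmark_tin 0x03 0xf0
```
.fi
.PP
With a mask of 0xf0, a mark of 0x20 yields 2, while a mark of 0x03 has no bits
inside the mask and leaves tin selection unchanged.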

.SH OTHER PARAMETERS
.B memlimit
LIMIT
.br
	Limit the memory consumed by Cake to LIMIT bytes.  This does not translate
directly to queue size, because the data structures containing the packets add
some overhead, especially for small packets.  Size this limit according to the
worst-case acceptable memory consumption, not bandwidth-delay product
considerations.

	By default, the limit is calculated based on the bandwidth and RTT
settings.

.PP
.B wash

.br
	Traffic entering your diffserv domain is frequently mis-marked in
transit from the perspective of your network, and traffic exiting yours may be
mis-marked from the perspective of the transiting provider.

Apply the wash option to clear all extra diffserv bits (but not the ECN bits)
after priority queuing has taken place.

If you are shaping inbound, and cannot trust the diffserv markings (as is the
case for Comcast Cable, among others), it is best to use a single queue
"besteffort" mode with wash.

.PP
.B split-gso

.br
	This option controls whether CAKE will split Generic Segmentation
Offload (GSO) super-packets into their on-the-wire components and
dequeue them individually.

.br
Super-packets are created by the networking stack to improve efficiency.
However, because they are larger they take longer to dequeue, which
translates to higher latency for competing flows, especially at lower
bandwidths. CAKE defaults to splitting GSO packets to achieve the lowest
possible latency. At link speeds higher than 10 Gbps, setting the
no-split-gso parameter can increase the maximum achievable throughput by
retaining the full GSO packets.
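.PP
The latency cost of an unsplit super-packet follows from its serialisation
time.  The following shell sketch computes that time in microseconds; the
64 KiB super-packet size is an assumption made for illustration, and the
function name is hypothetical.
.PP
.nf
```shell
#!/bin/sh
# Usage: serialisation_us BYTES RATE_KBIT
# Prints the time needed to transmit BYTES at RATE_KBIT, in microseconds.
serialisation_us() {
    echo $(( $1 * 8 * 1000 / $2 ))
}

serialisation_us 65536 10000       # assumed 64 KiB super-packet at 10Mbit
serialisation_us 1514 10000        # one full-size Ethernet frame at 10Mbit
serialisation_us 65536 10000000    # the same super-packet at 10Gbit
```
.fi
.PP
At 10Mbit the assumed super-packet holds the link for roughly 52 milliseconds,
against about 1.2 milliseconds for a single frame; at 10Gbit the same
super-packet takes only about 52 microseconds.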

.SH OVERRIDING CLASSIFICATION WITH TC FILTERS

CAKE supports overriding of its internal classification of packets through the
tc filter mechanism. Packets can be assigned to different priority tins by
setting the
.B priority
field on the skb, and the flow hashing can be overridden by setting the
.B classid
parameter.

.PP
.B Tin override

.br
        To assign a priority tin, the major number of the priority field needs
to match the qdisc handle of the cake instance; if it does, the minor number
will be interpreted as the tin index. For example, to classify all ICMP packets
as 'bulk', the following filter can be used:

.br
        # tc qdisc replace dev eth0 handle 1: root cake diffserv3
        # tc filter add dev eth0 parent 1: protocol ip prio 1 \\
          u32 match icmp type 0 0 action skbedit priority 1:1

.PP
.B Flow hash override

.br
        To override flow hashing, the classid can be set. CAKE will interpret
the major number of the classid as the host hash used in host isolation mode,
and the minor number as the flow hash used for flow-based queueing. One or both
of those can be set, and will be used if the relevant flow isolation parameter
is set (i.e., the major number will be ignored if CAKE is not configured in
hosts mode, and the minor number will be ignored if CAKE is not configured in
flows mode).

.br
This example will assign all ICMP packets to the first queue:

.br
        # tc qdisc replace dev eth0 handle 1: root cake
        # tc filter add dev eth0 parent 1: protocol ip prio 1 \\
          u32 match icmp type 0 0 classid 0:1

.br
If only one of the host and flow overrides is set, CAKE will compute the other
hash from the packet as normal. Note, however, that the host isolation mode
works by assigning a host ID to the flow queue; so if overriding both host and
flow, the same flow cannot have more than one host assigned. In addition, it is
not possible to assign different source and destination host IDs through the
override mechanism; if a host ID is assigned, it will be used as both source and
destination host.
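.PP
The split of a classid into host and flow overrides can be sketched as follows.
This sketch assumes that a value of zero means "not set", as the classid 0:1
example above suggests, and that the numbers are written in hexadecimal as tc
does for class IDs.  The function name is hypothetical.
.PP
.nf
```shell
#!/bin/sh
# Usage: classid_override MAJOR:MINOR
classid_override() {
    major=$(( 0x${1%%:*} ))    # host hash override (assumed unset when 0)
    minor=$(( 0x${1##*:} ))    # flow hash override (assumed unset when 0)
    if [ "$major" -ne 0 ]; then
        echo "host hash: $major"
    else
        echo "host hash: from packet"
    fi
    if [ "$minor" -ne 0 ]; then
        echo "flow hash: $minor"
    else
        echo "flow hash: from packet"
    fi
}

classid_override 0:1
```
.fi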



.SH EXAMPLES
# tc qdisc delete root dev eth0
.br
# tc qdisc add root dev eth0 cake bandwidth 100Mbit ethernet
.br
# tc -s qdisc show dev eth0
.br
qdisc cake 1: root refcnt 2 bandwidth 100Mbit diffserv3 triple-isolate rtt 100.0ms noatm overhead 38 mpu 84
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
 memory used: 0b of 5000000b
 capacity estimate: 100Mbit
 min/max network layer size:        65535 /       0
 min/max overhead-adjusted size:    65535 /       0
 average network hdr offset:            0

                   Bulk  Best Effort        Voice
  thresh       6250Kbit      100Mbit       25Mbit
  target          5.0ms        5.0ms        5.0ms
  interval      100.0ms      100.0ms      100.0ms
  pk_delay          0us          0us          0us
  av_delay          0us          0us          0us
  sp_delay          0us          0us          0us
  pkts                0            0            0
  bytes               0            0            0
  way_inds            0            0            0
  way_miss            0            0            0
  way_cols            0            0            0
  drops               0            0            0
  marks               0            0            0
  ack_drop            0            0            0
  sp_flows            0            0            0
  bk_flows            0            0            0
  un_flows            0            0            0
  max_len             0            0            0
  quantum           300         1514          762

After some use:
.br
# tc -s qdisc show dev eth0

qdisc cake 1: root refcnt 2 bandwidth 100Mbit diffserv3 triple-isolate rtt 100.0ms noatm overhead 38 mpu 84
 Sent 44709231 bytes 31931 pkt (dropped 45, overlimits 93782 requeues 0)
 backlog 33308b 22p requeues 0
 memory used: 292352b of 5000000b
 capacity estimate: 100Mbit
 min/max network layer size:           28 /    1500
 min/max overhead-adjusted size:       84 /    1538
 average network hdr offset:           14

                   Bulk  Best Effort        Voice
  thresh       6250Kbit      100Mbit       25Mbit
  target          5.0ms        5.0ms        5.0ms
  interval      100.0ms      100.0ms      100.0ms
  pk_delay        8.7ms        6.9ms        5.0ms
  av_delay        4.9ms        5.3ms        3.8ms
  sp_delay        727us        1.4ms        511us
  pkts             2590        21271         8137
  bytes         3081804     30302659     11426206
  way_inds            0           46            0
  way_miss            3           17            4
  way_cols            0            0            0
  drops              20           15           10
  marks               0            0            0
  ack_drop            0            0            0
  sp_flows            2            4            1
  bk_flows            1            2            1
  un_flows            0            0            0
  max_len          1514         1514         1514
  quantum           300         1514          762

.SH SEE ALSO
.BR tc (8),
.BR tc-codel (8),
.BR tc-fq_codel (8),
.BR tc-htb (8)

.SH AUTHORS
Cake's principal author is Jonathan Morton, with contributions from
Tony Ambardar, Kevin Darbyshire-Bryant, Toke Høiland-Jørgensen,
Sebastian Moeller, Ryan Mounce, Dean Scarff, Nils Andreas Svee, and Dave Täht.

This manual page was written by Loganaden Velvindron. Please report corrections
to the Linux Networking mailing list <netdev@vger.kernel.org>.