1 files changed, 875 insertions, 0 deletions
diff --git a/doc/statements.txt b/doc/statements.txt
new file mode 100644
index 0000000..1967280
--- /dev/null
+++ b/doc/statements.txt
@@ -0,0 +1,875 @@
+VERDICT STATEMENT
+~~~~~~~~~~~~~~~~~
+The verdict statement alters control flow in the ruleset and issues policy decisions for packets.
+
+[verse]
+{*accept* | *drop* | *queue* | *continue* | *return*}
+{*jump* | *goto*} 'chain'
+
+*accept* and *drop* are absolute verdicts -- they terminate ruleset evaluation immediately.
+
+[horizontal]
+*accept*:: Terminate ruleset evaluation and accept the packet.
+The packet can still be dropped later by another hook, for instance accept
+in the forward hook still allows one to drop the packet later in the postrouting hook,
+or another forward base chain that has a higher priority number and is evaluated
+afterwards in the processing pipeline.
+*drop*:: Terminate ruleset evaluation and drop the packet.
+The drop occurs instantly, no further chains or hooks are evaluated.
+It is not possible to accept the packet in a later chain again, as those
+are not evaluated anymore for the packet.
+*queue*:: Terminate ruleset evaluation and queue the packet to userspace.
+Userspace must provide a drop or accept verdict.  In case of accept, processing
+resumes with the next base chain hook, not the rule following the queue verdict.
+*continue*:: Continue ruleset evaluation with the next rule. This
+ is the default behaviour in case a rule issues no verdict.
+*return*:: Return from the current chain and continue evaluation at the
+ next rule in the last chain. If issued in a base chain, it is equivalent to the
+ base chain policy.
+*jump* 'chain':: Continue evaluation at the first rule in 'chain'. The current
+ position in the ruleset is pushed to a call stack and evaluation will continue
+ there when the new chain is entirely evaluated or a *return* verdict is issued.
+ In case an absolute verdict is issued by a rule in the chain, ruleset evaluation
+ terminates immediately and the specific action is taken.
+*goto* 'chain':: Similar to *jump*, but the current position is not pushed to the
+ call stack, meaning that after the new chain evaluation will continue at the last
+ chain instead of the one containing the goto statement.
+
+.Using verdict statements
+-------------------
+# process packets from eth0 and the internal network in from_lan
+# chain, drop all packets from eth0 with different source addresses.
+
+filter input iif eth0 ip saddr 192.168.0.0/24 jump from_lan
+filter input iif eth0 drop
+-------------------
+
+PAYLOAD STATEMENT
+~~~~~~~~~~~~~~~~~
+[verse]
+'payload_expression' *set* 'value'
+
+The payload statement alters packet content. It can be used for example to
+set ip DSCP (diffserv) header field or ipv6 flow labels.
+
+.route some packets instead of bridging
+---------------------------------------
+# redirect tcp:http from 192.160.0.0/16 to local machine for routing instead of bridging
+# assumes 00:11:22:33:44:55 is local MAC address.
+bridge input meta iif eth0 ip saddr 192.168.0.0/16 tcp dport 80 meta pkttype set unicast ether daddr set 00:11:22:33:44:55
+-------------------------------------------
+
+.Set IPv4 DSCP header field
+---------------------------
+ip forward ip dscp set 42
+---------------------------
+
+EXTENSION HEADER STATEMENT
+~~~~~~~~~~~~~~~~~~~~~~~~~~
+[verse]
+'extension_header_expression' *set* 'value'
+
+The extension header statement alters packet content in variable-sized headers.
+This can currently be used to alter the TCP Maximum segment size of packets,
+similar to the TCPMSS target in iptables.
+
+.change tcp mss
+---------------
+tcp flags syn tcp option maxseg size set 1360
+# set a size based on route information:
+tcp flags syn tcp option maxseg size set rt mtu
+---------------
+
+You can also remove tcp options via reset keyword.
+
+.remove tcp option
+---------------
+tcp flags syn reset tcp option sack-perm
+---------------
+
+LOG STATEMENT
+~~~~~~~~~~~~~
+[verse]
+*log* [*prefix* 'quoted_string'] [*level* 'syslog-level'] [*flags* 'log-flags']
+*log* *group* 'nflog_group' [*prefix* 'quoted_string'] [*queue-threshold* 'value'] [*snaplen* 'size']
+*log level audit*
+
+The log statement enables logging of matching packets. When this statement is
+used from a rule, the Linux kernel will print some information on all matching
+packets, such as header fields, via the kernel log (where it can be read with
+dmesg(1) or read in the syslog).
+
+In the second form of invocation (if 'nflog_group' is specified), the Linux
+kernel will pass the packet to nfnetlink_log which will send the log through a
+netlink socket to the specified group. One userspace process may subscribe to
+the group to receive the logs, see man(8) ulogd for the Netfilter userspace log
+daemon and libnetfilter_log documentation for details in case you would like to
+develop a custom program to digest your logs.
+
+In the third form of invocation (if level audit is specified), the Linux
+kernel writes a message into the audit buffer suitably formatted for reading
+with auditd. Therefore no further formatting options (such as prefix or flags)
+are allowed in this mode.
+
+This is a non-terminating statement, so the rule evaluation continues after
+the packet is logged.
+
+.log statement options
+[options="header"]
+|==================
+|Keyword | Description | Type
+|prefix|
+Log message prefix|
+quoted string
+|level|
+Syslog level of logging |
+string: emerg, alert, crit, err, warn [default], notice, info, debug, audit
+|group|
+NFLOG group to send messages to|
+unsigned integer (16 bit)
+|snaplen|
+Length of packet payload to include in netlink message |
+unsigned integer (32 bit)
+|queue-threshold|
+Number of packets to queue inside the kernel before sending them to userspace |
+unsigned integer (32 bit)
+|==================================
+
+.log-flags
+[options="header"]
+|==================
+| Flag | Description
+|tcp sequence|
+Log TCP sequence numbers.
+|tcp options|
+Log options from the TCP packet header.
+|ip options|
+Log options from the IP/IPv6 packet header.
+|skuid|
+Log the userid of the process which generated the packet.
+|ether|
+Decode MAC addresses and protocol.
+|all|
+Enable all log flags listed above.
+|==============================
+
+.Using log statement
+--------------------
+# log the UID which generated the packet and ip options
+ip filter output log flags skuid flags ip options
+
+# log the tcp sequence numbers and tcp options from the TCP packet
+ip filter output log flags tcp sequence,options
+
+# enable all supported log flags
+ip6 filter output log flags all
+-----------------------
+
+REJECT STATEMENT
+~~~~~~~~~~~~~~~~
+[verse]
+____
+*reject* [ *with* 'REJECT_WITH' ]
+
+'REJECT_WITH' := *icmp* 'icmp_code' |
+                 *icmpv6* 'icmpv6_code' |
+                 *icmpx* 'icmpx_code' |
+                 *tcp reset*
+____
+
+A reject statement is used to send back an error packet in response to the
+matched packet otherwise it is equivalent to drop so it is a terminating
+statement, ending rule traversal. This statement is only valid in base chains
+using the *input*,
+*forward* or *output* hooks, and user-defined chains which are only called from
+those chains.
+
+.different ICMP reject variants are meant for use in different table families
+[options="header"]
+|==================
+|Variant |Family | Type
+|icmp|
+ip|
+icmp_code
+|icmpv6|
+ip6|
+icmpv6_code
+|icmpx|
+inet|
+icmpx_code
+|==================
+
+For a description of the different types and a list of supported keywords refer
+to DATA TYPES section above. The common default reject value is
+*port-unreachable*. +
+
+Note that in bridge family, reject statement is only allowed in base chains
+which hook into input or prerouting.
+
+COUNTER STATEMENT
+~~~~~~~~~~~~~~~~~
+A counter statement sets the hit count of packets along with the number of bytes.
+
+[verse]
+*counter* *packets* 'number' *bytes* 'number'
+*counter* { *packets* 'number' | *bytes* 'number' }
+
+CONNTRACK STATEMENT
+~~~~~~~~~~~~~~~~~~~
+The conntrack statement can be used to set the conntrack mark and conntrack labels.
+
+[verse]
+*ct* {*mark* | *event* | *label* | *zone*} *set* 'value'
+
+The ct statement sets meta data associated with a connection. The zone id
+has to be assigned before a conntrack lookup takes place, i.e. this has to be
+done in prerouting and possibly output (if locally generated packets need to be
+placed in a distinct zone), with a hook priority of *raw* (-300).
+
+Unlike iptables, where the helper assignment happens in the raw table,
+the helper needs to be assigned after a conntrack entry has been
+found, i.e. it will not work when used with hook priorities equal or before
+-200.
+
+.Conntrack statement types
+[options="header"]
+|==================
+|Keyword| Description| Value
+|event|
+conntrack event bits |
+bitmask, integer (32 bit)
+|helper|
+name of ct helper object to assign to the connection |
+quoted string
+|mark|
+Connection tracking mark |
+mark
+|label|
+Connection tracking label|
+label
+|zone|
+conntrack zone|
+integer (16 bit)
+|==================
+
+.save packet nfmark in conntrack
+--------------------------------
+ct mark set meta mark
+--------------------------------
+
+.set zone mapped via interface
+------------------------------
+table inet raw {
+  chain prerouting {
+      type filter hook prerouting priority raw;
+      ct zone set iif map { "eth1" : 1, "veth1" : 2 }
+  }
+  chain output {
+      type filter hook output priority raw;
+      ct zone set oif map { "eth1" : 1, "veth1" : 2 }
+  }
+}
+------------------------------------------------------
+
+.restrict events reported by ctnetlink
+--------------------------------------
+ct event set new,related,destroy
+--------------------------------------
+
+NOTRACK STATEMENT
+~~~~~~~~~~~~~~~~~
+The notrack statement allows one to disable connection tracking for certain
+packets.
+
+[verse]
+*notrack*
+
+Note that for this statement to be effective, it has to be applied to packets
+before a conntrack lookup happens. Therefore, it needs to sit in a chain with
+either prerouting or output hook and a hook priority of -300 (*raw*) or less.
+
+See SYNPROXY STATEMENT for an example usage.
+
+META STATEMENT
+~~~~~~~~~~~~~~
+A meta statement sets the value of a meta expression. The existing meta fields
+are: priority, mark, pkttype, nftrace. +
+
+[verse]
+*meta* {*mark* | *priority* | *pkttype* | *nftrace* | *broute*} *set* 'value'
+
+A meta statement sets meta data associated with a packet. +
+
+.Meta statement types
+[options="header"]
+|==================
+|Keyword| Description| Value
+|priority |
+TC packet priority|
+tc_handle
+|mark|
+Packet mark |
+mark
+|pkttype |
+packet type |
+pkt_type
+|nftrace |
+ruleset packet tracing on/off. Use *monitor trace* command to watch traces|
+0, 1
+|broute |
+broute on/off. packets are routed instead of being bridged|
+0, 1
+|==========================
+
+LIMIT STATEMENT
+~~~~~~~~~~~~~~~
+[verse]
+____
+*limit rate* [*over*] 'packet_number' */* 'TIME_UNIT' [*burst* 'packet_number' *packets*]
+*limit rate* [*over*] 'byte_number' 'BYTE_UNIT' */* 'TIME_UNIT' [*burst* 'byte_number' 'BYTE_UNIT']
+
+'TIME_UNIT' := *second* | *minute* | *hour* | *day*
+'BYTE_UNIT' := *bytes* | *kbytes* | *mbytes*
+____
+
+A limit statement matches at a limited rate using a token bucket filter. A rule
+using this statement will match until this limit is reached. It can be used in
+combination with the log statement to give limited logging. The optional
+*over* keyword makes it match over the specified rate.
+
+The *burst* value influences the bucket size, i.e. jitter tolerance. With
+packet-based *limit*, the bucket holds exactly *burst* packets, by default
+five. If you specify packet *burst*, it must be a non-zero value. With
+byte-based *limit*, the bucket's minimum size is the given rate's byte value
+and the *burst* value adds to that, by default zero bytes.
+
+.limit statement values
+[options="header"]
+|==================
+|Value | Description | Type
+|packet_number |
+Number of packets |
+unsigned integer (32 bit)
+|byte_number |
+Number of bytes |
+unsigned integer (32 bit)
+|========================
+
+NAT STATEMENTS
+~~~~~~~~~~~~~~
+[verse]
+____
+*snat* [[*ip* | *ip6*] [ *prefix* ] *to*] 'ADDR_SPEC' [*:*'PORT_SPEC'] ['FLAGS']
+*dnat* [[*ip* | *ip6*] [ *prefix* ] *to*] 'ADDR_SPEC' [*:*'PORT_SPEC'] ['FLAGS']
+*masquerade* [*to :*'PORT_SPEC'] ['FLAGS']
+*redirect* [*to :*'PORT_SPEC'] ['FLAGS']
+
+'ADDR_SPEC' := 'address' | 'address' *-* 'address'
+'PORT_SPEC' := 'port' | 'port' *-* 'port'
+
+'FLAGS'  := 'FLAG' [*,* 'FLAGS']
+'FLAG'  := *persistent* | *random* | *fully-random*
+____
+
+The nat statements are only valid from nat chain types. +
+
+The *snat* and *masquerade* statements specify that the source address of the
+packet should be modified. While *snat* is only valid in the postrouting and
+input chains, *masquerade* makes sense only in postrouting. The dnat and
+redirect statements are only valid in the prerouting and output chains, they
+specify that the destination address of the packet should be modified. You can
+use non-base chains which are called from base chains of nat chain type too.
+All future packets in this connection will also be mangled, and rules should
+cease being examined.
+
+The *masquerade* statement is a special form of snat which always uses the
+outgoing interface's IP address to translate to. It is particularly useful on
+gateways with dynamic (public) IP addresses.
+
+The *redirect* statement is a special form of dnat which always translates the
+destination address to the local host's one. It comes in handy if one only wants
+to alter the destination port of incoming traffic on different interfaces.
+
+When used in the inet family (available with kernel 5.2), the dnat and snat
+statements require the use of the ip and ip6 keyword in case an address is
+provided, see the examples below.
+
+Before kernel 4.18 nat statements require both prerouting and postrouting base chains
+to be present since otherwise packets on the return path won't be seen by
+netfilter and therefore no reverse translation will take place.
+
+The optional *prefix* keyword allows to map to map *n* source addresses to *n*
+destination addresses.  See 'Advanced NAT examples' below.
+
+.NAT statement values
+[options="header"]
+|==================
+|Expression| Description| Type
+|address|
+Specifies that the source/destination address of the packet should be modified.
+You may specify a mapping to relate a list of tuples composed of arbitrary
+expression key with address value. |
+ipv4_addr, ipv6_addr, e.g. abcd::1234, or you can use a mapping, e.g. meta mark map { 10 : 192.168.1.2, 20 : 192.168.1.3 }
+|port|
+Specifies that the source/destination port of the packet should be modified. |
+port number (16 bit)
+|===============================
+
+.NAT statement flags
+[options="header"]
+|==================
+|Flag| Description
+|persistent |
+Gives a client the same source-/destination-address for each connection.
+|random|
+In kernel 5.0 and newer this is the same as fully-random.
+In earlier kernels the port mapping will be randomized using a seeded MD5
+hash mix using source and destination address and destination port.
+
+|fully-random|
+If used then port mapping is generated based on a 32-bit pseudo-random algorithm.
+|=============================
+
+.Using NAT statements
+---------------------
+# create a suitable table/chain setup for all further examples
+add table nat
+add chain nat prerouting { type nat hook prerouting priority dstnat; }
+add chain nat postrouting { type nat hook postrouting priority srcnat; }
+
+# translate source addresses of all packets leaving via eth0 to address 1.2.3.4
+add rule nat postrouting oif eth0 snat to 1.2.3.4
+
+# redirect all traffic entering via eth0 to destination address 192.168.1.120
+add rule nat prerouting iif eth0 dnat to 192.168.1.120
+
+# translate source addresses of all packets leaving via eth0 to whatever
+# locally generated packets would use as source to reach the same destination
+add rule nat postrouting oif eth0 masquerade
+
+# redirect incoming TCP traffic for port 22 to port 2222
+add rule nat prerouting tcp dport 22 redirect to :2222
+
+# inet family:
+# handle ip dnat:
+add rule inet nat prerouting dnat ip to 10.0.2.99
+# handle ip6 dnat:
+add rule inet nat prerouting dnat ip6 to fe80::dead
+# this masquerades both ipv4 and ipv6:
+add rule inet nat postrouting meta oif ppp0 masquerade
+
+------------------------
+
+.Advanced NAT examples
+----------------------
+
+# map prefixes in one network to that of another, e.g. 10.141.11.4 is mangled to 192.168.2.4,
+# 10.141.11.5 is mangled to 192.168.2.5 and so on.
+add rule nat postrouting snat ip prefix to ip saddr map { 10.141.11.0/24 : 192.168.2.0/24 }
+
+# map a source address, source port combination to a pool of destination addresses and ports:
+add rule nat postrouting dnat to ip saddr . tcp dport map { 192.168.1.2 . 80 : 10.141.10.2-10.141.10.5 . 8888-8999 }
+
+# The above example generates the following NAT expression:
+#
+# [ nat dnat ip addr_min reg 1 addr_max reg 10 proto_min reg 9 proto_max reg 11 ]
+#
+# which expects to obtain the following tuple:
+# IP address (min), source port (min), IP address (max), source port (max)
+# to be obtained from the map. The given addresses and ports are inclusive.
+
+# This also works with named maps and in combination with both concatenations and ranges:
+table ip nat {
+	map ipportmap {
+		typeof ip saddr : interval ip daddr . tcp dport
+		flags interval
+		elements = { 192.168.1.2 : 10.141.10.1-10.141.10.3 . 8888-8999, 192.168.2.0/24 : 10.141.11.5-10.141.11.20 . 8888-8999 }
+	}
+
+	chain prerouting {
+		type nat hook prerouting priority dstnat; policy accept;
+		ip protocol tcp dnat ip to ip saddr map @ipportmap
+	}
+}
+
+@ipportmap maps network prefixes to a range of hosts and ports.
+The new destination is taken from the range provided by the map element.
+Same for the destination port.
+
+Note the use of the "interval" keyword in the typeof description.
+This is required so nftables knows that it has to ask for twice the
+amount of storage for each key-value pair in the map.
+
+": ipv4_addr . inet_service" would allow associating one address and one port
+with each key.  But for this case, for each key, two addresses and two ports
+(The minimum and maximum values for both) have to be stored.
+
+------------------------
+
+TPROXY STATEMENT
+~~~~~~~~~~~~~~~~
+Tproxy redirects the packet to a local socket without changing the packet header
+in any way. If any of the arguments is missing the data of the incoming packet
+is used as parameter. Tproxy matching requires another rule that ensures the
+presence of transport protocol header is specified.
+
+[verse]
+*tproxy to* 'address'*:*'port'
+*tproxy to* {'address' | *:*'port'}
+
+This syntax can be used in *ip/ip6* tables where network layer protocol is
+obvious. Either IP address or port can be specified, but at least one of them is
+necessary.
+
+[verse]
+*tproxy* {*ip* | *ip6*} *to* 'address'[*:*'port']
+*tproxy to :*'port'
+
+This syntax can be used in *inet* tables. The *ip/ip6* parameter defines the
+family the rule will match. The *address* parameter must be of this family.
+When only *port* is defined, the address family should not be specified. In
+this case the rule will match for both families.
+
+.tproxy attributes
+[options="header"]
+|=================
+| Name | Description
+| address | IP address the listening socket with IP_TRANSPARENT option is bound to.
+| port | Port the listening socket with IP_TRANSPARENT option is bound to.
+|=================
+
+.Example ruleset for tproxy statement
+-------------------------------------
+table ip x {
+    chain y {
+        type filter hook prerouting priority mangle; policy accept;
+        tcp dport ntp tproxy to 1.1.1.1
+        udp dport ssh tproxy to :2222
+    }
+}
+table ip6 x {
+    chain y {
+       type filter hook prerouting priority mangle; policy accept;
+       tcp dport ntp tproxy to [dead::beef]
+       udp dport ssh tproxy to :2222
+    }
+}
+table inet x {
+    chain y {
+        type filter hook prerouting priority mangle; policy accept;
+        tcp dport 321 tproxy to :ssh
+        tcp dport 99 tproxy ip to 1.1.1.1:999
+        udp dport 155 tproxy ip6 to [dead::beef]:smux
+    }
+}
+-------------------------------------
+
+SYNPROXY STATEMENT
+~~~~~~~~~~~~~~~~~~
+This statement will process TCP three-way-handshake parallel in netfilter
+context to protect either local or backend system. This statement requires
+connection tracking because sequence numbers need to be translated.
+
+[verse]
+*synproxy* [*mss* 'mss_value'] [*wscale* 'wscale_value'] ['SYNPROXY_FLAGS']
+
+.synproxy statement attributes
+[options="header"]
+|=================
+| Name | Description
+| mss | Maximum segment size announced to clients. This must match the backend.
+| wscale | Window scale announced to clients. This must match the backend.
+|=================
+
+.synproxy statement flags
+[options="header"]
+|=================
+| Flag | Description
+| sack-perm |
+Pass client selective acknowledgement option to backend (will be disabled if
+not present).
+| timestamp |
+Pass client timestamp option to backend (will be disabled if not present, also
+needed for selective acknowledgement and window scaling).
+|=================
+
+.Example ruleset for synproxy statement
+---------------------------------------
+Determine tcp options used by backend, from an external system
+
+              tcpdump -pni eth0 -c 1 'tcp[tcpflags] == (tcp-syn|tcp-ack)'
+                  port 80 &
+              telnet 192.0.2.42 80
+              18:57:24.693307 IP 192.0.2.42.80 > 192.0.2.43.48757:
+                  Flags [S.], seq 360414582, ack 788841994, win 14480,
+                  options [mss 1460,sackOK,
+                  TS val 1409056151 ecr 9690221,
+                  nop,wscale 9],
+                  length 0
+
+Switch tcp_loose mode off, so conntrack will mark out-of-flow packets as state INVALID.
+
+              echo 0 > /proc/sys/net/netfilter/nf_conntrack_tcp_loose
+
+Make SYN packets untracked.
+
+	table ip x {
+		chain y {
+			type filter hook prerouting priority raw; policy accept;
+			tcp flags syn notrack
+		}
+	}
+
+Catch UNTRACKED (SYN  packets) and INVALID (3WHS ACK packets) states and send
+them to SYNPROXY. This rule will respond to SYN packets with SYN+ACK
+syncookies, create ESTABLISHED for valid client response (3WHS ACK packets) and
+drop incorrect cookies. Flags combinations not expected during  3WHS will not
+match and continue (e.g. SYN+FIN, SYN+ACK). Finally, drop invalid packets, this
+will be out-of-flow packets that were not matched by SYNPROXY.
+
+    table ip x {
+            chain z {
+                    type filter hook input priority filter; policy accept;
+                    ct state invalid, untracked synproxy mss 1460 wscale 9 timestamp sack-perm
+                    ct state invalid drop
+            }
+    }
+---------------------------------------
+
+FLOW STATEMENT
+~~~~~~~~~~~~~~
+A flow statement allows us to select what flows you want to accelerate
+forwarding through layer 3 network stack bypass. You have to specify the
+flowtable name where you want to offload this flow.
+
+*flow add @*'flowtable'
+
+QUEUE STATEMENT
+~~~~~~~~~~~~~~~
+This statement passes the packet to userspace using the nfnetlink_queue handler.
+The packet is put into the queue identified by its 16-bit queue number.
+Userspace can inspect and modify the packet if desired. Userspace must then drop
+or re-inject the packet into the kernel. See libnetfilter_queue documentation
+for details.
+
+[verse]
+____
+*queue* [*flags* 'QUEUE_FLAGS'] [*to* 'queue_number']
+*queue* [*flags* 'QUEUE_FLAGS'] [*to* 'queue_number_from' - 'queue_number_to']
+*queue* [*flags* 'QUEUE_FLAGS'] [*to* 'QUEUE_EXPRESSION' ]
+
+'QUEUE_FLAGS' := 'QUEUE_FLAG' [*,* 'QUEUE_FLAGS']
+'QUEUE_FLAG'  := *bypass* | *fanout*
+'QUEUE_EXPRESSION' := *numgen* | *hash* | *symhash* | *MAP STATEMENT*
+____
+
+QUEUE_EXPRESSION can be used to compute a queue number
+at run-time with the hash or numgen expressions. It also
+allows one to use the map statement to assign fixed queue numbers
+based on external inputs such as the source ip address or interface names.
+
+.queue statement values
+[options="header"]
+|==================
+|Value | Description | Type
+|queue_number |
+Sets queue number, default is 0. |
+unsigned integer (16 bit)
+|queue_number_from |
+Sets initial queue in the range, if fanout is used. |
+unsigned integer (16 bit)
+|queue_number_to |
+Sets closing queue in the range, if fanout is used. |
+unsigned integer (16 bit)
+|=====================
+
+.queue statement flags
+[options="header"]
+|==================
+|Flag | Description
+|bypass |
+Let packets go through if userspace application cannot back off. Before using
+this flag, read libnetfilter_queue documentation for performance tuning recommendations.
+|fanout |
+Distribute packets between several queues.
+|===============================
+
+DUP STATEMENT
+~~~~~~~~~~~~~
+The dup statement is used to duplicate a packet and send the copy to a different
+destination.
+
+[verse]
+*dup to* 'device'
+*dup to* 'address' *device* 'device'
+
+.Dup statement values
+[options="header"]
+|==================
+|Expression | Description | Type
+|address |
+Specifies that the copy of the packet should be sent to a new gateway.|
+ipv4_addr, ipv6_addr, e.g. abcd::1234, or you can use a mapping, e.g. ip saddr map { 192.168.1.2 : 10.1.1.1 }
+|device |
+Specifies that the copy should be transmitted via device. |
+string
+|===================
+
+
+.Using the dup statement
+------------------------
+# send to machine with ip address 10.2.3.4 on eth0
+ip filter forward dup to 10.2.3.4 device "eth0"
+
+# copy raw frame to another interface
+netdev ingress dup to "eth0"
+dup to "eth0"
+
+# combine with map dst addr to gateways
+dup to ip daddr map { 192.168.7.1 : "eth0", 192.168.7.2 : "eth1" }
+-----------------------------------
+
+FWD STATEMENT
+~~~~~~~~~~~~~
+The fwd statement is used to redirect a raw packet to another interface. It is
+only available in the netdev family ingress and egress hooks. It is similar to
+the dup statement except that no copy is made.
+
+You can also specify the address of the next hop and the device to forward the
+packet to. This updates the source and destination MAC address of the packet by
+transmitting it through the neighboring layer. This also decrements the ttl
+field of the IP packet. This provides a way to effectively bypass the classical
+forwarding path, thus skipping the fib (forwarding information base) lookup.
+
+[verse]
+*fwd to* 'device'
+*fwd* [*ip* | *ip6*] *to* 'address' *device* 'device'
+
+.Using the fwd statement
+------------------------
+# redirect raw packet to device
+netdev ingress fwd to "eth0"
+
+# forward packet to next hop 192.168.200.1 via eth0 device
+netdev ingress ether saddr set fwd ip to 192.168.200.1 device "eth0"
+-----------------------------------
+
+SET STATEMENT
+~~~~~~~~~~~~~
+The set statement is used to dynamically add or update elements in a set from
+the packet path. The set setname must already exist in the given table and must
+have been created with one or both of the dynamic and the timeout flags. The
+dynamic flag is required if the set statement expression includes a stateful
+object. The timeout flag is implied if the set is created with a timeout, and is
+required if the set statement updates elements, rather than adding them.
+Furthermore, these sets should specify both a maximum set size (to prevent
+memory exhaustion), and their elements should have a timeout (so their number
+will not grow indefinitely) either from the set definition or from the statement
+that adds or updates them. The set statement can be used to e.g. create dynamic
+blacklists.
+
+Dynamic updates are also supported with maps. In this case, the *add* or
+*update* rule needs to provide both the key and the data element (value),
+separated via ':'.
+
+[verse]
+{*add* | *update*} *@*'setname' *{* 'expression' [*timeout* 'timeout'] [*comment* 'string'] *}*
+
+.Example for simple blacklist
+-----------------------------
+# declare a set, bound to table "filter", in family "ip".
+# Timeout and size are mandatory because we will add elements from packet path.
+# Entries will timeout after one minute, after which they might be
+# re-added if limit condition persists.
+nft add set ip filter blackhole \
+    "{ type ipv4_addr; flags dynamic; timeout 1m; size 65536; }"
+
+# declare a set to store the limit per saddr.
+# This must be separate from blackhole since the timeout is different
+nft add set ip filter flood \
+    "{ type ipv4_addr; flags dynamic; timeout 10s; size 128000; }"
+
+# whitelist internal interface.
+nft add rule ip filter input meta iifname "internal" accept
+
+# drop packets coming from blacklisted ip addresses.
+nft add rule ip filter input ip saddr @blackhole counter drop
+
+# add source ip addresses to the blacklist if more than 10 tcp connection
+# requests occurred per second and ip address.
+nft add rule ip filter input tcp flags syn tcp dport ssh \
+    add @flood { ip saddr limit rate over 10/second } \
+    add @blackhole { ip saddr } \
+    drop
+
+# inspect state of the sets.
+nft list set ip filter flood
+nft list set ip filter blackhole
+
+# manually add two addresses to the blackhole.
+nft add element filter blackhole { 10.2.3.4, 10.23.1.42 }
+-----------------------------------------------
+
+MAP STATEMENT
+~~~~~~~~~~~~~
+The map statement is used to lookup data based on some specific input key.
+
+[verse]
+____
+'expression' *map* *{* 'MAP_ELEMENTS' *}*
+
+'MAP_ELEMENTS' := 'MAP_ELEMENT' [*,* 'MAP_ELEMENTS']
+'MAP_ELEMENT'  := 'key' *:* 'value'
+____
+
+The 'key' is a value returned by 'expression'.
+// XXX: Write about where map statement can be used (list of statements?)
+
+.Using the map statement
+------------------------
+# select DNAT target based on TCP dport:
+# connections to port 80 are redirected to 192.168.1.100,
+# connections to port 8888 are redirected to 192.168.1.101
+nft add rule ip nat prerouting dnat tcp dport map { 80 : 192.168.1.100, 8888 : 192.168.1.101 }
+
+# source address based SNAT:
+# packets from net 192.168.1.0/24 will appear as originating from 10.0.0.1,
+# packets from net 192.168.2.0/24 will appear as originating from 10.0.0.2
+nft add rule ip nat postrouting snat to ip saddr map { 192.168.1.0/24 : 10.0.0.1, 192.168.2.0/24 : 10.0.0.2 }
+------------------------
+
+VMAP STATEMENT
+~~~~~~~~~~~~~~
+The verdict map (vmap) statement works analogous to the map statement, but
+contains verdicts as values.
+
+[verse]
+____
+'expression' *vmap* *{* 'VMAP_ELEMENTS' *}*
+
+'VMAP_ELEMENTS' := 'VMAP_ELEMENT' [*,* 'VMAP_ELEMENTS']
+'VMAP_ELEMENT'  := 'key' *:* 'verdict'
+____
+
+.Using the vmap statement
+-------------------------
+# jump to different chains depending on layer 4 protocol type:
+nft add rule ip filter input ip protocol vmap { tcp : jump tcp-chain, udp : jump udp-chain , icmp : jump icmp-chain }
+------------------------
+
+XT STATEMENT
+~~~~~~~~~~~~
+This represents an xt statement from xtables compat interface. It is a
+fallback if translation is not available or not complete.
+
+[verse]
+____
+*xt* 'TYPE' 'NAME'
+
+'TYPE' := *match* | *target* | *watcher*
+____
+
+Seeing this means the ruleset (or parts of it) were created by *iptables-nft*
+and one should use that to manage it.
+
+*BEWARE:* nftables won't restore these statements.