summaryrefslogtreecommitdiffstats
path: root/doc/primary-expression.txt
diff options
context:
space:
mode:
Diffstat (limited to 'doc/primary-expression.txt')
-rw-r--r--doc/primary-expression.txt488
1 files changed, 488 insertions, 0 deletions
diff --git a/doc/primary-expression.txt b/doc/primary-expression.txt
new file mode 100644
index 0000000..e13970c
--- /dev/null
+++ b/doc/primary-expression.txt
@@ -0,0 +1,488 @@
+META EXPRESSIONS
+~~~~~~~~~~~~~~~~
+[verse]
+*meta* {*length* | *nfproto* | *l4proto* | *protocol* | *priority*}
+[*meta*] {*mark* | *iif* | *iifname* | *iiftype* | *oif* | *oifname* | *oiftype* | *skuid* | *skgid* | *nftrace* | *rtclassid* | *ibrname* | *obrname* | *pkttype* | *cpu* | *iifgroup* | *oifgroup* | *cgroup* | *random* | *ipsec* | *iifkind* | *oifkind* | *time* | *hour* | *day* }
+
+A meta expression refers to meta data associated with a packet.
+
+There are two types of meta expressions: unqualified and qualified meta
+expressions. Qualified meta expressions require the meta keyword before the meta
+key, unqualified meta expressions can be specified by using the meta key
+directly or as qualified meta expressions. Meta l4proto is useful to match a
+particular transport protocol that is part of either an IPv4 or IPv6 packet. It
+will also skip any IPv6 extension headers present in an IPv6 packet.
+
+meta iif, oif, iifname and oifname are used to match the interface a packet
+arrived on or is about to be sent out on.
+
+iif and oif are used to match on the interface index, whereas iifname and
+oifname are used to match on the interface name.
+This is not the same -- assuming the rule
+
+ filter input meta iif "foo"
+
+Then this rule can only be added if the interface "foo" exists.
+Also, the rule will continue to match even if the
+interface "foo" is renamed to "bar".
+
+This is because internally the interface index is used.
+In case of dynamically created interfaces, such as tun/tap or dialup
+interfaces (ppp for example), it might be better to use iifname or oifname
+instead.
+
+In these cases, the name is used so the interface doesn't have to exist to
+add such a rule, it will stop matching if the interface gets renamed and it
+will match again in case interface gets deleted and later a new interface
+with the same name is created.
+
+Like with iptables, wildcard matching on interface name prefixes is available for
+*iifname* and *oifname* matches by appending an asterisk (*) character. Note
+however that unlike iptables, nftables does not accept interface names
+consisting of the wildcard character only - users are supposed to just skip
+those always matching expressions. In order to match on literal asterisk
+character, one may escape it using backslash (\).
+
+.Meta expression types
+[options="header"]
+|==================
+|Keyword | Description | Type
+|length|
+Length of the packet in bytes|
+integer (32-bit)
+|nfproto|
+real hook protocol family, useful only in inet table|
+integer (32 bit)
+|l4proto|
+layer 4 protocol, skips ipv6 extension headers|
+integer (8 bit)
+|protocol|
+EtherType protocol value|
+ether_type
+|priority|
+TC packet priority|
+tc_handle
+|mark|
+Packet mark |
+mark
+|iif|
+Input interface index |
+iface_index
+|iifname|
+Input interface name |
+ifname
+|iiftype|
+Input interface type|
+iface_type
+|oif|
+Output interface index|
+iface_index
+|oifname|
+Output interface name|
+ifname
+|oiftype|
+Output interface hardware type|
+iface_type
+|sdif|
+Slave device input interface index |
+iface_index
+|sdifname|
+Slave device interface name|
+ifname
+|skuid|
+UID associated with originating socket|
+uid
+|skgid|
+GID associated with originating socket|
+gid
+|rtclassid|
+Routing realm|
+realm
+|ibrname|
+Input bridge interface name|
+ifname
+|obrname|
+Output bridge interface name|
+ifname
+|pkttype|
+packet type|
+pkt_type
+|cpu|
+cpu number processing the packet|
+integer (32 bit)
+|iifgroup|
+incoming device group|
+devgroup
+|oifgroup|
+outgoing device group|
+devgroup
+|cgroup|
+control group id |
+integer (32 bit)
+|random|
+pseudo-random number|
+integer (32 bit)
+|ipsec|
+true if packet was ipsec encrypted |
+boolean (1 bit)
+|iifkind|
+Input interface kind |
+|oifkind|
+Output interface kind|
+|time|
+Absolute time of packet reception|
+Integer (32 bit) or string
+|day|
+Day of week|
+Integer (8 bit) or string
+|hour|
+Hour of day|
+String
+|====================
+
+.Meta expression specific types
+[options="header"]
+|==================
+|Type | Description
+|iface_index |
+Interface index (32 bit number). Can be specified numerically or as name of an existing interface.
+|ifname|
+Interface name (16 byte string). Does not have to exist.
+|iface_type|
+Interface type (16 bit number).
+|uid|
+User ID (32 bit number). Can be specified numerically or as user name.
+|gid|
+Group ID (32 bit number). Can be specified numerically or as group name.
+|realm|
+Routing Realm (32 bit number). Can be specified numerically or as symbolic name defined in /etc/iproute2/rt_realms.
+|devgroup_type|
+Device group (32 bit number). Can be specified numerically or as symbolic name defined in /etc/iproute2/group.
+|pkt_type|
+Packet type: *host* (addressed to local host), *broadcast* (to all),
+*multicast* (to group), *other* (addressed to another host).
+|ifkind|
+Interface kind (16 byte string). See TYPES in ip-link(8) for a list.
+|time|
+Either an integer or a date in ISO format. For example: "2019-06-06 17:00".
+Hour and seconds are optional and can be omitted if desired. If omitted,
+midnight will be assumed.
+The following three would be equivalent: "2019-06-06", "2019-06-06 00:00"
+and "2019-06-06 00:00:00".
+When an integer is given, it is assumed to be a UNIX timestamp.
+|day|
+Either a day of week ("Monday", "Tuesday", etc.), or an integer between 0 and 6.
+Strings are matched case-insensitively, and a full match is not expected (e.g. "Mon" would match "Monday").
+When an integer is given, 0 is Sunday and 6 is Saturday.
+|hour|
+A string representing an hour in 24-hour format. Seconds can optionally be specified.
+For example, 17:00 and 17:00:00 would be equivalent.
+|=============================
+
+.Using meta expressions
+-----------------------
+# qualified meta expression
+filter output meta oif eth0
+filter forward meta iifkind { "tun", "veth" }
+
+# unqualified meta expression
+filter output oif eth0
+
+# incoming packet was subject to ipsec processing
+raw prerouting meta ipsec exists accept
+-----------------------
+
+SOCKET EXPRESSION
+~~~~~~~~~~~~~~~~~
+[verse]
+*socket* {*transparent* | *mark* | *wildcard*}
+*socket* *cgroupv2* *level* 'NUM'
+
+Socket expression can be used to search for an existing open TCP/UDP socket and
+its attributes that can be associated with a packet. It looks for an established
+or non-zero bound listening socket (possibly with a non-local address). You can
+also use it to match on the socket cgroupv2 at a given ancestor level, e.g. if
+the socket belongs to cgroupv2 'a/b', ancestor level 1 checks for a matching on
+cgroup 'a' and ancestor level 2 checks for a matching on cgroup 'b'.
+
+.Available socket attributes
+[options="header"]
+|==================
+|Name |Description| Type
+|transparent|
+Value of the IP_TRANSPARENT socket option in the found socket. It can be 0 or 1.|
+boolean (1 bit)
+|mark| Value of the socket mark (SOL_SOCKET, SO_MARK). | mark
+|wildcard|
+Indicates whether the socket is wildcard-bound (e.g. 0.0.0.0 or ::0). |
+boolean (1 bit)
+|cgroupv2|
+cgroup version 2 for this socket (path from /sys/fs/cgroup)|
+cgroupv2
+|==================
+
+.Using socket expression
+------------------------
+# Mark packets that correspond to a transparent socket. "socket wildcard 0"
+# means that zero-bound listener sockets are NOT matched (which is usually
+# exactly what you want).
+table inet x {
+ chain y {
+ type filter hook prerouting priority mangle; policy accept;
+ socket transparent 1 socket wildcard 0 mark set 0x00000001 accept
+ }
+}
+
+# Trace packets that corresponds to a socket with a mark value of 15
+table inet x {
+ chain y {
+ type filter hook prerouting priority mangle; policy accept;
+ socket mark 0x0000000f nftrace set 1
+ }
+}
+
+# Set packet mark to socket mark
+table inet x {
+ chain y {
+ type filter hook prerouting priority mangle; policy accept;
+ tcp dport 8080 mark set socket mark
+ }
+}
+
+# Count packets for cgroupv2 "user.slice" at level 1
+table inet x {
+ chain y {
+ type filter hook input priority filter; policy accept;
+ socket cgroupv2 level 1 "user.slice" counter
+ }
+}
+----------------------
+
+OSF EXPRESSION
+~~~~~~~~~~~~~~
+[verse]
+*osf* [*ttl* {*loose* | *skip*}] {*name* | *version*}
+
+The osf expression does passive operating system fingerprinting. This
+expression compares some data (Window Size, MSS, options and their order, DF,
+and others) from packets with the SYN bit set.
+
+.Available osf attributes
+[options="header"]
+|==================
+|Name |Description| Type
+|ttl|
+Do TTL checks on the packet to determine the operating system.|
+string
+|version|
+Do OS version checks on the packet.|
+|name|
+Name of the OS signature to match. All signatures can be found at pf.os file.
+Use "unknown" for OS signatures that the expression could not detect.|
+string
+|==================
+
+.Available ttl values
+---------------------
+If no TTL attribute is passed, make a true IP header and fingerprint TTL true comparison. This generally works for LANs.
+
+* loose: Check if the IP header's TTL is less than the fingerprint one. Works for globally-routable addresses.
+* skip: Do not compare the TTL at all.
+---------------------
+
+.Using osf expression
+---------------------
+# Accept packets that match the "Linux" OS genre signature without comparing TTL.
+table inet x {
+ chain y {
+ type filter hook input priority filter; policy accept;
+ osf ttl skip name "Linux"
+ }
+}
+-----------------------
+
+FIB EXPRESSIONS
+~~~~~~~~~~~~~~~
+[verse]
+*fib* {*saddr* | *daddr* | *mark* | *iif* | *oif*} [*.* ...] {*oif* | *oifname* | *type*}
+
+A fib expression queries the fib (forwarding information base) to obtain
+information such as the output interface index a particular address would use.
+The input is a tuple of elements that is used as input to the fib lookup
+functions.
+
+.fib expression specific types
+[options="header"]
+|==================
+|Keyword| Description| Type
+|oif|
+Output interface index|
+integer (32 bit)
+|oifname|
+Output interface name|
+string
+|type|
+Address type |
+fib_addrtype
+|=======================
+
+Use *nft* *describe* *fib_addrtype* to get a list of all address types.
+
+.Using fib expressions
+----------------------
+# drop packets without a reverse path
+filter prerouting fib saddr . iif oif missing drop
+
+In this example, 'saddr . iif' looks up routing information based on the source address and the input interface.
+oif picks the output interface index from the routing information.
+If no route was found for the source address/input interface combination, the output interface index is zero.
+In case the input interface is specified as part of the input key, the output interface index is always the same as the input interface index or zero.
+If only 'saddr oif' is given, then oif can be any interface index or zero.
+
+# drop packets to address not configured on incoming interface
+filter prerouting fib daddr . iif type != { local, broadcast, multicast } drop
+
+# perform lookup in a specific 'blackhole' table (0xdead, needs ip appropriate ip rule)
+filter prerouting meta mark set 0xdead fib daddr . mark type vmap { blackhole : drop, prohibit : jump prohibited, unreachable : drop }
+----------------------
+
+ROUTING EXPRESSIONS
+~~~~~~~~~~~~~~~~~~~
+[verse]
+*rt* [*ip* | *ip6*] {*classid* | *nexthop* | *mtu* | *ipsec*}
+
+A routing expression refers to routing data associated with a packet.
+
+.Routing expression types
+[options="header"]
+|=======================
+|Keyword| Description| Type
+|classid|
+Routing realm|
+realm
+|nexthop|
+Routing nexthop|
+ipv4_addr/ipv6_addr
+|mtu|
+TCP maximum segment size of route |
+integer (16 bit)
+|ipsec|
+route via ipsec tunnel or transport |
+boolean
+|=================================
+
+.Routing expression specific types
+[options="header"]
+|=======================
+|Type| Description
+|realm|
+Routing Realm (32 bit number). Can be specified numerically or as symbolic name defined in /etc/iproute2/rt_realms.
+|========================
+
+.Using routing expressions
+--------------------------
+# IP family independent rt expression
+filter output rt classid 10
+
+# IP family dependent rt expressions
+ip filter output rt nexthop 192.168.0.1
+ip6 filter output rt nexthop fd00::1
+inet filter output rt ip nexthop 192.168.0.1
+inet filter output rt ip6 nexthop fd00::1
+
+# outgoing packet will be encapsulated/encrypted by ipsec
+filter output rt ipsec exists
+--------------------------
+
+IPSEC EXPRESSIONS
+~~~~~~~~~~~~~~~~~
+
+[verse]
+*ipsec* {*in* | *out*} [ *spnum* 'NUM' ] {*reqid* | *spi*}
+*ipsec* {*in* | *out*} [ *spnum* 'NUM' ] {*ip* | *ip6*} {*saddr* | *daddr*}
+
+An ipsec expression refers to ipsec data associated with a packet.
+
+The 'in' or 'out' keyword needs to be used to specify if the expression should
+examine inbound or outbound policies. The 'in' keyword can be used in the
+prerouting, input and forward hooks. The 'out' keyword applies to forward,
+output and postrouting hooks.
+The optional keyword spnum can be used to match a specific state in a chain,
+it defaults to 0.
+
+.Ipsec expression types
+[options="header"]
+|=======================
+|Keyword| Description| Type
+|reqid|
+Request ID|
+integer (32 bit)
+|spi|
+Security Parameter Index|
+integer (32 bit)
+|saddr|
+Source address of the tunnel|
+ipv4_addr/ipv6_addr
+|daddr|
+Destination address of the tunnel|
+ipv4_addr/ipv6_addr
+|=================================
+
+*Note:* When using xfrm_interface, this expression is not useable in output
+hook as the plain packet does not traverse it with IPsec info attached - use a
+chain in postrouting hook instead.
+
+NUMGEN EXPRESSION
+~~~~~~~~~~~~~~~~~
+
+[verse]
+*numgen* {*inc* | *random*} *mod* 'NUM' [ *offset* 'NUM' ]
+
+Create a number generator. The *inc* or *random* keywords control its
+operation mode: In *inc* mode, the last returned value is simply incremented.
+In *random* mode, a new random number is returned. The value after *mod*
+keyword specifies an upper boundary (read: modulus) which is not reached by
+returned numbers. The optional *offset* allows one to increment the returned value
+by a fixed offset.
+
+A typical use-case for *numgen* is load-balancing:
+
+.Using numgen expression
+------------------------
+# round-robin between 192.168.10.100 and 192.168.20.200:
+add rule nat prerouting dnat to numgen inc mod 2 map \
+ { 0 : 192.168.10.100, 1 : 192.168.20.200 }
+
+# probability-based with odd bias using intervals:
+add rule nat prerouting dnat to numgen random mod 10 map \
+ { 0-2 : 192.168.10.100, 3-9 : 192.168.20.200 }
+------------------------
+
+HASH EXPRESSIONS
+~~~~~~~~~~~~~~~~
+
+[verse]
+*jhash* {*ip saddr* | *ip6 daddr* | *tcp dport* | *udp sport* | *ether saddr*} [*.* ...] *mod* 'NUM' [ *seed* 'NUM' ] [ *offset* 'NUM' ]
+*symhash* *mod* 'NUM' [ *offset* 'NUM' ]
+
+Use a hashing function to generate a number. The functions available are
+*jhash*, known as Jenkins Hash, and *symhash*, for Symmetric Hash. The
+*jhash* requires an expression to determine the parameters of the packet
+header to apply the hashing, concatenations are possible as well. The value
+after *mod* keyword specifies an upper boundary (read: modulus) which is
+not reached by returned numbers. The optional *seed* is used to specify an
+init value used as seed in the hashing function. The optional *offset*
+allows one to increment the returned value by a fixed offset.
+
+A typical use-case for *jhash* and *symhash* is load-balancing:
+
+.Using hash expressions
+------------------------
+# load balance based on source ip between 2 ip addresses:
+add rule nat prerouting dnat to jhash ip saddr mod 2 map \
+ { 0 : 192.168.10.100, 1 : 192.168.20.200 }
+
+# symmetric load balancing between 2 ip addresses:
+add rule nat prerouting dnat to symhash mod 2 map \
+ { 0 : 192.168.10.100, 1 : 192.168.20.200 }
+------------------------