META EXPRESSIONS
~~~~~~~~~~~~~~~~
[verse]
*meta* {*length* | *nfproto* | *l4proto* | *protocol* | *priority*}
[*meta*] {*mark* | *iif* | *iifname* | *iiftype* | *oif* | *oifname* | *oiftype* | *skuid* | *skgid* | *nftrace* | *rtclassid* | *ibrname* | *obrname* | *pkttype* | *cpu* | *iifgroup* | *oifgroup* | *cgroup* | *random* | *ipsec* | *iifkind* | *oifkind* | *time* | *hour* | *day* }

A meta expression refers to meta data associated with a packet.

There are two types of meta expressions: unqualified and qualified meta expressions. Qualified meta expressions require the meta keyword before the meta key; unqualified meta expressions can be specified by using the meta key directly or as qualified meta expressions. Meta l4proto is useful to match a particular transport protocol that is part of either an IPv4 or IPv6 packet. It will also skip any IPv6 extension headers present in an IPv6 packet.

meta iif, oif, iifname and oifname are used to match the interface a packet arrived on or is about to be sent out on.

iif and oif are used to match on the interface index, whereas iifname and oifname are used to match on the interface name. The two behave differently. Consider the rule

  filter input meta iif "foo"

This rule can only be added if the interface "foo" exists. Also, the rule will continue to match even if the interface "foo" is renamed to "bar".

This is because internally the interface index is used. In case of dynamically created interfaces, such as tun/tap or dialup interfaces (ppp for example), it might be better to use iifname or oifname instead.

In these cases, the name is used, so the interface doesn't have to exist to add such a rule. The rule will stop matching if the interface gets renamed, and it will match again if the interface gets deleted and a new interface with the same name is created later.

Like with iptables, wildcard matching on interface name prefixes is available for *iifname* and *oifname* matches by appending an asterisk (*) character.
Note however that unlike iptables, nftables does not accept interface names consisting of the wildcard character only; users are expected to simply omit such always-matching expressions. In order to match on a literal asterisk character, one may escape it using backslash (\).

.Meta expression types
[options="header"]
|==================
|Keyword | Description | Type
|length| Length of the packet in bytes| integer (32 bit)
|nfproto| Real hook protocol family, useful only in inet table| integer (32 bit)
|l4proto| Layer 4 protocol, skips IPv6 extension headers| integer (8 bit)
|protocol| EtherType protocol value| ether_type
|priority| TC packet priority| tc_handle
|mark| Packet mark | mark
|iif| Input interface index | iface_index
|iifname| Input interface name | ifname
|iiftype| Input interface hardware type| iface_type
|oif| Output interface index| iface_index
|oifname| Output interface name| ifname
|oiftype| Output interface hardware type| iface_type
|sdif| Slave device input interface index | iface_index
|sdifname| Slave device interface name| ifname
|skuid| UID associated with originating socket| uid
|skgid| GID associated with originating socket| gid
|rtclassid| Routing realm| realm
|ibrname| Input bridge interface name| ifname
|obrname| Output bridge interface name| ifname
|pkttype| Packet type| pkt_type
|cpu| CPU number processing the packet| integer (32 bit)
|iifgroup| Incoming device group| devgroup
|oifgroup| Outgoing device group| devgroup
|cgroup| Control group id | integer (32 bit)
|random| Pseudo-random number| integer (32 bit)
|ipsec| True if packet was ipsec encrypted | boolean (1 bit)
|iifkind| Input interface kind |
|oifkind| Output interface kind|
|time| Absolute time of packet reception| integer (32 bit) or string
|day| Day of week| integer (8 bit) or string
|hour| Hour of day| string
|====================

.Meta expression specific types
[options="header"]
|==================
|Type | Description
|iface_index | Interface index (32 bit number).
Can be specified numerically or as the name of an existing interface.
|ifname| Interface name (16 byte string). Does not have to exist.
|iface_type| Interface type (16 bit number).
|uid| User ID (32 bit number). Can be specified numerically or as user name.
|gid| Group ID (32 bit number). Can be specified numerically or as group name.
|realm| Routing Realm (32 bit number). Can be specified numerically or as symbolic name defined in /etc/iproute2/rt_realms.
|devgroup| Device group (32 bit number). Can be specified numerically or as symbolic name defined in /etc/iproute2/group.
|pkt_type| Packet type: *host* (addressed to local host), *broadcast* (to all), *multicast* (to group), *other* (addressed to another host).
|ifkind| Interface kind (16 byte string). See TYPES in ip-link(8) for a list.
|time| Either an integer or a date in ISO format. For example: "2019-06-06 17:00". Hour and seconds are optional and can be omitted if desired. If omitted, midnight will be assumed. The following three would be equivalent: "2019-06-06", "2019-06-06 00:00" and "2019-06-06 00:00:00". When an integer is given, it is assumed to be a UNIX timestamp.
|day| Either a day of week ("Monday", "Tuesday", etc.), or an integer between 0 and 6. Strings are matched case-insensitively, and a full match is not required (e.g. "Mon" would match "Monday"). When an integer is given, 0 is Sunday and 6 is Saturday.
|hour| A string representing an hour in 24-hour format. Seconds can optionally be specified. For example, 17:00 and 17:00:00 would be equivalent.
|=============================

.Using meta expressions
-----------------------
# qualified meta expression
filter output meta oif eth0
filter forward meta iifkind { "tun", "veth" }

# unqualified meta expression
filter output oif eth0

# incoming packet was subject to ipsec processing
raw prerouting meta ipsec exists accept
-----------------------

SOCKET EXPRESSION
~~~~~~~~~~~~~~~~~
[verse]
*socket* {*transparent* | *mark* | *wildcard*}
*socket* *cgroupv2* *level* 'NUM'

Socket expression can be used to search for an existing open TCP/UDP socket and its attributes that can be associated with a packet. It looks for an established or non-zero bound listening socket (possibly with a non-local address). You can also use it to match on the socket cgroupv2 at a given ancestor level, e.g. if the socket belongs to cgroupv2 'a/b', ancestor level 1 checks for a match on cgroup 'a' and ancestor level 2 checks for a match on cgroup 'b'.

.Available socket attributes
[options="header"]
|==================
|Name |Description| Type
|transparent| Value of the IP_TRANSPARENT socket option in the found socket. It can be 0 or 1.| boolean (1 bit)
|mark| Value of the socket mark (SOL_SOCKET, SO_MARK). | mark
|wildcard| Indicates whether the socket is wildcard-bound (e.g. 0.0.0.0 or ::0). | boolean (1 bit)
|cgroupv2| cgroup version 2 for this socket (path from /sys/fs/cgroup)| cgroupv2
|==================

.Using socket expression
------------------------
# Mark packets that correspond to a transparent socket. "socket wildcard 0"
# means that zero-bound listener sockets are NOT matched (which is usually
# exactly what you want).
table inet x {
    chain y {
        type filter hook prerouting priority mangle; policy accept;
        socket transparent 1 socket wildcard 0 mark set 0x00000001 accept
    }
}

# Trace packets that correspond to a socket with a mark value of 15
table inet x {
    chain y {
        type filter hook prerouting priority mangle; policy accept;
        socket mark 0x0000000f nftrace set 1
    }
}

# Set packet mark to socket mark
table inet x {
    chain y {
        type filter hook prerouting priority mangle; policy accept;
        tcp dport 8080 mark set socket mark
    }
}

# Count packets for cgroupv2 "user.slice" at level 1
table inet x {
    chain y {
        type filter hook input priority filter; policy accept;
        socket cgroupv2 level 1 "user.slice" counter
    }
}
----------------------

OSF EXPRESSION
~~~~~~~~~~~~~~
[verse]
*osf* [*ttl* {*loose* | *skip*}] {*name* | *version*}

The osf expression does passive operating system fingerprinting. This expression compares some data (Window Size, MSS, options and their order, DF, and others) from packets with the SYN bit set.

.Available osf attributes
[options="header"]
|==================
|Name |Description| Type
|ttl| Do TTL checks on the packet to determine the operating system.| string
|version| Do OS version checks on the packet.|
|name| Name of the OS signature to match. All signatures can be found in the pf.os file. Use "unknown" for OS signatures that the expression could not detect.| string
|==================

.Available ttl values
---------------------
If no TTL attribute is passed, the TTL in the IP header is compared exactly against the TTL in the fingerprint. This generally works for LANs.

* loose: Check if the IP header's TTL is less than the fingerprint one. Works for globally-routable addresses.
* skip: Do not compare the TTL at all.
---------------------

.Using osf expression
---------------------
# Accept packets that match the "Linux" OS genre signature without comparing TTL.
table inet x {
    chain y {
        type filter hook input priority filter; policy accept;
        osf ttl skip name "Linux"
    }
}
-----------------------

FIB EXPRESSIONS
~~~~~~~~~~~~~~~
[verse]
*fib* {*saddr* | *daddr* | *mark* | *iif* | *oif*} [*.* ...] {*oif* | *oifname* | *type*}

A fib expression queries the fib (forwarding information base) to obtain information such as the output interface index a particular address would use. The input is a tuple of elements that is used as input to the fib lookup functions.

.fib expression specific types
[options="header"]
|==================
|Keyword| Description| Type
|oif| Output interface index| integer (32 bit)
|oifname| Output interface name| string
|type| Address type | fib_addrtype
|=======================

Use *nft* *describe* *fib_addrtype* to get a list of all address types.

.Using fib expressions
----------------------
# drop packets without a reverse path
filter prerouting fib saddr . iif oif missing drop

In this example, 'saddr . iif' looks up routing information based on the source address and the input interface.
oif picks the output interface index from the routing information.
If no route was found for the source address/input interface combination, the output interface index is zero.
In case the input interface is specified as part of the input key, the output interface index is always the same as the input interface index or zero.
If only 'saddr oif' is given, then oif can be any interface index or zero.

# drop packets to address not configured on incoming interface
filter prerouting fib daddr . iif type != { local, broadcast, multicast } drop

# perform lookup in a specific 'blackhole' table (0xdead, needs appropriate ip rule)
filter prerouting meta mark set 0xdead fib daddr . \
    mark type vmap { blackhole : drop, prohibit : jump prohibited, unreachable : drop }
----------------------

ROUTING EXPRESSIONS
~~~~~~~~~~~~~~~~~~~
[verse]
*rt* [*ip* | *ip6*] {*classid* | *nexthop* | *mtu* | *ipsec*}

A routing expression refers to routing data associated with a packet.

.Routing expression types
[options="header"]
|=======================
|Keyword| Description| Type
|classid| Routing realm| realm
|nexthop| Routing nexthop| ipv4_addr/ipv6_addr
|mtu| TCP maximum segment size of route | integer (16 bit)
|ipsec| Route via ipsec tunnel or transport | boolean
|=================================

.Routing expression specific types
[options="header"]
|=======================
|Type| Description
|realm| Routing Realm (32 bit number). Can be specified numerically or as symbolic name defined in /etc/iproute2/rt_realms.
|========================

.Using routing expressions
--------------------------
# IP family independent rt expression
filter output rt classid 10

# IP family dependent rt expressions
ip filter output rt nexthop 192.168.0.1
ip6 filter output rt nexthop fd00::1
inet filter output rt ip nexthop 192.168.0.1
inet filter output rt ip6 nexthop fd00::1

# outgoing packet will be encapsulated/encrypted by ipsec
filter output rt ipsec exists
--------------------------

IPSEC EXPRESSIONS
~~~~~~~~~~~~~~~~~
[verse]
*ipsec* {*in* | *out*} [ *spnum* 'NUM' ] {*reqid* | *spi*}
*ipsec* {*in* | *out*} [ *spnum* 'NUM' ] {*ip* | *ip6*} {*saddr* | *daddr*}

An ipsec expression refers to ipsec data associated with a packet.

The 'in' or 'out' keyword needs to be used to specify if the expression should examine inbound or outbound policies. The 'in' keyword can be used in the prerouting, input and forward hooks. The 'out' keyword applies to forward, output and postrouting hooks. The optional keyword spnum can be used to match a specific state in a chain; it defaults to 0.
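
The synopsis above can be illustrated with a minimal sketch. The table and chain names (filter, input, postrouting) as well as the request ID and all addresses are placeholders chosen for illustration, not values taken from a real setup.

.Using ipsec expressions
------------------------
# accept inbound packets handled by the state with request ID 1 (placeholder value)
inet filter input ipsec in reqid 1 accept

# same kind of match on the tunnel source address; spnum 0 is the default and shown only for clarity
inet filter input ipsec in spnum 0 ip saddr 192.168.1.0/24 accept

# count outbound packets whose tunnel destination is fd00::1 (placeholder address)
inet filter postrouting ipsec out ip6 daddr fd00::1 counter
------------------------
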
.Ipsec expression types
[options="header"]
|=======================
|Keyword| Description| Type
|reqid| Request ID| integer (32 bit)
|spi| Security Parameter Index| integer (32 bit)
|saddr| Source address of the tunnel| ipv4_addr/ipv6_addr
|daddr| Destination address of the tunnel| ipv4_addr/ipv6_addr
|=================================

*Note:* When using xfrm_interface, this expression is not usable in the output hook, as the plain packet does not traverse it with IPsec info attached. Use a chain in the postrouting hook instead.

NUMGEN EXPRESSION
~~~~~~~~~~~~~~~~~
[verse]
*numgen* {*inc* | *random*} *mod* 'NUM' [ *offset* 'NUM' ]

Create a number generator. The *inc* or *random* keywords control its operation mode: In *inc* mode, the last returned value is simply incremented. In *random* mode, a new random number is returned. The value after the *mod* keyword specifies an upper boundary (read: modulus) which is not reached by returned numbers. The optional *offset* allows one to increment the returned value by a fixed offset.

A typical use-case for *numgen* is load-balancing:

.Using numgen expression
------------------------
# round-robin between 192.168.10.100 and 192.168.20.200:
add rule nat prerouting dnat to numgen inc mod 2 map \
        { 0 : 192.168.10.100, 1 : 192.168.20.200 }

# probability-based with odd bias using intervals:
add rule nat prerouting dnat to numgen random mod 10 map \
        { 0-2 : 192.168.10.100, 3-9 : 192.168.20.200 }
------------------------

HASH EXPRESSIONS
~~~~~~~~~~~~~~~~
[verse]
*jhash* {*ip saddr* | *ip6 daddr* | *tcp dport* | *udp sport* | *ether saddr*} [*.* ...] *mod* 'NUM' [ *seed* 'NUM' ] [ *offset* 'NUM' ]
*symhash* *mod* 'NUM' [ *offset* 'NUM' ]

Use a hashing function to generate a number. The functions available are *jhash*, known as Jenkins Hash, and *symhash*, for Symmetric Hash. The *jhash* function requires an expression that selects the packet header fields to hash; concatenations are possible as well.
The value after the *mod* keyword specifies an upper boundary (read: modulus) which is not reached by returned numbers. The optional *seed* specifies an initial value used as the seed of the hashing function. The optional *offset* allows one to increment the returned value by a fixed offset.

A typical use-case for *jhash* and *symhash* is load-balancing:

.Using hash expressions
------------------------
# load balance based on source ip between 2 ip addresses:
add rule nat prerouting dnat to jhash ip saddr mod 2 map \
        { 0 : 192.168.10.100, 1 : 192.168.20.200 }

# symmetric load balancing between 2 ip addresses:
add rule nat prerouting dnat to symhash mod 2 map \
        { 0 : 192.168.10.100, 1 : 192.168.20.200 }
------------------------
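
The optional *seed* and *offset* parameters can be combined as in the following sketch. The table and chain names, the seed value and the offset are placeholders chosen for illustration. With *mod* 2 and *offset* 100, the expression yields either 100 or 101.

.Using seed and offset
------------------------
# hash a concatenation of source and destination address with an explicit seed,
# then store the result (100 or 101) in the conntrack mark:
add rule filter prerouting ct mark set jhash ip saddr . ip daddr mod 2 seed 0xdeadbeef offset 100

# symmetric hash with the same offset; symhash takes no seed:
add rule filter prerouting ct mark set symhash mod 2 offset 100
------------------------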