summaryrefslogtreecommitdiffstats
path: root/man7/packet.7
diff options
context:
space:
mode:
Diffstat (limited to 'man7/packet.7')
-rw-r--r--man7/packet.7694
1 files changed, 694 insertions, 0 deletions
diff --git a/man7/packet.7 b/man7/packet.7
new file mode 100644
index 0000000..b2a264c
--- /dev/null
+++ b/man7/packet.7
@@ -0,0 +1,694 @@
+.\" SPDX-License-Identifier: Linux-man-pages-1-para
+.\"
+.\" This man page is Copyright (C) 1999 Andi Kleen <ak@muc.de>.
+.\"
+.\" $Id: packet.7,v 1.13 2000/08/14 08:03:45 ak Exp $
+.\"
+.TH packet 7 2023-07-15 "Linux man-pages 6.05.01"
+.SH NAME
+packet \- packet interface on device level
+.SH SYNOPSIS
+.nf
+.B #include <sys/socket.h>
+.B #include <linux/if_packet.h>
+.B #include <net/ethernet.h> /* the L2 protocols */
+.PP
+.BI "packet_socket = socket(AF_PACKET, int " socket_type ", int "protocol );
+.fi
+.SH DESCRIPTION
+Packet sockets are used to receive or send raw packets at the device driver
+(OSI Layer 2) level.
+They allow the user to implement protocol modules in user space
+on top of the physical layer.
+.PP
+The
+.I socket_type
+is either
+.B SOCK_RAW
+for raw packets including the link-level header or
+.B SOCK_DGRAM
+for cooked packets with the link-level header removed.
+The link-level header information is available in a common format in a
+.I sockaddr_ll
+structure.
+.I protocol
+is the IEEE 802.3 protocol number in network byte order.
+See the
+.I <linux/if_ether.h>
+include file for a list of allowed protocols.
+When protocol
+is set to
+.BR htons(ETH_P_ALL) ,
+then all protocols are received.
+All incoming packets of that protocol type will be passed to the packet
+socket before they are passed to the protocols implemented in the kernel.
+If
+.I protocol
+is set to zero,
+no packets are received.
+.BR bind (2)
+can optionally be called with a nonzero
+.I sll_protocol
+to start receiving packets for the protocols specified.
+.PP
+In order to create a packet socket, a process must have the
+.B CAP_NET_RAW
+capability in the user namespace that governs its network namespace.
+.PP
+.B SOCK_RAW
+packets are passed to and from the device driver without any changes in
+the packet data.
+When receiving a packet, the address is still parsed and
+passed in a standard
+.I sockaddr_ll
+address structure.
+When transmitting a packet, the user-supplied buffer
+should contain the physical-layer header.
+That packet is then
+queued unmodified to the network driver of the interface defined by the
+destination address.
+Some device drivers always add other headers.
+.B SOCK_RAW
+is similar to but not compatible with the obsolete
+.B AF_INET/SOCK_PACKET
+of Linux 2.0.
+.PP
+.B SOCK_DGRAM
+operates on a slightly higher level.
+The physical header is removed before the packet is passed to the user.
+Packets sent through a
+.B SOCK_DGRAM
+packet socket get a suitable physical-layer header based on the
+information in the
+.I sockaddr_ll
+destination address before they are queued.
+.PP
+By default, all packets of the specified protocol type
+are passed to a packet socket.
+To get packets only from a specific interface use
+.BR bind (2)
+specifying an address in a
+.I struct sockaddr_ll
+to bind the packet socket to an interface.
+Fields used for binding are
+.I sll_family
+(should be
+.BR AF_PACKET ),
+.IR sll_protocol ,
+and
+.IR sll_ifindex .
+.PP
+The
+.BR connect (2)
+operation is not supported on packet sockets.
+.PP
+When the
+.B MSG_TRUNC
+flag is passed to
+.BR recvmsg (2),
+.BR recv (2),
+or
+.BR recvfrom (2),
+the real length of the packet on the wire is always returned,
+even when it is longer than the buffer.
+.SS Address types
+The
+.I sockaddr_ll
+structure is a device-independent physical-layer address.
+.PP
+.in +4n
+.EX
+struct sockaddr_ll {
+ unsigned short sll_family; /* Always AF_PACKET */
+ unsigned short sll_protocol; /* Physical\-layer protocol */
+ int sll_ifindex; /* Interface number */
+ unsigned short sll_hatype; /* ARP hardware type */
+ unsigned char sll_pkttype; /* Packet type */
+ unsigned char sll_halen; /* Length of address */
+ unsigned char sll_addr[8]; /* Physical\-layer address */
+};
+.EE
+.in
+.PP
+The fields of this structure are as follows:
+.TP
+.I sll_protocol
+is the standard ethernet protocol type in network byte order as defined
+in the
+.I <linux/if_ether.h>
+include file.
+It defaults to the socket's protocol.
+.TP
+.I sll_ifindex
+is the interface index of the interface
+(see
+.BR netdevice (7));
+0 matches any interface (only permitted for binding).
+.I sll_hatype
+is an ARP type as defined in the
+.I <linux/if_arp.h>
+include file.
+.TP
+.I sll_pkttype
+contains the packet type.
+Valid types are
+.B PACKET_HOST
+for a packet addressed to the local host,
+.B PACKET_BROADCAST
+for a physical-layer broadcast packet,
+.B PACKET_MULTICAST
+for a packet sent to a physical-layer multicast address,
+.B PACKET_OTHERHOST
+for a packet to some other host that has been caught by a device driver
+in promiscuous mode, and
+.B PACKET_OUTGOING
+for a packet originating from the local host that is looped back to a packet
+socket.
+These types make sense only for receiving.
+.TP
+.I sll_addr
+.TQ
+.I sll_halen
+contain the physical-layer (e.g., IEEE 802.3) address and its length.
+The exact interpretation depends on the device.
+.PP
+When you send packets, it is enough to specify
+.IR sll_family ,
+.IR sll_addr ,
+.IR sll_halen ,
+.IR sll_ifindex ,
+and
+.IR sll_protocol .
+The other fields should be 0.
+.I sll_hatype
+and
+.I sll_pkttype
+are set on received packets for your information.
+.SS Socket options
+Packet socket options are configured by calling
+.BR setsockopt (2)
+with level
+.BR SOL_PACKET .
+.TP
+.B PACKET_ADD_MEMBERSHIP
+.PD 0
+.TP
+.B PACKET_DROP_MEMBERSHIP
+.PD
+Packet sockets can be used to configure physical-layer multicasting
+and promiscuous mode.
+.B PACKET_ADD_MEMBERSHIP
+adds a binding and
+.B PACKET_DROP_MEMBERSHIP
+drops it.
+They both expect a
+.I packet_mreq
+structure as argument:
+.IP
+.in +4n
+.EX
+struct packet_mreq {
+ int mr_ifindex; /* interface index */
+ unsigned short mr_type; /* action */
+ unsigned short mr_alen; /* address length */
+ unsigned char mr_address[8]; /* physical\-layer address */
+};
+.EE
+.in
+.IP
+.I mr_ifindex
+contains the interface index for the interface whose status
+should be changed.
+The
+.I mr_type
+field specifies which action to perform.
+.B PACKET_MR_PROMISC
+enables receiving all packets on a shared medium (often known as
+"promiscuous mode"),
+.B PACKET_MR_MULTICAST
+binds the socket to the physical-layer multicast group specified in
+.I mr_address
+and
+.IR mr_alen ,
+and
+.B PACKET_MR_ALLMULTI
+sets the socket up to receive all multicast packets arriving at
+the interface.
+.IP
+In addition, the traditional ioctls
+.BR SIOCSIFFLAGS ,
+.BR SIOCADDMULTI ,
+.B SIOCDELMULTI
+can be used for the same purpose.
+.TP
+.BR PACKET_AUXDATA " (since Linux 2.6.21)"
+.\" commit 8dc4194474159660d7f37c495e3fc3f10d0db8cc
+If this binary option is enabled, the packet socket passes a metadata
+structure along with each packet in the
+.BR recvmsg (2)
+control field.
+The structure can be read with
+.BR cmsg (3).
+It is defined as
+.IP
+.in +4n
+.EX
+struct tpacket_auxdata {
+ __u32 tp_status;
+ __u32 tp_len; /* packet length */
+ __u32 tp_snaplen; /* captured length */
+ __u16 tp_mac;
+ __u16 tp_net;
+ __u16 tp_vlan_tci;
+ __u16 tp_vlan_tpid; /* Since Linux 3.14; earlier, these
+ were unused padding bytes */
+.\" commit a0cdfcf39362410d5ea983f4daf67b38de129408 added tp_vlan_tpid
+};
+.EE
+.in
+.TP
+.BR PACKET_FANOUT " (since Linux 3.1)"
+.\" commit dc99f600698dcac69b8f56dda9a8a00d645c5ffc
+To scale processing across threads, packet sockets can form a fanout
+group.
+In this mode, each matching packet is enqueued onto only one
+socket in the group.
+A socket joins a fanout group by calling
+.BR setsockopt (2)
+with level
+.B SOL_PACKET
+and option
+.BR PACKET_FANOUT .
+Each network namespace can have up to 65536 independent groups.
+A socket selects a group by encoding the ID in the first 16 bits of
+the integer option value.
+The first packet socket to join a group implicitly creates it.
+To successfully join an existing group, subsequent packet sockets
+must have the same protocol, device settings, fanout mode, and
+flags (see below).
+Packet sockets can leave a fanout group only by closing the socket.
+The group is deleted when the last socket is closed.
+.IP
+Fanout supports multiple algorithms to spread traffic between sockets,
+as follows:
+.RS
+.IP \[bu] 3
+The default mode,
+.BR PACKET_FANOUT_HASH ,
+sends packets from the same flow to the same socket to maintain
+per-flow ordering.
+For each packet, it chooses a socket by taking the packet flow hash
+modulo the number of sockets in the group, where a flow hash is a hash
+over network-layer address and optional transport-layer port fields.
+.IP \[bu]
+The load-balance mode
+.B PACKET_FANOUT_LB
+implements a round-robin algorithm.
+.IP \[bu]
+.B PACKET_FANOUT_CPU
+selects the socket based on the CPU that the packet arrived on.
+.IP \[bu]
+.B PACKET_FANOUT_ROLLOVER
+processes all data on a single socket, moving to the next when one
+becomes backlogged.
+.IP \[bu]
+.B PACKET_FANOUT_RND
+selects the socket using a pseudo-random number generator.
+.IP \[bu]
+.B PACKET_FANOUT_QM
+.\" commit 2d36097d26b5991d71a2cf4a20c1a158f0f1bfcd
+(available since Linux 3.14)
+selects the socket using the recorded queue_mapping of the received skb.
+.RE
+.IP
+Fanout modes can take additional options.
+IP fragmentation causes packets from the same flow to have different
+flow hashes.
+The flag
+.BR PACKET_FANOUT_FLAG_DEFRAG ,
+if set, causes packets to be defragmented before fanout is applied, to
+preserve order even in this case.
+Fanout mode and options are communicated in the second 16 bits of the
+integer option value.
+The flag
+.B PACKET_FANOUT_FLAG_ROLLOVER
+enables the roll over mechanism as a backup strategy: if the
+original fanout algorithm selects a backlogged socket, the packet
+rolls over to the next available one.
+.TP
+.BR PACKET_LOSS " (with " PACKET_TX_RING )
+When a malformed packet is encountered on a transmit ring,
+the default is to reset its
+.I tp_status
+to
+.B TP_STATUS_WRONG_FORMAT
+and abort the transmission immediately.
+The malformed packet blocks itself and subsequently enqueued packets from
+being sent.
+The format error must be fixed, the associated
+.I tp_status
+reset to
+.BR TP_STATUS_SEND_REQUEST ,
+and the transmission process restarted via
+.BR send (2).
+However, if
+.B PACKET_LOSS
+is set, any malformed packet will be skipped, its
+.I tp_status
+reset to
+.BR TP_STATUS_AVAILABLE ,
+and the transmission process continued.
+.TP
+.BR PACKET_RESERVE " (with " PACKET_RX_RING )
+By default, a packet receive ring writes packets immediately following the
+metadata structure and alignment padding.
+This integer option reserves additional headroom.
+.TP
+.B PACKET_RX_RING
+Create a memory-mapped ring buffer for asynchronous packet reception.
+The packet socket reserves a contiguous region of application address
+space, lays it out into an array of packet slots and copies packets
+(up to
+.IR tp_snaplen )
+into subsequent slots.
+Each packet is preceded by a metadata structure similar to
+.IR tpacket_auxdata .
+The protocol fields encode the offset to the data
+from the start of the metadata header.
+.I tp_net
+stores the offset to the network layer.
+If the packet socket is of type
+.BR SOCK_DGRAM ,
+then
+.I tp_mac
+is the same.
+If it is of type
+.BR SOCK_RAW ,
+then that field stores the offset to the link-layer frame.
+Packet socket and application communicate the head and tail of the ring
+through the
+.I tp_status
+field.
+The packet socket owns all slots with
+.I tp_status
+equal to
+.BR TP_STATUS_KERNEL .
+After filling a slot, it changes the status of the slot to transfer
+ownership to the application.
+During normal operation, the new
+.I tp_status
+value has at least the
+.B TP_STATUS_USER
+bit set to signal that a received packet has been stored.
+When the application has finished processing a packet, it transfers
+ownership of the slot back to the socket by setting
+.I tp_status
+equal to
+.BR TP_STATUS_KERNEL .
+.IP
+Packet sockets implement multiple variants of the packet ring.
+The implementation details are described in
+.I Documentation/networking/packet_mmap.rst
+in the Linux kernel source tree.
+.TP
+.B PACKET_STATISTICS
+Retrieve packet socket statistics in the form of a structure
+.IP
+.in +4n
+.EX
+struct tpacket_stats {
+ unsigned int tp_packets; /* Total packet count */
+ unsigned int tp_drops; /* Dropped packet count */
+};
+.EE
+.in
+.IP
+Receiving statistics resets the internal counters.
+The statistics structure differs when using a ring of variant
+.BR TPACKET_V3 .
+.TP
+.BR PACKET_TIMESTAMP " (with " PACKET_RX_RING "; since Linux 2.6.36)"
+.\" commit 614f60fa9d73a9e8fdff3df83381907fea7c5649
+The packet receive ring always stores a timestamp in the metadata header.
+By default, this is a software generated timestamp generated when the
+packet is copied into the ring.
+This integer option selects the type of timestamp.
+Besides the default, it support the two hardware formats described in
+.I Documentation/networking/timestamping.rst
+in the Linux kernel source tree.
+.TP
+.BR PACKET_TX_RING " (since Linux 2.6.31)"
+.\" commit 69e3c75f4d541a6eb151b3ef91f34033cb3ad6e1
+Create a memory-mapped ring buffer for packet transmission.
+This option is similar to
+.B PACKET_RX_RING
+and takes the same arguments.
+The application writes packets into slots with
+.I tp_status
+equal to
+.B TP_STATUS_AVAILABLE
+and schedules them for transmission by changing
+.I tp_status
+to
+.BR TP_STATUS_SEND_REQUEST .
+When packets are ready to be transmitted, the application calls
+.BR send (2)
+or a variant thereof.
+The
+.I buf
+and
+.I len
+fields of this call are ignored.
+If an address is passed using
+.BR sendto (2)
+or
+.BR sendmsg (2),
+then that overrides the socket default.
+On successful transmission, the socket resets
+.I tp_status
+to
+.BR TP_STATUS_AVAILABLE .
+It immediately aborts the transmission on error unless
+.B PACKET_LOSS
+is set.
+.TP
+.BR PACKET_VERSION " (with " PACKET_RX_RING "; since Linux 2.6.27)"
+.\" commit bbd6ef87c544d88c30e4b762b1b61ef267a7d279
+By default,
+.B PACKET_RX_RING
+creates a packet receive ring of variant
+.BR TPACKET_V1 .
+To create another variant, configure the desired variant by setting this
+integer option before creating the ring.
+.TP
+.BR PACKET_QDISC_BYPASS " (since Linux 3.14)"
+.\" commit d346a3fae3ff1d99f5d0c819bf86edf9094a26a1
+By default, packets sent through packet sockets pass through the kernel's
+qdisc (traffic control) layer, which is fine for the vast majority of use
+cases.
+For traffic generator appliances using packet sockets
+that intend to brute-force flood the network\[em]for example,
+to test devices under load in a similar
+fashion to pktgen\[em]this layer can be bypassed by setting
+this integer option to 1.
+A side effect is that packet buffering in the qdisc layer is avoided,
+which will lead to increased drops when network
+device transmit queues are busy;
+therefore, use at your own risk.
+.SS Ioctls
+.B SIOCGSTAMP
+can be used to receive the timestamp of the last received packet.
+Argument is a
+.I struct timeval
+variable.
+.\" FIXME Document SIOCGSTAMPNS
+.PP
+In addition, all standard ioctls defined in
+.BR netdevice (7)
+and
+.BR socket (7)
+are valid on packet sockets.
+.SS Error handling
+Packet sockets do no error handling other than errors occurred
+while passing the packet to the device driver.
+They don't have the concept of a pending error.
+.SH ERRORS
+.TP
+.B EADDRNOTAVAIL
+Unknown multicast group address passed.
+.TP
+.B EFAULT
+User passed invalid memory address.
+.TP
+.B EINVAL
+Invalid argument.
+.TP
+.B EMSGSIZE
+Packet is bigger than interface MTU.
+.TP
+.B ENETDOWN
+Interface is not up.
+.TP
+.B ENOBUFS
+Not enough memory to allocate the packet.
+.TP
+.B ENODEV
+Unknown device name or interface index specified in interface address.
+.TP
+.B ENOENT
+No packet received.
+.TP
+.B ENOTCONN
+No interface address passed.
+.TP
+.B ENXIO
+Interface address contained an invalid interface index.
+.TP
+.B EPERM
+User has insufficient privileges to carry out this operation.
+.PP
+In addition, other errors may be generated by the low-level driver.
+.SH VERSIONS
+.B AF_PACKET
+is a new feature in Linux 2.2.
+Earlier Linux versions supported only
+.BR SOCK_PACKET .
+.SH NOTES
+For portable programs it is suggested to use
+.B AF_PACKET
+via
+.BR pcap (3);
+although this covers only a subset of the
+.B AF_PACKET
+features.
+.PP
+The
+.B SOCK_DGRAM
+packet sockets make no attempt to create or parse the IEEE 802.2 LLC
+header for a IEEE 802.3 frame.
+When
+.B ETH_P_802_3
+is specified as protocol for sending the kernel creates the
+802.3 frame and fills out the length field; the user has to supply the LLC
+header to get a fully conforming packet.
+Incoming 802.3 packets are not multiplexed on the DSAP/SSAP protocol
+fields; instead they are supplied to the user as protocol
+.B ETH_P_802_2
+with the LLC header prefixed.
+It is thus not possible to bind to
+.BR ETH_P_802_3 ;
+bind to
+.B ETH_P_802_2
+instead and do the protocol multiplex yourself.
+The default for sending is the standard Ethernet DIX
+encapsulation with the protocol filled in.
+.PP
+Packet sockets are not subject to the input or output firewall chains.
+.SS Compatibility
+In Linux 2.0, the only way to get a packet socket was with the call:
+.PP
+.in +4n
+.EX
+socket(AF_INET, SOCK_PACKET, protocol)
+.EE
+.in
+.PP
+This is still supported, but deprecated and strongly discouraged.
+The main difference between the two methods is that
+.B SOCK_PACKET
+uses the old
+.I struct sockaddr_pkt
+to specify an interface, which doesn't provide physical-layer
+independence.
+.PP
+.in +4n
+.EX
+struct sockaddr_pkt {
+ unsigned short spkt_family;
+ unsigned char spkt_device[14];
+ unsigned short spkt_protocol;
+};
+.EE
+.in
+.PP
+.I spkt_family
+contains
+the device type,
+.I spkt_protocol
+is the IEEE 802.3 protocol type as defined in
+.I <sys/if_ether.h>
+and
+.I spkt_device
+is the device name as a null-terminated string, for example, eth0.
+.PP
+This structure is obsolete and should not be used in new code.
+.SH BUGS
+.SS LLC header handling
+The IEEE 802.2/803.3 LLC handling could be considered as a bug.
+.SS MSG_TRUNC issues
+The
+.B MSG_TRUNC
+.BR recvmsg (2)
+extension is an ugly hack and should be replaced by a control message.
+There is currently no way to get the original destination address of
+packets via
+.BR SOCK_DGRAM .
+.SS spkt_device device name truncation
+The
+.I spkt_device
+field of
+.I sockaddr_pkt
+has a size of 14 bytes,
+which is less than the constant
+.B IFNAMSIZ
+defined in
+.I <net/if.h>
+which is 16 bytes and describes the system limit for a network interface name.
+This means the names of network devices longer than 14 bytes
+will be truncated to fit into
+.IR spkt_device .
+All these lengths include the terminating null byte (\[aq]\e0\[aq])).
+.PP
+Issues from this with old code typically show up with
+very long interface names used by the
+.B Predictable Network Interface Names
+feature enabled by default in many modern Linux distributions.
+.PP
+The preferred solution is to rewrite code to avoid
+.BR SOCK_PACKET .
+Possible user solutions are to disable
+.B Predictable Network Interface Names
+or to rename the interface to a name of at most 13 bytes,
+for example using the
+.BR ip (8)
+tool.
+.SS Documentation issues
+Socket filters are not documented.
+.\" .SH CREDITS
+.\" This man page was written by Andi Kleen with help from Matthew Wilcox.
+.\" AF_PACKET in Linux 2.2 was implemented
+.\" by Alexey Kuznetsov, based on code by Alan Cox and others.
+.SH SEE ALSO
+.BR socket (2),
+.BR pcap (3),
+.BR capabilities (7),
+.BR ip (7),
+.BR raw (7),
+.BR socket (7),
+.BR ip (8),
+.PP
+RFC\ 894 for the standard IP Ethernet encapsulation.
+RFC\ 1700 for the IEEE 802.3 IP encapsulation.
+.PP
+The
+.I <linux/if_ether.h>
+include file for physical-layer protocols.
+.PP
+The Linux kernel source tree.
+.I Documentation/networking/filter.rst
+describes how to apply Berkeley Packet Filters to packet sockets.
+.I tools/testing/selftests/net/psock_tpacket.c
+contains example source code for all available versions of
+.B PACKET_RX_RING
+and
+.BR PACKET_TX_RING .