author Daniel Baumann <daniel.baumann@progress-linux.org> 2024-04-11 08:27:49 +0000
committer Daniel Baumann <daniel.baumann@progress-linux.org> 2024-04-11 08:27:49 +0000
commit ace9429bb58fd418f0c81d4c2835699bddf6bde6
tree b2d64bc10158fdd5497876388cd68142ca374ed3 /Documentation/userspace-api/netlink
parent Initial commit.
Adding upstream version 6.6.15. (upstream/6.6.15)
Signed-off-by: Daniel Baumann <daniel.baumann@progress-linux.org>
Diffstat (limited to 'Documentation/userspace-api/netlink')
-rw-r--r-- Documentation/userspace-api/netlink/c-code-gen.rst 107
-rw-r--r-- Documentation/userspace-api/netlink/genetlink-legacy.rst 268
-rw-r--r-- Documentation/userspace-api/netlink/index.rst 19
-rw-r--r-- Documentation/userspace-api/netlink/intro-specs.rst 159
-rw-r--r-- Documentation/userspace-api/netlink/intro.rst 683
-rw-r--r-- Documentation/userspace-api/netlink/netlink-raw.rst 58
-rw-r--r-- Documentation/userspace-api/netlink/specs.rst 458
7 files changed, 1752 insertions, 0 deletions
diff --git a/Documentation/userspace-api/netlink/c-code-gen.rst b/Documentation/userspace-api/netlink/c-code-gen.rst
new file mode 100644
index 0000000000..89de42c133
--- /dev/null
+++ b/Documentation/userspace-api/netlink/c-code-gen.rst
@@ -0,0 +1,107 @@
+.. SPDX-License-Identifier: BSD-3-Clause
+
+==============================
+Netlink spec C code generation
+==============================
+
+This document describes how Netlink specifications are used to render
+C code (uAPI, policies etc.). It also defines the additional properties
+allowed in older families by the ``genetlink-c`` protocol level,
+to control the naming.
+
+For brevity this document refers to ``name`` properties of various
+objects by the object type. For example ``$attr`` is the value
+of ``name`` in an attribute, and ``$family`` is the name of the
+family (the global ``name`` property).
+
+The upper case is used to denote literal values, e.g. ``$family-CMD``
+means the concatenation of ``$family``, a dash character, and the literal
+``CMD``.
+
+The names of ``#defines`` and enum values are always converted to upper case,
+and with dashes (``-``) replaced by underscores (``_``).
+
+If the constructed name is a C keyword, an extra underscore is
+appended (``do`` -> ``do_``).
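
The naming rules above can be sketched in C as follows. This is only an illustration of the described conversion, not the generator's actual code; the helper names ``c_upper_name`` and ``c_escape_keyword`` are hypothetical, and the keyword list is a deliberately short subset.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Upper-case a spec name and turn dashes into underscores, as described
 * for #define and enum value names.
 * Returns 0 on success, -1 if dst is too small. */
static int c_upper_name(const char *src, char *dst, size_t dst_len)
{
        size_t i, len = strlen(src);

        if (len + 1 > dst_len)
                return -1;
        for (i = 0; i < len; i++)
                dst[i] = src[i] == '-' ? '_' :
                         (char)toupper((unsigned char)src[i]);
        dst[len] = '\0';
        return 0;
}

/* Append an underscore when the constructed name is a C keyword.
 * The keyword list is an illustrative subset, not the full C list. */
static int c_escape_keyword(const char *name, char *dst, size_t dst_len)
{
        static const char *const keywords[] = {
                "do", "if", "for", "int", "while",
        };
        size_t i, len = strlen(name);
        int is_kw = 0;

        for (i = 0; i < sizeof(keywords) / sizeof(keywords[0]); i++)
                if (!strcmp(name, keywords[i]))
                        is_kw = 1;
        if (len + 1 + (size_t)is_kw > dst_len)
                return -1;
        memcpy(dst, name, len);
        if (is_kw)
                dst[len++] = '_';
        dst[len] = '\0';
        return 0;
}
```

With these helpers ``rx-jumbo-max`` becomes ``RX_JUMBO_MAX`` and ``do`` becomes ``do_``, while a non-keyword such as ``get`` is left unchanged.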
+
+Globals
+=======
+
+``c-family-name`` controls the name of the ``#define`` for the family
+name, default is ``$family-FAMILY-NAME``.
+
+``c-version-name`` controls the name of the ``#define`` for the version
+of the family, default is ``$family-FAMILY-VERSION``.
+
+``max-by-define`` selects if max values for enums are defined as a
+``#define`` rather than inside the enum.
+
+Definitions
+===========
+
+Constants
+---------
+
+Every constant is rendered as a ``#define``.
+The name of the constant is ``$family-$constant`` and the value
+is rendered as a string or integer according to its type in the spec.
+
+Enums and flags
+---------------
+
+Enums are named ``$family-$enum``. The full name can be set directly
+or suppressed by specifying the ``enum-name`` property.
+Default entry name is ``$family-$enum-$entry``.
+If ``name-prefix`` is specified it replaces the ``$family-$enum``
+portion of the entry name.
+
+Boolean ``render-max`` controls creation of the max values
+(which are enabled by default for attribute enums).
+
+Attributes
+==========
+
+Each attribute set (excluding fractional sets) is rendered as an enum.
+
+Attribute enums are traditionally unnamed in netlink headers.
+If naming is desired ``enum-name`` can be used to specify the name.
+
+The default attribute name prefix is ``$family-A`` if the name of the set
+is the same as the name of the family and ``$family-A-$set`` if the names
+differ. The prefix can be overridden by the ``name-prefix`` property of a set.
+The rest of the section will refer to the prefix as ``$pfx``.
+
+Attributes are named ``$pfx-$attribute``.
+
+Attribute enums end with two special values ``__$pfx-MAX`` and ``$pfx-MAX``
+which are used for sizing attribute tables.
+These two names can be specified directly with the ``attr-cnt-name``
+and ``attr-max-name`` properties respectively.
+
+If ``max-by-define`` is set to ``true`` at the global level ``attr-max-name``
+will be specified as a ``#define`` rather than an enum value.
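
As a rough illustration of the rules in this section, an attribute set could be rendered along these lines. The family ``demo`` and its attributes are hypothetical, the set is assumed to share the family's name (so ``$pfx`` is ``DEMO_A``), and real generated headers may differ in detail.

```c
/* Hypothetical family "demo" with a single attribute set also named "demo",
 * so the attribute prefix is DEMO_A. */
enum {
        DEMO_A_UNSPEC,
        DEMO_A_IFINDEX,
        DEMO_A_NAME,

        __DEMO_A_MAX,                     /* count of attribute types */
        DEMO_A_MAX = (__DEMO_A_MAX - 1)   /* highest valid attribute type */
};

/* Typical use of the max value: a table indexed by attribute type
 * needs DEMO_A_MAX + 1 slots. */
static const void *demo_attr_table[DEMO_A_MAX + 1];
```

Here ``__DEMO_A_MAX`` plays the role of the count name and ``DEMO_A_MAX`` the max name; with ``max-by-define`` the latter would be a ``#define`` instead of an enum value.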
+
+Operations
+==========
+
+Operations are named ``$family-CMD-$operation``.
+If ``name-prefix`` is specified it replaces the ``$family-CMD``
+portion of the name.
+
+Similarly to attribute enums operation enums end with special count and max
+attributes. For operations those attributes can be renamed with
+``cmd-cnt-name`` and ``cmd-max-name``. Max will be a define if ``max-by-define``
+is ``true``.
+
+Multicast groups
+================
+
+Each multicast group gets a define rendered into the kernel uAPI header.
+The name of the define is ``$family-MCGRP-$group``, and can be overwritten
+with the ``c-define-name`` property.
+
+Code generation
+===============
+
+uAPI header is assumed to come from ``<linux/$family.h>`` in the default header
+search path. It can be changed using the ``uapi-header`` global property.
diff --git a/Documentation/userspace-api/netlink/genetlink-legacy.rst b/Documentation/userspace-api/netlink/genetlink-legacy.rst
new file mode 100644
index 0000000000..40b82ad5d5
--- /dev/null
+++ b/Documentation/userspace-api/netlink/genetlink-legacy.rst
@@ -0,0 +1,268 @@
+.. SPDX-License-Identifier: BSD-3-Clause
+
+=================================================================
+Netlink specification support for legacy Generic Netlink families
+=================================================================
+
+This document describes the many additional quirks and properties
+required to describe older Generic Netlink families which form
+the ``genetlink-legacy`` protocol level.
+
+Specification
+=============
+
+Attribute type nests
+--------------------
+
+New Netlink families should use ``multi-attr`` to define arrays.
+Older families (e.g. ``genetlink`` control family) attempted to
+define array types reusing attribute type to carry information.
+
+For reference the ``multi-attr`` array may look like this::
+
+  [ARRAY-ATTR]
+    [INDEX (optionally)]
+    [MEMBER1]
+    [MEMBER2]
+  [SOME-OTHER-ATTR]
+  [ARRAY-ATTR]
+    [INDEX (optionally)]
+    [MEMBER1]
+    [MEMBER2]
+
+where ``ARRAY-ATTR`` is the array entry type.
+
+array-nest
+~~~~~~~~~~
+
+``array-nest`` creates the following structure::
+
+  [SOME-OTHER-ATTR]
+  [ARRAY-ATTR]
+    [ENTRY]
+      [MEMBER1]
+      [MEMBER2]
+    [ENTRY]
+      [MEMBER1]
+      [MEMBER2]
+
+It wraps the entire array in an extra attribute (hence limiting its size
+to 64kB). The ``ENTRY`` nests are special and have the index of the entry
+as their type instead of normal attribute type.
+
+type-value
+~~~~~~~~~~
+
+``type-value`` is a construct which uses attribute types to carry
+information about a single object (often used when array is dumped
+entry-by-entry).
+
+``type-value`` can have multiple levels of nesting, for example
+genetlink's policy dumps create the following structures::
+
+  [POLICY-IDX]
+    [ATTR-IDX]
+      [POLICY-INFO-ATTR1]
+      [POLICY-INFO-ATTR2]
+
+Where the first level of nest has the policy index as its attribute
+type, it contains a single nest which has the attribute index as its
+type. Inside the attr-index nest are the policy attributes. Modern
+Netlink families should have instead defined this as a flat structure,
+the nesting serves no good purpose here.
+
+Operations
+==========
+
+Enum (message ID) model
+-----------------------
+
+unified
+~~~~~~~
+
+Modern families use the ``unified`` message ID model, which uses
+a single enumeration for all messages within a family. Requests and
+responses share the same message ID. Notifications have separate
+IDs from the same space. For example given the following list
+of operations:
+
+.. code-block:: yaml
+
+  -
+    name: a
+    value: 1
+    do: ...
+  -
+    name: b
+    do: ...
+  -
+    name: c
+    value: 4
+    notify: a
+  -
+    name: d
+    do: ...
+
+Requests and responses for operation ``a`` will have the ID of 1,
+the requests and responses of ``b`` will use 2 (since there is no explicit
+``value``, the ID is that of the previous operation ``+ 1``). Notification ``c`` will
+use the ID of 4, operation ``d`` 5 etc.
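
The assignment rule for the ``unified`` model can be written down as a few lines of C. This is a sketch of the rule as described here, not code taken from the YNL tooling; ``struct op_spec`` and ``assign_unified_ids`` are hypothetical names, and starting the numbering at 1 when the first operation has no ``value`` is an assumption.

```c
#include <stddef.h>

/* One entry per operation, in spec order.
 * value < 0 means the spec gives no explicit value. */
struct op_spec {
        const char *name;
        int value;
};

/* Unified model: an explicit value is used as is, otherwise the ID is
 * the previous operation's ID plus one. */
static void assign_unified_ids(const struct op_spec *ops, size_t n, int *ids)
{
        int prev = 0;   /* assumption: numbering starts at 1 */
        size_t i;

        for (i = 0; i < n; i++) {
                ids[i] = ops[i].value >= 0 ? ops[i].value : prev + 1;
                prev = ids[i];
        }
}
```

Feeding it the four operations from the example yields the IDs 1, 2, 4 and 5.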
+
+directional
+~~~~~~~~~~~
+
+The ``directional`` model splits the ID assignment by the direction of
+the message. Messages from and to the kernel can't be confused with
+each other so this conserves the ID space (at the cost of making
+the programming more cumbersome).
+
+In this case the ``value`` property should be specified in the ``request``
+and ``reply`` sections of the operations (if an operation has both ``do``
+and ``dump`` the IDs are shared, ``value`` should be set in ``do``).
+For notifications the ``value`` is provided at the op level but it
+only allocates a ``reply`` (i.e. a "from-kernel" ID). Let's look
+at an example:
+
+.. code-block:: yaml
+
+  -
+    name: a
+    do:
+      request:
+        value: 2
+        attributes: ...
+      reply:
+        value: 1
+        attributes: ...
+  -
+    name: b
+    notify: a
+  -
+    name: c
+    notify: a
+    value: 7
+  -
+    name: d
+    do: ...
+
+In this case ``a`` will use 2 when sending the message to the kernel
+and expects a message with ID 1 in response. Notification ``b`` allocates
+a "from-kernel" ID which is 2. ``c`` allocates "from-kernel" ID of 7.
+If operation ``d`` does not set ``value`` explicitly in the spec
+it will be allocated 3 for the request (``a`` is the previous operation
+with a request section and the value of 2) and 8 for response (``c`` is
+the previous operation in the "from-kernel" direction).
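
The two independent counters of the ``directional`` model can be sketched as below. The types and the function ``assign_directional_ids`` are hypothetical and only mirror the rule described in this section: requests continue from the previous request ID, replies and notifications continue from the previous "from-kernel" ID.

```c
#include <stddef.h>

#define NO_VALUE (-1)

enum op_kind { OP_DO, OP_NOTIFY };

struct dir_op {
        const char *name;
        enum op_kind kind;
        int req_value;  /* explicit request ID, or NO_VALUE */
        int rsp_value;  /* explicit reply/notification ID, or NO_VALUE */
};

struct dir_ids {
        int req;        /* NO_VALUE for notifications */
        int rsp;
};

/* Directional model: one counter per direction of the message. */
static void assign_directional_ids(const struct dir_op *ops, size_t n,
                                   struct dir_ids *out)
{
        int prev_req = 0, prev_rsp = 0;
        size_t i;

        for (i = 0; i < n; i++) {
                out[i].req = NO_VALUE;
                if (ops[i].kind == OP_DO) {
                        out[i].req = ops[i].req_value != NO_VALUE ?
                                     ops[i].req_value : prev_req + 1;
                        prev_req = out[i].req;
                }
                out[i].rsp = ops[i].rsp_value != NO_VALUE ?
                             ops[i].rsp_value : prev_rsp + 1;
                prev_rsp = out[i].rsp;
        }
}
```

For the example spec this gives ``a`` the pair (2, 1), ``b`` the "from-kernel" ID 2, ``c`` the ID 7, and ``d`` the pair (3, 8).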
+
+Other quirks
+============
+
+Structures
+----------
+
+Legacy families can define C structures both to be used as the contents of
+an attribute and as a fixed message header. Structures are defined in
+``definitions`` and referenced in operations or attributes.
+
+members
+~~~~~~~
+
+ - ``name`` - The attribute name of the struct member
+ - ``type`` - One of the scalar types ``u8``, ``u16``, ``u32``, ``u64``, ``s8``,
+   ``s16``, ``s32``, ``s64``, ``string`` or ``binary``.
+ - ``byte-order`` - ``big-endian`` or ``little-endian``
+ - ``doc``, ``enum``, ``enum-as-flags``, ``display-hint`` - Same as for
+   :ref:`attribute definitions <attribute_properties>`
+
+Note that structures defined in YAML are implicitly packed according to C
+conventions. For example, the following struct is 4 bytes, not 6 bytes:
+
+.. code-block:: c
+
+  struct {
+          u8 a;
+          u16 b;
+          u8 c;
+  }
+
+Any padding must be explicitly added and C-like languages should infer the
+need for explicit padding from whether the members are naturally aligned.
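
To make the size difference concrete, the sketch below declares the struct both ways. It assumes a GCC or Clang compatible compiler (for ``__attribute__((packed))``); the type names ``demo_packed`` and ``demo_padded`` are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Packed layout, as a spec struct is interpreted: no implicit padding,
 * so b sits at offset 1 and the struct is 4 bytes. */
struct demo_packed {
        uint8_t a;
        uint16_t b;
        uint8_t c;
} __attribute__((packed));

/* The same members with natural alignment restored by explicit padding
 * members, which is what the spec author has to add by hand. */
struct demo_padded {
        uint8_t a;
        uint8_t pad1;
        uint16_t b;
        uint8_t c;
        uint8_t pad2;
} __attribute__((packed));
```

``sizeof(struct demo_packed)`` is 4, whereas the explicitly padded variant is 6 bytes with ``b`` naturally aligned at offset 2.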
+
+Here is the struct definition from above, declared in YAML:
+
+.. code-block:: yaml
+
+  definitions:
+    -
+      name: message-header
+      type: struct
+      members:
+        -
+          name: a
+          type: u8
+        -
+          name: b
+          type: u16
+        -
+          name: c
+          type: u8
+
+Fixed Headers
+~~~~~~~~~~~~~
+
+Fixed message headers can be added to operations using ``fixed-header``.
+The default ``fixed-header`` can be set in ``operations`` and it can be set
+or overridden for each operation.
+
+.. code-block:: yaml
+
+  operations:
+    fixed-header: message-header
+    list:
+      -
+        name: get
+        fixed-header: custom-header
+        attribute-set: message-attrs
+
+Attributes
+~~~~~~~~~~
+
+A ``binary`` attribute can be interpreted as a C structure using a
+``struct`` property with the name of the structure definition. The
+``struct`` property implies ``sub-type: struct`` so it is not necessary to
+specify a sub-type.
+
+.. code-block:: yaml
+
+  attribute-sets:
+    -
+      name: stats-attrs
+      attributes:
+        -
+          name: stats
+          type: binary
+          struct: vport-stats
+
+C Arrays
+--------
+
+Legacy families also use ``binary`` attributes to encapsulate C arrays. The
+``sub-type`` is used to identify the type of scalar to extract.
+
+.. code-block:: yaml
+
+  attributes:
+    -
+      name: ports
+      type: binary
+      sub-type: u32
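
On the receiving side, a ``binary`` attribute with ``sub-type: u32`` is just a run of 32-bit values. A minimal decoding sketch follows; ``decode_u32_array`` is a hypothetical helper, and it assumes the values are in host byte order.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copy up to max u32 values out of a binary attribute payload.
 * Returns the number of elements copied, or -1 if the payload length
 * is not a multiple of sizeof(uint32_t). */
static int decode_u32_array(const void *payload, size_t len,
                            uint32_t *out, size_t max)
{
        size_t n = len / sizeof(uint32_t);

        if (len % sizeof(uint32_t))
                return -1;
        if (n > max)
                n = max;
        /* memcpy avoids unaligned loads from the attribute payload */
        memcpy(out, payload, n * sizeof(uint32_t));
        return (int)n;
}
```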
+
+Multi-message DO
+----------------
+
+New Netlink families should never respond to a DO operation with multiple
+replies, with ``NLM_F_MULTI`` set. Use a filtered dump instead.
+
+At the spec level we can define a ``dumps`` property for the ``do``,
+perhaps with values of ``combine`` and ``multi-object`` depending
+on how the parsing should be implemented (parse into a single reply
+vs list of objects i.e. pretty much a dump).
diff --git a/Documentation/userspace-api/netlink/index.rst b/Documentation/userspace-api/netlink/index.rst
new file mode 100644
index 0000000000..62725dafbb
--- /dev/null
+++ b/Documentation/userspace-api/netlink/index.rst
@@ -0,0 +1,19 @@
+.. SPDX-License-Identifier: BSD-3-Clause
+
+================
+Netlink Handbook
+================
+
+Netlink documentation for users.
+
+.. toctree::
+   :maxdepth: 2
+
+   intro
+   intro-specs
+   specs
+   c-code-gen
+   genetlink-legacy
+   netlink-raw
+
+See also :ref:`Documentation/core-api/netlink.rst <kernel_netlink>`.
diff --git a/Documentation/userspace-api/netlink/intro-specs.rst b/Documentation/userspace-api/netlink/intro-specs.rst
new file mode 100644
index 0000000000..bada896994
--- /dev/null
+++ b/Documentation/userspace-api/netlink/intro-specs.rst
@@ -0,0 +1,159 @@
+.. SPDX-License-Identifier: BSD-3-Clause
+
+=====================================
+Using Netlink protocol specifications
+=====================================
+
+This document is a quick starting guide for using Netlink protocol
+specifications. For more detailed description of the specs see :doc:`specs`.
+
+Simple CLI
+==========
+
+The kernel comes with a simple CLI tool which should be useful when
+developing Netlink related code. The tool is implemented in Python
+and can use a YAML specification to issue Netlink requests
+to the kernel. Only Generic Netlink is supported.
+
+The tool is located at ``tools/net/ynl/cli.py``. It accepts
+a handful of arguments, the most important ones are:
+
+ - ``--spec`` - point to the spec file
+ - ``--do $name`` / ``--dump $name`` - issue request ``$name``
+ - ``--json $attrs`` - provide attributes for the request
+ - ``--subscribe $group`` - receive notifications from ``$group``
+
+YAML specs can be found under ``Documentation/netlink/specs/``.
+
+Example use::
+
+  $ ./tools/net/ynl/cli.py --spec Documentation/netlink/specs/ethtool.yaml \
+      --do rings-get \
+      --json '{"header":{"dev-index": 18}}'
+  {'header': {'dev-index': 18, 'dev-name': 'eni1np1'},
+   'rx': 0,
+   'rx-jumbo': 0,
+   'rx-jumbo-max': 4096,
+   'rx-max': 4096,
+   'rx-mini': 0,
+   'rx-mini-max': 4096,
+   'tx': 0,
+   'tx-max': 4096,
+   'tx-push': 0}
+
+The input arguments are parsed as JSON, while the output is only
+Python-pretty-printed. This is because some Netlink types can't
+be expressed as JSON directly. If such attributes are needed in
+the input some hacking of the script will be necessary.
+
+The spec and Netlink internals are factored out as a standalone
+library - it should be easy to write Python tools / tests reusing
+code from ``cli.py``.
+
+Generating kernel code
+======================
+
+``tools/net/ynl/ynl-regen.sh`` scans the kernel tree in search of
+auto-generated files which need to be updated. Using this tool is the easiest
+way to generate / update auto-generated code.
+
+By default code is re-generated only if spec is newer than the source,
+to force regeneration use ``-f``.
+
+``ynl-regen.sh`` searches for ``YNL-GEN`` in the contents of files
+(note that it only scans files in the git index, that is only files
+tracked by git!) For instance the ``fou_nl.c`` kernel source contains::
+
+ /* Documentation/netlink/specs/fou.yaml */
+ /* YNL-GEN kernel source */
+
+``ynl-regen.sh`` will find this marker and replace the file with
+kernel source based on fou.yaml.
+
+The simplest way to generate a new file based on a spec is to add
+the two marker lines like above to a file, add that file to git,
+and run the regeneration tool. Grep the tree for ``YNL-GEN``
+to see other examples.
+
+The code generation itself is performed by ``tools/net/ynl/ynl-gen-c.py``
+but it takes a few arguments so calling it directly for each file
+quickly becomes tedious.
+
+YNL lib
+=======
+
+``tools/net/ynl/lib/`` contains an implementation of a C library
+(based on libmnl) which integrates with code generated by
+``tools/net/ynl/ynl-gen-c.py`` to create easy to use netlink wrappers.
+
+YNL basics
+----------
+
+The YNL library consists of two parts - the generic code (functions
+prefixed by ``ynl_``) and per-family auto-generated code (prefixed
+with the name of the family).
+
+To create a YNL socket call ynl_sock_create() passing the family
+struct (family structs are exported by the auto-generated code).
+ynl_sock_destroy() closes the socket.
+
+YNL requests
+------------
+
+Steps for issuing YNL requests are best explained with an example.
+All the functions and types in this example come from the auto-generated
+code (for the netdev family in this case):
+
+.. code-block:: c
+
+  // 0. Request and response pointers
+  struct netdev_dev_get_req *req;
+  struct netdev_dev_get_rsp *d;
+
+  // 1. Allocate a request
+  req = netdev_dev_get_req_alloc();
+  // 2. Set request parameters (as needed)
+  netdev_dev_get_req_set_ifindex(req, ifindex);
+
+  // 3. Issue the request
+  d = netdev_dev_get(ys, req);
+  // 4. Free the request arguments
+  netdev_dev_get_req_free(req);
+  // 5. Error check (the return value from step 3)
+  if (!d) {
+          // 6. Print the YNL-generated error
+          fprintf(stderr, "YNL: %s\n", ys->err.msg);
+          return -1;
+  }
+
+  // ... do stuff with the response @d
+
+  // 7. Free response
+  netdev_dev_get_rsp_free(d);
+
+YNL dumps
+---------
+
+Performing dumps follows a similar pattern as requests.
+Dumps return a list of objects terminated by a special marker,
+or NULL on error. Use ``ynl_dump_foreach()`` to iterate over
+the result.
+
+YNL notifications
+-----------------
+
+YNL lib supports using the same socket for notifications and
+requests. In case notifications arrive during processing of a request
+they are queued internally and can be retrieved at a later time.
+
+To subscribe to notifications use ``ynl_subscribe()``.
+The notifications have to be read out from the socket,
+``ynl_socket_get_fd()`` returns the underlying socket fd which can
+be plugged into appropriate asynchronous IO API like ``poll``,
+or ``select``.
+
+Notifications can be retrieved using ``ynl_ntf_dequeue()`` and have
+to be freed using ``ynl_ntf_free()``. Since we don't know the notification
+type upfront the notifications are returned as ``struct ynl_ntf_base_type *``
+and user is expected to cast them to the appropriate full type based
+on the ``cmd`` member.
diff --git a/Documentation/userspace-api/netlink/intro.rst b/Documentation/userspace-api/netlink/intro.rst
new file mode 100644
index 0000000000..7b1d401210
--- /dev/null
+++ b/Documentation/userspace-api/netlink/intro.rst
@@ -0,0 +1,683 @@
+.. SPDX-License-Identifier: BSD-3-Clause
+
+=======================
+Introduction to Netlink
+=======================
+
+Netlink is often described as an ioctl() replacement.
+It aims to replace fixed-format C structures as supplied
+to ioctl() with a format which allows an easy way to add
+or extend the arguments.
+
+To achieve this Netlink uses a minimal fixed-format metadata header
+followed by multiple attributes in the TLV (type, length, value) format.
+
+Unfortunately the protocol has evolved over the years, in an organic
+and undocumented fashion, making it hard to coherently explain.
+To make the most practical sense this document starts by describing
+netlink as it is used today and dives into more "historical" uses
+in later sections.
+
+Opening a socket
+================
+
+Netlink communication happens over sockets, so a socket needs to be
+opened first:
+
+.. code-block:: c
+
+ fd = socket(AF_NETLINK, SOCK_RAW, NETLINK_GENERIC);
+
+The use of sockets allows for a natural way of exchanging information
+in both directions (to and from the kernel). The operations are still
+performed synchronously when applications send() the request but
+a separate recv() system call is needed to read the reply.
+
+A very simplified flow of a Netlink "call" will therefore look
+something like:
+
+.. code-block:: c
+
+  fd = socket(AF_NETLINK, SOCK_RAW, NETLINK_GENERIC);
+
+  /* format the request */
+  send(fd, &request, sizeof(request), 0);
+  n = recv(fd, &response, RSP_BUFFER_SIZE, 0);
+  /* interpret the response */
+
+Netlink also provides natural support for "dumping", i.e. communicating
+to user space all objects of a certain type (e.g. dumping all network
+interfaces).
+
+.. code-block:: c
+
+  fd = socket(AF_NETLINK, SOCK_RAW, NETLINK_GENERIC);
+
+  /* format the dump request */
+  send(fd, &request, sizeof(request), 0);
+  while (1) {
+    n = recv(fd, &buffer, RSP_BUFFER_SIZE, 0);
+    /* one recv() call can read multiple messages, hence the loop below (pseudocode) */
+    for (nl_msg in buffer) {
+      if (nl_msg.nlmsg_type == NLMSG_DONE)
+        goto dump_finished;
+      /* process the object */
+    }
+  }
+  dump_finished:
+
+The first two arguments of the socket() call require little explanation -
+it is opening a Netlink socket, with all headers provided by the user
+(hence NETLINK, RAW). The last argument is the protocol within Netlink.
+This field is used to identify the subsystem with which the socket will
+communicate.
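
The inner loop of the dump flow above is shown as pseudocode. One way to walk the messages in a receive buffer is with the ``NLMSG_OK()`` and ``NLMSG_NEXT()`` macros from ``<linux/netlink.h>``. The sketch below is a simplified illustration: ``walk_dump_buffer`` is a hypothetical helper, it only counts objects, and it leaves ``NLMSG_ERROR`` handling out.

```c
#include <linux/netlink.h>

/* Walk every Netlink message in a buffer filled by one recv() call.
 * Returns 1 once NLMSG_DONE is seen, 0 if more data is expected.
 * *objects is incremented for each non-control message. */
static int walk_dump_buffer(void *buf, int len, int *objects)
{
        struct nlmsghdr *nlh;

        for (nlh = buf; NLMSG_OK(nlh, len); nlh = NLMSG_NEXT(nlh, len)) {
                if (nlh->nlmsg_type == NLMSG_DONE)
                        return 1;
                /* types below NLMSG_MIN_TYPE are control messages */
                if (nlh->nlmsg_type >= NLMSG_MIN_TYPE)
                        (*objects)++;
        }
        return 0;
}
```

A caller would keep calling ``recv()`` and this helper until it returns 1.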
+
+Classic vs Generic Netlink
+--------------------------
+
+Initial implementation of Netlink depended on a static allocation
+of IDs to subsystems and provided little supporting infrastructure.
+Let us refer to those protocols collectively as **Classic Netlink**.
+The list of them is defined on top of the ``include/uapi/linux/netlink.h``
+file, they include among others - general networking (NETLINK_ROUTE),
+iSCSI (NETLINK_ISCSI), and audit (NETLINK_AUDIT).
+
+**Generic Netlink** (introduced in 2005) allows for dynamic registration of
+subsystems (and subsystem ID allocation), introspection and simplifies
+implementing the kernel side of the interface.
+
+The following section describes how to use Generic Netlink, as the
+number of subsystems using Generic Netlink outnumbers the older
+protocols by an order of magnitude. There are also no plans for adding
+more Classic Netlink protocols to the kernel.
+Basic information on how communicating with core networking parts of
+the Linux kernel (or another of the 20 subsystems using Classic
+Netlink) differs from Generic Netlink is provided later in this document.
+
+Generic Netlink
+===============
+
+In addition to the Netlink fixed metadata header each Netlink protocol
+defines its own fixed metadata header. (Similarly to how network
+headers stack - Ethernet > IP > TCP we have Netlink > Generic N. > Family.)
+
+A Netlink message always starts with struct nlmsghdr, which is followed
+by a protocol-specific header. In case of Generic Netlink the protocol
+header is struct genlmsghdr.
+
+The practical meaning of the fields in case of Generic Netlink is as follows:
+
+.. code-block:: c
+
+  struct nlmsghdr {
+        __u32   nlmsg_len;      /* Length of message including headers */
+        __u16   nlmsg_type;     /* Generic Netlink Family (subsystem) ID */
+        __u16   nlmsg_flags;    /* Flags - request or dump */
+        __u32   nlmsg_seq;      /* Sequence number */
+        __u32   nlmsg_pid;      /* Port ID, set to 0 */
+  };
+  struct genlmsghdr {
+        __u8    cmd;            /* Command, as defined by the Family */
+        __u8    version;        /* Irrelevant, set to 1 */
+        __u16   reserved;       /* Reserved, set to 0 */
+  };
+  /* TLV attributes follow... */
+
+In Classic Netlink :c:member:`nlmsghdr.nlmsg_type` used to identify
+which operation within the subsystem the message was referring to
+(e.g. get information about a netdev). Generic Netlink needs to mux
+multiple subsystems in a single protocol so it uses this field to
+identify the subsystem, and :c:member:`genlmsghdr.cmd` identifies
+the operation instead. (See :ref:`res_fam` for
+information on how to find the Family ID of the subsystem of interest.)
+Note that the first 16 values (0 - 15) of this field are reserved for
+control messages both in Classic Netlink and Generic Netlink.
+See :ref:`nl_msg_type` for more details.
+
+There are 3 usual types of message exchanges on a Netlink socket:
+
+ - performing a single action (``do``);
+ - dumping information (``dump``);
+ - getting asynchronous notifications (``multicast``).
+
+Classic Netlink is very flexible and presumably allows other types
+of exchanges to happen, but in practice those are the three that get
+used.
+
+Asynchronous notifications are sent by the kernel and received by
+the user sockets which subscribed to them. ``do`` and ``dump`` requests
+are initiated by the user. :c:member:`nlmsghdr.nlmsg_flags` should
+be set as follows:
+
+ - for ``do``: ``NLM_F_REQUEST | NLM_F_ACK``
+ - for ``dump``: ``NLM_F_REQUEST | NLM_F_ACK | NLM_F_DUMP``
+
+:c:member:`nlmsghdr.nlmsg_seq` should be set to a monotonically
+increasing value. The value gets echoed back in responses and doesn't
+matter in practice, but setting it to an increasing value for each
+message sent is considered good hygiene. The purpose of the field is
+matching responses to requests. Asynchronous notifications will have
+:c:member:`nlmsghdr.nlmsg_seq` of ``0``.
+
+:c:member:`nlmsghdr.nlmsg_pid` is the Netlink equivalent of an address.
+This field can be set to ``0`` when talking to the kernel.
+See :ref:`nlmsg_pid` for the (uncommon) uses of the field.
+
+The expected use for :c:member:`genlmsghdr.version` was to allow
+versioning of the APIs provided by the subsystems. No subsystem to
+date made significant use of this field, so setting it to ``1`` seems
+like a safe bet.
+
+.. _nl_msg_type:
+
+Netlink message types
+---------------------
+
+As previously mentioned :c:member:`nlmsghdr.nlmsg_type` carries
+protocol specific values but the first 16 identifiers are reserved
+(first subsystem specific message type should be equal to
+``NLMSG_MIN_TYPE`` which is ``0x10``).
+
+There are only 4 Netlink control messages defined:
+
+ - ``NLMSG_NOOP`` - ignore the message, not used in practice;
+ - ``NLMSG_ERROR`` - carries the return code of an operation;
+ - ``NLMSG_DONE`` - marks the end of a dump;
+ - ``NLMSG_OVERRUN`` - socket buffer has overflowed, not used to date.
+
+``NLMSG_ERROR`` and ``NLMSG_DONE`` are of practical importance.
+They carry return codes for operations. Note that unless
+the ``NLM_F_ACK`` flag is set on the request Netlink will not respond
+with ``NLMSG_ERROR`` if there is no error. To avoid having to special-case
+this quirk it is recommended to always set ``NLM_F_ACK``.
+
+The format of ``NLMSG_ERROR`` is described by struct nlmsgerr::
+
+ ----------------------------------------------
+ | struct nlmsghdr - response header |
+ ----------------------------------------------
+ | int error |
+ ----------------------------------------------
+ | struct nlmsghdr - original request header |
+ ----------------------------------------------
+ | ** optionally (1) payload of the request |
+ ----------------------------------------------
+ | ** optionally (2) extended ACK |
+ ----------------------------------------------
+
+There are two instances of struct nlmsghdr here, first of the response
+and second of the request. ``NLMSG_ERROR`` carries the information about
+the request which led to the error. This could be useful when trying
+to match requests to responses or re-parse the request to dump it into
+logs.
+
+The payload of the request is not echoed in messages reporting success
+(``error == 0``) or if ``NETLINK_CAP_ACK`` setsockopt() was set.
+The latter is common
+and perhaps recommended as having to read a copy of every request back
+from the kernel is rather wasteful. The absence of request payload
+is indicated by ``NLM_F_CAPPED`` in :c:member:`nlmsghdr.nlmsg_flags`.
+
+The second optional element of ``NLMSG_ERROR`` is the set of extended ACK
+attributes. See :ref:`ext_ack` for more details. The presence
+of extended ACK is indicated by ``NLM_F_ACK_TLVS`` in
+:c:member:`nlmsghdr.nlmsg_flags`.
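
A receiver might decode the fixed part of an ``NLMSG_ERROR`` message as sketched below. ``struct ack_info`` and ``parse_ack`` are hypothetical names; the sketch reads only the error code and the two flags discussed above and does not parse the extended ACK attributes themselves.

```c
#include <linux/netlink.h>

/* Outcome of inspecting one NLMSG_ERROR message. */
struct ack_info {
        int error;              /* 0 on success, negative errno on failure */
        int request_echoed;     /* request payload follows the echoed header */
        int has_ext_ack;        /* extended ACK attributes are present */
};

/* Returns 0 and fills *info if nlh is a well-formed NLMSG_ERROR, else -1. */
static int parse_ack(const struct nlmsghdr *nlh, struct ack_info *info)
{
        const struct nlmsgerr *err;

        if (nlh->nlmsg_type != NLMSG_ERROR ||
            nlh->nlmsg_len < NLMSG_LENGTH(sizeof(*err)))
                return -1;

        /* struct nlmsgerr is "int error" followed by the request header */
        err = (const struct nlmsgerr *)((const char *)nlh + NLMSG_HDRLEN);
        info->error = err->error;
        info->request_echoed = !(nlh->nlmsg_flags & NLM_F_CAPPED);
        info->has_ext_ack = !!(nlh->nlmsg_flags & NLM_F_ACK_TLVS);
        return 0;
}
```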
+
+``NLMSG_DONE`` is simpler, the request is never echoed but the extended
+ACK attributes may be present::
+
+ ----------------------------------------------
+ | struct nlmsghdr - response header |
+ ----------------------------------------------
+ | int error |
+ ----------------------------------------------
+ | ** optionally extended ACK |
+ ----------------------------------------------
+
+.. _res_fam:
+
+Resolving the Family ID
+-----------------------
+
+This section explains how to find the Family ID of a subsystem.
+It also serves as an example of Generic Netlink communication.
+
+Generic Netlink is itself a subsystem exposed via the Generic Netlink API.
+To avoid a circular dependency Generic Netlink has a statically allocated
+Family ID (``GENL_ID_CTRL`` which is equal to ``NLMSG_MIN_TYPE``).
+The Generic Netlink family implements a command used to find out information
+about other families (``CTRL_CMD_GETFAMILY``).
+
+To get information about the Generic Netlink family named for example
+``"test1"`` we need to send a message on the previously opened Generic Netlink
+socket. The message should target the Generic Netlink Family (1), be a
+``do`` (2) call to ``CTRL_CMD_GETFAMILY`` (3). A ``dump`` version of this
+call would make the kernel respond with information about *all* the families
+it knows about. Last but not least the name of the family in question has
+to be specified (4) as an attribute with the appropriate type::
+
+  struct nlmsghdr:
+    __u32 nlmsg_len:    32
+    __u16 nlmsg_type:   GENL_ID_CTRL               // (1)
+    __u16 nlmsg_flags:  NLM_F_REQUEST | NLM_F_ACK  // (2)
+    __u32 nlmsg_seq:    1
+    __u32 nlmsg_pid:    0
+
+  struct genlmsghdr:
+    __u8 cmd:           CTRL_CMD_GETFAMILY         // (3)
+    __u8 version:       2 /* or 1, doesn't matter */
+    __u16 reserved:     0
+
+  struct nlattr:                                   // (4)
+    __u16 nla_len:      10
+    __u16 nla_type:     CTRL_ATTR_FAMILY_NAME
+    char data:          test1\0
+
+  (padding:)
+    char data:          \0\0
+
+The length fields in Netlink (:c:member:`nlmsghdr.nlmsg_len`
+and :c:member:`nlattr.nla_len`) always *include* the header.
+Attribute headers in netlink must be aligned to 4 bytes from the start
+of the message, hence the extra ``\0\0`` after ``CTRL_ATTR_FAMILY_NAME``.
+The attribute lengths *exclude* the padding.
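
The layout of the request can also be expressed as code. The sketch below fills a caller-provided buffer using only the structures and macros from ``<linux/netlink.h>`` and ``<linux/genetlink.h>``; ``build_getfamily`` is a hypothetical helper, not an existing library function.

```c
#include <linux/genetlink.h>
#include <linux/netlink.h>
#include <string.h>

/* Fill buf with a CTRL_CMD_GETFAMILY "do" request for the named family.
 * Returns the total message length (nlmsg_len), or -1 if buf is too small.
 * buf must be suitably aligned for struct nlmsghdr. */
static int build_getfamily(void *buf, size_t buf_len, const char *family,
                           __u32 seq)
{
        size_t name_len = strlen(family) + 1;     /* include the trailing \0 */
        size_t attr_len = NLA_HDRLEN + name_len;  /* excludes padding */
        size_t total = NLMSG_HDRLEN + GENL_HDRLEN + NLA_ALIGN(attr_len);
        struct genlmsghdr *gnlh;
        struct nlmsghdr *nlh;
        struct nlattr *nla;

        if (total > buf_len)
                return -1;
        memset(buf, 0, total);                    /* also zeroes the padding */

        nlh = buf;
        nlh->nlmsg_len = total;
        nlh->nlmsg_type = GENL_ID_CTRL;
        nlh->nlmsg_flags = NLM_F_REQUEST | NLM_F_ACK;
        nlh->nlmsg_seq = seq;
        nlh->nlmsg_pid = 0;

        gnlh = (struct genlmsghdr *)((char *)buf + NLMSG_HDRLEN);
        gnlh->cmd = CTRL_CMD_GETFAMILY;
        gnlh->version = 1;

        nla = (struct nlattr *)((char *)gnlh + GENL_HDRLEN);
        nla->nla_len = attr_len;
        nla->nla_type = CTRL_ATTR_FAMILY_NAME;
        memcpy((char *)nla + NLA_HDRLEN, family, name_len);

        return (int)total;
}
```

For ``"test1"`` this produces a 32 byte message whose single attribute has ``nla_len`` of 10, matching the dump above; the result can be passed to ``send()`` on a ``NETLINK_GENERIC`` socket.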
+
+If the family is found kernel will reply with two messages, the response
+with all the information about the family::
+
+  /* Message #1 - reply */
+  struct nlmsghdr:
+    __u32 nlmsg_len:    136
+    __u16 nlmsg_type:   GENL_ID_CTRL
+    __u16 nlmsg_flags:  0
+    __u32 nlmsg_seq:    1    /* echoed from our request */
+    __u32 nlmsg_pid:    5831 /* The PID of our user space process */
+
+  struct genlmsghdr:
+    __u8 cmd:           CTRL_CMD_GETFAMILY
+    __u8 version:       2
+    __u16 reserved:     0
+
+  struct nlattr:
+    __u16 nla_len:      10
+    __u16 nla_type:     CTRL_ATTR_FAMILY_NAME
+    char data:          test1\0
+
+  (padding:)
+    char data:          \0\0
+
+  struct nlattr:
+    __u16 nla_len:      6
+    __u16 nla_type:     CTRL_ATTR_FAMILY_ID
+    __u16:              123  /* The Family ID we are after */
+
+  (padding:)
+    char data:          \0\0
+
+  struct nlattr:
+    __u16 nla_len:      8
+    __u16 nla_type:     CTRL_ATTR_VERSION
+    __u32:              1
+
+  /* ... etc, more attributes will follow. */
+
+And the error code (success) since ``NLM_F_ACK`` had been set on the request::
+
+  /* Message #2 - the ACK */
+  struct nlmsghdr:
+    __u32 nlmsg_len:    36
+    __u16 nlmsg_type:   NLMSG_ERROR
+    __u16 nlmsg_flags:  NLM_F_CAPPED /* There won't be a payload */
+    __u32 nlmsg_seq:    1    /* echoed from our request */
+    __u32 nlmsg_pid:    5831 /* The PID of our user space process */
+
+  int error:            0
+
+  struct nlmsghdr: /* Copy of the request header as we sent it */
+    __u32 nlmsg_len:    32
+    __u16 nlmsg_type:   GENL_ID_CTRL
+    __u16 nlmsg_flags:  NLM_F_REQUEST | NLM_F_ACK
+    __u32 nlmsg_seq:    1
+    __u32 nlmsg_pid:    0
+
+The order of attributes (struct nlattr) is not guaranteed so the user
+has to walk the attributes and parse them.
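
Walking the attributes amounts to stepping from one ``struct nlattr`` to the next by the padded length. A minimal sketch that looks for ``CTRL_ATTR_FAMILY_ID`` is shown below; ``find_family_id`` is a hypothetical helper and it ignores all other attributes.

```c
#include <linux/genetlink.h>
#include <linux/netlink.h>
#include <string.h>

/* Scan a run of attributes for CTRL_ATTR_FAMILY_ID.
 * attrs points at the first struct nlattr (just past struct genlmsghdr),
 * len is the number of attribute bytes. Returns the ID, or -1. */
static int find_family_id(const void *attrs, int len)
{
        const char *pos = attrs;

        while (len >= NLA_HDRLEN) {
                struct nlattr nla;
                __u16 id;

                memcpy(&nla, pos, sizeof(nla));
                if (nla.nla_len < NLA_HDRLEN || nla.nla_len > len)
                        return -1;      /* malformed attribute */

                if ((nla.nla_type & NLA_TYPE_MASK) == CTRL_ATTR_FAMILY_ID &&
                    nla.nla_len >= NLA_HDRLEN + sizeof(id)) {
                        memcpy(&id, pos + NLA_HDRLEN, sizeof(id));
                        return id;
                }

                /* advance by the padded length; nla_len excludes padding */
                pos += NLA_ALIGN(nla.nla_len);
                len -= NLA_ALIGN(nla.nla_len);
        }
        return -1;
}
```

The value returned is what later requests to the family place in ``nlmsghdr.nlmsg_type``.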
+
+Note that Generic Netlink sockets are not associated or bound to a single
+family. A socket can be used to exchange messages with many different
+families, selecting the recipient family on a message-by-message basis using
+the :c:member:`nlmsghdr.nlmsg_type` field.
+
+.. _ext_ack:
+
+Extended ACK
+------------
+
+Extended ACK controls reporting of additional error/warning TLVs
+in ``NLMSG_ERROR`` and ``NLMSG_DONE`` messages. To maintain backward
+compatibility this feature has to be explicitly enabled by setting
+the ``NETLINK_EXT_ACK`` setsockopt() to ``1``.
+
+Types of extended ack attributes are defined in enum nlmsgerr_attrs.
+The most commonly used attributes are ``NLMSGERR_ATTR_MSG``,
+``NLMSGERR_ATTR_OFFS`` and ``NLMSGERR_ATTR_MISS_*``.
+
+``NLMSGERR_ATTR_MSG`` carries a message in English describing
+the encountered problem. These messages are far more detailed
+than what can be expressed through standard UNIX error codes.
+
+``NLMSGERR_ATTR_OFFS`` points to the attribute which caused the problem.
+
+``NLMSGERR_ATTR_MISS_TYPE`` and ``NLMSGERR_ATTR_MISS_NEST``
+inform about a missing attribute.
+
+Extended ACKs can be reported on errors as well as in case of success.
+The latter should be treated as a warning.
+
+Extended ACKs greatly improve the usability of Netlink and should
+always be enabled, appropriately parsed and reported to the user.
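+
+Enabling the option could look roughly like the sketch below.
+``enable_ext_ack()`` is a hypothetical wrapper, and the fallback definition
+of ``SOL_NETLINK`` is only an assumption for C libraries which do not
+provide the constant:
+
+.. code-block:: c
+
+ #include <sys/socket.h>
+ #include <linux/netlink.h>
+
+ #ifndef SOL_NETLINK
+ #define SOL_NETLINK 270 /* assumed fallback for older C libraries */
+ #endif
+
+ /* Hypothetical helper: returns 0 on success, -1 with errno set. */
+ static int enable_ext_ack(int fd)
+ {
+     int one = 1;
+
+     return setsockopt(fd, SOL_NETLINK, NETLINK_EXT_ACK,
+                       &one, sizeof(one));
+ }
+
+The call only has to be made once per socket, before the first request
+is sent.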
+
+Advanced topics
+===============
+
+Dump consistency
+----------------
+
+Some of the data structures kernel uses for storing objects make
+it hard to provide an atomic snapshot of all the objects in a dump
+(without impacting the fast-paths updating them).
+
+Kernel may set the ``NLM_F_DUMP_INTR`` flag on any message in a dump
+(including the ``NLMSG_DONE`` message) if the dump was interrupted and
+may be inconsistent (e.g. missing objects). User space should retry
+the dump if it sees the flag set.
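+
+One possible way to detect an interrupted dump is to check the flags of
+every message in each received buffer. The helper name below is made up
+for illustration, and the sketch assumes the buffer holds only complete
+messages:
+
+.. code-block:: c
+
+ #include <stdbool.h>
+ #include <stddef.h>
+ #include <linux/netlink.h>
+
+ /* Hypothetical helper: true if any message carries NLM_F_DUMP_INTR. */
+ static bool dump_interrupted(void *buf, size_t len)
+ {
+     struct nlmsghdr *nlh = buf;
+     int left = (int)len; /* NLMSG_NEXT() decrements this */
+
+     for (; NLMSG_OK(nlh, left); nlh = NLMSG_NEXT(nlh, left)) {
+         if (nlh->nlmsg_flags & NLM_F_DUMP_INTR)
+             return true;
+     }
+     return false;
+ }
+
+If the helper returns true for any buffer of the dump, the results
+collected so far should be discarded and the dump request sent again.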
+
+Introspection
+-------------
+
+The basic introspection abilities are enabled by access to the Family
+object as reported in :ref:`res_fam`. User can query information about
+the Generic Netlink family, including which operations are supported
+by the kernel and what attributes the kernel understands.
+Family information includes the highest ID of an attribute the kernel can
+parse. A separate command (``CTRL_CMD_GETPOLICY``) provides detailed
+information about supported attributes, including ranges of values
+the kernel accepts.
+
+Querying family information is useful in cases when user space needs
+to make sure that the kernel has support for a feature before issuing
+a request.
+
+.. _nlmsg_pid:
+
+nlmsg_pid
+---------
+
+:c:member:`nlmsghdr.nlmsg_pid` is the Netlink equivalent of an address.
+It is referred to as Port ID, sometimes Process ID because for historical
+reasons if the application does not select (bind() to) an explicit Port ID
+kernel will automatically assign it the ID equal to its Process ID
+(as reported by the getpid() system call).
+
+Similarly to the bind() semantics of the TCP/IP network protocols the value
+of zero means "assign automatically", hence it is common for applications
+to leave the :c:member:`nlmsghdr.nlmsg_pid` field initialized to ``0``.
+
+The field is still used today in rare cases when the kernel needs to send
+a unicast notification. A user space application can use bind() to associate
+its socket with a specific PID and then communicate that PID to the kernel.
+This way the kernel can reach the specific user space process.
+
+This sort of communication is utilized in UMH (User Mode Helper)-like
+scenarios when kernel needs to trigger user space processing or ask user
+space for a policy decision.
+
+Multicast notifications
+-----------------------
+
+One of the strengths of Netlink is the ability to send event notifications
+to user space. This is a unidirectional form of communication (kernel ->
+user) and does not involve any control messages like ``NLMSG_ERROR`` or
+``NLMSG_DONE``.
+
+For example the Generic Netlink family itself defines a set of multicast
+notifications about registered families. When a new family is added the
+sockets subscribed to the notifications will get the following message::
+
+ struct nlmsghdr:
+ __u32 nlmsg_len: 136
+ __u16 nlmsg_type: GENL_ID_CTRL
+ __u16 nlmsg_flags: 0
+ __u32 nlmsg_seq: 0
+ __u32 nlmsg_pid: 0
+
+ struct genlmsghdr:
+ __u8 cmd: CTRL_CMD_NEWFAMILY
+ __u8 version: 2
+ __u16 reserved: 0
+
+ struct nlattr:
+ __u16 nla_len: 10
+ __u16 nla_type: CTRL_ATTR_FAMILY_NAME
+ char data: test1\0
+
+ (padding:)
+ data: \0\0
+
+ struct nlattr:
+ __u16 nla_len: 6
+ __u16 nla_type: CTRL_ATTR_FAMILY_ID
+ __u16: 123 /* The ID of the newly registered family */
+
+ (padding:)
+ char data: \0\0
+
+ struct nlattr:
+ __u16 nla_len: 8
+ __u16 nla_type: CTRL_ATTR_VERSION
+ __u32: 1
+
+ /* ... etc, more attributes will follow. */
+
+The notification contains the same information as the response
+to the ``CTRL_CMD_GETFAMILY`` request.
+
+The Netlink headers of the notification are mostly 0 and irrelevant.
+The :c:member:`nlmsghdr.nlmsg_seq` may be either zero or a monotonically
+increasing notification sequence number maintained by the family.
+
+To receive notifications the user socket must subscribe to the relevant
+notification group. Much like the Family ID, the Group ID for a given
+multicast group is dynamic and can be found inside the Family information.
+The ``CTRL_ATTR_MCAST_GROUPS`` attribute contains nests with names
+(``CTRL_ATTR_MCAST_GRP_NAME``) and IDs (``CTRL_ATTR_MCAST_GRP_ID``) of
+the family's groups.
+
+Once the Group ID is known a setsockopt() call adds the socket to the group:
+
+.. code-block:: c
+
+ unsigned int group_id;
+
+ /* .. find the group ID... */
+
+ setsockopt(fd, SOL_NETLINK, NETLINK_ADD_MEMBERSHIP,
+ &group_id, sizeof(group_id));
+
+The socket will now receive notifications.
+
+It is recommended to use separate sockets for receiving notifications
+and sending requests to the kernel. The asynchronous nature of notifications
+means that they may get mixed in with the responses making the message
+handling much harder.
+
+Buffer sizing
+-------------
+
+Netlink sockets are datagram sockets rather than stream sockets,
+meaning that each message must be received in its entirety by a single
+recv()/recvmsg() system call. If the buffer provided by the user is too
+short, the message will be truncated and the ``MSG_TRUNC`` flag set
+in struct msghdr (struct msghdr is the second argument
+of the recvmsg() system call, *not* a Netlink header).
+
+Upon truncation the remaining part of the message is discarded.
+
+Netlink expects that the user buffer will be at least 8kB or a page
+size of the CPU architecture, whichever is bigger. Particular Netlink
+families may, however, require a larger buffer. A 32kB buffer is recommended
+for the most efficient handling of dumps (a larger buffer fits more dumped
+objects and therefore fewer recvmsg() calls are needed).
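+
+The sizing rule and the truncation check can be sketched as follows.
+Both function names are hypothetical; only recvmsg(), sysconf() and
+the ``MSG_TRUNC`` flag are existing interfaces:
+
+.. code-block:: c
+
+ #include <stdbool.h>
+ #include <stddef.h>
+ #include <sys/socket.h>
+ #include <sys/types.h>
+ #include <sys/uio.h>
+ #include <unistd.h>
+
+ /* Smallest acceptable buffer: 8kB or the page size, whichever is bigger. */
+ static size_t nl_min_buf_size(void)
+ {
+     long page = sysconf(_SC_PAGESIZE);
+
+     return page > 8192 ? (size_t)page : 8192;
+ }
+
+ /* Receive one datagram, *truncated is set if the buffer was too short. */
+ static ssize_t nl_recv(int fd, void *buf, size_t len, bool *truncated)
+ {
+     struct iovec iov = { .iov_base = buf, .iov_len = len };
+     struct msghdr msg = { .msg_iov = &iov, .msg_iovlen = 1 };
+     ssize_t ret = recvmsg(fd, &msg, 0);
+
+     *truncated = ret >= 0 && (msg.msg_flags & MSG_TRUNC);
+     return ret;
+ }
+
+Since the truncated part of the message is lost, a caller which sees
+``*truncated`` set can only allocate a bigger buffer and repeat the request.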
+
+.. _classic_netlink:
+
+Classic Netlink
+===============
+
+The main differences between Classic and Generic Netlink are the dynamic
+allocation of subsystem identifiers and availability of introspection.
+In theory the protocol does not differ significantly, however, in practice
+Classic Netlink experimented with concepts which were abandoned in Generic
+Netlink (really, they usually only found use in a small corner of a single
+subsystem). This section is meant as an explainer of a few of such concepts,
+with the explicit goal of giving the Generic Netlink
+users the confidence to ignore them when reading the uAPI headers.
+
+Most of the concepts and examples here refer to the ``NETLINK_ROUTE`` family,
+which covers much of the configuration of the Linux networking stack.
+Real documentation of that family deserves a chapter (or a book) of its own.
+
+Families
+--------
+
+Netlink refers to subsystems as families. This is a remnant of using
+sockets and the concept of protocol families, which are part of message
+demultiplexing in ``NETLINK_ROUTE``.
+
+Sadly every layer of encapsulation likes to refer to whatever it's carrying
+as "families" making the term very confusing:
+
+ 1. AF_NETLINK is a bona fide socket protocol family
+ 2. AF_NETLINK's documentation refers to what comes after its own
+ header (struct nlmsghdr) in a message as a "Family Header"
+ 3. Generic Netlink is a family for AF_NETLINK (struct genlmsghdr follows
+ struct nlmsghdr), yet it also calls its users "Families".
+
+Note that the Generic Netlink Family IDs are in a different "ID space"
+and overlap with Classic Netlink protocol numbers (e.g. ``NETLINK_CRYPTO``
+has the Classic Netlink protocol ID of 21 which Generic Netlink will
+happily allocate to one of its families as well).
+
+Strict checking
+---------------
+
+The ``NETLINK_GET_STRICT_CHK`` socket option enables strict input checking
+in ``NETLINK_ROUTE``. It was needed because historically kernel did not
+validate the fields of structures it didn't process. This made it impossible
+to start using those fields later without risking regressions in applications
+which initialized them incorrectly or not at all.
+
+``NETLINK_GET_STRICT_CHK`` declares that the application is initializing
+all fields correctly. It also opts into validating that message does not
+contain trailing data and requests that kernel rejects attributes with
+type higher than largest attribute type known to the kernel.
+
+``NETLINK_GET_STRICT_CHK`` is not used outside of ``NETLINK_ROUTE``.
+
+Unknown attributes
+------------------
+
+Historically Netlink ignored all unknown attributes. The thinking was that
+it would free the application from having to probe what kernel supports.
+The application could make a request to change the state and check which
+parts of the request "stuck".
+
+This is no longer the case for new Generic Netlink families and those opting
+in to strict checking. See enum netlink_validation for validation types
+performed.
+
+Fixed metadata and structures
+-----------------------------
+
+Classic Netlink made liberal use of fixed-format structures within
+the messages. Messages would commonly have a structure with
+a considerable number of fields after struct nlmsghdr. It was also
+common to put structures with multiple members inside attributes,
+without breaking each member into an attribute of its own.
+
+This has caused problems with validation and extensibility and
+therefore using binary structures is actively discouraged for new
+attributes.
+
+Request types
+-------------
+
+``NETLINK_ROUTE`` categorized requests into 4 types: ``NEW``, ``DEL``, ``GET``,
+and ``SET``. Each object can handle all or some of those requests
+(objects being netdevs, routes, addresses, qdiscs etc.). The request type
+is defined by the 2 lowest bits of the message type, so commands for
+new objects would always be allocated with a stride of 4.
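+
+As an illustration, the request type can be recovered from a message type
+with a mask. The enum and the function below are made up for this sketch
+and assume the ``RTM_*`` message types from ``linux/rtnetlink.h``:
+
+.. code-block:: c
+
+ #include <linux/rtnetlink.h>
+
+ /* Hypothetical names for the four request types. */
+ enum rtm_kind { RTM_KIND_NEW, RTM_KIND_DEL, RTM_KIND_GET, RTM_KIND_SET };
+
+ /* The request type lives in the 2 lowest bits above RTM_BASE. */
+ static enum rtm_kind rtm_request_kind(unsigned short nlmsg_type)
+ {
+     return (enum rtm_kind)((nlmsg_type - RTM_BASE) & 3);
+ }
+
+For example ``RTM_NEWLINK``, ``RTM_DELLINK``, ``RTM_GETLINK`` and
+``RTM_SETLINK`` occupy four consecutive values, and the next object
+(addresses) starts at the following multiple of 4.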
+
+Each object would also have its own fixed metadata shared by all request
+types (e.g. struct ifinfomsg for netdev requests, struct ifaddrmsg for address
+requests, struct tcmsg for qdisc requests).
+
+Even though other protocols and Generic Netlink commands often use
+the same verbs in their message names (``GET``, ``SET``) the concept
+of request types did not find wider adoption.
+
+Notification echo
+-----------------
+
+``NLM_F_ECHO`` requests for notifications resulting from the request
+to be queued onto the requesting socket. This is useful to discover
+the impact of the request.
+
+Note that this feature is not universally implemented.
+
+Other request-type-specific flags
+---------------------------------
+
+Classic Netlink defined various flags for its ``GET``, ``NEW``
+and ``DEL`` requests in the upper byte of nlmsg_flags in struct nlmsghdr.
+Since request types have not been generalized the request type specific
+flags are rarely used (and considered deprecated for new families).
+
+For ``GET`` - ``NLM_F_ROOT`` and ``NLM_F_MATCH`` are combined into
+``NLM_F_DUMP``, and not used separately. ``NLM_F_ATOMIC`` is never used.
+
+For ``DEL`` - ``NLM_F_NONREC`` is only used by nftables and ``NLM_F_BULK``
+only by some FDB operations.
+
+The flags for ``NEW`` are used most commonly in classic Netlink. Unfortunately,
+the meaning is not crystal clear. The following description is based on the
+best guess of the intention of the authors, and in practice all families
+stray from it in one way or another. ``NLM_F_REPLACE`` asks to replace
+an existing object; if no matching object exists the operation should fail.
+``NLM_F_EXCL`` has the opposite semantics and only succeeds if the object
+already existed.
+``NLM_F_CREATE`` asks for the object to be created if it does not
+exist; it can be combined with ``NLM_F_REPLACE`` and ``NLM_F_EXCL``.
+
+A comment in the main Netlink uAPI header states::
+
+ 4.4BSD ADD NLM_F_CREATE|NLM_F_EXCL
+ 4.4BSD CHANGE NLM_F_REPLACE
+
+ True CHANGE NLM_F_CREATE|NLM_F_REPLACE
+ Append NLM_F_CREATE
+ Check NLM_F_EXCL
+
+which seems to indicate that those flags predate request types.
+``NLM_F_REPLACE`` without ``NLM_F_CREATE`` was initially used instead
+of ``SET`` commands.
+``NLM_F_EXCL`` without ``NLM_F_CREATE`` was used to check if object exists
+without creating it, presumably predating ``GET`` commands.
+
+``NLM_F_APPEND`` indicates that if one key can have multiple objects associated
+with it (e.g. multiple next-hop objects for a route) the new object should be
+added to the list rather than replacing the entire list.
+
+uAPI reference
+==============
+
+.. kernel-doc:: include/uapi/linux/netlink.h
diff --git a/Documentation/userspace-api/netlink/netlink-raw.rst b/Documentation/userspace-api/netlink/netlink-raw.rst
new file mode 100644
index 0000000000..f07fb9b9c1
--- /dev/null
+++ b/Documentation/userspace-api/netlink/netlink-raw.rst
@@ -0,0 +1,58 @@
+.. SPDX-License-Identifier: BSD-3-Clause
+
+======================================================
+Netlink specification support for raw Netlink families
+======================================================
+
+This document describes the additional properties required by raw Netlink
+families such as ``NETLINK_ROUTE`` which use the ``netlink-raw`` protocol
+specification.
+
+Specification
+=============
+
+The netlink-raw schema extends the :doc:`genetlink-legacy <genetlink-legacy>`
+schema with properties that are needed to specify the protocol numbers and
+multicast IDs used by raw netlink families. See :ref:`classic_netlink` for more
+information.
+
+Globals
+-------
+
+protonum
+~~~~~~~~
+
+The ``protonum`` property is used to specify the protocol number to use when
+opening a netlink socket.
+
+.. code-block:: yaml
+
+ # SPDX-License-Identifier: ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause)
+
+ name: rt-addr
+ protocol: netlink-raw
+ protonum: 0 # part of the NETLINK_ROUTE protocol
+
+
+Multicast group properties
+--------------------------
+
+value
+~~~~~
+
+The ``value`` property is used to specify the group ID to use for multicast
+group registration.
+
+.. code-block:: yaml
+
+ mcast-groups:
+ list:
+ -
+ name: rtnlgrp-ipv4-ifaddr
+ value: 5
+ -
+ name: rtnlgrp-ipv6-ifaddr
+ value: 9
+ -
+ name: rtnlgrp-mctp-ifaddr
+ value: 34
diff --git a/Documentation/userspace-api/netlink/specs.rst b/Documentation/userspace-api/netlink/specs.rst
new file mode 100644
index 0000000000..cc4e243099
--- /dev/null
+++ b/Documentation/userspace-api/netlink/specs.rst
@@ -0,0 +1,458 @@
+.. SPDX-License-Identifier: BSD-3-Clause
+
+=========================================
+Netlink protocol specifications (in YAML)
+=========================================
+
+Netlink protocol specifications are complete, machine readable descriptions of
+Netlink protocols written in YAML. The goal of the specifications is to allow
+separating Netlink parsing from user space logic and minimize the amount of
+hand written Netlink code for each new family, command, attribute.
+Netlink specs should be complete and not depend on any other spec
+or C header file, making them easy to use in languages which can't include
+kernel headers directly.
+
+Internally kernel uses the YAML specs to generate:
+
+ - the C uAPI header
+ - documentation of the protocol as a ReST file
+ - policy tables for input attribute validation
+ - operation tables
+
+YAML specifications can be found under ``Documentation/netlink/specs/``.
+
+This document describes details of the schema.
+See :doc:`intro-specs` for a practical starting guide.
+
+All specs must be licensed under
+``((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause)``
+to allow for easy adoption in user space code.
+
+Compatibility levels
+====================
+
+There are four schema levels for Netlink specs, from the simplest used
+by new families to the most complex covering all the quirks of the old ones.
+Each next level inherits the attributes of the previous level, meaning that
+a user capable of parsing more complex ``genetlink`` schemas is also compatible
+with simpler ones. The levels are:
+
+ - ``genetlink`` - most streamlined, should be used by all new families
+ - ``genetlink-c`` - superset of ``genetlink`` with extra attributes allowing
+ customization of define and enum type and value names; this schema should
+ be equivalent to ``genetlink`` for all implementations which don't interact
+ directly with C uAPI headers
+ - ``genetlink-legacy`` - Generic Netlink catch all schema supporting quirks of
+ all old genetlink families, strange attribute formats, binary structures etc.
+ - ``netlink-raw`` - catch all schema supporting pre-Generic Netlink protocols
+ such as ``NETLINK_ROUTE``
+
+The definition of the schemas (in ``jsonschema``) can be found
+under ``Documentation/netlink/``.
+
+Schema structure
+================
+
+YAML schema has the following conceptual sections:
+
+ - globals
+ - definitions
+ - attributes
+ - operations
+ - multicast groups
+
+Most properties in the schema accept (or in fact require) a ``doc``
+sub-property documenting the defined object.
+
+The following sections describe the properties of the most modern ``genetlink``
+schema. See the documentation of :doc:`genetlink-c <c-code-gen>`
+for information on how C names are derived from name properties.
+
+See also :ref:`Documentation/core-api/netlink.rst <kernel_netlink>` for
+information on the Netlink specification properties that are only relevant to
+the kernel space and not part of the user space API.
+
+genetlink
+=========
+
+Globals
+-------
+
+Attributes listed directly at the root level of the spec file.
+
+name
+~~~~
+
+Name of the family. Name identifies the family in a unique way, since
+the Family IDs are allocated dynamically.
+
+version
+~~~~~~~
+
+Generic Netlink family version, default is 1.
+
+protocol
+~~~~~~~~
+
+The schema level, default is ``genetlink``, which is the only value
+allowed for new ``genetlink`` families.
+
+definitions
+-----------
+
+Array of type and constant definitions.
+
+name
+~~~~
+
+Name of the type / constant.
+
+type
+~~~~
+
+One of the following types:
+
+ - const - a single, standalone constant
+ - enum - defines an integer enumeration, with values for each entry
+ incrementing by 1, (e.g. 0, 1, 2, 3)
+ - flags - defines an integer enumeration, with values for each entry
+ occupying a bit, starting from bit 0, (e.g. 1, 2, 4, 8)
+
+value
+~~~~~
+
+The value for the ``const``.
+
+value-start
+~~~~~~~~~~~
+
+The first value for ``enum`` and ``flags``, allows overriding the default
+start value of ``0`` (for ``enum``) and starting bit (for ``flags``).
+For ``flags`` ``value-start`` selects the starting bit, not the shifted value.
+
+Sparse enumerations are not supported.
+
+entries
+~~~~~~~
+
+Array of names of the entries for ``enum`` and ``flags``.
+
+header
+~~~~~~
+
+For C-compatible languages, header which already defines this value.
+In case the definition is shared by multiple families (e.g. ``IFNAMSIZ``)
+code generators for C-compatible languages may prefer to add an appropriate
+include instead of rendering a new definition.
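+
+A sketch of a ``definitions`` section using the properties above could look
+as follows. All names and values are invented for illustration and do not
+come from an existing family:
+
+.. code-block:: yaml
+
+ definitions:
+   -
+     name: max-name-len
+     type: const
+     value: 16
+   -
+     name: port-state
+     type: enum
+     # values are assigned sequentially: down = 0, up = 1, testing = 2
+     entries: [ down, up, testing ]
+   -
+     name: feature-flags
+     type: flags
+     # value-start selects the first bit: rx-csum = bit 1, tx-csum = bit 2
+     value-start: 1
+     entries: [ rx-csum, tx-csum ]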
+
+attribute-sets
+--------------
+
+This property contains information about netlink attributes of the family.
+All families have at least one attribute set, most have multiple.
+``attribute-sets`` is an array, with each entry describing a single set.
+
+Note that the spec is "flattened" and is not meant to visually resemble
+the format of the netlink messages (unlike certain ad-hoc documentation
+formats seen in kernel comments). In the spec subordinate attribute sets
+are not defined inline as a nest, but defined in a separate attribute set
+referred to with a ``nested-attributes`` property of the container.
+
+Spec may also contain fractional sets - sets which contain a ``subset-of``
+property. Such sets describe a section of a full set, allowing narrowing down
+which attributes are allowed in a nest or refining the validation criteria.
+Fractional sets can only be used in nests. They are not rendered to the uAPI
+in any fashion.
+
+name
+~~~~
+
+Uniquely identifies the attribute set. Operations and nested attributes
+refer to the sets by the ``name``.
+
+subset-of
+~~~~~~~~~
+
+Re-defines a portion of another set (a fractional set).
+Allows narrowing down fields and changing validation criteria
+or even types of attributes depending on the nest in which they
+are contained. The ``value`` of each attribute in the fractional
+set is implicitly the same as in the main set.
+
+attributes
+~~~~~~~~~~
+
+List of attributes in the set.
+
+.. _attribute_properties:
+
+Attribute properties
+--------------------
+
+name
+~~~~
+
+Identifies the attribute, unique within the set.
+
+type
+~~~~
+
+Netlink attribute type, see :ref:`attr_types`.
+
+.. _assign_val:
+
+value
+~~~~~
+
+Numerical attribute ID, used in serialized Netlink messages.
+The ``value`` property can be skipped, in which case the attribute ID
+will be the value of the previous attribute plus one (recursively)
+and ``1`` for the first attribute in the attribute set.
+
+Attributes (and operations) use ``1`` as the default value for the first
+entry (unlike enums in definitions which start from ``0``) because
+entry ``0`` is almost always reserved as undefined. Spec can explicitly
+set value to ``0`` if needed.
+
+Note that the ``value`` of an attribute is defined only in its main set
+(not in subsets).
+
+enum
+~~~~
+
+For integer types specifies that values in the attribute belong
+to an ``enum`` or ``flags`` from the ``definitions`` section.
+
+enum-as-flags
+~~~~~~~~~~~~~
+
+Treat ``enum`` as ``flags`` regardless of its type in ``definitions``.
+When both ``enum`` and ``flags`` forms are needed ``definitions`` should
+contain an ``enum`` and attributes which need the ``flags`` form should
+use this property.
+
+nested-attributes
+~~~~~~~~~~~~~~~~~
+
+Identifies the attribute space for attributes nested within given attribute.
+Only valid for complex attributes which may have sub-attributes.
+
+multi-attr (arrays)
+~~~~~~~~~~~~~~~~~~~
+
+Boolean property signifying that the attribute may be present multiple times.
+Allowing an attribute to repeat is the recommended way of implementing arrays
+(no extra nesting).
+
+byte-order
+~~~~~~~~~~
+
+For integer types specifies attribute byte order - ``little-endian``
+or ``big-endian``.
+
+checks
+~~~~~~
+
+Input validation constraints used by the kernel. User space should query
+the policy of the running kernel using Generic Netlink introspection,
+rather than depend on what is specified in the spec file.
+
+The validation policy in the kernel is formed by combining the type
+definition (``type`` and ``nested-attributes``) and the ``checks``.
+
+sub-type
+~~~~~~~~
+
+Legacy families have special ways of expressing arrays. ``sub-type`` can be
+used to define the type of array members in case array members are not
+fully defined as attributes (in a bona fide attribute space). For instance
+a C array of u32 values can be specified with ``type: binary`` and
+``sub-type: u32``. Binary types and legacy array formats are described in
+more detail in :doc:`genetlink-legacy`.
+
+display-hint
+~~~~~~~~~~~~
+
+Optional format indicator that is intended only for choosing the right
+formatting mechanism when displaying values of this type. Currently supported
+hints are ``hex``, ``mac``, ``fddi``, ``ipv4``, ``ipv6`` and ``uuid``.
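+
+The attribute properties above can be combined as in the following sketch.
+The set and attribute names are hypothetical, and the example assumes that
+a ``port-state`` enum is present in ``definitions``:
+
+.. code-block:: yaml
+
+ attribute-sets:
+   -
+     name: port
+     attributes:
+       -
+         name: id          # no explicit value, so the attribute ID is 1
+         type: u32
+       -
+         name: name        # attribute ID 2
+         type: string
+         checks:
+           max-len: 15
+       -
+         name: state       # attribute ID 3
+         type: u32
+         enum: port-state
+       -
+         name: stats       # attribute ID 4
+         type: nest
+         nested-attributes: port-stats
+   -
+     name: port-stats
+     attributes:
+       -
+         name: rx-packets
+         type: u32
+
+Note that ``port-stats`` is defined as a separate set and only referred to
+from the ``stats`` attribute, in line with the "flattened" layout of the spec.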
+
+operations
+----------
+
+This section describes messages passed between the kernel and the user space.
+There are three types of entries in this section - operations, notifications
+and events.
+
+Operations describe the most common request - response communication. User
+sends a request and kernel replies. Each operation may contain any combination
+of the two modes familiar to netlink users - ``do`` and ``dump``.
+``do`` and ``dump`` in turn contain a combination of ``request`` and
+``reply`` properties. If no explicit message with attributes is passed
+in a given direction (e.g. a ``dump`` which does not accept a filter, or a ``do``
+of a SET operation to which the kernel responds with just the netlink error
+code) the ``request`` or ``reply`` section can be skipped.
+``request`` and ``reply`` sections list the attributes allowed in a message.
+The list contains only the names of attributes from a set referred
+to by the ``attribute-set`` property.
+
+Notifications and events both refer to the asynchronous messages sent by
+the kernel to members of a multicast group. The difference between the
+two is that a notification shares its contents with a GET operation
+(the name of the GET operation is specified in the ``notify`` property).
+This arrangement is commonly used for notifications about
+objects where the notification carries the full object definition.
+
+Events are more focused and carry only a subset of information rather than full
+object state (a made up example would be a link state change event with just
+the interface name and the new link state). Events contain the ``event``
+property. Events are considered less idiomatic for netlink and notifications
+should be preferred.
+
+list
+~~~~
+
+The only property of ``operations`` for ``genetlink``, holds the list of
+operations, notifications etc.
+
+Operation properties
+--------------------
+
+name
+~~~~
+
+Identifies the operation.
+
+value
+~~~~~
+
+Numerical message ID, used in serialized Netlink messages.
+The same enumeration rules are applied as to
+:ref:`attribute values<assign_val>`.
+
+attribute-set
+~~~~~~~~~~~~~
+
+Specifies the attribute set contained within the message.
+
+do
+~~~
+
+Specification for the ``doit`` request. Should contain ``request``, ``reply``
+or both of these properties, each holding a :ref:`attr_list`.
+
+dump
+~~~~
+
+Specification for the ``dumpit`` request. Should contain ``request``, ``reply``
+or both of these properties, each holding a :ref:`attr_list`.
+
+notify
+~~~~~~
+
+Designates the message as a notification. Contains the name of the operation
+(possibly the same as the operation holding this property) which shares
+the contents with the notification (``do``).
+
+event
+~~~~~
+
+Specification of attributes in the event, holds a :ref:`attr_list`.
+``event`` property is mutually exclusive with ``notify``.
+
+mcgrp
+~~~~~
+
+Used with ``event`` and ``notify``, specifies which multicast group
+message belongs to.
+
+.. _attr_list:
+
+Message attribute list
+----------------------
+
+``request``, ``reply`` and ``event`` properties have a single ``attributes``
+property which holds the list of attribute names.
+
+Messages can also define ``pre`` and ``post`` properties which will be rendered
+as ``pre_doit`` and ``post_doit`` calls in the kernel (these properties should
+be ignored by user space).
+
+mcast-groups
+------------
+
+This section lists the multicast groups of the family.
+
+list
+~~~~
+
+The only property of ``mcast-groups`` for ``genetlink``, holds the list
+of groups.
+
+Multicast group properties
+--------------------------
+
+name
+~~~~
+
+Uniquely identifies the multicast group in the family. Similarly to
+Family ID, Multicast Group ID needs to be resolved at runtime, based
+on the name.
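+
+Putting the operation and multicast group properties together, a spec
+fragment might look like the sketch below. The operation, attribute and
+group names are invented, and a ``port`` attribute set with ``id``,
+``name`` and ``state`` attributes is assumed to exist:
+
+.. code-block:: yaml
+
+ operations:
+   list:
+     -
+       name: port-get
+       doc: Get information about a port.
+       attribute-set: port
+       do:
+         request:
+           attributes:
+             - id
+         reply: &port-all
+           attributes:
+             - id
+             - name
+             - state
+       dump:
+         # no request section, the dump takes no filter
+         reply: *port-all
+     -
+       name: port-ntf
+       doc: Notification about a port being changed.
+       notify: port-get
+       mcgrp: monitor
+
+ mcast-groups:
+   list:
+     -
+       name: monitor
+
+Here ``port-ntf`` is a notification rather than an event, so it carries
+the same attributes as the reply of ``port-get``.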
+
+.. _attr_types:
+
+Attribute types
+===============
+
+This section describes the attribute types supported by the ``genetlink``
+compatibility level. Refer to documentation of different levels for additional
+attribute types.
+
+Scalar integer types
+--------------------
+
+Fixed-width integer types:
+``u8``, ``u16``, ``u32``, ``u64``, ``s8``, ``s16``, ``s32``, ``s64``.
+
+Note that types smaller than 32 bit should be avoided as using them
+does not save any memory in Netlink messages (due to alignment).
+See :ref:`pad_type` for padding of 64 bit attributes.
+
+The payload of the attribute is the integer in host order unless ``byte-order``
+specifies otherwise.
+
+.. _pad_type:
+
+pad
+---
+
+Special attribute type used for padding attributes which require alignment
+bigger than standard 4B alignment required by netlink (e.g. 64 bit integers).
+There can only be a single attribute of the ``pad`` type in any attribute set
+and it should be automatically used for padding when needed.
+
+flag
+----
+
+Attribute with no payload, its presence is the entire information.
+
+binary
+------
+
+Raw binary data attribute, the contents are opaque to generic code.
+
+string
+------
+
+Character string. Unless ``checks`` has ``unterminated-ok`` set to ``true``
+the string is required to be null terminated.
+``max-len`` in ``checks`` indicates the longest possible string,
+if not present the length of the string is unbounded.
+
+Note that ``max-len`` does not count the terminating character.
+
+nest
+----
+
+Attribute containing other (nested) attributes.
+``nested-attributes`` specifies which attribute set is used inside.