diff options
author | Daniel Baumann <daniel.baumann@progress-linux.org> | 2024-04-11 08:27:49 +0000 |
---|---|---|
committer | Daniel Baumann <daniel.baumann@progress-linux.org> | 2024-04-11 08:27:49 +0000 |
commit | ace9429bb58fd418f0c81d4c2835699bddf6bde6 (patch) | |
tree | b2d64bc10158fdd5497876388cd68142ca374ed3 /Documentation/userspace-api/netlink | |
parent | Initial commit. (diff) | |
download | linux-ace9429bb58fd418f0c81d4c2835699bddf6bde6.tar.xz linux-ace9429bb58fd418f0c81d4c2835699bddf6bde6.zip |
Adding upstream version 6.6.15.upstream/6.6.15
Signed-off-by: Daniel Baumann <daniel.baumann@progress-linux.org>
Diffstat (limited to 'Documentation/userspace-api/netlink')
-rw-r--r-- | Documentation/userspace-api/netlink/c-code-gen.rst | 107 | ||||
-rw-r--r-- | Documentation/userspace-api/netlink/genetlink-legacy.rst | 268 | ||||
-rw-r--r-- | Documentation/userspace-api/netlink/index.rst | 19 | ||||
-rw-r--r-- | Documentation/userspace-api/netlink/intro-specs.rst | 159 | ||||
-rw-r--r-- | Documentation/userspace-api/netlink/intro.rst | 683 | ||||
-rw-r--r-- | Documentation/userspace-api/netlink/netlink-raw.rst | 58 | ||||
-rw-r--r-- | Documentation/userspace-api/netlink/specs.rst | 458 |
7 files changed, 1752 insertions, 0 deletions
diff --git a/Documentation/userspace-api/netlink/c-code-gen.rst b/Documentation/userspace-api/netlink/c-code-gen.rst new file mode 100644 index 0000000000..89de42c133 --- /dev/null +++ b/Documentation/userspace-api/netlink/c-code-gen.rst @@ -0,0 +1,107 @@ +.. SPDX-License-Identifier: BSD-3-Clause + +============================== +Netlink spec C code generation +============================== + +This document describes how Netlink specifications are used to render +C code (uAPI, policies etc.). It also defines the additional properties +allowed in older families by the ``genetlink-c`` protocol level, +to control the naming. + +For brevity this document refers to ``name`` properties of various +objects by the object type. For example ``$attr`` is the value +of ``name`` in an attribute, and ``$family`` is the name of the +family (the global ``name`` property). + +The upper case is used to denote literal values, e.g. ``$family-CMD`` +means the concatenation of ``$family``, a dash character, and the literal +``CMD``. + +The names of ``#defines`` and enum values are always converted to upper case, +and with dashes (``-``) replaced by underscores (``_``). + +If the constructed name is a C keyword, an extra underscore is +appended (``do`` -> ``do_``). + +Globals +======= + +``c-family-name`` controls the name of the ``#define`` for the family +name, default is ``$family-FAMILY-NAME``. + +``c-version-name`` controls the name of the ``#define`` for the version +of the family, default is ``$family-FAMILY-VERSION``. + +``max-by-define`` selects if max values for enums are defined as a +``#define`` rather than inside the enum. + +Definitions +=========== + +Constants +--------- + +Every constant is rendered as a ``#define``. +The name of the constant is ``$family-$constant`` and the value +is rendered as a string or integer according to its type in the spec. + +Enums and flags +--------------- + +Enums are named ``$family-$enum``. The full name can be set directly +or suppressed by specifying the ``enum-name`` property. +Default entry name is ``$family-$enum-$entry``. +If ``name-prefix`` is specified it replaces the ``$family-$enum`` +portion of the entry name. + +Boolean ``render-max`` controls creation of the max values +(which are enabled by default for attribute enums). + +Attributes +========== + +Each attribute set (excluding fractional sets) is rendered as an enum. + +Attribute enums are traditionally unnamed in netlink headers. +If naming is desired ``enum-name`` can be used to specify the name. + +The default attribute name prefix is ``$family-A`` if the name of the set +is the same as the name of the family and ``$family-A-$set`` if the names +differ. The prefix can be overridden by the ``name-prefix`` property of a set. +The rest of the section will refer to the prefix as ``$pfx``. + +Attributes are named ``$pfx-$attribute``. + +Attribute enums end with two special values ``__$pfx-MAX`` and ``$pfx-MAX`` +which are used for sizing attribute tables. +These two names can be specified directly with the ``attr-cnt-name`` +and ``attr-max-name`` properties respectively. + +If ``max-by-define`` is set to ``true`` at the global level ``attr-max-name`` +will be specified as a ``#define`` rather than an enum value. + +Operations +========== + +Operations are named ``$family-CMD-$operation``. +If ``name-prefix`` is specified it replaces the ``$family-CMD`` +portion of the name. + +Similarly to attribute enums operation enums end with special count and max +attributes. For operations those attributes can be renamed with +``cmd-cnt-name`` and ``cmd-max-name``. Max will be a define if ``max-by-define`` +is ``true``. + +Multicast groups +================ + +Each multicast group gets a define rendered into the kernel uAPI header. +The name of the define is ``$family-MCGRP-$group``, and can be overwritten +with the ``c-define-name`` property. + +Code generation +=============== + +uAPI header is assumed to come from ``<linux/$family.h>`` in the default header +search path. It can be changed using the ``uapi-header`` global property. diff --git a/Documentation/userspace-api/netlink/genetlink-legacy.rst b/Documentation/userspace-api/netlink/genetlink-legacy.rst new file mode 100644 index 0000000000..40b82ad5d5 --- /dev/null +++ b/Documentation/userspace-api/netlink/genetlink-legacy.rst @@ -0,0 +1,268 @@ +.. SPDX-License-Identifier: BSD-3-Clause + +================================================================= +Netlink specification support for legacy Generic Netlink families +================================================================= + +This document describes the many additional quirks and properties +required to describe older Generic Netlink families which form +the ``genetlink-legacy`` protocol level. + +Specification +============= + +Attribute type nests +-------------------- + +New Netlink families should use ``multi-attr`` to define arrays. +Older families (e.g. ``genetlink`` control family) attempted to +define array types reusing attribute type to carry information. + +For reference the ``multi-attr`` array may look like this:: + + [ARRAY-ATTR] + [INDEX (optionally)] + [MEMBER1] + [MEMBER2] + [SOME-OTHER-ATTR] + [ARRAY-ATTR] + [INDEX (optionally)] + [MEMBER1] + [MEMBER2] + +where ``ARRAY-ATTR`` is the array entry type. + +array-nest +~~~~~~~~~~ + +``array-nest`` creates the following structure:: + + [SOME-OTHER-ATTR] + [ARRAY-ATTR] + [ENTRY] + [MEMBER1] + [MEMBER2] + [ENTRY] + [MEMBER1] + [MEMBER2] + +It wraps the entire array in an extra attribute (hence limiting its size +to 64kB). The ``ENTRY`` nests are special and have the index of the entry +as their type instead of normal attribute type. + +type-value +~~~~~~~~~~ + +``type-value`` is a construct which uses attribute types to carry +information about a single object (often used when array is dumped +entry-by-entry). + +``type-value`` can have multiple levels of nesting, for example +genetlink's policy dumps create the following structures:: + + [POLICY-IDX] + [ATTR-IDX] + [POLICY-INFO-ATTR1] + [POLICY-INFO-ATTR2] + +Where the first level of nest has the policy index as it's attribute +type, it contains a single nest which has the attribute index as its +type. Inside the attr-index nest are the policy attributes. Modern +Netlink families should have instead defined this as a flat structure, +the nesting serves no good purpose here. + +Operations +========== + +Enum (message ID) model +----------------------- + +unified +~~~~~~~ + +Modern families use the ``unified`` message ID model, which uses +a single enumeration for all messages within family. Requests and +responses share the same message ID. Notifications have separate +IDs from the same space. For example given the following list +of operations: + +.. code-block:: yaml + + - + name: a + value: 1 + do: ... + - + name: b + do: ... + - + name: c + value: 4 + notify: a + - + name: d + do: ... + +Requests and responses for operation ``a`` will have the ID of 1, +the requests and responses of ``b`` - 2 (since there is no explicit +``value`` it's previous operation ``+ 1``). Notification ``c`` will +use the ID of 4, operation ``d`` 5 etc. + +directional +~~~~~~~~~~~ + +The ``directional`` model splits the ID assignment by the direction of +the message. Messages from and to the kernel can't be confused with +each other so this conserves the ID space (at the cost of making +the programming more cumbersome). + +In this case ``value`` attribute should be specified in the ``request`` +``reply`` sections of the operations (if an operation has both ``do`` +and ``dump`` the IDs are shared, ``value`` should be set in ``do``). +For notifications the ``value`` is provided at the op level but it +only allocates a ``reply`` (i.e. a "from-kernel" ID). Let's look +at an example: + +.. code-block:: yaml + + - + name: a + do: + request: + value: 2 + attributes: ... + reply: + value: 1 + attributes: ... + - + name: b + notify: a + - + name: c + notify: a + value: 7 + - + name: d + do: ... + +In this case ``a`` will use 2 when sending the message to the kernel +and expects message with ID 1 in response. Notification ``b`` allocates +a "from-kernel" ID which is 2. ``c`` allocates "from-kernel" ID of 7. +If operation ``d`` does not set ``values`` explicitly in the spec +it will be allocated 3 for the request (``a`` is the previous operation +with a request section and the value of 2) and 8 for response (``c`` is +the previous operation in the "from-kernel" direction). + +Other quirks +============ + +Structures +---------- + +Legacy families can define C structures both to be used as the contents of +an attribute and as a fixed message header. Structures are defined in +``definitions`` and referenced in operations or attributes. + +members +~~~~~~~ + + - ``name`` - The attribute name of the struct member + - ``type`` - One of the scalar types ``u8``, ``u16``, ``u32``, ``u64``, ``s8``, + ``s16``, ``s32``, ``s64``, ``string`` or ``binary``. + - ``byte-order`` - ``big-endian`` or ``little-endian`` + - ``doc``, ``enum``, ``enum-as-flags``, ``display-hint`` - Same as for + :ref:`attribute definitions <attribute_properties>` + +Note that structures defined in YAML are implicitly packed according to C +conventions. For example, the following struct is 4 bytes, not 6 bytes: + +.. code-block:: c + + struct { + u8 a; + u16 b; + u8 c; + } + +Any padding must be explicitly added and C-like languages should infer the +need for explicit padding from whether the members are naturally aligned. + +Here is the struct definition from above, declared in YAML: + +.. code-block:: yaml + + definitions: + - + name: message-header + type: struct + members: + - + name: a + type: u8 + - + name: b + type: u16 + - + name: c + type: u8 + +Fixed Headers +~~~~~~~~~~~~~ + +Fixed message headers can be added to operations using ``fixed-header``. +The default ``fixed-header`` can be set in ``operations`` and it can be set +or overridden for each operation. + +.. code-block:: yaml + + operations: + fixed-header: message-header + list: + - + name: get + fixed-header: custom-header + attribute-set: message-attrs + +Attributes +~~~~~~~~~~ + +A ``binary`` attribute can be interpreted as a C structure using a +``struct`` property with the name of the structure definition. The +``struct`` property implies ``sub-type: struct`` so it is not necessary to +specify a sub-type. + +.. code-block:: yaml + + attribute-sets: + - + name: stats-attrs + attributes: + - + name: stats + type: binary + struct: vport-stats + +C Arrays +-------- + +Legacy families also use ``binary`` attributes to encapsulate C arrays. The +``sub-type`` is used to identify the type of scalar to extract. + +.. code-block:: yaml + + attributes: + - + name: ports + type: binary + sub-type: u32 + +Multi-message DO +---------------- + +New Netlink families should never respond to a DO operation with multiple +replies, with ``NLM_F_MULTI`` set. Use a filtered dump instead. + +At the spec level we can define a ``dumps`` property for the ``do``, +perhaps with values of ``combine`` and ``multi-object`` depending +on how the parsing should be implemented (parse into a single reply +vs list of objects i.e. pretty much a dump). diff --git a/Documentation/userspace-api/netlink/index.rst b/Documentation/userspace-api/netlink/index.rst new file mode 100644 index 0000000000..62725dafbb --- /dev/null +++ b/Documentation/userspace-api/netlink/index.rst @@ -0,0 +1,19 @@ +.. SPDX-License-Identifier: BSD-3-Clause + +================ +Netlink Handbook +================ + +Netlink documentation for users. + +.. toctree:: + :maxdepth: 2 + + intro + intro-specs + specs + c-code-gen + genetlink-legacy + netlink-raw + +See also :ref:`Documentation/core-api/netlink.rst <kernel_netlink>`. diff --git a/Documentation/userspace-api/netlink/intro-specs.rst b/Documentation/userspace-api/netlink/intro-specs.rst new file mode 100644 index 0000000000..bada896994 --- /dev/null +++ b/Documentation/userspace-api/netlink/intro-specs.rst @@ -0,0 +1,159 @@ +.. SPDX-License-Identifier: BSD-3-Clause + +===================================== +Using Netlink protocol specifications +===================================== + +This document is a quick starting guide for using Netlink protocol +specifications. For more detailed description of the specs see :doc:`specs`. + +Simple CLI +========== + +Kernel comes with a simple CLI tool which should be useful when +developing Netlink related code. The tool is implemented in Python +and can use a YAML specification to issue Netlink requests +to the kernel. Only Generic Netlink is supported. + +The tool is located at ``tools/net/ynl/cli.py``. It accepts +a handul of arguments, the most important ones are: + + - ``--spec`` - point to the spec file + - ``--do $name`` / ``--dump $name`` - issue request ``$name`` + - ``--json $attrs`` - provide attributes for the request + - ``--subscribe $group`` - receive notifications from ``$group`` + +YAML specs can be found under ``Documentation/netlink/specs/``. + +Example use:: + + $ ./tools/net/ynl/cli.py --spec Documentation/netlink/specs/ethtool.yaml \ + --do rings-get \ + --json '{"header":{"dev-index": 18}}' + {'header': {'dev-index': 18, 'dev-name': 'eni1np1'}, + 'rx': 0, + 'rx-jumbo': 0, + 'rx-jumbo-max': 4096, + 'rx-max': 4096, + 'rx-mini': 0, + 'rx-mini-max': 4096, + 'tx': 0, + 'tx-max': 4096, + 'tx-push': 0} + +The input arguments are parsed as JSON, while the output is only +Python-pretty-printed. This is because some Netlink types can't +be expressed as JSON directly. If such attributes are needed in +the input some hacking of the script will be necessary. + +The spec and Netlink internals are factored out as a standalone +library - it should be easy to write Python tools / tests reusing +code from ``cli.py``. + +Generating kernel code +====================== + +``tools/net/ynl/ynl-regen.sh`` scans the kernel tree in search of +auto-generated files which need to be updated. Using this tool is the easiest +way to generate / update auto-generated code. + +By default code is re-generated only if spec is newer than the source, +to force regeneration use ``-f``. + +``ynl-regen.sh`` searches for ``YNL-GEN`` in the contents of files +(note that it only scans files in the git index, that is only files +tracked by git!) For instance the ``fou_nl.c`` kernel source contains:: + + /* Documentation/netlink/specs/fou.yaml */ + /* YNL-GEN kernel source */ + +``ynl-regen.sh`` will find this marker and replace the file with +kernel source based on fou.yaml. + +The simplest way to generate a new file based on a spec is to add +the two marker lines like above to a file, add that file to git, +and run the regeneration tool. Grep the tree for ``YNL-GEN`` +to see other examples. + +The code generation itself is performed by ``tools/net/ynl/ynl-gen-c.py`` +but it takes a few arguments so calling it directly for each file +quickly becomes tedious. + +YNL lib +======= + +``tools/net/ynl/lib/`` contains an implementation of a C library +(based on libmnl) which integrates with code generated by +``tools/net/ynl/ynl-gen-c.py`` to create easy to use netlink wrappers. + +YNL basics +---------- + +The YNL library consists of two parts - the generic code (functions +prefix by ``ynl_``) and per-family auto-generated code (prefixed +with the name of the family). + +To create a YNL socket call ynl_sock_create() passing the family +struct (family structs are exported by the auto-generated code). +ynl_sock_destroy() closes the socket. + +YNL requests +------------ + +Steps for issuing YNL requests are best explained on an example. +All the functions and types in this example come from the auto-generated +code (for the netdev family in this case): + +.. code-block:: c + + // 0. Request and response pointers + struct netdev_dev_get_req *req; + struct netdev_dev_get_rsp *d; + + // 1. Allocate a request + req = netdev_dev_get_req_alloc(); + // 2. Set request parameters (as needed) + netdev_dev_get_req_set_ifindex(req, ifindex); + + // 3. Issues the request + d = netdev_dev_get(ys, req); + // 4. Free the request arguments + netdev_dev_get_req_free(req); + // 5. Error check (the return value from step 3) + if (!d) { + // 6. Print the YNL-generated error + fprintf(stderr, "YNL: %s\n", ys->err.msg); + return -1; + } + + // ... do stuff with the response @d + + // 7. Free response + netdev_dev_get_rsp_free(d); + +YNL dumps +--------- + +Performing dumps follows similar pattern as requests. +Dumps return a list of objects terminated by a special marker, +or NULL on error. Use ``ynl_dump_foreach()`` to iterate over +the result. + +YNL notifications +----------------- + +YNL lib supports using the same socket for notifications and +requests. In case notifications arrive during processing of a request +they are queued internally and can be retrieved at a later time. + +To subscribed to notifications use ``ynl_subscribe()``. +The notifications have to be read out from the socket, +``ynl_socket_get_fd()`` returns the underlying socket fd which can +be plugged into appropriate asynchronous IO API like ``poll``, +or ``select``. + +Notifications can be retrieved using ``ynl_ntf_dequeue()`` and have +to be freed using ``ynl_ntf_free()``. Since we don't know the notification +type upfront the notifications are returned as ``struct ynl_ntf_base_type *`` +and user is expected to cast them to the appropriate full type based +on the ``cmd`` member. diff --git a/Documentation/userspace-api/netlink/intro.rst b/Documentation/userspace-api/netlink/intro.rst new file mode 100644 index 0000000000..7b1d401210 --- /dev/null +++ b/Documentation/userspace-api/netlink/intro.rst @@ -0,0 +1,683 @@ +.. SPDX-License-Identifier: BSD-3-Clause + +======================= +Introduction to Netlink +======================= + +Netlink is often described as an ioctl() replacement. +It aims to replace fixed-format C structures as supplied +to ioctl() with a format which allows an easy way to add +or extended the arguments. + +To achieve this Netlink uses a minimal fixed-format metadata header +followed by multiple attributes in the TLV (type, length, value) format. + +Unfortunately the protocol has evolved over the years, in an organic +and undocumented fashion, making it hard to coherently explain. +To make the most practical sense this document starts by describing +netlink as it is used today and dives into more "historical" uses +in later sections. + +Opening a socket +================ + +Netlink communication happens over sockets, a socket needs to be +opened first: + +.. code-block:: c + + fd = socket(AF_NETLINK, SOCK_RAW, NETLINK_GENERIC); + +The use of sockets allows for a natural way of exchanging information +in both directions (to and from the kernel). The operations are still +performed synchronously when applications send() the request but +a separate recv() system call is needed to read the reply. + +A very simplified flow of a Netlink "call" will therefore look +something like: + +.. code-block:: c + + fd = socket(AF_NETLINK, SOCK_RAW, NETLINK_GENERIC); + + /* format the request */ + send(fd, &request, sizeof(request)); + n = recv(fd, &response, RSP_BUFFER_SIZE); + /* interpret the response */ + +Netlink also provides natural support for "dumping", i.e. communicating +to user space all objects of a certain type (e.g. dumping all network +interfaces). + +.. code-block:: c + + fd = socket(AF_NETLINK, SOCK_RAW, NETLINK_GENERIC); + + /* format the dump request */ + send(fd, &request, sizeof(request)); + while (1) { + n = recv(fd, &buffer, RSP_BUFFER_SIZE); + /* one recv() call can read multiple messages, hence the loop below */ + for (nl_msg in buffer) { + if (nl_msg.nlmsg_type == NLMSG_DONE) + goto dump_finished; + /* process the object */ + } + } + dump_finished: + +The first two arguments of the socket() call require little explanation - +it is opening a Netlink socket, with all headers provided by the user +(hence NETLINK, RAW). The last argument is the protocol within Netlink. +This field used to identify the subsystem with which the socket will +communicate. + +Classic vs Generic Netlink +-------------------------- + +Initial implementation of Netlink depended on a static allocation +of IDs to subsystems and provided little supporting infrastructure. +Let us refer to those protocols collectively as **Classic Netlink**. +The list of them is defined on top of the ``include/uapi/linux/netlink.h`` +file, they include among others - general networking (NETLINK_ROUTE), +iSCSI (NETLINK_ISCSI), and audit (NETLINK_AUDIT). + +**Generic Netlink** (introduced in 2005) allows for dynamic registration of +subsystems (and subsystem ID allocation), introspection and simplifies +implementing the kernel side of the interface. + +The following section describes how to use Generic Netlink, as the +number of subsystems using Generic Netlink outnumbers the older +protocols by an order of magnitude. There are also no plans for adding +more Classic Netlink protocols to the kernel. +Basic information on how communicating with core networking parts of +the Linux kernel (or another of the 20 subsystems using Classic +Netlink) differs from Generic Netlink is provided later in this document. + +Generic Netlink +=============== + +In addition to the Netlink fixed metadata header each Netlink protocol +defines its own fixed metadata header. (Similarly to how network +headers stack - Ethernet > IP > TCP we have Netlink > Generic N. > Family.) + +A Netlink message always starts with struct nlmsghdr, which is followed +by a protocol-specific header. In case of Generic Netlink the protocol +header is struct genlmsghdr. + +The practical meaning of the fields in case of Generic Netlink is as follows: + +.. code-block:: c + + struct nlmsghdr { + __u32 nlmsg_len; /* Length of message including headers */ + __u16 nlmsg_type; /* Generic Netlink Family (subsystem) ID */ + __u16 nlmsg_flags; /* Flags - request or dump */ + __u32 nlmsg_seq; /* Sequence number */ + __u32 nlmsg_pid; /* Port ID, set to 0 */ + }; + struct genlmsghdr { + __u8 cmd; /* Command, as defined by the Family */ + __u8 version; /* Irrelevant, set to 1 */ + __u16 reserved; /* Reserved, set to 0 */ + }; + /* TLV attributes follow... */ + +In Classic Netlink :c:member:`nlmsghdr.nlmsg_type` used to identify +which operation within the subsystem the message was referring to +(e.g. get information about a netdev). Generic Netlink needs to mux +multiple subsystems in a single protocol so it uses this field to +identify the subsystem, and :c:member:`genlmsghdr.cmd` identifies +the operation instead. (See :ref:`res_fam` for +information on how to find the Family ID of the subsystem of interest.) +Note that the first 16 values (0 - 15) of this field are reserved for +control messages both in Classic Netlink and Generic Netlink. +See :ref:`nl_msg_type` for more details. + +There are 3 usual types of message exchanges on a Netlink socket: + + - performing a single action (``do``); + - dumping information (``dump``); + - getting asynchronous notifications (``multicast``). + +Classic Netlink is very flexible and presumably allows other types +of exchanges to happen, but in practice those are the three that get +used. + +Asynchronous notifications are sent by the kernel and received by +the user sockets which subscribed to them. ``do`` and ``dump`` requests +are initiated by the user. :c:member:`nlmsghdr.nlmsg_flags` should +be set as follows: + + - for ``do``: ``NLM_F_REQUEST | NLM_F_ACK`` + - for ``dump``: ``NLM_F_REQUEST | NLM_F_ACK | NLM_F_DUMP`` + +:c:member:`nlmsghdr.nlmsg_seq` should be a set to a monotonically +increasing value. The value gets echoed back in responses and doesn't +matter in practice, but setting it to an increasing value for each +message sent is considered good hygiene. The purpose of the field is +matching responses to requests. Asynchronous notifications will have +:c:member:`nlmsghdr.nlmsg_seq` of ``0``. + +:c:member:`nlmsghdr.nlmsg_pid` is the Netlink equivalent of an address. +This field can be set to ``0`` when talking to the kernel. +See :ref:`nlmsg_pid` for the (uncommon) uses of the field. + +The expected use for :c:member:`genlmsghdr.version` was to allow +versioning of the APIs provided by the subsystems. No subsystem to +date made significant use of this field, so setting it to ``1`` seems +like a safe bet. + +.. _nl_msg_type: + +Netlink message types +--------------------- + +As previously mentioned :c:member:`nlmsghdr.nlmsg_type` carries +protocol specific values but the first 16 identifiers are reserved +(first subsystem specific message type should be equal to +``NLMSG_MIN_TYPE`` which is ``0x10``). + +There are only 4 Netlink control messages defined: + + - ``NLMSG_NOOP`` - ignore the message, not used in practice; + - ``NLMSG_ERROR`` - carries the return code of an operation; + - ``NLMSG_DONE`` - marks the end of a dump; + - ``NLMSG_OVERRUN`` - socket buffer has overflown, not used to date. + +``NLMSG_ERROR`` and ``NLMSG_DONE`` are of practical importance. +They carry return codes for operations. Note that unless +the ``NLM_F_ACK`` flag is set on the request Netlink will not respond +with ``NLMSG_ERROR`` if there is no error. To avoid having to special-case +this quirk it is recommended to always set ``NLM_F_ACK``. + +The format of ``NLMSG_ERROR`` is described by struct nlmsgerr:: + + ---------------------------------------------- + | struct nlmsghdr - response header | + ---------------------------------------------- + | int error | + ---------------------------------------------- + | struct nlmsghdr - original request header | + ---------------------------------------------- + | ** optionally (1) payload of the request | + ---------------------------------------------- + | ** optionally (2) extended ACK | + ---------------------------------------------- + +There are two instances of struct nlmsghdr here, first of the response +and second of the request. ``NLMSG_ERROR`` carries the information about +the request which led to the error. This could be useful when trying +to match requests to responses or re-parse the request to dump it into +logs. + +The payload of the request is not echoed in messages reporting success +(``error == 0``) or if ``NETLINK_CAP_ACK`` setsockopt() was set. +The latter is common +and perhaps recommended as having to read a copy of every request back +from the kernel is rather wasteful. The absence of request payload +is indicated by ``NLM_F_CAPPED`` in :c:member:`nlmsghdr.nlmsg_flags`. + +The second optional element of ``NLMSG_ERROR`` are the extended ACK +attributes. See :ref:`ext_ack` for more details. The presence +of extended ACK is indicated by ``NLM_F_ACK_TLVS`` in +:c:member:`nlmsghdr.nlmsg_flags`. + +``NLMSG_DONE`` is simpler, the request is never echoed but the extended +ACK attributes may be present:: + + ---------------------------------------------- + | struct nlmsghdr - response header | + ---------------------------------------------- + | int error | + ---------------------------------------------- + | ** optionally extended ACK | + ---------------------------------------------- + +.. _res_fam: + +Resolving the Family ID +----------------------- + +This section explains how to find the Family ID of a subsystem. +It also serves as an example of Generic Netlink communication. + +Generic Netlink is itself a subsystem exposed via the Generic Netlink API. +To avoid a circular dependency Generic Netlink has a statically allocated +Family ID (``GENL_ID_CTRL`` which is equal to ``NLMSG_MIN_TYPE``). +The Generic Netlink family implements a command used to find out information +about other families (``CTRL_CMD_GETFAMILY``). + +To get information about the Generic Netlink family named for example +``"test1"`` we need to send a message on the previously opened Generic Netlink +socket. The message should target the Generic Netlink Family (1), be a +``do`` (2) call to ``CTRL_CMD_GETFAMILY`` (3). A ``dump`` version of this +call would make the kernel respond with information about *all* the families +it knows about. Last but not least the name of the family in question has +to be specified (4) as an attribute with the appropriate type:: + + struct nlmsghdr: + __u32 nlmsg_len: 32 + __u16 nlmsg_type: GENL_ID_CTRL // (1) + __u16 nlmsg_flags: NLM_F_REQUEST | NLM_F_ACK // (2) + __u32 nlmsg_seq: 1 + __u32 nlmsg_pid: 0 + + struct genlmsghdr: + __u8 cmd: CTRL_CMD_GETFAMILY // (3) + __u8 version: 2 /* or 1, doesn't matter */ + __u16 reserved: 0 + + struct nlattr: // (4) + __u16 nla_len: 10 + __u16 nla_type: CTRL_ATTR_FAMILY_NAME + char data: test1\0 + + (padding:) + char data: \0\0 + +The length fields in Netlink (:c:member:`nlmsghdr.nlmsg_len` +and :c:member:`nlattr.nla_len`) always *include* the header. +Attribute headers in netlink must be aligned to 4 bytes from the start +of the message, hence the extra ``\0\0`` after ``CTRL_ATTR_FAMILY_NAME``. +The attribute lengths *exclude* the padding. + +If the family is found kernel will reply with two messages, the response +with all the information about the family:: + + /* Message #1 - reply */ + struct nlmsghdr: + __u32 nlmsg_len: 136 + __u16 nlmsg_type: GENL_ID_CTRL + __u16 nlmsg_flags: 0 + __u32 nlmsg_seq: 1 /* echoed from our request */ + __u32 nlmsg_pid: 5831 /* The PID of our user space process */ + + struct genlmsghdr: + __u8 cmd: CTRL_CMD_GETFAMILY + __u8 version: 2 + __u16 reserved: 0 + + struct nlattr: + __u16 nla_len: 10 + __u16 nla_type: CTRL_ATTR_FAMILY_NAME + char data: test1\0 + + (padding:) + data: \0\0 + + struct nlattr: + __u16 nla_len: 6 + __u16 nla_type: CTRL_ATTR_FAMILY_ID + __u16: 123 /* The Family ID we are after */ + + (padding:) + char data: \0\0 + + struct nlattr: + __u16 nla_len: 9 + __u16 nla_type: CTRL_ATTR_FAMILY_VERSION + __u16: 1 + + /* ... etc, more attributes will follow. */ + +And the error code (success) since ``NLM_F_ACK`` had been set on the request:: + + /* Message #2 - the ACK */ + struct nlmsghdr: + __u32 nlmsg_len: 36 + __u16 nlmsg_type: NLMSG_ERROR + __u16 nlmsg_flags: NLM_F_CAPPED /* There won't be a payload */ + __u32 nlmsg_seq: 1 /* echoed from our request */ + __u32 nlmsg_pid: 5831 /* The PID of our user space process */ + + int error: 0 + + struct nlmsghdr: /* Copy of the request header as we sent it */ + __u32 nlmsg_len: 32 + __u16 nlmsg_type: GENL_ID_CTRL + __u16 nlmsg_flags: NLM_F_REQUEST | NLM_F_ACK + __u32 nlmsg_seq: 1 + __u32 nlmsg_pid: 0 + +The order of attributes (struct nlattr) is not guaranteed so the user +has to walk the attributes and parse them. + +Note that Generic Netlink sockets are not associated or bound to a single +family. A socket can be used to exchange messages with many different +families, selecting the recipient family on message-by-message basis using +the :c:member:`nlmsghdr.nlmsg_type` field. + +.. _ext_ack: + +Extended ACK +------------ + +Extended ACK controls reporting of additional error/warning TLVs +in ``NLMSG_ERROR`` and ``NLMSG_DONE`` messages. To maintain backward +compatibility this feature has to be explicitly enabled by setting +the ``NETLINK_EXT_ACK`` setsockopt() to ``1``. + +Types of extended ack attributes are defined in enum nlmsgerr_attrs. +The most commonly used attributes are ``NLMSGERR_ATTR_MSG``, +``NLMSGERR_ATTR_OFFS`` and ``NLMSGERR_ATTR_MISS_*``. + +``NLMSGERR_ATTR_MSG`` carries a message in English describing +the encountered problem. These messages are far more detailed +than what can be expressed thru standard UNIX error codes. + +``NLMSGERR_ATTR_OFFS`` points to the attribute which caused the problem. + +``NLMSGERR_ATTR_MISS_TYPE`` and ``NLMSGERR_ATTR_MISS_NEST`` +inform about a missing attribute. + +Extended ACKs can be reported on errors as well as in case of success. +The latter should be treated as a warning. + +Extended ACKs greatly improve the usability of Netlink and should +always be enabled, appropriately parsed and reported to the user. + +Advanced topics +=============== + +Dump consistency +---------------- + +Some of the data structures kernel uses for storing objects make +it hard to provide an atomic snapshot of all the objects in a dump +(without impacting the fast-paths updating them). + +Kernel may set the ``NLM_F_DUMP_INTR`` flag on any message in a dump +(including the ``NLMSG_DONE`` message) if the dump was interrupted and +may be inconsistent (e.g. missing objects). User space should retry +the dump if it sees the flag set. + +Introspection +------------- + +The basic introspection abilities are enabled by access to the Family +object as reported in :ref:`res_fam`. User can query information about +the Generic Netlink family, including which operations are supported +by the kernel and what attributes the kernel understands. +Family information includes the highest ID of an attribute kernel can parse, +a separate command (``CTRL_CMD_GETPOLICY``) provides detailed information +about supported attributes, including ranges of values the kernel accepts. + +Querying family information is useful in cases when user space needs +to make sure that the kernel has support for a feature before issuing +a request. + +.. _nlmsg_pid: + +nlmsg_pid +--------- + +:c:member:`nlmsghdr.nlmsg_pid` is the Netlink equivalent of an address. +It is referred to as Port ID, sometimes Process ID because for historical +reasons if the application does not select (bind() to) an explicit Port ID +kernel will automatically assign it the ID equal to its Process ID +(as reported by the getpid() system call). + +Similarly to the bind() semantics of the TCP/IP network protocols the value +of zero means "assign automatically", hence it is common for applications +to leave the :c:member:`nlmsghdr.nlmsg_pid` field initialized to ``0``. + +The field is still used today in rare cases when kernel needs to send +a unicast notification. User space application can use bind() to associate +its socket with a specific PID, it then communicates its PID to the kernel. +This way the kernel can reach the specific user space process. + +This sort of communication is utilized in UMH (User Mode Helper)-like +scenarios when kernel needs to trigger user space processing or ask user +space for a policy decision. + +Multicast notifications +----------------------- + +One of the strengths of Netlink is the ability to send event notifications +to user space. This is a unidirectional form of communication (kernel -> +user) and does not involve any control messages like ``NLMSG_ERROR`` or +``NLMSG_DONE``. + +For example the Generic Netlink family itself defines a set of multicast +notifications about registered families. When a new family is added the +sockets subscribed to the notifications will get the following message:: + + struct nlmsghdr: + __u32 nlmsg_len: 136 + __u16 nlmsg_type: GENL_ID_CTRL + __u16 nlmsg_flags: 0 + __u32 nlmsg_seq: 0 + __u32 nlmsg_pid: 0 + + struct genlmsghdr: + __u8 cmd: CTRL_CMD_NEWFAMILY + __u8 version: 2 + __u16 reserved: 0 + + struct nlattr: + __u16 nla_len: 10 + __u16 nla_type: CTRL_ATTR_FAMILY_NAME + char data: test1\0 + + (padding:) + data: \0\0 + + struct nlattr: + __u16 nla_len: 6 + __u16 nla_type: CTRL_ATTR_FAMILY_ID + __u16: 123 /* The Family ID we are after */ + + (padding:) + char data: \0\0 + + struct nlattr: + __u16 nla_len: 9 + __u16 nla_type: CTRL_ATTR_FAMILY_VERSION + __u16: 1 + + /* ... etc, more attributes will follow. */ + +The notification contains the same information as the response +to the ``CTRL_CMD_GETFAMILY`` request. + +The Netlink headers of the notification are mostly 0 and irrelevant. +The :c:member:`nlmsghdr.nlmsg_seq` may be either zero or a monotonically +increasing notification sequence number maintained by the family. + +To receive notifications the user socket must subscribe to the relevant +notification group. Much like the Family ID, the Group ID for a given +multicast group is dynamic and can be found inside the Family information. +The ``CTRL_ATTR_MCAST_GROUPS`` attribute contains nests with names +(``CTRL_ATTR_MCAST_GRP_NAME``) and IDs (``CTRL_ATTR_MCAST_GRP_ID``) of +the groups family. + +Once the Group ID is known a setsockopt() call adds the socket to the group: + +.. code-block:: c + + unsigned int group_id; + + /* .. find the group ID... */ + + setsockopt(fd, SOL_NETLINK, NETLINK_ADD_MEMBERSHIP, + &group_id, sizeof(group_id)); + +The socket will now receive notifications. + +It is recommended to use separate sockets for receiving notifications +and sending requests to the kernel. The asynchronous nature of notifications +means that they may get mixed in with the responses making the message +handling much harder. + +Buffer sizing +------------- + +Netlink sockets are datagram sockets rather than stream sockets, +meaning that each message must be received in its entirety by a single +recv()/recvmsg() system call. If the buffer provided by the user is too +short, the message will be truncated and the ``MSG_TRUNC`` flag set +in struct msghdr (struct msghdr is the second argument +of the recvmsg() system call, *not* a Netlink header). + +Upon truncation the remaining part of the message is discarded. + +Netlink expects that the user buffer will be at least 8kB or a page +size of the CPU architecture, whichever is bigger. Particular Netlink +families may, however, require a larger buffer. 32kB buffer is recommended +for most efficient handling of dumps (larger buffer fits more dumped +objects and therefore fewer recvmsg() calls are needed). + +.. _classic_netlink: + +Classic Netlink +=============== + +The main differences between Classic and Generic Netlink are the dynamic +allocation of subsystem identifiers and availability of introspection. +In theory the protocol does not differ significantly, however, in practice +Classic Netlink experimented with concepts which were abandoned in Generic +Netlink (really, they usually only found use in a small corner of a single +subsystem). This section is meant as an explainer of a few of such concepts, +with the explicit goal of giving the Generic Netlink +users the confidence to ignore them when reading the uAPI headers. + +Most of the concepts and examples here refer to the ``NETLINK_ROUTE`` family, +which covers much of the configuration of the Linux networking stack. +Real documentation of that family, deserves a chapter (or a book) of its own. + +Families +-------- + +Netlink refers to subsystems as families. This is a remnant of using +sockets and the concept of protocol families, which are part of message +demultiplexing in ``NETLINK_ROUTE``. + +Sadly every layer of encapsulation likes to refer to whatever it's carrying +as "families" making the term very confusing: + + 1. AF_NETLINK is a bona fide socket protocol family + 2. AF_NETLINK's documentation refers to what comes after its own + header (struct nlmsghdr) in a message as a "Family Header" + 3. Generic Netlink is a family for AF_NETLINK (struct genlmsghdr follows + struct nlmsghdr), yet it also calls its users "Families". + +Note that the Generic Netlink Family IDs are in a different "ID space" +and overlap with Classic Netlink protocol numbers (e.g. ``NETLINK_CRYPTO`` +has the Classic Netlink protocol ID of 21 which Generic Netlink will +happily allocate to one of its families as well). + +Strict checking +--------------- + +The ``NETLINK_GET_STRICT_CHK`` socket option enables strict input checking +in ``NETLINK_ROUTE``. It was needed because historically kernel did not +validate the fields of structures it didn't process. This made it impossible +to start using those fields later without risking regressions in applications +which initialized them incorrectly or not at all. + +``NETLINK_GET_STRICT_CHK`` declares that the application is initializing +all fields correctly. It also opts into validating that message does not +contain trailing data and requests that kernel rejects attributes with +type higher than largest attribute type known to the kernel. + +``NETLINK_GET_STRICT_CHK`` is not used outside of ``NETLINK_ROUTE``. + +Unknown attributes +------------------ + +Historically Netlink ignored all unknown attributes. The thinking was that +it would free the application from having to probe what kernel supports. +The application could make a request to change the state and check which +parts of the request "stuck". + +This is no longer the case for new Generic Netlink families and those opting +in to strict checking. See enum netlink_validation for validation types +performed. + +Fixed metadata and structures +----------------------------- + +Classic Netlink made liberal use of fixed-format structures within +the messages. Messages would commonly have a structure with +a considerable number of fields after struct nlmsghdr. It was also +common to put structures with multiple members inside attributes, +without breaking each member into an attribute of its own. + +This has caused problems with validation and extensibility and +therefore using binary structures is actively discouraged for new +attributes. + +Request types +------------- + +``NETLINK_ROUTE`` categorized requests into 4 types ``NEW``, ``DEL``, ``GET``, +and ``SET``. Each object can handle all or some of those requests +(objects being netdevs, routes, addresses, qdiscs etc.) Request type +is defined by the 2 lowest bits of the message type, so commands for +new objects would always be allocated with a stride of 4. + +Each object would also have its own fixed metadata shared by all request +types (e.g. struct ifinfomsg for netdev requests, struct ifaddrmsg for address +requests, struct tcmsg for qdisc requests). + +Even though other protocols and Generic Netlink commands often use +the same verbs in their message names (``GET``, ``SET``) the concept +of request types did not find wider adoption. + +Notification echo +----------------- + +``NLM_F_ECHO`` requests for notifications resulting from the request +to be queued onto the requesting socket. This is useful to discover +the impact of the request. + +Note that this feature is not universally implemented. + +Other request-type-specific flags +--------------------------------- + +Classic Netlink defined various flags for its ``GET``, ``NEW`` +and ``DEL`` requests in the upper byte of nlmsg_flags in struct nlmsghdr. +Since request types have not been generalized the request type specific +flags are rarely used (and considered deprecated for new families). + +For ``GET`` - ``NLM_F_ROOT`` and ``NLM_F_MATCH`` are combined into +``NLM_F_DUMP``, and not used separately. ``NLM_F_ATOMIC`` is never used. + +For ``DEL`` - ``NLM_F_NONREC`` is only used by nftables and ``NLM_F_BULK`` +only by FDB some operations. + +The flags for ``NEW`` are used most commonly in classic Netlink. Unfortunately, +the meaning is not crystal clear. The following description is based on the +best guess of the intention of the authors, and in practice all families +stray from it in one way or another. ``NLM_F_REPLACE`` asks to replace +an existing object, if no matching object exists the operation should fail. +``NLM_F_EXCL`` has the opposite semantics and only succeeds if object already +existed. +``NLM_F_CREATE`` asks for the object to be created if it does not +exist, it can be combined with ``NLM_F_REPLACE`` and ``NLM_F_EXCL``. + +A comment in the main Netlink uAPI header states:: + + 4.4BSD ADD NLM_F_CREATE|NLM_F_EXCL + 4.4BSD CHANGE NLM_F_REPLACE + + True CHANGE NLM_F_CREATE|NLM_F_REPLACE + Append NLM_F_CREATE + Check NLM_F_EXCL + +which seems to indicate that those flags predate request types. +``NLM_F_REPLACE`` without ``NLM_F_CREATE`` was initially used instead +of ``SET`` commands. +``NLM_F_EXCL`` without ``NLM_F_CREATE`` was used to check if object exists +without creating it, presumably predating ``GET`` commands. + +``NLM_F_APPEND`` indicates that if one key can have multiple objects associated +with it (e.g. multiple next-hop objects for a route) the new object should be +added to the list rather than replacing the entire list. + +uAPI reference +============== + +.. kernel-doc:: include/uapi/linux/netlink.h diff --git a/Documentation/userspace-api/netlink/netlink-raw.rst b/Documentation/userspace-api/netlink/netlink-raw.rst new file mode 100644 index 0000000000..f07fb9b9c1 --- /dev/null +++ b/Documentation/userspace-api/netlink/netlink-raw.rst @@ -0,0 +1,58 @@ +.. SPDX-License-Identifier: BSD-3-Clause + +====================================================== +Netlink specification support for raw Netlink families +====================================================== + +This document describes the additional properties required by raw Netlink +families such as ``NETLINK_ROUTE`` which use the ``netlink-raw`` protocol +specification. + +Specification +============= + +The netlink-raw schema extends the :doc:`genetlink-legacy <genetlink-legacy>` +schema with properties that are needed to specify the protocol numbers and +multicast IDs used by raw netlink families. See :ref:`classic_netlink` for more +information. + +Globals +------- + +protonum +~~~~~~~~ + +The ``protonum`` property is used to specify the protocol number to use when +opening a netlink socket. + +.. code-block:: yaml + + # SPDX-License-Identifier: ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause) + + name: rt-addr + protocol: netlink-raw + protonum: 0 # part of the NETLINK_ROUTE protocol + + +Multicast group properties +-------------------------- + +value +~~~~~ + +The ``value`` property is used to specify the group ID to use for multicast +group registration. + +.. code-block:: yaml + + mcast-groups: + list: + - + name: rtnlgrp-ipv4-ifaddr + value: 5 + - + name: rtnlgrp-ipv6-ifaddr + value: 9 + - + name: rtnlgrp-mctp-ifaddr + value: 34 diff --git a/Documentation/userspace-api/netlink/specs.rst b/Documentation/userspace-api/netlink/specs.rst new file mode 100644 index 0000000000..cc4e243099 --- /dev/null +++ b/Documentation/userspace-api/netlink/specs.rst @@ -0,0 +1,458 @@ +.. SPDX-License-Identifier: BSD-3-Clause + +========================================= +Netlink protocol specifications (in YAML) +========================================= + +Netlink protocol specifications are complete, machine readable descriptions of +Netlink protocols written in YAML. The goal of the specifications is to allow +separating Netlink parsing from user space logic and minimize the amount of +hand written Netlink code for each new family, command, attribute. +Netlink specs should be complete and not depend on any other spec +or C header file, making it easy to use in languages which can't include +kernel headers directly. + +Internally kernel uses the YAML specs to generate: + + - the C uAPI header + - documentation of the protocol as a ReST file + - policy tables for input attribute validation + - operation tables + +YAML specifications can be found under ``Documentation/netlink/specs/`` + +This document describes details of the schema. +See :doc:`intro-specs` for a practical starting guide. + +All specs must be licensed under +``((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause)`` +to allow for easy adoption in user space code. + +Compatibility levels +==================== + +There are four schema levels for Netlink specs, from the simplest used +by new families to the most complex covering all the quirks of the old ones. +Each next level inherits the attributes of the previous level, meaning that +user capable of parsing more complex ``genetlink`` schemas is also compatible +with simpler ones. The levels are: + + - ``genetlink`` - most streamlined, should be used by all new families + - ``genetlink-c`` - superset of ``genetlink`` with extra attributes allowing + customization of define and enum type and value names; this schema should + be equivalent to ``genetlink`` for all implementations which don't interact + directly with C uAPI headers + - ``genetlink-legacy`` - Generic Netlink catch all schema supporting quirks of + all old genetlink families, strange attribute formats, binary structures etc. + - ``netlink-raw`` - catch all schema supporting pre-Generic Netlink protocols + such as ``NETLINK_ROUTE`` + +The definition of the schemas (in ``jsonschema``) can be found +under ``Documentation/netlink/``. + +Schema structure +================ + +YAML schema has the following conceptual sections: + + - globals + - definitions + - attributes + - operations + - multicast groups + +Most properties in the schema accept (or in fact require) a ``doc`` +sub-property documenting the defined object. + +The following sections describe the properties of the most modern ``genetlink`` +schema. See the documentation of :doc:`genetlink-c <c-code-gen>` +for information on how C names are derived from name properties. + +See also :ref:`Documentation/core-api/netlink.rst <kernel_netlink>` for +information on the Netlink specification properties that are only relevant to +the kernel space and not part of the user space API. + +genetlink +========= + +Globals +------- + +Attributes listed directly at the root level of the spec file. + +name +~~~~ + +Name of the family. Name identifies the family in a unique way, since +the Family IDs are allocated dynamically. + +version +~~~~~~~ + +Generic Netlink family version, default is 1. + +protocol +~~~~~~~~ + +The schema level, default is ``genetlink``, which is the only value +allowed for new ``genetlink`` families. + +definitions +----------- + +Array of type and constant definitions. + +name +~~~~ + +Name of the type / constant. + +type +~~~~ + +One of the following types: + + - const - a single, standalone constant + - enum - defines an integer enumeration, with values for each entry + incrementing by 1, (e.g. 0, 1, 2, 3) + - flags - defines an integer enumeration, with values for each entry + occupying a bit, starting from bit 0, (e.g. 1, 2, 4, 8) + +value +~~~~~ + +The value for the ``const``. + +value-start +~~~~~~~~~~~ + +The first value for ``enum`` and ``flags``, allows overriding the default +start value of ``0`` (for ``enum``) and starting bit (for ``flags``). +For ``flags`` ``value-start`` selects the starting bit, not the shifted value. + +Sparse enumerations are not supported. + +entries +~~~~~~~ + +Array of names of the entries for ``enum`` and ``flags``. + +header +~~~~~~ + +For C-compatible languages, header which already defines this value. +In case the definition is shared by multiple families (e.g. ``IFNAMSIZ``) +code generators for C-compatible languages may prefer to add an appropriate +include instead of rendering a new definition. + +attribute-sets +-------------- + +This property contains information about netlink attributes of the family. +All families have at least one attribute set, most have multiple. +``attribute-sets`` is an array, with each entry describing a single set. + +Note that the spec is "flattened" and is not meant to visually resemble +the format of the netlink messages (unlike certain ad-hoc documentation +formats seen in kernel comments). In the spec subordinate attribute sets +are not defined inline as a nest, but defined in a separate attribute set +referred to with a ``nested-attributes`` property of the container. + +Spec may also contain fractional sets - sets which contain a ``subset-of`` +property. Such sets describe a section of a full set, allowing narrowing down +which attributes are allowed in a nest or refining the validation criteria. +Fractional sets can only be used in nests. They are not rendered to the uAPI +in any fashion. + +name +~~~~ + +Uniquely identifies the attribute set, operations and nested attributes +refer to the sets by the ``name``. + +subset-of +~~~~~~~~~ + +Re-defines a portion of another set (a fractional set). +Allows narrowing down fields and changing validation criteria +or even types of attributes depending on the nest in which they +are contained. The ``value`` of each attribute in the fractional +set is implicitly the same as in the main set. + +attributes +~~~~~~~~~~ + +List of attributes in the set. + +.. _attribute_properties: + +Attribute properties +-------------------- + +name +~~~~ + +Identifies the attribute, unique within the set. + +type +~~~~ + +Netlink attribute type, see :ref:`attr_types`. + +.. _assign_val: + +value +~~~~~ + +Numerical attribute ID, used in serialized Netlink messages. +The ``value`` property can be skipped, in which case the attribute ID +will be the value of the previous attribute plus one (recursively) +and ``1`` for the first attribute in the attribute set. + +Attributes (and operations) use ``1`` as the default value for the first +entry (unlike enums in definitions which start from ``0``) because +entry ``0`` is almost always reserved as undefined. Spec can explicitly +set value to ``0`` if needed. + +Note that the ``value`` of an attribute is defined only in its main set +(not in subsets). + +enum +~~~~ + +For integer types specifies that values in the attribute belong +to an ``enum`` or ``flags`` from the ``definitions`` section. + +enum-as-flags +~~~~~~~~~~~~~ + +Treat ``enum`` as ``flags`` regardless of its type in ``definitions``. +When both ``enum`` and ``flags`` forms are needed ``definitions`` should +contain an ``enum`` and attributes which need the ``flags`` form should +use this attribute. + +nested-attributes +~~~~~~~~~~~~~~~~~ + +Identifies the attribute space for attributes nested within given attribute. +Only valid for complex attributes which may have sub-attributes. + +multi-attr (arrays) +~~~~~~~~~~~~~~~~~~~ + +Boolean property signifying that the attribute may be present multiple times. +Allowing an attribute to repeat is the recommended way of implementing arrays +(no extra nesting). + +byte-order +~~~~~~~~~~ + +For integer types specifies attribute byte order - ``little-endian`` +or ``big-endian``. + +checks +~~~~~~ + +Input validation constraints used by the kernel. User space should query +the policy of the running kernel using Generic Netlink introspection, +rather than depend on what is specified in the spec file. + +The validation policy in the kernel is formed by combining the type +definition (``type`` and ``nested-attributes``) and the ``checks``. + +sub-type +~~~~~~~~ + +Legacy families have special ways of expressing arrays. ``sub-type`` can be +used to define the type of array members in case array members are not +fully defined as attributes (in a bona fide attribute space). For instance +a C array of u32 values can be specified with ``type: binary`` and +``sub-type: u32``. Binary types and legacy array formats are described in +more detail in :doc:`genetlink-legacy`. + +display-hint +~~~~~~~~~~~~ + +Optional format indicator that is intended only for choosing the right +formatting mechanism when displaying values of this type. Currently supported +hints are ``hex``, ``mac``, ``fddi``, ``ipv4``, ``ipv6`` and ``uuid``. + +operations +---------- + +This section describes messages passed between the kernel and the user space. +There are three types of entries in this section - operations, notifications +and events. + +Operations describe the most common request - response communication. User +sends a request and kernel replies. Each operation may contain any combination +of the two modes familiar to netlink users - ``do`` and ``dump``. +``do`` and ``dump`` in turn contain a combination of ``request`` and +``response`` properties. If no explicit message with attributes is passed +in a given direction (e.g. a ``dump`` which does not accept filter, or a ``do`` +of a SET operation to which the kernel responds with just the netlink error +code) ``request`` or ``response`` section can be skipped. +``request`` and ``response`` sections list the attributes allowed in a message. +The list contains only the names of attributes from a set referred +to by the ``attribute-set`` property. + +Notifications and events both refer to the asynchronous messages sent by +the kernel to members of a multicast group. The difference between the +two is that a notification shares its contents with a GET operation +(the name of the GET operation is specified in the ``notify`` property). +This arrangement is commonly used for notifications about +objects where the notification carries the full object definition. + +Events are more focused and carry only a subset of information rather than full +object state (a made up example would be a link state change event with just +the interface name and the new link state). Events contain the ``event`` +property. Events are considered less idiomatic for netlink and notifications +should be preferred. + +list +~~~~ + +The only property of ``operations`` for ``genetlink``, holds the list of +operations, notifications etc. + +Operation properties +-------------------- + +name +~~~~ + +Identifies the operation. + +value +~~~~~ + +Numerical message ID, used in serialized Netlink messages. +The same enumeration rules are applied as to +:ref:`attribute values<assign_val>`. + +attribute-set +~~~~~~~~~~~~~ + +Specifies the attribute set contained within the message. + +do +~~~ + +Specification for the ``doit`` request. Should contain ``request``, ``reply`` +or both of these properties, each holding a :ref:`attr_list`. + +dump +~~~~ + +Specification for the ``dumpit`` request. Should contain ``request``, ``reply`` +or both of these properties, each holding a :ref:`attr_list`. + +notify +~~~~~~ + +Designates the message as a notification. Contains the name of the operation +(possibly the same as the operation holding this property) which shares +the contents with the notification (``do``). + +event +~~~~~ + +Specification of attributes in the event, holds a :ref:`attr_list`. +``event`` property is mutually exclusive with ``notify``. + +mcgrp +~~~~~ + +Used with ``event`` and ``notify``, specifies which multicast group +message belongs to. + +.. _attr_list: + +Message attribute list +---------------------- + +``request``, ``reply`` and ``event`` properties have a single ``attributes`` +property which holds the list of attribute names. + +Messages can also define ``pre`` and ``post`` properties which will be rendered +as ``pre_doit`` and ``post_doit`` calls in the kernel (these properties should +be ignored by user space). + +mcast-groups +------------ + +This section lists the multicast groups of the family. + +list +~~~~ + +The only property of ``mcast-groups`` for ``genetlink``, holds the list +of groups. + +Multicast group properties +-------------------------- + +name +~~~~ + +Uniquely identifies the multicast group in the family. Similarly to +Family ID, Multicast Group ID needs to be resolved at runtime, based +on the name. + +.. _attr_types: + +Attribute types +=============== + +This section describes the attribute types supported by the ``genetlink`` +compatibility level. Refer to documentation of different levels for additional +attribute types. + +Scalar integer types +-------------------- + +Fixed-width integer types: +``u8``, ``u16``, ``u32``, ``u64``, ``s8``, ``s16``, ``s32``, ``s64``. + +Note that types smaller than 32 bit should be avoided as using them +does not save any memory in Netlink messages (due to alignment). +See :ref:`pad_type` for padding of 64 bit attributes. + +The payload of the attribute is the integer in host order unless ``byte-order`` +specifies otherwise. + +.. _pad_type: + +pad +--- + +Special attribute type used for padding attributes which require alignment +bigger than standard 4B alignment required by netlink (e.g. 64 bit integers). +There can only be a single attribute of the ``pad`` type in any attribute set +and it should be automatically used for padding when needed. + +flag +---- + +Attribute with no payload, its presence is the entire information. + +binary +------ + +Raw binary data attribute, the contents are opaque to generic code. + +string +------ + +Character string. Unless ``checks`` has ``unterminated-ok`` set to ``true`` +the string is required to be null terminated. +``max-len`` in ``checks`` indicates the longest possible string, +if not present the length of the string is unbounded. + +Note that ``max-len`` does not count the terminating character. + +nest +---- + +Attribute containing other (nested) attributes. +``nested-attributes`` specifies which attribute set is used inside. |