Diffstat (limited to 'doc/developer/xrefs.rst')
-rw-r--r-- doc/developer/xrefs.rst | 215
1 files changed, 215 insertions, 0 deletions
diff --git a/doc/developer/xrefs.rst b/doc/developer/xrefs.rst
new file mode 100644
index 0000000..e8e07df
--- /dev/null
+++ b/doc/developer/xrefs.rst
@@ -0,0 +1,215 @@
+.. _xrefs:
+
+Introspection (xrefs)
+=====================
+
+The FRR library provides an introspection facility called "xrefs." The intent
+is to provide structured access to annotated entities in the compiled binary,
+such as log messages and thread scheduling calls.
+
+Enabling and use
+----------------
+
+Support for emitting an xref is included in the macros for the specific
+entities, e.g. :c:func:`zlog_info` contains the relevant statements. The only
+requirement for the system to work is a GNU-compatible linker that supports
+section start/end symbols. (The only known linker on any system FRR supports
+that does not do this is the Solaris linker.)
+
+To verify xrefs have been included in a binary or dynamic library, run
+``readelf -n binary``. For individual object files, it's
+``readelf -S object.o | grep xref_array`` instead.
+
+Structure and contents
+----------------------
+
+As a slight improvement to security and fault detection, xrefs are divided into
+a ``const struct xref *`` and an optional ``struct xrefdata *``. The required
+const part contains:
+
+.. c:member:: enum xref_type xref.type
+
+ Identifies what kind of object the xref points to.
+
+.. c:member:: int xref.line
+.. c:member:: const char *xref.file
+.. c:member:: const char *xref.func
+
+ Source code location of the xref. ``func`` will be ``<global>`` for
+ xrefs outside of a function.
+
+.. c:member:: struct xrefdata *xref.xrefdata
+
+ The optional writable part of the xref. NULL if no non-const part exists.
+
+The optional non-const part has:
+
+.. c:member:: const struct xref *xrefdata.xref
+
+ Pointer back to the constant part. Since circular pointers are close to
+ impossible to emit from inside a function body's static variables, this
+ is initialized at startup.
+
+.. c:member:: char xrefdata.uid[16]
+
+ Unique identifier, see below.
+
+.. c:member:: const char *xrefdata.hashstr
+.. c:member:: uint32_t xrefdata.hashu32[2]
+
+ Input to unique identifier calculation. These should encompass all
+ details needed to make an xref unique. If more than one string should
+ be considered, use string concatenation for the initializer.
+
+Both structures can be extended by embedding them in a larger type-specific
+struct, e.g. ``struct xref_logmsg *``.
+
+Unique identifiers
+------------------
+
+All xrefs that have a writable ``struct xrefdata *`` part are assigned a
+unique identifier, which is formed as a Crockford base32 encoding of a
+SHA-256 hash over:
+
+- the source filename
+- the ``hashstr`` field
+- the ``hashu32`` fields
+
+.. note::
+
+ Function names and line numbers are intentionally not included to allow
+ moving items within a file without affecting the identifier.
+
+For running executables, this hash is calculated once at startup. When
+directly reading from an ELF file with external tooling, the value must be
+calculated when necessary.
+
+The identifiers have the form ``AXXXX-XXXXX`` where ``X`` is
+``0-9, A-Z except I,L,O,U`` and ``A`` is ``G-Z except I,L,O,U`` (i.e. the
+identifiers always start with a letter). When reading identifiers from user
+input, ``I`` and ``L`` should be replaced with ``1`` and ``O`` should be
+replaced with ``0``. There are 49 bits of entropy in this identifier: 4 bits
+from the leading character (16 choices) and 5 bits from each of the other 9.
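+
+The input handling described above can be sketched in Python as follows.
+This is only an illustration, not code from FRR: ``normalize_uid`` is a
+hypothetical helper, and accepting lowercase input and surrounding whitespace
+is an assumption that goes beyond what the text above specifies.
+

```python
import re

# Crockford base32 alphabet: digits, then A-Z without I, L, O, U.
CROCKFORD = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"

# The leading character is restricted to the upper half of the alphabet
# (G-Z), so identifiers always start with a letter.
_FIRST = "[" + CROCKFORD[16:] + "]"
_REST = "[" + CROCKFORD + "]"
UID_RE = re.compile(f"{_FIRST}{_REST}{{4}}-{_REST}{{5}}")

# Map the look-alike characters onto the digits they are mistaken for.
_LOOKALIKES = str.maketrans("ILO", "110")


def normalize_uid(text):
    """Return the canonical form of a user-supplied identifier, or None.

    Assumption (not from the FRR docs): lowercase input and surrounding
    whitespace are tolerated.
    """
    canon = text.strip().upper().translate(_LOOKALIKES)
    return canon if UID_RE.fullmatch(canon) else None


# 16 choices for the first character, 32 for each of the other nine.
ID_SPACE = len(CROCKFORD[16:]) * len(CROCKFORD) ** 9

print(normalize_uid("gl0ob-xo1i2"))   # G100B-X0112
print(normalize_uid("A1234-56789"))   # None (leading "A" is not allowed)
print(ID_SPACE == 2 ** 49)            # True
```

+
+``U`` has no digit it could be mistaken for, so input containing it is
+rejected in this sketch rather than rewritten.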
+
+Underlying machinery
+--------------------
+
+Xrefs are nothing other than global variables with some extra glue to make
+them possible to find from the outside by looking at the binary. The first
+non-obvious part is that they can occur inside of functions, since they're
+defined as ``static``. They don't have a visible name -- they don't need one.
+
+To make finding these variables possible, another global variable, a pointer
+to the first one, is created in the same way. However, it is put in a special
+ELF section through ``__attribute__((section("xref_array")))``. This is the
+section you can see with readelf.
+
+Finally, on the level of a whole executable or library, the linker will place
+the individual pointers consecutively since they're in the same section,
+hence the array. The start and end of this array are given by the
+linker-autogenerated ``__start_xref_array`` and ``__stop_xref_array`` symbols.
+Using these, both a constructor that runs at startup and an ELF note are
+created.
+
+The ELF note is the entry point for externally retrieving xrefs from a binary
+without having to run it. It can be found by walking through the ELF data
+structures even if the binary has been fully stripped of debug and section
+information. SystemTap's SDT probes and LTTng's trace points work in the same
+way (though they emit one note for each probe, while xrefs emit only one note
+in total, which refers to the array). Using xrefs does not impact SystemTap
+or LTTng, since the notes carry identifiers by which they can be distinguished.
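+
+As a rough illustration of how notes are told apart, the Python sketch below
+iterates over the records in the raw bytes of a ``NOTE`` segment. It uses the
+generic ELF note layout: three 32-bit words (name size, descriptor size,
+type), then the owner name and the descriptor, each padded to 4 bytes. The
+helpers ``iter_notes`` and ``make_note`` are hypothetical names, the input is
+a synthetic little-endian segment rather than a real binary, and none of this
+is taken from ``xrelfo.py``.
+

```python
import struct


def iter_notes(blob, little_endian=True):
    """Yield (owner, note_type, desc) for each record in a NOTE segment."""
    fmt = "<III" if little_endian else ">III"
    pos = 0
    while pos + 12 <= len(blob):
        namesz, descsz, ntype = struct.unpack_from(fmt, blob, pos)
        pos += 12
        owner = blob[pos:pos + namesz].rstrip(b"\0").decode("ascii")
        pos += (namesz + 3) & ~3    # owner name is padded to 4 bytes
        desc = blob[pos:pos + descsz]
        pos += (descsz + 3) & ~3    # descriptor is padded to 4 bytes
        yield owner, ntype, desc


def make_note(owner, ntype, desc):
    """Build one little-endian note record (only used to create test input)."""
    name = owner.encode("ascii") + b"\0"

    def pad(data):
        return data + b"\0" * (-len(data) % 4)

    header = struct.pack("<III", len(name), len(desc), ntype)
    return header + pad(name) + pad(desc)


NT_GNU_BUILD_ID = 3
XREF_NOTE_TYPE = 0x46455258

# Synthetic segment: a build ID note followed by an FRR-style xref note.
# The descriptor contents are zero-filled placeholders.
segment = (make_note("GNU", NT_GNU_BUILD_ID, bytes(20))
           + make_note("FRRouting", XREF_NOTE_TYPE, bytes(16)))

notes = list(iter_notes(segment))
xref_notes = [n for n in notes
              if n[0] == "FRRouting" and n[1] == XREF_NOTE_TYPE]
print(len(notes), len(xref_notes))   # 2 1
```

+
+Filtering on both the owner string and the type is what keeps the xref note
+separate from notes emitted by other tools.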
+
+The ELF structure of a linked binary (library or executable) will look like
+this::
+
+ $ readelf --wide -l -n lib/.libs/libfrr.so
+
+ Elf file type is DYN (Shared object file)
+ Entry point 0x67d21
+ There are 12 program headers, starting at offset 64
+
+ Program Headers:
+ Type Offset VirtAddr PhysAddr FileSiz MemSiz Flg Align
+ PHDR 0x000040 0x0000000000000040 0x0000000000000040 0x0002a0 0x0002a0 R 0x8
+ INTERP 0x125560 0x0000000000125560 0x0000000000125560 0x00001c 0x00001c R 0x10
+ [Requesting program interpreter: /lib64/ld-linux-x86-64.so.2]
+ LOAD 0x000000 0x0000000000000000 0x0000000000000000 0x02aff0 0x02aff0 R 0x1000
+ LOAD 0x02b000 0x000000000002b000 0x000000000002b000 0x0b2889 0x0b2889 R E 0x1000
+ LOAD 0x0de000 0x00000000000de000 0x00000000000de000 0x070048 0x070048 R 0x1000
+ LOAD 0x14e428 0x000000000014f428 0x000000000014f428 0x00fb70 0x01a2b8 RW 0x1000
+ DYNAMIC 0x157a40 0x0000000000158a40 0x0000000000158a40 0x000270 0x000270 RW 0x8
+ NOTE 0x0002e0 0x00000000000002e0 0x00000000000002e0 0x00004c 0x00004c R 0x4
+ TLS 0x14e428 0x000000000014f428 0x000000000014f428 0x000000 0x000008 R 0x8
+ GNU_EH_FRAME 0x12557c 0x000000000012557c 0x000000000012557c 0x00819c 0x00819c R 0x4
+ GNU_STACK 0x000000 0x0000000000000000 0x0000000000000000 0x000000 0x000000 RW 0x10
+ GNU_RELRO 0x14e428 0x000000000014f428 0x000000000014f428 0x009bd8 0x009bd8 R 0x1
+
+ (...)
+
+ Displaying notes found in: .note.gnu.build-id
+ Owner Data size Description
+ GNU 0x00000014 NT_GNU_BUILD_ID (unique build ID bitstring) Build ID: 6a1f66be38b523095ebd6ec13cc15820cede903d
+
+ Displaying notes found in: .note.FRR
+ Owner Data size Description
+ FRRouting 0x00000010 Unknown note type: (0x46455258) description data: 6c eb 15 00 00 00 00 00 74 ec 15 00 00 00 00 00
+
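+Here 0x15eb6c and 0x15ec74 are the offsets (relative to the note itself) of
+the start and end of the xref array in the file, stored as little-endian
+64-bit values. Also note the owner is clearly marked as "FRRouting" and the
+type, 0x46455258, is "XREF" in hex (in little-endian byte order).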
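+
+The description data in the ``readelf`` output above can be decoded by hand.
+The following Python sketch does so for the bytes shown; treating the two
+values as signed 64-bit integers is an assumption made here, and turning
+them into absolute file positions is left out.
+

```python
import struct

# Description data of the .note.FRR entry, copied from the readelf output.
desc = bytes.fromhex("6ceb150000000000" "74ec150000000000")

# Two little-endian 64-bit values: start and end offset of the xref array.
# Signedness is an assumption of this sketch.
start, stop = struct.unpack("<qq", desc)

# The note type as it is laid out in a little-endian file.
note_type = struct.pack("<I", 0x46455258)

print(hex(start), hex(stop))   # 0x15eb6c 0x15ec74
print(note_type)               # b'XREF'
```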
+
+For SystemTap's use of ELF notes, refer to
+https://libstapsdt.readthedocs.io/en/latest/how-it-works/internals.html as an
+entry point.
+
+.. note::
+
+ Due to GCC bug 41091, the "xref_array" section is not correctly generated
+ for C++ code when compiled by GCC. A workaround is present for runtime
+ functionality, but to extract the xrefs from a C++ source file, it needs
+ to be built with clang (or a future fixed version of GCC) instead.
+
+Extraction tool
+---------------
+
+The FRR source contains a matching tool to extract xref data from compiled ELF
+binaries in ``python/xrelfo.py``. This tool uses CPython extensions
+implemented in ``clippy`` and must therefore be executed with ``clippy``.
+
+``xrelfo.py`` processes input from one or more ELF files (.o, .so, executable),
+libtool objects (.lo, .la, executable wrapper script) or JSON files (output
+from ``xrelfo.py``) and generates an output JSON file. During a standard FRR
+build, it is invoked on all binaries and libraries and the result is combined
+into ``frr.json``.
+
+ELF files from any operating system, CPU architecture and endianness can be
+processed on any host. Any issues with this are bugs in ``xrelfo.py``
+(or clippy's ELF code.)
+
+``xrelfo.py`` also performs some sanity checking, particularly on log
+messages. The following options are available:
+
+.. option:: -o OUTPUT
+
+ Filename to write JSON output to. As a convention, a ``.xref`` filename
+ extension is used.
+
+.. option:: -Wlog-format
+
+ Performs extra checks on log message format strings, particularly checks
+ for ``\t`` and ``\n`` characters (which should not be used in log messages).
+
+.. option:: -Wlog-args
+
+ Generates cleanup hints for format string arguments where
+ :c:func:`printfrr()` extensions could be used, e.g. replacing ``inet_ntoa``
+ with ``%pI4``.
+
+.. option:: --profile
+
+ Runs the Python profiler to identify hotspots in the ``xrelfo.py`` code.
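+
+To give an idea of the kind of check ``-Wlog-format`` describes, here is a
+minimal Python sketch that flags tab and newline characters in a format
+string. ``check_log_format`` is a hypothetical function and the warning texts
+are made up; the actual checks in ``xrelfo.py`` may differ.
+

```python
def check_log_format(fmt):
    """Return a list of warnings for one log message format string."""
    warnings = []
    for char, name in (("\t", "tab"), ("\n", "newline")):
        if char in fmt:
            warnings.append(f"log message contains a {name} character")
    return warnings


print(check_log_format("peer %pI4 is up"))   # []
print(check_log_format("shutting down\n"))
```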
+
+``xrelfo.py`` uses information about C structure definitions saved in
+``python/xrefstructs.json``. This file is included with the FRR sources and
+only needs to be regenerated when some of the ``struct xref_*`` definitions
+are changed (which should be almost never). The file is written by
+``python/tiabwarfo.py``, which uses ``pahole`` to extract the necessary data
+from DWARF information.