diff options
Diffstat (limited to 'doc/developer/xrefs.rst')
-rw-r--r-- | doc/developer/xrefs.rst | 215 |
1 files changed, 215 insertions, 0 deletions
diff --git a/doc/developer/xrefs.rst b/doc/developer/xrefs.rst new file mode 100644 index 0000000..e8e07df --- /dev/null +++ b/doc/developer/xrefs.rst @@ -0,0 +1,215 @@ +.. _xrefs: + +Introspection (xrefs) +===================== + +The FRR library provides an introspection facility called "xrefs." The intent +is to provide structured access to annotated entities in the compiled binary, +such as log messages and thread scheduling calls. + +Enabling and use +---------------- + +Support for emitting an xref is included in the macros for the specific +entities, e.g. :c:func:`zlog_info` contains the relevant statements. The only +requirement for the system to work is a GNU compatible linker that supports +section start/end symbols. (The only known linker on any system FRR supports +that does not do this is the Solaris linker.) + +To verify xrefs have been included in a binary or dynamic library, run +``readelf -n binary``. For individual object files, it's +``readelf -S object.o | grep xref_array`` instead. + +Structure and contents +---------------------- + +As a slight improvement to security and fault detection, xrefs are divided into +a ``const struct xref *`` and an optional ``struct xrefdata *``. The required +const part contains: + +.. c:member:: enum xref_type xref.type + + Identifies what kind of object the xref points to. + +.. c:member:: int line +.. c:member:: const char *xref.file +.. c:member:: const char *xref.func + + Source code location of the xref. ``func`` will be ``<global>`` for + xrefs outside of a function. + +.. c:member:: struct xrefdata *xref.xrefdata + + The optional writable part of the xref. NULL if no non-const part exists. + +The optional non-const part has: + +.. c:member:: const struct xref *xrefdata.xref + + Pointer back to the constant part. Since circular pointers are close to + impossible to emit from inside a function body's static variables, this + is initialized at startup. + +.. c:member:: char xrefdata.uid[16] + + Unique identifier, see below. + +.. c:member:: const char *xrefdata.hashstr +.. c:member:: uint32_t xrefdata.hashu32[2] + + Input to unique identifier calculation. These should encompass all + details needed to make an xref unique. If more than one string should + be considered, use string concatenation for the initializer. + +Both structures can be extended by embedding them in a larger type-specific +struct, e.g. ``struct xref_logmsg *``. + +Unique identifiers +------------------ + +All xrefs that have a writable ``struct xrefdata *`` part are assigned an +unique identifier, which is formed as base32 (crockford) SHA256 on: + +- the source filename +- the ``hashstr`` field +- the ``hashu32`` fields + +.. note:: + + Function names and line numbers are intentionally not included to allow + moving items within a file without affecting the identifier. + +For running executables, this hash is calculated once at startup. When +directly reading from an ELF file with external tooling, the value must be +calculated when necessary. + +The identifiers have the form ``AXXXX-XXXXX`` where ``X`` is +``0-9, A-Z except I,L,O,U`` and ``A`` is ``G-Z except I,L,O,U`` (i.e. the +identifiers always start with a letter.) When reading identifiers from user +input, ``I`` and ``L`` should be replaced with ``1`` and ``O`` should be +replaced with ``0``. There are 49 bits of entropy in this identifier. + +Underlying machinery +-------------------- + +Xrefs are nothing other than global variables with some extra glue to make +them possible to find from the outside by looking at the binary. The first +non-obvious part is that they can occur inside of functions, since they're +defined as ``static``. They don't have a visible name -- they don't need one. + +To make finding these variables possible, another global variable, a pointer +to the first one, is created in the same way. However, it is put in a special +ELF section through ``__attribute__((section("xref_array")))``. This is the +section you can see with readelf. + +Finally, on the level of a whole executable or library, the linker will stuff +the individual pointers consecutive to each other since they're in the same +section — hence the array. Start and end of this array is given by the +linker-autogenerated ``__start_xref_array`` and ``__stop_xref_array`` symbols. +Using these, both a constructor to run at startup as well as an ELF note are +created. + +The ELF note is the entrypoint for externally retrieving xrefs from a binary +without having to run it. It can be found by walking through the ELF data +structures even if the binary has been fully stripped of debug and section +information. SystemTap's SDT probes & LTTng's trace points work in the same +way (though they emit 1 note for each probe, while xrefs only emit one note +in total which refers to the array.) Using xrefs does not impact SystemTap +or LTTng, the notes have identifiers they can be distinguished by. + +The ELF structure of a linked binary (library or executable) will look like +this:: + + $ readelf --wide -l -n lib/.libs/libfrr.so + + Elf file type is DYN (Shared object file) + Entry point 0x67d21 + There are 12 program headers, starting at offset 64 + + Program Headers: + Type Offset VirtAddr PhysAddr FileSiz MemSiz Flg Align + PHDR 0x000040 0x0000000000000040 0x0000000000000040 0x0002a0 0x0002a0 R 0x8 + INTERP 0x125560 0x0000000000125560 0x0000000000125560 0x00001c 0x00001c R 0x10 + [Requesting program interpreter: /lib64/ld-linux-x86-64.so.2] + LOAD 0x000000 0x0000000000000000 0x0000000000000000 0x02aff0 0x02aff0 R 0x1000 + LOAD 0x02b000 0x000000000002b000 0x000000000002b000 0x0b2889 0x0b2889 R E 0x1000 + LOAD 0x0de000 0x00000000000de000 0x00000000000de000 0x070048 0x070048 R 0x1000 + LOAD 0x14e428 0x000000000014f428 0x000000000014f428 0x00fb70 0x01a2b8 RW 0x1000 + DYNAMIC 0x157a40 0x0000000000158a40 0x0000000000158a40 0x000270 0x000270 RW 0x8 + NOTE 0x0002e0 0x00000000000002e0 0x00000000000002e0 0x00004c 0x00004c R 0x4 + TLS 0x14e428 0x000000000014f428 0x000000000014f428 0x000000 0x000008 R 0x8 + GNU_EH_FRAME 0x12557c 0x000000000012557c 0x000000000012557c 0x00819c 0x00819c R 0x4 + GNU_STACK 0x000000 0x0000000000000000 0x0000000000000000 0x000000 0x000000 RW 0x10 + GNU_RELRO 0x14e428 0x000000000014f428 0x000000000014f428 0x009bd8 0x009bd8 R 0x1 + + (...) + + Displaying notes found in: .note.gnu.build-id + Owner Data size Description + GNU 0x00000014 NT_GNU_BUILD_ID (unique build ID bitstring) Build ID: 6a1f66be38b523095ebd6ec13cc15820cede903d + + Displaying notes found in: .note.FRR + Owner Data size Description + FRRouting 0x00000010 Unknown note type: (0x46455258) description data: 6c eb 15 00 00 00 00 00 74 ec 15 00 00 00 00 00 + +Where 0x15eb6c…0x15ec74 are the offsets (relative to the note itself) where +the xref array is in the file. Also note the owner is clearly marked as +"FRRouting" and the type is "XREF" in hex. + +For SystemTap's use of ELF notes, refer to +https://libstapsdt.readthedocs.io/en/latest/how-it-works/internals.html as an +entry point. + +.. note:: + + Due to GCC bug 41091, the "xref_array" section is not correctly generated + for C++ code when compiled by GCC. A workaround is present for runtime + functionality, but to extract the xrefs from a C++ source file, it needs + to be built with clang (or a future fixed version of GCC) instead. + +Extraction tool +--------------- + +The FRR source contains a matching tool to extract xref data from compiled ELF +binaries in ``python/xrelfo.py``. This tool uses CPython extensions +implemented in ``clippy`` and must therefore be executed with that. + +``xrelfo.py`` processes input from one or more ELF file (.o, .so, executable), +libtool object (.lo, .la, executable wrapper script) or JSON (output from +``xrelfo.py``) and generates an output JSON file. During standard FRR build, +it is invoked on all binaries and libraries and the result is combined into +``frr.json``. + +ELF files from any operating system, CPU architecture and endianness can be +processed on any host. Any issues with this are bugs in ``xrelfo.py`` +(or clippy's ELF code.) + +``xrelfo.py`` also performs some sanity checking, particularly on log +messages. The following options are available: + +.. option:: -o OUTPUT + + Filename to write JSON output to. As a convention, a ``.xref`` filename + extension is used. + +.. option:: -Wlog-format + + Performs extra checks on log message format strings, particularly checks + for ``\t`` and ``\n`` characters (which should not be used in log messages). + +.. option:: -Wlog-args + + Generates cleanup hints for format string arguments where + :c:func:`printfrr()` extensions could be used, e.g. replacing ``inet_ntoa`` + with ``%pI4``. + +.. option:: --profile + + Runs the Python profiler to identify hotspots in the ``xrelfo.py`` code. + +``xrelfo.py`` uses information about C structure definitions saved in +``python/xrefstructs.json``. This file is included with the FRR sources and +only needs to be regenerated when some of the ``struct xref_*`` definitions +are changed (which should be almost never). The file is written by +``python/tiabwarfo.py``, which uses ``pahole`` to extract the necessary data +from DWARF information. |