.. _xrefs:

Introspection (xrefs)
=====================

The FRR library provides an introspection facility called "xrefs."  The intent
is to provide structured access to annotated entities in the compiled binary,
such as log messages and thread scheduling calls.

Enabling and use
----------------

Support for emitting an xref is included in the macros for the specific
entities, e.g. :c:func:`zlog_info` contains the relevant statements.  The only
requirement for the system to work is a GNU-compatible linker that supports
section start/end symbols.  (The only known linker on any system FRR supports
that does not do this is the Solaris linker.)

To verify xrefs have been included in a binary or dynamic library, run
``readelf -n binary``.  For individual object files, use
``readelf -S object.o | grep xref_array`` instead.

Structure and contents
----------------------

As a slight improvement to security and fault detection, xrefs are divided into
a ``const struct xref *`` and an optional ``struct xrefdata *``.  The required
const part contains:

.. c:member:: enum xref_type xref.type

   Identifies what kind of object the xref points to.

.. c:member:: int xref.line
.. c:member:: const char *xref.file
.. c:member:: const char *xref.func

   Source code location of the xref.  ``func`` will be ``<global>`` for
   xrefs outside of a function.

.. c:member:: struct xrefdata *xref.xrefdata

   The optional writable part of the xref.  NULL if no non-const part exists.

The optional non-const part has:

.. c:member:: const struct xref *xrefdata.xref

   Pointer back to the constant part.  Since circular pointers are close to
   impossible to emit from inside a function body's static variables, this
   is initialized at startup.

.. c:member:: char xrefdata.uid[16]

   Unique identifier, see below.

.. c:member:: const char *xrefdata.hashstr
.. c:member:: uint32_t xrefdata.hashu32[2]

   Input to unique identifier calculation.  These should encompass all
   details needed to make an xref unique.  If more than one string should
   be considered, use string concatenation for the initializer.

Both structures can be extended by embedding them in a larger type-specific
struct, e.g. ``struct xref_logmsg *``.
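
As an illustration only, the split between the constant and the writable part
can be modelled in Python.  This is a sketch rather than FRR code:  the class
names ``Xref`` and ``XrefData``, the helper ``link_xrefs`` and all example
values are hypothetical, and the real structures are C structs.  The helper
mirrors the startup step that fills in the back pointer.

.. code-block:: python

   from dataclasses import dataclass
   from typing import Optional, Tuple


   @dataclass(eq=False)
   class XrefData:
       """Writable part; stands in for ``struct xrefdata``."""
       hashstr: str
       hashu32: Tuple[int, int] = (0, 0)
       uid: str = ""                      # filled in once the hash is computed
       xref: Optional["Xref"] = None      # back pointer, set at startup


   @dataclass(frozen=True, eq=False)
   class Xref:
       """Constant part; stands in for ``const struct xref``."""
       type: int
       file: str
       line: int
       func: str = "<global>"             # value used outside of functions
       xrefdata: Optional[XrefData] = None


   def link_xrefs(xrefs):
       """Fill in the back pointers of all xrefs that have a writable part."""
       for xref in xrefs:
           if xref.xrefdata is not None:
               xref.xrefdata.xref = xref


   # Hypothetical example values; the type number is a placeholder.
   data = XrefData(hashstr="example message %d")
   entry = Xref(type=0, file="lib/example.c", line=42, func="example_func",
                xrefdata=data)
   bare = Xref(type=0, file="lib/example.c", line=7)
   link_xrefs([entry, bare])

After ``link_xrefs`` has run, ``data.xref`` refers to ``entry``, while ``bare``
has no writable part at all.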

Unique identifiers
------------------

All xrefs that have a writable ``struct xrefdata *`` part are assigned a
unique identifier, which is formed as the Crockford base32 encoding of a
SHA256 hash over:

- the source filename
- the ``hashstr`` field
- the ``hashu32`` fields

.. note::

   Function names and line numbers are intentionally not included to allow
   moving items within a file without affecting the identifier.

For running executables, this hash is calculated once at startup.  When
directly reading from an ELF file with external tooling, the value must be
calculated when necessary.

The identifiers have the form ``AXXXX-XXXXX`` where ``X`` is
``0-9, A-Z except I,L,O,U`` and ``A`` is ``G-Z except I,L,O,U`` (i.e. the
identifiers always start with a letter.)  When reading identifiers from user
input, ``I`` and ``L`` should be replaced with ``1`` and ``O`` should be
replaced with ``0``.  There are 49 bits of entropy in this identifier:  4 from
the leading character and 5 from each of the remaining 9 characters.
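
The input handling described above can be sketched in Python.  This is a
minimal example and not part of FRR:  the names ``normalize_uid``,
``CROCKFORD`` and ``LEADING`` are hypothetical, and accepting lowercase input
and surrounding whitespace is an assumption that goes beyond what is stated
here.

.. code-block:: python

   import re

   CROCKFORD = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"   # no I, L, O, U
   LEADING = CROCKFORD[16:]                         # G-Z subset, always a letter

   # One leading letter, four more characters, a dash, five characters.
   UID_RE = re.compile("[{0}][{1}]{{4}}-[{1}]{{5}}".format(LEADING, CROCKFORD))

   # I and L are read as 1, O is read as 0.
   FIXUPS = str.maketrans("ILO", "110")

   # Number of distinct identifiers: 16 * 32**9.
   UID_SPACE = len(LEADING) * len(CROCKFORD) ** 9


   def normalize_uid(text):
       """Return the canonical form of a user-supplied identifier.

       Raises ValueError if the result is not of the form AXXXX-XXXXX.
       """
       uid = text.strip().upper().translate(FIXUPS)
       if not UID_RE.fullmatch(uid):
           raise ValueError("not a valid xref identifier: %r" % text)
       return uid

With these definitions, ``UID_SPACE`` equals ``2 ** 49``, which matches the
49 bits of entropy.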

Underlying machinery
--------------------

Xrefs are nothing other than global variables with some extra glue to make
them possible to find from the outside by looking at the binary.  The first
non-obvious part is that they can occur inside of functions, since they're
defined as ``static``.  They don't have a visible name -- they don't need one.

To make finding these variables possible, another global variable, a pointer
to the first one, is created in the same way.  However, it is put in a special
ELF section through ``__attribute__((section("xref_array")))``.  This is the
section you can see with readelf.

Finally, on the level of a whole executable or library, the linker will stuff
the individual pointers consecutive to each other since they're in the same
section, hence the array.  The start and end of this array are given by the
linker-autogenerated ``__start_xref_array`` and ``__stop_xref_array`` symbols.
Using these, both a constructor to run at startup and an ELF note are
created.

The ELF note is the entrypoint for externally retrieving xrefs from a binary
without having to run it.  It can be found by walking through the ELF data
structures even if the binary has been fully stripped of debug and section
information.  SystemTap's SDT probes and LTTng's trace points work in the same
way (though they emit one note for each probe, while xrefs only emit one note
in total which refers to the array.)  Using xrefs does not impact SystemTap
or LTTng, since the notes have identifiers by which they can be distinguished.
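
To illustrate how notes are told apart by owner and type, the following Python
sketch walks the records of a note segment using the generic ELF note layout
(name size, description size, type, then name and description, each padded).
It is not the code used by FRR's tooling.  The helpers ``iter_notes`` and
``make_note`` are hypothetical, 4-byte padding and little-endian byte order
are assumed, and the sample buffer is built by hand from the values in the
``readelf`` output in this section rather than read from a file.

.. code-block:: python

   import struct

   NOTE_HDR = struct.Struct("<III")      # namesz, descsz, type (little-endian)


   def iter_notes(blob):
       """Yield (owner, type, description) for each note record in blob."""
       pos = 0
       while pos + NOTE_HDR.size <= len(blob):
           namesz, descsz, ntype = NOTE_HDR.unpack_from(blob, pos)
           pos += NOTE_HDR.size
           owner = blob[pos:pos + namesz].rstrip(b"\0").decode("ascii")
           pos += (namesz + 3) & ~3      # name is padded to 4 bytes
           desc = blob[pos:pos + descsz]
           pos += (descsz + 3) & ~3      # description is padded as well
           yield owner, ntype, desc


   def make_note(owner, ntype, desc):
       """Build one note record; only used to create sample data."""
       name = owner.encode("ascii") + b"\0"

       def pad(data):
           return data + b"\0" * (-len(data) % 4)

       return NOTE_HDR.pack(len(name), len(desc), ntype) + pad(name) + pad(desc)


   # Sample data: a build ID note (type 3) followed by the FRR note.
   blob = make_note("GNU", 3, bytes.fromhex(
       "6a1f66be38b523095ebd6ec13cc15820cede903d"))
   blob += make_note("FRRouting", 0x46455258, bytes.fromhex(
       "6ceb15000000000074ec150000000000"))

   frr_notes = [desc for owner, ntype, desc in iter_notes(blob)
                if owner == "FRRouting" and ntype == 0x46455258]

Filtering on both the owner and the type leaves exactly one entry in
``frr_notes``, regardless of how many other notes are present.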

The ELF structure of a linked binary (library or executable) will look like
this::

  $ readelf --wide -l -n lib/.libs/libfrr.so

  Elf file type is DYN (Shared object file)
  Entry point 0x67d21
  There are 12 program headers, starting at offset 64

  Program Headers:
    Type           Offset   VirtAddr           PhysAddr           FileSiz  MemSiz   Flg Align
    PHDR           0x000040 0x0000000000000040 0x0000000000000040 0x0002a0 0x0002a0 R   0x8
    INTERP         0x125560 0x0000000000125560 0x0000000000125560 0x00001c 0x00001c R   0x10
        [Requesting program interpreter: /lib64/ld-linux-x86-64.so.2]
    LOAD           0x000000 0x0000000000000000 0x0000000000000000 0x02aff0 0x02aff0 R   0x1000
    LOAD           0x02b000 0x000000000002b000 0x000000000002b000 0x0b2889 0x0b2889 R E 0x1000
    LOAD           0x0de000 0x00000000000de000 0x00000000000de000 0x070048 0x070048 R   0x1000
    LOAD           0x14e428 0x000000000014f428 0x000000000014f428 0x00fb70 0x01a2b8 RW  0x1000
    DYNAMIC        0x157a40 0x0000000000158a40 0x0000000000158a40 0x000270 0x000270 RW  0x8
    NOTE           0x0002e0 0x00000000000002e0 0x00000000000002e0 0x00004c 0x00004c R   0x4
    TLS            0x14e428 0x000000000014f428 0x000000000014f428 0x000000 0x000008 R   0x8
    GNU_EH_FRAME   0x12557c 0x000000000012557c 0x000000000012557c 0x00819c 0x00819c R   0x4
    GNU_STACK      0x000000 0x0000000000000000 0x0000000000000000 0x000000 0x000000 RW  0x10
    GNU_RELRO      0x14e428 0x000000000014f428 0x000000000014f428 0x009bd8 0x009bd8 R   0x1

  (...)

  Displaying notes found in: .note.gnu.build-id
    Owner                Data size 	Description
    GNU                  0x00000014	NT_GNU_BUILD_ID (unique build ID bitstring)	    Build ID: 6a1f66be38b523095ebd6ec13cc15820cede903d

  Displaying notes found in: .note.FRR
    Owner                Data size 	Description
    FRRouting            0x00000010	Unknown note type: (0x46455258)	   description data: 6c eb 15 00 00 00 00 00 74 ec 15 00 00 00 00 00

Here, 0x15eb6c…0x15ec74 are the offsets (relative to the note itself) at which
the xref array is located in the file.  Also note that the owner is clearly
marked as "FRRouting" and the type, 0x46455258, spells "XREF" when its bytes
are read in little-endian order.
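
The description data shown above can be decoded with a few lines of Python.
This is a sketch that assumes a 64-bit little-endian target, as in the example
output; the variable names are hypothetical.

.. code-block:: python

   import struct

   # Description data of the FRRouting note, copied from the readelf output.
   desc = bytes.fromhex("6c eb 15 00 00 00 00 00 74 ec 15 00 00 00 00 00")

   # Two 64-bit little-endian values: start and end offset of the xref array.
   start, stop = struct.unpack("<QQ", desc)

   # The note type, stored as a little-endian 32-bit value, reads as text.
   type_bytes = struct.pack("<I", 0x46455258)

Here ``start`` is ``0x15eb6c``, ``stop`` is ``0x15ec74`` and ``type_bytes`` is
``b"XREF"``.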

For SystemTap's use of ELF notes, refer to
https://libstapsdt.readthedocs.io/en/latest/how-it-works/internals.html as an
entry point.

.. note::

   Due to GCC bug 41091, the "xref_array" section is not correctly generated
   for C++ code when compiled by GCC.  A workaround is present for runtime
   functionality, but to extract the xrefs from a C++ source file, it needs
   to be built with clang (or a future fixed version of GCC) instead.

Extraction tool
---------------

The FRR source contains a matching tool to extract xref data from compiled ELF
binaries in ``python/xrelfo.py``.  This tool uses CPython extensions
implemented in ``clippy`` and must therefore be executed with ``clippy``.

``xrelfo.py`` processes input from one or more ELF files (.o, .so, executable),
libtool objects (.lo, .la, executable wrapper script) or JSON files (output from
``xrelfo.py``) and generates an output JSON file.  During a standard FRR build,
it is invoked on all binaries and libraries and the results are combined into
``frr.json``.

ELF files from any operating system, CPU architecture and endianness can be
processed on any host.  Any issues with this are bugs in ``xrelfo.py``
(or clippy's ELF code.)

``xrelfo.py`` also performs some sanity checking, particularly on log
messages.  The following options are available:

.. option:: -o OUTPUT

   Filename to write JSON output to.  As a convention, a ``.xref`` filename
   extension is used.

.. option:: -Wlog-format

   Performs extra checks on log message format strings, particularly checks
   for ``\t`` and ``\n`` characters (which should not be used in log messages).

.. option:: -Wlog-args

   Generates cleanup hints for format string arguments where
   :c:func:`printfrr()` extensions could be used, e.g. replacing ``inet_ntoa``
   with ``%pI4``.

.. option:: --profile

   Runs the Python profiler to identify hotspots in the ``xrelfo.py`` code.
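
When driving the tool from a script, the command line can be assembled along
the following lines.  This is only a sketch:  the helper ``xrelfo_command`` is
hypothetical, and the location of the ``clippy`` binary, the argument order
and the file names are assumptions that depend on the local build tree.  Only
the options documented above are used.

.. code-block:: python

   def xrelfo_command(inputs, output, clippy="clippy",
                      script="python/xrelfo.py",
                      log_format=False, log_args=False):
       """Return an argument list for running xrelfo.py through clippy."""
       cmd = [clippy, script, "-o", output]
       if log_format:
           cmd.append("-Wlog-format")    # extra checks on format strings
       if log_args:
           cmd.append("-Wlog-args")      # hints for printfrr() extensions
       cmd.extend(inputs)                # ELF files, libtool objects or JSON
       return cmd


   # Hypothetical file names; pass the list to subprocess.run() to execute it.
   cmd = xrelfo_command(["zebra/zebra"], "zebra/zebra.xref", log_format=True)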

``xrelfo.py`` uses information about C structure definitions saved in
``python/xrefstructs.json``.  This file is included with the FRR sources and
only needs to be regenerated when some of the ``struct xref_*`` definitions
are changed (which should be almost never).  The file is written by
``python/tiabwarfo.py``, which uses ``pahole`` to extract the necessary data
from DWARF information.