.. _writing_matchers:

Writing Matchers
================

This page explains what a matcher is, then walks through an example of developing a matcher iteratively.

Types of Matchers
-----------------

There are three types of matchers: Node, Narrowing, and Traversal.  There isn't always a clear separation or distinction between them, so treat this explanation as illustrative rather than definitive.  Here is the documentation on matchers: `https://clang.llvm.org/docs/LibASTMatchersReference.html <https://clang.llvm.org/docs/LibASTMatchersReference.html>`_

It is not obvious on that page, so we want to note that **clicking on the name of a matcher expands help about that matcher.** Example:

.. image:: documentation-expanded.png

Node Matchers
~~~~~~~~~~~~~

Node matchers can be thought of as 'Nouns'. They specify a **type** of node you want to match, that is, a particular *thing*. A function, a binary operation, a variable, a type.

The documentation has a `full list of node matchers <https://clang.llvm.org/docs/LibASTMatchersReference.html#node-matchers>`_. Some common ones are ``functionDecl()``, ``binaryOperator()``, and ``stmt()``.
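
To make the 'noun' idea concrete, here is a small C++ sketch annotated with the node matchers we would expect to match each construct. The function and variable names are made up for illustration; only the matcher names come from the reference above.

```cpp
// Hypothetical input file for experimenting with node matchers.
// Each comment names a node matcher expected to match that construct.

int scale(int value, int factor) {  // functionDecl(): the declaration of scale
  int result = value * factor;      // varDecl(): result; binaryOperator(): value * factor
  return result;                    // returnStmt(), which stmt() also matches
}

int twice(int value) {              // functionDecl(): the declaration of twice
  return scale(value, 2);           // callExpr(): the call; integerLiteral(): 2
}
```

Loading a file like this and running ``m functionDecl()`` or ``m binaryOperator()`` is a quick way to see which node each matcher selects.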

Narrowing Matchers
~~~~~~~~~~~~~~~~~~

Narrowing matchers can be thought of as 'Adjectives'. They narrow, or describe, a node, and therefore must be applied to a Node Matcher.  For instance a node matcher may be a ``functionDecl``, and the narrowing matcher applied to it may be ``parameterCountIs``.

The `table in the documentation <https://clang.llvm.org/docs/LibASTMatchersReference.html#narrowing-matchers>`_ lists all the narrowing matchers, which they apply to and how to use them.  Here is how to read the table:

.. image:: narrowing-matcher.png

And some examples:

::

  m functionDecl(parameterCountIs(1))
  m functionDecl(anyOf(isDefinition(), isVariadic()))


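As you can see, Narrowing Matchers go inside the parentheses of the Node Matcher. In the first example, the narrowing matcher is ``parameterCountIs``; in the second it is ``anyOf``. Both examples pass a single Narrowing Matcher, but a Node Matcher accepts any number of arguments and implicitly treats them as ``allOf``.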

In the second, we use the singular ``anyOf`` matcher to match any of multiple other Narrowing Matchers: ``isDefinition`` or ``isVariadic``. The other two common combining narrowing matchers are ``allOf()`` and ``unless()``.

If you *need* to specify a narrowing matcher (because it's a required argument to some other matcher), you can use the ``anything()`` narrowing matcher to have a no-op narrowing matcher.
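
As a sketch of how the two narrowing examples above behave, the following hypothetical declarations are annotated with the result we would expect from each matcher. The function names are invented for illustration, and the annotations are expectations to confirm against your own clang version rather than recorded output.

```cpp
// Hypothetical declarations for trying the two narrowing examples.
// A = functionDecl(parameterCountIs(1))
// B = functionDecl(anyOf(isDefinition(), isVariadic()))

int identity(int value) { return value; }
// A: match (one parameter).  B: match (it is a definition).

int sum(int left, int right) { return left + right; }
// A: no match (two parameters).  B: match (it is a definition).

int log_message(const char *format, ...);
// A: match (only the named parameter is counted).
// B: match (variadic, even though it is not a definition).

int undefined_pair(int left, int right);
// A: no match.  B: no match (neither a definition nor variadic).
```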

Traversal Matchers
~~~~~~~~~~~~~~~~~~

Traversal Matchers *also* can be thought of as adjectives - at least most of them.  They also describe a specific node, but the difference from a narrowing matcher is that the scope of the description is broader than the individual node.  A narrowing matcher says something about the node in isolation (e.g. the number of arguments it has) while a traversal matcher says something about the node's contents or place in the program.

Again, `the documentation <https://clang.llvm.org/docs/LibASTMatchersReference.html#traversal-matchers>`_ is the best place to explore and understand these, but here is a simple example for the traversal matcher ``hasArraySize()``:

::

  Given:
    class MyClass { };
    MyClass *p1 = new MyClass[10];


  cxxNewExpr()
    matches the expression 'new MyClass[10]'.

  cxxNewExpr(hasArraySize(integerLiteral(equals(9))))
    does not match anything

  cxxNewExpr(hasArraySize(integerLiteral(equals(10))))
    matches the expression 'new MyClass[10]'.



Example of Iterative Matcher Development
----------------------------------------

When developing matchers, it will be much easier if you do the following:

1. Write out the code you want to match. Write it out in as many different ways as you can. Examples: For some value in the code use a variable, a constant and a function that returns a value. Put the code you want to match inside of a function, inside of a conditional, inside of a function call, and inside of an inline function definition.
2. Write out the code you *don't* want to match, but looks like code you do. Write out benign function calls, benign assignments, etc.
3. Iterate on your matcher and treat it as *code* you're writing. Indent it, copy it somewhere in case your browser crashes, even stick it in a tiny temporary version-controlled file.
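
The first two steps can be sketched as a small test file. This one assumes a hypothetical goal of matching every call to a function named ``store``; all names are invented for illustration. It writes the target call several ways and then adds look-alike code that should stay unmatched.

```cpp
// Hypothetical goal: match every call to store().
static int stored = 0;
int store(int value) { stored = value; return value; }
int restore() { return stored; }  // similar name, should NOT match
int compute() { return 21; }

// --- Code we want to match, written several ways ---
int positive_cases(int input) {
  store(7);                 // constant argument
  store(input);             // variable argument
  store(compute());         // argument is a function's return value
  if (input > 0) {
    store(input + 1);       // inside a conditional
  }
  return store(store(2));   // inside another function call
}

inline int inline_case() { return store(3); }  // inside an inline function

// --- Look-alike code we do not want to match ---
int negative_cases() {
  int store_count = restore();  // benign call to a different function
  stored = store_count;         // benign assignment, no call at all
  return stored;
}
```

Keeping the positive and negative cases in one file makes it easy to see, after each change to the matcher, whether a match was gained or lost.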

As an example of the above, below is a sample iterative development process of a more complicated matcher.

 **Goal**: Match function calls where one of the arguments is an assignment expression whose right-hand side is an integer literal, and the corresponding function parameter has a default value in the function definition.

::

  int add1(int a, int b) { return a + b; }
  int add2(int c, int d = 8) { return c + d; }

  int main() {
   int x, y, z;

   add1(x, y);     // <- No match, no assignment
   add1(3 + 4, y); // <- No match, no assignment
   add1(z = x, y); // <- No match, assignment, but not an integer literal
   add1(z = 2, y); // <- No match, assignment, integer literal, but function parameter lacks default value
   add2(3, z = 2); // <- Match
  }


Here is the iterative development process:

::

  //-------------------------------------
  // Step 1: Find all the function calls
  m callExpr()
  // Matches all calls, as expected.

  //-------------------------------------
  // Step 2: Start refining based on the arguments to the call
  m callExpr(forEachArgumentWithParam())
  // Error: forEachArgumentWithParam expects two parameters

  //-------------------------------------
  // Step 3: Figure out the syntax for matching all the calls with this new matcher
  m callExpr(
    forEachArgumentWithParam(
      anything(),
      anything()
    )
  )
  // Matches all calls, as expected

  //-------------------------------------
  // Step 4: Find the calls with a binary operator of any kind
  m callExpr(
    forEachArgumentWithParam(
       binaryOperator(),
       anything()
     )
  )
  // Does not match the first call, but matches the others

  //-------------------------------------
  // Step 5: Limit the binary operator to assignments
  m callExpr(
    forEachArgumentWithParam(
       binaryOperator(isAssignmentOperator()),
       anything()
     )
  )
  // Now matches the final three calls

  //-------------------------------------
  // Step 6: Start refining the right-hand side of the assignment
  m callExpr(
    forEachArgumentWithParam(
      binaryOperator(
        allOf(
          isAssignmentOperator(),
          hasRHS()
        )
      ),
      anything()
    )
  )
  // Error, hasRHS expects a parameter

  //-------------------------------------
  // Step 7: Give hasRHS a placeholder argument so the matcher parses again
  m callExpr(
    forEachArgumentWithParam(
      binaryOperator(
        allOf(
          isAssignmentOperator(),
          hasRHS(anything())
        )
      ),
      anything()
    )
  )
  // Okay, back to matching the final three calls

  //-------------------------------------
  // Step 8: Refine to just integer literals
  m callExpr(
    forEachArgumentWithParam(
      binaryOperator(
        allOf(
          isAssignmentOperator(),
          hasRHS(integerLiteral())
        )
      ),
      anything()
    )
  )
  // Now we match the final two calls

  //-------------------------------------
  // Step 9: Apply a restriction to the parameter definition
  m callExpr(
    forEachArgumentWithParam(
      binaryOperator(
        allOf(
          isAssignmentOperator(),
          hasRHS(integerLiteral())
        )
      ),
      hasDefaultArgument()
    )
  )
  // Now we match the final call
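
Once the matcher behaves on the original test code, it is worth probing it with a few more cases. The sketch below reuses ``add2`` from the example and adds some hypothetical calls. The comments describe what we would expect the Step 9 matcher to do, based on how ``forEachArgumentWithParam`` pairs each argument with its own parameter; treat them as expectations to verify, not as recorded output.

```cpp
int add2(int c, int d = 8) { return c + d; }

int edge_cases() {
  int z = 0;
  int total = 0;

  // Expected: no match. The default value is used, so no argument
  // written at the call site is an assignment.
  total += add2(3);

  // Expected: no match. The assignment is paired with parameter c,
  // which has no default value.
  total += add2(z = 2, 5);

  // Expected: match. isAssignmentOperator() also accepts compound
  // assignments such as +=, which may or may not be what you want.
  total += add2(3, z += 2);

  return total;
}
```

If the compound assignment should not match, one option would be to narrow the operator further, for example with ``hasOperatorName("=")`` in place of ``isAssignmentOperator()``.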