.. _writing_matchers:

Writing Matchers
================

This page explains what a matcher is, then walks through developing a simple matcher iteratively.

Types of Matchers
-----------------

There are three types of matchers: Node, Narrowing, and Traversal. There isn't always a clear separation or distinction between them, so treat this explanation as illustrative rather than definitive. Here is the documentation on matchers: `https://clang.llvm.org/docs/LibASTMatchersReference.html <https://clang.llvm.org/docs/LibASTMatchersReference.html>`_

It is not obvious on that page, so note that **clicking on the name of a matcher expands help about that matcher.** Example:

.. image:: documentation-expanded.png

Node Matchers
~~~~~~~~~~~~~

Node matchers can be thought of as 'Nouns'. They specify the **type** of node you want to match, that is, a particular *thing*: a function, a binary operation, a variable, a type.

The `full list of node matchers is in the documentation <https://clang.llvm.org/docs/LibASTMatchersReference.html#node-matchers>`_. Some common ones are ``functionDecl()``, ``binaryOperator()``, and ``stmt()``.
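
As a small illustration, here is roughly what those three node matchers select when used on their own. The sample function ``twice`` is made up for this page, and the descriptions assume a file that contains nothing else:

::

    Given:
      int twice(int n) { return n + n; }

    m functionDecl()
      matches the declaration of 'twice'.

    m binaryOperator()
      matches the expression 'n + n'.

    m stmt()
      matches every statement and expression in the file, including the
      body '{ return n + n; }', the 'return' statement, and 'n + n'.

Note how broad ``stmt()`` is compared with the other two; a bare node matcher is usually only the starting point.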

Narrowing Matchers
~~~~~~~~~~~~~~~~~~

Narrowing matchers can be thought of as 'Adjectives'. They narrow, or describe, a node, and therefore must be applied to a Node Matcher. For instance, the node matcher may be ``functionDecl`` and the narrowing matcher applied to it may be ``parameterCountIs``.

The `table in the documentation <https://clang.llvm.org/docs/LibASTMatchersReference.html#narrowing-matchers>`_ lists all the narrowing matchers, which nodes they apply to, and how to use them. Here is how to read the table:

.. image:: narrowing-matcher.png

And some examples:

::

    m functionDecl(parameterCountIs(1))
    m functionDecl(anyOf(isDefinition(), isVariadic()))
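
As you can see, the Narrowing Matcher goes inside the parens of the Node Matcher. In the first example, the matcher is ``parameterCountIs``; in the second it is ``anyOf``. Each example here passes a single Narrowing Matcher, but a Node Matcher also accepts several comma-separated matchers, which are combined as if they were wrapped in ``allOf``.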

In the second example, the single ``anyOf`` matcher matches any of several other Narrowing Matchers: ``isDefinition`` or ``isVariadic``. The other two common combining narrowing matchers are ``allOf()`` and ``unless()``.

If you *need* to specify a narrowing matcher (because it's a required argument to some other matcher), you can use the ``anything()`` narrowing matcher as a no-op.
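
A short sketch of the combining matchers and ``anything()``. The three sample functions are hypothetical and exist only for this illustration:

::

    Given:
      int one(int a);
      int two(int a, int b);
      int three(int a, int b, int c) { return a + b + c; }

    m functionDecl(allOf(isDefinition(), parameterCountIs(3)))
      matches 'three', the only declaration that has a body.

    m functionDecl(unless(parameterCountIs(1)))
      matches 'two' and 'three'.

    m functionDecl(anything())
      matches all three declarations, the same as functionDecl().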

Traversal Matchers
~~~~~~~~~~~~~~~~~~

Traversal Matchers can *also* be thought of as adjectives, at least most of them. They also describe a specific node, but unlike a narrowing matcher, the scope of the description is broader than the individual node. A narrowing matcher says something about the node in isolation (e.g. the number of parameters it has), while a traversal matcher says something about the node's contents or its place in the program.

Again, `the documentation <https://clang.llvm.org/docs/LibASTMatchersReference.html#traversal-matchers>`_ is the best place to explore and understand these, but here is a simple example for the traversal matcher ``hasArraySize()``:

::

    Given:
      class MyClass { };
      MyClass *p1 = new MyClass[10];


    cxxNewExpr()
      matches the expression 'new MyClass[10]'.

    cxxNewExpr(hasArraySize(integerLiteral(equals(9))))
      does not match anything

    cxxNewExpr(hasArraySize(integerLiteral(equals(10))))
      matches the expression 'new MyClass[10]'.
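
To make the contrast with narrowing matchers concrete, the sketch below puts one narrowing matcher (``hasName``) next to two traversal matchers (``hasDescendant`` and ``hasAncestor``). The sample code is invented for this illustration:

::

    Given:
      int limit = 10;
      int clamp(int v) { if (v > limit) return limit; return v; }
      int same(int v) { return v; }

    m functionDecl(hasName("clamp"))
      narrowing: looks only at the function's own name; matches 'clamp'.

    m functionDecl(hasDescendant(ifStmt()))
      traversal: looks inside the function; matches 'clamp' because its
      body contains an 'if', and does not match 'same'.

    m ifStmt(hasAncestor(functionDecl(hasName("clamp"))))
      traversal in the other direction: matches the 'if' statement
      because it sits inside 'clamp'.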

Example of Iterative Matcher Development
----------------------------------------

Developing matchers is much easier if you do the following:

1. Write out the code you want to match. Write it out in as many different ways as you can. Examples: for some value in the code, use a variable, a constant, and a function that returns a value. Put the code you want to match inside of a function, inside of a conditional, inside of a function call, and inside of an inline function definition.
2. Write out the code you *don't* want to match, but that looks like code you do. Write out benign function calls, benign assignments, etc.
3. Iterate on your matcher and treat it as *code* you're writing. Indent it, copy it somewhere in case your browser crashes, even stick it in a tiny temporary version-controlled file.
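
As a rough sketch of points 1 and 2, suppose the goal were "calls whose first argument is an integer literal". This goal, and every name in the test file below, is hypothetical and chosen only for illustration. The test file spells out both the wanted and the unwanted forms before any matcher is written:

::

    Given:
      int add1(int a, int b) { return a + b; }
      int five() { return 5; }
      const int kFive = 5;

      int main() {
        int v = 5;
        add1(5, v);             // should match: literal
        add1(v, v);             // should not match: variable
        add1(kFive, v);         // should not match: named constant
        add1(five(), v);        // should not match: function result
        if (add1(5, v) > 0) {}  // should match: inside a conditional
        add1(add1(5, v), v);    // inner call should match, outer should not
        return 0;
      }

    m callExpr(hasArgument(0, integerLiteral()))
      matches the three calls written as 'add1(5, v)' and none of the others.

Having the "should not match" lines in the file from the start makes it obvious when a refinement is too loose or too strict.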

As an example of the above, below is a sample iterative development process for a more complicated matcher.

**Goal**: Match function calls where one of the arguments is an assignment expression whose right-hand side is an integer literal, and the corresponding function parameter has a default value in the function definition.

::

    int add1(int a, int b) { return a + b; }
    int add2(int c, int d = 8) { return c + d; }

    int main() {
        int x, y, z;

        add1(x, y);     // <- No match, no assignment
        add1(3 + 4, y); // <- No match, no assignment
        add1(z = x, y); // <- No match, assignment, but not an integer literal
        add1(z = 2, y); // <- No match, assignment, integer literal, but function parameter lacks default value
        add2(3, z = 2); // <- Match
    }

Here is the iterative development process:

::

    //-------------------------------------
    // Step 1: Find all the function calls
    m callExpr()
    // Matches all calls, as expected.

    //-------------------------------------
    // Step 2: Start refining based on the arguments to the call
    m callExpr(forEachArgumentWithParam())
    // Error: forEachArgumentWithParam expects two parameters

    //-------------------------------------
    // Step 3: Figure out the syntax for matching all the calls with this new matcher
    m callExpr(
        forEachArgumentWithParam(
            anything(),
            anything()
        )
    )
    // Matches all calls, as expected

    //-------------------------------------
    // Step 4: Find the calls with a binary operator of any kind
    m callExpr(
        forEachArgumentWithParam(
            binaryOperator(),
            anything()
        )
    )
    // Does not match the first call, but matches the others

    //-------------------------------------
    // Step 5: Limit the binary operator to assignments
    m callExpr(
        forEachArgumentWithParam(
            binaryOperator(isAssignmentOperator()),
            anything()
        )
    )
    // Now matches the final three calls

    //-------------------------------------
    // Step 6: Start refining the match on the right-hand side of the assignment
    m callExpr(
        forEachArgumentWithParam(
            binaryOperator(
                allOf(
                    isAssignmentOperator(),
                    hasRHS()
                )),
            anything()
        )
    )
    // Error, hasRHS expects a parameter

    //-------------------------------------
    // Step 7: Give hasRHS a placeholder argument
    m callExpr(
        forEachArgumentWithParam(
            binaryOperator(
                allOf(
                    isAssignmentOperator(),
                    hasRHS(anything())
                )),
            anything()
        )
    )
    // Okay, back to matching the final three calls

    //-------------------------------------
    // Step 8: Refine to just integer literals
    m callExpr(
        forEachArgumentWithParam(
            binaryOperator(
                allOf(
                    isAssignmentOperator(),
                    hasRHS(integerLiteral())
                )),
            anything()
        )
    )
    // Now we match the final two calls

    //-------------------------------------
    // Step 9: Apply a restriction to the parameter definition
    m callExpr(
        forEachArgumentWithParam(
            binaryOperator(
                allOf(
                    isAssignmentOperator(),
                    hasRHS(integerLiteral())
                )),
            hasDefaultArgument()
        )
    )
    // Now we match the final call
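
One possible follow-up, offered as a sketch rather than as part of the process above, is to name the interesting sub-nodes with ``.bind()`` so that clang-query reports them alongside the call itself. The identifiers ``"assignment"`` and ``"param"`` are arbitrary labels chosen for this example, and the parameter matcher is wrapped in ``parmVarDecl()`` so there is a node matcher to bind:

::

    m callExpr(
        forEachArgumentWithParam(
            binaryOperator(
                allOf(
                    isAssignmentOperator(),
                    hasRHS(integerLiteral())
                )).bind("assignment"),
            parmVarDecl(hasDefaultArgument()).bind("param")
        )
    )
    // Still matches only the final call; for the sample code, "assignment"
    // should refer to 'z = 2' and "param" to 'int d = 8'.

Binding is useful when a matcher starts matching something unexpected, because the report shows which argument and which parameter satisfied each part of the expression.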