summaryrefslogtreecommitdiffstats
path: root/src/doc/rustc-dev-guide/src/macro-expansion.md
blob: 156df8d5fc31647bd609b8f6c38607ca392e336f (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
# Macro expansion

<!-- toc -->

> `rustc_ast`, `rustc_expand`, and `rustc_builtin_macros` are all undergoing
> refactoring, so some of the links in this chapter may be broken.

Rust has a very powerful macro system. In the previous chapter, we saw how the
parser sets aside macros to be expanded (it temporarily uses [placeholders]).
This chapter is about the process of expanding those macros iteratively until
we have a complete AST for our crate with no unexpanded macros (or a compile
error).

[placeholders]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/placeholders/index.html

First, we will discuss the algorithm that expands and integrates macro output
into ASTs. Next, we will take a look at how hygiene data is collected. Finally,
we will look at the specifics of expanding different types of macros.

Many of the algorithms and data structures described below are in [`rustc_expand`],
with basic data structures in [`rustc_expand::base`][base].

Also of note, `cfg` and `cfg_attr` are treated specially from other macros, and are
handled in [`rustc_expand::config`][cfg].

[`rustc_expand`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/index.html
[base]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/base/index.html
[cfg]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/config/index.html

## Expansion and AST Integration

First of all, expansion happens at the crate level. Given a raw source code for
a crate, the compiler will produce a massive AST with all macros expanded, all
modules inlined, etc. The primary entry point for this process is the
[`MacroExpander::fully_expand_fragment`][fef] method. With few exceptions, we
use this method on the whole crate (see ["Eager Expansion"](#eager-expansion)
below for more detailed discussion of edge case expansion issues).

[`rustc_builtin_macros`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_builtin_macros/index.html
[reb]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/build/index.html

At a high level, [`fully_expand_fragment`][fef] works in iterations. We keep a
queue of unresolved macro invocations (that is, macros we haven't found the
definition of yet). We repeatedly try to pick a macro from the queue, resolve
it, expand it, and integrate it back. If we can't make progress in an
iteration, this represents a compile error.  Here is the [algorithm][original]:

[fef]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/expand/struct.MacroExpander.html#method.fully_expand_fragment
[original]: https://github.com/rust-lang/rust/pull/53778#issuecomment-419224049

0. Initialize an `queue` of unresolved macros.
1. Repeat until `queue` is empty (or we make no progress, which is an error):
    0. [Resolve](./name-resolution.md) imports in our partially built crate as
       much as possible.
    1. Collect as many macro [`Invocation`s][inv] as possible from our
       partially built crate (fn-like, attributes, derives) and add them to the
       queue.
    2. Dequeue the first element, and attempt to resolve it.
    3. If it's resolved:
        0. Run the macro's expander function that consumes a [`TokenStream`] or
           AST and produces a [`TokenStream`] or [`AstFragment`] (depending on
           the macro kind). (A `TokenStream` is a collection of [`TokenTree`s][tt],
           each of which are a token (punctuation, identifier, or literal) or a
           delimited group (anything inside `()`/`[]`/`{}`)).
            - At this point, we know everything about the macro itself and can
              call `set_expn_data` to fill in its properties in the global data;
              that is the hygiene data associated with `ExpnId`. (See [the
              "Hygiene" section below][hybelow]).
        1. Integrate that piece of AST into the big existing partially built
           AST. This is essentially where the "token-like mass" becomes a
           proper set-in-stone AST with side-tables. It happens as follows:
            - If the macro produces tokens (e.g. a proc macro), we parse into
              an AST, which may produce parse errors.
            - During expansion, we create `SyntaxContext`s (hierarchy 2). (See
              [the "Hygiene" section below][hybelow])
            - These three passes happen one after another on every AST fragment
              freshly expanded from a macro:
                - [`NodeId`]s are assigned by [`InvocationCollector`]. This
                  also collects new macro calls from this new AST piece and
                  adds them to the queue.
                - ["Def paths"][defpath] are created and [`DefId`]s are
                  assigned to them by [`DefCollector`].
                - Names are put into modules (from the resolver's point of
                  view) by [`BuildReducedGraphVisitor`].
        2. After expanding a single macro and integrating its output, continue
           to the next iteration of [`fully_expand_fragment`][fef].
    4. If it's not resolved:
        0. Put the macro back in the queue
        1. Continue to next iteration...

[defpath]: hir.md#identifiers-in-the-hir
[`NodeId`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_ast/node_id/struct.NodeId.html
[`InvocationCollector`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/expand/struct.InvocationCollector.html
[`DefId`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_hir/def_id/struct.DefId.html
[`DefCollector`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_resolve/def_collector/struct.DefCollector.html
[`BuildReducedGraphVisitor`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_resolve/build_reduced_graph/struct.BuildReducedGraphVisitor.html
[hybelow]: #hygiene-and-hierarchies
[tt]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_ast/tokenstream/enum.TokenTree.html
[`TokenStream`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_ast/tokenstream/struct.TokenStream.html
[inv]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/expand/struct.Invocation.html
[`AstFragment`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/expand/enum.AstFragment.html

### Error Recovery

If we make no progress in an iteration, then we have reached a compilation
error (e.g. an undefined macro). We attempt to recover from failures
(unresolved macros or imports) for the sake of diagnostics. This allows
compilation to continue past the first error, so that we can report more errors
at a time. Recovery can't cause compilation to succeed. We know that it will
fail at this point. The recovery happens by expanding unresolved macros into
[`ExprKind::Err`][err].

[err]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_ast/ast/enum.ExprKind.html#variant.Err

### Name Resolution

Notice that name resolution is involved here: we need to resolve imports and
macro names in the above algorithm. This is done in
[`rustc_resolve::macros`][mresolve], which resolves macro paths, validates
those resolutions, and reports various errors (e.g. "not found" or "found, but
it's unstable" or "expected x, found y"). However, we don't try to resolve
other names yet. This happens later, as we will see in the [next
chapter](./name-resolution.md).

[mresolve]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_resolve/macros/index.html

### Eager Expansion

_Eager expansion_ means that we expand the arguments of a macro invocation
before the macro invocation itself. This is implemented only for a few special
built-in macros that expect literals; expanding arguments first for some of
these macro results in a smoother user experience.  As an example, consider the
following:

```rust,ignore
macro bar($i: ident) { $i }
macro foo($i: ident) { $i }

foo!(bar!(baz));
```

A lazy expansion would expand `foo!` first. An eager expansion would expand
`bar!` first.

Eager expansion is not a generally available feature of Rust.  Implementing
eager expansion more generally would be challenging, but we implement it for a
few special built-in macros for the sake of user experience.  The built-in
macros are implemented in [`rustc_builtin_macros`], along with some other early
code generation facilities like injection of standard library imports or
generation of test harness. There are some additional helpers for building
their AST fragments in [`rustc_expand::build`][reb]. Eager expansion generally
performs a subset of the things that lazy (normal) expansion. It is done by
invoking [`fully_expand_fragment`][fef] on only part of a crate (as opposed to
whole crate, like we normally do).

### Other Data Structures

Here are some other notable data structures involved in expansion and integration:
- [`ResolverExpand`] - a trait used to break crate dependencies. This allows the
  resolver services to be used in [`rustc_ast`], despite [`rustc_resolve`] and
  pretty much everything else depending on [`rustc_ast`].
- [`ExtCtxt`]/[`ExpansionData`] - various intermediate data kept and used by expansion
  infrastructure in the process of its work
- [`Annotatable`] - a piece of AST that can be an attribute target, almost same
  thing as AstFragment except for types and patterns that can be produced by
  macros but cannot be annotated with attributes
- [`MacResult`] - a "polymorphic" AST fragment, something that can turn into a
  different `AstFragment` depending on its [`AstFragmentKind`] - item,
  or expression, or pattern etc.

[`rustc_ast`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_ast/index.html
[`rustc_resolve`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_resolve/index.html
[`ResolverExpand`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/base/trait.ResolverExpand.html
[`ExtCtxt`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/base/struct.ExtCtxt.html
[`ExpansionData`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/base/struct.ExpansionData.html
[`Annotatable`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/base/enum.Annotatable.html
[`MacResult`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/base/trait.MacResult.html
[`AstFragmentKind`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/expand/enum.AstFragmentKind.html

## Hygiene and Hierarchies

If you have ever used C/C++ preprocessor macros, you know that there are some
annoying and hard-to-debug gotchas! For example, consider the following C code:

```c
#define DEFINE_FOO struct Bar {int x;}; struct Foo {Bar bar;};

// Then, somewhere else
struct Bar {
    ...
};

DEFINE_FOO
```

Most people avoid writing C like this – and for good reason: it doesn't
compile. The `struct Bar` defined by the macro clashes names with the `struct
Bar` defined in the code. Consider also the following example:

```c
#define DO_FOO(x) {\
    int y = 0;\
    foo(x, y);\
    }

// Then elsewhere
int y = 22;
DO_FOO(y);
```

Do you see the problem? We wanted to generate a call `foo(22, 0)`, but instead
we got `foo(0, 0)` because the macro defined its own `y`!

These are both examples of _macro hygiene_ issues. _Hygiene_ relates to how to
handle names defined _within a macro_. In particular, a hygienic macro system
prevents errors due to names introduced within a macro. Rust macros are hygienic
in that they do not allow one to write the sorts of bugs above.

At a high level, hygiene within the Rust compiler is accomplished by keeping
track of the context where a name is introduced and used. We can then
disambiguate names based on that context. Future iterations of the macro system
will allow greater control to the macro author to use that context. For example,
a macro author may want to introduce a new name to the context where the macro
was called. Alternately, the macro author may be defining a variable for use
only within the macro (i.e. it should not be visible outside the macro).

[code_dir]: https://github.com/rust-lang/rust/tree/master/compiler/rustc_expand/src/mbe
[code_mp]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/mbe/macro_parser
[code_mr]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/mbe/macro_rules
[code_parse_int]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/mbe/macro_parser/struct.TtParser.html#method.parse_tt
[parsing]: ./the-parser.html

The context is attached to AST nodes. All AST nodes generated by macros have
context attached. Additionally, there may be other nodes that have context
attached, such as some desugared syntax (non-macro-expanded nodes are
considered to just have the "root" context, as described below).
Throughout the compiler, we use [`rustc_span::Span`s][span] to refer to code locations.
This struct also has hygiene information attached to it, as we will see later.

[span]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_span/struct.Span.html

Because macros invocations and definitions can be nested, the syntax context of
a node must be a hierarchy. For example, if we expand a macro and there is
another macro invocation or definition in the generated output, then the syntax
context should reflect the nesting.

However, it turns out that there are actually a few types of context we may
want to track for different purposes. Thus, there are not just one but _three_
expansion hierarchies that together comprise the hygiene information for a
crate.

All of these hierarchies need some sort of "macro ID" to identify individual
elements in the chain of expansions. This ID is [`ExpnId`].  All macros receive
an integer ID, assigned continuously starting from 0 as we discover new macro
calls.  All hierarchies start at [`ExpnId::root()`][rootid], which is its own
parent.

[`rustc_span::hygiene`][hy] contains all of the hygiene-related algorithms
(with the exception of some hacks in [`Resolver::resolve_crate_root`][hacks])
and structures related to hygiene and expansion that are kept in global data.

The actual hierarchies are stored in [`HygieneData`][hd]. This is a global
piece of data containing hygiene and expansion info that can be accessed from
any [`Ident`] without any context.


[`ExpnId`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_span/hygiene/struct.ExpnId.html
[rootid]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_span/hygiene/struct.ExpnId.html#method.root
[hd]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_span/hygiene/struct.HygieneData.html
[hy]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_span/hygiene/index.html
[hacks]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_resolve/struct.Resolver.html#method.resolve_crate_root
[`Ident`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_span/symbol/struct.Ident.html

### The Expansion Order Hierarchy

The first hierarchy tracks the order of expansions, i.e., when a macro
invocation is in the output of another macro.

Here, the children in the hierarchy will be the "innermost" tokens.  The
[`ExpnData`] struct itself contains a subset of properties from both macro
definition and macro call available through global data.
[`ExpnData::parent`][edp] tracks the child -> parent link in this hierarchy.

[`ExpnData`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_span/hygiene/struct.ExpnData.html
[edp]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_span/hygiene/struct.ExpnData.html#structfield.parent

For example,

```rust,ignore
macro_rules! foo { () => { println!(); } }

fn main() { foo!(); }
```

In this code, the AST nodes that are finally generated would have hierarchy:

```
root
    expn_id_foo
        expn_id_println
```

### The Macro Definition Hierarchy

The second hierarchy tracks the order of macro definitions, i.e., when we are
expanding one macro another macro definition is revealed in its output.  This
one is a bit tricky and more complex than the other two hierarchies.

[`SyntaxContext`][sc] represents a whole chain in this hierarchy via an ID.
[`SyntaxContextData`][scd] contains data associated with the given
`SyntaxContext`; mostly it is a cache for results of filtering that chain in
different ways.  [`SyntaxContextData::parent`][scdp] is the child -> parent
link here, and [`SyntaxContextData::outer_expns`][scdoe] are individual
elements in the chain.  The "chaining operator" is
[`SyntaxContext::apply_mark`][am] in compiler code.

A [`Span`][span], mentioned above, is actually just a compact representation of
a code location and `SyntaxContext`. Likewise, an [`Ident`] is just an interned
[`Symbol`] + `Span` (i.e. an interned string + hygiene data).

[`Symbol`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_span/symbol/struct.Symbol.html
[scd]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_span/hygiene/struct.SyntaxContextData.html
[scdp]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_span/hygiene/struct.SyntaxContextData.html#structfield.parent
[sc]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_span/hygiene/struct.SyntaxContext.html
[scdoe]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_span/hygiene/struct.SyntaxContextData.html#structfield.outer_expn
[am]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_span/hygiene/struct.SyntaxContext.html#method.apply_mark

For built-in macros, we use the context:
`SyntaxContext::empty().apply_mark(expn_id)`, and such macros are considered to
be defined at the hierarchy root. We do the same for proc-macros because we
haven't implemented cross-crate hygiene yet.

If the token had context `X` before being produced by a macro then after being
produced by the macro it has context `X -> macro_id`. Here are some examples:

Example 0:

```rust,ignore
macro m() { ident }

m!();
```

Here `ident` originally has context [`SyntaxContext::root()`][scr]. `ident` has
context `ROOT -> id(m)` after it's produced by `m`.

[scr]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_span/hygiene/struct.SyntaxContext.html#method.root


Example 1:

```rust,ignore
macro m() { macro n() { ident } }

m!();
n!();
```
In this example the `ident` has context `ROOT` originally, then `ROOT -> id(m)`
after the first expansion, then `ROOT -> id(m) -> id(n)`.

Example 2:

Note that these chains are not entirely determined by their last element, in
other words `ExpnId` is not isomorphic to `SyntaxContext`.

```rust,ignore
macro m($i: ident) { macro n() { ($i, bar) } }

m!(foo);
```

After all expansions, `foo` has context `ROOT -> id(n)` and `bar` has context
`ROOT -> id(m) -> id(n)`.

Finally, one last thing to mention is that currently, this hierarchy is subject
to the ["context transplantation hack"][hack]. Basically, the more modern (and
experimental) `macro` macros have stronger hygiene than the older MBE system,
but this can result in weird interactions between the two. The hack is intended
to make things "just work" for now.

[hack]: https://github.com/rust-lang/rust/pull/51762#issuecomment-401400732

### The Call-site Hierarchy

The third and final hierarchy tracks the location of macro invocations.

In this hierarchy [`ExpnData::call_site`][callsite] is the child -> parent link.

[callsite]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_span/hygiene/struct.ExpnData.html#structfield.call_site

Here is an example:

```rust,ignore
macro bar($i: ident) { $i }
macro foo($i: ident) { $i }

foo!(bar!(baz));
```

For the `baz` AST node in the final output, the first hierarchy is `ROOT ->
id(foo) -> id(bar) -> baz`, while the third hierarchy is `ROOT -> baz`.

### Macro Backtraces

Macro backtraces are implemented in [`rustc_span`] using the hygiene machinery
in [`rustc_span::hygiene`][hy].

[`rustc_span`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_span/index.html

## Producing Macro Output

Above, we saw how the output of a macro is integrated into the AST for a crate,
and we also saw how the hygiene data for a crate is generated. But how do we
actually produce the output of a macro? It depends on the type of macro.

There are two types of macros in Rust:
`macro_rules!` macros (a.k.a. "Macros By Example" (MBE)) and procedural macros
(or "proc macros"; including custom derives). During the parsing phase, the normal
Rust parser will set aside the contents of macros and their invocations. Later,
macros are expanded using these portions of the code.

Some important data structures/interfaces here:
- [`SyntaxExtension`] - a lowered macro representation, contains its expander
  function, which transforms a `TokenStream` or AST into another `TokenStream`
  or AST + some additional data like stability, or a list of unstable features
  allowed inside the macro.
- [`SyntaxExtensionKind`] - expander functions may have several different
  signatures (take one token stream, or two, or a piece of AST, etc). This is
  an enum that lists them.
- [`BangProcMacro`]/[`TTMacroExpander`]/[`AttrProcMacro`]/[`MultiItemModifier`] -
  traits representing the expander function signatures.

[`SyntaxExtension`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/base/struct.SyntaxExtension.html
[`SyntaxExtensionKind`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/base/enum.SyntaxExtensionKind.html
[`BangProcMacro`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/base/trait.BangProcMacro.html
[`TTMacroExpander`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/base/trait.TTMacroExpander.html
[`AttrProcMacro`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/base/trait.AttrProcMacro.html
[`MultiItemModifier`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/base/trait.MultiItemModifier.html

## Macros By Example

MBEs have their own parser distinct from the normal Rust parser. When macros
are expanded, we may invoke the MBE parser to parse and expand a macro.  The
MBE parser, in turn, may call the normal Rust parser when it needs to bind a
metavariable (e.g.  `$my_expr`) while parsing the contents of a macro
invocation. The code for macro expansion is in
[`compiler/rustc_expand/src/mbe/`][code_dir].

### Example

It's helpful to have an example to refer to. For the remainder of this chapter,
whenever we refer to the "example _definition_", we mean the following:

```rust,ignore
macro_rules! printer {
    (print $mvar:ident) => {
        println!("{}", $mvar);
    };
    (print twice $mvar:ident) => {
        println!("{}", $mvar);
        println!("{}", $mvar);
    };
}
```

`$mvar` is called a _metavariable_. Unlike normal variables, rather than
binding to a value in a computation, a metavariable binds _at compile time_ to
a tree of _tokens_.  A _token_ is a single "unit" of the grammar, such as an
identifier (e.g. `foo`) or punctuation (e.g. `=>`). There are also other
special tokens, such as `EOF`, which indicates that there are no more tokens.
Token trees resulting from paired parentheses-like characters (`(`...`)`,
`[`...`]`, and `{`...`}`) – they include the open and close and all the tokens
in between (we do require that parentheses-like characters be balanced). Having
macro expansion operate on token streams rather than the raw bytes of a source
file abstracts away a lot of complexity. The macro expander (and much of the
rest of the compiler) doesn't really care that much about the exact line and
column of some syntactic construct in the code; it cares about what constructs
are used in the code. Using tokens allows us to care about _what_ without
worrying about _where_. For more information about tokens, see the
[Parsing][parsing] chapter of this book.

Whenever we refer to the "example _invocation_", we mean the following snippet:

```rust,ignore
printer!(print foo); // Assume `foo` is a variable defined somewhere else...
```

The process of expanding the macro invocation into the syntax tree
`println!("{}", foo)` and then expanding that into a call to `Display::fmt` is
called _macro expansion_, and it is the topic of this chapter.

### The MBE parser

There are two parts to MBE expansion: parsing the definition and parsing the
invocations. Interestingly, both are done by the macro parser.

Basically, the MBE parser is like an NFA-based regex parser. It uses an
algorithm similar in spirit to the [Earley parsing
algorithm](https://en.wikipedia.org/wiki/Earley_parser). The macro parser is
defined in [`compiler/rustc_expand/src/mbe/macro_parser.rs`][code_mp].

The interface of the macro parser is as follows (this is slightly simplified):

```rust,ignore
fn parse_tt(
    &mut self,
    parser: &mut Cow<'_, Parser<'_>>,
    matcher: &[MatcherLoc]
) -> ParseResult
```

We use these items in macro parser:

- `parser` is a reference to the state of a normal Rust parser, including the
  token stream and parsing session. The token stream is what we are about to
  ask the MBE parser to parse. We will consume the raw stream of tokens and
  output a binding of metavariables to corresponding token trees. The parsing
  session can be used to report parser errors.
- `matcher` is a sequence of `MatcherLoc`s that we want to match
  the token stream against. They're converted from token trees before matching.

In the analogy of a regex parser, the token stream is the input and we are matching it
against the pattern `matcher`. Using our examples, the token stream could be the stream of
tokens containing the inside of the example invocation `print foo`, while `matcher`
might be the sequence of token (trees) `print $mvar:ident`.

The output of the parser is a [`ParseResult`], which indicates which of
three cases has occurred:

- Success: the token stream matches the given `matcher`, and we have produced a binding
  from metavariables to the corresponding token trees.
- Failure: the token stream does not match `matcher`. This results in an error message such as
  "No rule expected token _blah_".
- Error: some fatal error has occurred _in the parser_. For example, this
  happens if there are more than one pattern match, since that indicates
  the macro is ambiguous.

The full interface is defined [here][code_parse_int].

The macro parser does pretty much exactly the same as a normal regex parser with
one exception: in order to parse different types of metavariables, such as
`ident`, `block`, `expr`, etc., the macro parser must sometimes call back to the
normal Rust parser.

As mentioned above, both definitions and invocations of macros are parsed using
the macro parser. This is extremely non-intuitive and self-referential. The code
to parse macro _definitions_ is in
[`compiler/rustc_expand/src/mbe/macro_rules.rs`][code_mr]. It defines the pattern for
matching for a macro definition as `$( $lhs:tt => $rhs:tt );+`. In other words,
a `macro_rules` definition should have in its body at least one occurrence of a
token tree followed by `=>` followed by another token tree. When the compiler
comes to a `macro_rules` definition, it uses this pattern to match the two token
trees per rule in the definition of the macro _using the macro parser itself_.
In our example definition, the metavariable `$lhs` would match the patterns of
both arms: `(print $mvar:ident)` and `(print twice $mvar:ident)`.  And `$rhs`
would match the bodies of both arms: `{ println!("{}", $mvar); }` and `{
println!("{}", $mvar); println!("{}", $mvar); }`. The parser would keep this
knowledge around for when it needs to expand a macro invocation.

When the compiler comes to a macro invocation, it parses that invocation using
the same NFA-based macro parser that is described above. However, the matcher
used is the first token tree (`$lhs`) extracted from the arms of the macro
_definition_. Using our example, we would try to match the token stream `print
foo` from the invocation against the matchers `print $mvar:ident` and `print
twice $mvar:ident` that we previously extracted from the definition.  The
algorithm is exactly the same, but when the macro parser comes to a place in the
current matcher where it needs to match a _non-terminal_ (e.g. `$mvar:ident`),
it calls back to the normal Rust parser to get the contents of that
non-terminal. In this case, the Rust parser would look for an `ident` token,
which it finds (`foo`) and returns to the macro parser. Then, the macro parser
proceeds in parsing as normal. Also, note that exactly one of the matchers from
the various arms should match the invocation; if there is more than one match,
the parse is ambiguous, while if there are no matches at all, there is a syntax
error.

For more information about the macro parser's implementation, see the comments
in [`compiler/rustc_expand/src/mbe/macro_parser.rs`][code_mp].

### `macro`s and Macros 2.0

There is an old and mostly undocumented effort to improve the MBE system, give
it more hygiene-related features, better scoping and visibility rules, etc. There
hasn't been a lot of work on this recently, unfortunately. Internally, `macro`
macros use the same machinery as today's MBEs; they just have additional
syntactic sugar and are allowed to be in namespaces.

## Procedural Macros

Procedural macros are also expanded during parsing, as mentioned above.
However, they use a rather different mechanism. Rather than having a parser in
the compiler, procedural macros are implemented as custom, third-party crates.
The compiler will compile the proc macro crate and specially annotated
functions in them (i.e. the proc macro itself), passing them a stream of tokens.

The proc macro can then transform the token stream and output a new token
stream, which is synthesized into the AST.

It's worth noting that the token stream type used by proc macros is _stable_,
so `rustc` does not use it internally (since our internal data structures are
unstable). The compiler's token stream is
[`rustc_ast::tokenstream::TokenStream`][rustcts], as previously. This is
converted into the stable [`proc_macro::TokenStream`][stablets] and back in
[`rustc_expand::proc_macro`][pm] and [`rustc_expand::proc_macro_server`][pms].
Because the Rust ABI is unstable, we use the C ABI for this conversion.

[tsmod]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_ast/tokenstream/index.html
[rustcts]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_ast/tokenstream/struct.TokenStream.html
[stablets]: https://doc.rust-lang.org/proc_macro/struct.TokenStream.html
[pm]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/proc_macro/index.html
[pms]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/proc_macro_server/index.html
[`ParseResult`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/mbe/macro_parser/enum.ParseResult.html

TODO: more here. [#1160](https://github.com/rust-lang/rustc-dev-guide/issues/1160)

### Custom Derive

Custom derives are a special type of proc macro.

TODO: more? [#1160](https://github.com/rust-lang/rustc-dev-guide/issues/1160)