## jsparagus/js_parser: Generating a parser for JavaScript

In this directory:

*   **esgrammar.pgen** - A grammar for the mini-language the ECMAScript
    standard uses to describe ES grammar.

*   **es.esgrammar** - The actual grammar for ECMAScript, in emu-grammar
    format, extracted automatically from the spec.

*   **extract_es_grammar.py** - The script that creates *es.esgrammar*.

*   **es-simplified.esgrammar** - A hacked version of *es.esgrammar* that
    jsparagus can actually handle.

*   **generate_js_parser_tables.py** - A script to generate a JS parser
    based on *es-simplified.esgrammar*.  Read on for instructions.


## How to run it

To generate a parser, follow these steps:

```console
$ cd ..
$ make init
$ make all
```

**Note:** The last step currently takes about 35 seconds to run on my
laptop.  jsparagus is slow.

Once you're done, to see your parser run, try this:

```console
$ cd crates/driver
$ cargo run --release
```

The build also produces a copy of the JS parser in Python.
After `make all`, you can use `make jsdemo` to run that.


### How simplified is "es-simplified"?

Here are the differences between *es.esgrammar*, the actual ES grammar,
and *es-simplified.esgrammar*, the simplified version that jsparagus can
actually handle:

*   The four productions with [~Yield] and [~Await] conditions are dropped.
    This means that `yield` and `await` do not match *IdentifierReference*
    or *LabelIdentifier*. I think it's better to do that in the lexer.

*   Lookahead assertions longer than one token are truncated, because
    jsparagus rejects them:

    `ValueError: unsupported: lookahead > 1 token, [['{'], ['function'], ['async', ('no-LineTerminator-here',), 'function'], ['class'], ['let', '[']]`

*   The rule that uses `but not` is deleted, since `but not` is not
    implemented.

        Identifier :
          IdentifierName but not ReservedWord

    Making sense of this rule in the context of an LR parser is an
    interesting task; see issue #28.

*   Ban loops of the form `for (async of EXPR) STMT` by adjusting a
    lookahead assertion. The grammar is not LR(1).
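
As a rough illustration of what "do that in the lexer" could mean for the
dropped `[~Yield]`/`[~Await]` productions and the deleted `but not` rule,
here is a minimal Python sketch. It is not jsparagus code: the function
`classify_name`, its `yield_reserved`/`await_reserved` flags, and the token
kind strings are all hypothetical, and the reserved-word set is only
illustrative and may not match any particular edition of the spec.

```python
# Hypothetical sketch, not part of jsparagus: classify an IdentifierName in
# the lexer instead of expressing `but not ReservedWord` in the grammar.

# Illustrative set of words treated as always reserved. `yield` and `await`
# are deliberately left out because they are handled contextually below.
ALWAYS_RESERVED = frozenset({
    "break", "case", "catch", "class", "const", "continue", "debugger",
    "default", "delete", "do", "else", "enum", "export", "extends", "false",
    "finally", "for", "function", "if", "import", "in", "instanceof", "new",
    "null", "return", "super", "switch", "this", "throw", "true", "try",
    "typeof", "var", "void", "while", "with",
})


def classify_name(text, *, yield_reserved=False, await_reserved=False):
    """Return a token kind for `text`, which has already been scanned as an
    IdentifierName.

    The two flags stand in for the grammar's [Yield] and [Await] parameters;
    this sketch assumes the caller tells the lexer which context it is in.
    """
    if text in ALWAYS_RESERVED:
        return "ReservedWord"
    if text == "yield" and yield_reserved:
        return "ReservedWord"
    if text == "await" and await_reserved:
        return "ReservedWord"
    return "Identifier"
```

With something like this, the parser only ever sees `Identifier` tokens
that are legal in the current context, so the grammar needs neither
`but not` nor the `[~Yield]`/`[~Await]` productions. The cost is that the
lexer has to know the parsing context.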