summaryrefslogtreecommitdiffstats
path: root/third_party/rust/regex-automata/tests/gen/README.md
blob: 59439a11fd2506e5967b437122a37df9cb775d94 (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
This directory contains tests for serialized objects from the regex-automata
crate. Currently, there are only two supported such objects: dense and sparse
DFAs.

The idea behind these tests is to commit some serialized objects and run some
basic tests by deserializing them and running searches and ensuring they are
correct. We also make sure these are run under Miri, since deserialization is
one of the biggest places where undefined behavior might occur in this crate
(at the time of writing).

The main thing we're testing is that the *current* code can still deserialize
*old* objects correctly. Generally speaking, compatibility extends to semver
compatible releases of this crate. Beyond that, no promises are made, although
in practice callers can at least depend on errors occurring. (The serialized
format always includes a version number, and incompatible changes increment
that version number such that an error will occur if an unsupported version is
detected.)

To generate the dense DFAs, I used this command:

```
$ regex-cli generate serialize dense regex \
    MULTI_PATTERN_V2 \
    tests/gen/dense/ \
    --rustfmt \
    --safe \
    --starts-for-each-pattern \
    --specialize-start-states \
    --start-kind both \
    --unicode-word-boundary \
    --minimize \
    '\b[a-zA-Z]+\b' \
    '(?m)^\S+$' \
    '(?Rm)^\S+$'
```

And to generate the sparse DFAs, I used this command, which is the same as
above, but with `s/dense/sparse/g`.

```
$ regex-cli generate serialize sparse regex \
    MULTI_PATTERN_V2 \
    tests/gen/sparse/ \
    --rustfmt \
    --safe \
    --starts-for-each-pattern \
    --specialize-start-states \
    --start-kind both \
    --unicode-word-boundary \
    --minimize \
    '\b[a-zA-Z]+\b' \
    '(?m)^\S+$' \
    '(?Rm)^\S+$'
```

The idea is to try to enable as many of the DFA's options as possible in order
to test that serialization works for all of them.

Arguably we should increase test coverage here, but this is a start. Note
that in particular, this does not need to test that serialization and
deserialization correctly roundtrips on its own. Indeed, the normal regex test
suite has a test that does a serialization round trip for every test supported
by DFAs. So that has very good coverage. What we're interested in testing here
is our compatibility promise: do DFAs generated with an older revision of the
code still deserialize correctly?