Diffstat (limited to 'third_party/rust/regex-automata/tests/gen/README.md')
-rw-r--r-- third_party/rust/regex-automata/tests/gen/README.md | 65
1 files changed, 65 insertions, 0 deletions
diff --git a/third_party/rust/regex-automata/tests/gen/README.md b/third_party/rust/regex-automata/tests/gen/README.md
new file mode 100644
index 0000000000..59439a11fd
--- /dev/null
+++ b/third_party/rust/regex-automata/tests/gen/README.md
@@ -0,0 +1,65 @@
+This directory contains tests for serialized objects from the regex-automata
+crate. Currently, only two kinds of object are supported: dense and sparse
+DFAs.
+
+The idea behind these tests is to commit some serialized objects, deserialize
+them, run some basic searches and check that the results are correct. We also
+make sure these tests run under Miri, since deserialization is one of the
+places where undefined behavior is most likely to occur in this crate (at the
+time of writing).
+
+The main thing we're testing is that the *current* code can still deserialize
+*old* objects correctly. Generally speaking, compatibility extends to
+semver-compatible releases of this crate. Beyond that, no promises are made,
+although in practice callers can at least depend on errors occurring. (The
+serialized format always includes a version number, and incompatible changes
+increment that version number, so an error occurs if an unsupported version
+is detected.)
+
+To generate the dense DFAs, I used this command:
+
+```
+$ regex-cli generate serialize dense regex \
+ MULTI_PATTERN_V2 \
+ tests/gen/dense/ \
+ --rustfmt \
+ --safe \
+ --starts-for-each-pattern \
+ --specialize-start-states \
+ --start-kind both \
+ --unicode-word-boundary \
+ --minimize \
+ '\b[a-zA-Z]+\b' \
+ '(?m)^\S+$' \
+ '(?Rm)^\S+$'
+```
+
+To generate the sparse DFAs, I used the same command as above, but with
+`s/dense/sparse/g` applied:
+
+```
+$ regex-cli generate serialize sparse regex \
+ MULTI_PATTERN_V2 \
+ tests/gen/sparse/ \
+ --rustfmt \
+ --safe \
+ --starts-for-each-pattern \
+ --specialize-start-states \
+ --start-kind both \
+ --unicode-word-boundary \
+ --minimize \
+ '\b[a-zA-Z]+\b' \
+ '(?m)^\S+$' \
+ '(?Rm)^\S+$'
+```
+
+The idea is to try to enable as many of the DFA's options as possible in order
+to test that serialization works for all of them.
+
+Arguably we should increase test coverage here, but this is a start. Note in
+particular that these tests do not need to check that serialization and
+deserialization round-trip correctly. The normal regex test suite already
+does a serialization round trip for every test supported by DFAs, so that
+case has very good coverage. What we're interested in testing here is our
+compatibility promise: do DFAs generated with an older revision of the code
+still deserialize correctly?
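
The README's note about version numbers (an unsupported version produces an error rather than a bad deserialization) can be illustrated with a small sketch. Everything in it is an assumption made for illustration: the layout (a 4-byte little-endian version at the start of the blob) and the names `SUPPORTED_VERSION`, `DeserializeError` and `check_version` are hypothetical, and none of it is regex-automata's actual format or API.

```rust
// Hypothetical illustration only: NOT regex-automata's real wire format.

/// The single format version this hypothetical reader understands.
const SUPPORTED_VERSION: u32 = 2;

#[derive(Debug, PartialEq, Eq)]
enum DeserializeError {
    /// The blob is shorter than the fixed-size version header.
    Truncated,
    /// The blob was written with a format version we do not support.
    UnsupportedVersion { expected: u32, found: u32 },
}

/// Reads a 4-byte little-endian version from the front of `blob` and,
/// if it matches `SUPPORTED_VERSION`, returns the payload that follows.
fn check_version(blob: &[u8]) -> Result<&[u8], DeserializeError> {
    if blob.len() < 4 {
        return Err(DeserializeError::Truncated);
    }
    let (head, payload) = blob.split_at(4);
    let found = u32::from_le_bytes([head[0], head[1], head[2], head[3]]);
    if found != SUPPORTED_VERSION {
        // Fail loudly instead of trying to interpret an unknown layout.
        return Err(DeserializeError::UnsupportedVersion {
            expected: SUPPORTED_VERSION,
            found,
        });
    }
    Ok(payload)
}
```

Calling `check_version` on a blob whose header holds any version other than `SUPPORTED_VERSION` returns `UnsupportedVersion`, which is the kind of failure the README says callers can rely on in practice. Checking the version before touching the payload keeps the reader from misinterpreting bytes laid out by an incompatible revision.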