Diffstat (limited to 'third_party/rust/regex-syntax/README.md')
-rw-r--r-- third_party/rust/regex-syntax/README.md | 98
1 files changed, 98 insertions, 0 deletions
diff --git a/third_party/rust/regex-syntax/README.md b/third_party/rust/regex-syntax/README.md
new file mode 100644
index 0000000000..592f842686
--- /dev/null
+++ b/third_party/rust/regex-syntax/README.md
@@ -0,0 +1,98 @@
+regex-syntax
+============
+This crate provides a robust regular expression parser.
+
+[![Build status](https://github.com/rust-lang/regex/workflows/ci/badge.svg)](https://github.com/rust-lang/regex/actions)
+[![Crates.io](https://img.shields.io/crates/v/regex-syntax.svg)](https://crates.io/crates/regex-syntax)
+[![Rust](https://img.shields.io/badge/rust-1.28.0%2B-blue.svg?maxAge=3600)](https://github.com/rust-lang/regex)
+
+
+### Documentation
+
+https://docs.rs/regex-syntax
+
+
+### Overview
+
+There are two primary types exported by this crate: `Ast` and `Hir`. The former
+is a faithful abstract syntax tree of a regular expression, and can be printed
+back to its concrete syntax while mostly preserving the original
+form. The latter type is a high-level intermediate representation of a regular
+expression that is amenable to analysis and compilation into byte codes or
+automata. An `Hir` achieves this by drastically simplifying the syntactic
+structure of the regular expression. While an `Hir` can be converted back to
+its equivalent concrete syntax, the result is unlikely to resemble the original
+concrete syntax that produced the `Hir`.
+
+
+### Example
+
+This example shows how to parse a pattern string into its HIR:
+
+```rust
+use regex_syntax::Parser;
+use regex_syntax::hir::{self, Hir};
+
+let hir = Parser::new().parse("a|b").unwrap();
+assert_eq!(hir, Hir::alternation(vec![
+    Hir::literal(hir::Literal::Unicode('a')),
+    Hir::literal(hir::Literal::Unicode('b')),
+]));
+```
+
+
+### Safety
+
+This crate has no `unsafe` code and sets `forbid(unsafe_code)`. While it's
+possible this crate could use `unsafe` code in the future, the standard
+for doing so is extremely high. In general, most code in this crate is not
+performance critical, since it tends to be dwarfed by the time it takes to
+compile a regular expression into an automaton. There is therefore little need
+for extreme optimization, and consequently little need for `unsafe`.
+
+The standard for using `unsafe` in this crate is extremely high because this
+crate is intended to be reasonably safe to use with user supplied regular
+expressions. Therefore, while there may be bugs in the regex parser itself,
+they should _never_ result in memory unsafety unless there is a bug in either
+the compiler or the standard library, since `regex-syntax` has zero
+dependencies.
+
+
+### Crate features
+
+By default, this crate bundles a fairly large amount of Unicode data tables
+(a source size of ~750KB). Because of their large size, one can disable some
+or all of these data tables. If a regular expression attempts to use Unicode
+data that is not available, then an error will occur when translating the `Ast`
+to the `Hir`.
+
+The full set of features one can disable is listed
+[in the "Crate features" section of the documentation](https://docs.rs/regex-syntax/*/#crate-features).
+
+
+### Testing
+
+Simply running `cargo test` will give you very good coverage. However, because
+of the large number of features exposed by this crate, a `test` script is
+included in this directory which will test several feature combinations. This
+is the same script that is run in CI.
+
+
+### Motivation
+
+The primary purpose of this crate is to provide the parser used by `regex`.
+Specifically, this crate is treated as an implementation detail of `regex`,
+and is primarily developed for the needs of `regex`.
+
+Since this crate is an implementation detail of `regex`, it may experience
+breaking change releases at a different cadence from `regex`. This is only
+possible because this crate is _not_ a public dependency of `regex`.
+
+Another consequence of this de-coupling is that there is no direct way to
+compile a `regex::Regex` from a `regex_syntax::hir::Hir`. Instead, one must
+first convert the `Hir` to a string (via its `std::fmt::Display` implementation) and then
+compile that via `Regex::new`. While this does repeat some work, compilation
+typically takes much longer than parsing.
+
+Stated differently, the coupling between `regex` and `regex-syntax` exists only
+at the level of the concrete syntax.