diff options
Diffstat (limited to 'third_party/rust/regex-syntax/README.md')
-rw-r--r-- | third_party/rust/regex-syntax/README.md | 98 |
1 files changed, 98 insertions, 0 deletions
diff --git a/third_party/rust/regex-syntax/README.md b/third_party/rust/regex-syntax/README.md new file mode 100644 index 0000000000..592f842686 --- /dev/null +++ b/third_party/rust/regex-syntax/README.md @@ -0,0 +1,98 @@ +regex-syntax +============ +This crate provides a robust regular expression parser. + +[![Build status](https://github.com/rust-lang/regex/workflows/ci/badge.svg)](https://github.com/rust-lang/regex/actions) +[![Crates.io](https://img.shields.io/crates/v/regex-syntax.svg)](https://crates.io/crates/regex-syntax) +[![Rust](https://img.shields.io/badge/rust-1.28.0%2B-blue.svg?maxAge=3600)](https://github.com/rust-lang/regex) + + +### Documentation + +https://docs.rs/regex-syntax + + +### Overview + +There are two primary types exported by this crate: `Ast` and `Hir`. The former +is a faithful abstract syntax of a regular expression, and can convert regular +expressions back to their concrete syntax while mostly preserving its original +form. The latter type is a high level intermediate representation of a regular +expression that is amenable to analysis and compilation into byte codes or +automata. An `Hir` achieves this by drastically simplifying the syntactic +structure of the regular expression. While an `Hir` can be converted back to +its equivalent concrete syntax, the result is unlikely to resemble the original +concrete syntax that produced the `Hir`. + + +### Example + +This example shows how to parse a pattern string into its HIR: + +```rust +use regex_syntax::Parser; +use regex_syntax::hir::{self, Hir}; + +let hir = Parser::new().parse("a|b").unwrap(); +assert_eq!(hir, Hir::alternation(vec![ + Hir::literal(hir::Literal::Unicode('a')), + Hir::literal(hir::Literal::Unicode('b')), +])); +``` + + +### Safety + +This crate has no `unsafe` code and sets `forbid(unsafe_code)`. While it's +possible this crate could use `unsafe` code in the future, the standard +for doing so is extremely high. In general, most code in this crate is not +performance critical, since it tends to be dwarfed by the time it takes to +compile a regular expression into an automaton. Therefore, there is little need +for extreme optimization, and therefore, use of `unsafe`. + +The standard for using `unsafe` in this crate is extremely high because this +crate is intended to be reasonably safe to use with user supplied regular +expressions. Therefore, while there may be bugs in the regex parser itself, +they should _never_ result in memory unsafety unless there is either a bug +in the compiler or the standard library. (Since `regex-syntax` has zero +dependencies.) + + +### Crate features + +By default, this crate bundles a fairly large amount of Unicode data tables +(a source size of ~750KB). Because of their large size, one can disable some +or all of these data tables. If a regular expression attempts to use Unicode +data that is not available, then an error will occur when translating the `Ast` +to the `Hir`. + +The full set of features one can disable are +[in the "Crate features" section of the documentation](https://docs.rs/regex-syntax/*/#crate-features). + + +### Testing + +Simply running `cargo test` will give you very good coverage. However, because +of the large number of features exposed by this crate, a `test` script is +included in this directory which will test several feature combinations. This +is the same script that is run in CI. + + +### Motivation + +The primary purpose of this crate is to provide the parser used by `regex`. +Specifically, this crate is treated as an implementation detail of the `regex`, +and is primarily developed for the needs of `regex`. + +Since this crate is an implementation detail of `regex`, it may experience +breaking change releases at a different cadence from `regex`. This is only +possible because this crate is _not_ a public dependency of `regex`. + +Another consequence of this de-coupling is that there is no direct way to +compile a `regex::Regex` from a `regex_syntax::hir::Hir`. Instead, one must +first convert the `Hir` to a string (via its `std::fmt::Display`) and then +compile that via `Regex::new`. While this does repeat some work, compilation +typically takes much longer than parsing. + +Stated differently, the coupling between `regex` and `regex-syntax` exists only +at the level of the concrete syntax. |