summaryrefslogtreecommitdiffstats
path: root/third_party/rust/aho-corasick/src/packed/mod.rs
diff options
context:
space:
mode:
Diffstat (limited to 'third_party/rust/aho-corasick/src/packed/mod.rs')
-rw-r--r--third_party/rust/aho-corasick/src/packed/mod.rs120
1 files changed, 120 insertions, 0 deletions
diff --git a/third_party/rust/aho-corasick/src/packed/mod.rs b/third_party/rust/aho-corasick/src/packed/mod.rs
new file mode 100644
index 0000000000..3990bc9330
--- /dev/null
+++ b/third_party/rust/aho-corasick/src/packed/mod.rs
@@ -0,0 +1,120 @@
+/*!
+Provides packed multiple substring search, principally for a small number of
+patterns.
+
+This sub-module provides vectorized routines for quickly finding
+matches of a small number of patterns. In general, users of this crate
+shouldn't need to interface with this module directly, as the primary
+[`AhoCorasick`](crate::AhoCorasick) searcher will use these routines
+automatically as a prefilter when applicable. However, in some cases, callers
+may want to bypass the Aho-Corasick machinery entirely and use this vectorized
+searcher directly.
+
+# Overview
+
+The primary types in this sub-module are:
+
+* [`Searcher`] executes the actual search algorithm to report matches in a
+haystack.
+* [`Builder`] accumulates patterns incrementally and can construct a
+`Searcher`.
+* [`Config`] permits tuning the searcher, and itself will produce a `Builder`
+(which can then be used to build a `Searcher`). Currently, the only tuneable
+knob are the match semantics, but this may be expanded in the future.
+
+# Examples
+
+This example shows how to create a searcher from an iterator of patterns.
+By default, leftmost-first match semantics are used. (See the top-level
+[`MatchKind`] type for more details about match semantics, which apply
+similarly to packed substring search.)
+
+```
+use aho_corasick::{packed::{MatchKind, Searcher}, PatternID};
+
+# fn example() -> Option<()> {
+let searcher = Searcher::new(["foobar", "foo"].iter().cloned())?;
+let matches: Vec<PatternID> = searcher
+ .find_iter("foobar")
+ .map(|mat| mat.pattern())
+ .collect();
+assert_eq!(vec![PatternID::ZERO], matches);
+# Some(()) }
+# if cfg!(all(feature = "std", any(
+# target_arch = "x86_64", target_arch = "aarch64",
+# ))) {
+# example().unwrap()
+# } else {
+# assert!(example().is_none());
+# }
+```
+
+This example shows how to use [`Config`] to change the match semantics to
+leftmost-longest:
+
+```
+use aho_corasick::{packed::{Config, MatchKind}, PatternID};
+
+# fn example() -> Option<()> {
+let searcher = Config::new()
+ .match_kind(MatchKind::LeftmostLongest)
+ .builder()
+ .add("foo")
+ .add("foobar")
+ .build()?;
+let matches: Vec<PatternID> = searcher
+ .find_iter("foobar")
+ .map(|mat| mat.pattern())
+ .collect();
+assert_eq!(vec![PatternID::must(1)], matches);
+# Some(()) }
+# if cfg!(all(feature = "std", any(
+# target_arch = "x86_64", target_arch = "aarch64",
+# ))) {
+# example().unwrap()
+# } else {
+# assert!(example().is_none());
+# }
+```
+
+# Packed substring searching
+
+Packed substring searching refers to the use of SIMD (Single Instruction,
+Multiple Data) to accelerate the detection of matches in a haystack. Unlike
+conventional algorithms, such as Aho-Corasick, SIMD algorithms for substring
+search tend to do better with a small number of patterns, where as Aho-Corasick
+generally maintains reasonably consistent performance regardless of the number
+of patterns you give it. Because of this, the vectorized searcher in this
+sub-module cannot be used as a general purpose searcher, since building the
+searcher may fail even when given a small number of patterns. However, in
+exchange, when searching for a small number of patterns, searching can be quite
+a bit faster than Aho-Corasick (sometimes by an order of magnitude).
+
+The key take away here is that constructing a searcher from a list of patterns
+is a fallible operation with no clear rules for when it will fail. While the
+precise conditions under which building a searcher can fail is specifically an
+implementation detail, here are some common reasons:
+
+* Too many patterns were given. Typically, the limit is on the order of 100 or
+ so, but this limit may fluctuate based on available CPU features.
+* The available packed algorithms require CPU features that aren't available.
+ For example, currently, this crate only provides packed algorithms for
+ `x86_64` and `aarch64`. Therefore, constructing a packed searcher on any
+ other target will always fail.
+* Zero patterns were given, or one of the patterns given was empty. Packed
+ searchers require at least one pattern and that all patterns are non-empty.
+* Something else about the nature of the patterns (typically based on
+ heuristics) suggests that a packed searcher would perform very poorly, so
+ no searcher is built.
+*/
+
+pub use crate::packed::api::{Builder, Config, FindIter, MatchKind, Searcher};
+
+mod api;
+mod ext;
+mod pattern;
+mod rabinkarp;
+mod teddy;
+#[cfg(all(feature = "std", test))]
+mod tests;
+mod vector;