Merging upstream version 1.67.1+dfsg1.

Signed-off-by: Daniel Baumann <daniel.baumann@progress-linux.org>
author: Daniel Baumann <daniel.baumann@progress-linux.org> 2024-04-17 12:18:25 +0000
committer: Daniel Baumann <daniel.baumann@progress-linux.org> 2024-04-17 12:18:25 +0000
commit: 5363f350887b1e5b5dd21a86f88c8af9d7fea6da (patch)
tree: 35ca005eb6e0e9a1ba3bb5dbc033209ad445dc17 /vendor/regex-automata/src/nfa/range_trie.rs
parent: Adding debian version 1.66.0+dfsg1-1. (diff)
download: rustc-5363f350887b1e5b5dd21a86f88c8af9d7fea6da.tar.xz
rustc-5363f350887b1e5b5dd21a86f88c8af9d7fea6da.zip
1 files changed, 0 insertions, 1048 deletions
diff --git a/vendor/regex-automata/src/nfa/range_trie.rs b/vendor/regex-automata/src/nfa/range_trie.rs
deleted file mode 100644
index 50767c7c6..000000000
--- a/vendor/regex-automata/src/nfa/range_trie.rs
+++ /dev/null
@@ -1,1048 +0,0 @@
-// I've called the primary data structure in this module a "range trie." As far
-// as I can tell, there is no prior art on a data structure like this, however,
-// it's likely someone somewhere has built something like it. Searching for
-// "range trie" turns up the paper "Range Tries for Scalable Address Lookup,"
-// but it does not appear relevant.
-//
-// The range trie is just like a trie in that it is a special case of a
-// deterministic finite state machine. It has states and each state has a set
-// of transitions to other states. It is acyclic, and, like a normal trie,
-// it makes no attempt to reuse common suffixes among its elements. The key
-// difference between a normal trie and a range trie below is that a range trie
-// operates on *contiguous sequences* of bytes instead of singleton bytes.
-// One could say say that our alphabet is ranges of bytes instead of bytes
-// themselves, except a key part of range trie construction is splitting ranges
-// apart to ensure there is at most one transition that can be taken for any
-// byte in a given state.
-//
-// I've tried to explain the details of how the range trie works below, so
-// for now, we are left with trying to understand what problem we're trying to
-// solve. Which is itself fairly involved!
-//
-// At the highest level, here's what we want to do. We want to convert a
-// sequence of Unicode codepoints into a finite state machine whose transitions
-// are over *bytes* and *not* Unicode codepoints. We want this because it makes
-// said finite state machines much smaller and much faster to execute. As a
-// simple example, consider a byte oriented automaton for all Unicode scalar
-// values (0x00 through 0x10FFFF, not including surrogate codepoints):
-//
-//     [00-7F]
-//     [C2-DF][80-BF]
-//     [E0-E0][A0-BF][80-BF]
-//     [E1-EC][80-BF][80-BF]
-//     [ED-ED][80-9F][80-BF]
-//     [EE-EF][80-BF][80-BF]
-//     [F0-F0][90-BF][80-BF][80-BF]
-//     [F1-F3][80-BF][80-BF][80-BF]
-//     [F4-F4][80-8F][80-BF][80-BF]
-//
-// (These byte ranges are generated via the regex-syntax::utf8 module, which
-// was based on Russ Cox's code in RE2, which was in turn based on Ken
-// Thompson's implementation of the same idea in his Plan9 implementation of
-// grep.)
-//
-// It should be fairly straight-forward to see how one could compile this into
-// a DFA. The sequences are sorted and non-overlapping. Essentially, you could
-// build a trie from this fairly easy. The problem comes when your initial
-// range (in this case, 0x00-0x10FFFF) isn't so nice. For example, the class
-// represented by '\w' contains only a tenth of the codepoints that
-// 0x00-0x10FFFF contains, but if we were to write out the byte based ranges
-// as we did above, the list would stretch to 892 entries! This turns into
-// quite a large NFA with a few thousand states. Turning this beast into a DFA
-// takes quite a bit of time. We are thus left with trying to trim down the
-// number of states we produce as early as possible.
-//
-// One approach (used by RE2 and still by the regex crate, at time of writing)
-// is to try to find common suffixes while building NFA states for the above
-// and reuse them. This is very cheap to do and one can control precisely how
-// much extra memory you want to use for the cache.
-//
-// Another approach, however, is to reuse an algorithm for constructing a
-// *minimal* DFA from a sorted sequence of inputs. I don't want to go into
-// the full details here, but I explain it in more depth in my blog post on
-// FSTs[1]. Note that the algorithm not invented by me, but was published
-// in paper by Daciuk et al. in 2000 called "Incremental Construction of
-// MinimalAcyclic Finite-State Automata." Like the suffix cache approach above,
-// it is also possible to control the amount of extra memory one uses, although
-// this usually comes with the cost of sacrificing true minimality. (But it's
-// typically close enough with a reasonably sized cache of states.)
-//
-// The catch is that Daciuk's algorithm only works if you add your keys in
-// lexicographic ascending order. In our case, since we're dealing with ranges,
-// we also need the additional requirement that ranges are either equivalent
-// or do not overlap at all. For example, if one were given the following byte
-// ranges:
-//
-//     [BC-BF][80-BF]
-//     [BC-BF][90-BF]
-//
-// Then Daciuk's algorithm also would not work, since there is nothing to
-// handle the fact that the ranges overlap. They would need to be split apart.
-// Thankfully, Thompson's algorithm for producing byte ranges for Unicode
-// codepoint ranges meets both of our requirements.
-//
-// ... however, we would also like to be able to compile UTF-8 automata in
-// reverse. We want this because in order to find the starting location of a
-// match using a DFA, we need to run a second DFA---a reversed version of the
-// forward DFA---backwards to discover the match location. Unfortunately, if
-// we reverse our byte sequences for 0x00-0x10FFFF, we get sequences that are
-// can overlap, even if they are sorted:
-//
-//     [00-7F]
-//     [80-BF][80-9F][ED-ED]
-//     [80-BF][80-BF][80-8F][F4-F4]
-//     [80-BF][80-BF][80-BF][F1-F3]
-//     [80-BF][80-BF][90-BF][F0-F0]
-//     [80-BF][80-BF][E1-EC]
-//     [80-BF][80-BF][EE-EF]
-//     [80-BF][A0-BF][E0-E0]
-//     [80-BF][C2-DF]
-//
-// For example, '[80-BF][80-BF][EE-EF]' and '[80-BF][A0-BF][E0-E0]' have
-// overlapping ranges between '[80-BF]' and '[A0-BF]'. Thus, there is no
-// simple way to apply Daciuk's algorithm.
-//
-// And thus, the range trie was born. The range trie's only purpose is to take
-// sequences of byte ranges like the ones above, collect them into a trie and
-// then spit them in a sorted fashion with no overlapping ranges. For example,
-// 0x00-0x10FFFF gets translated to:
-//
-//     [0-7F]
-//     [80-BF][80-9F][80-8F][F1-F3]
-//     [80-BF][80-9F][80-8F][F4]
-//     [80-BF][80-9F][90-BF][F0]
-//     [80-BF][80-9F][90-BF][F1-F3]
-//     [80-BF][80-9F][E1-EC]
-//     [80-BF][80-9F][ED]
-//     [80-BF][80-9F][EE-EF]
-//     [80-BF][A0-BF][80-8F][F1-F3]
-//     [80-BF][A0-BF][80-8F][F4]
-//     [80-BF][A0-BF][90-BF][F0]
-//     [80-BF][A0-BF][90-BF][F1-F3]
-//     [80-BF][A0-BF][E0]
-//     [80-BF][A0-BF][E1-EC]
-//     [80-BF][A0-BF][EE-EF]
-//     [80-BF][C2-DF]
-//
-// We've thus satisfied our requirements for running Daciuk's algorithm. All
-// sequences of ranges are sorted, and any corresponding ranges are either
-// exactly equivalent or non-overlapping.
-//
-// In effect, a range trie is building a DFA from a sequence of arbitrary
-// byte ranges. But it uses an algoritm custom tailored to its input, so it
-// is not as costly as traditional DFA construction. While it is still quite
-// a bit more costly than the forward's case (which only needs Daciuk's
-// algorithm), it winds up saving a substantial amount of time if one is doing
-// a full DFA powerset construction later by virtue of producing a much much
-// smaller NFA.
-//
-// [1] - https://blog.burntsushi.net/transducers/
-// [2] - https://www.mitpressjournals.org/doi/pdfplus/10.1162/089120100561601
-
-use std::cell::RefCell;
-use std::fmt;
-use std::mem;
-use std::ops::RangeInclusive;
-use std::u32;
-
-use regex_syntax::utf8::Utf8Range;
-
-/// A smaller state ID means more effective use of the CPU cache and less
-/// time spent copying. The implementation below will panic if the state ID
-/// space is exhausted, but in order for that to happen, the range trie itself
-/// would use well over 100GB of memory. Moreover, it's likely impossible
-/// for the state ID space to get that big. In fact, it's likely that even a
-/// u16 would be good enough here. But it's not quite clear how to prove this.
-type StateID = u32;
-
-/// There is only one final state in this trie. Every sequence of byte ranges
-/// added shares the same final state.
-const FINAL: StateID = 0;
-
-/// The root state of the trie.
-const ROOT: StateID = 1;
-
-/// A range trie represents an ordered set of sequences of bytes.
-///
-/// A range trie accepts as input a sequence of byte ranges and merges
-/// them into the existing set such that the trie can produce a sorted
-/// non-overlapping sequence of byte ranges. The sequence emitted corresponds
-/// precisely to the sequence of bytes matched by the given keys, although the
-/// byte ranges themselves may be split at different boundaries.
-///
-/// The order complexity of this data structure seems difficult to analyze.
-/// If the size of a byte is held as a constant, then insertion is clearly
-/// O(n) where n is the number of byte ranges in the input key. However, if
-/// k=256 is our alphabet size, then insertion could be O(k^2 * n). In
-/// particular it seems possible for pathological inputs to cause insertion
-/// to do a lot of work. However, for what we use this data structure for,
-/// there should be no pathological inputs since the ultimate source is always
-/// a sorted set of Unicode scalar value ranges.
-///
-/// Internally, this trie is setup like a finite state machine. Note though
-/// that it is acyclic.
-#[derive(Clone)]
-pub struct RangeTrie {
-    /// The states in this trie. The first is always the shared final state.
-    /// The second is always the root state. Otherwise, there is no
-    /// particular order.
-    states: Vec<State>,
-    /// A free-list of states. When a range trie is cleared, all of its states
-    /// are added to list. Creating a new state reuses states from this list
-    /// before allocating a new one.
-    free: Vec<State>,
-    /// A stack for traversing this trie to yield sequences of byte ranges in
-    /// lexicographic order.
-    iter_stack: RefCell<Vec<NextIter>>,
-    /// A bufer that stores the current sequence during iteration.
-    iter_ranges: RefCell<Vec<Utf8Range>>,
-    /// A stack used for traversing the trie in order to (deeply) duplicate
-    /// a state.
-    dupe_stack: Vec<NextDupe>,
-    /// A stack used for traversing the trie during insertion of a new
-    /// sequence of byte ranges.
-    insert_stack: Vec<NextInsert>,
-}
-
-/// A single state in this trie.
-#[derive(Clone)]
-struct State {
-    /// A sorted sequence of non-overlapping transitions to other states. Each
-    /// transition corresponds to a single range of bytes.
-    transitions: Vec<Transition>,
-}
-
-/// A transition is a single range of bytes. If a particular byte is in this
-/// range, then the corresponding machine may transition to the state pointed
-/// to by `next_id`.
-#[derive(Clone)]
-struct Transition {
-    /// The byte range.
-    range: Utf8Range,
-    /// The next state to transition to.
-    next_id: StateID,
-}
-
-impl RangeTrie {
-    /// Create a new empty range trie.
-    pub fn new() -> RangeTrie {
-        let mut trie = RangeTrie {
-            states: vec![],
-            free: vec![],
-            iter_stack: RefCell::new(vec![]),
-            iter_ranges: RefCell::new(vec![]),
-            dupe_stack: vec![],
-            insert_stack: vec![],
-        };
-        trie.clear();
-        trie
-    }
-
-    /// Clear this range trie such that it is empty. Clearing a range trie
-    /// and reusing it can beneficial because this may reuse allocations.
-    pub fn clear(&mut self) {
-        self.free.extend(self.states.drain(..));
-        self.add_empty(); // final
-        self.add_empty(); // root
-    }
-
-    /// Iterate over all of the sequences of byte ranges in this trie, and
-    /// call the provided function for each sequence. Iteration occurs in
-    /// lexicographic order.
-    pub fn iter<F: FnMut(&[Utf8Range])>(&self, mut f: F) {
-        let mut stack = self.iter_stack.borrow_mut();
-        stack.clear();
-        let mut ranges = self.iter_ranges.borrow_mut();
-        ranges.clear();
-
-        // We do iteration in a way that permits us to use a single buffer
-        // for our keys. We iterate in a depth first fashion, while being
-        // careful to expand our frontier as we move deeper in the trie.
-        stack.push(NextIter { state_id: ROOT, tidx: 0 });
-        while let Some(NextIter { mut state_id, mut tidx }) = stack.pop() {
-            // This could be implemented more simply without an inner loop
-            // here, but at the cost of more stack pushes.
-            loop {
-                let state = self.state(state_id);
-                // If we're visited all transitions in this state, then pop
-                // back to the parent state.
-                if tidx >= state.transitions.len() {
-                    ranges.pop();
-                    break;
-                }
-
-                let t = &state.transitions[tidx];
-                ranges.push(t.range);
-                if t.next_id == FINAL {
-                    f(&ranges);
-                    ranges.pop();
-                    tidx += 1;
-                } else {
-                    // Expand our frontier. Once we come back to this state
-                    // via the stack, start in on the next transition.
-                    stack.push(NextIter { state_id, tidx: tidx + 1 });
-                    // Otherwise, move to the first transition of the next
-                    // state.
-                    state_id = t.next_id;
-                    tidx = 0;
-                }
-            }
-        }
-    }
-
-    /// Inserts a new sequence of ranges into this trie.
-    ///
-    /// The sequence given must be non-empty and must not have a length
-    /// exceeding 4.
-    pub fn insert(&mut self, ranges: &[Utf8Range]) {
-        assert!(!ranges.is_empty());
-        assert!(ranges.len() <= 4);
-
-        let mut stack = mem::replace(&mut self.insert_stack, vec![]);
-        stack.clear();
-
-        stack.push(NextInsert::new(ROOT, ranges));
-        while let Some(next) = stack.pop() {
-            let (state_id, ranges) = (next.state_id(), next.ranges());
-            assert!(!ranges.is_empty());
-
-            let (mut new, rest) = (ranges[0], &ranges[1..]);
-
-            // i corresponds to the position of the existing transition on
-            // which we are operating. Typically, the result is to remove the
-            // transition and replace it with two or more new transitions
-            // corresponding to the partitions generated by splitting the
-            // 'new' with the ith transition's range.
-            let mut i = self.state(state_id).find(new);
-
-            // In this case, there is no overlap *and* the new range is greater
-            // than all existing ranges. So we can just add it to the end.
-            if i == self.state(state_id).transitions.len() {
-                let next_id = NextInsert::push(self, &mut stack, rest);
-                self.add_transition(state_id, new, next_id);
-                continue;
-            }
-
-            // The need for this loop is a bit subtle, buf basically, after
-            // we've handled the partitions from our initial split, it's
-            // possible that there will be a partition leftover that overlaps
-            // with a subsequent transition. If so, then we have to repeat
-            // the split process again with the leftovers and that subsequent
-            // transition.
-            'OUTER: loop {
-                let old = self.state(state_id).transitions[i].clone();
-                let split = match Split::new(old.range, new) {
-                    Some(split) => split,
-                    None => {
-                        let next_id = NextInsert::push(self, &mut stack, rest);
-                        self.add_transition_at(i, state_id, new, next_id);
-                        continue;
-                    }
-                };
-                let splits = split.as_slice();
-                // If we only have one partition, then the ranges must be
-                // equivalent. There's nothing to do here for this state, so
-                // just move on to the next one.
-                if splits.len() == 1 {
-                    // ... but only if we have anything left to do.
-                    if !rest.is_empty() {
-                        stack.push(NextInsert::new(old.next_id, rest));
-                    }
-                    break;
-                }
-                // At this point, we know that 'split' is non-empty and there
-                // must be some overlap AND that the two ranges are not
-                // equivalent. Therefore, the existing range MUST be removed
-                // and split up somehow. Instead of actually doing the removal
-                // and then a subsequent insertion---with all the memory
-                // shuffling that entails---we simply overwrite the transition
-                // at position `i` for the first new transition we want to
-                // insert. After that, we're forced to do expensive inserts.
-                let mut first = true;
-                let mut add_trans =
-                    |trie: &mut RangeTrie, pos, from, range, to| {
-                        if first {
-                            trie.set_transition_at(pos, from, range, to);
-                            first = false;
-                        } else {
-                            trie.add_transition_at(pos, from, range, to);
-                        }
-                    };
-                for (j, &srange) in splits.iter().enumerate() {
-                    match srange {
-                        SplitRange::Old(r) => {
-                            // Deep clone the state pointed to by the ith
-                            // transition. This is always necessary since 'old'
-                            // is always coupled with at least a 'both'
-                            // partition. We don't want any new changes made
-                            // via the 'both' partition to impact the part of
-                            // the transition that doesn't overlap with the
-                            // new range.
-                            let dup_id = self.duplicate(old.next_id);
-                            add_trans(self, i, state_id, r, dup_id);
-                        }
-                        SplitRange::New(r) => {
-                            // This is a bit subtle, but if this happens to be
-                            // the last partition in our split, it is possible
-                            // that this overlaps with a subsequent transition.
-                            // If it does, then we must repeat the whole
-                            // splitting process over again with `r` and the
-                            // subsequent transition.
-                            {
-                                let trans = &self.state(state_id).transitions;
-                                if j + 1 == splits.len()
-                                    && i < trans.len()
-                                    && intersects(r, trans[i].range)
-                                {
-                                    new = r;
-                                    continue 'OUTER;
-                                }
-                            }
-
-                            // ... otherwise, setup exploration for a new
-                            // empty state and add a brand new transition for
-                            // this new range.
-                            let next_id =
-                                NextInsert::push(self, &mut stack, rest);
-                            add_trans(self, i, state_id, r, next_id);
-                        }
-                        SplitRange::Both(r) => {
-                            // Continue adding the remaining ranges on this
-                            // path and update the transition with the new
-                            // range.
-                            if !rest.is_empty() {
-                                stack.push(NextInsert::new(old.next_id, rest));
-                            }
-                            add_trans(self, i, state_id, r, old.next_id);
-                        }
-                    }
-                    i += 1;
-                }
-                // If we've reached this point, then we know that there are
-                // no subsequent transitions with any overlap. Therefore, we
-                // can stop processing this range and move on to the next one.
-                break;
-            }
-        }
-        self.insert_stack = stack;
-    }
-
-    pub fn add_empty(&mut self) -> StateID {
-        if self.states.len() as u64 > u32::MAX as u64 {
-            // This generally should not happen since a range trie is only
-            // ever used to compile a single sequence of Unicode scalar values.
-            // If we ever got to this point, we would, at *minimum*, be using
-            // 96GB in just the range trie alone.
-            panic!("too many sequences added to range trie");
-        }
-        let id = self.states.len() as StateID;
-        // If we have some free states available, then use them to avoid
-        // more allocations.
-        if let Some(mut state) = self.free.pop() {
-            state.clear();
-            self.states.push(state);
-        } else {
-            self.states.push(State { transitions: vec![] });
-        }
-        id
-    }
-
-    /// Performs a deep clone of the given state and returns the duplicate's
-    /// state ID.
-    ///
-    /// A "deep clone" in this context means that the state given along with
-    /// recursively all states that it points to are copied. Once complete,
-    /// the given state ID and the returned state ID share nothing.
-    ///
-    /// This is useful during range trie insertion when a new range overlaps
-    /// with an existing range that is bigger than the new one. The part of
-    /// the existing range that does *not* overlap with the new one is that
-    /// duplicated so that adding the new range to the overlap doesn't disturb
-    /// the non-overlapping portion.
-    ///
-    /// There's one exception: if old_id is the final state, then it is not
-    /// duplicated and the same final state is returned. This is because all
-    /// final states in this trie are equivalent.
-    fn duplicate(&mut self, old_id: StateID) -> StateID {
-        if old_id == FINAL {
-            return FINAL;
-        }
-
-        let mut stack = mem::replace(&mut self.dupe_stack, vec![]);
-        stack.clear();
-
-        let new_id = self.add_empty();
-        // old_id is the state we're cloning and new_id is the ID of the
-        // duplicated state for old_id.
-        stack.push(NextDupe { old_id, new_id });
-        while let Some(NextDupe { old_id, new_id }) = stack.pop() {
-            for i in 0..self.state(old_id).transitions.len() {
-                let t = self.state(old_id).transitions[i].clone();
-                if t.next_id == FINAL {
-                    // All final states are the same, so there's no need to
-                    // duplicate it.
-                    self.add_transition(new_id, t.range, FINAL);
-                    continue;
-                }
-
-                let new_child_id = self.add_empty();
-                self.add_transition(new_id, t.range, new_child_id);
-                stack.push(NextDupe {
-                    old_id: t.next_id,
-                    new_id: new_child_id,
-                });
-            }
-        }
-        self.dupe_stack = stack;
-        new_id
-    }
-
-    /// Adds the given transition to the given state.
-    ///
-    /// Callers must ensure that all previous transitions in this state
-    /// are lexicographically smaller than the given range.
-    fn add_transition(
-        &mut self,
-        from_id: StateID,
-        range: Utf8Range,
-        next_id: StateID,
-    ) {
-        self.state_mut(from_id)
-            .transitions
-            .push(Transition { range, next_id });
-    }
-
-    /// Like `add_transition`, except this inserts the transition just before
-    /// the ith transition.
-    fn add_transition_at(
-        &mut self,
-        i: usize,
-        from_id: StateID,
-        range: Utf8Range,
-        next_id: StateID,
-    ) {
-        self.state_mut(from_id)
-            .transitions
-            .insert(i, Transition { range, next_id });
-    }
-
-    /// Overwrites the transition at position i with the given transition.
-    fn set_transition_at(
-        &mut self,
-        i: usize,
-        from_id: StateID,
-        range: Utf8Range,
-        next_id: StateID,
-    ) {
-        self.state_mut(from_id).transitions[i] = Transition { range, next_id };
-    }
-
-    /// Return an immutable borrow for the state with the given ID.
-    fn state(&self, id: StateID) -> &State {
-        &self.states[id as usize]
-    }
-
-    /// Return a mutable borrow for the state with the given ID.
-    fn state_mut(&mut self, id: StateID) -> &mut State {
-        &mut self.states[id as usize]
-    }
-}
-
-impl State {
-    /// Find the position at which the given range should be inserted in this
-    /// state.
-    ///
-    /// The position returned is always in the inclusive range
-    /// [0, transitions.len()]. If 'transitions.len()' is returned, then the
-    /// given range overlaps with no other range in this state *and* is greater
-    /// than all of them.
-    ///
-    /// For all other possible positions, the given range either overlaps
-    /// with the transition at that position or is otherwise less than it
-    /// with no overlap (and is greater than the previous transition). In the
-    /// former case, careful attention must be paid to inserting this range
-    /// as a new transition. In the latter case, the range can be inserted as
-    /// a new transition at the given position without disrupting any other
-    /// transitions.
-    fn find(&self, range: Utf8Range) -> usize {
-        /// Returns the position `i` at which `pred(xs[i])` first returns true
-        /// such that for all `j >= i`, `pred(xs[j]) == true`. If `pred` never
-        /// returns true, then `xs.len()` is returned.
-        ///
-        /// We roll our own binary search because it doesn't seem like the
-        /// standard library's binary search can be used here. Namely, if
-        /// there is an overlapping range, then we want to find the first such
-        /// occurrence, but there may be many. Or at least, it's not quite
-        /// clear to me how to do it.
-        fn binary_search<T, F>(xs: &[T], mut pred: F) -> usize
-        where
-            F: FnMut(&T) -> bool,
-        {
-            let (mut left, mut right) = (0, xs.len());
-            while left < right {
-                // Overflow is impossible because xs.len() <= 256.
-                let mid = (left + right) / 2;
-                if pred(&xs[mid]) {
-                    right = mid;
-                } else {
-                    left = mid + 1;
-                }
-            }
-            left
-        }
-
-        // Benchmarks suggest that binary search is just a bit faster than
-        // straight linear search. Specifically when using the debug tool:
-        //
-        //   hyperfine "regex-automata-debug debug -acqr '\w{40} ecurB'"
-        binary_search(&self.transitions, |t| range.start <= t.range.end)
-    }
-
-    /// Clear this state such that it has zero transitions.
-    fn clear(&mut self) {
-        self.transitions.clear();
-    }
-}
-
-/// The next state to process during duplication.
-#[derive(Clone, Debug)]
-struct NextDupe {
-    /// The state we want to duplicate.
-    old_id: StateID,
-    /// The ID of the new state that is a duplicate of old_id.
-    new_id: StateID,
-}
-
-/// The next state (and its corresponding transition) that we want to visit
-/// during iteration in lexicographic order.
-#[derive(Clone, Debug)]
-struct NextIter {
-    state_id: StateID,
-    tidx: usize,
-}
-
-/// The next state to process during insertion and any remaining ranges that we
-/// want to add for a partcular sequence of ranges. The first such instance
-/// is always the root state along with all ranges given.
-#[derive(Clone, Debug)]
-struct NextInsert {
-    /// The next state to begin inserting ranges. This state should be the
-    /// state at which `ranges[0]` should be inserted.
-    state_id: StateID,
-    /// The ranges to insert. We used a fixed-size array here to avoid an
-    /// allocation.
-    ranges: [Utf8Range; 4],
-    /// The number of valid ranges in the above array.
-    len: u8,
-}
-
-impl NextInsert {
-    /// Create the next item to visit. The given state ID should correspond
-    /// to the state at which the first range in the given slice should be
-    /// inserted. The slice given must not be empty and it must be no longer
-    /// than 4.
-    fn new(state_id: StateID, ranges: &[Utf8Range]) -> NextInsert {
-        let len = ranges.len();
-        assert!(len > 0);
-        assert!(len <= 4);
-
-        let mut tmp = [Utf8Range { start: 0, end: 0 }; 4];
-        tmp[..len].copy_from_slice(ranges);
-        NextInsert { state_id, ranges: tmp, len: len as u8 }
-    }
-
-    /// Push a new empty state to visit along with any remaining ranges that
-    /// still need to be inserted. The ID of the new empty state is returned.
-    ///
-    /// If ranges is empty, then no new state is created and FINAL is returned.
-    fn push(
-        trie: &mut RangeTrie,
-        stack: &mut Vec<NextInsert>,
-        ranges: &[Utf8Range],
-    ) -> StateID {
-        if ranges.is_empty() {
-            FINAL
-        } else {
-            let next_id = trie.add_empty();
-            stack.push(NextInsert::new(next_id, ranges));
-            next_id
-        }
-    }
-
-    /// Return the ID of the state to visit.
-    fn state_id(&self) -> StateID {
-        self.state_id
-    }
-
-    /// Return the remaining ranges to insert.
-    fn ranges(&self) -> &[Utf8Range] {
-        &self.ranges[..self.len as usize]
-    }
-}
-
-/// Split represents a partitioning of two ranges into one or more ranges. This
-/// is the secret sauce that makes a range trie work, as it's what tells us
-/// how to deal with two overlapping but unequal ranges during insertion.
-///
-/// Essentially, either two ranges overlap or they don't. If they don't, then
-/// handling insertion is easy: just insert the new range into its
-/// lexicographically correct position. Since it does not overlap with anything
-/// else, no other transitions are impacted by the new range.
-///
-/// If they do overlap though, there are generally three possible cases to
-/// handle:
-///
-/// 1. The part where the two ranges actually overlap. i.e., The intersection.
-/// 2. The part of the existing range that is not in the the new range.
-/// 3. The part of the new range that is not in the old range.
-///
-/// (1) is guaranteed to always occur since all overlapping ranges have a
-/// non-empty intersection. If the two ranges are not equivalent, then at
-/// least one of (2) or (3) is guaranteed to occur as well. In some cases,
-/// e.g., `[0-4]` and `[4-9]`, all three cases will occur.
-///
-/// This `Split` type is responsible for providing (1), (2) and (3) for any
-/// possible pair of byte ranges.
-///
-/// As for insertion, for the overlap in (1), the remaining ranges to insert
-/// should be added by following the corresponding transition. However, this
-/// should only be done for the overlapping parts of the range. If there was
-/// a part of the existing range that was not in the new range, then that
-/// existing part must be split off from the transition and duplicated. The
-/// remaining parts of the overlap can then be added to using the new ranges
-/// without disturbing the existing range.
-///
-/// Handling the case for the part of a new range that is not in an existing
-/// range is seemingly easy. Just treat it as if it were a non-overlapping
-/// range. The problem here is that if this new non-overlapping range occurs
-/// after both (1) and (2), then it's possible that it can overlap with the
-/// next transition in the current state. If it does, then the whole process
-/// must be repeated!
-///
-/// # Details of the 3 cases
-///
-/// The following details the various cases that are implemented in code
-/// below. It's plausible that the number of cases is not actually minimal,
-/// but it's important for this code to remain at least somewhat readable.
-///
-/// Given [a,b] and [x,y], where a <= b, x <= y, b < 256 and y < 256, we define
-/// the follow distinct relationships where at least one must apply. The order
-/// of these matters, since multiple can match. The first to match applies.
-///
-///   1. b < x <=> [a,b] < [x,y]
-///   2. y < a <=> [x,y] < [a,b]
-///
-/// In the case of (1) and (2), these are the only cases where there is no
-/// overlap. Or otherwise, the intersection of [a,b] and [x,y] is empty. In
-/// order to compute the intersection, one can do [max(a,x), min(b,y)]. The
-/// intersection in all of the following cases is non-empty.
-///
-///    3. a = x && b = y <=> [a,b] == [x,y]
-///    4. a = x && b < y <=> [x,y] right-extends [a,b]
-///    5. b = y && a > x <=> [x,y] left-extends [a,b]
-///    6. x = a && y < b <=> [a,b] right-extends [x,y]
-///    7. y = b && x > a <=> [a,b] left-extends [x,y]
-///    8. a > x && b < y <=> [x,y] covers [a,b]
-///    9. x > a && y < b <=> [a,b] covers [x,y]
-///   10. b = x && a < y <=> [a,b] is left-adjacent to [x,y]
-///   11. y = a && x < b <=> [x,y] is left-adjacent to [a,b]
-///   12. b > x && b < y <=> [a,b] left-overlaps [x,y]
-///   13. y > a && y < b <=> [x,y] left-overlaps [a,b]
-///
-/// In cases 3-13, we can form rules that partition the ranges into a
-/// non-overlapping ordered sequence of ranges:
-///
-///    3. [a,b]
-///    4. [a,b], [b+1,y]
-///    5. [x,a-1], [a,b]
-///    6. [x,y], [y+1,b]
-///    7. [a,x-1], [x,y]
-///    8. [x,a-1], [a,b], [b+1,y]
-///    9. [a,x-1], [x,y], [y+1,b]
-///   10. [a,b-1], [b,b], [b+1,y]
-///   11. [x,y-1], [y,y], [y+1,b]
-///   12. [a,x-1], [x,b], [b+1,y]
-///   13. [x,a-1], [a,y], [y+1,b]
-///
-/// In the code below, we go a step further and identify each of the above
-/// outputs as belonging either to the overlap of the two ranges or to one
-/// of [a,b] or [x,y] exclusively.
-#[derive(Clone, Debug, Eq, PartialEq)]
-struct Split {
-    partitions: [SplitRange; 3],
-    len: usize,
-}
-
-/// A tagged range indicating how it was derived from a pair of ranges.
-#[derive(Clone, Copy, Debug, Eq, PartialEq)]
-enum SplitRange {
-    Old(Utf8Range),
-    New(Utf8Range),
-    Both(Utf8Range),
-}
-
-impl Split {
-    /// Create a partitioning of the given ranges.
-    ///
-    /// If the given ranges have an empty intersection, then None is returned.
-    fn new(o: Utf8Range, n: Utf8Range) -> Option<Split> {
-        let range = |r: RangeInclusive<u8>| Utf8Range {
-            start: *r.start(),
-            end: *r.end(),
-        };
-        let old = |r| SplitRange::Old(range(r));
-        let new = |r| SplitRange::New(range(r));
-        let both = |r| SplitRange::Both(range(r));
-
-        // Use same names as the comment above to make it easier to compare.
-        let (a, b, x, y) = (o.start, o.end, n.start, n.end);
-
-        if b < x || y < a {
-            // case 1, case 2
-            None
-        } else if a == x && b == y {
-            // case 3
-            Some(Split::parts1(both(a..=b)))
-        } else if a == x && b < y {
-            // case 4
-            Some(Split::parts2(both(a..=b), new(b + 1..=y)))
-        } else if b == y && a > x {
-            // case 5
-            Some(Split::parts2(new(x..=a - 1), both(a..=b)))
-        } else if x == a && y < b {
-            // case 6
-            Some(Split::parts2(both(x..=y), old(y + 1..=b)))
-        } else if y == b && x > a {
-            // case 7
-            Some(Split::parts2(old(a..=x - 1), both(x..=y)))
-        } else if a > x && b < y {
-            // case 8
-            Some(Split::parts3(new(x..=a - 1), both(a..=b), new(b + 1..=y)))
-        } else if x > a && y < b {
-            // case 9
-            Some(Split::parts3(old(a..=x - 1), both(x..=y), old(y + 1..=b)))
-        } else if b == x && a < y {
-            // case 10
-            Some(Split::parts3(old(a..=b - 1), both(b..=b), new(b + 1..=y)))
-        } else if y == a && x < b {
-            // case 11
-            Some(Split::parts3(new(x..=y - 1), both(y..=y), old(y + 1..=b)))
-        } else if b > x && b < y {
-            // case 12
-            Some(Split::parts3(old(a..=x - 1), both(x..=b), new(b + 1..=y)))
-        } else if y > a && y < b {
-            // case 13
-            Some(Split::parts3(new(x..=a - 1), both(a..=y), old(y + 1..=b)))
-        } else {
-            unreachable!()
-        }
-    }
-
-    /// Create a new split with a single partition. This only occurs when two
-    /// ranges are equivalent.
-    fn parts1(r1: SplitRange) -> Split {
-        // This value doesn't matter since it is never accessed.
-        let nada = SplitRange::Old(Utf8Range { start: 0, end: 0 });
-        Split { partitions: [r1, nada, nada], len: 1 }
-    }
-
-    /// Create a new split with two partitions.
-    fn parts2(r1: SplitRange, r2: SplitRange) -> Split {
-        // This value doesn't matter since it is never accessed.
-        let nada = SplitRange::Old(Utf8Range { start: 0, end: 0 });
-        Split { partitions: [r1, r2, nada], len: 2 }
-    }
-
-    /// Create a new split with three partitions.
-    fn parts3(r1: SplitRange, r2: SplitRange, r3: SplitRange) -> Split {
-        Split { partitions: [r1, r2, r3], len: 3 }
-    }
-
-    /// Return the partitions in this split as a slice.
-    fn as_slice(&self) -> &[SplitRange] {
-        &self.partitions[..self.len]
-    }
-}
-
-impl fmt::Debug for RangeTrie {
-    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
-        writeln!(f, "")?;
-        for (i, state) in self.states.iter().enumerate() {
-            let status = if i == FINAL as usize { '*' } else { ' ' };
-            writeln!(f, "{}{:06}: {:?}", status, i, state)?;
-        }
-        Ok(())
-    }
-}
-
-impl fmt::Debug for State {
-    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
-        let rs = self
-            .transitions
-            .iter()
-            .map(|t| format!("{:?}", t))
-            .collect::<Vec<String>>()
-            .join(", ");
-        write!(f, "{}", rs)
-    }
-}
-
-impl fmt::Debug for Transition {
-    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
-        if self.range.start == self.range.end {
-            write!(f, "{:02X} => {:02X}", self.range.start, self.next_id)
-        } else {
-            write!(
-                f,
-                "{:02X}-{:02X} => {:02X}",
-                self.range.start, self.range.end, self.next_id
-            )
-        }
-    }
-}
-
-/// Returns true if and only if the given ranges intersect.
-fn intersects(r1: Utf8Range, r2: Utf8Range) -> bool {
-    !(r1.end < r2.start || r2.end < r1.start)
-}
-
-#[cfg(test)]
-mod tests {
-    use std::ops::RangeInclusive;
-
-    use regex_syntax::utf8::Utf8Range;
-
-    use super::*;
-
-    fn r(range: RangeInclusive<u8>) -> Utf8Range {
-        Utf8Range { start: *range.start(), end: *range.end() }
-    }
-
-    fn split_maybe(
-        old: RangeInclusive<u8>,
-        new: RangeInclusive<u8>,
-    ) -> Option<Split> {
-        Split::new(r(old), r(new))
-    }
-
-    fn split(
-        old: RangeInclusive<u8>,
-        new: RangeInclusive<u8>,
-    ) -> Vec<SplitRange> {
-        split_maybe(old, new).unwrap().as_slice().to_vec()
-    }
-
-    #[test]
-    fn no_splits() {
-        // case 1
-        assert_eq!(None, split_maybe(0..=1, 2..=3));
-        // case 2
-        assert_eq!(None, split_maybe(2..=3, 0..=1));
-    }
-
-    #[test]
-    fn splits() {
-        let range = |r: RangeInclusive<u8>| Utf8Range {
-            start: *r.start(),
-            end: *r.end(),
-        };
-        let old = |r| SplitRange::Old(range(r));
-        let new = |r| SplitRange::New(range(r));
-        let both = |r| SplitRange::Both(range(r));
-
-        // case 3
-        assert_eq!(split(0..=0, 0..=0), vec![both(0..=0)]);
-        assert_eq!(split(9..=9, 9..=9), vec![both(9..=9)]);
-
-        // case 4
-        assert_eq!(split(0..=5, 0..=6), vec![both(0..=5), new(6..=6)]);
-        assert_eq!(split(0..=5, 0..=8), vec![both(0..=5), new(6..=8)]);
-        assert_eq!(split(5..=5, 5..=8), vec![both(5..=5), new(6..=8)]);
-
-        // case 5
-        assert_eq!(split(1..=5, 0..=5), vec![new(0..=0), both(1..=5)]);
-        assert_eq!(split(3..=5, 0..=5), vec![new(0..=2), both(3..=5)]);
-        assert_eq!(split(5..=5, 0..=5), vec![new(0..=4), both(5..=5)]);
-
-        // case 6
-        assert_eq!(split(0..=6, 0..=5), vec![both(0..=5), old(6..=6)]);
-        assert_eq!(split(0..=8, 0..=5), vec![both(0..=5), old(6..=8)]);
-        assert_eq!(split(5..=8, 5..=5), vec![both(5..=5), old(6..=8)]);
-
-        // case 7
-        assert_eq!(split(0..=5, 1..=5), vec![old(0..=0), both(1..=5)]);
-        assert_eq!(split(0..=5, 3..=5), vec![old(0..=2), both(3..=5)]);
-        assert_eq!(split(0..=5, 5..=5), vec![old(0..=4), both(5..=5)]);
-
-        // case 8
-        assert_eq!(
-            split(3..=6, 2..=7),
-            vec![new(2..=2), both(3..=6), new(7..=7)],
-        );
-        assert_eq!(
-            split(3..=6, 1..=8),
-            vec![new(1..=2), both(3..=6), new(7..=8)],
-        );
-
-        // case 9
-        assert_eq!(
-            split(2..=7, 3..=6),
-            vec![old(2..=2), both(3..=6), old(7..=7)],
-        );
-        assert_eq!(
-            split(1..=8, 3..=6),
-            vec![old(1..=2), both(3..=6), old(7..=8)],
-        );
-
-        // case 10
-        assert_eq!(
-            split(3..=6, 6..=7),
-            vec![old(3..=5), both(6..=6), new(7..=7)],
-        );
-        assert_eq!(
-            split(3..=6, 6..=8),
-            vec![old(3..=5), both(6..=6), new(7..=8)],
-        );
-        assert_eq!(
-            split(5..=6, 6..=7),
-            vec![old(5..=5), both(6..=6), new(7..=7)],
-        );
-
-        // case 11
-        assert_eq!(
-            split(6..=7, 3..=6),
-            vec![new(3..=5), both(6..=6), old(7..=7)],
-        );
-        assert_eq!(
-            split(6..=8, 3..=6),
-            vec![new(3..=5), both(6..=6), old(7..=8)],
-        );
-        assert_eq!(
-            split(6..=7, 5..=6),
-            vec![new(5..=5), both(6..=6), old(7..=7)],
-        );
-
-        // case 12
-        assert_eq!(
-            split(3..=7, 5..=9),
-            vec![old(3..=4), both(5..=7), new(8..=9)],
-        );
-        assert_eq!(
-            split(3..=5, 4..=6),
-            vec![old(3..=3), both(4..=5), new(6..=6)],
-        );
-
-        // case 13
-        assert_eq!(
-            split(5..=9, 3..=7),
-            vec![new(3..=4), both(5..=7), old(8..=9)],
-        );
-        assert_eq!(
-            split(4..=6, 3..=5),
-            vec![new(3..=3), both(4..=5), old(6..=6)],
-        );
-    }
-
-    // Arguably there should be more tests here, but in practice, this data
-    // structure is well covered by the huge number of regex tests.
-}
author	Daniel Baumann <daniel.baumann@progress-linux.org>	2024-04-17 12:18:25 +0000
committer	Daniel Baumann <daniel.baumann@progress-linux.org>	2024-04-17 12:18:25 +0000
commit	5363f350887b1e5b5dd21a86f88c8af9d7fea6da (patch)
tree	35ca005eb6e0e9a1ba3bb5dbc033209ad445dc17 /vendor/regex-automata/src/nfa/range_trie.rs
parent	Adding debian version 1.66.0+dfsg1-1. (diff)
download	rustc-5363f350887b1e5b5dd21a86f88c8af9d7fea6da.tar.xz rustc-5363f350887b1e5b5dd21a86f88c8af9d7fea6da.zip