* Consider refactoring the NFA representation such that it can be instantly
  loaded from a `&[u8]`, just like a sparse DFA. The main downside is that
  this could negatively impact the performance of using the NFA itself, even
  though it would eliminate deserialization costs. Before doing this, we
  should write PikeVM and backtracking implementations so that they can be
  benchmarked. (A hypothetical sketch of the zero-copy idea follows this
  list.)
* Add capture group support to the NFA.
* Once we're happy, re-organize the public API such that NFAs are exported
  and usable on their own.
* Investigate why NFA shrinking seems to produce bigger DFAs after
  determinization, even though it makes determinization substantially faster.
  This might be because shrinking uses sparse NFA states, which have a lower
  constant overhead associated with them.
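
To make the first item a bit more concrete, below is a minimal sketch of what
a zero-copy, `&[u8]`-backed state machine could look like. Everything in it
(`ZeroCopyNfa`, its `from_bytes` constructor and the 12-byte transition
encoding) is hypothetical and invented purely for illustration; it is not the
representation used by this crate's NFA or its sparse DFA.

```rust
// Hypothetical sketch of a zero-copy NFA view over raw bytes. The point is
// that "loading" only validates and borrows the slice; nothing is copied
// into owned data structures.

use core::convert::TryInto;

/// A hypothetical NFA whose transition table lives directly in a byte slice.
struct ZeroCopyNfa<'a> {
    /// Raw transitions, each encoded as a little-endian (from, byte, to)
    /// triple of u32 values (12 bytes per transition).
    transitions: &'a [u8],
    start: u32,
}

impl<'a> ZeroCopyNfa<'a> {
    /// "Loads" the NFA by validating the encoding, without copying it.
    fn from_bytes(bytes: &'a [u8]) -> Result<ZeroCopyNfa<'a>, &'static str> {
        if bytes.len() < 4 || (bytes.len() - 4) % 12 != 0 {
            return Err("invalid NFA encoding");
        }
        let start = u32::from_le_bytes(bytes[0..4].try_into().unwrap());
        Ok(ZeroCopyNfa { transitions: &bytes[4..], start })
    }

    /// Reads the i-th transition triple straight out of the borrowed bytes.
    fn transition(&self, i: usize) -> (u32, u8, u32) {
        let t = &self.transitions[i * 12..i * 12 + 12];
        let from = u32::from_le_bytes(t[0..4].try_into().unwrap());
        let byte = u32::from_le_bytes(t[4..8].try_into().unwrap()) as u8;
        let to = u32::from_le_bytes(t[8..12].try_into().unwrap());
        (from, byte, to)
    }
}

fn main() {
    // Encode a single transition: state 0 --b'a'--> state 1, start = 0.
    let mut bytes = Vec::new();
    bytes.extend_from_slice(&0u32.to_le_bytes()); // start state
    bytes.extend_from_slice(&0u32.to_le_bytes()); // from
    bytes.extend_from_slice(&(b'a' as u32).to_le_bytes()); // byte
    bytes.extend_from_slice(&1u32.to_le_bytes()); // to

    let nfa = ZeroCopyNfa::from_bytes(&bytes).unwrap();
    assert_eq!(nfa.start, 0);
    assert_eq!(nfa.transition(0), (0, b'a', 1));
}
```

The property being illustrated is that `from_bytes` only checks and borrows
the bytes, so loading is effectively free; searching then reads transitions
directly out of the slice, which is where the potential runtime downside
mentioned above would come from, and why benchmarks against PikeVM and
backtracking implementations are wanted first.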