* Consider refactoring the NFA representation such that it can be instantly
  loaded from a `&[u8]`, just like a sparse DFA. The main downside is that
  this could negatively impact the performance of using the NFA itself, even
  though it would eliminate deserialization costs. Before doing this, we
  should write PikeVM and backtracking implementations so that they can be
  benchmarked. (A hypothetical sketch of the zero-copy idea follows this
  list.)
* Add capture group support to the NFA.
* Once we're happy, re-organize the public API such that NFAs are exported
  and usable on their own.
* Investigate why NFA shrinking seems to produce bigger DFAs after
  determinization, even though it makes determinization substantially faster.
  This might be because shrinking uses sparse NFA states, which have a lower
  constant overhead associated with them.
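
To make the first item a bit more concrete, below is a minimal sketch of what
a zero-copy, `&[u8]`-backed state machine could look like. Everything in it
(`ZeroCopyNfa`, its `from_bytes` constructor and the 12-byte transition
encoding) is hypothetical and invented purely for illustration; it is not the
representation used by this crate's NFA or its sparse DFA.

```rust
// Hypothetical sketch of a zero-copy NFA view over raw bytes. The point is
// that "loading" only validates and borrows the slice; nothing is copied
// into owned data structures.

use core::convert::TryInto;

/// A hypothetical NFA whose transition table lives directly in a byte slice.
struct ZeroCopyNfa<'a> {
    /// Raw transitions, each encoded as a little-endian (from, byte, to)
    /// triple of u32 values (12 bytes per transition).
    transitions: &'a [u8],
    start: u32,
}

impl<'a> ZeroCopyNfa<'a> {
    /// "Loads" the NFA by validating the encoding, without copying it.
    fn from_bytes(bytes: &'a [u8]) -> Result<ZeroCopyNfa<'a>, &'static str> {
        if bytes.len() < 4 || (bytes.len() - 4) % 12 != 0 {
            return Err("invalid NFA encoding");
        }
        let start = u32::from_le_bytes(bytes[0..4].try_into().unwrap());
        Ok(ZeroCopyNfa { transitions: &bytes[4..], start })
    }

    /// Reads the i-th transition triple straight out of the borrowed bytes.
    fn transition(&self, i: usize) -> (u32, u8, u32) {
        let t = &self.transitions[i * 12..i * 12 + 12];
        let from = u32::from_le_bytes(t[0..4].try_into().unwrap());
        let byte = u32::from_le_bytes(t[4..8].try_into().unwrap()) as u8;
        let to = u32::from_le_bytes(t[8..12].try_into().unwrap());
        (from, byte, to)
    }
}

fn main() {
    // Encode a single transition: state 0 --b'a'--> state 1, start = 0.
    let mut bytes = Vec::new();
    bytes.extend_from_slice(&0u32.to_le_bytes()); // start state
    bytes.extend_from_slice(&0u32.to_le_bytes()); // from
    bytes.extend_from_slice(&(b'a' as u32).to_le_bytes()); // byte
    bytes.extend_from_slice(&1u32.to_le_bytes()); // to

    let nfa = ZeroCopyNfa::from_bytes(&bytes).unwrap();
    assert_eq!(nfa.start, 0);
    assert_eq!(nfa.transition(0), (0, b'a', 1));
}
```

The property being illustrated is that `from_bytes` only checks and borrows
the bytes, so loading is effectively free; searching then reads transitions
directly out of the slice, which is where the potential runtime downside
mentioned above would come from, and why benchmarks against PikeVM and
backtracking implementations are wanted first.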