summaryrefslogtreecommitdiffstats
path: root/vendor/memchr/README.md
blob: fe44625e3bfebc5aa18f672ad75fd213100e29e1 (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
memchr
======
This library provides heavily optimized routines for string search primitives.

[![Build status](https://github.com/BurntSushi/memchr/workflows/ci/badge.svg)](https://github.com/BurntSushi/memchr/actions)
[![Crates.io](https://img.shields.io/crates/v/memchr.svg)](https://crates.io/crates/memchr)

Dual-licensed under MIT or the [UNLICENSE](https://unlicense.org/).


### Documentation

[https://docs.rs/memchr](https://docs.rs/memchr)


### Overview

* The top-level module provides routines for searching for 1, 2 or 3 bytes
  in the forward or reverse direction. When searching for more than one byte,
  positions are considered a match if the byte at that position matches any
  of the bytes.
* The `memmem` sub-module provides forward and reverse substring search
  routines.

In all such cases, routines operate on `&[u8]` without regard to encoding. This
is exactly what you want when searching either UTF-8 or arbitrary bytes.

### Compiling without the standard library

memchr links to the standard library by default, but you can disable the
`std` feature if you want to use it in a `#![no_std]` crate:

```toml
[dependencies]
memchr = { version = "2", default-features = false }
```

On `x86_64` platforms, when the `std` feature is disabled, the SSE2 accelerated
implementations will be used. When `std` is enabled, AVX2 accelerated
implementations will be used if the CPU is determined to support it at runtime.

SIMD accelerated routines are also available on the `wasm32` and `aarch64`
targets. The `std` feature is not required to use them.

When a SIMD version is not available, then this crate falls back to
[SWAR](https://en.wikipedia.org/wiki/SWAR) techniques.

### Minimum Rust version policy

This crate's minimum supported `rustc` version is `1.61.0`.

The current policy is that the minimum Rust version required to use this crate
can be increased in minor version updates. For example, if `crate 1.0` requires
Rust 1.20.0, then `crate 1.0.z` for all values of `z` will also require Rust
1.20.0 or newer. However, `crate 1.y` for `y > 0` may require a newer minimum
version of Rust.

In general, this crate will be conservative with respect to the minimum
supported version of Rust.


### Testing strategy

Given the complexity of the code in this crate, along with the pervasive use
of `unsafe`, this crate has an extensive testing strategy. It combines multiple
approaches:

* Hand-written tests.
* Exhaustive-style testing meant to exercise all possible branching and offset
  calculations.
* Property based testing through [`quickcheck`](https://github.com/BurntSushi/quickcheck).
* Fuzz testing through [`cargo fuzz`](https://github.com/rust-fuzz/cargo-fuzz).
* A huge suite of benchmarks that are also run as tests. Benchmarks always
  confirm that the expected result occurs.

Improvements to the testing infrastructure are very welcome.


### Algorithms used

At time of writing, this crate's implementation of substring search actually
has a few different algorithms to choose from depending on the situation.

* For very small haystacks,
  [Rabin-Karp](https://en.wikipedia.org/wiki/Rabin%E2%80%93Karp_algorithm)
  is used to reduce latency. Rabin-Karp has very small overhead and can often
  complete before other searchers have even been constructed.
* For small needles, a variant of the
  ["Generic SIMD"](http://0x80.pl/articles/simd-strfind.html#algorithm-1-generic-simd)
  algorithm is used. Instead of using the first and last bytes, a heuristic is
  used to select bytes based on a background distribution of byte frequencies.
* In all other cases,
  [Two-Way](https://en.wikipedia.org/wiki/Two-way_string-matching_algorithm)
  is used. If possible, a prefilter based on the "Generic SIMD" algorithm
  linked above is used to find candidates quickly. A dynamic heuristic is used
  to detect if the prefilter is ineffective, and if so, disables it.