summaryrefslogtreecommitdiffstats
path: root/vendor/zerovec/README.md
diff options
context:
space:
mode:
Diffstat (limited to 'vendor/zerovec/README.md')
-rw-r--r--vendor/zerovec/README.md193
1 files changed, 193 insertions, 0 deletions
diff --git a/vendor/zerovec/README.md b/vendor/zerovec/README.md
new file mode 100644
index 000000000..dbde4022c
--- /dev/null
+++ b/vendor/zerovec/README.md
@@ -0,0 +1,193 @@
+# zerovec [![crates.io](https://img.shields.io/crates/v/zerovec)](https://crates.io/crates/zerovec)
+
+Zero-copy vector abstractions for arbitrary types, backed by byte slices.
+
+`zerovec` enables a far wider range of types — beyond just `&[u8]` and `&str` — to participate in
+zero-copy deserialization from byte slices. It is `serde` compatible and comes equipped with
+proc macros
+
+Clients upgrading to `zerovec` benefit from zero heap allocations when deserializing
+read-only data.
+
+This crate has four main types:
+
+- [`ZeroVec<'a, T>`] (and [`ZeroSlice<T>`](ZeroSlice)) for fixed-width types like `u32`
+- [`VarZeroVec<'a, T>`] (and [`VarZeroSlice<T>`](ZeroSlice)) for variable-width types like `str`
+- [`ZeroMap<'a, K, V>`] to map from `K` to `V`
+- [`ZeroMap2d<'a, K0, K1, V>`] to map from the pair `(K0, K1)` to `V`
+
+The first two are intended as close-to-drop-in replacements for `Vec<T>` in Serde structs. The third and fourth are
+intended as a replacement for `HashMap` or [`LiteMap`](docs.rs/litemap). When used with Serde derives, **be sure to apply
+`#[serde(borrow)]` to these types**, same as one would for [`Cow<'a, T>`].
+
+[`ZeroVec<'a, T>`], [`VarZeroVec<'a, T>`], [`ZeroMap<'a, K, V>`], and [`ZeroMap2d<'a, K0, K1, V>`] all behave like
+[`Cow<'a, T>`] in that they abstract over either borrowed or owned data. When performing deserialization
+from human-readable formats (like `json` and `xml`), typically these types will allocate and fully own their data, whereas if deserializing
+from binary formats like `bincode` and `postcard`, these types will borrow data directly from the buffer being deserialized from,
+avoiding allocations and only performing validity checks. As such, this crate can be pretty fast (see [below](#Performance) for more information)
+on deserialization.
+
+See [the design doc](https://github.com/unicode-org/icu4x/blob/main/utils/zerovec/design_doc.md) for details on how this crate
+works under the hood.
+
+## Cargo features
+
+This crate has five optional features:
+ - `serde`: Allows serializing and deserializing `zerovec`'s abstractions via [`serde`](https://docs.rs/serde)
+ - `yoke`: Enables implementations of `Yokeable` from the [`yoke`](https://docs.rs/yoke/) crate, which is also useful
+ in situations involving a lot of zero-copy deserialization.
+ - `derive`: Makes it easier to use custom types in these collections by providing the [`#[make_ule]`](crate::make_ule) and
+ [`#[make_varule]`](crate::make_varule) proc macros, which generate appropriate [`ULE`](crate::ule::ULE) and
+ [`VarULE`](crate::ule::VarULE)-conformant types for a given "normal" type.
+ - `std`: Enabled `std::Error` implementations for error types. This crate is by default `no_std` with a dependency on `alloc`.
+
+[`ZeroVec<'a, T>`]: ZeroVec
+[`VarZeroVec<'a, T>`]: VarZeroVec
+[`ZeroMap<'a, K, V>`]: ZeroMap
+[`ZeroMap2d<'a, K0, K1, V>`]: ZeroMap2d
+[`Cow<'a, T>`]: alloc::borrow::Cow
+
+## Examples
+
+Serialize and deserialize a struct with ZeroVec and VarZeroVec with Bincode:
+
+```rust
+use zerovec::{VarZeroVec, ZeroVec};
+
+// This example requires the "serde" feature
+#[derive(serde::Serialize, serde::Deserialize)]
+pub struct DataStruct<'data> {
+ #[serde(borrow)]
+ nums: ZeroVec<'data, u32>,
+ #[serde(borrow)]
+ chars: ZeroVec<'data, char>,
+ #[serde(borrow)]
+ strs: VarZeroVec<'data, str>,
+}
+
+let data = DataStruct {
+ nums: ZeroVec::from_slice_or_alloc(&[211, 281, 421, 461]),
+ chars: ZeroVec::alloc_from_slice(&['ö', '冇', 'म']),
+ strs: VarZeroVec::from(&["hello", "world"]),
+};
+let bincode_bytes =
+ bincode::serialize(&data).expect("Serialization should be successful");
+assert_eq!(bincode_bytes.len(), 67);
+
+let deserialized: DataStruct = bincode::deserialize(&bincode_bytes)
+ .expect("Deserialization should be successful");
+assert_eq!(deserialized.nums.first(), Some(211));
+assert_eq!(deserialized.chars.get(1), Some('冇'));
+assert_eq!(deserialized.strs.get(1), Some("world"));
+// The deserialization will not have allocated anything
+assert!(!deserialized.nums.is_owned());
+```
+
+Use custom types inside of ZeroVec:
+
+```rust
+use zerovec::{ZeroVec, VarZeroVec, ZeroMap};
+use std::borrow::Cow;
+use zerovec::ule::encode_varule_to_box;
+
+// custom fixed-size ULE type for ZeroVec
+#[zerovec::make_ule(DateULE)]
+#[derive(Copy, Clone, PartialEq, Eq, Ord, PartialOrd, serde::Serialize, serde::Deserialize)]
+struct Date {
+ y: u64,
+ m: u8,
+ d: u8
+}
+
+// custom variable sized VarULE type for VarZeroVec
+#[zerovec::make_varule(PersonULE)]
+#[zerovec::derive(Serialize, Deserialize)] // add Serde impls to PersonULE
+#[derive(Clone, PartialEq, Eq, Ord, PartialOrd, serde::Serialize, serde::Deserialize)]
+struct Person<'a> {
+ birthday: Date,
+ favorite_character: char,
+ #[serde(borrow)]
+ name: Cow<'a, str>,
+}
+
+#[derive(serde::Serialize, serde::Deserialize)]
+struct Data<'a> {
+ #[serde(borrow)]
+ important_dates: ZeroVec<'a, Date>,
+ // note: VarZeroVec always must reference the ULE type directly
+ #[serde(borrow)]
+ important_people: VarZeroVec<'a, PersonULE>,
+ #[serde(borrow)]
+ birthdays_to_people: ZeroMap<'a, Date, PersonULE>
+}
+
+
+let person1 = Person {
+ birthday: Date { y: 1990, m: 9, d: 7},
+ favorite_character: 'π',
+ name: Cow::from("Kate")
+};
+let person2 = Person {
+ birthday: Date { y: 1960, m: 5, d: 25},
+ favorite_character: '冇',
+ name: Cow::from("Jesse")
+};
+
+let important_dates = ZeroVec::alloc_from_slice(&[Date { y: 1943, m: 3, d: 20}, Date { y: 1976, m: 8, d: 2}, Date { y: 1998, m: 2, d: 15}]);
+let important_people = VarZeroVec::from(&[&person1, &person2]);
+let mut birthdays_to_people: ZeroMap<Date, PersonULE> = ZeroMap::new();
+// `.insert_var_v()` is slightly more convenient over `.insert()` for custom ULE types
+birthdays_to_people.insert_var_v(&person1.birthday, &person1);
+birthdays_to_people.insert_var_v(&person2.birthday, &person2);
+
+let data = Data { important_dates, important_people, birthdays_to_people };
+
+let bincode_bytes = bincode::serialize(&data)
+ .expect("Serialization should be successful");
+assert_eq!(bincode_bytes.len(), 168);
+
+let deserialized: Data = bincode::deserialize(&bincode_bytes)
+ .expect("Deserialization should be successful");
+
+assert_eq!(deserialized.important_dates.get(0).unwrap().y, 1943);
+assert_eq!(&deserialized.important_people.get(1).unwrap().name, "Jesse");
+assert_eq!(&deserialized.important_people.get(0).unwrap().name, "Kate");
+assert_eq!(&deserialized.birthdays_to_people.get(&person1.birthday).unwrap().name, "Kate");
+
+} // feature = serde and derive
+```
+
+## Performance
+
+`zerovec` is designed for fast deserialization from byte buffers with zero memory allocations
+while minimizing performance regressions for common vector operations.
+
+Benchmark results on x86_64:
+
+| Operation | `Vec<T>` | `zerovec` |
+|---|---|---|
+| Deserialize vec of 100 `u32` | 233.18 ns | 14.120 ns |
+| Compute sum of vec of 100 `u32` (read every element) | 8.7472 ns | 10.775 ns |
+| Binary search vec of 1000 `u32` 50 times | 442.80 ns | 472.51 ns |
+| Deserialize vec of 100 strings | 7.3740 μs\* | 1.4495 μs |
+| Count chars in vec of 100 strings (read every element) | 747.50 ns | 955.28 ns |
+| Binary search vec of 500 strings 10 times | 466.09 ns | 790.33 ns |
+
+\* *This result is reported for `Vec<String>`. However, Serde also supports deserializing to the partially-zero-copy `Vec<&str>`; this gives 1.8420 μs, much faster than `Vec<String>` but a bit slower than `zerovec`.*
+
+| Operation | `HashMap<K,V>` | `LiteMap<K,V>` | `ZeroMap<K,V>` |
+|---|---|---|---|
+| Deserialize a small map | 2.72 μs | 1.28 μs | 480 ns |
+| Deserialize a large map | 50.5 ms | 18.3 ms | 3.74 ms |
+| Look up from a small deserialized map | 49 ns | 42 ns | 54 ns |
+| Look up from a large deserialized map | 51 ns | 155 ns | 213 ns |
+
+Small = 16 elements, large = 131,072 elements. Maps contain `<String, String>`.
+
+The benches used to generate the above table can be found in the `benches` directory in the project repository.
+`zeromap` benches are named by convention, e.g. `zeromap/deserialize/small`, `zeromap/lookup/large`. The type
+is appended for baseline comparisons, e.g. `zeromap/lookup/small/hashmap`.
+
+## More Information
+
+For more information on development, authorship, contributing etc. please visit [`ICU4X home page`](https://github.com/unicode-org/icu4x).