Diffstat (limited to 'vendor/sharded-slab/IMPLEMENTATION.md')

-rw-r--r--  vendor/sharded-slab/IMPLEMENTATION.md | 135

1 file changed, 135 insertions, 0 deletions
diff --git a/vendor/sharded-slab/IMPLEMENTATION.md b/vendor/sharded-slab/IMPLEMENTATION.md
new file mode 100644
index 000000000..5494302cb
--- /dev/null
+++ b/vendor/sharded-slab/IMPLEMENTATION.md
@@ -0,0 +1,135 @@

Notes on `sharded-slab`'s implementation and design.

# Design

The sharded slab's design is strongly inspired by the ideas presented by
Leijen, Zorn, and de Moura in [Mimalloc: Free List Sharding in
Action][mimalloc]. In this report, the authors present a novel design for a
memory allocator based on a concept of _free list sharding_.

Memory allocators must keep track of which memory regions are not currently
allocated ("free") in order to provide them to future allocation requests.
The term [_free list_][freelist] refers to a technique for performing this
bookkeeping, where each free block stores a pointer to the next free block,
forming a linked list. The memory allocator keeps a pointer to the most
recently freed block, the _head_ of the free list. To allocate more memory,
the allocator pops from the free list by setting the head pointer to the
next free block of the current head block, and returning the previous head.
To deallocate a block, the block is pushed onto the free list by setting its
first word to the current head pointer, and the head pointer is set to point
to the deallocated block. Most implementations of slab allocators backed by
arrays or vectors use a similar technique, where pointers are replaced by
indices into the backing array.

When allocations and deallocations can occur concurrently across threads,
they must synchronize accesses to the free list, either by putting the
entire allocator state inside a lock, or by using atomic operations to
treat the free list as a lock-free structure (such as a [Treiber stack]). In
both cases, there is a significant performance cost: even when the free
list is lock-free, a noticeable amount of time is likely to be spent in
compare-and-swap loops. Ideally, the global synchronization point created
by the single global free list should be avoided as much as possible.

The approach presented by Leijen, Zorn, and de Moura is to introduce
sharding and thus significantly increase the granularity of synchronization.
In mimalloc, the heap is _sharded_ so that each thread has its own
thread-local heap. Objects are always allocated from the local heap of the
thread performing the allocation, so allocations need not be synchronized.

However, since objects can move between threads before being deallocated,
_deallocations_ may still occur concurrently. Therefore, Leijen et al.
introduce a concept of _local_ and _global_ free lists. When an object is
deallocated on the same thread it was originally allocated on, it is placed
on the local free list; if it is deallocated on another thread, it goes on
the global free list for the heap of the thread from which it originated. To
allocate, the local free list is used first; if it is empty, the entire
global free list is popped onto the local free list. Since the local free
list is only ever accessed by the thread it belongs to, it requires no
synchronization at all, and because the global free list is popped from
infrequently, the cost of that synchronization has a reduced impact. A
majority of allocations can occur without any synchronization at all, and
deallocations only require synchronization when an object has left its
parent thread (a relatively uncommon case).
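To make the free list mechanics concrete, the following is a minimal,
runnable sketch of index-linked local and remote free lists in this style.
It is an illustration under simplifying assumptions, not the crate's actual
code: the names (`Slot`, `Page`, `free_remote`, and so on) are invented for
the example, the memory orderings are chosen for clarity rather than tuned,
and real slots also store the inserted value and additional metadata.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Sentinel index meaning "no next free slot".
const NIL: usize = usize::MAX;

struct Slot {
    // Index of the next free slot, forming an index-linked free list.
    next_free: AtomicUsize,
}

struct Page {
    slots: Vec<Slot>,
    // Head of the local free list. Only the owning thread reads or writes
    // it, so plain loads and stores stand in for unsynchronized access.
    local_head: AtomicUsize,
    // Head of the remote free list, pushed onto by other threads.
    remote_head: AtomicUsize,
}

impl Page {
    fn new(size: usize) -> Self {
        // Initially every slot is free and linked to its successor.
        let slots = (0..size)
            .map(|i| Slot {
                next_free: AtomicUsize::new(if i + 1 < size { i + 1 } else { NIL }),
            })
            .collect();
        Page {
            slots,
            local_head: AtomicUsize::new(if size > 0 { 0 } else { NIL }),
            remote_head: AtomicUsize::new(NIL),
        }
    }

    /// Pop a free slot index; called only by the page's owning thread.
    fn allocate(&self) -> Option<usize> {
        let head = self.local_head.load(Ordering::Relaxed);
        if head != NIL {
            // Fast path: pop from the local list, no cross-thread traffic.
            let next = self.slots[head].next_free.load(Ordering::Relaxed);
            self.local_head.store(next, Ordering::Relaxed);
            return Some(head);
        }
        // Slow path: steal the entire remote free list with one atomic swap.
        let remote = self.remote_head.swap(NIL, Ordering::Acquire);
        if remote == NIL {
            return None; // page is full
        }
        let next = self.slots[remote].next_free.load(Ordering::Relaxed);
        self.local_head.store(next, Ordering::Relaxed);
        Some(remote)
    }

    /// Free a slot from the owning thread: push onto the local list.
    fn free_local(&self, index: usize) {
        let head = self.local_head.load(Ordering::Relaxed);
        self.slots[index].next_free.store(head, Ordering::Relaxed);
        self.local_head.store(index, Ordering::Relaxed);
    }

    /// Free a slot from any other thread: push onto the remote list with a
    /// compare-and-swap loop (a Treiber-stack-style push).
    fn free_remote(&self, index: usize) {
        let mut head = self.remote_head.load(Ordering::Relaxed);
        loop {
            self.slots[index].next_free.store(head, Ordering::Relaxed);
            match self.remote_head.compare_exchange_weak(
                head,
                index,
                Ordering::Release,
                Ordering::Relaxed,
            ) {
                Ok(_) => return,
                Err(actual) => head = actual,
            }
        }
    }
}

fn main() {
    let page = Page::new(4);
    let a = page.allocate().unwrap(); // slot 0
    let b = page.allocate().unwrap(); // slot 1
    page.free_local(a); // freed on the owning thread: no synchronization
    page.free_remote(b); // freed "remotely": CAS push onto the remote list
    // The locally freed slot is handed out again first; the remote list is
    // only stolen once the local list runs dry.
    assert_eq!(page.allocate(), Some(a));
}
```

The property mirrored here is the one described above: `free_local` and the
fast path of `allocate` touch only state owned by a single thread, while
cross-thread frees pay for a compare-and-swap push, and the owning thread
steals the whole remote list in a single atomic swap only when its local
list is empty.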
[mimalloc]: https://www.microsoft.com/en-us/research/uploads/prod/2019/06/mimalloc-tr-v1.pdf
[freelist]: https://en.wikipedia.org/wiki/Free_list
[Treiber stack]: https://en.wikipedia.org/wiki/Treiber_stack

# Implementation

A slab is represented as an array of [`MAX_THREADS`] _shards_. A shard
consists of a vector of one or more _pages_ plus associated metadata.
Finally, a page consists of an array of _slots_, plus the head indices for
its local and remote free lists.

```text
┌─────────────┐
│ shard 1     │
│             │    ┌─────────────┐        ┌────────┐
│ pages───────┼───▶│ page 1      │        │        │
├─────────────┤    ├─────────────┤  ┌────▶│  next──┼─┐
│ shard 2     │    │ page 2      │  │     ├────────┤ │
├─────────────┤    │             │  │     │XXXXXXXX│ │
│ shard 3     │    │ local_head──┼──┘     ├────────┤ │
└─────────────┘    │ remote_head─┼──┐     │        │◀┘
      ...          ├─────────────┤  │     │  next──┼─┐
┌─────────────┐    │ page 3      │  │     ├────────┤ │
│ shard n     │    └─────────────┘  │     │XXXXXXXX│ │
└─────────────┘          ...        │     ├────────┤ │
                   ┌─────────────┐  │     │XXXXXXXX│ │
                   │ page n      │  │     ├────────┤ │
                   └─────────────┘  │     │        │◀┘
                                    └────▶│  next──┼───▶  ...
                                          ├────────┤
                                          │XXXXXXXX│
                                          └────────┘
```

The size of the first page in a shard is always a power of two, and every
subsequent page added after the first is twice as large as the page that
precedes it.

```text

pg.
┌───┐   ┌─┬─┐
│ 0 │───▶ │ │
├───┤   ├─┼─┼─┬─┐
│ 1 │───▶ │ │ │ │
├───┤   ├─┼─┼─┼─┼─┬─┬─┬─┐
│ 2 │───▶ │ │ │ │ │ │ │ │
├───┤   ├─┼─┼─┼─┼─┼─┼─┼─┼─┬─┬─┬─┬─┬─┬─┬─┐
│ 3 │───▶ │ │ │ │ │ │ │ │ │ │ │ │ │ │ │ │
└───┘   └─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┘
```

When searching for a free slot, the smallest page is searched first, and if
it is full, the search proceeds to the next page, until either a free slot
is found or all available pages have been searched. If all available pages
have been searched and the maximum number of pages has not yet been
reached, a new page is then allocated.

Since every page is twice as large as the previous page, and all page sizes
are powers of two, we can determine the page index that contains a given
address by shifting the address down by the smallest page size and looking
at how many twos places are necessary to represent that number, which tells
us what power-of-two page size it fits inside. We can determine the number
of twos places by counting the number of leading zeros (unused twos places)
in the number's binary representation, and subtracting that count from the
total number of bits in a word.

The formula for determining the page number that contains an offset is thus:

```rust,ignore
WIDTH - ((offset + INITIAL_PAGE_SIZE) >> INDEX_SHIFT).leading_zeros()
```

where `WIDTH` is the number of bits in a `usize`, and `INDEX_SHIFT` is:

```rust,ignore
INITIAL_PAGE_SIZE.trailing_zeros() + 1;
```

[`MAX_THREADS`]: https://docs.rs/sharded-slab/latest/sharded_slab/trait.Config.html#associatedconstant.MAX_THREADS
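As a sanity check on the page-index formula, the snippet below compares it
against a naive loop that walks the doubling page sizes. The
`INITIAL_PAGE_SIZE` of 32 used here is an arbitrary example value chosen
for the demonstration, not necessarily the size the crate is configured
with.

```rust
// Demonstration of the page-index formula above, with an example
// INITIAL_PAGE_SIZE of 32 (an assumption made for this sketch only).
const INITIAL_PAGE_SIZE: usize = 32;
const WIDTH: u32 = usize::BITS;
const INDEX_SHIFT: u32 = INITIAL_PAGE_SIZE.trailing_zeros() + 1;

/// Page index containing a given slot offset, using the formula above.
fn page_index(offset: usize) -> u32 {
    WIDTH - ((offset + INITIAL_PAGE_SIZE) >> INDEX_SHIFT).leading_zeros()
}

/// The same answer computed the slow way, by walking the doubling page
/// sizes, to show that the two agree.
fn page_index_by_walking(mut offset: usize) -> u32 {
    let mut page = 0;
    let mut size = INITIAL_PAGE_SIZE;
    while offset >= size {
        offset -= size;
        size *= 2; // each page is twice the size of the one before it
        page += 1;
    }
    page
}

fn main() {
    // With an initial page size of 32, page 0 covers offsets 0..32,
    // page 1 covers 32..96, page 2 covers 96..224, and so on: each
    // boundary is INITIAL_PAGE_SIZE * (2^n - 1).
    for offset in [0, 31, 32, 95, 96, 223, 224, 479, 480] {
        assert_eq!(page_index(offset), page_index_by_walking(offset));
        println!("offset {offset:>3} -> page {}", page_index(offset));
    }
}
```

For instance, with these example constants, offsets 0 through 31 land in
page 0, 32 through 95 in page 1, and 96 through 223 in page 2, which is
exactly what both functions report.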