| author | Daniel Baumann <daniel.baumann@progress-linux.org> | 2024-04-07 18:45:59 +0000 |
|---|---|---|
| committer | Daniel Baumann <daniel.baumann@progress-linux.org> | 2024-04-07 18:45:59 +0000 |
| commit | 19fcec84d8d7d21e796c7624e521b60d28ee21ed (patch) | |
| tree | 42d26aa27d1e3f7c0b8bd3fd14e7d7082f5008dc /src/spdk/doc/userspace.md | |
| parent | Initial commit. (diff) | |
Adding upstream version 16.2.11+ds.
Signed-off-by: Daniel Baumann <daniel.baumann@progress-linux.org>
Diffstat (limited to 'src/spdk/doc/userspace.md')
-rw-r--r-- | src/spdk/doc/userspace.md | 97 |
1 files changed, 97 insertions, 0 deletions
diff --git a/src/spdk/doc/userspace.md b/src/spdk/doc/userspace.md
new file mode 100644
index 000000000..54ba1bdfa
--- /dev/null
+++ b/src/spdk/doc/userspace.md
@@ -0,0 +1,97 @@

# User Space Drivers {#userspace}

# Controlling Hardware From User Space {#userspace_control}

Much of the documentation for SPDK talks about _user space drivers_, so it's
important to understand what that means at a technical level. First and
foremost, a _driver_ is software that directly controls a particular device
attached to a computer. Second, operating systems segregate the system's
virtual memory into two categories of addresses based on privilege level -
[kernel space and user space](https://en.wikipedia.org/wiki/User_space). This
separation is aided by features on the CPU itself that enforce memory
separation, called
[protection rings](https://en.wikipedia.org/wiki/Protection_ring). Typically,
drivers run in kernel space (i.e. ring 0 on x86). SPDK contains drivers that
instead are designed to run in user space, but they still interface directly
with the hardware device that they are controlling.

In order for SPDK to take control of a device, it must first instruct the
operating system to relinquish control. This is often referred to as unbinding
the kernel driver from the device, and on Linux it is done by
[writing to a file in sysfs](https://lwn.net/Articles/143397/).
SPDK then rebinds the device to one of two special device drivers that come
bundled with Linux -
[uio](https://www.kernel.org/doc/html/latest/driver-api/uio-howto.html) or
[vfio](https://www.kernel.org/doc/Documentation/vfio.txt). These two drivers
are "dummy" drivers in the sense that they mostly just indicate to the
operating system that the device has a driver bound to it, so it won't
automatically try to re-bind the default driver. They don't actually
initialize the hardware in any way, nor do they even understand what type of
device it is. The primary difference between uio and vfio is that vfio is
capable of programming the platform's
[IOMMU](https://en.wikipedia.org/wiki/Input%E2%80%93output_memory_management_unit),
which is a critical piece of hardware for ensuring memory safety in user space
drivers. See @ref memory for full details.

Once the device is unbound from the operating system kernel, the operating
system can't use it anymore. For example, if you unbind an NVMe device on
Linux, the device nodes corresponding to it, such as /dev/nvme0n1, will
disappear. It further means that filesystems mounted on the device will also
be removed and kernel filesystems can no longer interact with the device. In
fact, the entire kernel block storage stack is no longer involved. Instead,
SPDK provides re-imagined implementations of most of the layers in a typical
operating system storage stack, all as C libraries that can be directly
embedded into your application. This primarily includes a
[block device abstraction layer](@ref bdev), but also
[block allocators](@ref blob) and [filesystem-like components](@ref blobfs).

User space drivers utilize features in uio or vfio to map the
[PCI BAR](https://en.wikipedia.org/wiki/PCI_configuration_space) for the device
into the current process, which allows the driver to perform
[MMIO](https://en.wikipedia.org/wiki/Memory-mapped_I/O) directly. The SPDK
@ref nvme, for instance, maps the BAR for the NVMe device and then follows
along with the
[NVMe Specification](http://nvmexpress.org/wp-content/uploads/NVM_Express_Revision_1.3.pdf)
to initialize the device, create queue pairs, and ultimately send I/O.
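As a rough illustration of what "mapping a BAR and performing MMIO" means, the
sketch below mmap()s the `resource0` file that the Linux kernel exposes in
sysfs for a PCI device and reads the NVMe controller's version register. This
is a simplified, hedged example, not SPDK's actual code path: the PCI address
`0000:01:00.0` and the mapping size are made up, the sysfs route shown here is
a shortcut (uio and vfio expose the mapping through their own device files and
ioctls), and SPDK's env layer handles all of that plus error handling and
IOMMU setup.

```c
/* Minimal sketch: map BAR0 of an example NVMe device via sysfs and read a
 * register with MMIO. Assumes the device at 0000:01:00.0 has already been
 * unbound from the kernel nvme driver; the path and size are illustrative. */
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
    /* sysfs exposes each BAR as a resourceN file that can be mmap()ed. */
    int fd = open("/sys/bus/pci/devices/0000:01:00.0/resource0",
                  O_RDWR | O_SYNC);
    if (fd < 0) {
        perror("open resource0");
        return 1;
    }

    /* Map 16 KiB of BAR0 into this process's address space. */
    volatile uint8_t *bar0 = mmap(NULL, 16384, PROT_READ | PROT_WRITE,
                                  MAP_SHARED, fd, 0);
    if (bar0 == MAP_FAILED) {
        perror("mmap");
        close(fd);
        return 1;
    }

    /* Loads and stores through this pointer are MMIO. Offset 0x08 is the
     * NVMe controller's version (VS) register per the NVMe specification. */
    uint32_t vs = *(volatile uint32_t *)(bar0 + 0x08);
    printf("NVMe controller version register: 0x%08x\n", vs);

    munmap((void *)bar0, 16384);
    close(fd);
    return 0;
}
```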
# Interrupts {#userspace_interrupts}

SPDK polls devices for completions instead of waiting for interrupts. There
are a number of reasons for doing this: 1) practically speaking, routing an
interrupt to a handler in a user space process just isn't feasible for most
hardware designs, and 2) interrupts introduce software jitter and have
significant overhead due to forced context switches. Operations in SPDK are
almost universally asynchronous and allow the user to provide a callback on
completion. The callback is called in response to the user calling a function
to poll for completions. Polling an NVMe device is fast because only host
memory needs to be read (no MMIO) to check a queue pair for a bit flip, and
technologies such as Intel's
[DDIO](https://www.intel.com/content/www/us/en/io/data-direct-i-o-technology.html)
will ensure that the host memory being checked is present in the CPU cache
after an update by the device.

# Threading {#userspace_threading}

NVMe devices expose multiple queues for submitting requests to the hardware.
Separate queues can be accessed without coordination, so software can send
requests to the device from multiple threads of execution in parallel without
locks. Unfortunately, kernel drivers must be designed to handle I/O coming
from lots of different places, either elsewhere in the operating system or in
various processes on the system, and the thread topology of those processes
changes over time. Most kernel drivers elect to map hardware queues to cores
(as close to 1:1 as possible), and then when a request is submitted they look
up the correct hardware queue for whatever core the current thread happens to
be running on. Often, they'll need to either acquire a lock around the queue or
temporarily disable interrupts to guard against preemption from threads
running on the same core, which can be expensive. This is a large improvement
over older hardware interfaces that only had a single queue or no queue at
all, but still isn't always optimal.

A user space driver, on the other hand, is embedded into a single application.
This application knows exactly how many threads (or processes) exist
because the application created them. Therefore, the SPDK drivers choose to
expose the hardware queues directly to the application with the requirement
that a hardware queue is only ever accessed from one thread at a time. In
practice, applications assign one hardware queue to each thread (as opposed to
one hardware queue per core in kernel drivers). This guarantees that the thread
can submit requests without having to perform any sort of coordination (i.e.
locking) with the other threads in the system.
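To make the completion-by-polling model from the Interrupts section concrete,
here is a small sketch against the public SPDK NVMe API: it submits one read
with `spdk_nvme_ns_cmd_read()` and then spins on
`spdk_nvme_qpair_process_completions()` until the callback fires. This is a
hedged illustration, not a complete program: controller probing, namespace
lookup, and queue pair allocation are assumed to have happened elsewhere, and
the names `io_ctx`, `io_done`, and `read_lba_0` are invented for the example.

```c
/* Sketch of SPDK's polled completion model: submit one read, then poll the
 * queue pair until our callback runs. ctrlr/ns/qpair setup is assumed. */
#include "spdk/env.h"
#include "spdk/nvme.h"
#include <stdbool.h>
#include <stdio.h>

struct io_ctx {
    bool done;
};

/* Called from within spdk_nvme_qpair_process_completions(), not from an
 * interrupt handler. */
static void io_done(void *arg, const struct spdk_nvme_cpl *cpl)
{
    struct io_ctx *ctx = arg;
    if (spdk_nvme_cpl_is_error(cpl)) {
        fprintf(stderr, "read failed\n");
    }
    ctx->done = true;
}

static int read_lba_0(struct spdk_nvme_ns *ns, struct spdk_nvme_qpair *qpair)
{
    struct io_ctx ctx = { .done = false };

    /* DMA-able buffer; size and alignment here are just examples. */
    void *buf = spdk_zmalloc(4096, 4096, NULL, SPDK_ENV_SOCKET_ID_ANY,
                             SPDK_MALLOC_DMA);
    if (buf == NULL) {
        return -1;
    }

    /* Asynchronous submission: returns immediately, io_done runs later. */
    int rc = spdk_nvme_ns_cmd_read(ns, qpair, buf, 0 /* LBA */,
                                   1 /* LBA count */, io_done, &ctx, 0);
    if (rc != 0) {
        spdk_free(buf);
        return rc;
    }

    /* Poll: checks the completion queue in host memory (no MMIO needed). */
    while (!ctx.done) {
        spdk_nvme_qpair_process_completions(qpair, 0 /* no limit */);
    }

    spdk_free(buf);
    return 0;
}
```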
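And to illustrate the one-queue-pair-per-thread rule just described, the
sketch below spawns a few pthreads, each of which allocates, uses, and frees
its own I/O queue pair with `spdk_nvme_ctrlr_alloc_io_qpair()`. Because no
queue pair is ever shared, no locking is needed around submission or polling.
Again this is an assumed, simplified setup: the worker count is arbitrary,
the controller is attached elsewhere, and `worker_fn`/`run_workers` are
illustrative names rather than SPDK APIs.

```c
/* Sketch of the one-queue-pair-per-thread pattern: each worker owns its own
 * qpair, so submit/poll need no coordination with other threads. */
#include "spdk/nvme.h"
#include <pthread.h>
#include <stdio.h>

#define NUM_WORKERS 4

struct worker_arg {
    struct spdk_nvme_ctrlr *ctrlr;
};

static void *worker_fn(void *arg)
{
    struct worker_arg *w = arg;

    /* Each thread allocates its own hardware queue pair... */
    struct spdk_nvme_qpair *qpair =
        spdk_nvme_ctrlr_alloc_io_qpair(w->ctrlr, NULL, 0);
    if (qpair == NULL) {
        fprintf(stderr, "failed to allocate qpair\n");
        return NULL;
    }

    /* ...and is the only thread that ever touches it: submissions (e.g.
     * spdk_nvme_ns_cmd_read()) and completion polling both happen here,
     * with no locks and no coordination with other threads. */
    for (int i = 0; i < 1000; i++) {
        spdk_nvme_qpair_process_completions(qpair, 0);
    }

    spdk_nvme_ctrlr_free_io_qpair(qpair);
    return NULL;
}

static void run_workers(struct spdk_nvme_ctrlr *ctrlr)
{
    pthread_t threads[NUM_WORKERS];
    struct worker_arg arg = { .ctrlr = ctrlr };

    for (int i = 0; i < NUM_WORKERS; i++) {
        pthread_create(&threads[i], NULL, worker_fn, &arg);
    }
    for (int i = 0; i < NUM_WORKERS; i++) {
        pthread_join(threads[i], NULL);
    }
}
```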