diff options
author | Daniel Baumann <daniel.baumann@progress-linux.org> | 2024-05-06 01:02:30 +0000 |
---|---|---|
committer | Daniel Baumann <daniel.baumann@progress-linux.org> | 2024-05-06 01:02:30 +0000 |
commit | 76cb841cb886eef6b3bee341a2266c76578724ad (patch) | |
tree | f5892e5ba6cc11949952a6ce4ecbe6d516d6ce58 /Documentation/block/queue-sysfs.txt | |
parent | Initial commit. (diff) | |
download | linux-upstream.tar.xz linux-upstream.zip |
Adding upstream version 4.19.249.upstream/4.19.249upstream
Signed-off-by: Daniel Baumann <daniel.baumann@progress-linux.org>
Diffstat (limited to 'Documentation/block/queue-sysfs.txt')
-rw-r--r-- | Documentation/block/queue-sysfs.txt | 197 |
1 files changed, 197 insertions, 0 deletions
diff --git a/Documentation/block/queue-sysfs.txt b/Documentation/block/queue-sysfs.txt new file mode 100644 index 000000000..2c1e67058 --- /dev/null +++ b/Documentation/block/queue-sysfs.txt @@ -0,0 +1,197 @@ +Queue sysfs files +================= + +This text file will detail the queue files that are located in the sysfs tree +for each block device. Note that stacked devices typically do not export +any settings, since their queue merely functions are a remapping target. +These files are the ones found in the /sys/block/xxx/queue/ directory. + +Files denoted with a RO postfix are readonly and the RW postfix means +read-write. + +add_random (RW) +---------------- +This file allows to turn off the disk entropy contribution. Default +value of this file is '1'(on). + +dax (RO) +-------- +This file indicates whether the device supports Direct Access (DAX), +used by CPU-addressable storage to bypass the pagecache. It shows '1' +if true, '0' if not. + +discard_granularity (RO) +----------------------- +This shows the size of internal allocation of the device in bytes, if +reported by the device. A value of '0' means device does not support +the discard functionality. + +discard_max_hw_bytes (RO) +---------------------- +Devices that support discard functionality may have internal limits on +the number of bytes that can be trimmed or unmapped in a single operation. +The discard_max_bytes parameter is set by the device driver to the maximum +number of bytes that can be discarded in a single operation. Discard +requests issued to the device must not exceed this limit. A discard_max_bytes +value of 0 means that the device does not support discard functionality. + +discard_max_bytes (RW) +---------------------- +While discard_max_hw_bytes is the hardware limit for the device, this +setting is the software limit. Some devices exhibit large latencies when +large discards are issued, setting this value lower will make Linux issue +smaller discards and potentially help reduce latencies induced by large +discard operations. + +hw_sector_size (RO) +------------------- +This is the hardware sector size of the device, in bytes. + +io_poll (RW) +------------ +When read, this file shows whether polling is enabled (1) or disabled +(0). Writing '0' to this file will disable polling for this device. +Writing any non-zero value will enable this feature. + +io_poll_delay (RW) +------------------ +If polling is enabled, this controls what kind of polling will be +performed. It defaults to -1, which is classic polling. In this mode, +the CPU will repeatedly ask for completions without giving up any time. +If set to 0, a hybrid polling mode is used, where the kernel will attempt +to make an educated guess at when the IO will complete. Based on this +guess, the kernel will put the process issuing IO to sleep for an amount +of time, before entering a classic poll loop. This mode might be a +little slower than pure classic polling, but it will be more efficient. +If set to a value larger than 0, the kernel will put the process issuing +IO to sleep for this amont of microseconds before entering classic +polling. + +iostats (RW) +------------- +This file is used to control (on/off) the iostats accounting of the +disk. + +logical_block_size (RO) +----------------------- +This is the logical block size of the device, in bytes. + +max_hw_sectors_kb (RO) +---------------------- +This is the maximum number of kilobytes supported in a single data transfer. + +max_integrity_segments (RO) +--------------------------- +When read, this file shows the max limit of integrity segments as +set by block layer which a hardware controller can handle. + +max_sectors_kb (RW) +------------------- +This is the maximum number of kilobytes that the block layer will allow +for a filesystem request. Must be smaller than or equal to the maximum +size allowed by the hardware. + +max_segments (RO) +----------------- +Maximum number of segments of the device. + +max_segment_size (RO) +--------------------- +Maximum segment size of the device. + +minimum_io_size (RO) +-------------------- +This is the smallest preferred IO size reported by the device. + +nomerges (RW) +------------- +This enables the user to disable the lookup logic involved with IO +merging requests in the block layer. By default (0) all merges are +enabled. When set to 1 only simple one-hit merges will be tried. When +set to 2 no merge algorithms will be tried (including one-hit or more +complex tree/hash lookups). + +nr_requests (RW) +---------------- +This controls how many requests may be allocated in the block layer for +read or write requests. Note that the total allocated number may be twice +this amount, since it applies only to reads or writes (not the accumulated +sum). + +To avoid priority inversion through request starvation, a request +queue maintains a separate request pool per each cgroup when +CONFIG_BLK_CGROUP is enabled, and this parameter applies to each such +per-block-cgroup request pool. IOW, if there are N block cgroups, +each request queue may have up to N request pools, each independently +regulated by nr_requests. + +optimal_io_size (RO) +-------------------- +This is the optimal IO size reported by the device. + +physical_block_size (RO) +------------------------ +This is the physical block size of device, in bytes. + +read_ahead_kb (RW) +------------------ +Maximum number of kilobytes to read-ahead for filesystems on this block +device. + +rotational (RW) +--------------- +This file is used to stat if the device is of rotational type or +non-rotational type. + +rq_affinity (RW) +---------------- +If this option is '1', the block layer will migrate request completions to the +cpu "group" that originally submitted the request. For some workloads this +provides a significant reduction in CPU cycles due to caching effects. + +For storage configurations that need to maximize distribution of completion +processing setting this option to '2' forces the completion to run on the +requesting cpu (bypassing the "group" aggregation logic). + +scheduler (RW) +-------------- +When read, this file will display the current and available IO schedulers +for this block device. The currently active IO scheduler will be enclosed +in [] brackets. Writing an IO scheduler name to this file will switch +control of this block device to that new IO scheduler. Note that writing +an IO scheduler name to this file will attempt to load that IO scheduler +module, if it isn't already present in the system. + +write_cache (RW) +---------------- +When read, this file will display whether the device has write back +caching enabled or not. It will return "write back" for the former +case, and "write through" for the latter. Writing to this file can +change the kernels view of the device, but it doesn't alter the +device state. This means that it might not be safe to toggle the +setting from "write back" to "write through", since that will also +eliminate cache flushes issued by the kernel. + +write_same_max_bytes (RO) +------------------------- +This is the number of bytes the device can write in a single write-same +command. A value of '0' means write-same is not supported by this +device. + +wb_lat_usec (RW) +---------------- +If the device is registered for writeback throttling, then this file shows +the target minimum read latency. If this latency is exceeded in a given +window of time (see wb_window_usec), then the writeback throttling will start +scaling back writes. Writing a value of '0' to this file disables the +feature. Writing a value of '-1' to this file resets the value to the +default setting. + +throttle_sample_time (RW) +------------------------- +This is the time window that blk-throttle samples data, in millisecond. +blk-throttle makes decision based on the samplings. Lower time means cgroups +have more smooth throughput, but higher CPU overhead. This exists only when +CONFIG_BLK_DEV_THROTTLING_LOW is enabled. + +Jens Axboe <jens.axboe@oracle.com>, February 2009 |