diff options
author | Daniel Baumann <daniel.baumann@progress-linux.org> | 2024-04-07 18:45:59 +0000 |
---|---|---|
committer | Daniel Baumann <daniel.baumann@progress-linux.org> | 2024-04-07 18:45:59 +0000 |
commit | 19fcec84d8d7d21e796c7624e521b60d28ee21ed (patch) | |
tree | 42d26aa27d1e3f7c0b8bd3fd14e7d7082f5008dc /src/doc/dynamic-throttle.txt | |
parent | Initial commit. (diff) | |
download | ceph-upstream.tar.xz ceph-upstream.zip |
Adding upstream version 16.2.11+ds.upstream/16.2.11+dsupstream
Signed-off-by: Daniel Baumann <daniel.baumann@progress-linux.org>
Diffstat (limited to 'src/doc/dynamic-throttle.txt')
-rw-r--r-- | src/doc/dynamic-throttle.txt | 127 |
1 files changed, 127 insertions, 0 deletions
diff --git a/src/doc/dynamic-throttle.txt b/src/doc/dynamic-throttle.txt new file mode 100644 index 000000000..39ce34506 --- /dev/null +++ b/src/doc/dynamic-throttle.txt @@ -0,0 +1,127 @@ +------ +TOPIC: +------ +Dynamic backoff throttle is been introduced in the Jewel timeframe to produce +a stable and improved performance out from filestore. This should also improve +the average and 99th latency significantly. + +----------- +WHAT IS IT? +----------- +The old throttle scheme in the filestore is to allow all the ios till the +outstanding io/bytes reached some limit. Once crossed the threshold, it will not +allow any more io to go through till the outstanding ios/bytes goes below the +threshold. That's why once it is crossed the threshold, the write behavior +becomes very spiky. +The idea of the new throttle is, based on some user configurable parameters +it can start throttling (inducing delays) early and with proper parameter +value we can prevent the outstanding io/bytes to reach to the threshold mark. +This dynamic backoff throttle is implemented in the following scenarios within +filestore. + + 1. Filestore op worker queue holds the transactions those are yet to be applied. + This queue needs to be bounded and a backoff throttle is used to gradually + induce delays to the queueing threads if the ratio of current outstanding + bytes|ops and filestore_queue_max_(bytes|ops) is more than + filestore_queue_low_threshhold. The throttles will block io once the + outstanding bytes|ops reaches max value (filestore_queue_max_(bytes|ops)). + + User configurable config option for adjusting delay is the following. + + filestore_queue_low_threshhold: + Valid values should be between 0-1 and shouldn't be > *_high_threshold. + + filestore_queue_high_threshhold: + Valid values should be between 0-1 and shouldn't be < *_low_threshold. + + filestore_expected_throughput_ops: + Shouldn't be less than or equal to 0. + + filestore_expected_throughput_bytes: + Shouldn't be less than or equal to 0. + + filestore_queue_high_delay_multiple: + Shouldn't be less than 0 and shouldn't be > *_max_delay_multiple. + + filestore_queue_max_delay_multiple: + Shouldn't be less than 0 and shouldn't be < *_high_delay_multiple + + 2. journal usage throttle is implemented to gradually slow down queue_transactions + callers as the journal fills up. We don't want journal to become full as this + will again induce spiky behavior. The configs work very similarly to the + Filestore op worker queue throttle. + + journal_throttle_low_threshhold + journal_throttle_high_threshhold + filestore_expected_throughput_ops + filestore_expected_throughput_bytes + journal_throttle_high_multiple + journal_throttle_max_multiple + +This scheme will not be inducing any delay between [0, low_threshold]. +In [low_threshold, high_threshold), delays should be injected based on a line +from 0 at low_threshold to high_multiple * (1/expected_throughput) at high_threshold. +In [high_threshold, 1), we want delays injected based on a line from +(high_multiple * (1/expected_throughput)) at high_threshold to +(high_multiple * (1/expected_throughput)) + (max_multiple * (1/expected_throughput)) +at 1. +Now setting *_high_multiple and *_max_multiple to 0 should give us similar effect +like old throttle (this is default). Setting filestore_queue_max_ops and +filestore_queue_max_bytes to zero should disable the entire backoff throttle. + +------------------------- +HOW TO PICK PROPER VALUE ? +------------------------- + +General guideline is the following. + + filestore_queue_max_bytes and filestore_queue_max_ops: + ----------------------------------------------------- + + This directly depends on how filestore op_wq will be growing in the memory. So, + user needs to calculate how much memory he can give to each OSD. Bigger value + meaning current/max ratio will be smaller and throttle will be relaxed a bit. + This could help in case of 100% write but in case of mixed read/write this can + induce more latencies for reads on the objects those are not yet been applied + i.e yet to reach in it's final filesystem location. Ideally, once we reach the + max throughput of the backend, increasing further will only increase memory + usage and latency. + + filestore_expected_throughput_bytes and filestore_expected_throughput_ops: + ------------------------------------------------------------------------- + + This directly co-relates to how much delay needs to be induced per throttle. + Ideally, this should be a bit above (or same) your backend device throughput. + + *_low_threshhold *_high_threshhold : + ---------------------------------- + + If your backend is fast you may want to set this threshold value higher so that + initial throttle would be less and it can give a chance to back end transaction + to catch up. + + *_high_multiple and *_max_multiple: + ---------------------------------- + + This is the delay tunables and one should do all effort here so that the current + doesn't reach to the max. For HDD and smaller block size, the old throttle may + make sense, in that case we should set these delay multiple to 0. + +It is recommended that user should use ceph_smalliobenchfs tool to see the effect +of these parameters (and decide) on your HW before deploying OSDs. + +Need to supply the following parameters. + + 1. --journal-path + + 2. --filestore-path + (Just mount the empty data partition and give the path here) + + 3. --op-dump-file + +'ceph_smalliobenchfs --help' will show all the options that user can tweak. +In case user wants to supply any ceph.conf parameter related to filestore, +it can be done by adding '--' in front, for ex. --debug_filestore + +Once test is done, analyze the throughput and latency information dumped in the +file provided in the option --op-dump-file. |