========================
 mClock Config Reference
========================

.. index:: mclock; configuration

QoS support in Ceph is implemented using a queuing scheduler based on `the
dmClock algorithm`_. See the :ref:`dmclock-qos` section for more details.

To make the usage of mclock more user-friendly and intuitive, mclock config
profiles are introduced. The mclock profiles mask the low-level details from
users, making it easier to configure and use mclock.

The following input parameters are required for an mclock profile to configure
the QoS-related parameters:

* total capacity (IOPS) of each OSD (determined automatically -
  see `OSD Capacity Determination (Automated)`_)

* the max sequential bandwidth capacity (MiB/s) of each OSD -
  see the *osd_mclock_max_sequential_bandwidth_[hdd|ssd]* option

* an mclock profile type to enable

Using the settings in the specified profile, an OSD determines and applies the
lower-level mclock and Ceph parameters. The parameters applied by the mclock
profile make it possible to tune the QoS between client I/O and background
operations in the OSD.


.. index:: mclock; mclock clients

mClock Client Types
===================

The mclock scheduler handles requests from different types of Ceph services.
Each service can be considered as a type of client from mclock's perspective.
Depending on the type of requests handled, mclock clients are classified into
the buckets shown in the table below:

+------------------------+--------------------------------------------------------------+
| Client Type            | Request Types                                                |
+========================+==============================================================+
| Client                 | I/O requests issued by external clients of Ceph              |
+------------------------+--------------------------------------------------------------+
| Background recovery    | Internal recovery requests                                   |
+------------------------+--------------------------------------------------------------+
| Background best-effort | Internal backfill, scrub, snap trim and PG deletion requests |
+------------------------+--------------------------------------------------------------+

The mclock profiles allocate parameters like reservation, weight and limit
(see :ref:`dmclock-qos`) differently for each client type. The next sections
describe the mclock profiles in greater detail.


.. index:: mclock; profile definition

mClock Profiles - Definition and Purpose
========================================

An mclock profile is *“a configuration setting that, when applied to a running
Ceph cluster, enables the throttling of the operations (IOPS) belonging to
different client classes (background recovery, scrub, snaptrim, client op,
osd subop)”*.

The mclock profile uses the capacity limits and the mclock profile type selected
by the user to determine the low-level mclock resource control configuration
parameters and apply them transparently. Additionally, other Ceph configuration
parameters are also applied. Please see the sections below for more information.

The low-level mclock resource control parameters are the *reservation*,
*limit*, and *weight* that provide control of the resource shares, as
described in the :ref:`dmclock-qos` section.
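
Before applying a profile, it can be useful to confirm that an OSD is actually
using the mclock scheduler and to check which profile is currently in effect.
As a minimal sketch (the OSD id is illustrative), the following commands may be
used:

  .. prompt:: bash #

     ceph config show osd.0 osd_op_queue

  .. prompt:: bash #

     ceph config show osd.0 osd_mclock_profile

The first command is expected to report ``mclock_scheduler`` when the mclock
scheduler is active, and the second reports the profile the OSD is currently
running with.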


.. index:: mclock; profile types

mClock Profile Types
====================

mclock profiles can be broadly classified into *built-in* and *custom* profiles.

Built-in Profiles
-----------------
Users can choose between the following built-in profile types:

.. note:: The values mentioned in the tables below represent the proportion
   of the total IOPS capacity of the OSD allocated for the service type.

* balanced (default)
* high_client_ops
* high_recovery_ops

balanced (*default*)
^^^^^^^^^^^^^^^^^^^^
The *balanced* profile is the default mClock profile. This profile allocates
equal reservation/priority to client operations and background recovery
operations. Background best-effort ops are given a lower reservation and
therefore take a longer time to complete when there are competing operations.
This profile helps meet the normal/steady-state requirements of the cluster.
This is the case when the external client performance requirement is not
critical and there are other background operations that still need attention
within the OSD.

But there might be instances that necessitate giving higher allocations to
either client ops or recovery ops. In order to deal with such a situation, the
alternate built-in profiles may be enabled by following the steps mentioned in
the next sections.

+------------------------+-------------+--------+-------+
| Service Type           | Reservation | Weight | Limit |
+========================+=============+========+=======+
| client                 | 50%         | 1      | MAX   |
+------------------------+-------------+--------+-------+
| background recovery    | 50%         | 1      | MAX   |
+------------------------+-------------+--------+-------+
| background best-effort | MIN         | 1      | 90%   |
+------------------------+-------------+--------+-------+

high_client_ops
^^^^^^^^^^^^^^^
This profile optimizes client performance over background activities by
allocating more reservation and limit to client operations as compared to
background operations in the OSD. This profile, for example, may be enabled
to provide the needed performance for I/O intensive applications for a
sustained period of time at the cost of slower recoveries. The table shows
the resource control parameters set by the profile:

+------------------------+-------------+--------+-------+
| Service Type           | Reservation | Weight | Limit |
+========================+=============+========+=======+
| client                 | 60%         | 2      | MAX   |
+------------------------+-------------+--------+-------+
| background recovery    | 40%         | 1      | MAX   |
+------------------------+-------------+--------+-------+
| background best-effort | MIN         | 1      | 70%   |
+------------------------+-------------+--------+-------+

high_recovery_ops
^^^^^^^^^^^^^^^^^
This profile optimizes background recovery performance as compared to external
clients and other background operations within the OSD. This profile, for
example, may be enabled by an administrator temporarily to speed up background
recoveries during non-peak hours. The table shows the resource control
parameters set by the profile:

+------------------------+-------------+--------+-------+
| Service Type           | Reservation | Weight | Limit |
+========================+=============+========+=======+
| client                 | 30%         | 1      | MAX   |
+------------------------+-------------+--------+-------+
| background recovery    | 70%         | 2      | MAX   |
+------------------------+-------------+--------+-------+
| background best-effort | MIN         | 1      | MAX   |
+------------------------+-------------+--------+-------+

.. note:: Across the built-in profiles, internal background best-effort clients
   of mclock include "backfill", "scrub", "snap trim", and "pg deletion"
   operations.
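
Although the low-level parameters derived from a built-in profile cannot be
modified, they can still be inspected. For example, the running reservation
values reported by an OSD should reflect the active profile (osd.0 below is
illustrative):

  .. prompt:: bash #

     ceph config show osd.0 osd_mclock_scheduler_client_res

  .. prompt:: bash #

     ceph config show osd.0 osd_mclock_scheduler_background_recovery_res

This can help verify that a newly enabled profile has taken effect before
running workloads against the cluster.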


Custom Profile
--------------
This profile gives users complete control over all the mclock configuration
parameters. This profile should be used with caution and is meant for advanced
users who understand mclock and Ceph-related configuration options.


.. index:: mclock; built-in profiles

mClock Built-in Profiles - Locked Config Options
=================================================
The sections below describe the config options that are locked to certain
values in order to ensure that the mClock scheduler is able to provide
predictable QoS.

mClock Config Options
---------------------
.. important:: These defaults cannot be changed using any of the config
   subsystem commands like *config set* or via the *config daemon* or *config
   tell* interfaces. Although these commands report success, the mclock
   QoS parameters are reverted to their respective built-in profile defaults.

When a built-in profile is enabled, the mClock scheduler calculates the
low-level mclock parameters [*reservation*, *weight*, *limit*] based on the
profile enabled for each client type. The mclock parameters are calculated
based on the max OSD capacity provided beforehand. As a result, the following
mclock config parameters cannot be modified when using any of the built-in
profiles:

- :confval:`osd_mclock_scheduler_client_res`
- :confval:`osd_mclock_scheduler_client_wgt`
- :confval:`osd_mclock_scheduler_client_lim`
- :confval:`osd_mclock_scheduler_background_recovery_res`
- :confval:`osd_mclock_scheduler_background_recovery_wgt`
- :confval:`osd_mclock_scheduler_background_recovery_lim`
- :confval:`osd_mclock_scheduler_background_best_effort_res`
- :confval:`osd_mclock_scheduler_background_best_effort_wgt`
- :confval:`osd_mclock_scheduler_background_best_effort_lim`

Recovery/Backfill Options
-------------------------
.. warning:: The recommendation is to not change these options as the built-in
   profiles are optimized based on them. Changing these defaults can result in
   unexpected performance outcomes.

The following recovery and backfill related Ceph options are overridden to
mClock defaults:

- :confval:`osd_max_backfills`
- :confval:`osd_recovery_max_active`
- :confval:`osd_recovery_max_active_hdd`
- :confval:`osd_recovery_max_active_ssd`

The following table shows the mClock defaults, which are the same as the
current defaults. This is done to maximize the performance of foreground
(client) operations:

+----------------------------------------+------------------+----------------+
| Config Option                          | Original Default | mClock Default |
+========================================+==================+================+
| :confval:`osd_max_backfills`           | 1                | 1              |
+----------------------------------------+------------------+----------------+
| :confval:`osd_recovery_max_active`     | 0                | 0              |
+----------------------------------------+------------------+----------------+
| :confval:`osd_recovery_max_active_hdd` | 3                | 3              |
+----------------------------------------+------------------+----------------+
| :confval:`osd_recovery_max_active_ssd` | 10               | 10             |
+----------------------------------------+------------------+----------------+

The above mClock defaults can be modified, if necessary, by enabling
:confval:`osd_mclock_override_recovery_settings` (default: false). The steps
for this are discussed in the
`Steps to Modify mClock Max Backfills/Recovery Limits`_ section.
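
To confirm the backfill and recovery limits an OSD is actually running with
(for example, after switching profiles), the running configuration can be
inspected. The OSD id below is illustrative:

  .. prompt:: bash #

     ceph config show osd.0 | grep -E 'osd_max_backfills|osd_recovery_max_active'

Under a built-in profile, the values reported should match the mClock defaults
listed in the table above, unless
:confval:`osd_mclock_override_recovery_settings` has been enabled.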

Sleep Options
-------------
If any mClock profile (including "custom") is active, the following Ceph config
sleep options are disabled (set to 0):

- :confval:`osd_recovery_sleep`
- :confval:`osd_recovery_sleep_hdd`
- :confval:`osd_recovery_sleep_ssd`
- :confval:`osd_recovery_sleep_hybrid`
- :confval:`osd_scrub_sleep`
- :confval:`osd_delete_sleep`
- :confval:`osd_delete_sleep_hdd`
- :confval:`osd_delete_sleep_ssd`
- :confval:`osd_delete_sleep_hybrid`
- :confval:`osd_snap_trim_sleep`
- :confval:`osd_snap_trim_sleep_hdd`
- :confval:`osd_snap_trim_sleep_ssd`
- :confval:`osd_snap_trim_sleep_hybrid`

The above sleep options are disabled to ensure that the mclock scheduler is
able to determine when to pick the next op from its operation queue and
transfer it to the operation sequencer. This results in the desired QoS being
provided across all its clients.


.. index:: mclock; enable built-in profile

Steps to Enable mClock Profile
==============================

As already mentioned, the default mclock profile is set to *balanced*.
The other values for the built-in profiles include *high_client_ops* and
*high_recovery_ops*.

If there is a requirement to change the default profile, then the option
:confval:`osd_mclock_profile` may be set during runtime by using the following
command:

  .. prompt:: bash #

     ceph config set osd.N osd_mclock_profile <value>

For example, to change the profile to allow faster recoveries on "osd.0", the
following command can be used to switch to the *high_recovery_ops* profile:

  .. prompt:: bash #

     ceph config set osd.0 osd_mclock_profile high_recovery_ops

.. note:: The *custom* profile is not recommended unless you are an advanced
   user.

And that's it! You are ready to run workloads on the cluster and check if the
QoS requirements are being met.


Switching Between Built-in and Custom Profiles
==============================================

There may be situations requiring switching from a built-in profile to the
*custom* profile and vice-versa. The following sections outline the steps to
accomplish this.

Steps to Switch From a Built-in to the Custom Profile
-----------------------------------------------------

The following command can be used to switch all OSDs to the *custom* profile:

  .. prompt:: bash #

     ceph config set osd osd_mclock_profile custom

After switching to the *custom* profile, the desired mClock configuration
option may be modified. For example, to change the client reservation IOPS
ratio for a specific OSD (say osd.0) to 0.5 (or 50%), the following command
can be used:

  .. prompt:: bash #

     ceph config set osd.0 osd_mclock_scheduler_client_res 0.5

.. important:: Care must be taken to change the reservations of other services
   like recovery and background best effort accordingly, to ensure that the sum
   of the reservations does not exceed the maximum proportion (1.0) of the IOPS
   capacity of the OSD.

.. tip:: The reservation and limit parameter allocations are per-shard based on
   the type of backing device (HDD/SSD) under the OSD. See
   :confval:`osd_op_num_shards_hdd` and :confval:`osd_op_num_shards_ssd` for
   more details.
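
As an illustration of the reservation-sum requirement above, if the client
reservation is raised to 0.5 on osd.0, the reservations of the remaining
services could be adjusted so that the total stays within 1.0. The split shown
below (0.5 + 0.4 + 0.1) is only an example and not a recommended allocation:

  .. prompt:: bash #

     ceph config set osd.0 osd_mclock_scheduler_background_recovery_res 0.4

  .. prompt:: bash #

     ceph config set osd.0 osd_mclock_scheduler_background_best_effort_res 0.1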

Steps to Switch From the Custom Profile to a Built-in Profile
--------------------------------------------------------------

Switching from the *custom* profile to a built-in profile requires an
intermediate step of removing the custom settings from the central config
database for the changes to take effect.

The following sequence of commands can be used to switch to a built-in profile:

#. Set the desired built-in profile using:

   .. prompt:: bash #

      ceph config set osd osd_mclock_profile <built-in profile>

   For example, to set the built-in profile to ``high_client_ops`` on all
   OSDs, run the following command:

   .. prompt:: bash #

      ceph config set osd osd_mclock_profile high_client_ops

#. Determine the existing custom mClock configuration settings in the central
   config database using the following command:

   .. prompt:: bash #

      ceph config dump

#. Remove the custom mClock configuration settings determined in the previous
   step from the central config database:

   .. prompt:: bash #

      ceph config rm osd <mClock Configuration Option>

   For example, to remove the configuration option
   :confval:`osd_mclock_scheduler_client_res` that was set on all OSDs, run the
   following command:

   .. prompt:: bash #

      ceph config rm osd osd_mclock_scheduler_client_res

#. After all existing custom mClock configuration settings have been removed
   from the central config database, the configuration settings pertaining to
   ``high_client_ops`` will come into effect. For example, to verify the
   settings on osd.0, use:

   .. prompt:: bash #

      ceph config show osd.0

Switch Temporarily Between mClock Profiles
------------------------------------------

To switch between mClock profiles on a temporary basis, the following commands
may be used to override the settings:

.. warning:: This section is for advanced users or for experimental testing.
   The recommendation is to not use the below commands on a running cluster as
   they could have unexpected outcomes.

.. note:: The configuration changes on an OSD using the below commands are
   ephemeral and are lost when it restarts. It is also important to note that
   the config options overridden using the below commands cannot be modified
   further using the *ceph config set osd.N ...* command. The changes will not
   take effect until a given OSD is restarted. This is intentional, as per the
   config subsystem design. However, any further modification can still be made
   ephemerally using the commands mentioned below.

#. Run the *injectargs* command as shown to override the mclock settings:

   .. prompt:: bash #

      ceph tell osd.N injectargs '--<mClock Configuration Option>=<value>'

   For example, the following command overrides the
   :confval:`osd_mclock_profile` option on osd.0:

   .. prompt:: bash #

      ceph tell osd.0 injectargs '--osd_mclock_profile=high_recovery_ops'

#. An alternate command that can be used is:

   .. prompt:: bash #

      ceph daemon osd.N config set <mClock Configuration Option> <value>

   For example, the following command overrides the
   :confval:`osd_mclock_profile` option on osd.0:

   .. prompt:: bash #

      ceph daemon osd.0 config set osd_mclock_profile high_recovery_ops

The individual QoS-related config options for the *custom* profile can also be
modified ephemerally using the above commands.
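
For example, while the *custom* profile is active, a QoS option such as the
client reservation could be overridden ephemerally on osd.0 in the same way
(the value shown is illustrative):

  .. prompt:: bash #

     ceph tell osd.0 injectargs '--osd_mclock_scheduler_client_res=0.5'

As with the profile override above, such a change is lost when the OSD
restarts.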


Steps to Modify mClock Max Backfills/Recovery Limits
====================================================

This section describes the steps to modify the default max backfills or
recovery limits if the need arises.

.. warning:: This section is for advanced users or for experimental testing.
   The recommendation is to retain the defaults as is on a running cluster as
   modifying them could have unexpected performance outcomes. The values may
   be modified only if the cluster is unable to cope with the default settings
   or shows poor performance, or for performing experiments on a test cluster.

.. important:: The max backfill/recovery options that can be modified are
   listed in section `Recovery/Backfill Options`_. The modification of the
   mClock default backfills/recovery limit is gated by the
   :confval:`osd_mclock_override_recovery_settings` option, which is set to
   *false* by default. Attempting to modify any default recovery/backfill
   limits without setting the gating option will reset that option back to the
   mClock defaults along with a warning message logged in the cluster log. Note
   that it may take a few seconds for the default value to come back into
   effect. Verify the limit using the *config show* command as shown below.

#. Set the :confval:`osd_mclock_override_recovery_settings` config option on
   all OSDs to *true* using:

   .. prompt:: bash #

      ceph config set osd osd_mclock_override_recovery_settings true

#. Set the desired max backfill/recovery option using:

   .. prompt:: bash #

      ceph config set osd osd_max_backfills <value>

   For example, the following command modifies the :confval:`osd_max_backfills`
   option on all OSDs to 5.

   .. prompt:: bash #

      ceph config set osd osd_max_backfills 5

#. Wait for a few seconds and verify the running configuration for a specific
   OSD using:

   .. prompt:: bash #

      ceph config show osd.N | grep osd_max_backfills

   For example, the following command shows the running configuration of
   :confval:`osd_max_backfills` on osd.0.

   .. prompt:: bash #

      ceph config show osd.0 | grep osd_max_backfills

#. Reset the :confval:`osd_mclock_override_recovery_settings` config option on
   all OSDs to *false* using:

   .. prompt:: bash #

      ceph config set osd osd_mclock_override_recovery_settings false
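
Optionally, the modified limit recorded in the central config database can also
be confirmed across OSDs. Note that ``ceph config dump`` shows what is stored
centrally, whereas ``ceph config show`` (used above) reports the value a daemon
is actually running with:

  .. prompt:: bash #

     ceph config dump | grep osd_max_backfills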


OSD Capacity Determination (Automated)
======================================

The OSD capacity in terms of total IOPS is determined automatically during OSD
initialization. This is achieved by running the OSD bench tool and overriding
the default value of the ``osd_mclock_max_capacity_iops_[hdd, ssd]`` option
depending on the device type. No other action/input is expected from the user
to set the OSD capacity.

.. note:: If you wish to manually benchmark OSD(s) or manually tune the
   Bluestore throttle parameters, see section
   `Steps to Manually Benchmark an OSD (Optional)`_.

You may verify the capacity of an OSD after the cluster is brought up by using
the following command:

  .. prompt:: bash #

     ceph config show osd.N osd_mclock_max_capacity_iops_[hdd, ssd]

For example, the following command shows the max capacity for "osd.0" on a Ceph
node whose underlying device type is SSD:

  .. prompt:: bash #

     ceph config show osd.0 osd_mclock_max_capacity_iops_ssd

Mitigation of Unrealistic OSD Capacity From Automated Test
----------------------------------------------------------
In certain conditions, the OSD bench tool may show unrealistic/inflated results
depending on the drive configuration and other environment related conditions.
To mitigate the performance impact due to this unrealistic capacity, a couple
of threshold config options depending on the OSD's device type are defined and
used:

- :confval:`osd_mclock_iops_capacity_threshold_hdd` = 500
- :confval:`osd_mclock_iops_capacity_threshold_ssd` = 80000

The following automated step is performed:

Fallback to using default OSD capacity (automated)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
If OSD bench reports a measurement that exceeds the above threshold values
depending on the underlying device type, the fallback mechanism reverts to the
default value of :confval:`osd_mclock_max_capacity_iops_hdd` or
:confval:`osd_mclock_max_capacity_iops_ssd`. The threshold config options
can be reconfigured based on the type of drive used. Additionally, a cluster
warning is logged in case the measurement exceeds the threshold. For example, ::

   2022-10-27T15:30:23.270+0000 7f9b5dbe95c0  0 log_channel(cluster) log [WRN]
   : OSD bench result of 39546.479392 IOPS exceeded the threshold limit of
   25000.000000 IOPS for osd.1. IOPS capacity is unchanged at 21500.000000
   IOPS. The recommendation is to establish the osd's IOPS capacity using other
   benchmark tools (e.g. Fio) and then override
   osd_mclock_max_capacity_iops_[hdd|ssd].

If the default capacity doesn't accurately represent the OSD's capacity, the
following additional step is recommended to address this:

Run custom drive benchmark if defaults are not accurate (manual)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
If the default OSD capacity is not accurate, the recommendation is to run a
custom benchmark using your preferred tool (e.g. Fio) on the drive and then
override the ``osd_mclock_max_capacity_iops_[hdd, ssd]`` option as described
in the `Specifying Max OSD Capacity`_ section.

This step is highly recommended until an alternative mechanism is available.
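
Alternatively, if the drive is genuinely capable of exceeding the default
threshold (for example, a fast NVMe device), the threshold itself can be raised
so that a higher measured capacity is accepted. The value below is illustrative
only and should be chosen based on the drive's specification:

  .. prompt:: bash #

     ceph config set osd osd_mclock_iops_capacity_threshold_ssd 120000

The new threshold is only consulted when the OSD benchmark runs again (for
example, on a subsequent OSD restart, possibly with
:confval:`osd_mclock_force_run_benchmark_on_init` set to true beforehand).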

Steps to Manually Benchmark an OSD (Optional)
=============================================

.. note:: These steps are only necessary if you want to override the OSD
   capacity already determined automatically during OSD initialization.
   Otherwise, you may skip this section entirely.

.. tip:: If you have already determined the benchmark data and wish to manually
   override the max osd capacity for an OSD, you may skip to section
   `Specifying Max OSD Capacity`_.


Any existing benchmarking tool (e.g. Fio) can be used for this purpose.
However, the steps here use the *Ceph OSD Bench* command described in the next
section. Regardless of the tool/command used, the steps outlined further below
remain the same.

As already described in the :ref:`dmclock-qos` section, the number of
shards and the bluestore throttle parameters have an impact on the mclock op
queues. Therefore, it is critical to set these values carefully in order to
maximize the impact of the mclock scheduler.

:Number of Operational Shards:
  We recommend using the default number of shards as defined by the
  configuration options ``osd_op_num_shards``, ``osd_op_num_shards_hdd``, and
  ``osd_op_num_shards_ssd``. In general, a lower number of shards will increase
  the impact of the mclock queues.

:Bluestore Throttle Parameters:
  We recommend using the default values as defined by
  :confval:`bluestore_throttle_bytes` and
  :confval:`bluestore_throttle_deferred_bytes`. However, these parameters may
  also be determined during the benchmarking phase as described below.

OSD Bench Command Syntax
------------------------

The :ref:`osd-subsystem` section describes the OSD bench command. The syntax
used for benchmarking is shown below:

.. prompt:: bash #

   ceph tell osd.N bench [TOTAL_BYTES] [BYTES_PER_WRITE] [OBJ_SIZE] [NUM_OBJS]

where,

* ``TOTAL_BYTES``: Total number of bytes to write
* ``BYTES_PER_WRITE``: Block size per write
* ``OBJ_SIZE``: Bytes per object
* ``NUM_OBJS``: Number of objects to write

Benchmarking Test Steps Using OSD Bench
---------------------------------------

The steps below use the default number of shards and detail how to determine
the correct bluestore throttle values (optional).

#. Bring up your Ceph cluster and log in to the Ceph node hosting the OSDs that
   you wish to benchmark.
#. Run a simple 4KiB random write workload on an OSD using the following
   commands:

   .. note:: Before running the test, caches must be cleared to get an
      accurate measurement.

   For example, if you are running the benchmark test on osd.0, run the
   following commands:

   .. prompt:: bash #

      ceph tell osd.0 cache drop

   .. prompt:: bash #

      ceph tell osd.0 bench 12288000 4096 4194304 100

#. Note the overall throughput (IOPS) obtained from the output of the osd bench
   command. This value is the baseline throughput (IOPS) when the default
   bluestore throttle options are in effect.
#. If the intent is to determine the bluestore throttle values for your
   environment, then set the two options, :confval:`bluestore_throttle_bytes`
   and :confval:`bluestore_throttle_deferred_bytes`, to 32 KiB (32768 bytes)
   each to begin with. Otherwise, you may skip to the next section.
#. Run the 4KiB random write test as before using OSD bench.
#. Note the overall throughput from the output and compare the value
   against the baseline throughput recorded in step 3.
#. If the throughput doesn't match the baseline, increment the bluestore
   throttle options by 2x and repeat steps 5 through 7 until the obtained
   throughput is very close to the baseline value.

For example, during benchmarking on a machine with NVMe SSDs, a value of 256 KiB
for both bluestore throttle and deferred bytes was determined to maximize the
impact of mclock. For HDDs, the corresponding value was 40 MiB, where the
overall throughput was roughly equal to the baseline throughput. Note that in
general for HDDs, the bluestore throttle values are expected to be higher when
compared to SSDs.
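
If the benchmarking exercise above settled on specific bluestore throttle
values, they can then be applied persistently. The 256 KiB values below simply
mirror the NVMe example mentioned above and are not a general recommendation:

  .. prompt:: bash #

     ceph config set osd.N bluestore_throttle_bytes 262144

  .. prompt:: bash #

     ceph config set osd.N bluestore_throttle_deferred_bytes 262144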

Specifying Max OSD Capacity
----------------------------

The steps in this section may be performed only if you want to override the
max osd capacity automatically set during OSD initialization. The option
``osd_mclock_max_capacity_iops_[hdd, ssd]`` for an OSD can be set by running the
following command:

  .. prompt:: bash #

     ceph config set osd.N osd_mclock_max_capacity_iops_[hdd,ssd] <value>

For example, the following command sets the max capacity for a specific OSD
(say "osd.0") whose underlying device type is HDD to 350 IOPS:

  .. prompt:: bash #

     ceph config set osd.0 osd_mclock_max_capacity_iops_hdd 350

Alternatively, you may specify the max capacity for OSDs within the Ceph
configuration file under the respective [osd.N] section. See
:ref:`ceph-conf-settings` for more details.


.. index:: mclock; config settings

mClock Config Options
=====================

.. confval:: osd_mclock_profile
.. confval:: osd_mclock_max_capacity_iops_hdd
.. confval:: osd_mclock_max_capacity_iops_ssd
.. confval:: osd_mclock_max_sequential_bandwidth_hdd
.. confval:: osd_mclock_max_sequential_bandwidth_ssd
.. confval:: osd_mclock_force_run_benchmark_on_init
.. confval:: osd_mclock_skip_benchmark
.. confval:: osd_mclock_override_recovery_settings
.. confval:: osd_mclock_iops_capacity_threshold_hdd
.. confval:: osd_mclock_iops_capacity_threshold_ssd

.. _the dmClock algorithm: https://www.usenix.org/legacy/event/osdi10/tech/full_papers/Gulati.pdf