diff options
Diffstat (limited to 'doc/dev/crimson/pipeline.rst')
-rw-r--r-- | doc/dev/crimson/pipeline.rst | 97 |
1 files changed, 97 insertions, 0 deletions
diff --git a/doc/dev/crimson/pipeline.rst b/doc/dev/crimson/pipeline.rst new file mode 100644 index 000000000..e9115c6d7 --- /dev/null +++ b/doc/dev/crimson/pipeline.rst @@ -0,0 +1,97 @@ +============================== +The ``ClientRequest`` pipeline +============================== + +In crimson, exactly like in the classical OSD, a client request has data and +ordering dependencies which must be satisfied before processing (actually +a particular phase of) can begin. As one of the goals behind crimson is to +preserve the compatibility with the existing OSD incarnation, the same semantic +must be assured. An obvious example of such data dependency is the fact that +an OSD needs to have a version of OSDMap that matches the one used by the client +(``Message::get_min_epoch()``). + +If a dependency is not satisfied, the processing stops. It is crucial to note +the same must happen to all other requests that are sequenced-after (due to +their ordering requirements). + +There are a few cases when the blocking of a client request can happen. + + + ``ClientRequest::ConnectionPipeline::await_map`` + wait for particular OSDMap version is available at the OSD level + ``ClientRequest::ConnectionPipeline::get_pg`` + wait a particular PG becomes available on OSD + ``ClientRequest::PGPipeline::await_map`` + wait on a PG being advanced to particular epoch + ``ClientRequest::PGPipeline::wait_for_active`` + wait for a PG to become *active* (i.e. have ``is_active()`` asserted) + ``ClientRequest::PGPipeline::recover_missing`` + wait on an object to be recovered (i.e. leaving the ``missing`` set) + ``ClientRequest::PGPipeline::get_obc`` + wait on an object to be available for locking. The ``obc`` will be locked + before this operation is allowed to continue + ``ClientRequest::PGPipeline::process`` + wait if any other ``MOSDOp`` message is handled against this PG + +At any moment, a ``ClientRequest`` being served should be in one and only one +of the phases described above. Similarly, an object denoting particular phase +can host not more than a single ``ClientRequest`` the same time. At low-level +this is achieved with a combination of a barrier and an exclusive lock. +They implement the semantic of a semaphore with a single slot for these exclusive +phases. + +As the execution advances, request enters next phase and leaves the current one +freeing it for another ``ClientRequest`` instance. All these phases form a pipeline +which assures the order is preserved. + +These pipeline phases are divided into two ordering domains: ``ConnectionPipeline`` +and ``PGPipeline``. The former ensures order across a client connection while +the latter does that across a PG. That is, requests originating from the same +connection are executed in the same order as they were sent by the client. +The same applies to the PG domain: when requests from multiple connections reach +a PG, they are executed in the same order as they entered a first blocking phase +of the ``PGPipeline``. + +Comparison with the classical OSD +---------------------------------- +As the audience of this document are Ceph Developers, it seems reasonable to +match the phases of crimson's ``ClientRequest`` pipeline with the blocking +stages in the classical OSD. The names in the right column are names of +containers (lists and maps) used to implement these stages. They are also +already documented in the ``PG.h`` header. + ++----------------------------------------+--------------------------------------+ +| crimson | ceph-osd waiting list | ++========================================+======================================+ +|``ConnectionPipeline::await_map`` | ``OSDShardPGSlot::waiting`` and | +|``ConnectionPipeline::get_pg`` | ``OSDShardPGSlot::waiting_peering`` | ++----------------------------------------+--------------------------------------+ +|``PGPipeline::await_map`` | ``PG::waiting_for_map`` | ++----------------------------------------+--------------------------------------+ +|``PGPipeline::wait_for_active`` | ``PG::waiting_for_peered`` | +| +--------------------------------------+ +| | ``PG::waiting_for_flush`` | +| +--------------------------------------+ +| | ``PG::waiting_for_active`` | ++----------------------------------------+--------------------------------------+ +|To be done (``PG_STATE_LAGGY``) | ``PG::waiting_for_readable`` | ++----------------------------------------+--------------------------------------+ +|To be done | ``PG::waiting_for_scrub`` | ++----------------------------------------+--------------------------------------+ +|``PGPipeline::recover_missing`` | ``PG::waiting_for_unreadable_object``| +| +--------------------------------------+ +| | ``PG::waiting_for_degraded_object`` | ++----------------------------------------+--------------------------------------+ +|To be done (proxying) | ``PG::waiting_for_blocked_object`` | ++----------------------------------------+--------------------------------------+ +|``PGPipeline::get_obc`` | *obc rwlocks* | ++----------------------------------------+--------------------------------------+ +|``PGPipeline::process`` | ``PG::lock`` (roughly) | ++----------------------------------------+--------------------------------------+ + + +As the last word it might be worth to emphasize that the ordering implementations +in both classical OSD and in crimson are stricter than a theoretical minimum one +required by the RADOS protocol. For instance, we could parallelize read operations +targeting the same object at the price of extra complexity but we don't -- the +simplicity has won. |