diff options
Diffstat (limited to 'doc/dev/osd_internals/pg_removal.rst')
-rw-r--r-- | doc/dev/osd_internals/pg_removal.rst | 56 |
1 files changed, 56 insertions, 0 deletions
diff --git a/doc/dev/osd_internals/pg_removal.rst b/doc/dev/osd_internals/pg_removal.rst new file mode 100644 index 000000000..c5fe0e1ab --- /dev/null +++ b/doc/dev/osd_internals/pg_removal.rst @@ -0,0 +1,56 @@ +========== +PG Removal +========== + +See OSD::_remove_pg, OSD::RemoveWQ + +There are two ways for a pg to be removed from an OSD: + + 1. MOSDPGRemove from the primary + 2. OSD::advance_map finds that the pool has been removed + +In either case, our general strategy for removing the pg is to +atomically set the metadata objects (pg->log_oid, pg->biginfo_oid) to +backfill and asynchronously remove the pg collections. We do not do +this inline because scanning the collections to remove the objects is +an expensive operation. + +OSDService::deleting_pgs tracks all pgs in the process of being +deleted. Each DeletingState object in deleting_pgs lives while at +least one reference to it remains. Each item in RemoveWQ carries a +reference to the DeletingState for the relevant pg such that +deleting_pgs.lookup(pgid) will return a null ref only if there are no +collections currently being deleted for that pg. + +The DeletingState for a pg also carries information about the status +of the current deletion and allows the deletion to be cancelled. +The possible states are: + + 1. QUEUED: the PG is in the RemoveWQ + 2. CLEARING_DIR: the PG's contents are being removed synchronously + 3. DELETING_DIR: the PG's directories and metadata being queued for removal + 4. DELETED_DIR: the final removal transaction has been queued + 5. CANCELED: the deletion has been cancelled + +In 1 and 2, the deletion can be cancelled. Each state transition +method (and check_canceled) returns false if deletion has been +cancelled and true if the state transition was successful. Similarly, +try_stop_deletion() returns true if it succeeds in cancelling the +deletion. Additionally, try_stop_deletion() in the event that it +fails to stop the deletion will not return until the final removal +transaction is queued. This ensures that any operations queued after +that point will be ordered after the pg deletion. + +OSD::_create_lock_pg must handle two cases: + + 1. Either there is no DeletingStateRef for the pg, or it failed to cancel + 2. We succeeded in cancelling the deletion. + +In case 1., we proceed as if there were no deletion occurring, except that +we avoid writing to the PG until the deletion finishes. In case 2., we +proceed as in case 1., except that we first mark the PG as backfilling. + +Similarly, OSD::osr_registry ensures that the OpSequencers for those +pgs can be reused for a new pg if created before the old one is fully +removed, ensuring that operations on the new pg are sequenced properly +with respect to operations on the old one. |