summaryrefslogtreecommitdiffstats
path: root/doc/radosgw/orphans.rst
diff options
context:
space:
mode:
Diffstat (limited to 'doc/radosgw/orphans.rst')
-rw-r--r--doc/radosgw/orphans.rst117
1 files changed, 117 insertions, 0 deletions
diff --git a/doc/radosgw/orphans.rst b/doc/radosgw/orphans.rst
new file mode 100644
index 000000000..bf6b10edf
--- /dev/null
+++ b/doc/radosgw/orphans.rst
@@ -0,0 +1,117 @@
+==================================
+Orphan List and Associated Tooling
+==================================
+
+.. version added:: Luminous
+
+.. contents::
+
+Orphans are RADOS objects that are left behind after their associated
+RGW objects are removed. Normally these RADOS objects are removed
+automatically, either immediately or through a process known as
+"garbage collection". Over the history of RGW, however, there may have
+been bugs that prevented these RADOS objects from being deleted, and
+these RADOS objects may be consuming space on the Ceph cluster without
+being of any use. From the perspective of RGW, we call such RADOS
+objects "orphans".
+
+Orphans Find -- DEPRECATED
+--------------------------
+
+The `radosgw-admin` tool has/had three subcommands to help manage
+orphans, however these subcommands are (or will soon be)
+deprecated. These subcommands are:
+
+.. prompt:: bash #
+
+ radosgw-admin orphans find ...
+ radosgw-admin orphans finish ...
+ radosgw-admin orphans list-jobs ...
+
+There are two key problems with these subcommands, however. First,
+these subcommands have not been actively maintained and therefore have
+not tracked RGW as it has evolved in terms of features and updates. As
+a result the confidence that these subcommands can accurately identify
+true orphans is presently low.
+
+Second, these subcommands store intermediate results on the cluster
+itself. This can be problematic when cluster administrators are
+confronting insufficient storage space and want to remove orphans as a
+means of addressing the issue. The intermediate results could strain
+the existing cluster storage capacity even further.
+
+For these reasons "orphans find" has been deprecated.
+
+Orphan List
+-----------
+
+Because "orphans find" has been deprecated, RGW now includes an
+additional tool -- 'rgw-orphan-list'. When run it will list the
+available pools and prompt the user to enter the name of the data
+pool. At that point the tool will, perhaps after an extended period of
+time, produce a local file containing the RADOS objects from the
+designated pool that appear to be orphans. The administrator is free
+to examine this file and the decide on a course of action, perhaps
+removing those RADOS objects from the designated pool.
+
+All intermediate results are stored on the local file system rather
+than the Ceph cluster. So running the 'rgw-orphan-list' tool should
+have no appreciable impact on the amount of cluster storage consumed.
+
+WARNING: Experimental Status
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+The 'rgw-orphan-list' tool is new and therefore currently considered
+experimental. The list of orphans produced should be "sanity checked"
+before being used for a large delete operation.
+
+WARNING: Specifying a Data Pool
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+If a pool other than an RGW data pool is specified, the results of the
+tool will be erroneous. All RADOS objects found on such a pool will
+falsely be designated as orphans.
+
+WARNING: Unindexed Buckets
+~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+RGW allows for unindexed buckets, that is buckets that do not maintain
+an index of their contents. This is not a typical configuration, but
+it is supported. Because the 'rgw-orphan-list' tool uses the bucket
+indices to determine what RADOS objects should exist, objects in the
+unindexed buckets will falsely be listed as orphans.
+
+
+RADOS List
+----------
+
+One of the sub-steps in computing a list of orphans is to map each RGW
+object into its corresponding set of RADOS objects. This is done using
+a subcommand of 'radosgw-admin'.
+
+.. prompt:: bash #
+
+ radosgw-admin bucket radoslist [--bucket={bucket-name}]
+
+The subcommand will produce a list of RADOS objects that support all
+of the RGW objects. If a bucket is specified then the subcommand will
+only produce a list of RADOS objects that correspond back the RGW
+objects in the specified bucket.
+
+Note: Shared Bucket Markers
+~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Some administrators will be aware of the coding schemes used to name
+the RADOS objects that correspond to RGW objects, which include a
+"marker" unique to a given bucket.
+
+RADOS objects that correspond with the contents of one RGW bucket,
+however, may contain a marker that specifies a different bucket. This
+behavior is a consequence of the "shallow copy" optimization used by
+RGW. When larger objects are copied from bucket to bucket, only the
+"head" objects are actually copied, and the tail objects are
+shared. Those shared objects will contain the marker of the original
+bucket.
+
+.. _Data Layout in RADOS : ../layout
+.. _Pool Placement and Storage Classes : ../placement