summaryrefslogtreecommitdiffstats
path: root/doc/cephfs/disaster-recovery.rst
diff options
context:
space:
mode:
Diffstat (limited to 'doc/cephfs/disaster-recovery.rst')
-rw-r--r--doc/cephfs/disaster-recovery.rst61
1 files changed, 61 insertions, 0 deletions
diff --git a/doc/cephfs/disaster-recovery.rst b/doc/cephfs/disaster-recovery.rst
new file mode 100644
index 00000000..71344e90
--- /dev/null
+++ b/doc/cephfs/disaster-recovery.rst
@@ -0,0 +1,61 @@
+.. _cephfs-disaster-recovery:
+
+Disaster recovery
+=================
+
+Metadata damage and repair
+--------------------------
+
+If a filesystem has inconsistent or missing metadata, it is considered
+*damaged*. You may find out about damage from a health message, or in some
+unfortunate cases from an assertion in a running MDS daemon.
+
+Metadata damage can result either from data loss in the underlying RADOS
+layer (e.g. multiple disk failures that lose all copies of a PG), or from
+software bugs.
+
+CephFS includes some tools that may be able to recover a damaged filesystem,
+but to use them safely requires a solid understanding of CephFS internals.
+The documentation for these potentially dangerous operations is on a
+separate page: :ref:`disaster-recovery-experts`.
+
+Data pool damage (files affected by lost data PGs)
+--------------------------------------------------
+
+If a PG is lost in a *data* pool, then the filesystem will continue
+to operate normally, but some parts of some files will simply
+be missing (reads will return zeros).
+
+Losing a data PG may affect many files. Files are split into many objects,
+so identifying which files are affected by loss of particular PGs requires
+a full scan over all object IDs that may exist within the size of a file.
+This type of scan may be useful for identifying which files require
+restoring from a backup.
+
+.. danger::
+
+ This command does not repair any metadata, so when restoring files in
+ this case you must *remove* the damaged file, and replace it in order
+ to have a fresh inode. Do not overwrite damaged files in place.
+
+If you know that objects have been lost from PGs, use the ``pg_files``
+subcommand to scan for files that may have been damaged as a result:
+
+::
+
+ cephfs-data-scan pg_files <path> <pg id> [<pg id>...]
+
+For example, if you have lost data from PGs 1.4 and 4.5, and you would like
+to know which files under /home/bob might have been damaged:
+
+::
+
+ cephfs-data-scan pg_files /home/bob 1.4 4.5
+
+The output will be a list of paths to potentially damaged files, one
+per line.
+
+Note that this command acts as a normal CephFS client to find all the
+files in the filesystem and read their layouts, so the MDS must be
+up and running.
+