diff --git a/doc/man/8/crushtool.rst b/doc/man/8/crushtool.rst
new file mode 100644
index 000000000..4c8486596
--- /dev/null
+++ b/doc/man/8/crushtool.rst
@@ -0,0 +1,302 @@
+:orphan:
+
+==========================================
+ crushtool -- CRUSH map manipulation tool
+==========================================
+
+.. program:: crushtool
+
+Synopsis
+========
+
+| **crushtool** ( -d *map* | -c *map.txt* | --build --num_osds *numosds*
+ *layer1* *...* | --test ) [ -o *outfile* ]
+
+
+Description
+===========
+
+**crushtool** is a utility that lets you create, compile, decompile
+and test CRUSH map files.
+
+CRUSH is a pseudo-random data distribution algorithm that efficiently
+maps input values (which, in the context of Ceph, correspond to Placement
+Groups) across a heterogeneous, hierarchically structured device map.
+The algorithm was originally described in detail in the following paper
+(although it has evolved some since then)::
+
+ http://www.ssrc.ucsc.edu/Papers/weil-sc06.pdf
+
+The tool has four modes of operation.
+
+.. option:: --compile|-c map.txt
+
+ will compile a plaintext map.txt into a binary map file.
+
+.. option:: --decompile|-d map
+
+ will take the compiled map and decompile it into a plaintext source
+ file, suitable for editing.
+
+.. option:: --build --num_osds {num-osds} layer1 ...
+
+  will create a map with the given layer structure. See below for a
+  detailed explanation.
+
+.. option:: --test
+
+ will perform a dry run of a CRUSH mapping for a range of input
+ values ``[--min-x,--max-x]`` (default ``[0,1023]``) which can be
+ thought of as simulated Placement Groups. See below for a more
+ detailed explanation.
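+
+For quick reference, typical invocations of the four modes might look
+like the following (``map.txt``, ``crushmap`` and the OSD count are
+placeholders)::
+
+    # compile plaintext source into a binary map
+    crushtool -c map.txt -o crushmap
+
+    # decompile a binary map back into editable source
+    crushtool -d crushmap -o map.txt
+
+    # build a small two-layer map for 16 OSDs
+    crushtool -o crushmap --build --num_osds 16 host straw 4 root straw 0
+
+    # dry-run the mappings produced by the rules in the map
+    crushtool -i crushmap --test --show-statistics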
+
+Unlike other Ceph tools, **crushtool** does not accept generic options
+such as **--debug-crush** from the command line. They can, however, be
+provided via the CEPH_ARGS environment variable. For instance, to
+silence all output from the CRUSH subsystem::
+
+ CEPH_ARGS="--debug-crush 0" crushtool ...
+
+
+Running tests with --test
+=========================
+
+The test mode will use the input crush map (as specified with **-i
+map**) and perform a dry run of CRUSH mapping or random placement
+(if **--simulate** is set). On completion, two kinds of reports can be
+created:
+
+1) The **--show-...** options output human-readable information
+   on stderr.
+2) The **--output-csv** option creates CSV files that are
+   documented by the **--help-output** option.
+
+Note: Each Placement Group (PG) has an integer ID which can be obtained
+from ``ceph pg dump`` (for example, PG 2.2f means pool id 2, PG id 0x2f = 47).
+The pool and PG IDs are combined by a function to get a value which is
+given to CRUSH to map it to OSDs. crushtool does not know about PGs or
+pools; it only runs simulations by mapping values in the range
+``[--min-x,--max-x]``.
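+
+For example, assuming a compiled map named ``crushmap`` in the current
+directory, a dry run over the first 100 simulated values might look
+like this::
+
+    crushtool -i crushmap --test \
+        --min-x 0 --max-x 99 \
+        --num-rep 3 \
+        --show-mappings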
+
+
+.. option:: --show-statistics
+
+ Displays a summary of the distribution. For instance::
+
+ rule 1 (metadata) num_rep 5 result size == 5: 1024/1024
+
+  shows that rule **1**, which is named **metadata**, successfully
+  mapped **1024** values to **result size == 5** devices when trying
+ to map them to **num_rep 5** replicas. When it fails to provide the
+ required mapping, presumably because the number of **tries** must
+ be increased, a breakdown of the failures is displayed. For instance::
+
+ rule 1 (metadata) num_rep 10 result size == 8: 4/1024
+ rule 1 (metadata) num_rep 10 result size == 9: 93/1024
+ rule 1 (metadata) num_rep 10 result size == 10: 927/1024
+
+  shows that although **num_rep 10** replicas were required, **4**
+  out of **1024** values (**4/1024**) were mapped to only **8**
+  devices (**result size == 8**).
+
+.. option:: --show-mappings
+
+ Displays the mapping of each value in the range ``[--min-x,--max-x]``.
+ For instance::
+
+ CRUSH rule 1 x 24 [11,6]
+
+ shows that value **24** is mapped to devices **[11,6]** by rule
+ **1**.
+
+ One of the following is required when using the ``--show-mappings`` option:
+
+ (a) ``--num-rep``
+ (b) both ``--min-rep`` and ``--max-rep``
+
+  ``--num-rep`` stands for "number of replicas"; it indicates the number of
+  replicas in a pool and is used to specify an exact number of replicas (for
+ example ``--num-rep 5``). ``--min-rep`` and ``--max-rep`` are used together
+ to specify a range of replicas (for example, ``--min-rep 1 --max-rep 10``).
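+
+  For instance, assuming a compiled map named ``crushmap``, either of
+  the following forms might be used::
+
+      crushtool -i crushmap --test --num-rep 3 --show-mappings
+      crushtool -i crushmap --test --min-rep 1 --max-rep 10 --show-mappings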
+
+.. option:: --show-bad-mappings
+
+ Displays which value failed to be mapped to the required number of
+ devices. For instance::
+
+ bad mapping rule 1 x 781 num_rep 7 result [8,10,2,11,6,9]
+
+ shows that when rule **1** was required to map **7** devices, it
+  could map only six: **[8,10,2,11,6,9]**.
+
+.. option:: --show-utilization
+
+ Displays the expected and actual utilization for each device, for
+ each number of replicas. For instance::
+
+ device 0: stored : 951 expected : 853.333
+ device 1: stored : 963 expected : 853.333
+ ...
+
+ shows that device **0** stored **951** values and was expected to store **853**.
+ Implies **--show-statistics**.
+
+.. option:: --show-utilization-all
+
+ Displays the same as **--show-utilization** but does not suppress
+ output when the weight of a device is zero.
+ Implies **--show-statistics**.
+
+.. option:: --show-choose-tries
+
+ Displays how many attempts were needed to find a device mapping.
+ For instance::
+
+ 0: 95224
+ 1: 3745
+ 2: 2225
+ ..
+
+ shows that **95224** mappings succeeded without retries, **3745**
+  mappings succeeded after one retry, etc. There are as many rows
+ as the value of the **--set-choose-total-tries** option.
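+
+  A histogram like the one above might be produced with an invocation
+  such as (``crushmap`` being a hypothetical compiled map)::
+
+      crushtool -i crushmap --test --show-choose-tries --num-rep 3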
+
+.. option:: --output-csv
+
+ Creates CSV files (in the current directory) containing information
+ documented by **--help-output**. The files are named after the rule
+ used when collecting the statistics. For instance, if the rule
+  'metadata' is used, the CSV files will be::
+
+ metadata-absolute_weights.csv
+ metadata-device_utilization.csv
+ ...
+
+  The first line of the file briefly explains the column layout. For
+ instance::
+
+ metadata-absolute_weights.csv
+ Device ID, Absolute Weight
+ 0,1
+ ...
+
+.. option:: --output-name NAME
+
+ Prepend **NAME** to the file names generated when **--output-csv**
+ is specified. For instance **--output-name FOO** will create
+ files::
+
+ FOO-metadata-absolute_weights.csv
+ FOO-metadata-device_utilization.csv
+ ...
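+
+  For example, assuming a compiled map named ``crushmap``, CSV reports
+  prefixed with ``FOO`` might be generated with::
+
+      crushtool -i crushmap --test --output-csv --output-name FOO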
+
+The **--set-...** options can be used to modify the tunables of the
+input crush map. The input crush map is modified in
+memory. For example::
+
+ $ crushtool -i mymap --test --show-bad-mappings
+ bad mapping rule 1 x 781 num_rep 7 result [8,10,2,11,6,9]
+
+could be fixed by increasing the **choose-total-tries** as follows::
+
+ $ crushtool -i mymap --test \
+ --show-bad-mappings \
+ --set-choose-total-tries 500
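+
+Since the **--set-...** options only change the map in memory, a value
+that eliminates the bad mappings can then be written to a new binary map
+with **-o**; a possible invocation, reusing the hypothetical ``mymap``
+above, is::
+
+    crushtool -i mymap --set-choose-total-tries 500 -o mymap.new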
+
+Building a map with --build
+===========================
+
+The build mode will generate hierarchical maps. The first argument
+specifies the number of devices (leaves) in the CRUSH hierarchy. Each
+layer describes how the layer (or devices) preceding it should be
+grouped.
+
+Each layer consists of::
+
+ bucket ( uniform | list | tree | straw | straw2 ) size
+
+The **bucket** component is the type name of the buckets in the layer
+(e.g. "rack"). Each bucket name will be built by appending a unique
+number to the **bucket** string (e.g. "rack0", "rack1"...).
+
+The second component is the bucket algorithm (**uniform**, **list**,
+**tree**, **straw** or **straw2**): **straw** should be used most of
+the time.
+
+The third component is the maximum size of the bucket. A size of zero
+means a bucket of infinite capacity.
+
+
+Example
+=======
+
+Suppose we have two rows with two racks each and 20 nodes per rack. Suppose
+each node contains 4 storage devices for Ceph OSD Daemons. This configuration
+allows us to deploy 320 Ceph OSD Daemons. Let's assume a 42U rack with 2U nodes,
+leaving an extra 2U for a rack switch.
+
+To reflect our hierarchy of devices, nodes, racks and rows, we would execute
+the following::
+
+ $ crushtool -o crushmap --build --num_osds 320 \
+ node straw 4 \
+ rack straw 20 \
+ row straw 2 \
+ root straw 0
+ # id weight type name reweight
+ -87 320 root root
+ -85 160 row row0
+ -81 80 rack rack0
+ -1 4 node node0
+ 0 1 osd.0 1
+ 1 1 osd.1 1
+ 2 1 osd.2 1
+ 3 1 osd.3 1
+ -2 4 node node1
+ 4 1 osd.4 1
+ 5 1 osd.5 1
+ ...
+
+CRUSH rules are created so the generated crushmap can be
+tested. They are the same rules as the ones created by default when
+creating a new Ceph cluster. They can be further edited with::
+
+ # decompile
+ crushtool -d crushmap -o map.txt
+
+ # edit
+ emacs map.txt
+
+ # recompile
+ crushtool -c map.txt -o crushmap
+
+Reclassify
+==========
+
+The *reclassify* function allows users to transition from older maps that
+maintain parallel hierarchies for OSDs of different types to a modern CRUSH
+map that makes use of the *device class* feature. For more information,
+see https://docs.ceph.com/en/latest/rados/operations/crush-map-edits/#migrating-from-a-legacy-ssd-rule-to-device-classes.
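+
+As a rough sketch of the workflow described there (the bucket names and
+device class below are illustrative), an existing map might be converted
+and then compared against the original::
+
+    # move everything under the "default" root to the "hdd" device class
+    crushtool -i original --reclassify \
+        --reclassify-root default hdd \
+        -o adjusted
+
+    # check how many mappings change as a result
+    crushtool -i original --compare adjusted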
+
+Example output from --test
+==========================
+
+See https://github.com/ceph/ceph/blob/master/src/test/cli/crushtool/set-choose.t
+for sample ``crushtool --test`` commands and the output they produce.
+
+Availability
+============
+
+**crushtool** is part of Ceph, a massively scalable, open-source, distributed storage system. Please
+refer to the Ceph documentation at https://docs.ceph.com for more
+information.
+
+
+See also
+========
+
+:doc:`ceph <ceph>`\(8),
+:doc:`osdmaptool <osdmaptool>`\(8)
+
+Authors
+=======
+
+John Wilkins, Sage Weil, Loic Dachary