diff --git a/doc/rados/operations/pools.rst b/doc/rados/operations/pools.rst
new file mode 100644
index 000000000..dda9e844e
--- /dev/null
+++ b/doc/rados/operations/pools.rst
@@ -0,0 +1,751 @@
+.. _rados_pools:
+
+=======
+ Pools
+=======
+Pools are logical partitions that are used to store objects.
+
+Pools provide:
+
+- **Resilience**: It is possible to set the number of OSDs that are allowed to
+ fail without any data being lost. If your cluster uses replicated pools, the
+ number of OSDs that can fail without data loss is one less than the number of
+ replicas (for example, two OSDs when ``size = 3``).
+
+ For example: a typical configuration stores an object and two replicas
+ (copies) of each RADOS object (that is: ``size = 3``), but you can configure
+ the number of replicas on a per-pool basis. For `erasure-coded pools
+ <../erasure-code>`_, resilience is defined as the number of coding chunks
+ (for example, ``m = 2`` in the default **erasure code profile**).
+
+- **Placement Groups**: You can set the number of placement groups (PGs) for
+ the pool. In a typical configuration, the target number of PGs is
+ approximately one hundred PGs per OSD. This provides reasonable balancing
+ without consuming excessive computing resources. When setting up multiple
+ pools, be careful to set an appropriate number of PGs for each pool and for
+ the cluster as a whole. Each PG belongs to a specific pool: when multiple
+ pools use the same OSDs, make sure that the **sum** of PG replicas per OSD is
+ in the desired PG-per-OSD target range. To calculate an appropriate number of
+ PGs for your pools, use the `pgcalc`_ tool.
+
+- **CRUSH Rules**: When data is stored in a pool, the placement of the object
+ and its replicas (or chunks, in the case of erasure-coded pools) in your
+ cluster is governed by CRUSH rules. Custom CRUSH rules can be created for a
+ pool if the default rule does not fit your use case.
+
+- **Snapshots**: The command ``ceph osd pool mksnap`` creates a snapshot of a
+ pool.
+
+Pool Names
+==========
+
+Pool names beginning with ``.`` are reserved for use by Ceph's internal
+operations. Do not create or manipulate pools with these names.
+
+
+List Pools
+==========
+
+There are multiple ways to get the list of pools in your cluster.
+
+To list just your cluster's pool names (good for scripting), execute:
+
+.. prompt:: bash $
+
+ ceph osd pool ls
+
+::
+
+ .rgw.root
+ default.rgw.log
+ default.rgw.control
+ default.rgw.meta
+
+To list your cluster's pools with the pool number, run the following command:
+
+.. prompt:: bash $
+
+ ceph osd lspools
+
+::
+
+ 1 .rgw.root
+ 2 default.rgw.log
+ 3 default.rgw.control
+ 4 default.rgw.meta
+
+To list your cluster's pools with additional information, execute:
+
+.. prompt:: bash $
+
+ ceph osd pool ls detail
+
+::
+
+ pool 1 '.rgw.root' replicated size 3 min_size 1 crush_rule 0 object_hash rjenkins pg_num 1 pgp_num 1 autoscale_mode on last_change 19 flags hashpspool stripe_width 0 application rgw read_balance_score 4.00
+ pool 2 'default.rgw.log' replicated size 3 min_size 1 crush_rule 0 object_hash rjenkins pg_num 1 pgp_num 1 autoscale_mode on last_change 21 flags hashpspool stripe_width 0 application rgw read_balance_score 4.00
+ pool 3 'default.rgw.control' replicated size 3 min_size 1 crush_rule 0 object_hash rjenkins pg_num 1 pgp_num 1 autoscale_mode on last_change 23 flags hashpspool stripe_width 0 application rgw read_balance_score 4.00
+ pool 4 'default.rgw.meta' replicated size 3 min_size 1 crush_rule 0 object_hash rjenkins pg_num 1 pgp_num 1 autoscale_mode on last_change 25 flags hashpspool stripe_width 0 pg_autoscale_bias 4 application rgw read_balance_score 4.00
+
+To get even more information, you can execute this command with the ``--format`` (or ``-f``) option and the ``json``, ``json-pretty``, ``xml`` or ``xml-pretty`` value.
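+
+For example, to print the same per-pool details as pretty-printed JSON:
+
+.. prompt:: bash $
+
+ ceph osd pool ls detail --format=json-pretty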
+
+.. _createpool:
+
+Creating a Pool
+===============
+
+Before creating a pool, consult `Pool, PG and CRUSH Config Reference`_. Your
+Ceph configuration file contains a setting (namely,
+``osd_pool_default_pg_num``) that determines the default number of PGs for a
+new pool. However, this setting's default value is NOT appropriate for most
+systems. In most cases, you should override this default value when creating
+your pool. For details on PG numbers, see `setting the number of placement
+groups`_.
+
+For example, in your Ceph configuration file:
+
+::
+
+ osd_pool_default_pg_num = 128
+ osd_pool_default_pgp_num = 128
+
+.. note:: In Luminous and later releases, each pool must be associated with the
+ application that will be using the pool. For more information, see
+ `Associating a Pool with an Application`_ below.
+
+To create a pool, run one of the following commands:
+
+.. prompt:: bash $
+
+ ceph osd pool create {pool-name} [{pg-num} [{pgp-num}]] [replicated] \
+ [crush-rule-name] [expected-num-objects]
+
+or:
+
+.. prompt:: bash $
+
+ ceph osd pool create {pool-name} [{pg-num} [{pgp-num}]] erasure \
+ [erasure-code-profile] [crush-rule-name] [expected-num-objects] [--autoscale-mode=<on,off,warn>]
+
+For a brief description of the elements of the above commands, consult the
+following:
+
+.. describe:: {pool-name}
+
+ The name of the pool. It must be unique.
+
+ :Type: String
+ :Required: Yes.
+
+.. describe:: {pg-num}
+
+ The total number of PGs in the pool. For details on calculating an
+ appropriate number, see :ref:`placement groups`. The default value ``8`` is
+ NOT suitable for most systems.
+
+ :Type: Integer
+ :Required: No. If not specified, the default value is used.
+ :Default: 8
+
+.. describe:: {pgp-num}
+
+ The total number of PGs for placement purposes. This **should be equal to
+ the total number of PGs**, except briefly while ``pg_num`` is being
+ increased or decreased.
+
+ :Type: Integer
+ :Required: No. If no value has been specified in the command, then the default value is used (unless a different value has been set in Ceph configuration).
+ :Default: 8
+
+.. describe:: {replicated|erasure}
+
+ The pool type. This can be either **replicated** (to recover from lost OSDs
+ by keeping multiple copies of the objects) or **erasure** (to achieve a kind
+ of `generalized parity RAID <../erasure-code>`_ capability). The
+ **replicated** pools require more raw storage but can implement all Ceph
+ operations. The **erasure** pools require less raw storage but can perform
+ only some Ceph tasks and may provide decreased performance.
+
+ :Type: String
+ :Required: No.
+ :Default: replicated
+
+.. describe:: [crush-rule-name]
+
+ The name of the CRUSH rule to use for this pool. The specified rule must
+ exist; otherwise the command will fail.
+
+ :Type: String
+ :Required: No.
+ :Default: For **replicated** pools, it is the rule specified by the :confval:`osd_pool_default_crush_rule` configuration variable. This rule must exist. For **erasure** pools, it is the ``erasure-code`` rule if the ``default`` `erasure code profile`_ is used or the ``{pool-name}`` rule if not. This rule will be created implicitly if it doesn't already exist.
+
+.. describe:: [erasure-code-profile=profile]
+
+ For **erasure** pools only. Instructs Ceph to use the specified `erasure
+ code profile`_. This profile must be an existing profile as defined by **osd
+ erasure-code-profile set**.
+
+ :Type: String
+ :Required: No.
+
+.. _erasure code profile: ../erasure-code-profile
+
+.. describe:: --autoscale-mode=<on,off,warn>
+
+ - ``on``: the Ceph cluster will automatically adjust the number of PGs in your pool based on actual usage.
+ - ``warn``: the Ceph cluster will only recommend changes to the number of PGs in your pool (by raising a health warning) without applying them.
+ - ``off``: disables PG autoscaling for this pool. Refer to :ref:`placement groups` for more information.
+
+ :Type: String
+ :Required: No.
+ :Default: The default behavior is determined by the :confval:`osd_pool_default_pg_autoscale_mode` option.
+
+.. describe:: [expected-num-objects]
+
+ The expected number of RADOS objects for this pool. By setting this value and
+ assigning a negative value to ``filestore_merge_threshold``, you arrange
+ for PG folder splitting to occur at the time of pool creation and
+ avoid the latency impact that accompanies runtime folder splitting. (This
+ setting applies only to the legacy FileStore back end.)
+
+ :Type: Integer
+ :Required: No.
+ :Default: 0, no splitting at the time of pool creation.
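+
+As an illustrative sketch, the following commands create a replicated pool and
+an erasure-coded pool (the pool names ``mypool`` and ``ecpool`` and the profile
+name ``myprofile`` are placeholders):
+
+.. prompt:: bash $
+
+ ceph osd pool create mypool 128 128 replicated
+ ceph osd erasure-code-profile set myprofile k=4 m=2
+ ceph osd pool create ecpool 32 32 erasure myprofile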
+
+.. _associate-pool-to-application:
+
+Associating a Pool with an Application
+======================================
+
+Pools need to be associated with an application before they can be used. Pools
+that are intended for use with CephFS and pools that are created automatically
+by RGW are associated automatically. Pools that are intended for use with RBD
+should be initialized with the ``rbd`` tool (see `Block Device Commands`_ for
+more information).
+
+For other cases, you can manually associate a free-form application name with
+a pool by running the following command:
+
+.. prompt:: bash $
+
+ ceph osd pool application enable {pool-name} {application-name}
+
+.. note:: CephFS uses the application name ``cephfs``, RBD uses the
+ application name ``rbd``, and RGW uses the application name ``rgw``.
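+
+For example, to tag a hypothetical pool named ``mypool`` for use by RGW:
+
+.. prompt:: bash $
+
+ ceph osd pool application enable mypool rgw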
+
+Setting Pool Quotas
+===================
+
+To set pool quotas for the maximum number of bytes and/or the maximum number of
+RADOS objects per pool, run the following command:
+
+.. prompt:: bash $
+
+ ceph osd pool set-quota {pool-name} [max_objects {obj-count}] [max_bytes {bytes}]
+
+For example:
+
+.. prompt:: bash $
+
+ ceph osd pool set-quota data max_objects 10000
+
+To remove a quota, set its value to ``0``.
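+
+For example, to clear the object quota set above on the ``data`` pool and then
+display the pool's current quotas:
+
+.. prompt:: bash $
+
+ ceph osd pool set-quota data max_objects 0
+ ceph osd pool get-quota data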
+
+
+Deleting a Pool
+===============
+
+To delete a pool, run a command of the following form:
+
+.. prompt:: bash $
+
+ ceph osd pool delete {pool-name} [{pool-name} --yes-i-really-really-mean-it]
+
+To remove a pool, you must set the ``mon_allow_pool_delete`` flag to ``true``
+in the monitor's configuration. Otherwise, monitors will refuse to remove
+pools.
+
+For more information, see `Monitor Configuration`_.
+
+.. _Monitor Configuration: ../../configuration/mon-config-ref
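+
+For example, a cautious sequence that enables pool deletion, deletes a
+hypothetical pool named ``mypool``, and then disables deletion again:
+
+.. prompt:: bash $
+
+ ceph config set mon mon_allow_pool_delete true
+ ceph osd pool delete mypool mypool --yes-i-really-really-mean-it
+ ceph config set mon mon_allow_pool_delete false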
+
+If there are custom rules for a pool that is no longer needed, consider
+deleting those rules. To check which CRUSH rule a pool uses, run a command of
+the following form:
+
+.. prompt:: bash $
+
+ ceph osd pool get {pool-name} crush_rule
+
+For example, if the custom rule is "123", check all pools to see whether they
+use the rule by running the following command:
+
+.. prompt:: bash $
+
+ ceph osd dump | grep "^pool" | grep "crush_rule 123"
+
+If no pools use this custom rule, then it is safe to delete the rule from the
+cluster.
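+
+For example, assuming the unused rule is named ``my-custom-rule``:
+
+.. prompt:: bash $
+
+ ceph osd crush rule rm my-custom-rule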
+
+Similarly, if there are users with permissions restricted to a pool that no
+longer exists, consider deleting those users by running commands of the
+following forms:
+
+.. prompt:: bash $
+
+ ceph auth ls | grep -C 5 {pool-name}
+ ceph auth del {user}
+
+
+Renaming a Pool
+===============
+
+To rename a pool, run a command of the following form:
+
+.. prompt:: bash $
+
+ ceph osd pool rename {current-pool-name} {new-pool-name}
+
+If you rename a pool for which an authenticated user has per-pool capabilities,
+you must update the user's capabilities ("caps") to refer to the new pool name.
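+
+For example, a sketch that inspects and then updates a hypothetical user
+``client.app`` whose OSD capabilities were restricted to the old pool name:
+
+.. prompt:: bash $
+
+ ceph auth get client.app
+ ceph auth caps client.app mon 'allow r' osd 'allow rw pool=new-pool-name'
+
+Note that ``ceph auth caps`` replaces the user's capabilities wholesale, so
+restate every capability that the user still needs.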
+
+
+Showing Pool Statistics
+=======================
+
+To show a pool's utilization statistics, run the following command:
+
+.. prompt:: bash $
+
+ rados df
+
+To obtain I/O information for a specific pool or for all pools, run a command
+of the following form:
+
+.. prompt:: bash $
+
+ ceph osd pool stats [{pool-name}]
+
+
+Making a Snapshot of a Pool
+===========================
+
+To make a snapshot of a pool, run a command of the following form:
+
+.. prompt:: bash $
+
+ ceph osd pool mksnap {pool-name} {snap-name}
+
+Removing a Snapshot of a Pool
+=============================
+
+To remove a snapshot of a pool, run a command of the following form:
+
+.. prompt:: bash $
+
+ ceph osd pool rmsnap {pool-name} {snap-name}
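+
+For example, a round trip on a hypothetical pool named ``mypool`` (``rados
+lssnap`` lists the pool's snapshots):
+
+.. prompt:: bash $
+
+ ceph osd pool mksnap mypool snap-1
+ rados -p mypool lssnap
+ ceph osd pool rmsnap mypool snap-1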
+
+.. _setpoolvalues:
+
+Setting Pool Values
+===================
+
+To assign values to a pool's configuration keys, run a command of the following
+form:
+
+.. prompt:: bash $
+
+ ceph osd pool set {pool-name} {key} {value}
+
+You may set values for the following keys:
+
+.. _compression_algorithm:
+
+.. describe:: compression_algorithm
+
+ :Description: Sets the inline compression algorithm used in storing data on the underlying BlueStore back end. This key's setting overrides the global setting :confval:`bluestore_compression_algorithm`.
+ :Type: String
+ :Valid Settings: ``lz4``, ``snappy``, ``zlib``, ``zstd``
+
+.. describe:: compression_mode
+
+ :Description: Sets the policy for the inline compression algorithm used in storing data on the underlying BlueStore back end. This key's setting overrides the global setting :confval:`bluestore_compression_mode`.
+ :Type: String
+ :Valid Settings: ``none``, ``passive``, ``aggressive``, ``force``
+
+.. describe:: compression_min_blob_size
+
+
+ :Description: Sets the minimum size for the compression of chunks: that is, chunks smaller than this are not compressed. This key's setting overrides the following global settings:
+
+ * :confval:`bluestore_compression_min_blob_size`
+ * :confval:`bluestore_compression_min_blob_size_hdd`
+ * :confval:`bluestore_compression_min_blob_size_ssd`
+
+ :Type: Unsigned Integer
+
+
+.. describe:: compression_max_blob_size
+
+ :Description: Sets the maximum size for chunks: that is, chunks larger than this are broken into smaller blobs of this size before compression is performed.
+ :Type: Unsigned Integer
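+
+For example, a sketch that enables aggressive ``zstd`` compression on a
+hypothetical pool named ``mypool``:
+
+.. prompt:: bash $
+
+ ceph osd pool set mypool compression_algorithm zstd
+ ceph osd pool set mypool compression_mode aggressive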
+
+.. _size:
+
+.. describe:: size
+
+ :Description: Sets the number of replicas for objects in the pool. For further details, see `Setting the Number of RADOS Object Replicas`_. Replicated pools only.
+ :Type: Integer
+
+.. _min_size:
+
+.. describe:: min_size
+
+ :Description: Sets the minimum number of replicas required for I/O. For further details, see `Setting the Number of RADOS Object Replicas`_. For erasure-coded pools, this should be set to a value greater than ``k``. If I/O is allowed at the value ``k``, then there is no redundancy and data will be lost in the event of a permanent OSD failure. For more information, see `Erasure Code <../erasure-code>`_.
+ :Type: Integer
+ :Version: ``0.54`` and above
+
+.. _pg_num:
+
+.. describe:: pg_num
+
+ :Description: Sets the effective number of PGs to use when calculating data placement.
+ :Type: Integer
+ :Valid Range: ``0`` to ``mon_max_pool_pg_num``. If set to ``0``, the value of ``osd_pool_default_pg_num`` will be used.
+
+.. _pgp_num:
+
+.. describe:: pgp_num
+
+ :Description: Sets the effective number of PGs to use for placement purposes when calculating data placement. Normally this should be equal to ``pg_num``.
+ :Type: Integer
+ :Valid Range: Between ``1`` and the current value of ``pg_num``.
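+
+For example, to scale a hypothetical pool named ``mypool`` to 64 PGs (in
+recent releases, ``pgp_num`` is adjusted automatically to follow ``pg_num``;
+on older releases, set it explicitly as shown):
+
+.. prompt:: bash $
+
+ ceph osd pool set mypool pg_num 64
+ ceph osd pool set mypool pgp_num 64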
+
+.. _crush_rule:
+
+.. describe:: crush_rule
+
+ :Description: Sets the CRUSH rule that Ceph uses to map object placement within the pool.
+ :Type: String
+
+.. _allow_ec_overwrites:
+
+.. describe:: allow_ec_overwrites
+
+ :Description: Determines whether writes to an erasure-coded pool are allowed to update only part of a RADOS object. This allows CephFS and RBD to use an EC (erasure-coded) pool for user data (but not for metadata). For more details, see `Erasure Coding with Overwrites`_.
+ :Type: Boolean
+
+ .. versionadded:: 12.2.0
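+
+For example, a sketch that enables partial overwrites on a hypothetical
+erasure-coded pool named ``ecpool``:
+
+.. prompt:: bash $
+
+ ceph osd pool set ecpool allow_ec_overwrites true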
+
+.. describe:: hashpspool
+
+ :Description: Sets and unsets the HASHPSPOOL flag on a given pool.
+ :Type: Integer
+ :Valid Range: 1 sets flag, 0 unsets flag
+
+.. _nodelete:
+
+.. describe:: nodelete
+
+ :Description: Sets and unsets the NODELETE flag on a given pool.
+ :Type: Integer
+ :Valid Range: 1 sets flag, 0 unsets flag
+ :Version: ``FIXME``
+
+.. _nopgchange:
+
+.. describe:: nopgchange
+
+ :Description: Sets and unsets the NOPGCHANGE flag on a given pool.
+ :Type: Integer
+ :Valid Range: 1 sets flag, 0 unsets flag
+ :Version: ``FIXME``
+
+.. _nosizechange:
+
+.. describe:: nosizechange
+
+ :Description: Sets and unsets the NOSIZECHANGE flag on a given pool.
+ :Type: Integer
+ :Valid Range: 1 sets flag, 0 unsets flag
+ :Version: ``FIXME``
+
+.. _bulk:
+
+.. describe:: bulk
+
+ :Description: Sets and unsets the bulk flag on a given pool.
+ :Type: Boolean
+ :Valid Range: ``true``/``1`` sets flag, ``false``/``0`` unsets flag
+
+.. _write_fadvise_dontneed:
+
+.. describe:: write_fadvise_dontneed
+
+ :Description: Sets and unsets the WRITE_FADVISE_DONTNEED flag on a given pool.
+ :Type: Integer
+ :Valid Range: ``1`` sets flag, ``0`` unsets flag
+
+.. _noscrub:
+
+.. describe:: noscrub
+
+ :Description: Sets and unsets the NOSCRUB flag on a given pool.
+ :Type: Integer
+ :Valid Range: ``1`` sets flag, ``0`` unsets flag
+
+.. _nodeep-scrub:
+
+.. describe:: nodeep-scrub
+
+ :Description: Sets and unsets the NODEEP_SCRUB flag on a given pool.
+ :Type: Integer
+ :Valid Range: ``1`` sets flag, ``0`` unsets flag
+
+.. _target_max_bytes:
+
+.. describe:: target_max_bytes
+
+ :Description: Ceph will begin flushing or evicting objects when the
+ ``max_bytes`` threshold is triggered.
+ :Type: Integer
+ :Example: ``1000000000000`` # 1 TB
+
+.. _target_max_objects:
+
+.. describe:: target_max_objects
+
+ :Description: Ceph will begin flushing or evicting objects when the
+ ``max_objects`` threshold is triggered.
+ :Type: Integer
+ :Example: ``1000000`` # 1M objects
+
+.. _fast_read:
+
+.. describe:: fast_read
+
+ :Description: For erasure-coded pools, if this flag is turned ``on``, the
+ read request issues "sub reads" to all shards, and then waits
+ until it receives enough shards to decode before it serves
+ the client. If *jerasure* or *isa* erasure plugins are in
+ use, then after the first *K* replies have returned, the
+ client's request is served immediately using the data decoded
+ from these replies. This approach sacrifices resources in
+ exchange for better performance. This flag is supported only
+ for erasure-coded pools.
+ :Type: Boolean
+ :Defaults: ``0``
+
+.. _scrub_min_interval:
+
+.. describe:: scrub_min_interval
+
+ :Description: Sets the minimum interval (in seconds) for successive scrubs of the pool's PGs when the load is low. If the default value of ``0`` is in effect, then the value of ``osd_scrub_min_interval`` from central config is used.
+
+ :Type: Double
+ :Default: ``0``
+
+.. _scrub_max_interval:
+
+.. describe:: scrub_max_interval
+
+ :Description: Sets the maximum interval (in seconds) for scrubs of the pool's PGs regardless of cluster load. If the value of ``scrub_max_interval`` is ``0``, then the value ``osd_scrub_max_interval`` from central config is used.
+
+ :Type: Double
+ :Default: ``0``
+
+.. _deep_scrub_interval:
+
+.. describe:: deep_scrub_interval
+
+ :Description: Sets the interval (in seconds) for pool “deep” scrubs of the pool's PGs. If the value of ``deep_scrub_interval`` is ``0``, the value ``osd_deep_scrub_interval`` from central config is used.
+
+ :Type: Double
+ :Default: ``0``
+
+.. _recovery_priority:
+
+.. describe:: recovery_priority
+
+ :Description: Setting this value adjusts a pool's computed reservation priority. This value must be in the range ``-10`` to ``10``. Any pool assigned a negative value will be given a lower priority than new pools, so assign negative values to low-priority pools.
+
+ :Type: Integer
+ :Default: ``0``
+
+
+.. _recovery_op_priority:
+
+.. describe:: recovery_op_priority
+
+ :Description: Sets the recovery operation priority for a specific pool's PGs. This overrides the general priority determined by :confval:`osd_recovery_op_priority`.
+
+ :Type: Integer
+ :Default: ``0``
+
+
+Getting Pool Values
+===================
+
+To get a value from a pool's key, run a command of the following form:
+
+.. prompt:: bash $
+
+ ceph osd pool get {pool-name} {key}
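+
+For example, to read the replica count of a hypothetical pool named ``mypool``:
+
+.. prompt:: bash $
+
+ ceph osd pool get mypool size
+
+::
+
+ size: 3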
+
+
+You may get values from the following keys:
+
+
+``size``
+
+:Description: See size_.
+
+:Type: Integer
+
+
+``min_size``
+
+:Description: See min_size_.
+
+:Type: Integer
+:Version: ``0.54`` and above
+
+
+``pg_num``
+
+:Description: See pg_num_.
+
+:Type: Integer
+
+
+``pgp_num``
+
+:Description: See pgp_num_.
+
+:Type: Integer
+:Valid Range: Equal to or less than ``pg_num``.
+
+
+``crush_rule``
+
+:Description: See crush_rule_.
+
+
+``target_max_bytes``
+
+:Description: See target_max_bytes_.
+
+:Type: Integer
+
+
+``target_max_objects``
+
+:Description: See target_max_objects_.
+
+:Type: Integer
+
+
+``fast_read``
+
+:Description: See fast_read_.
+
+:Type: Boolean
+
+
+``scrub_min_interval``
+
+:Description: See scrub_min_interval_.
+
+:Type: Double
+
+
+``scrub_max_interval``
+
+:Description: See scrub_max_interval_.
+
+:Type: Double
+
+
+``deep_scrub_interval``
+
+:Description: See deep_scrub_interval_.
+
+:Type: Double
+
+
+``allow_ec_overwrites``
+
+:Description: See allow_ec_overwrites_.
+
+:Type: Boolean
+
+
+``recovery_priority``
+
+:Description: See recovery_priority_.
+
+:Type: Integer
+
+
+``recovery_op_priority``
+
+:Description: See recovery_op_priority_.
+
+:Type: Integer
+
+
+Setting the Number of RADOS Object Replicas
+===========================================
+
+To set the number of data replicas on a replicated pool, run a command of the
+following form:
+
+.. prompt:: bash $
+
+ ceph osd pool set {poolname} size {num-replicas}
+
+.. important:: The ``{num-replicas}`` argument includes the primary object
+ itself. For example, if you want there to be two replicas of the object in
+ addition to the original object (for a total of three instances of the
+ object), specify ``3`` by running the following command:
+
+.. prompt:: bash $
+
+ ceph osd pool set data size 3
+
+You may run the above command for each pool.
+
+.. Note:: An object might accept I/Os in degraded mode with fewer than ``pool
+ size`` replicas. To set a minimum number of replicas required for I/O, you
+ should use the ``min_size`` setting. For example, you might run the
+ following command:
+
+.. prompt:: bash $
+
+ ceph osd pool set data min_size 2
+
+This command ensures that no object in the data pool will receive I/O if it has
+fewer than ``min_size`` (in this case, two) replicas.
+
+
+Getting the Number of Object Replicas
+=====================================
+
+To get the number of object replicas, run the following command:
+
+.. prompt:: bash $
+
+ ceph osd dump | grep 'replicated size'
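+
+The output resembles the per-pool lines shown by ``ceph osd pool ls detail``
+above; for example:
+
+::
+
+ pool 1 '.rgw.root' replicated size 3 min_size 1 crush_rule 0 object_hash rjenkins pg_num 1 pgp_num 1 autoscale_mode on last_change 19 flags hashpspool stripe_width 0 application rgw read_balance_score 4.00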
+
+Ceph will list pools and highlight the ``replicated size`` attribute. By
+default, Ceph creates two replicas of an object (a total of three copies, for a
+size of ``3``).
+
+Managing pools that are flagged with ``--bulk``
+===============================================
+See :ref:`managing_bulk_flagged_pools`.
+
+
+.. _pgcalc: https://old.ceph.com/pgcalc/
+.. _Pool, PG and CRUSH Config Reference: ../../configuration/pool-pg-config-ref
+.. _setting the number of placement groups: ../placement-groups#set-the-number-of-placement-groups
+.. _Erasure Coding with Overwrites: ../erasure-code#erasure-coding-with-overwrites
+.. _Block Device Commands: ../../../rbd/rados-rbd-cmds/#create-a-block-device-pool