diff options
Diffstat (limited to 'doc/rados/operations/pools.rst')
-rw-r--r-- | doc/rados/operations/pools.rst | 835 |
1 files changed, 835 insertions, 0 deletions
diff --git a/doc/rados/operations/pools.rst b/doc/rados/operations/pools.rst new file mode 100644 index 00000000..18fdd3ee --- /dev/null +++ b/doc/rados/operations/pools.rst @@ -0,0 +1,835 @@ +======= + Pools +======= + +When you first deploy a cluster without creating a pool, Ceph uses the default +pools for storing data. A pool provides you with: + +- **Resilience**: You can set how many OSD are allowed to fail without losing data. + For replicated pools, it is the desired number of copies/replicas of an object. + A typical configuration stores an object and one additional copy + (i.e., ``size = 2``), but you can determine the number of copies/replicas. + For `erasure coded pools <../erasure-code>`_, it is the number of coding chunks + (i.e. ``m=2`` in the **erasure code profile**) + +- **Placement Groups**: You can set the number of placement groups for the pool. + A typical configuration uses approximately 100 placement groups per OSD to + provide optimal balancing without using up too many computing resources. When + setting up multiple pools, be careful to ensure you set a reasonable number of + placement groups for both the pool and the cluster as a whole. + +- **CRUSH Rules**: When you store data in a pool, placement of the object + and its replicas (or chunks for erasure coded pools) in your cluster is governed + by CRUSH rules. You can create a custom CRUSH rule for your pool if the default + rule is not appropriate for your use case. + +- **Snapshots**: When you create snapshots with ``ceph osd pool mksnap``, + you effectively take a snapshot of a particular pool. + +To organize data into pools, you can list, create, and remove pools. +You can also view the utilization statistics for each pool. + +List Pools +========== + +To list your cluster's pools, execute:: + + ceph osd lspools + + +.. _createpool: + +Create a Pool +============= + +Before creating pools, refer to the `Pool, PG and CRUSH Config Reference`_. +Ideally, you should override the default value for the number of placement +groups in your Ceph configuration file, as the default is NOT ideal. +For details on placement group numbers refer to `setting the number of placement groups`_ + +.. note:: Starting with Luminous, all pools need to be associated to the + application using the pool. See `Associate Pool to Application`_ below for + more information. + +For example:: + + osd pool default pg num = 100 + osd pool default pgp num = 100 + +To create a pool, execute:: + + ceph osd pool create {pool-name} {pg-num} [{pgp-num}] [replicated] \ + [crush-rule-name] [expected-num-objects] + ceph osd pool create {pool-name} {pg-num} {pgp-num} erasure \ + [erasure-code-profile] [crush-rule-name] [expected_num_objects] + +Where: + +``{pool-name}`` + +:Description: The name of the pool. It must be unique. +:Type: String +:Required: Yes. + +``{pg-num}`` + +:Description: The total number of placement groups for the pool. See `Placement + Groups`_ for details on calculating a suitable number. The + default value ``8`` is NOT suitable for most systems. + +:Type: Integer +:Required: Yes. +:Default: 8 + +``{pgp-num}`` + +:Description: The total number of placement groups for placement purposes. This + **should be equal to the total number of placement groups**, except + for placement group splitting scenarios. + +:Type: Integer +:Required: Yes. Picks up default or Ceph configuration value if not specified. +:Default: 8 + +``{replicated|erasure}`` + +:Description: The pool type which may either be **replicated** to + recover from lost OSDs by keeping multiple copies of the + objects or **erasure** to get a kind of + `generalized RAID5 <../erasure-code>`_ capability. + The **replicated** pools require more + raw storage but implement all Ceph operations. The + **erasure** pools require less raw storage but only + implement a subset of the available operations. + +:Type: String +:Required: No. +:Default: replicated + +``[crush-rule-name]`` + +:Description: The name of a CRUSH rule to use for this pool. The specified + rule must exist. + +:Type: String +:Required: No. +:Default: For **replicated** pools it is the rule specified by the ``osd + pool default crush rule`` config variable. This rule must exist. + For **erasure** pools it is ``erasure-code`` if the ``default`` + `erasure code profile`_ is used or ``{pool-name}`` otherwise. This + rule will be created implicitly if it doesn't exist already. + + +``[erasure-code-profile=profile]`` + +.. _erasure code profile: ../erasure-code-profile + +:Description: For **erasure** pools only. Use the `erasure code profile`_. It + must be an existing profile as defined by + **osd erasure-code-profile set**. + +:Type: String +:Required: No. + +When you create a pool, set the number of placement groups to a reasonable value +(e.g., ``100``). Consider the total number of placement groups per OSD too. +Placement groups are computationally expensive, so performance will degrade when +you have many pools with many placement groups (e.g., 50 pools with 100 +placement groups each). The point of diminishing returns depends upon the power +of the OSD host. + +See `Placement Groups`_ for details on calculating an appropriate number of +placement groups for your pool. + +.. _Placement Groups: ../placement-groups + +``[expected-num-objects]`` + +:Description: The expected number of objects for this pool. By setting this value ( + together with a negative **filestore merge threshold**), the PG folder + splitting would happen at the pool creation time, to avoid the latency + impact to do a runtime folder splitting. + +:Type: Integer +:Required: No. +:Default: 0, no splitting at the pool creation time. + +.. _associate-pool-to-application: + +Associate Pool to Application +============================= + +Pools need to be associated with an application before use. Pools that will be +used with CephFS or pools that are automatically created by RGW are +automatically associated. Pools that are intended for use with RBD should be +initialized using the ``rbd`` tool (see `Block Device Commands`_ for more +information). + +For other cases, you can manually associate a free-form application name to +a pool.:: + + ceph osd pool application enable {pool-name} {application-name} + +.. note:: CephFS uses the application name ``cephfs``, RBD uses the + application name ``rbd``, and RGW uses the application name ``rgw``. + +Set Pool Quotas +=============== + +You can set pool quotas for the maximum number of bytes and/or the maximum +number of objects per pool. :: + + ceph osd pool set-quota {pool-name} [max_objects {obj-count}] [max_bytes {bytes}] + +For example:: + + ceph osd pool set-quota data max_objects 10000 + +To remove a quota, set its value to ``0``. + + +Delete a Pool +============= + +To delete a pool, execute:: + + ceph osd pool delete {pool-name} [{pool-name} --yes-i-really-really-mean-it] + + +To remove a pool the mon_allow_pool_delete flag must be set to true in the Monitor's +configuration. Otherwise they will refuse to remove a pool. + +See `Monitor Configuration`_ for more information. + +.. _Monitor Configuration: ../../configuration/mon-config-ref + +If you created your own rules for a pool you created, you should consider +removing them when you no longer need your pool:: + + ceph osd pool get {pool-name} crush_rule + +If the rule was "123", for example, you can check the other pools like so:: + + ceph osd dump | grep "^pool" | grep "crush_rule 123" + +If no other pools use that custom rule, then it's safe to delete that +rule from the cluster. + +If you created users with permissions strictly for a pool that no longer +exists, you should consider deleting those users too:: + + ceph auth ls | grep -C 5 {pool-name} + ceph auth del {user} + + +Rename a Pool +============= + +To rename a pool, execute:: + + ceph osd pool rename {current-pool-name} {new-pool-name} + +If you rename a pool and you have per-pool capabilities for an authenticated +user, you must update the user's capabilities (i.e., caps) with the new pool +name. + +Show Pool Statistics +==================== + +To show a pool's utilization statistics, execute:: + + rados df + +Additionally, to obtain I/O information for a specific pool or all, execute:: + + ceph osd pool stats [{pool-name}] + + +Make a Snapshot of a Pool +========================= + +To make a snapshot of a pool, execute:: + + ceph osd pool mksnap {pool-name} {snap-name} + +Remove a Snapshot of a Pool +=========================== + +To remove a snapshot of a pool, execute:: + + ceph osd pool rmsnap {pool-name} {snap-name} + +.. _setpoolvalues: + + +Set Pool Values +=============== + +To set a value to a pool, execute the following:: + + ceph osd pool set {pool-name} {key} {value} + +You may set values for the following keys: + +.. _compression_algorithm: + +``compression_algorithm`` + +:Description: Sets inline compression algorithm to use for underlying BlueStore. This setting overrides the `global setting <http://docs.ceph.com/docs/master/rados/configuration/bluestore-config-ref/#inline-compression>`_ of ``bluestore compression algorithm``. + +:Type: String +:Valid Settings: ``lz4``, ``snappy``, ``zlib``, ``zstd`` + +``compression_mode`` + +:Description: Sets the policy for the inline compression algorithm for underlying BlueStore. This setting overrides the `global setting <http://docs.ceph.com/docs/master/rados/configuration/bluestore-config-ref/#inline-compression>`_ of ``bluestore compression mode``. + +:Type: String +:Valid Settings: ``none``, ``passive``, ``aggressive``, ``force`` + +``compression_min_blob_size`` + +:Description: Chunks smaller than this are never compressed. This setting overrides the `global setting <http://docs.ceph.com/docs/master/rados/configuration/bluestore-config-ref/#inline-compression>`_ of ``bluestore compression min blob *``. + +:Type: Unsigned Integer + +``compression_max_blob_size`` + +:Description: Chunks larger than this are broken into smaller blobs sizing + ``compression_max_blob_size`` before being compressed. + +:Type: Unsigned Integer + +.. _size: + +``size`` + +:Description: Sets the number of replicas for objects in the pool. + See `Set the Number of Object Replicas`_ for further details. + Replicated pools only. + +:Type: Integer + +.. _min_size: + +``min_size`` + +:Description: Sets the minimum number of replicas required for I/O. + See `Set the Number of Object Replicas`_ for further details. + Replicated pools only. + +:Type: Integer +:Version: ``0.54`` and above + +.. _pg_num: + +``pg_num`` + +:Description: The effective number of placement groups to use when calculating + data placement. +:Type: Integer +:Valid Range: Superior to ``pg_num`` current value. + +.. _pgp_num: + +``pgp_num`` + +:Description: The effective number of placement groups for placement to use + when calculating data placement. + +:Type: Integer +:Valid Range: Equal to or less than ``pg_num``. + +.. _crush_rule: + +``crush_rule`` + +:Description: The rule to use for mapping object placement in the cluster. +:Type: String + +.. _allow_ec_overwrites: + +``allow_ec_overwrites`` + +:Description: Whether writes to an erasure coded pool can update part + of an object, so cephfs and rbd can use it. See + `Erasure Coding with Overwrites`_ for more details. +:Type: Boolean +:Version: ``12.2.0`` and above + +.. _hashpspool: + +``hashpspool`` + +:Description: Set/Unset HASHPSPOOL flag on a given pool. +:Type: Integer +:Valid Range: 1 sets flag, 0 unsets flag + +.. _nodelete: + +``nodelete`` + +:Description: Set/Unset NODELETE flag on a given pool. +:Type: Integer +:Valid Range: 1 sets flag, 0 unsets flag +:Version: Version ``FIXME`` + +.. _nopgchange: + +``nopgchange`` + +:Description: Set/Unset NOPGCHANGE flag on a given pool. +:Type: Integer +:Valid Range: 1 sets flag, 0 unsets flag +:Version: Version ``FIXME`` + +.. _nosizechange: + +``nosizechange`` + +:Description: Set/Unset NOSIZECHANGE flag on a given pool. +:Type: Integer +:Valid Range: 1 sets flag, 0 unsets flag +:Version: Version ``FIXME`` + +.. _write_fadvise_dontneed: + +``write_fadvise_dontneed`` + +:Description: Set/Unset WRITE_FADVISE_DONTNEED flag on a given pool. +:Type: Integer +:Valid Range: 1 sets flag, 0 unsets flag + +.. _noscrub: + +``noscrub`` + +:Description: Set/Unset NOSCRUB flag on a given pool. +:Type: Integer +:Valid Range: 1 sets flag, 0 unsets flag + +.. _nodeep-scrub: + +``nodeep-scrub`` + +:Description: Set/Unset NODEEP_SCRUB flag on a given pool. +:Type: Integer +:Valid Range: 1 sets flag, 0 unsets flag + +.. _hit_set_type: + +``hit_set_type`` + +:Description: Enables hit set tracking for cache pools. + See `Bloom Filter`_ for additional information. + +:Type: String +:Valid Settings: ``bloom``, ``explicit_hash``, ``explicit_object`` +:Default: ``bloom``. Other values are for testing. + +.. _hit_set_count: + +``hit_set_count`` + +:Description: The number of hit sets to store for cache pools. The higher + the number, the more RAM consumed by the ``ceph-osd`` daemon. + +:Type: Integer +:Valid Range: ``1``. Agent doesn't handle > 1 yet. + +.. _hit_set_period: + +``hit_set_period`` + +:Description: The duration of a hit set period in seconds for cache pools. + The higher the number, the more RAM consumed by the + ``ceph-osd`` daemon. + +:Type: Integer +:Example: ``3600`` 1hr + +.. _hit_set_fpp: + +``hit_set_fpp`` + +:Description: The false positive probability for the ``bloom`` hit set type. + See `Bloom Filter`_ for additional information. + +:Type: Double +:Valid Range: 0.0 - 1.0 +:Default: ``0.05`` + +.. _cache_target_dirty_ratio: + +``cache_target_dirty_ratio`` + +:Description: The percentage of the cache pool containing modified (dirty) + objects before the cache tiering agent will flush them to the + backing storage pool. + +:Type: Double +:Default: ``.4`` + +.. _cache_target_dirty_high_ratio: + +``cache_target_dirty_high_ratio`` + +:Description: The percentage of the cache pool containing modified (dirty) + objects before the cache tiering agent will flush them to the + backing storage pool with a higher speed. + +:Type: Double +:Default: ``.6`` + +.. _cache_target_full_ratio: + +``cache_target_full_ratio`` + +:Description: The percentage of the cache pool containing unmodified (clean) + objects before the cache tiering agent will evict them from the + cache pool. + +:Type: Double +:Default: ``.8`` + +.. _target_max_bytes: + +``target_max_bytes`` + +:Description: Ceph will begin flushing or evicting objects when the + ``max_bytes`` threshold is triggered. + +:Type: Integer +:Example: ``1000000000000`` #1-TB + +.. _target_max_objects: + +``target_max_objects`` + +:Description: Ceph will begin flushing or evicting objects when the + ``max_objects`` threshold is triggered. + +:Type: Integer +:Example: ``1000000`` #1M objects + + +``hit_set_grade_decay_rate`` + +:Description: Temperature decay rate between two successive hit_sets +:Type: Integer +:Valid Range: 0 - 100 +:Default: ``20`` + + +``hit_set_search_last_n`` + +:Description: Count at most N appearance in hit_sets for temperature calculation +:Type: Integer +:Valid Range: 0 - hit_set_count +:Default: ``1`` + + +.. _cache_min_flush_age: + +``cache_min_flush_age`` + +:Description: The time (in seconds) before the cache tiering agent will flush + an object from the cache pool to the storage pool. + +:Type: Integer +:Example: ``600`` 10min + +.. _cache_min_evict_age: + +``cache_min_evict_age`` + +:Description: The time (in seconds) before the cache tiering agent will evict + an object from the cache pool. + +:Type: Integer +:Example: ``1800`` 30min + +.. _fast_read: + +``fast_read`` + +:Description: On Erasure Coding pool, if this flag is turned on, the read request + would issue sub reads to all shards, and waits until it receives enough + shards to decode to serve the client. In the case of jerasure and isa + erasure plugins, once the first K replies return, client's request is + served immediately using the data decoded from these replies. This + helps to tradeoff some resources for better performance. Currently this + flag is only supported for Erasure Coding pool. + +:Type: Boolean +:Defaults: ``0`` + +.. _scrub_min_interval: + +``scrub_min_interval`` + +:Description: The minimum interval in seconds for pool scrubbing when + load is low. If it is 0, the value osd_scrub_min_interval + from config is used. + +:Type: Double +:Default: ``0`` + +.. _scrub_max_interval: + +``scrub_max_interval`` + +:Description: The maximum interval in seconds for pool scrubbing + irrespective of cluster load. If it is 0, the value + osd_scrub_max_interval from config is used. + +:Type: Double +:Default: ``0`` + +.. _deep_scrub_interval: + +``deep_scrub_interval`` + +:Description: The interval in seconds for pool “deep” scrubbing. If it + is 0, the value osd_deep_scrub_interval from config is used. + +:Type: Double +:Default: ``0`` + + +.. _recovery_priority: + +``recovery_priority`` + +:Description: When a value is set it will increase or decrease the computed + reservation priority. This value must be in the range -10 to + 10. Use a negative priority for less important pools so they + have lower priority than any new pools. + +:Type: Integer +:Default: ``0`` + + +.. _recovery_op_priority: + +``recovery_op_priority`` + +:Description: Specify the recovery operation priority for this pool instead of ``osd_recovery_op_priority``. + +:Type: Integer +:Default: ``0`` + + +Get Pool Values +=============== + +To get a value from a pool, execute the following:: + + ceph osd pool get {pool-name} {key} + +You may get values for the following keys: + +``size`` + +:Description: see size_ + +:Type: Integer + +``min_size`` + +:Description: see min_size_ + +:Type: Integer +:Version: ``0.54`` and above + +``pg_num`` + +:Description: see pg_num_ + +:Type: Integer + + +``pgp_num`` + +:Description: see pgp_num_ + +:Type: Integer +:Valid Range: Equal to or less than ``pg_num``. + + +``crush_rule`` + +:Description: see crush_rule_ + + +``hit_set_type`` + +:Description: see hit_set_type_ + +:Type: String +:Valid Settings: ``bloom``, ``explicit_hash``, ``explicit_object`` + +``hit_set_count`` + +:Description: see hit_set_count_ + +:Type: Integer + + +``hit_set_period`` + +:Description: see hit_set_period_ + +:Type: Integer + + +``hit_set_fpp`` + +:Description: see hit_set_fpp_ + +:Type: Double + + +``cache_target_dirty_ratio`` + +:Description: see cache_target_dirty_ratio_ + +:Type: Double + + +``cache_target_dirty_high_ratio`` + +:Description: see cache_target_dirty_high_ratio_ + +:Type: Double + + +``cache_target_full_ratio`` + +:Description: see cache_target_full_ratio_ + +:Type: Double + + +``target_max_bytes`` + +:Description: see target_max_bytes_ + +:Type: Integer + + +``target_max_objects`` + +:Description: see target_max_objects_ + +:Type: Integer + + +``cache_min_flush_age`` + +:Description: see cache_min_flush_age_ + +:Type: Integer + + +``cache_min_evict_age`` + +:Description: see cache_min_evict_age_ + +:Type: Integer + + +``fast_read`` + +:Description: see fast_read_ + +:Type: Boolean + + +``scrub_min_interval`` + +:Description: see scrub_min_interval_ + +:Type: Double + + +``scrub_max_interval`` + +:Description: see scrub_max_interval_ + +:Type: Double + + +``deep_scrub_interval`` + +:Description: see deep_scrub_interval_ + +:Type: Double + + +``allow_ec_overwrites`` + +:Description: see allow_ec_overwrites_ + +:Type: Boolean + + +``recovery_priority`` + +:Description: see recovery_priority_ + +:Type: Integer + + +``recovery_op_priority`` + +:Description: see recovery_op_priority_ + +:Type: Integer + + +Set the Number of Object Replicas +================================= + +To set the number of object replicas on a replicated pool, execute the following:: + + ceph osd pool set {poolname} size {num-replicas} + +.. important:: The ``{num-replicas}`` includes the object itself. + If you want the object and two copies of the object for a total of + three instances of the object, specify ``3``. + +For example:: + + ceph osd pool set data size 3 + +You may execute this command for each pool. **Note:** An object might accept +I/Os in degraded mode with fewer than ``pool size`` replicas. To set a minimum +number of required replicas for I/O, you should use the ``min_size`` setting. +For example:: + + ceph osd pool set data min_size 2 + +This ensures that no object in the data pool will receive I/O with fewer than +``min_size`` replicas. + + +Get the Number of Object Replicas +================================= + +To get the number of object replicas, execute the following:: + + ceph osd dump | grep 'replicated size' + +Ceph will list the pools, with the ``replicated size`` attribute highlighted. +By default, ceph creates two replicas of an object (a total of three copies, or +a size of 3). + + + +.. _Pool, PG and CRUSH Config Reference: ../../configuration/pool-pg-config-ref +.. _Bloom Filter: https://en.wikipedia.org/wiki/Bloom_filter +.. _setting the number of placement groups: ../placement-groups#set-the-number-of-placement-groups +.. _Erasure Coding with Overwrites: ../erasure-code#erasure-coding-with-overwrites +.. _Block Device Commands: ../../../rbd/rados-rbd-cmds/#create-a-block-device-pool + |