Diffstat (limited to 'doc/cephfs')
-rw-r--r-- | doc/cephfs/add-remove-mds.rst             |  14
-rw-r--r-- | doc/cephfs/cephfs-io-path.rst             |   2
-rw-r--r-- | doc/cephfs/cephfs-mirroring.rst           |   9
-rw-r--r-- | doc/cephfs/cephfs-top.rst                 |  56
-rw-r--r-- | doc/cephfs/client-auth.rst                | 118
-rw-r--r-- | doc/cephfs/disaster-recovery-experts.rst  |   2
-rw-r--r-- | doc/cephfs/fs-volumes.rst                 |  39
-rw-r--r-- | doc/cephfs/health-messages.rst            |  16
-rw-r--r-- | doc/cephfs/multimds.rst                   |  51
-rw-r--r-- | doc/cephfs/quota.rst                      |   8
-rw-r--r-- | doc/cephfs/snap-schedule.rst              |  10
-rw-r--r-- | doc/cephfs/standby.rst                    |  14
12 files changed, 305 insertions, 34 deletions
diff --git a/doc/cephfs/add-remove-mds.rst b/doc/cephfs/add-remove-mds.rst
index 4f5ee06aa..2fec7873d 100644
--- a/doc/cephfs/add-remove-mds.rst
+++ b/doc/cephfs/add-remove-mds.rst
@@ -1,11 +1,13 @@
 .. _cephfs_add_remote_mds:
 
-.. note::
-    It is highly recommended to use :doc:`/cephadm/index` or another Ceph
-    orchestrator for setting up the ceph cluster. Use this approach only if you
-    are setting up the ceph cluster manually. If one still intends to use the
-    manual way for deploying MDS daemons, :doc:`/cephadm/services/mds/` can
-    also be used.
+.. warning:: The material on this page is to be used only for manually setting
+   up a Ceph cluster. If you intend to use an automated tool such as
+   :doc:`/cephadm/index` to set up a Ceph cluster, do not use the
+   instructions on this page.
+
+.. note:: If you are certain that you know what you are doing and you intend to
+   manually deploy MDS daemons, see :doc:`/cephadm/services/mds/` before
+   proceeding.
 
 ============================
 Deploying Metadata Servers
diff --git a/doc/cephfs/cephfs-io-path.rst b/doc/cephfs/cephfs-io-path.rst
index 8c7810ba0..d5ae17197 100644
--- a/doc/cephfs/cephfs-io-path.rst
+++ b/doc/cephfs/cephfs-io-path.rst
@@ -47,4 +47,4 @@ client cache.
     |        MDSs         | -=-------> |        OSDs        |
     +---------------------+            +--------------------+
 
-.. _Architecture: ../architecture
+.. _Architecture: ../../architecture
diff --git a/doc/cephfs/cephfs-mirroring.rst b/doc/cephfs/cephfs-mirroring.rst
index fd00a1eef..973a2affa 100644
--- a/doc/cephfs/cephfs-mirroring.rst
+++ b/doc/cephfs/cephfs-mirroring.rst
@@ -93,6 +93,15 @@ providing high-availability.
 .. note:: Deploying a single mirror daemon is recommended. Running multiple
    daemons is untested.
 
+The following file types are supported by mirroring:
+
+- Regular files (-)
+- Directory files (d)
+- Symbolic link files (l)
+
+Other file types are ignored by mirroring, so they will not be available on
+a successfully synchronized peer.
+
 The mirroring module is disabled by default. To enable the mirroring module,
 run the following command:
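For orientation, the snapshot mirroring documented in the hunk above is enabled
per file system and per directory path. A minimal sketch following the
cephfs-mirroring guide (the file system name ``cephfs`` and the path are
placeholders)::

    $ ceph mgr module enable mirroring
    $ ceph fs snapshot mirror enable cephfs
    $ ceph fs snapshot mirror add cephfs /volumes/grp/subvol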
diff --git a/doc/cephfs/cephfs-top.rst b/doc/cephfs/cephfs-top.rst
index 49439a4bd..1588c4f5c 100644
--- a/doc/cephfs/cephfs-top.rst
+++ b/doc/cephfs/cephfs-top.rst
@@ -63,6 +63,62 @@ By default, `cephfs-top` uses `client.fstop` user to connect to a Ceph cluster::
 
     $ ceph auth get-or-create client.fstop mon 'allow r' mds 'allow r' osd 'allow r' mgr 'allow r'
     $ cephfs-top
 
+Description of Fields
+---------------------
+
+1. chit : Cap hit
+   Percentage of file capability hits over total number of caps
+
+2. dlease : Dentry lease
+   Percentage of dentry leases handed out over the total dentry lease requests
+
+3. ofiles : Opened files
+   Number of opened files
+
+4. oicaps : Pinned caps
+   Number of pinned caps
+
+5. oinodes : Opened inodes
+   Number of opened inodes
+
+6. rtio : Total size of read IOs
+   Number of bytes read in input/output operations generated by all processes
+
+7. wtio : Total size of write IOs
+   Number of bytes written in input/output operations generated by all processes
+
+8. raio : Average size of read IOs
+   Mean of the number of bytes read in input/output operations generated by all
+   processes over the total IO done
+
+9. waio : Average size of write IOs
+   Mean of the number of bytes written in input/output operations generated by all
+   processes over the total IO done
+
+10. rsp : Read speed
+    Speed of read IOs with respect to the duration since the last refresh of clients
+
+11. wsp : Write speed
+    Speed of write IOs with respect to the duration since the last refresh of clients
+
+12. rlatavg : Average read latency
+    Mean value of the read latencies
+
+13. rlatsd : Standard deviation for read latency
+    Dispersion of the read latency relative to its mean
+
+14. wlatavg : Average write latency
+    Mean value of the write latencies
+
+15. wlatsd : Standard deviation for write latency
+    Dispersion of the write latency relative to its mean
+
+16. mlatavg : Average metadata latency
+    Mean value of the metadata latencies
+
+17. mlatsd : Standard deviation for metadata latency
+    Dispersion of the metadata latency relative to its mean
+
 Command-Line Options
 --------------------
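The field descriptions above are easiest to interpret next to live output. A
small usage sketch (the ``-d`` and ``--dump`` options are assumed to be present
in your ``cephfs-top`` build; check ``cephfs-top --help``)::

    $ cephfs-top -d 5          # interactive display, refreshed every 5 seconds
    $ cephfs-top --dump        # assumed option: print the metrics once to stdout
    $ cephfs-top --id admin    # connect as a client other than client.fstop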
diff --git a/doc/cephfs/client-auth.rst b/doc/cephfs/client-auth.rst
index a7dea5251..75528f91e 100644
--- a/doc/cephfs/client-auth.rst
+++ b/doc/cephfs/client-auth.rst
@@ -259,3 +259,121 @@ Following is an example of enabling root_squash in a filesystem except within
         caps mds = "allow rw fsname=a root_squash, allow rw fsname=a path=/volumes"
         caps mon = "allow r fsname=a"
         caps osd = "allow rw tag cephfs data=a"
+
+Updating Capabilities using ``fs authorize``
+============================================
+Since the Reef release, ``fs authorize`` can be used not only to create a new
+client with caps for a CephFS, but also to add new caps (for another CephFS,
+or for another path in the same file system) to an existing client.
+
+Let's say we run the following and create a new client::
+
+    $ ceph fs authorize a client.x / rw
+    [client.x]
+        key = AQAOtSVk9WWtIhAAJ3gSpsjwfIQ0gQ6vfSx/0w==
+    $ ceph auth get client.x
+    [client.x]
+        key = AQAOtSVk9WWtIhAAJ3gSpsjwfIQ0gQ6vfSx/0w==
+        caps mds = "allow rw fsname=a"
+        caps mon = "allow r fsname=a"
+        caps osd = "allow rw tag cephfs data=a"
+
+Previously, running ``fs authorize a client.x / rw`` a second time printed an
+error message. Since Reef, it instead reports that there is no update::
+
+    $ ceph fs authorize a client.x / rw
+    no update for caps of client.x
+
+Adding New Caps Using ``fs authorize``
+--------------------------------------
+Users can now add caps for another path in the same CephFS::
+
+    $ ceph fs authorize a client.x /dir1 rw
+    updated caps for client.x
+    $ ceph auth get client.x
+    [client.x]
+        key = AQAOtSVk9WWtIhAAJ3gSpsjwfIQ0gQ6vfSx/0w==
+        caps mds = "allow rw fsname=a, allow rw fsname=a path=/dir1"
+        caps mon = "allow r fsname=a"
+        caps osd = "allow rw tag cephfs data=a"
+
+And even add caps for another CephFS on the Ceph cluster::
+
+    $ ceph fs authorize b client.x / rw
+    updated caps for client.x
+    $ ceph auth get client.x
+    [client.x]
+        key = AQD6tiVk0uJdARAABMaQuLRotxTi3Qdj47FkBA==
+        caps mds = "allow rw fsname=a, allow rw fsname=b"
+        caps mon = "allow r fsname=a, allow r fsname=b"
+        caps osd = "allow rw tag cephfs data=a, allow rw tag cephfs data=b"
+
+Changing rw permissions in caps
+-------------------------------
+
+It is not possible to modify caps by running ``fs authorize``, except when
+read/write permissions have to be changed. This is because ``fs authorize``
+would otherwise be ambiguous. For example, a user runs ``fs authorize cephfs1
+client.x /dir1 rw`` to create a client and then runs ``fs authorize cephfs1
+client.x /dir2 rw`` (notice that ``/dir1`` has changed to ``/dir2``).
+Running the second command could be interpreted either as changing ``/dir1``
+to ``/dir2`` in the current cap, or as authorizing the client with a new cap
+for the path ``/dir2``. As seen in the previous sections, the second
+interpretation is chosen, and it is therefore impossible to update any part of
+an already granted capability except the read/write permissions. The following
+is how the read/write permissions for ``client.x`` (created above) can be
+changed::
+
+    $ ceph fs authorize a client.x / r
+    [client.x]
+        key = AQBBKjBkIFhBDBAA6q5PmDDWaZtYjd+jafeVUQ==
+    $ ceph auth get client.x
+    [client.x]
+        key = AQBBKjBkIFhBDBAA6q5PmDDWaZtYjd+jafeVUQ==
+        caps mds = "allow r fsname=a"
+        caps mon = "allow r fsname=a"
+        caps osd = "allow r tag cephfs data=a"
+
+``fs authorize`` never removes any part of caps
+-----------------------------------------------
+It is not possible to remove caps issued to a client by running ``fs
+authorize`` again. For example, if a client cap has ``root_squash`` applied
+on a certain CephFS, running ``fs authorize`` again for the same CephFS but
+without ``root_squash`` will not lead to any update; the client's caps will
+remain unchanged::
+
+    $ ceph fs authorize a client.x / rw root_squash
+    [client.x]
+        key = AQD61CVkcA1QCRAAd0XYqPbHvcc+lpUAuc6Vcw==
+    $ ceph auth get client.x
+    [client.x]
+        key = AQD61CVkcA1QCRAAd0XYqPbHvcc+lpUAuc6Vcw==
+        caps mds = "allow rw fsname=a root_squash"
+        caps mon = "allow r fsname=a"
+        caps osd = "allow rw tag cephfs data=a"
+    $ ceph fs authorize a client.x / rw
+    [client.x]
+        key = AQD61CVkcA1QCRAAd0XYqPbHvcc+lpUAuc6Vcw==
+    no update was performed for caps of client.x. caps of client.x remains unchanged.
+
+And if a client already has caps for file system ``a`` and path ``dir1``,
+running ``fs authorize`` again for file system ``a`` but path ``dir2`` will
+grant a new cap for ``dir2`` instead of modifying the caps the client already
+holds::
+
+    $ ceph fs authorize a client.x /dir1 rw
+    $ ceph auth get client.x
+    [client.x]
+        key = AQC1tyVknMt+JxAAp0pVnbZGbSr/nJrmkMNKqA==
+        caps mds = "allow rw fsname=a path=/dir1"
+        caps mon = "allow r fsname=a"
+        caps osd = "allow rw tag cephfs data=a"
+    $ ceph fs authorize a client.x /dir2 rw
+    updated caps for client.x
+    $ ceph auth get client.x
+    [client.x]
+        key = AQC1tyVknMt+JxAAp0pVnbZGbSr/nJrmkMNKqA==
+        caps mds = "allow rw fsname=a path=/dir1, allow rw fsname=a path=/dir2"
+        caps mon = "allow r fsname=a"
+        caps osd = "allow rw tag cephfs data=a"
diff --git a/doc/cephfs/disaster-recovery-experts.rst b/doc/cephfs/disaster-recovery-experts.rst
index c881c2423..9a196c88e 100644
--- a/doc/cephfs/disaster-recovery-experts.rst
+++ b/doc/cephfs/disaster-recovery-experts.rst
@@ -15,7 +15,7 @@ Advanced: Metadata repair tools
     file system before attempting to repair it.
 
     If you do not have access to professional support for your cluster,
-    consult the ceph-users mailing list or the #ceph IRC channel.
+    consult the ceph-users mailing list or the #ceph IRC/Slack channel.
 
 
 Journal export
diff --git a/doc/cephfs/fs-volumes.rst b/doc/cephfs/fs-volumes.rst
index e7fd377bf..3d17be561 100644
--- a/doc/cephfs/fs-volumes.rst
+++ b/doc/cephfs/fs-volumes.rst
@@ -501,10 +501,14 @@ To initiate a clone operation use::
 
   $ ceph fs subvolume snapshot clone <vol_name> <subvol_name> <snap_name> <target_subvol_name>
 
+.. note:: The ``subvolume snapshot clone`` command depends upon the config
+   option ``snapshot_clone_no_wait``, described in the Configurables section below.
+
 If a snapshot (source subvolume) is a part of non-default group, the group name needs to be specified::
 
   $ ceph fs subvolume snapshot clone <vol_name> <subvol_name> <snap_name> <target_subvol_name> --group_name <subvol_group_name>
 
 Cloned subvolumes can be a part of a different group than the source snapshot (by default, cloned subvolumes are created in default group). To clone to a particular group use::
 
   $ ceph fs subvolume snapshot clone <vol_name> <subvol_name> <snap_name> <target_subvol_name> --target_group_name <subvol_group_name>
@@ -513,13 +517,15 @@ Similar to specifying a pool layout when creating a subvolume, pool layout can b
 
   $ ceph fs subvolume snapshot clone <vol_name> <subvol_name> <snap_name> <target_subvol_name> --pool_layout <pool_layout>
 
-Configure the maximum number of concurrent clones. The default is 4::
-
-  $ ceph config set mgr mgr/volumes/max_concurrent_clones <value>
-
-To check the status of a clone operation use::
+To check the status of a clone operation use:
+
+.. prompt:: bash #
 
-  $ ceph fs clone status <vol_name> <clone_name> [--group_name <group_name>]
+  ceph fs clone status <vol_name> <clone_name> [--group_name <group_name>]
 
 A clone can be in one of the following states:
@@ -616,6 +622,31 @@ On successful cancellation, the cloned subvolume is moved to the ``canceled`` st
 
 .. note:: The canceled clone may be deleted by supplying the ``--force`` option
    to the `fs subvolume rm` command.
 
+Configurables
+~~~~~~~~~~~~~
+
+Configure the maximum number of concurrent clone operations. The default is 4:
+
+.. prompt:: bash #
+
+  ceph config set mgr mgr/volumes/max_concurrent_clones <value>
+
+Configure the ``snapshot_clone_no_wait`` option:
+
+The ``snapshot_clone_no_wait`` config option is used to reject clone-creation
+requests when cloner threads (which can be configured using the above option,
+``max_concurrent_clones``) are not available. It is enabled by default (i.e.
+the value is set to ``True``), and can be changed with the following command:
+
+.. prompt:: bash #
+
+  ceph config set mgr mgr/volumes/snapshot_clone_no_wait <bool>
+
+The current value of ``snapshot_clone_no_wait`` can be fetched with the
+following command:
+
+.. prompt:: bash #
+
+  ceph config get mgr mgr/volumes/snapshot_clone_no_wait
+
 .. _subvol-pinning:
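The clone workflow that the fs-volumes hunks above configure can be exercised
end to end. A minimal sketch, assuming a volume ``vol1`` that already contains
a subvolume ``subvol1`` (all names are placeholders)::

    $ ceph fs subvolume snapshot create vol1 subvol1 snap1
    $ ceph fs subvolume snapshot clone vol1 subvol1 snap1 clone1
    $ ceph fs clone status vol1 clone1
    $ ceph fs subvolume snapshot rm vol1 subvol1 snap1    # after the clone reaches the complete state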
diff --git a/doc/cephfs/health-messages.rst b/doc/cephfs/health-messages.rst
index 7edc1262f..8fb23715d 100644
--- a/doc/cephfs/health-messages.rst
+++ b/doc/cephfs/health-messages.rst
@@ -130,7 +130,9 @@ other daemons, please see :ref:`health-checks`.
     from properly cleaning up resources used by client requests. This message
     appears if a client appears to have more than ``max_completed_requests``
     (default 100000) requests that are complete on the MDS side but haven't
-    yet been accounted for in the client's *oldest tid* value.
+    yet been accounted for in the client's *oldest tid* value. The last tid
+    used by the MDS to trim completed client requests (or flush) is included
+    in the output of the `session ls` (or `client ls`) command, as a debug aid.
 
 ``MDS_DAMAGE``
 --------------
@@ -238,3 +240,15 @@
   Description
     All MDS ranks are unavailable, resulting in the file system being
     completely offline.
+
+``MDS_CLIENTS_LAGGY``
+----------------------------
+  Message
+    "Client *ID* is laggy; not evicted because some OSD(s) is/are laggy"
+
+  Description
+    If an OSD is laggy (due to certain conditions such as a network cut-off),
+    it might make clients laggy (the session might go idle, or the client may
+    be unable to flush dirty data for cap revokes). If
+    ``defer_client_eviction_on_laggy_osds`` is set to true (default true),
+    client eviction will not take place and thus this health warning will be
+    generated.
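The *oldest tid* bookkeeping mentioned in the ``MDS_CLIENT_OLDEST_TID`` hunk
can be inspected per session. A minimal sketch (rank ``0`` of a file system
named ``cephfs`` is assumed; the field names in the ``jq`` filter are
illustrative of what ``session ls`` reports)::

    $ ceph tell mds.cephfs:0 session ls | jq '.[] | {id, num_completed_requests, num_completed_flushes}'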
diff --git a/doc/cephfs/multimds.rst b/doc/cephfs/multimds.rst
index e50a5148e..d105c74ad 100644
--- a/doc/cephfs/multimds.rst
+++ b/doc/cephfs/multimds.rst
@@ -116,7 +116,7 @@ The mechanism provided for this purpose is called an ``export pin``, an
 extended attribute of directories. The name of this extended attribute is
 ``ceph.dir.pin``. Users can set this attribute using standard commands:
 
-::
+.. prompt:: bash #
 
     setfattr -n ceph.dir.pin -v 2 path/to/dir
 
@@ -128,7 +128,7 @@ pin. In this way, setting the export pin on a directory affects all of its
 children. However, the parents pin can be overridden by setting the child
 directory's export pin. For example:
 
-::
+.. prompt:: bash #
 
     mkdir -p a/b
     # "a" and "a/b" both start without an export pin set
@@ -173,7 +173,7 @@ immediate children across a range of MDS ranks. The canonical example use-case
 would be the ``/home`` directory: we want every user's home directory to be
 spread across the entire MDS cluster. This can be set via:
 
-::
+.. prompt:: bash #
 
     setfattr -n ceph.dir.pin.distributed -v 1 /cephfs/home
 
@@ -183,7 +183,7 @@ may be ephemerally pinned. This is set through the extended attribute
 ``ceph.dir.pin.random`` with the value set to the percentage of directories that
 should be pinned. For example:
 
-::
+.. prompt:: bash #
 
     setfattr -n ceph.dir.pin.random -v 0.5 /cephfs/tmp
 
@@ -205,7 +205,7 @@ Ephemeral pins may override parent export pins and vice versa. What determines
 which policy is followed is the rule of the closest parent: if a closer parent
 directory has a conflicting policy, use that one instead. For example:
 
-::
+.. prompt:: bash #
 
     mkdir -p foo/bar1/baz foo/bar2
     setfattr -n ceph.dir.pin -v 0 foo
@@ -217,7 +217,7 @@ directory will obey the pin on ``foo`` normally.
 
 For the reverse situation:
 
-::
+.. prompt:: bash #
 
     mkdir -p home/{patrick,john}
     setfattr -n ceph.dir.pin.distributed -v 1 home
@@ -229,7 +229,8 @@ because its export pin overrides the policy on ``home``.
 To remove a partitioning policy, remove the respective extended attribute
 or set the value to 0.
 
-.. code::bash
+.. prompt:: bash #
+
   $ setfattr -n ceph.dir.pin.distributed -v 0 home
   # or
   $ setfattr -x ceph.dir.pin.distributed home
@@ -237,10 +238,36 @@ or set the value to 0.
 For export pins, remove the extended attribute or set the extended attribute
 value to `-1`.
 
-.. code::bash
+.. prompt:: bash #
+
   $ setfattr -n ceph.dir.pin -v -1 home
 
+Dynamic Subtree Partitioning
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+CephFS has long had a dynamic metadata balancer (sometimes called the "default
+balancer") which can split or merge subtrees while placing them on "colder" MDS
+ranks. Moving the metadata around can improve overall file system throughput
+and cache size.
+
+However, the balancer has suffered from problems with efficiency and
+performance, so it is turned off by default. This is to avoid an administrator
+"turning on multimds" by increasing the ``max_mds`` setting and then finding
+that the balancer has made a mess of the cluster performance (reverting is
+straightforward but can take time).
+
+The setting to turn on the balancer is:
+
+.. prompt:: bash #
+
+  ceph fs set <fs_name> balance_automate true
+
+Turning on the balancer should only be done with appropriate configuration,
+such as with the ``bal_rank_mask`` setting (described below). Careful
+monitoring of the file system performance and of the MDS is advised.
+
+
 Dynamic subtree partitioning with Balancer on specific ranks
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
@@ -260,27 +287,27 @@ static pinned subtrees.
 
 This option can be configured with the ``ceph fs set`` command. For example:
 
-::
+.. prompt:: bash #
 
     ceph fs set <fs_name> bal_rank_mask <hex>
 
 Each bitfield of the ``<hex>`` number represents a dedicated rank. If the ``<hex>`` is
 set to ``0x3``, the balancer runs on active ``0`` and ``1`` ranks. For example:
 
-::
+.. prompt:: bash #
 
     ceph fs set <fs_name> bal_rank_mask 0x3
 
 If the ``bal_rank_mask`` is set to ``-1`` or ``all``, all active ranks are masked
 and utilized by the balancer. As an example:
 
-::
+.. prompt:: bash #
 
     ceph fs set <fs_name> bal_rank_mask -1
 
 On the other hand, if the balancer needs to be disabled,
 the ``bal_rank_mask`` should be set to ``0x0``. For example:
 
-::
+.. prompt:: bash #
 
     ceph fs set <fs_name> bal_rank_mask 0x0
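Where the pins and the balancer actually placed subtrees can be checked from a
running MDS. A minimal sketch (rank ``0`` of a file system named ``cephfs`` is
assumed)::

    $ ceph tell mds.cephfs:0 get subtrees | jq '.[] | [.dir.path, .auth_first, .export_pin]'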
diff --git a/doc/cephfs/quota.rst b/doc/cephfs/quota.rst
index 0bc56be12..e78173bcc 100644
--- a/doc/cephfs/quota.rst
+++ b/doc/cephfs/quota.rst
@@ -21,6 +21,14 @@ value::
   setfattr -n ceph.quota.max_bytes -v 100000000 /some/dir     # 100 MB
   setfattr -n ceph.quota.max_files -v 10000 /some/dir         # 10,000 files
 
+``ceph.quota.max_bytes`` can also be set using human-friendly units::
+
+  setfattr -n ceph.quota.max_bytes -v 100K /some/dir          # 100 KiB
+  setfattr -n ceph.quota.max_bytes -v 5Gi /some/dir           # 5 GiB
+
+.. note:: Values will be strictly cast to IEC units even when SI units
+   are input, e.g. ``1K`` becomes 1024 bytes.
+
 To view quota limit::
 
   $ getfattr -n ceph.quota.max_bytes /some/dir
diff --git a/doc/cephfs/snap-schedule.rst b/doc/cephfs/snap-schedule.rst
index 2b8873699..ef746be23 100644
--- a/doc/cephfs/snap-schedule.rst
+++ b/doc/cephfs/snap-schedule.rst
@@ -30,9 +30,9 @@ assumed to be keyword arguments too.
 Snapshot schedules are identified by path, their repeat interval and their start
 time. The repeat interval defines the time between two subsequent snapshots. It is
-specified by a number and a period multiplier, one of `h(our)`, `d(ay)` and
-`w(eek)`. E.g. a repeat interval of `12h` specifies one snapshot every 12
-hours.
+specified by a number and a period multiplier, one of `h(our)`, `d(ay)`,
+`w(eek)`, `M(onth)` and `Y(ear)`. E.g. a repeat interval of `12h` specifies one
+snapshot every 12 hours.
 The start time is specified as a time string (more details about passing times
 below). By default the start time is last midnight. So when a snapshot schedule
 with repeat
@@ -52,8 +52,8 @@ space or concatenated pairs of `<number><time period>`.
 The semantics are that a spec will ensure `<number>` snapshots are kept that are
 at least `<time period>` apart. For example `7d` means the user wants to keep 7
 snapshots that are at least one day (but potentially longer) apart from each other.
-The following time periods are recognized: `h(our), d(ay), w(eek), m(onth),
-y(ear)` and `n`. The latter is a special modifier where e.g. `10n` means keep
+The following time periods are recognized: `h(our)`, `d(ay)`, `w(eek)`, `M(onth)`,
+`Y(ear)` and `n`. The latter is a special modifier where e.g. `10n` means keep
 the last 10 snapshots regardless of timing.
 
 All subcommands take optional `fs` argument to specify paths in
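The interval and retention specs described in the snap-schedule hunks above
combine as follows. A minimal sketch (the path ``/volumes/data`` is a
placeholder)::

    $ ceph fs snap-schedule add /volumes/data 12h            # one snapshot every 12 hours
    $ ceph fs snap-schedule retention add /volumes/data 7d   # keep 7 snapshots at least one day apart
    $ ceph fs snap-schedule status /volumes/data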
diff --git a/doc/cephfs/standby.rst b/doc/cephfs/standby.rst
index 367c6762b..e20735aaa 100644
--- a/doc/cephfs/standby.rst
+++ b/doc/cephfs/standby.rst
@@ -118,10 +118,16 @@ enforces this affinity.
 When failing over MDS daemons, a cluster's monitors will prefer standby daemons with
 ``mds_join_fs`` equal to the file system ``name`` with the failed ``rank``. If no
 standby exists with ``mds_join_fs`` equal to the file system ``name``, it will
-choose an unqualified standby (no setting for ``mds_join_fs``) for the replacement,
-or any other available standby, as a last resort. Note, this does not change the
-behavior that ``standby-replay`` daemons are always selected before
-other standbys.
+choose an unqualified standby (no setting for ``mds_join_fs``) for the replacement.
+As a last resort, a standby for another file system will be chosen, although this
+behavior can be disabled:
+
+::
+
+    ceph fs set <fs name> refuse_standby_for_another_fs true
+
+Note that configuring MDS file system affinity does not change the behavior that
+``standby-replay`` daemons are always selected before other standbys.
 
 Even further, the monitors will regularly examine the CephFS file systems even when
 stable to check if a standby with stronger affinity is available to replace an
 MDS with lower affinity.
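The affinity referred to in the standby.rst hunk above is configured per MDS
daemon. A minimal sketch, assuming a daemon named ``mds.b`` and a file system
named ``cephfs``::

    $ ceph config set mds.b mds_join_fs cephfs
    $ ceph fs set cephfs refuse_standby_for_another_fs true   # the new option added above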