Diffstat (limited to 'doc/rados/configuration/mon-osd-interaction.rst')
-rw-r--r-- | doc/rados/configuration/mon-osd-interaction.rst | 245
1 file changed, 245 insertions, 0 deletions
diff --git a/doc/rados/configuration/mon-osd-interaction.rst b/doc/rados/configuration/mon-osd-interaction.rst
new file mode 100644
index 000000000..8cf09707d
--- /dev/null
+++ b/doc/rados/configuration/mon-osd-interaction.rst
@@ -0,0 +1,245 @@

=====================================
 Configuring Monitor/OSD Interaction
=====================================

.. index:: heartbeat

After you have completed your initial Ceph configuration, you may deploy and
run Ceph. When you execute a command such as ``ceph health`` or ``ceph -s``,
the :term:`Ceph Monitor` reports on the current state of the :term:`Ceph
Storage Cluster`. The Ceph Monitor knows about the Ceph Storage Cluster by
requiring reports from each :term:`Ceph OSD Daemon`, and by receiving reports
from Ceph OSD Daemons about the status of their neighboring Ceph OSD Daemons.
If the Ceph Monitor doesn't receive reports, or if it receives reports of
changes in the Ceph Storage Cluster, the Ceph Monitor updates the status of
the :term:`Ceph Cluster Map`.

Ceph provides reasonable default settings for Ceph Monitor/Ceph OSD Daemon
interaction. However, you may override the defaults. The following sections
describe how Ceph Monitors and Ceph OSD Daemons interact for the purposes of
monitoring the Ceph Storage Cluster.

.. index:: heartbeat interval

OSDs Check Heartbeats
=====================

Each Ceph OSD Daemon checks the heartbeat of other Ceph OSD Daemons at random
intervals of less than 6 seconds. If a neighboring Ceph OSD Daemon doesn't
show a heartbeat within a 20-second grace period, the Ceph OSD Daemon may
consider the neighboring Ceph OSD Daemon ``down`` and report it back to a
Ceph Monitor, which will update the Ceph Cluster Map. You may change this
grace period by adding an ``osd heartbeat grace`` setting under the ``[mon]``
and ``[osd]`` sections (or the ``[global]`` section) of your Ceph
configuration file, or by setting the value at runtime.


.. ditaa::

    +---------+          +---------+
    |  OSD 1  |          |  OSD 2  |
    +---------+          +---------+
         |                    |
         |----+ Heartbeat     |
         |    | Interval      |
         |<---+ Exceeded      |
         |                    |
         |       Check        |
         |      Heartbeat     |
         |------------------->|
         |                    |
         |<-------------------|
         |   Heart Beating    |
         |                    |
         |----+ Heartbeat     |
         |    | Interval      |
         |<---+ Exceeded      |
         |                    |
         |       Check        |
         |      Heartbeat     |
         |------------------->|
         |                    |
         |----+ Grace         |
         |    | Period        |
         |<---+ Exceeded      |
         |                    |
         |----+ Mark          |
         |    |  OSD 2        |
         |<---+ Down          |


.. index:: OSD down report

OSDs Report Down OSDs
=====================

By default, two Ceph OSD Daemons from different hosts must report to the Ceph
Monitors that another Ceph OSD Daemon is ``down`` before the Ceph Monitors
acknowledge that the reported Ceph OSD Daemon is ``down``. There is, however,
a chance that all the OSDs reporting the failure are hosted in a rack with a
bad switch that has trouble connecting to another OSD. To avoid this sort of
false alarm, we consider the peers reporting a failure a proxy for a potential
"subcluster" of the overall cluster that is similarly laggy. This is clearly
not true in all cases, but it will sometimes help us localize the grace
correction to a subset of the system that is unhappy. ``mon osd reporter
subtree level`` is used to group the peers into the "subcluster" by their
common ancestor type in the CRUSH map. By default, only two reports from
different subtrees are required to report another Ceph OSD Daemon ``down``.
You can change the number of reporters from unique subtrees and the common
ancestor type required to report a Ceph OSD Daemon ``down`` to a Ceph Monitor
by adding ``mon osd min down reporters`` and ``mon osd reporter subtree
level`` settings under the ``[mon]`` section of your Ceph configuration file,
or by setting the value at runtime.


.. ditaa::

    +---------+      +---------+      +---------+
    |  OSD 1  |      |  OSD 2  |      | Monitor |
    +---------+      +---------+      +---------+
         |                |                |
         | OSD 3 Is Down  |                |
         |----------------+--------------->|
         |                |                |
         |                |                |
         |                | OSD 3 Is Down  |
         |                |--------------->|
         |                |                |
         |                |                |
         |                |                |---------+ Mark
         |                |                |         | OSD 3
         |                |                |<--------+ Down
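For illustration, here is a minimal sketch of how these reporter settings
might look in ``ceph.conf``. The values shown (three reporters, a ``rack``
subtree level) are arbitrary examples rather than recommendations:

.. code-block:: ini

    [mon]
    # Require reports from three distinct subtrees before an OSD is
    # considered down (the default described above is two).
    mon osd min down reporters = 3
    # Group reporters by their common "rack" ancestor in the CRUSH map.
    mon osd reporter subtree level = rack

On releases that include the centralized configuration database, the same
values can also be changed at runtime, for example with
``ceph config set mon mon_osd_min_down_reporters 3``.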
.. index:: peering failure

OSDs Report Peering Failure
===========================

If a Ceph OSD Daemon cannot peer with any of the Ceph OSD Daemons defined in
its Ceph configuration file (or the cluster map), it will ping a Ceph Monitor
for the most recent copy of the cluster map every 30 seconds. You can change
the Ceph Monitor heartbeat interval by adding an ``osd mon heartbeat
interval`` setting under the ``[osd]`` section of your Ceph configuration
file, or by setting the value at runtime.

.. ditaa::

    +---------+      +---------+      +-------+      +---------+
    |  OSD 1  |      |  OSD 2  |      | OSD 3 |      | Monitor |
    +---------+      +---------+      +-------+      +---------+
         |                |               |               |
         |   Request To   |               |               |
         |      Peer      |               |               |
         |--------------->|               |               |
         |<---------------|               |               |
         |    Peering                     |               |
         |                                |               |
         |   Request To                   |               |
         |      Peer                      |               |
         |------------------------------->|               |
         |                                                |
         |----+ OSD Monitor                               |
         |    | Heartbeat                                 |
         |<---+ Interval Exceeded                         |
         |                                                |
         |         Failed to Peer with OSD 3              |
         |----------------------------------------------->|
         |<-----------------------------------------------|
         |           Receive New Cluster Map              |


.. index:: OSD status

OSDs Report Their Status
========================

If a Ceph OSD Daemon doesn't report to a Ceph Monitor, the Ceph Monitor will
consider the Ceph OSD Daemon ``down`` after the ``mon osd report timeout``
elapses. A Ceph OSD Daemon sends a report to a Ceph Monitor within 5 seconds
of a reportable event such as a failure, a change in placement group stats, a
change in ``up_thru``, or booting. You can change the Ceph OSD Daemon minimum
report interval by adding an ``osd mon report interval`` setting under the
``[osd]`` section of your Ceph configuration file, or by setting the value at
runtime. A Ceph OSD Daemon also sends a report to a Ceph Monitor every 120
seconds irrespective of whether any notable changes occur. You can change
this maximum report interval by adding an ``osd mon report interval max``
setting under the ``[osd]`` section of your Ceph configuration file, or by
setting the value at runtime.


.. ditaa::

    +---------+          +---------+
    |  OSD 1  |          | Monitor |
    +---------+          +---------+
         |                    |
         |----+ Report Min    |
         |    | Interval      |
         |<---+ Exceeded      |
         |                    |
         |----+ Reportable    |
         |    | Event         |
         |<---+ Occurs        |
         |                    |
         |     Report To      |
         |      Monitor       |
         |------------------->|
         |                    |
         |----+ Report Max    |
         |    | Interval      |
         |<---+ Exceeded      |
         |                    |
         |     Report To      |
         |      Monitor       |
         |------------------->|
         |                    |
         |----+ Monitor       |
         |    | Fails         |
         |<---+               |
                              +----+ Monitor OSD
                              |    | Report Timeout
                              |<---+ Exceeded
                              |
                              +----+ Mark
                              |    |  OSD 1
                              |<---+ Down


Configuration Settings
======================

When modifying heartbeat settings, you should include them in the ``[global]``
section of your configuration file.
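As a sketch, the heartbeat-related options discussed above could be collected
in the ``[global]`` section like this. The values simply repeat the defaults
mentioned in this document and are illustrative only, not tuning advice:

.. code-block:: ini

    [global]
    # Seconds to wait for a heartbeat before a neighboring OSD may be
    # reported down.
    osd heartbeat grace = 20
    # Seconds between an OSD's pings to a monitor when it cannot peer
    # with other OSDs.
    osd mon heartbeat interval = 30
    # Minimum seconds between an OSD's reports to the monitors.
    osd mon report interval = 5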
.. index:: monitor heartbeat

Monitor Settings
----------------

.. confval:: mon_osd_min_up_ratio
.. confval:: mon_osd_min_in_ratio
.. confval:: mon_osd_laggy_halflife
.. confval:: mon_osd_laggy_weight
.. confval:: mon_osd_laggy_max_interval
.. confval:: mon_osd_adjust_heartbeat_grace
.. confval:: mon_osd_adjust_down_out_interval
.. confval:: mon_osd_auto_mark_in
.. confval:: mon_osd_auto_mark_auto_out_in
.. confval:: mon_osd_auto_mark_new_in
.. confval:: mon_osd_down_out_interval
.. confval:: mon_osd_down_out_subtree_limit
.. confval:: mon_osd_report_timeout
.. confval:: mon_osd_min_down_reporters
.. confval:: mon_osd_reporter_subtree_level

.. index:: OSD heartbeat

OSD Settings
------------

.. confval:: osd_heartbeat_interval
.. confval:: osd_heartbeat_grace
.. confval:: osd_mon_heartbeat_interval
.. confval:: osd_mon_heartbeat_stat_stale
.. confval:: osd_mon_report_interval
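As a rough sketch of the runtime approach mentioned throughout this page
(assuming a release that includes the centralized configuration database),
any of the options listed above can be inspected and overridden with
``ceph config``. The value ``25`` below is a placeholder, not a
recommendation:

.. code-block:: console

    $ ceph config get osd osd_heartbeat_grace
    $ ceph config set osd osd_heartbeat_grace 25
    $ ceph config get mon mon_osd_reporter_subtree_level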