Diffstat (limited to 'doc/sphinx/Clusters_from_Scratch/verification.rst')
-rw-r--r--  doc/sphinx/Clusters_from_Scratch/verification.rst  222
1 files changed, 222 insertions, 0 deletions
diff --git a/doc/sphinx/Clusters_from_Scratch/verification.rst b/doc/sphinx/Clusters_from_Scratch/verification.rst
new file mode 100644
index 0000000..08fab31
--- /dev/null
+++ b/doc/sphinx/Clusters_from_Scratch/verification.rst
@@ -0,0 +1,222 @@
+Start and Verify Cluster
+------------------------
+
+Start the Cluster
+#################
+
+Now that Corosync is configured, it is time to start the cluster.
+The command below will start the ``corosync`` and ``pacemaker`` services on
+both nodes in the cluster.
+
+.. code-block:: console
+
+ [root@pcmk-1 ~]# pcs cluster start --all
+ pcmk-1: Starting Cluster...
+ pcmk-2: Starting Cluster...
+
+.. NOTE::
+
+ An alternative to using the ``pcs cluster start --all`` command
+ is to issue either of the following command sequences on each node in
+ the cluster separately:
+
+ .. code-block:: console
+
+ # pcs cluster start
+ Starting Cluster...
+
+ or
+
+ .. code-block:: console
+
+ # systemctl start corosync.service
+ # systemctl start pacemaker.service
+
+.. IMPORTANT::
+
+ In this example, we are not enabling the ``corosync`` and ``pacemaker``
+ services to start at boot. If a cluster node fails or is rebooted, you will
+ need to run ``pcs cluster start [<NODENAME> | --all]`` to start the cluster
+ on it. While you can enable the services to start at boot (for example,
+ using ``pcs cluster enable [<NODENAME> | --all]``), requiring a manual
+ start of cluster services gives you the opportunity to do a post-mortem
+ investigation of a node failure before returning it to the cluster.
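+
+ For example, if you did want the services to start automatically at boot,
+ a single command run from either node would enable them on all nodes (shown
+ here only as a sketch; this guide does not enable them):
+
+ .. code-block:: console
+
+ # pcs cluster enable --all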
+
+Verify Corosync Installation
+################################
+
+First, use ``corosync-cfgtool`` to check whether cluster communication is
+healthy:
+
+.. code-block:: console
+
+ [root@pcmk-1 ~]# corosync-cfgtool -s
+ Local node ID 1, transport knet
+ LINK ID 0 udp
+ addr = 192.168.122.101
+ status:
+ nodeid: 1: localhost
+ nodeid: 2: connected
+
+We can see here that everything appears normal: our fixed IP address (not a
+``127.0.0.x`` loopback address) is listed as the ``addr``, and the statuses
+of nodeid 1 and nodeid 2 are ``localhost`` and ``connected``, respectively.
+
+If you see something different, you might want to start by checking
+the node's network, firewall, and SELinux configurations.
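+
+As a starting point, checks like the following can rule out the obvious
+problems (this sketch assumes ``firewalld`` and SELinux are in use; adjust
+for your distribution):
+
+.. code-block:: console
+
+ [root@pcmk-1 ~]# ip addr
+ [root@pcmk-1 ~]# firewall-cmd --list-services
+ [root@pcmk-1 ~]# getenforce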
+
+Next, check the membership and quorum APIs:
+
+.. code-block:: console
+
+ [root@pcmk-1 ~]# corosync-cmapctl | grep members
+ runtime.members.1.config_version (u64) = 0
+ runtime.members.1.ip (str) = r(0) ip(192.168.122.101)
+ runtime.members.1.join_count (u32) = 1
+ runtime.members.1.status (str) = joined
+ runtime.members.2.config_version (u64) = 0
+ runtime.members.2.ip (str) = r(0) ip(192.168.122.102)
+ runtime.members.2.join_count (u32) = 1
+ runtime.members.2.status (str) = joined
+
+ [root@pcmk-1 ~]# pcs status corosync
+
+ Membership information
+ ----------------------
+ Nodeid Votes Name
+ 1 1 pcmk-1 (local)
+ 2 1 pcmk-2
+
+You should see that both nodes have joined the cluster.
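+
+If you prefer Corosync's own summary of quorum state and membership,
+``corosync-quorumtool`` shows both in one place:
+
+.. code-block:: console
+
+ [root@pcmk-1 ~]# corosync-quorumtool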
+
+Verify Pacemaker Installation
+#################################
+
+Now that we have confirmed that Corosync is functional, we can check
+the rest of the stack. Pacemaker has already been started, so verify
+the necessary processes are running:
+
+.. code-block:: console
+
+ [root@pcmk-1 ~]# ps axf
+ PID TTY STAT TIME COMMAND
+ 2 ? S 0:00 [kthreadd]
+ ...lots of processes...
+ 17121 ? SLsl 0:01 /usr/sbin/corosync -f
+ 17133 ? Ss 0:00 /usr/sbin/pacemakerd
+ 17134 ? Ss 0:00 \_ /usr/libexec/pacemaker/pacemaker-based
+ 17135 ? Ss 0:00 \_ /usr/libexec/pacemaker/pacemaker-fenced
+ 17136 ? Ss 0:00 \_ /usr/libexec/pacemaker/pacemaker-execd
+ 17137 ? Ss 0:00 \_ /usr/libexec/pacemaker/pacemaker-attrd
+ 17138 ? Ss 0:00 \_ /usr/libexec/pacemaker/pacemaker-schedulerd
+ 17139 ? Ss 0:00 \_ /usr/libexec/pacemaker/pacemaker-controld
+
+If that looks OK, check the ``pcs status`` output:
+
+.. code-block:: console
+
+ [root@pcmk-1 ~]# pcs status
+ Cluster name: mycluster
+
+ WARNINGS:
+ No stonith devices and stonith-enabled is not false
+
+ Cluster Summary:
+ * Stack: corosync
+ * Current DC: pcmk-2 (version 2.1.2-4.el9-ada5c3b36e2) - partition with quorum
+ * Last updated: Wed Jul 27 00:09:55 2022
+ * Last change: Wed Jul 27 00:07:08 2022 by hacluster via crmd on pcmk-2
+ * 2 nodes configured
+ * 0 resource instances configured
+
+ Node List:
+ * Online: [ pcmk-1 pcmk-2 ]
+
+ Full List of Resources:
+ * No resources
+
+ Daemon Status:
+ corosync: active/disabled
+ pacemaker: active/disabled
+ pcsd: active/enabled
+
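+Much the same summary is available directly from Pacemaker's ``crm_mon``
+tool, which can be handy if ``pcs`` is not installed on the machine you are
+working from:
+
+.. code-block:: console
+
+ [root@pcmk-1 ~]# crm_mon -1
+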
+Finally, ensure there are no start-up errors from ``corosync`` or ``pacemaker``
+(aside from messages relating to not having STONITH configured, which are OK at
+this point):
+
+.. code-block:: console
+
+ [root@pcmk-1 ~]# journalctl -b | grep -i error
+
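+If the system journal is noisy, the search can be narrowed to just the
+cluster services, for example:
+
+.. code-block:: console
+
+ [root@pcmk-1 ~]# journalctl -b -u corosync -u pacemaker | grep -i error
+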
+.. NOTE::
+
+ Other operating systems may report startup errors in other locations
+ (for example, ``/var/log/messages``).
+
+Repeat these checks on the other node. The results should be the same.
+
+Explore the Existing Configuration
+##################################
+
+If you are not afraid of XML, you can see the raw cluster configuration and
+status by using the ``pcs cluster cib`` command.
+
+.. topic:: The last XML you'll see in this document
+
+ .. code-block:: console
+
+ [root@pcmk-1 ~]# pcs cluster cib
+
+ .. code-block:: xml
+
+ <cib crm_feature_set="3.13.0" validate-with="pacemaker-3.8" epoch="5" num_updates="4" admin_epoch="0" cib-last-written="Wed Jul 27 00:07:08 2022" update-origin="pcmk-2" update-client="crmd" update-user="hacluster" have-quorum="1" dc-uuid="2">
+ <configuration>
+ <crm_config>
+ <cluster_property_set id="cib-bootstrap-options">
+ <nvpair id="cib-bootstrap-options-have-watchdog" name="have-watchdog" value="false"/>
+ <nvpair id="cib-bootstrap-options-dc-version" name="dc-version" value="2.1.2-4.el9-ada5c3b36e2"/>
+ <nvpair id="cib-bootstrap-options-cluster-infrastructure" name="cluster-infrastructure" value="corosync"/>
+ <nvpair id="cib-bootstrap-options-cluster-name" name="cluster-name" value="mycluster"/>
+ </cluster_property_set>
+ </crm_config>
+ <nodes>
+ <node id="1" uname="pcmk-1"/>
+ <node id="2" uname="pcmk-2"/>
+ </nodes>
+ <resources/>
+ <constraints/>
+ <rsc_defaults>
+ <meta_attributes id="build-resource-defaults">
+ <nvpair id="build-resource-stickiness" name="resource-stickiness" value="1"/>
+ </meta_attributes>
+ </rsc_defaults>
+ </configuration>
+ <status>
+ <node_state id="2" uname="pcmk-2" in_ccm="true" crmd="online" crm-debug-origin="do_state_transition" join="member" expected="member">
+ <lrm id="2">
+ <lrm_resources/>
+ </lrm>
+ </node_state>
+ <node_state id="1" uname="pcmk-1" in_ccm="true" crmd="online" crm-debug-origin="do_state_transition" join="member" expected="member">
+ <lrm id="1">
+ <lrm_resources/>
+ </lrm>
+ </node_state>
+ </status>
+ </cib>
+
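+If you want to study the configuration offline, the command's output can be
+redirected to a file (the path below is just an example):
+
+.. code-block:: console
+
+ [root@pcmk-1 ~]# pcs cluster cib > /tmp/cib.xml
+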
+Before we make any changes, it's a good idea to check the validity of
+the configuration.
+
+.. code-block:: console
+
+ [root@pcmk-1 ~]# pcs cluster verify --full
+ Error: invalid cib:
+ (unpack_resources) error: Resource start-up disabled since no STONITH resources have been defined
+ (unpack_resources) error: Either configure some or disable STONITH with the stonith-enabled option
+ (unpack_resources) error: NOTE: Clusters with shared data need STONITH to ensure data integrity
+ crm_verify: Errors found during check: config not valid
+
+ Error: Errors have occurred, therefore pcs is unable to continue
+
+As you can see, the tool has found some errors. The cluster will not start any
+resources until we configure STONITH.
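+
+As the ``crm_verify`` line in the output suggests, ``pcs cluster verify``
+relies on Pacemaker's ``crm_verify`` tool, so the same check can also be run
+directly against the live cluster:
+
+.. code-block:: console
+
+ [root@pcmk-1 ~]# crm_verify --live-check --verbose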