Diffstat (limited to 'doc/shared/en-US')
-rw-r--r--  doc/shared/en-US/pacemaker-intro.txt  186
1 file changed, 186 insertions, 0 deletions
diff --git a/doc/shared/en-US/pacemaker-intro.txt b/doc/shared/en-US/pacemaker-intro.txt
new file mode 100644
index 0000000..b2a81cb
--- /dev/null
+++ b/doc/shared/en-US/pacemaker-intro.txt
@@ -0,0 +1,186 @@
+:compat-mode: legacy
+== What Is 'Pacemaker'? ==
+
+*Pacemaker* is a high-availability 'cluster resource manager' -- software that
+runs on a set of hosts (a 'cluster' of 'nodes') in order to preserve integrity
+and minimize downtime of desired services ('resources').
+footnote:[
+'Cluster' is sometimes used in other contexts to refer to hosts grouped
+together for other purposes, such as high-performance computing (HPC), but
+Pacemaker is not intended for those purposes.
+]
+It is maintained by the https://www.ClusterLabs.org/[ClusterLabs] community.
+
+Pacemaker's key features include:
+
+ * Detection of and recovery from node- and service-level failures
+ * Ability to ensure data integrity by fencing faulty nodes
+ * Support for one or more nodes per cluster
+ * Support for multiple resource interface standards (anything that can be
+ scripted can be clustered)
+ * Support (but no requirement) for shared storage
+ * Support for practically any redundancy configuration (active/passive, N+1,
+ etc.)
+ * Automatically replicated configuration that can be updated from any node
+ * Ability to specify cluster-wide relationships between services,
+ such as ordering, colocation and anti-colocation
+ * Support for advanced service types, such as 'clones' (services that need to
+ be active on multiple nodes), 'stateful resources' (clones that can run in
+ one of two modes), and containerized services
+ * Unified, scriptable cluster management tools
+
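+As a brief illustration of the ordering and colocation relationships mentioned
+above, the following sketch uses pcs (one of several available command-line
+tools); the resource names my-ip and my-web are hypothetical placeholders for
+resources that already exist:
+
+----
+# Hypothetical resources: start my-ip before my-web, and keep both on
+# the same node (colocation). A negative score would express
+# anti-colocation instead.
+pcs constraint order start my-ip then start my-web
+pcs constraint colocation add my-web with my-ip INFINITY
+----
+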
+.Fencing
+[NOTE]
+====
+'Fencing', also known as 'STONITH' (an acronym for Shoot The Other Node In The
+Head), is the ability to ensure that it is not possible for a node to be
+running a service. This is accomplished via 'fence devices' such as
+intelligent power switches that cut power to the target, or intelligent
+network switches that cut the target's access to the local network.
+
+Pacemaker represents fence devices as a special class of resource.
+
+A cluster cannot safely recover from certain failure conditions, such as an
+unresponsive node, without fencing.
+====
+
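+As a sketch of what configuring a fence device as a resource might look like,
+the following uses pcs with the fence_ipmilan agent; the device address,
+credentials, and parameter names are illustrative and vary by agent and
+version:
+
+----
+# Illustrative only: an IPMI-based fence device responsible for node1.
+# The address and credentials are placeholders.
+pcs stonith create fence-node1 fence_ipmilan \
+    ip=192.0.2.10 username=admin password=secret \
+    pcmk_host_list=node1
+----
+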
+== Cluster Architecture ==
+
+At a high level, a cluster can be viewed as having these parts (which together
+are often referred to as the 'cluster stack'):
+
+ * *Resources:* These are the reason for the cluster's being -- the services
+ that need to be kept highly available.
+
+ * *Resource agents:* These are scripts or operating system components that
+ start, stop, and monitor resources, given a set of resource parameters.
+ These provide a uniform interface between Pacemaker and the managed
+ services.
+
+ * *Fence agents:* These are scripts that execute node fencing actions,
+ given a target and fence device parameters.
+
+ * *Cluster membership layer:* This component provides reliable
+ messaging, membership, and quorum information about the cluster.
+ Currently, Pacemaker supports http://www.corosync.org/[Corosync]
+ as this layer.
+
+ * *Cluster resource manager:* Pacemaker provides the brain that processes
+ and reacts to events that occur in the cluster. These events may include
+ nodes joining or leaving the cluster; resource events caused by failures,
+ maintenance, or scheduled activities; and other administrative actions.
+ To achieve the desired availability, Pacemaker may start and stop resources
+ and fence nodes.
+
+ * *Cluster tools:* These provide an interface for users to interact with the
+ cluster. Various command-line and graphical user interfaces are available.
+
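+To give a feel for the uniform interface that resource agents provide, the
+sketch below invokes an OCF agent by hand; the installation path and the
+parameter values are assumptions and may differ on your system:
+
+----
+# OCF agents take parameters as OCF_RESKEY_* environment variables and
+# implement standard actions (start, stop, monitor, meta-data, ...).
+# Here the IPaddr2 agent checks a placeholder address.
+export OCF_ROOT=/usr/lib/ocf
+OCF_RESKEY_ip=192.0.2.120 OCF_RESKEY_cidr_netmask=24 \
+    /usr/lib/ocf/resource.d/heartbeat/IPaddr2 monitor
+----
+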
+Most managed services are not, themselves, cluster-aware. However, many popular
+open-source cluster filesystems make use of a common 'Distributed Lock
+Manager' (DLM), which makes direct use of Corosync for its messaging and
+membership capabilities and Pacemaker for the ability to fence nodes.
+
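+As a sketch of how the DLM is typically put under cluster control, its control
+daemon can be run as a cloned resource via the ocf:pacemaker:controld agent
+(assuming the dlm packages are installed); exact options depend on the
+distribution:
+
+----
+# Sketch only: run the DLM control daemon on every node as a clone.
+pcs resource create dlm ocf:pacemaker:controld \
+    op monitor interval=30s clone interleave=true
+----
+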
+.Example Cluster Stack
+image::images/pcmk-stack.png["Example cluster stack",width="10cm",height="7.5cm",align="center"]
+
+== Pacemaker Architecture ==
+
+Pacemaker itself is composed of multiple daemons that work together:
+
+ * pacemakerd
+ * pacemaker-attrd
+ * pacemaker-based
+ * pacemaker-controld
+ * pacemaker-execd
+ * pacemaker-fenced
+ * pacemaker-schedulerd
+
+.Internal Components
+image::images/pcmk-internals.png["Pacemaker software components",align="center",scaledwidth="65%"]
+
+The Pacemaker master process (pacemakerd) spawns all the other daemons, and
+respawns them if they unexpectedly exit.
+
+The 'Cluster Information Base' (CIB) is an
+https://en.wikipedia.org/wiki/XML[XML] representation of the cluster's
+configuration and the state of all nodes and resources. The 'CIB manager'
+(pacemaker-based) keeps the CIB synchronized across the cluster, and handles
+requests to modify it.
+
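+For example, administrators and tools can read the CIB through the CIB manager
+with cibadmin; the following queries are read-only:
+
+----
+# Dump the live CIB as XML, then just its resources section.
+cibadmin --query
+cibadmin --query --scope resources
+----
+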
+The 'attribute manager' (pacemaker-attrd) maintains a database of attributes for
+all nodes, keeps it synchronized across the cluster, and handles requests to
+modify them. These attributes are usually recorded in the CIB.
+
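+As a sketch, a node attribute can be set and read back with crm_attribute;
+the attribute name and value here are arbitrary examples:
+
+----
+# Set and then read a made-up attribute on the local node.
+crm_attribute --node "$(crm_node -n)" --name my-example-attr --update 42
+crm_attribute --node "$(crm_node -n)" --name my-example-attr --query
+----
+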
+Given a snapshot of the CIB as input, the 'scheduler' (pacemaker-schedulerd)
+determines what actions are necessary to achieve the desired state of the
+cluster.
+
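+You can ask the scheduler what it would do with the current cluster state by
+running crm_simulate against the live CIB; this is a read-only sketch:
+
+----
+# Show the actions the scheduler would schedule for the live CIB,
+# without actually performing any of them.
+crm_simulate --live-check
+----
+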
+The 'local executor' (pacemaker-execd) handles requests to execute
+resource agents on the local cluster node, and returns the result.
+
+The 'fencer' (pacemaker-fenced) handles requests to fence nodes. Given a target
+node, the fencer decides which cluster node(s) should execute which fencing
+device(s), calls the necessary fence agents (either directly, or via requests
+to fencer peers on other nodes), and returns the result.
+
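+Fencing can also be driven manually through the fencer with stonith_admin; the
+sketch below lists registered devices and shows (commented out) how a node
+would be rebooted, using a placeholder node name:
+
+----
+# List fence devices currently registered with the fencer.
+stonith_admin --list-registered
+# Ask the fencer to reboot a node (destructive; placeholder name):
+# stonith_admin --reboot node2
+----
+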
+The 'controller' (pacemaker-controld) is Pacemaker's coordinator,
+maintaining a consistent view of the cluster membership and orchestrating all
+the other components.
+
+Pacemaker centralizes cluster decision-making by electing one of the controller
+instances as the 'Designated Controller' ('DC'). Should the elected DC
+process (or the node it is on) fail, a new one is quickly established.
+The DC responds to cluster events by taking a current snapshot of the CIB,
+feeding it to the scheduler, then asking the executors (either directly on
+the local node, or via requests to controller peers on other nodes) and
+the fencer to execute any necessary actions.
+
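+Which node currently hosts the DC is shown in the cluster status output; for
+example, a one-shot status from crm_mon includes a "Current DC" line in its
+summary:
+
+----
+# Print the cluster status once and exit; the summary names the
+# current Designated Controller (DC).
+crm_mon --one-shot
+----
+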
+.Old daemon names
+[NOTE]
+====
+The Pacemaker daemons were renamed in version 2.0. You may still find
+references to the old names, especially in documentation targeted to version
+1.1.
+
+[width="95%",cols="1,2",options="header",align="center"]
+|=========================================================
+| Old name | New name
+| attrd | pacemaker-attrd
+| cib | pacemaker-based
+| crmd | pacemaker-controld
+| lrmd | pacemaker-execd
+| pengine | pacemaker-schedulerd
+| stonithd | pacemaker-fenced
+| pacemaker_remoted | pacemaker-remoted
+|=========================================================
+
+====
+
+== Node Redundancy Designs ==
+
+Pacemaker supports practically any
+https://en.wikipedia.org/wiki/High-availability_cluster#Node_configurations[node
+redundancy configuration] including 'Active/Active', 'Active/Passive', 'N+1',
+'N+M', 'N-to-1' and 'N-to-N'.
+
+Active/passive clusters with two (or more) nodes using Pacemaker and
+https://en.wikipedia.org/wiki/Distributed_Replicated_Block_Device[DRBD] are
+a cost-effective high-availability solution for many situations. One of the
+nodes provides the desired services, and if it fails, the other node takes
+over.
+
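+As a minimal sketch of the service side of such an active/passive pair
+(leaving DRBD itself aside), a floating IP address and a web server could be
+defined as resources and then tied together with ordering and colocation
+constraints as shown earlier; all names and addresses here are placeholders:
+
+----
+# Placeholder active/passive pair: a floating IP address and a web
+# server that will fail over together.
+pcs resource create cluster-ip ocf:heartbeat:IPaddr2 \
+    ip=192.0.2.100 cidr_netmask=24 op monitor interval=30s
+pcs resource create web-server ocf:heartbeat:apache \
+    op monitor interval=30s
+----
+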
+.Active/Passive Redundancy
+image::images/pcmk-active-passive.png["Active/Passive Redundancy",width="10cm",height="7.5cm",align="center"]
+
+Pacemaker also supports multiple nodes in a shared-failover design,
+reducing hardware costs by allowing several active/passive clusters to be
+combined and share a common backup node.
+
+.Shared Failover
+image::images/pcmk-shared-failover.png["Shared Failover",width="10cm",height="7.5cm",align="center"]
+
+When shared storage is available, every node can potentially be used for
+failover. Pacemaker can even run multiple copies of services to spread out the
+workload.
+
+.N to N Redundancy
+image::images/pcmk-active-active.png["N to N Redundancy",width="10cm",height="7.5cm",align="center"]