= Getting Started

So, you've successfully installed `crmsh` on one or more machines, and
now you want to configure a basic cluster. This guide provides
step-by-step instructions for configuring Pacemaker with a single
resource capable of failing over between a pair of nodes, and then
builds on that base to cover some more advanced topics of cluster
management.

****
Haven't installed yet? Please follow the
link:/installation[installation instructions]
before continuing. Only `crmsh` and its
dependencies need to be installed to follow
this guide.
****

Before continuing, make sure that this command executes successfully
on all nodes, and returns a version number that is `3.0` or higher:

........
crm --version
........
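
The output should look something like this (the exact version string
and number will differ on your system):

........
# crm --version
crm 3.0.4
........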

****
In crmsh 3, the cluster init commands were replaced by the SLE HA
bootstrap scripts. These rely on `csync2` for configuration file
management, so make sure that you have the `csync2` command
installed before proceeding. This requirement may be removed in
the future.
****

.Example cluster
**************************

These are the machines used as an example in this guide. Please
replace references to these names and IP addresses with the values
appropriate for your cluster:


[options="header"]
|=======================
|Name  |IP
|alice |10.0.0.2
|bob   |10.0.0.3
|=======================
**************************


== The cluster stack

The composition of the GNU/Linux cluster stack has changed somewhat
over the years. The stack described here is currently the most common
variant, but there are other ways of configuring these tools.

Simply put, a High Availability cluster is a set of machines (commonly
referred to as *nodes*) with redundant capacity, such that if one or
more of these machines experience failure of any kind, the other nodes
in the cluster can take over the responsibilities previously handled
by the failed node.

The cluster stack is a set of programs running on all of these nodes,
communicating with each other over the network to monitor each other
and decide where, when and how resources are stopped, started or
reconfigured.

The main component of the stack is *Pacemaker*, the software
responsible for managing cluster resources, allocating them to cluster
nodes according to the rules specified in the *CIB*.

The CIB is an XML document maintained by Pacemaker, which describes
all cluster resources, their configuration and the constraints that
decide where and how they are managed. This document is not edited
directly; with the help of `crmsh`, it is possible to avoid dealing
with the underlying XML at all.
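
For example, the whole configuration can be inspected in `crmsh`
syntax, or as the raw CIB XML if you are curious about what Pacemaker
actually stores:

........
# Show the configuration in crmsh syntax
crm configure show

# Show the same configuration as raw CIB XML
crm configure show xml
........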

Beneath Pacemaker in the stack sits *Corosync*, a cluster
communication system. Corosync provides the communication capabilities
and cluster membership functionality used by Pacemaker. Corosync is
configured through the file `/etc/corosync/corosync.conf`. `crmsh`
provides tools for configuring Corosync, just as it does for Pacemaker.
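
As an illustration, a minimal `corosync.conf` for the example cluster
might look roughly like the sketch below. Treat it as a sketch rather
than a reference: directive names and sensible defaults vary between
Corosync versions, and `crm cluster init` (described below) generates
a working file for you. Within `crmsh`, `crm corosync show` and
`crm corosync edit` let you view and edit the file without opening it
by hand.

........
# /etc/corosync/corosync.conf -- illustrative sketch only
totem {
    version: 2
    cluster_name: demo-cluster
    transport: udpu        # unicast UDP; details vary by version
}

nodelist {
    node {
        ring0_addr: 10.0.0.2
        nodeid: 1
    }
    node {
        ring0_addr: 10.0.0.3
        nodeid: 2
    }
}

quorum {
    provider: corosync_votequorum
    two_node: 1            # relax quorum rules for two-node clusters
}
........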

Aside from these two components, the stack also consists of a
collection of *Resource Agents*. These are basically scripts that wrap
software that the cluster needs to manage, providing a unified
interface to configuration, supervision and management of the
software. For example, there are agents that handle virtual IP
resources, web servers, databases and filesystems.
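
The agents installed on a node, and their documentation, can be
browsed from within `crmsh`:

........
# List resource agent classes, and the agents in one class
crm ra classes
crm ra list ocf heartbeat

# Show the parameters and actions of a specific agent
crm ra info ocf:heartbeat:IPaddr2
........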

`crmsh` is a command line tool which interfaces with all of these
components, providing a single, unified way to configure and manage
the whole cluster stack.

== SSH

`crmsh` runs as a command line tool on any one of the cluster
nodes. In order for it to control all cluster nodes, it needs to be
able to execute commands remotely. `crmsh` does this by invoking
`ssh`.

Configure `/etc/hosts` on each of the nodes so that the names of the
other nodes map to the IP addresses of those nodes. For example, in a
cluster consisting of `alice` and `bob`, executing `ping bob` when
logged in as root on `alice` should successfully locate `bob` on the
network. Given the IP addresses of `alice` and `bob` above, the
following should be entered into `/etc/hosts` on both nodes:

........
10.0.0.2      alice
10.0.0.3      bob
........
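
`crm cluster init` (used below) will normally take care of SSH key
distribution for you, but if you want to verify or set up passwordless
root SSH by hand, one way is:

........
# On alice: generate a key pair for root if one doesn't exist yet
ssh-keygen -t rsa

# Copy the public key to bob (and vice versa on bob)
ssh-copy-id root@bob

# Verify that a remote command runs without a password prompt
ssh root@bob true
........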

== Install and configure

To configure the basic cluster, we use the `cluster init` command
provided by `crmsh`. This command has quite a few options for
setting up the cluster, but we will use a fairly basic configuration.

........
crm cluster init --name demo-cluster --nodes "alice bob"
........

The initialization tool will now ask a series of questions about the
configuration, and then proceed to configure and start the cluster
on both nodes.
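
The questions asked vary between crmsh versions. If you would rather
not answer them interactively, recent versions accept a `--yes` flag
that accepts the default answer for every prompt; check
`crm cluster init --help` for what your version supports:

........
crm cluster init --name demo-cluster --nodes "alice bob" --yes
........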

== Check cluster status

To see if Pacemaker is running, what nodes are part of the cluster and
what resources are active, use the `status` command:

.........
crm status
.........

If this command fails or times out, there is some problem with
Pacemaker or Corosync on the local machine. Perhaps some dependency is
missing, a firewall is blocking cluster communication or some other
problem has occurred. If this is the case, the `cluster
health` command may be of use.

== Cluster health check

To check the health status of the machines in the cluster, use the
following command:

........
crm cluster health
........

This command will perform multiple diagnostics on all nodes in the
cluster, and report problems such as low disk space, communication
issues or mismatched software versions between nodes.

If no cluster has been configured or there is some fundamental problem
with cluster communications, `crmsh` may be unable to figure out what
nodes are part of the cluster. If this is the case, the list of nodes
can be provided to the health command directly:

........
crm cluster health nodes=alice,bob
........

== Adding a resource

To test the cluster and make sure it is working properly, we can
configure a Dummy resource. The Dummy resource agent is a simple
agent that doesn't actually manage any software. It exposes a single
parameter called `state`, and can be used to test the basic
functionality of the cluster before introducing the complexities of
actual resources.
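
To see the parameters and actions the agent supports before
configuring it:

........
crm ra info Dummy
........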

To configure a Dummy resource, run the following command:

........
crm configure primitive p0 Dummy
........

This creates a new resource, gives it the name `p0` and sets the
agent for the resource to be the `Dummy` agent.
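
To double-check what was created, the configuration for just this
resource can be displayed; the output should be a single `primitive`
line similar to the command that created it:

........
crm configure show p0
........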

`crm status` should now show the `p0` resource as started on one
of the cluster nodes:

........
# crm status
Last updated: Wed Jul  2 21:49:26 2014
Last change: Wed Jul  2 21:49:19 2014
Stack: corosync
Current DC: alice (2) - partition with quorum
Version: 1.1.11-c3f1a7f
2 Nodes configured
1 Resources configured


Online: [ alice bob ]

 p0	(ocf::heartbeat:Dummy):	Started alice
........

The resource can be stopped and started using the `resource stop` and
`resource start` commands:

........
crm resource stop p0
crm resource start p0
........
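
Finally, to see failover in action without pulling any plugs, the
resource can be told to move to the other node. The `move` command
works by adding a location constraint, so remember to remove that
constraint again afterwards (older crmsh releases call `clear`
`unmove`):

........
# Force p0 over to bob, then confirm with crm status
crm resource move p0 bob
crm status

# Remove the location constraint created by the move
crm resource clear p0
........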