.. index:: command-line tool

Using Pacemaker Command-Line Tools
----------------------------------

.. index::
   single: command-line tool; output format

.. _cmdline_output:

Controlling Command Line Output
###############################

Some of the Pacemaker command-line utilities, including ``crm_mon`` and
``stonith_admin``, have been converted to a new output system. This is an
ongoing project, and more tools will be converted over time. The new system
lets you control the formatting of output with ``--output-as=`` and the
destination of output with ``--output-to=``.

The available formats vary by tool, but all tools that use the new system
support at least plain text and XML. The default format is plain text. The
default destination is stdout, but output can be redirected to any file. Some
formats also accept command-line options that change the style of the output.
For instance:

.. code-block:: none

   # crm_mon --help-output
   Usage:
     crm_mon [OPTION?]

   Provides a summary of cluster's current state.

   Outputs varying levels of detail in a number of different formats.

   Output Options:
     --output-as=FORMAT                Specify output format as one of: console (default), html, text, xml
     --output-to=DEST                  Specify file name for output (or "-" for stdout)
     --html-cgi                        Add text needed to use output in a CGI program
     --html-stylesheet=URI             Link to an external CSS stylesheet
     --html-title=TITLE                Page title
     --text-fancy                      Use more highly formatted output
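
As a minimal sketch of combining the two options, the following shell function
saves a one-shot XML status report to a file. The function name
``save_cluster_status`` and the default file name ``cluster-status.xml`` are
hypothetical; only ``--output-as``, ``--output-to`` and the one-shot ``-1``
option come from ``crm_mon`` itself.

.. code-block:: bash

   # Hypothetical helper: write a one-shot XML status report to a file.
   # $1 is the destination file (illustrative default: cluster-status.xml).
   save_cluster_status() {
       dest="${1:-cluster-status.xml}"
       crm_mon -1 --output-as=xml --output-to="$dest"
   }

Calling ``save_cluster_status /tmp/status.xml`` should leave the report in
``/tmp/status.xml``. Per the help text above, passing ``-`` as the destination
sends the report to stdout instead.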

.. index::
   single: crm_mon
   single: command-line tool; crm_mon

.. _crm_mon:

Monitor a Cluster with crm_mon
##############################

The ``crm_mon`` utility displays the current state of an active cluster. It can
show the cluster status organized by node or by resource, and can be used in
either single-shot or dynamically updating mode. It can also display operations
performed and information about failures.

Using this tool, you can examine the state of the cluster for irregularities,
and see how it responds when you cause or simulate failures.

See the manual page or the output of ``crm_mon --help`` for a full description
of its many options.
      
.. topic:: Sample output from crm_mon -1

   .. code-block:: none

      Cluster Summary:
        * Stack: corosync
        * Current DC: node2 (version 2.0.0-1) - partition with quorum
        * Last updated: Mon Jan 29 12:18:42 2018
        * Last change:  Mon Jan 29 12:18:40 2018 by root via crm_attribute	on node3
        * 5 nodes configured
        * 2 resources configured

      Node List:
        * Online: [ node1 node2 node3 node4 node5 ]

      * Active resources:
        * Fencing (stonith:fence_xvm):    Started node1
        * IP	(ocf:heartbeat:IPaddr2):	Started node2
      
.. topic:: Sample output from crm_mon -n -1

   .. code-block:: none

      Cluster Summary:
        * Stack: corosync
        * Current DC: node2 (version 2.0.0-1) - partition with quorum
        * Last updated: Mon Jan 29 12:21:48 2018
        * Last change:  Mon Jan 29 12:18:40 2018 by root via crm_attribute	on node3
        * 5 nodes configured
        * 2 resources configured

      * Node List:
        * Node node1: online
          * Fencing (stonith:fence_xvm):    Started
        * Node node2: online
          * IP	(ocf:heartbeat:IPaddr2):	Started
        * Node node3: online
        * Node node4: online
        * Node node5: online

As mentioned in an earlier chapter, the DC is the node where decisions are
made. The cluster elects a node to be DC as needed. The only significance of
the choice of DC to an administrator is that its logs will have the most
information about why decisions were made.

.. index::
   pair: crm_mon; CSS

.. _crm_mon_css:

Styling crm_mon HTML output
___________________________

Various parts of ``crm_mon``'s HTML output have a CSS class associated with
them. Not everything does, but some of the most interesting portions do. In
the following example, the status of each node has an ``online`` class and the
details of each resource have an ``rsc-ok`` class.

.. code-block:: html

   <h2>Node List</h2>
   <ul>
   <li>
   <span>Node: cluster01</span><span class="online"> online</span>
   </li>
   <li><ul><li><span class="rsc-ok">ping   (ocf::pacemaker:ping):   Started</span></li></ul></li>
   <li>
   <span>Node: cluster02</span><span class="online"> online</span>
   </li>
   <li><ul><li><span class="rsc-ok">ping   (ocf::pacemaker:ping):   Started</span></li></ul></li>
   </ul>

By default, a stylesheet for styling these classes is included in the head of
the HTML output. The portions of this stylesheet relevant to the above example
are:

.. code-block:: css

   <style>
   .online { color: green }
   .rsc-ok { color: green }
   </style>

If you want to override some or all of the styling, simply create your own
stylesheet, place it on a web server, and pass ``--html-stylesheet=<URL>``
to ``crm_mon``. The link is added after the default stylesheet, so your
changes take precedence. You don't need to duplicate the entire default.
Only include what you want to change.
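
The following sketch shows one way this could look in practice. The function
names, the stylesheet rules, and any file names or URLs you pass in are
hypothetical; the ``online`` and ``rsc-ok`` classes and the ``crm_mon``
options are the ones described above.

.. code-block:: bash

   # Hypothetical helper: write a two-rule override stylesheet to $1.
   # Only the classes being changed are listed; everything else keeps
   # the default styling.
   write_override_css() {
       printf '%s\n' \
           '.online { color: darkgreen; font-weight: bold }' \
           '.rsc-ok { color: darkgreen }' > "$1"
   }

   # Hypothetical helper: write a one-shot HTML status page to $2 that
   # links the stylesheet published at URL $1.
   render_status_html() {
       crm_mon -1 --output-as=html --html-stylesheet="$1" --output-to="$2"
   }

You would run ``write_override_css`` on the web server that publishes the
stylesheet, then call ``render_status_html`` with the resulting URL and the
path of the HTML file to generate.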

.. index::
   single: cibadmin
   single: command-line tool; cibadmin

.. _cibadmin:

Edit the CIB XML with cibadmin
##############################

The most flexible tool for modifying the configuration is Pacemaker's
``cibadmin`` command.  With ``cibadmin``, you can query, add, remove, update
or replace any part of the configuration. All changes take effect immediately,
so there is no need to perform a reload-like operation.

The simplest way to use ``cibadmin`` is to save the current configuration to a
temporary file, edit that file with your favorite text or XML editor, and then
upload the revised configuration.

.. topic:: Safely using an editor to modify the cluster configuration

   .. code-block:: none

      # cibadmin --query > tmp.xml
      # vi tmp.xml
      # cibadmin --replace --xml-file tmp.xml

Some of the better XML editors can make use of a RELAX NG schema to
help make sure any changes you make are valid.  The schema describing
the configuration can be found in ``pacemaker.rng``, which may be
deployed in a location such as ``/usr/share/pacemaker`` depending on your
operating system distribution and how you installed the software.
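
If your editor is not schema-aware, one option is to check the edited file
before uploading it. The sketch below assumes that ``crm_verify`` and its
``--xml-file`` option are available on your system, and wraps the check and
the upload in a hypothetical ``replace_if_valid`` function.

.. code-block:: bash

   # Hypothetical helper: upload an edited CIB file only if it validates.
   # $1 is the edited file, such as the tmp.xml saved by "cibadmin --query".
   replace_if_valid() {
       if crm_verify --xml-file "$1"; then
           cibadmin --replace --xml-file "$1"
       else
           echo "not uploading $1: validation failed" >&2
           return 1
       fi
   }

With this in place, ``replace_if_valid tmp.xml`` would take the place of the
final ``cibadmin --replace`` step in the edit cycle.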

If you want to modify just one section of the configuration, you can
query and replace just that section to avoid modifying any others.
      
.. topic:: Safely using an editor to modify only the resources section

   .. code-block:: none

       # cibadmin --query --scope resources > tmp.xml
       # vi tmp.xml
       # cibadmin --replace --scope resources --xml-file tmp.xml

To quickly delete a part of the configuration, identify the object you wish to
delete by XML tag and id. For example, you might search the CIB for all
STONITH-related configuration:
      
.. topic:: Searching for STONITH-related configuration items

   .. code-block:: none

      # cibadmin --query | grep stonith
       <nvpair id="cib-bootstrap-options-stonith-action" name="stonith-action" value="reboot"/>
       <nvpair id="cib-bootstrap-options-stonith-enabled" name="stonith-enabled" value="1"/>
       <primitive id="child_DoFencing" class="stonith" type="external/vmware">
       <lrm_resource id="child_DoFencing:0" type="external/vmware" class="stonith">
       <lrm_resource id="child_DoFencing:0" type="external/vmware" class="stonith">
       <lrm_resource id="child_DoFencing:1" type="external/vmware" class="stonith">
       <lrm_resource id="child_DoFencing:0" type="external/vmware" class="stonith">
       <lrm_resource id="child_DoFencing:2" type="external/vmware" class="stonith">
       <lrm_resource id="child_DoFencing:0" type="external/vmware" class="stonith">
       <lrm_resource id="child_DoFencing:3" type="external/vmware" class="stonith">

If you wanted to delete the ``primitive`` tag with id ``child_DoFencing``,
you would run:

.. code-block:: none

   # cibadmin --delete --xml-text '<primitive id="child_DoFencing"/>'

See the ``cibadmin`` man page for more options.

.. warning::

   Never edit the live ``cib.xml`` file directly. Pacemaker will detect such
   changes and refuse to use the configuration.


.. index::
   single: crm_shadow
   single: command-line tool; crm_shadow

.. _crm_shadow:

Batch Configuration Changes with crm_shadow
###########################################

Often, it is desirable to preview the effects of a series of configuration
changes before updating the live configuration all at once. For this purpose,
``crm_shadow`` creates a "shadow" copy of the configuration and arranges for
all the command-line tools to use it.

To begin, simply invoke ``crm_shadow --create`` with a name of your choice,
and follow the simple on-screen instructions. Shadow copies are identified with
a name to make it possible to have more than one.

.. warning::

   Read this section and the on-screen instructions carefully; failure to do so
   could result in destroying the cluster's active configuration!
      
.. topic:: Creating and displaying the active sandbox

   .. code-block:: none

      # crm_shadow --create test
      Setting up shadow instance
      Type Ctrl-D to exit the crm_shadow shell
      shadow[test]: 
      shadow[test] # crm_shadow --which
      test

From this point on, all cluster commands will automatically use the shadow copy
instead of talking to the cluster's active configuration. Once you have
finished experimenting, you can either make the changes active via the
``--commit`` option, or discard them using the ``--delete`` option. Again, be
sure to follow the on-screen instructions carefully!
      
For a full list of ``crm_shadow`` options and commands, invoke it with the
``--help`` option.

.. topic:: Use sandbox to make multiple changes all at once, discard them, and verify real configuration is untouched

   .. code-block:: none
   
      shadow[test] # crm_failcount -r rsc_c001n01 -G
      scope=status  name=fail-count-rsc_c001n01 value=0
      shadow[test] # crm_standby --node c001n02 -v on
      shadow[test] # crm_standby --node c001n02 -G
      scope=nodes  name=standby value=on
   
      shadow[test] # cibadmin --erase --force
      shadow[test] # cibadmin --query
      <cib crm_feature_set="3.0.14" validate-with="pacemaker-3.0" epoch="112" num_updates="2" admin_epoch="0" cib-last-written="Mon Jan  8 23:26:47 2018" update-origin="rhel7-1" update-client="crm_node" update-user="root" have-quorum="1" dc-uuid="1">
        <configuration>
          <crm_config/>
          <nodes/>
          <resources/>
          <constraints/>
        </configuration>
        <status/>
      </cib>
      shadow[test] # crm_shadow --delete test --force
      Now type Ctrl-D to exit the crm_shadow shell
      shadow[test] # exit
      # crm_shadow --which
      No active shadow configuration defined
      # cibadmin -Q
      <cib crm_feature_set="3.0.14" validate-with="pacemaker-3.0" epoch="110" num_updates="2" admin_epoch="0" cib-last-written="Mon Jan  8 23:26:47 2018" update-origin="rhel7-1" update-client="crm_node" update-user="root" have-quorum="1">
         <configuration>
            <crm_config>
               <cluster_property_set id="cib-bootstrap-options">
                  <nvpair id="cib-bootstrap-1" name="stonith-enabled" value="1"/>
                  <nvpair id="cib-bootstrap-2" name="pe-input-series-max" value="30000"/>

See the next section, :ref:`crm_simulate`, for how to test your changes before
committing them to the live cluster.


.. index::
   single: crm_simulate
   single: command-line tool; crm_simulate

.. _crm_simulate:

Simulate Cluster Activity with crm_simulate
###########################################

The command-line tool ``crm_simulate`` shows the results of the same logic
the cluster itself uses to respond to a particular cluster configuration and
status.

As always, the man page is the primary documentation, and should be consulted
for further details. This section aims for a better conceptual explanation and
practical examples.

Replaying cluster decision-making logic
_______________________________________

At any given time, one node in a Pacemaker cluster will be elected DC, and that
node will run Pacemaker's scheduler to make decisions.

Each time decisions need to be made (a "transition"), the DC will have log
messages like "Calculated transition ... saving inputs in ..." with a file
name. You can grab the named file and replay the cluster logic to see why
particular decisions were made. The file contains the live cluster
configuration at that moment, so you can also look at it directly to see the
value of node attributes, etc., at that time.
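
As a rough sketch, the file names can be pulled out of a log with ``sed``. The
function name below is hypothetical, and the pattern is based only on the
message text quoted above, so it may need adjusting if your version words the
message differently.

.. code-block:: bash

   # Hypothetical helper: print the scheduler input file named in each
   # "Calculated transition ... saving inputs in ..." message found in
   # the log file $1. Assumes the file name is the next
   # whitespace-delimited word after "saving inputs in".
   list_transition_inputs() {
       sed -n 's/.*Calculated transition.*saving inputs in \([^ ]*\).*/\1/p' "$1"
   }

Run it against the Pacemaker log on the node that was DC at the time. The
location of that log depends on your distribution and logging configuration.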

The simplest usage is (replacing $FILENAME with the actual file name):

.. topic:: Simulate cluster response to a given CIB

   .. code-block:: none

      # crm_simulate --simulate --xml-file $FILENAME

That will show the cluster state when the process started, the actions that
need to be taken ("Transition Summary"), and the resulting cluster state if the
actions succeed. Most actions will have a brief description of why they were
required.

The transition inputs may be compressed. ``crm_simulate`` can handle these
compressed files directly, though if you want to edit the file, you'll need to
uncompress it first.

You can do the same simulation for the live cluster configuration at the
current moment. This is useful mainly when using ``crm_shadow`` to create a
sandbox version of the CIB; the ``--live-check`` option will use the shadow CIB
if one is in effect.

.. topic:: Simulate cluster response to current live CIB or shadow CIB

   .. code-block:: none

      # crm_simulate --simulate --live-check


Why decisions were made
_______________________

Getting further insight into the "why" becomes user-unfriendly very quickly. If
you add the ``--show-scores`` option, you will also see all the scores that
went into the decision-making. The node with the highest cumulative score for a
resource will run it. You can look for ``-INFINITY`` scores in particular to
see where complete bans came into effect.
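
A minimal sketch of that search, using a hypothetical ``show_bans`` function
that filters the score output with ``grep``:

.. code-block:: bash

   # Hypothetical helper: replay the saved input $1 with scores shown and
   # keep only the lines that mention a -INFINITY score.
   show_bans() {
       crm_simulate --simulate --xml-file "$1" --show-scores | grep -e '-INFINITY'
   }

The filter is purely textual, so the matching lines still need to be read in
the context of the full output to see which constraint or rule produced each
ban.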

You can also add ``-VVVV`` to get more detailed messages about what's happening
under the hood. You can add up to two more V's even, but that's usually useful
only if you're a masochist or tracing through the source code.


Visualizing the action sequence
_______________________________

Another handy feature is the ability to generate a visual graph of the actions
needed, using the ``--save-dotfile`` option. This relies on the separate
Graphviz [#]_ project.

.. topic:: Generate a visual graph of cluster actions from a saved CIB

   .. code-block:: none

      # crm_simulate --simulate --xml-file $FILENAME --save-dotfile $FILENAME.dot
      # dot $FILENAME.dot -Tsvg > $FILENAME.svg

``$FILENAME.dot`` will contain a Graphviz representation of the cluster's
response to your changes, including all actions with their ordering
dependencies.

``$FILENAME.svg`` will be the same information in a standard graphical format
that you can view in your browser or other app of choice. You could, of course,
use other ``dot`` options to generate other formats.
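
For example, a small wrapper could select the output format by name. The
function below is a hypothetical sketch; ``-T`` is the Graphviz option that
selects the output format, and the set of available formats depends on how
Graphviz was built.

.. code-block:: bash

   # Hypothetical helper: render the dot file $1 in the Graphviz output
   # format $2 (default: png), writing the result next to the input file
   # with the format name as its extension.
   render_transition_graph() {
       fmt="${2:-png}"
       dot "$1" "-T$fmt" > "${1%.dot}.$fmt"
   }

``render_transition_graph transition.dot`` would then write
``transition.png``, and adding ``pdf`` as a second argument would write
``transition.pdf`` instead.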
      
How to interpret the graphical output:

 * Bubbles indicate actions, and arrows indicate ordering dependencies
 * Resource actions have text of the form
   ``<RESOURCE>_<ACTION>_<INTERVAL_IN_MS> <NODE>`` indicating that the
   specified action will be executed for the specified resource on the
   specified node, once if interval is 0 or at specified recurring interval
   otherwise
 * Actions with black text will be sent to the executor (that is, the
   appropriate agent will be invoked)
 * Actions with orange text are "pseudo" actions that the cluster uses
   internally for ordering but require no real activity
 * Actions with a solid green border are part of the transition (that is, the
   cluster will attempt to execute them in the given order -- though a
   transition can be interrupted by action failure or new events)
 * Dashed arrows indicate dependencies that are not present in the transition
   graph
 * Actions with a dashed border will not be executed. If the dashed border is
   blue, the cluster does not feel the action needs to be executed. If the
   dashed border is red, the cluster would like to execute the action but
   cannot. Any actions depending on an action with a dashed border will not be
   able to execute. 
 * Loops should not happen, and should be reported as a bug if found.

.. topic:: Small Cluster Transition

   .. image:: ../shared/images/Policy-Engine-small.png
      :alt: An example transition graph as represented by Graphviz
      :align: center

In the above example, it appears that a new node, ``pcmk-2``, has come online
and that the cluster is checking to make sure ``rsc1``, ``rsc2`` and ``rsc3``
are not already running there (indicated by the ``rscN_monitor_0`` entries).
Once it has done that, and assuming the resources are not active there, it
would like to stop ``rsc1`` and ``rsc2`` on ``pcmk-1`` and move them to
``pcmk-2``. However, there appears to be some problem, and the cluster cannot
or is not permitted to perform the stop actions, which implies it also cannot
perform the start actions. For some reason, the cluster does not want to start
``rsc3`` anywhere.

.. topic:: Complex Cluster Transition

   .. image:: ../shared/images/Policy-Engine-big.png
      :alt: Complex transition graph that you're not expected to be able to read
      :align: center


What-if scenarios
_________________

You can make changes to the saved or shadow CIB and simulate it again, to see
how Pacemaker would react differently. You can edit the XML by hand, use
command-line tools such as ``cibadmin`` with either a shadow CIB or the
``CIB_file`` environment variable set to the filename, or use higher-level tool
support (see the man pages of the specific tool you're using for how to perform
actions on a saved CIB file rather than the live CIB).
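
As a sketch of the ``CIB_file`` approach, the hypothetical wrapper below sets
the variable for a single command, so that the rest of your shell session
keeps talking to the live cluster.

.. code-block:: bash

   # Hypothetical helper: run a CIB-aware command against the saved file
   # $1 instead of the live cluster, by setting CIB_file for that one
   # command only. The remaining arguments are the command to run.
   with_saved_cib() {
       cib="$1"
       shift
       CIB_file="$cib" "$@"
   }

For example, ``with_saved_cib saved.xml cibadmin --query --scope resources``
would read the resources section from ``saved.xml`` (an illustrative file
name), and ``crm_simulate --simulate --xml-file saved.xml`` would then show
how the cluster reacts to any edits made to that file.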

You can also inject node failures and/or action failures into the simulation;
see the ``crm_simulate`` man page for more details.
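
As one illustration, the sketch below replays a saved input as if a given node
had failed. It assumes the ``--node-fail`` option as described in the
``crm_simulate`` man page; the function name is hypothetical, and you should
confirm the option against the man page for your installed version.

.. code-block:: bash

   # Hypothetical helper: replay the saved input $1 as if node $2 had
   # failed, and show how the cluster would respond.
   simulate_node_failure() {
       crm_simulate --simulate --xml-file "$1" --node-fail "$2"
   }

Comparing this output with a plain ``--simulate`` run on the same file shows
which actions are attributable to the injected failure.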

This capability is useful when using a shadow CIB to edit the configuration.
Before committing the changes to the live cluster with ``crm_shadow --commit``,
you can use ``crm_simulate`` to see how the cluster will react to the changes.

.. _crm_attribute:

.. index::
   single: attrd_updater
   single: command-line tool; attrd_updater
   single: crm_attribute
   single: command-line tool; crm_attribute

Manage Node Attributes, Cluster Options and Defaults with crm_attribute and attrd_updater
#########################################################################################

``crm_attribute`` and ``attrd_updater`` are confusingly similar tools with subtle
differences.

``attrd_updater`` can query and update node attributes. ``crm_attribute`` can query
and update not only node attributes, but also cluster options, resource
defaults, and operation defaults.

To understand the differences, it helps to understand the various types of node
attributes.

.. table:: **Types of Node Attributes**

   +-----------+----------+-------------------+------------------+----------------+----------------+
   | Type      | Recorded | Recorded in       | Survive full     | Manageable by  | Manageable by  |
   |           | in CIB?  | attribute manager | cluster restart? | crm_attribute? | attrd_updater? |
   |           |          | memory?           |                  |                |                |
   +===========+==========+===================+==================+================+================+
   | permanent | yes      | no                | yes              | yes            | no             |
   +-----------+----------+-------------------+------------------+----------------+----------------+
   | transient | yes      | yes               | no               | yes            | yes            |
   +-----------+----------+-------------------+------------------+----------------+----------------+
   | private   | no       | yes               | no               | no             | yes            |
   +-----------+----------+-------------------+------------------+----------------+----------------+

As you can see from the table above, ``crm_attribute`` can manage permanent and
transient node attributes, while ``attrd_updater`` can manage transient and
private node attributes.

The difference between the two tools lies mainly in *how* they update node
attributes: ``attrd_updater`` always contacts the Pacemaker attribute manager
directly, while ``crm_attribute`` will contact the attribute manager only for
transient node attributes, and will instead modify the CIB directly for
permanent node attributes (and for transient node attributes when unable to
contact the attribute manager).

By contacting the attribute manager directly, ``attrd_updater`` can change
an attribute's "dampening" (whether changes are immediately flushed to the CIB
or after a specified amount of time, to minimize disk writes for frequent
changes), set private node attributes (which are never written to the CIB), and
set attributes for nodes that don't yet exist.

By modifying the CIB directly, ``crm_attribute`` can set permanent node
attributes (which are only in the CIB and not managed by the attribute
manager), and can be used with saved CIB files and shadow CIBs.

However a transient node attribute is set, it is synchronized between the CIB
and the attribute manager, on all nodes.
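
The sketch below shows one plausible way to set each type of node attribute
from the table above. The function names are hypothetical, and the option
names (``--node``, ``--name``, ``--update``, ``--lifetime`` and ``--private``)
should be checked against the man pages for your installed version.

.. code-block:: bash

   # Hypothetical helpers; $1 = node name, $2 = attribute name, $3 = value.

   # Permanent node attribute: crm_attribute writes it to the CIB.
   set_permanent_attr() {
       crm_attribute --node "$1" --name "$2" --update "$3" --lifetime forever
   }

   # Transient node attribute: attrd_updater sends it to the attribute
   # manager, which also records it in the CIB.
   set_transient_attr() {
       attrd_updater --node "$1" --name "$2" --update "$3"
   }

   # Private node attribute: kept in attribute manager memory only and
   # never written to the CIB.
   set_private_attr() {
       attrd_updater --node "$1" --name "$2" --update "$3" --private
   }

Because ``set_permanent_attr`` goes through ``crm_attribute``, it is the only
one of the three that could also be pointed at a saved CIB file or a shadow
CIB.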


.. index::
   single: crm_failcount
   single: command-line tool; crm_failcount
   single: crm_node
   single: command-line tool; crm_node
   single: crm_report
   single: command-line tool; crm_report
   single: crm_resource
   single: command-line tool; crm_resource
   single: crm_standby
   single: command-line tool; crm_standby
   single: crm_verify
   single: command-line tool; crm_verify
   single: stonith_admin
   single: command-line tool; stonith_admin

Other Commonly Used Tools
#########################

Other command-line tools include:

* ``crm_failcount``: query or delete resource fail counts
* ``crm_node``: manage cluster nodes
* ``crm_report``: generate a detailed cluster report for bug submissions
* ``crm_resource``: manage cluster resources
* ``crm_standby``: manage standby status of nodes
* ``crm_verify``: validate a CIB
* ``stonith_admin``: manage fencing devices

See the manual pages for details.

.. rubric:: Footnotes

.. [#] Graph visualization software. See http://www.graphviz.org/ for details.