diff --git a/doc/stonith/README_kdumpcheck.txt b/doc/stonith/README_kdumpcheck.txt
new file mode 100644
index 0000000..cc8787c
--- /dev/null
+++ b/doc/stonith/README_kdumpcheck.txt
@@ -0,0 +1,151 @@
+ Kdump check STONITH plugin "kdumpcheck"
+1. Introduction
+ The purpose of this plugin is to avoid STONITHing a node that is in the
+ middle of taking a kdump.
+ When a STONITH reset or off operation is executed, the plugin checks whether
+ the target node is doing kdump.
+ If the target node is doing kdump, the plugin treats the STONITH operation as
+ having succeeded. If not, it treats the operation as having failed.
+
+ NOTE: This plugin has no ability to shut down or start up a node, so it has
+       to be used together with another STONITH plugin.
+       When this plugin fails, the next plugin, which can actually kill the
+       node, is executed.
+ NOTE: This plugin works only on Linux.
+
+2. The way to check
+ When a STONITH reset or off operation is executed, kdumpcheck connects to the
+ target node and checks the size of /proc/vmcore.
+ It judges that the target node is _not_ doing kdump when /proc/vmcore on that
+ node is empty or does not exist.
+ In that case kdumpcheck returns "STONITH failed" to stonithd, and the next
+ plugin is executed.
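+
+ In essence, the check amounts to a single ssh command like the one below.
+ This is only a sketch; it uses the check user and identity file described in
+ the following sections, and the actual plugin may differ in detail.
+ ex.)
+   $ ssh -i ~/.ssh/id_rsa kdumpchecker@target_node "test -s /proc/vmcore"
+   $ echo $?
+     (0 means /proc/vmcore is non-empty, i.e. the node is doing kdump;
+      non-zero means it is not, or the node could not be reached)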
+
+3. Extending mkdumprd
+ This plugin requires a non-root user and an ssh connection even in the second
+ (capture) kernel, so you need to apply mkdumprd_for_kdumpcheck.patch to
+ /sbin/mkdumprd.
+ The patch has been tested with mkdumprd version 5.0.39.
+ It adds the following functions:
+  i) Start udevd with the specified .rules files.
+ ii) Bring the specified network interface up.
+ iii) Start sshd.
+ iv) Add the specified user to the second kernel environment.
+     This user is used to check whether the node is doing kdump.
+  v) Execute the sync command after dumping.
+
+ NOTE: Extensions i) to iv) apply only when a filesystem partition is
+       specified as the location where the vmcore is dumped.
+
+4. Parameters
+ kdumpcheck takes the following parameters (see the example after the note
+ below):
+ hostlist     : The list of hosts that the STONITH device controls.
+                The delimiter is "," or " ".
+                This setting is required. (default: none)
+ identity_file: The full path of the private key file for the user who
+                performs the kdump check.
+                (default: $HOME/.ssh/id_rsa, $HOME/.ssh/id_dsa and
+                $HOME/.ssh/identity)
+
+ NOTE: To make sure this plugin is executed first, give it the highest
+       priority among all STONITH resources.
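+
+ For illustration, these parameters could appear in a stonith primitive as
+ nvpairs like the following. The ids and the key path are made up for this
+ sketch; a complete resource definition is shown in "6. Appendix".
+ ex.)
+   <nvpair id="nvpair.hostlist" name="hostlist" value="node1,node2"/>
+   <nvpair id="nvpair.identity" name="identity_file"
+           value="/home/kdumpchecker/.ssh/id_rsa"/>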
+
+5. How to Use
+ To use this tool, perform the following steps on all nodes in the cluster.
+ 1) Add a user for the kdump check.
+ ex.)
+ # useradd kdumpchecker
+ # passwd kdumpchecker
+ 2) Allow the user added in step 1) to log in without a password from the
+    node which performs STONITH to all target nodes.
+ ex.)
+ $ cd
+ $ mkdir .ssh
+ $ chmod 700 .ssh
+ $ cd .ssh
+ $ ssh-keygen (generate authentication keys with empty passphrase)
+ $ scp id_rsa.pub kdumpchecker@target_node:"~/.ssh/."
+ $ ssh kdumpchecker@target_node
+ $ cd ~/.ssh
+ $ cat id_rsa.pub >> authorized_keys
+ $ chmod 600 authorized_keys
+ $ rm id_rsa.pub
+ 3) Limit the command that the user can execute.
+    Prepend the following option to the user's public key (all on a single
+    line) in the target node's authorized_keys file:
+      command="test -s /proc/vmcore"
+    Also add options such as no-pty and no-port-forwarding according to your
+    security policy. You can confirm the restriction works as shown in the
+    sketch after the example below.
+ ex.)
+ $ vi ~/.ssh/authorized_keys
+ command="test -s /proc/vmcore",no-port-forwarding,no-X11-forwarding,
+ no-agent-forwarding,no-pty ssh-rsa AAA..snip..== kdumpchecker@node1
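+ A quick way to confirm the restriction (a suggested check, not part of the
+ plugin itself): log in with the key and make sure only the forced command
+ runs and the session closes immediately.
+ ex.)
+   $ ssh -i ~/.ssh/id_rsa kdumpchecker@target_node
+   $ echo $?
+     (the forced "test -s /proc/vmcore" runs instead of a shell; the exit
+      status is non-zero unless the node is actually dumping)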
+ 4) Add settings in /etc/kdump.conf.
+    network_device  : The network interface name used for the kdump check.
+                      This setting is required. (default: none)
+    kdump_check_user: The user name used for the kdump check.
+                      Specify a non-root user.
+                      (default: "kdumpchecker")
+    udev_rules      : Names of .rules files.
+                      Specify these if you use udev for mapping devices.
+                      The files have to be in /etc/udev/rules.d/.
+                      You can specify two or more files; the delimiter is
+                      "," or " ". (default: none)
+ ex.)
+ # vi /etc/kdump.conf
+ ext3 /dev/sda1
+ network_device eth0
+ kdump_check_user kdumpchecker
+ udev_rules 10-if.rules
+ 5) Apply the patch to /sbin/mkdumprd.
+ # cd /sbin
+ # patch -p 1 < mkdumprd_for_kdumpcheck.patch
+ 6) Restart kdump service.
+ # service kdump restart
+ 7) Edit cib.xml to configure the STONITH plugin.
+    (See "4. Parameters" and "6. Appendix".)
+
+6. Appendix
+ A sample cib.xml.
+ <clone id="clnStonith">
+   <instance_attributes id="instance_attributes.id238245a">
+     <nvpair id="clone0_clone_max" name="clone_max" value="2"/>
+     <nvpair id="clone0_clone_node_max" name="clone_node_max" value="1"/>
+   </instance_attributes>
+   <group id="grpStonith">
+     <instance_attributes id="instance_attributes.id2382455"/>
+     <primitive id="grpStonith-kdumpcheck" class="stonith"
+                type="external/kdumpcheck">
+       <instance_attributes id="instance_attributes.id238240a">
+         <nvpair id="nvpair.id238240b" name="hostlist" value="node1,node2"/>
+         <nvpair id="nvpair.id238240c" name="priority" value="1"/>
+         <nvpair id="nvpair.id2382408b" name="stonith-timeout" value="30s"/>
+       </instance_attributes>
+       <operations>
+         <op id="grpStonith-kdumpcheck-start" name="start" interval="0"
+             timeout="300" on-fail="restart"/>
+         <op id="grpStonith-kdumpcheck-monitor" name="monitor" interval="10"
+             timeout="60" on-fail="restart"/>
+         <op id="grpStonith-kdumpcheck-stop" name="stop" interval="0"
+             timeout="300" on-fail="block"/>
+       </operations>
+       <meta_attributes id="primitive-grpStonith-kdump-check.meta"/>
+     </primitive>
+     <primitive id="grpStonith-ssh" class="stonith" type="external/ssh">
+       <instance_attributes id="instance_attributes.id2382402a">
+         <nvpair id="nvpair.id2382408a" name="hostlist" value="node1,node2"/>
+         <nvpair id="nvpair.id238066b" name="priority" value="2"/>
+         <nvpair id="nvpair.id2382408c" name="stonith-timeout" value="60s"/>
+       </instance_attributes>
+       <operations>
+         <op id="grpStonith-ssh-start" name="start" interval="0"
+             timeout="300" on-fail="restart"/>
+         <op id="grpStonith-ssh-monitor" name="monitor" interval="10"
+             timeout="60" on-fail="restart"/>
+         <op id="grpStonith-ssh-stop" name="stop" interval="0" timeout="300"
+             on-fail="block"/>
+       </operations>
+       <meta_attributes id="primitive-grpStonith-ssh.meta"/>
+     </primitive>
+   </group>
+ </clone>
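+
+ One possible way to load this sample into the cluster, assuming the cibadmin
+ command is available (a sketch only; the file name is made up, so adapt it
+ and the options to your environment):
+ ex.)
+   # cibadmin -C -o resources -x sample_stonith.xml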
+