Diffstat (limited to 'doc/hb_report.8.txt')
-rw-r--r-- | doc/hb_report.8.txt | 478
1 file changed, 478 insertions, 0 deletions

diff --git a/doc/hb_report.8.txt b/doc/hb_report.8.txt
new file mode 100644
index 0000000..5efbc32
--- /dev/null
+++ b/doc/hb_report.8.txt
@@ -0,0 +1,478 @@
:man source: hb_report
:man version: 1.2
:man manual: Pacemaker documentation

hb_report(8)
============


NAME
----
hb_report - create report for CRM based clusters (Pacemaker)


SYNOPSIS
--------
*hb_report* -f {time|"cts:"testnum} [-t time] [-u user] [-l file]
    [-n nodes] [-E files] [-p patt] [-L patt] [-e prog]
    [-MSDCZAQVsvhd] [dest]


DESCRIPTION
-----------
hb_report is a utility to collect all information (logs,
configuration files, system information, etc) relevant to
Pacemaker (CRM) over a given period of time.


OPTIONS
-------
dest::
    The report name. It can also contain a path specifying where to
    put the report tarball. If left out, the tarball is created in
    the current directory and named "hb_report-current_date", for
    instance hb_report-Wed-03-Mar-2010.

*-d*::
    Don't create the compressed tar, but leave the result in a
    directory.

*-f* { time | "cts:"testnum }::
    The start time from which to collect logs. The time is in the
    format used by the Date::Parse perl module. For CTS tests,
    specify the "cts:" string followed by the test number. This
    option is required.

*-t* time::
    The end time up to which to collect logs. Defaults to now.

*-n* nodes::
    A list of space separated hostnames (cluster members).
    hb_report may try to find out the set of nodes by itself, but
    if it runs on the loghost which, as is usually the case, does
    not belong to the cluster, that may be difficult. Also,
    OpenAIS doesn't keep a list of nodes, and if Pacemaker is not
    running there is no way to find it out automatically.
    This option is cumulative (i.e. use -n "a b" or -n a -n b).

*-l* file::
    Log file location. If, for whatever reason, hb_report cannot
    find the log files, you can specify their absolute path.

*-E* files::
    Extra log files to collect. This option is cumulative. By
    default, /var/log/messages is collected along with the
    cluster logs.

*-M*::
    Don't collect extra log files, but only the file containing
    messages from the cluster subsystems.

*-L* patt::
    A list of regular expressions to match in log files for
    analysis. This option is additive (default: "CRIT: ERROR:").

*-p* patt::
    Additional patterns to match parameter names which contain
    sensitive information. This option is additive (default: "passw.*").

*-Q*::
    Quick run. Gathering some system information can be expensive.
    With this option, such operations are skipped and information
    collection is sped up. The operations considered I/O or CPU
    intensive are: verifying the contents of installed packages,
    sanitizing files for sensitive information, and producing dot
    files from PE inputs.

*-A*::
    This is an OpenAIS cluster. hb_report has some heuristics to
    find the cluster stack, but they are not always reliable.
    By default, hb_report assumes that it is run on a Heartbeat
    cluster.

*-u* user::
    The ssh user. hb_report will try to login to other nodes
    without specifying a user, then as "root", and finally as
    "hacluster". If you use another user for administration over
    ssh, please use this option.

*-X* ssh-options::
    Extra ssh options. These will be added to every ssh
    invocation. Alternatively, use `$HOME/.ssh/config` to set up
    the desired ssh connection options (see the sketch following
    this option list).

*-S*::
    Single node operation. Run hb_report only on this node and
    don't try to start slave collectors on other members of the
    cluster. Under normal circumstances this option is not
    needed. Use it if ssh(1) does not work to the other nodes.

*-Z*::
    If the destination directory exists, remove it instead of
    exiting (this is the default for CTS).

*-V*::
    Print the version including the last repository changeset.

*-v*::
    Increase verbosity. Normally used to debug unexpected
    behaviour.

*-h*::
    Show usage and some examples.

*-D* (obsolete)::
    Don't invoke an editor to fill the description text file.

*-e* prog (obsolete)::
    Your favourite text editor. Defaults to $EDITOR, vim, vi,
    emacs, or nano, whichever is found first.

*-C* (obsolete)::
    Remove the destination directory once the report has been put
    in a tarball.
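
As mentioned under `-u` and `-X`, ssh behaviour can also be configured
once in `$HOME/.ssh/config` instead of being passed on every run. The
following is only a minimal sketch; the host names, user, and key path
are illustrative placeholders, not anything hb_report requires:

    Host node1 node2
        User hacluster
        IdentityFile ~/.ssh/id_rsa
        StrictHostKeyChecking no

With such an entry in place, plain hb_report invocations should be
able to reach the other nodes without extra `-u` or `-X` arguments.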

EXAMPLES
--------
Last night during the backup several warnings were encountered
(logserver is the log host):

    logserver# hb_report -f 3:00 -t 4:00 -n "node1 node2" report

collects everything from all nodes from 3am to 4am last night.
The files are compressed into a tarball, report.tar.bz2.

Just found a problem during testing:

    # note the current time
    node1# date
    Fri Sep 11 18:51:40 CEST 2009
    node1# /etc/init.d/heartbeat start
    node1# nasty-command-that-breaks-things
    node1# sleep 120 # wait for the cluster to settle
    node1# hb_report -f 18:51 hb1

    # if hb_report can't figure out that this is corosync
    node1# hb_report -f 18:51 -A hb1

    # if hb_report can't figure out the cluster members
    node1# hb_report -f 18:51 -n "node1 node2" hb1

The files are compressed into a tarball, hb1.tar.bz2.
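
To get a first impression of what was collected, it is usually enough
to unpack the tarball and skim the summary files described in the next
section. This is only a sketch, assuming the report above was named
hb1 and that a merged ha-log.txt ended up in the top directory:

    $ tar xjf hb1.tar.bz2
    $ less hb1/analysis.txt hb1/events.txt
    $ grep -iE "error|crit" hb1/ha-log.txt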

INTERPRETING RESULTS
--------------------
The compressed tar archive is the final product of hb_report.
This is one example of its content, for a CTS test case on a
three node OpenAIS cluster:

    $ ls -RF 001-Restart

    001-Restart:
    analysis.txt     events.txt  logd.cf       s390vm13/  s390vm16/
    description.txt  ha-log.txt  openais.conf  s390vm14/

    001-Restart/s390vm13:
    STOPPED  crm_verify.txt  hb_uuid.txt  openais.conf@    sysinfo.txt
    cib.txt  dlm_dump.txt    logd.cf@     pengine/         sysstats.txt
    cib.xml  events.txt      messages     permissions.txt

    001-Restart/s390vm13/pengine:
    pe-input-738.bz2  pe-input-740.bz2  pe-warn-450.bz2
    pe-input-739.bz2  pe-warn-449.bz2   pe-warn-451.bz2

    001-Restart/s390vm14:
    STOPPED  crm_verify.txt  hb_uuid.txt  openais.conf@    sysstats.txt
    cib.txt  dlm_dump.txt    logd.cf@     permissions.txt
    cib.xml  events.txt      messages     sysinfo.txt

    001-Restart/s390vm16:
    STOPPED  crm_verify.txt  hb_uuid.txt  messages         sysinfo.txt
    cib.txt  dlm_dump.txt    hostcache    openais.conf@    sysstats.txt
    cib.xml  events.txt      logd.cf@     permissions.txt

The top directory contains information which pertains to the
cluster or event as a whole. Files with exactly the same content
on all nodes are also placed at the top, with per-node links
created (as is the case in this example with openais.conf and
logd.cf).

The cluster log files are named ha-log.txt regardless of the
actual log file name on the system. If a log is found on the
loghost, it is placed in the top directory. If not, the top
directory ha-log.txt contains the logs of all nodes, merged and
sorted by time. Files named messages are excerpts of
/var/log/messages from the nodes.

Most files are copied verbatim or contain the output of a
command. For instance, cib.xml is a copy of the CIB found in
/var/lib/heartbeat/crm/cib.xml, and crm_verify.txt is the output
of the crm_verify(8) program.

Some files are the result of more involved processing:

*analysis.txt*::
    A set of log messages matching user defined patterns (which
    may be provided with the -L option).

*events.txt*::
    A set of log messages matching event patterns. It should
    provide information about major cluster motions without
    unnecessary detail. These patterns are devised by the
    cluster experts. Currently, the patterns cover membership
    and quorum changes, resource starts and stops, fencing
    (stonith) actions, and cluster starts and stops. events.txt
    is always generated for each node; if a central cluster log
    was found, a combined version for all nodes is produced as
    well.

*permissions.txt*::
    One of the more common causes of problems is wrong file and
    directory permissions. hb_report checks the permissions of a
    set of predefined directories. Any issues are reported here.

*backtraces.txt*::
    gdb generated backtrace information for cores dumped within
    the specified period.

*sysinfo.txt*::
    Various release information about the platform, kernel,
    operating system, packages, and anything else deemed
    relevant: the static part of the system.

*sysstats.txt*::
    Output of various system commands such as ps(1), uptime(1),
    netstat(8), and ifconfig(8): the dynamic part of the system.

description.txt should contain a user supplied description of the
problem, but since it is very seldom used, it will be dropped
from future releases.

PREREQUISITES
-------------

ssh::
    It is not strictly required, but you won't regret setting up
    password-less ssh. It is not too difficult to set up and will
    save you a lot of time. If you can't have it, for example
    because your security policy does not allow such a thing, or
    you just prefer menial work, then you will have to resort to
    the semi-manual semi-automated report generation. See below
    for instructions.

    If you need to supply a password for your passphrase/login,
    then always use the `-u` option.

    For extra ssh(1) options, if you're too lazy to set up
    $HOME/.ssh/config, use the `-X` option. Do not forget to put
    the options in quotes.

sudo::
    If the ssh user (as specified with the `-u` option) is other
    than `root`, then `hb_report` uses `sudo` to collect the
    information which is readable only by the `root` user. In
    that case it is required to set up the `sudoers` file
    properly. The user (or the group to which the user belongs)
    should have the following line:

    <user> ALL = NOPASSWD: /usr/sbin/hb_report

    See the `sudoers(5)` man page for more details.

Times::
    In order to find files and messages in the given period and
    to parse the `-f` and `-t` options, `hb_report` uses perl and
    one of the `Date::Parse` or `Date::Manip` perl modules. Note
    that you need only one of these. Furthermore, on nodes which
    have no logs and where you don't run `hb_report` directly, no
    date parsing is necessary. In other words, if you run this on
    a loghost then you don't need these perl modules on the
    cluster nodes. A quick way to check that the module is
    available is sketched right after this list.

    On rpm based distributions, you can find `Date::Parse` in
    `perl-TimeDate`, and on Debian and its derivatives in
    `libtimedate-perl`.

Core dumps::
    To backtrace core dumps, gdb is needed along with the
    packages carrying the debugging info. The debug info packages
    may be installed at the time the report is created. Let's
    hope that you will need this really seldom.
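
To verify up front that date parsing will work (see the Times
prerequisite above), `Date::Parse` can be exercised directly from the
shell. This is only an illustrative sketch, not something hb_report
itself runs; the date is taken from the TIMES section below:

    # prints an error and exits non-zero if the module is missing
    $ perl -MDate::Parse -e 1
    # prints the epoch seconds for a date in a supported format
    $ perl -MDate::Parse -e 'print str2time("2007/9/1 2pm"), "\n"'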

TIMES
-----

Specifying times can at times be a nuisance. That is why we have
chosen to use one of the perl modules: they allow a certain
freedom when writing dates. You can either read the instructions
at the
http://search.cpan.org/dist/TimeDate/lib/Date/Parse.pm#EXAMPLE_DATES[Date::Parse
examples page] or just rely on common sense and try stuff like:

    3:00          (today at 3am)
    15:00         (today at 3pm)
    2007/9/1 2pm  (September 1st 2007 at 2pm)
    Tue Sep 15 20:46:27 CEST 2009 (September 15th etc)

`hb_report` will (probably) complain if it can't figure out what
you mean.

Try to delimit the event as closely as possible in order to
reduce the size of the report, but still leave a minute or two
around it for good measure.

`-f` is not optional. And don't forget to quote dates when they
contain spaces.


Should I send all this to the rest of the Internet?
---------------------------------------------------

By default, the sensitive data in the CIB and PE files is not
mangled by `hb_report` because that would make the PE input files
mostly useless. If you still have no other option but to send the
report to a public mailing list and do not want the sensitive
data to be included, use the `-s` option. Without this option,
`hb_report` will issue a warning if it finds information which
should not be exposed. By default, parameters matching 'passw.*'
are considered sensitive. Use the `-p` option to specify
additional regular expressions to match variable names which may
contain information you don't want to leak. For example:

    # hb_report -f 18:00 -p "user.*" -p "secret.*" /var/tmp/report

Heartbeat's ha.cf is always sanitized. Logs and other files are
not filtered.

LOGS
----

It may be tricky to find syslog logs. The scheme used is to log a
unique message on all nodes and then look it up in the usual
syslog locations. This procedure is not foolproof, in particular
if the syslog files are in a non-standard directory. We look in
/var/log /var/logs /var/syslog /var/adm /var/log/ha
/var/log/cluster. In case we can't find the logs, please supply
their location:

    # hb_report -f 5pm -l /var/log/cluster1/ha-log -S /tmp/report_node1

If you have different log locations on different nodes, well,
perhaps you'd like to make them the same and make life easier for
everybody.

Files starting with "ha-" are preferred. In case syslog sends
messages to more than one file, if one of them is named ha-log or
ha-debug those will be favoured over syslog or messages.

hb_report also supports archived logs in case the specified
period extends that far into the past. The archives must reside
in the same directory as the current log, and their names must be
prefixed with the name of the current log (syslog-1.gz or
messages-20090105.bz2).

If there is no separate log for the cluster, possibly unrelated
messages from other programs are included. We don't filter logs,
but just pick a segment for the period you specified.

MANUAL REPORT COLLECTION
------------------------

So, your ssh doesn't work. In that case, you will have to run
this procedure on all nodes. Use `-S` so that `hb_report` doesn't
bother with ssh:

    # hb_report -f 5:20pm -t 5:30pm -S /tmp/report_node1

If you also have a log host which is not in the cluster, then
you'll have to copy the log to one of the nodes and tell us where
it is:

    # hb_report -f 5:20pm -t 5:30pm -l /var/tmp/ha-log -S /tmp/report_node1
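
When collecting reports manually, the per-node tarballs still have to
be gathered in one place before they can be examined or attached to a
bug report. A sketch, assuming two nodes, node1 as the gathering
point, and destination names chosen arbitrarily (the tarball name
follows the given destination):

    node1# hb_report -f 5:20pm -t 5:30pm -S /tmp/report_node1
    node2# hb_report -f 5:20pm -t 5:30pm -S /tmp/report_node2
    node2# scp /tmp/report_node2.tar.bz2 node1:/tmp/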

OPERATION
---------
hb_report collects files and other information in a fairly
straightforward way. The most complex tasks are discovering the
log file locations (if syslog is used, which is the most common
case) and coordinating the operation on multiple nodes.

The instance of hb_report running on the host where it was
invoked is the master instance. Instances running on the other
nodes are slave instances. The master instance communicates with
the slave instances over ssh. There are multiple ssh invocations
per run, so it is essential that ssh works without a password,
i.e. with public key authentication and authorized_keys.

The operation consists of three phases. Each phase must finish
on all nodes before the next one can commence. The first phase
consists of logging unique messages through syslog on all nodes.
This is the shortest of all phases.

The second phase is the most involved. During this phase all
local information is collected, which includes:

- logs (both current and archived if the start time is far in the past)
- various configuration files (corosync, heartbeat, logd)
- the CIB (both as xml and as represented by the crm shell)
- pengine inputs (if this node was the DC at any point in
  time over the given period)
- system information and status
- package information and status
- dlm lock information
- backtraces (if there were core dumps)

The third phase is collecting information from all nodes and
analyzing it. The analysis consists of the following tasks:

- identify files which are equal on all nodes and may then be
  moved to the top directory
- save log messages matching user defined patterns
  (defaults to ERRORs and CRITical conditions)
- report whether there were core dumps and by whom
- report crm_verify(8) results
- save log messages matching major events to events.txt
- in case logging is configured without a loghost, combine the
  node logs and events files using a perl utility


BUGS
----
Finding logs may at times be extremely difficult, depending on
how weird the syslog configuration is. It would be nice to ask
the syslog-ng developers to provide a way to find out the log
destination based on facility and priority.

If you think you found a bug, please rerun with the -v option and
attach the output to bugzilla.

hb_report can function in a satisfactory way only if ssh works to
all nodes using authorized_keys (without a password).

There are way too many options.


AUTHOR
------
Written by Dejan Muhamedagic, <dejan@suse.de>


RESOURCES
---------
Pacemaker: <http://clusterlabs.org/>

Heartbeat and other Linux HA resources: <http://linux-ha.org/wiki>

OpenAIS: <http://www.openais.org/>

Corosync: <http://www.corosync.org/>


SEE ALSO
--------
Date::Parse(3)


COPYING
-------
Copyright \(C) 2007-2009 Dejan Muhamedagic. Free use of this
software is granted under the terms of the GNU General Public
License (GPL).