author    Daniel Baumann <daniel.baumann@progress-linux.org> 2024-04-17 06:53:20 +0000
committer Daniel Baumann <daniel.baumann@progress-linux.org> 2024-04-17 06:53:20 +0000
commit    e5a812082ae033afb1eed82c0f2df3d0f6bdc93f
tree      a6716c9275b4b413f6c9194798b34b91affb3cc7 /cts/README.md
parent    Initial commit.
Adding upstream version 2.1.6.
Signed-off-by: Daniel Baumann <daniel.baumann@progress-linux.org>
Diffstat (limited to 'cts/README.md')
-rw-r--r--  cts/README.md  369
1 file changed, 369 insertions(+), 0 deletions(-)
diff --git a/cts/README.md b/cts/README.md
new file mode 100644
index 0000000..0ff1065
--- /dev/null
+++ b/cts/README.md
@@ -0,0 +1,369 @@
+# Pacemaker Cluster Test Suite (CTS)
+
+The Cluster Test Suite (CTS) refers to all Pacemaker testing code that can be
+run in an installed environment. (Pacemaker also has unit tests that must be
+run from a source distribution.)
+
+CTS includes:
+
+* Regression tests: These test specific Pacemaker components individually (no
+ integration tests). The primary front end is cts-regression in this
+ directory. Run it with the --help option to see its usage.
+
+ cts-regression is a wrapper for individual component regression tests also
+ in this directory (cts-cli, cts-exec, cts-fencing, and cts-scheduler).
+
+ The CLI and scheduler regression tests can also be run from a source
+ distribution. The other regression tests can only run in an installed
+ environment, and the cluster should not be running on the node running these
+ tests.
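+
+  For example, to run only a subset of the component tests (a sketch; run
+  cts-regression with --help to confirm the accepted component names):
+
+    cts-regression cli scheduler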
+
+* The CTS lab: This is a cluster exerciser for intensively testing the behavior
+ of an entire working cluster. It is primarily for developers and packagers of
+ the Pacemaker source code, but it can be useful for users who wish to see how
+ their cluster will react to various situations. In an installed deployment,
+ the CTS lab is in the cts subdirectory of this directory; in a source
+  distribution, it is in cts/lab.
+
+  The CTS lab runs a randomized series of predefined tests on the cluster. It
+  can be run against a pre-existing cluster configuration, or it can overwrite
+  the existing configuration with a test configuration.
+
+* Helpers: Some of the component regression tests and the CTS lab require
+ certain helpers to be installed as root. These include a dummy LSB init
+ script, dummy systemd service, etc. In a source distribution, the source for
+ these is in cts/support.
+
+ The tests will install these as needed and uninstall them when done. This
+ means that the cluster configuration created by the CTS lab will generate
+ failures if started manually after the lab exits. However, the helper
+ installer can be run manually to make the configuration usable, if you want
+ to do your own further testing with it:
+
+ /usr/libexec/pacemaker/cts-support install
+
+ As you might expect, you can also remove the helpers with:
+
+ /usr/libexec/pacemaker/cts-support uninstall
+
+* Cluster benchmark: The benchmark subdirectory of this directory contains some
+ cluster test environment benchmarking code. It is not particularly useful for
+ end users.
+
+* LXC generator: The lxc\_autogen.sh script can be used to create some guest
+ nodes for testing using LXC containers. It is not particularly useful for end
+ users. In an installed deployment, it is in the cts subdirectory of this
+ directory; in a source distribution, it is in this directory.
+
+* Valgrind suppressions: When memory-testing Pacemaker code with valgrind,
+ various bugs in non-Pacemaker libraries and such can clutter the results. The
+ valgrind-pcmk.suppressions file in this directory can be used with valgrind's
+ --suppressions option to eliminate many of these.
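+
+  For example, to run a Pacemaker command under valgrind with these
+  suppressions (a sketch; adjust the path to your installation):
+
+    valgrind \
+        --suppressions=/usr/share/pacemaker/tests/valgrind-pcmk.suppressions \
+        crm_simulate --version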
+
+
+## Using the CTS lab
+
+### Requirements
+
+* Three or more machines (one test exerciser and at least two cluster nodes).
+
+* The test cluster nodes should be on the same subnet and should use
+  journalling filesystems (ext4, xfs, etc.) for everything other than /boot.
+  You also need a number of free IP addresses on that subnet if you intend to
+  test IP address takeover.
+
+* The test exerciser machine doesn't need to be on the same subnet as the test
+ cluster machines. Minimal demands are made on the exerciser; it just has to
+ stay up during the tests.
+
+* Tracking problems is easier if all machines' clocks are closely synchronized.
+ NTP does this automatically, but you can do it by hand if you want.
+
+* The account on the exerciser used to run the CTS lab (which does not need to
+ be root) must be able to ssh as root to the cluster nodes without a password
+ challenge. See the Mini-HOWTO at the end of this file for details about how
+ to configure ssh for this.
+
+* The exerciser needs to be able to resolve all cluster node names, whether by
+ DNS or /etc/hosts.
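+
+  For example, /etc/hosts entries on the exerciser might look like this
+  (addresses are illustrative):
+
+    192.168.9.1 pcmk-1
+    192.168.9.2 pcmk-2
+    192.168.9.3 pcmk-3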
+
+* CTS is not guaranteed to run on all platforms that Pacemaker itself does.
+  It calls commands such as `service` that may not be provided by all OSes.
+
+
+### Preparation
+
+* Install Pacemaker, including the testing code, on all machines. The testing
+ code must be the same version as the rest of Pacemaker, and the Pacemaker
+ version must be the same on the exerciser and all cluster nodes.
+
+ You can install from source, although many distributions package the testing
+ code (named pacemaker-cts or similar). Typically, everything needed by the
+ CTS lab is installed in /usr/share/pacemaker/tests/cts.
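+
+  You can confirm that the versions match on each machine with:
+
+    pacemakerd --version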
+
+* Configure the cluster layer (Corosync) on the cluster machines (*not* the
+ exerciser), and verify it works. Node names used in the cluster configuration
+ *must* match the hosts' names as returned by `uname -n`; they do not have to
+ match the machines' fully qualified domain names.
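+
+  For example, you can compare a node's own name against the names known to
+  Corosync with (a sketch; the exact cmap key layout may vary):
+
+    uname -n
+    corosync-cmapctl | grep nodelist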
+
+
+### Run
+
+The primary interface to the CTS lab is the CTSlab.py executable:
+
+ /usr/share/pacemaker/tests/cts/CTSlab.py [options] <number-of-tests-to-run>
+
+As part of the options, specify the cluster nodes with --nodes, for example:
+
+ --nodes "pcmk-1 pcmk-2 pcmk-3"
+
+Most people will want to save the output to a file, for example:
+
+ --outputfile ~/cts.log
+
+Unless you want to test a pre-existing cluster configuration, you also want
+(*warning*: with these options, any existing configuration will be lost):
+
+ --clobber-cib
+ --populate-resources
+
+You can test floating IP addresses (*not* already used by any host), one per
+cluster node, by specifying the first address in the range, for example:
+
+ --test-ip-base 192.168.9.100
+
+Configure some sort of fencing, for example to use fence\_xvm:
+
+ --stonith xvm
+
+Putting all the above together, a command line might look like:
+
+ /usr/share/pacemaker/tests/cts/CTSlab.py --nodes "pcmk-1 pcmk-2 pcmk-3" \
+ --outputfile ~/cts.log --clobber-cib --populate-resources \
+ --test-ip-base 192.168.9.100 --stonith xvm 50
+
+For more options, run with the --help option.
+
+There are also a couple of wrappers for CTSlab.py that some users may find more
+convenient: cts, which is typically installed in the same place as the rest of
+the testing code; and cluster\_test, which is in the source directory and
+typically not installed.
+
+To extract the result of a particular test, run:
+
+ crm_report -T $test
+
+
+### Optional: Memory testing
+
+Pacemaker has various options for testing memory management. On cluster nodes,
+Pacemaker components use various environment variables to control these
+options. How these variables are set varies by OS, but usually they are set in
+a file such as /etc/sysconfig/pacemaker or /etc/default/pacemaker.
+
+Valgrind is a program for detecting memory management problems such as
+use-after-free errors. If you have valgrind installed, you can enable it by
+setting the following environment variables on all cluster nodes:
+
+ PCMK_valgrind_enabled=pacemaker-attrd,pacemaker-based,pacemaker-controld,pacemaker-execd,pacemaker-fenced,pacemaker-schedulerd
+ VALGRIND_OPTS="--leak-check=full --trace-children=no --num-callers=25
+ --log-file=/var/lib/pacemaker/valgrind-%p
+ --suppressions=/usr/share/pacemaker/tests/valgrind-pcmk.suppressions
+ --gen-suppressions=all"
+
+If running the CTS lab with valgrind enabled on the cluster nodes, add these
+options to CTSlab.py:
+
+ --valgrind-tests --valgrind-procs "pacemaker-attrd pacemaker-based pacemaker-controld pacemaker-execd pacemaker-schedulerd pacemaker-fenced"
+
+These options should only be set while specifically testing memory management,
+because they may slow down the cluster significantly, and they will disable
+writes to the CIB. If desired, you can enable valgrind on a subset of pacemaker
+components rather than all of them as listed above.
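+
+For example, to enable valgrind for only the controller and the scheduler
+(a sketch based on the variable shown above):
+
+    PCMK_valgrind_enabled=pacemaker-controld,pacemaker-schedulerd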
+
+Valgrind will put a text file for each process in the location specified by
+valgrind's --log-file option. See
+https://www.valgrind.org/docs/manual/mc-manual.html for explanations of the
+messages valgrind generates.
+
+Separately, the G\_SLICE environment variable can be set to affect GLib's
+memory management, and if you are using the GNU C library, the
+MALLOC\_PERTURB\_ and MALLOC\_CHECK\_ environment variables can be set to
+affect its memory management functions.
+
+When using valgrind, G\_SLICE should be set to "always-malloc", which helps
+valgrind track memory by always using the malloc() and free() routines
+directly. When not using valgrind, G\_SLICE can be left unset, or set to
+"debug-blocks", which enables GLib to catch many memory errors but may
+impact performance.
+
+If the MALLOC\_PERTURB\_ environment variable is set to an 8-bit integer, the C
+library will initialize all newly allocated bytes of memory to the integer
+value, and will set all newly freed bytes of memory to the bitwise inverse of
+the integer value. This helps catch uses of uninitialized or freed memory
+blocks that might otherwise go unnoticed. Example:
+
+ MALLOC_PERTURB_=221
+
+If the MALLOC\_CHECK\_ environment variable is set, the C library will check for
+certain heap corruption errors. The most useful value in testing is 3, which
+will cause the library to print a message to stderr and abort execution.
+Example:
+
+ MALLOC_CHECK_=3
+
+Valgrind should be enabled for either all nodes or none when used with the CTS
+lab, but the C library variables may be set differently on different nodes.
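+
+For example, a node's /etc/sysconfig/pacemaker (or equivalent) might combine
+the settings above like this (values are illustrative):
+
+    G_SLICE=always-malloc
+    MALLOC_PERTURB_=221
+    MALLOC_CHECK_=3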
+
+
+### Optional: Remote node testing
+
+If the pacemaker-remoted daemon is installed on all cluster nodes, CTS will
+enable remote node tests.
+
+The remote node tests choose a random node, stop the cluster on it, start
+pacemaker-remoted on it, and add an ocf:pacemaker:remote resource to turn it
+into a remote node. When the test is done, CTS will turn the node back into
+a cluster node.
+
+To avoid conflicts, CTS will rename the node, prefixing the original node name
+with "remote-". For example, "pcmk-1" will become "remote-pcmk-1". These names
+do not need to be resolvable.
+
+The name change may require special fencing configuration, if the fence agent
+expects the node name to be the same as its hostname. A common approach is to
+specify the "remote-" names in pcmk\_host\_list. If you use
+pcmk\_host\_list=all, CTS will expand that to all cluster nodes and their
+"remote-" names. You may additionally need a pcmk\_host\_map argument to map
+the "remote-" names to the hostnames. Example:
+
+    --stonith xvm --stonith-args \
+    "pcmk_host_list=all,pcmk_host_map=remote-pcmk-1:pcmk-1;remote-pcmk-2:pcmk-2"
+
+
+### Optional: Remote node testing with valgrind
+
+When running the remote node tests, the Pacemaker components on the *cluster*
+nodes can be run under valgrind as described in the "Memory testing" section.
+However, pacemaker-remoted cannot be run under valgrind that way, because it is
+started by the OS's regular boot system and not by Pacemaker.
+
+Details vary by system, but the goal is to set the VALGRIND\_OPTS environment
+variable and then start pacemaker-remoted by prefixing it with the path to
+valgrind.
+
+The init script and systemd service file provided with pacemaker-remoted will
+load the pacemaker environment variables from the same location used by other
+Pacemaker components, so VALGRIND\_OPTS will be set correctly if using one of
+those.
+
+For an OS using systemd, you can override the ExecStart parameter to run
+valgrind. For example:
+
+ mkdir /etc/systemd/system/pacemaker_remote.service.d
+ cat >/etc/systemd/system/pacemaker_remote.service.d/valgrind.conf <<EOF
+ [Service]
+ ExecStart=
+ ExecStart=/usr/bin/valgrind /usr/sbin/pacemaker-remoted
+ EOF
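+
+After creating the override, reload systemd's configuration and restart the
+service (standard systemd practice):
+
+    systemctl daemon-reload
+    systemctl restart pacemaker_remote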
+
+
+### Optional: Container testing
+
+If the --container-tests option is given to CTSlab.py, it will enable
+testing of LXC resources (currently only the RemoteLXC test,
+which starts a remote node using an LXC container).
+
+The container tests have additional package dependencies (see the toplevel
+INSTALL.md). Also, SELinux must be enabled (in either permissive or enforcing
+mode), libvirtd must be enabled and running, and root must be able to ssh
+without a password between all cluster nodes (not just from the exerciser).
+Before running the tests, you can verify your environment with:
+
+ /usr/share/pacemaker/tests/cts/lxc_autogen.sh -v
+
+LXC tests will create two containers with hardcoded parameters: a NAT'ed bridge
+named virbr0 using the IP network 192.168.123.0/24 will be created on the
+cluster node hosting the containers; the host will be assigned
+52:54:00:A8:12:35 as the MAC address and 192.168.123.1 as the IP address.
+Each container will be assigned a random MAC address starting with 52:54:,
+the IP address 192.168.123.11 or 192.168.123.12, the hostname lxc1 or lxc2
+(which will be added to the host's /etc/hosts file), and 196MB RAM.
+
+The test will revert all of the configuration when it is done.
+
+
+### Mini-HOWTO: Allow passwordless remote SSH connections
+
+The CTS scripts run "ssh -l root" so you don't have to do any of your testing
+logged in as root on the exerciser. Here is how to allow such connections
+without requiring a password to be entered each time:
+
+* On your test exerciser, create an SSH key if you do not already have one.
+ Most commonly, SSH keys will be in your ~/.ssh directory, with the
+ private key file not having an extension, and the public key file
+ named the same with the extension ".pub" (for example, ~/.ssh/id\_rsa.pub).
+
+ If you don't already have a key, you can create one with:
+
+ ssh-keygen -t rsa
+
+* From your test exerciser, authorize your SSH public key for root on all test
+ machines (both the exerciser and the cluster test machines):
+
+ ssh-copy-id -i ~/.ssh/id_rsa.pub root@$MACHINE
+
+ You will probably have to provide your password, and possibly say
+ "yes" to some questions about accepting the identity of the test machines.
+
+  The above assumes you have an RSA SSH key in the specified location;
+ if you have some other type of key (DSA, ECDSA, etc.), use its file name
+ in the -i option above.
+
+* To verify, try this command from the exerciser machine for each
+  of your cluster machines, and for the exerciser machine itself:
+
+ ssh -l root $MACHINE
+
+ If this works without prompting for a password, you're in business.
+ If not, look at the documentation for your version of ssh.
+
+
+## Notes on maintenance
+
+### Tests for the scheduler
+
+The source `*.xml` files should be kept in sync with the newest major schema
+version (tracking the major version alone is sufficient), since these tests
+are not meant to double as schema upgrade tests (except for some cases
+expressly designated as such).
+
+Currently, unless something goes wrong, upgrading these tests en masse is as
+easy as:
+
+ cd "$(git rev-parse --show-toplevel)/cts" # if not already
+ pushd "$(git rev-parse --show-toplevel)/xml"
+ ./regression.sh cts_scheduler -G
+ popd
+ git add --interactive .
+ git commit -m 'XML: upgrade-M.N.xsl: apply on scheduler CTS test cases'
+ git reset HEAD && git checkout . # if some differences still remain
+ ./cts-scheduler # absolutely vital to check nothing got broken!
+
+Now, sadly, there is no proven automated way to minimize instances like this:
+
+ <primitive id="rsc1" class="ocf" provider="heartbeat" type="apache">
+ </primitive>
+
+that may be left behind, into the more canonical form:
+
+ <primitive id="rsc1" class="ocf" provider="heartbeat" type="apache"/>
+
+so manual editing is required; alternatively, passing `--format` or `--c14n`
+to `xmllint` may help (as long as it causes no other side effects).
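+
+For example (a sketch; the file name is illustrative):
+
+    xmllint --format some-test.xml > some-test.xml.new
+    mv some-test.xml.new some-test.xml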
+
+If the overall process gets stuck anywhere, apply common sense. The initial
+part of the above recipe can be repeated at any time to verify that there is
+nothing left to upgrade, which is the desired state. Note that the
+`regression.sh` script implicitly validates both the input and the output
+whenever an upgrade takes place, so there is no need to revalidate in the
+happy case.