summaryrefslogtreecommitdiffstats
path: root/agents/virt/docs
diff options
context:
space:
mode:
Diffstat (limited to 'agents/virt/docs')
-rw-r--r--agents/virt/docs/README125
-rw-r--r--agents/virt/docs/TODO7
-rw-r--r--agents/virt/docs/architecture.txt16
-rw-r--r--agents/virt/docs/fence_virt.txt127
4 files changed, 275 insertions, 0 deletions
diff --git a/agents/virt/docs/README b/agents/virt/docs/README
new file mode 100644
index 0000000..e2b19bc
--- /dev/null
+++ b/agents/virt/docs/README
@@ -0,0 +1,125 @@
+TODO: update
+
+I. Fence_xvm - Virtual machine fencing agent
+
+Fence_xvm is an agent which establishes a communications link between
+a cluster of virtual machines (VC) and a cluster of domain0/physical
+nodes which are hosting the virtual cluster. Its operations are
+fairly simple.
+
+ (a) Start a listener service.
+ (b) Send a multicast packet requesting that a VM be fenced.
+ (c) Authenticate client.
+ (e) Read response.
+ (f) Exit with success/failure, depending on the response received.
+
+If any of the above steps fail, the fencing agent exits with a failure
+code and fencing is retried by the virtual cluster at a later time.
+Because of the simplicty of fence_xvm, it is not necessary that
+fence_xvm be run from within a virtualized guest - all it needs is
+libnspr and libnss and a shared private key (for authentication; we
+would hate to receive a false positive response from a node not in the
+cluster!).
+
+
+II. Fence_virtd - Virtual machine fencing host
+
+Fence_virtd is a daemon which runs on physical hosts (e.g. in domain0)
+of the cluster hosting the virtual cluster. It listens on a port
+for multicast traffic from virtual cluster(s), and takes actions.
+Multiple disjoint virtual clusters can coexist on a single physical
+host cluster, but this requires multiple instances of fence_virtd.
+
+NOTE: fence_virtd *MUST* be run on ALL nodes in a given cluster which
+will be hosting virtual machines if fence_xvm is to be used for
+fencing!
+
+There are a couple of ways the multicast packet is handled,
+depending on the state of the host OS. It might be hosting the VM,
+or it might not. Furthermore, the VM might "reside" on a host which
+has failed.
+
+In order to be able to guarantee safe fencing of a VM even if the
+last- known host is down, we must store the last-known locations of
+each virtual machine in some sort of cluster-wide way. For this, we
+use the corosync CPG API. Every few seconds, fence_virtd queries the
+hypervisor via libvirt and stores any local VM states and sends those
+states over CPG to all other members. In the event of a physical node
+failure (which consequently causes the failure of one or more guests),
+we can then read the stored VM state corresponding to the guest we need
+to fence to find out the previous owner. With that information, we can
+infer if the known host node has been fenced. If so, then the VM is clean
+as well. The physical cluster must, therefore, have fencing in order for
+fence_virtd to work.
+
+Operation of a node hosting a VM which needs to be fenced:
+
+ (a) Receive multicast packet
+ (b) Authenticate multicast packet
+ (c) Open connection to host contained within multicast
+ packet.
+ (d) Authenticate server.
+ (e) Carry out fencing operation (e.g. call libvirt to destroy or
+ reboot the VM; there is no "on" method at this point).
+ (f) If operation succeeds, send success response.
+
+Operation of high-node-ID:
+
+ (a) Receive multicast packet
+ (b) Authenticate multicast packet
+ (c) Read VM state from stored CPG messages
+ (d) Check liveliness of nodeID hosting VM (if alive, do nothing)
+ (e) Open connection to host contained within multicast
+ packet.
+ (f) Check with CMAN to see if last-known host has been fenced.
+ (g) If last-known host has been fenced, send success response.
+ (h) Authenticate server & send response.
+
+NOTE: There is always a possibility that a VM is started again
+before the fencing operation and CPG update for that VM
+occurs. If the VM has booted and rejoined the cluster, fencing will
+not be necessary. If it is in the process of booting, but has not
+yet joined the cluster, fencing will also not be necessary - because
+it will not be using cluster resources yet.
+
+
+III. Security considerations
+
+While fencing is generally expected to run on a more or less trusted
+network, there are cases where it may not be.
+
+* The multicast packet is subject to replay attacks, but because no
+fencing action is taken based solely on the information contained
+within the packet, this should not allow an attacker to maliciously
+fence a VM from outside the cluster, though it may be possible to
+cause a DoS of fence_virtd if enough multicast packets are sent.
+
+* The only currently supported authentication mechanisms are simple
+challenge-response based on a shared private key and pseudorandom
+number generation.
+
+* An attacker with access to the shared key(s) can easily fence any
+known VM, even if they are not on a cluster node.
+
+* Different shared keys should be used for different virtual
+clusters on the same subnet (whether in the same physical cluster
+or not). Additionally, multiple fence_virtd instances must be run
+(each listening on a different multicast IP + port combination).
+
+IV. Configuration
+
+Generate a random key file. An example of how to generate it is:
+
+ dd if=/dev/urandom of=/etc/cluster/fence_xvm.key bs=4096 count=1
+
+Distribute the generated key file to all domUs in a cluster as well
+as all dom0s which will be hosting that particular cluster of domUs.
+The key should not be placed on shared file systems (because shared
+file systems require the cluster, which requires fencing...).
+
+Start fence_virtd on all hosts
+
+Configure fence_xvm on the domU cluster...
+
+rest...tbd
+
diff --git a/agents/virt/docs/TODO b/agents/virt/docs/TODO
new file mode 100644
index 0000000..17456cf
--- /dev/null
+++ b/agents/virt/docs/TODO
@@ -0,0 +1,7 @@
+High Priority / Blockers for v1.0;
+
+* endian-clean / 64-bit clean data structure analysis
+
+Future Stuff:
+
+* clean up development bits so third parties can develop plugins
diff --git a/agents/virt/docs/architecture.txt b/agents/virt/docs/architecture.txt
new file mode 100644
index 0000000..54fda11
--- /dev/null
+++ b/agents/virt/docs/architecture.txt
@@ -0,0 +1,16 @@
+The actual architecture of fence_virtd is very simple. We have a set
+of listener plugins which listens for fencing requests for virtual
+machines.
+
+These plugins are assigned callbacks which are entry functions in to
+the backend plugins. The backend plugins perform the actual fencing
+request.
+
+In the middle, we have only enough code to provide basic integration
+functions between the listener and backend plugins. This includes a
+very simple confiugration plugin which we pass to each of the plugins.
+
+Because we are passing function pointers in to the plugins themselves
+for configuration (rather than having the plugins call an API directly,
+for example), we are able to swap out the configuration subsystem for
+other, more full-featured configuration systems, such as libccs.
diff --git a/agents/virt/docs/fence_virt.txt b/agents/virt/docs/fence_virt.txt
new file mode 100644
index 0000000..e554ce4
--- /dev/null
+++ b/agents/virt/docs/fence_virt.txt
@@ -0,0 +1,127 @@
+We need a fencing agent which can work in a variety of guest cluster
+configurations and host configurations.
+
+Requirements
+
+1. Nonrequirement of guest to host networking. Virtual machines
+ may be configured to run using a nework unknown to the host
+ operating system. Therefore, the ability to run without network
+ communication between the guest and the hsot is required.
+
+2. Ease of configuration. The absolute minimum possible configuration
+ must be available.
+
+3. Nonrequirement of host clustering software. Multiple layers of
+ configuration sucks. While I fundamentally disagree with the general
+ idea that running CMAN on the host constitutes a "heavyweight
+ cluster", perception is important.
+
+4. Ability to support RHEV-M, oVirt server, and other virtual machine
+ management technologies. This is beneficial from a security standpoint
+ since it is assumed the management server will be aware of what VMs
+ are allowed to fence what other VMs.
+
+5. Upgrade compatibility with fence_xvm from a configuration standpoint.
+ This may be provided by a symlink over fence_xvm. If this feature
+ can not be provided as a matter of design, a method to convert an
+ existing fence_xvm/fence_xvmd configuration to fence_virt must be
+ present.
+
+
+Guest to Host Interaction
+-------------------------
+
+The proposal is to use various communications media plugins in order
+to facilitate flexibility with respect to how virtual machine
+environments are configured.
+
+There are at least 3 simple plugins for guest/client to host/server
+communications:
+
+ * Direct serial. The guest sends fencing requests out via /dev/ttySX
+ in the guest. The host is listening on a Unix domain socket[1],
+ and forwards fencing requests accordingly.
+
+ This satisifies most of the requirements, but adds a conundrum
+ when configuring guest clusters, as /dev/ttySX may be /dev/ttySY
+ on another guest. So, either we must account for this per-guest
+ configuration discrepancy or we must make it an administrative
+ requirement to provide the same serial device on each host
+
+ * Multicast. This violates the networking requirement, but this is
+ okay since this method of operation is optional. This operational
+ mode provides for one of the simpler configurations: all that is
+ needed is the guest's name or UUID. The guest to host
+ communications operates in the same manner as fence_xvm/fence_xvmd,
+ except that there is an implied requirement on restricting the
+ multicast packets accepted to be from the local guests.
+
+ * VM Channel over Serial. This works like direct serial, but
+ instead of owning the whole device, the device may be shared between
+ multiple applications. The server subscribes to a channel and
+ listens for fencing requests on the channel; the client in the
+ guest OS connects to the channel and issues fencing requests across
+ it. One interesting thing is that it may be possible to provide
+ unprivileged users the ability to fence using this method (I
+ do not claim to know if this is useful or not).
+
+
+Host to Hypervisor interaction
+------------------------------
+
+Similar to the way we have plugins for guest to host interaction,
+we also have plugins which actually do the real work. These plugins
+are responsible for all of the actual real work performed, including
+tracking VMs if required, forwarding requests to the appropriate hosts
+or management services, and handling the responses.
+
+We propose at 5 plugins in this case:
+
+ * Libvirt (local-only). There is no intracommunication and no
+ migration support is provided
+
+ * Cluster CPG (+ libvirt). This the way fence_xvmd
+ operates today. This setup has the most requirements on the
+ infrastructure, as it requires guest to host networking _and_
+ host-to-host clustering in order to keep track of virtual
+ machines. The benefit is that it is self-contained and requires
+ no external management nodes. VM states are stored so that other
+ CPG group members know the locations of other VMs and can make
+ some decisions about whether a VM is dead based on whether a host
+ is dead (i.e. if fencing is in use or can be performed on the
+ host).
+
+ * Libvirt-QMF ... ??? Subscription to the appropriate cluster
+ specific AMQP channel is required on the host side, but this
+ handles routing the message very easily. The fencing request
+ is forwarded to the other listeners on the channel, the VM owner
+ takes the action requested and returns a value. When new VMs
+ are created, the event is broadcast out via the AMQP channel so
+ other hosts know the locations of other VMs and can make some
+ decisions about whether a VM is dead based on whether a host
+ is dead (i.e. if fencing is in use or can be performed on the
+ host).
+
+ * oVirt Manager. The request is forwarded to the oVirt Manager
+ and the oVirt manager is responsible for taking the appropriate
+ action and responding to the request.
+
+ * RHEV-M. The request is forwarded to the RHEV-M node, which is
+ responsible for taking the appropriate action and responding to
+ the request.
+
+
+These plugins have no requirements on which guest to host communication
+plugin is used (you could, if you wanted, use 'direct serial' with
+'cluster cpg', or 'multicast' with 'RHEV-H' for example).
+
+These plugins must also be able to discover where appropriate. For
+example, the cpg plugin can only be used if corosync/openais
+is running. A defined plugin preference order should be specified/documented
+so that the host daemon behaves in a predictable manner in absence of
+host-side configuration data (about which plugin to use).
+
+
+[1] TCP was also explored, however, the security is much better
+ using a Unix domain socket, despite the additional complexity
+ of listening for VM creation events.