4 files changed, 275 insertions, 0 deletions
diff --git a/agents/virt/docs/README b/agents/virt/docs/README
new file mode 100644
index 0000000..e2b19bc
--- /dev/null
+++ b/agents/virt/docs/README
@@ -0,0 +1,125 @@
+TODO: update
+
+I. Fence_xvm - Virtual machine fencing agent
+
+Fence_xvm is an agent which establishes a communications link between
+a cluster of virtual machines (VC) and a cluster of domain0/physical
+nodes which are hosting the virtual cluster.  Its operations are
+fairly simple.
+
+  (a) Start a listener service.
+  (b) Send a multicast packet requesting that a VM be fenced.
+  (c) Authenticate client.
+  (e) Read response.
+  (f) Exit with success/failure, depending on the response received.
+
+If any of the above steps fail, the fencing agent exits with a failure
+code and fencing is retried by the virtual cluster at a later time.
+Because of the simplicity of fence_xvm, it is not necessary that
+fence_xvm be run from within a virtualized guest - all it needs is
+libnspr and libnss and a shared private key (for authentication; we
+would hate to receive a false positive response from a node not in the
+cluster!).
+
+
+II. Fence_virtd - Virtual machine fencing host
+
+Fence_virtd is a daemon which runs on physical hosts (e.g. in domain0)
+of the cluster hosting the virtual cluster.  It listens on a port
+for multicast traffic from virtual cluster(s), and takes actions.
+Multiple disjoint virtual clusters can coexist on a single physical
+host cluster, but this requires multiple instances of fence_virtd.
+
+NOTE: fence_virtd *MUST* be run on ALL nodes in a given cluster which
+will be hosting virtual machines if fence_xvm is to be used for
+fencing!
+
+There are a couple of ways the multicast packet is handled,
+depending on the state of the host OS.  It might be hosting the VM,
+or it might not.  Furthermore, the VM might "reside" on a host which
+has failed.
+
+In order to be able to guarantee safe fencing of a VM even if the
+last-known host is down, we must store the last-known locations of
+each virtual machine in some sort of cluster-wide way.  For this, we
+use the corosync CPG API.  Every few seconds, fence_virtd queries the
+hypervisor via libvirt and stores any local VM states and sends those
+states over CPG to all other members.  In the event of a physical node
+failure (which consequently causes the failure of one or more guests),
+we can then read the stored VM state corresponding to the guest we need
+to fence to find out the previous owner.  With that information, we can
+determine whether the last-known host has been fenced.  If so, then the
+VM is clean as well.  The physical cluster must, therefore, have fencing
+configured in order for fence_virtd to work.
+
+Operation of a node hosting a VM which needs to be fenced:
+
+  (a) Receive multicast packet
+  (b) Authenticate multicast packet
+  (c) Open connection to host contained within multicast
+      packet.
+  (d) Authenticate server.
+  (e) Carry out fencing operation (e.g. call libvirt to destroy or
+      reboot the VM; there is no "on" method at this point).
+  (f) If operation succeeds, send success response.
+
+Operation of the node with the highest node ID:
+
+  (a) Receive multicast packet
+  (b) Authenticate multicast packet
+  (c) Read VM state from stored CPG messages
+  (d) Check liveness of nodeID hosting VM (if alive, do nothing)
+  (e) Open connection to host contained within multicast
+      packet.
+  (f) Check with CMAN to see if last-known host has been fenced.
+  (g) If last-known host has been fenced, send success response.
+  (h) Authenticate server & send response.
+
+NOTE: There is always a possibility that a VM is started again
+before the fencing operation and CPG update for that VM
+occur.  If the VM has booted and rejoined the cluster, fencing will
+not be necessary.  If it is in the process of booting, but has not
+yet joined the cluster, fencing will also not be necessary - because
+it will not be using cluster resources yet.
+
+
+III. Security considerations
+
+While fencing is generally expected to run on a more or less trusted
+network, there are cases where it may not be.
+
+* The multicast packet is subject to replay attacks, but because no
+fencing action is taken based solely on the information contained
+within the packet, this should not allow an attacker to maliciously
+fence a VM from outside the cluster, though it may be possible to
+cause a DoS of fence_virtd if enough multicast packets are sent.
+
+* The only currently supported authentication mechanism is a simple
+challenge-response scheme based on a shared private key and
+pseudorandom number generation.
+
+* An attacker with access to the shared key(s) can easily fence any
+known VM, even if they are not on a cluster node.
+
+* Different shared keys should be used for different virtual
+clusters on the same subnet (whether in the same physical cluster
+or not).  Additionally, multiple fence_virtd instances must be run
+(each listening on a different multicast IP + port combination).
+
+IV.  Configuration
+
+Generate a random key file.  An example of how to generate it is:
+
+    dd if=/dev/urandom of=/etc/cluster/fence_xvm.key bs=4096 count=1
+
+Distribute the generated key file to all domUs in a cluster as well
+as all dom0s which will be hosting that particular cluster of domUs.
+The key should not be placed on shared file systems (because shared
+file systems require the cluster, which requires fencing...).
+
+Start fence_virtd on all hosts.
+
+Configure fence_xvm on the domU cluster...
+
+rest...tbd
+
diff --git a/agents/virt/docs/TODO b/agents/virt/docs/TODO
new file mode 100644
index 0000000..17456cf
--- /dev/null
+++ b/agents/virt/docs/TODO
@@ -0,0 +1,7 @@
+High Priority / Blockers for v1.0:
+
+* endian-clean / 64-bit clean data structure analysis
+
+Future Stuff:
+
+* clean up development bits so third parties can develop plugins
diff --git a/agents/virt/docs/architecture.txt b/agents/virt/docs/architecture.txt
new file mode 100644
index 0000000..54fda11
--- /dev/null
+++ b/agents/virt/docs/architecture.txt
@@ -0,0 +1,16 @@
+The actual architecture of fence_virtd is very simple.  We have a set
+of listener plugins which listen for fencing requests for virtual
+machines.
+
+These plugins are assigned callbacks which are entry functions into
+the backend plugins.  The backend plugins perform the actual fencing
+request.
+
+In the middle, we have only enough code to provide basic integration
+functions between the listener and backend plugins.  This includes a
+very simple configuration plugin which we pass to each of the plugins.
+
+Because we are passing function pointers into the plugins themselves
+for configuration (rather than having the plugins call an API directly,
+for example), we are able to swap out the configuration subsystem for
+other, more full-featured configuration systems, such as libccs.
diff --git a/agents/virt/docs/fence_virt.txt b/agents/virt/docs/fence_virt.txt
new file mode 100644
index 0000000..e554ce4
--- /dev/null
+++ b/agents/virt/docs/fence_virt.txt
@@ -0,0 +1,127 @@
+We need a fencing agent which can work in a variety of guest cluster
+configurations and host configurations.
+
+Requirements
+
+1. Nonrequirement of guest to host networking.  Virtual machines
+   may be configured to run using a network unknown to the host
+   operating system.  Therefore, the ability to run without network
+   communication between the guest and the host is required.
+
+2. Ease of configuration.  The absolute minimum possible configuration
+   must be available.
+
+3. Nonrequirement of host clustering software.  Multiple layers of
+   configuration sucks.  While I fundamentally disagree with the general
+   idea that running CMAN on the host constitutes a "heavyweight
+   cluster", perception is important.
+
+4. Ability to support RHEV-M, oVirt server, and other virtual machine
+   management technologies.  This is beneficial from a security standpoint
+   since it is assumed the management server will be aware of what VMs
+   are allowed to fence what other VMs.
+
+5. Upgrade compatibility with fence_xvm from a configuration standpoint.
+   This may be provided by a symlink over fence_xvm.  If this feature
+   cannot be provided as a matter of design, a method to convert an
+   existing fence_xvm/fence_xvmd configuration to fence_virt must be
+   present.
+
+
+Guest to Host Interaction
+-------------------------
+
+The proposal is to use various communications media plugins in order
+to facilitate flexibility with respect to how virtual machine
+environments are configured.
+
+There are at least 3 simple plugins for guest/client to host/server
+communications:
+
+ * Direct serial.  The guest sends fencing requests out via /dev/ttySX
+   in the guest.  The host is listening on a Unix domain socket[1],
+   and forwards fencing requests accordingly.
+
+   This satisfies most of the requirements, but adds a conundrum
+   when configuring guest clusters, as /dev/ttySX may be /dev/ttySY
+   on another guest.  So, either we must account for this per-guest
+   configuration discrepancy or we must make it an administrative
+   requirement to provide the same serial device on each host.
+
+ * Multicast.  This violates the networking requirement, but this is
+   okay since this method of operation is optional.  This operational
+   mode provides for one of the simpler configurations: all that is
+   needed is the guest's name or UUID.  The guest to host
+   communication operates in the same manner as fence_xvm/fence_xvmd,
+   except that there is an implied requirement on restricting the
+   multicast packets accepted to be from the local guests.
+
+ * VM Channel over Serial.  This works like direct serial, but
+   instead of owning the whole device, the device may be shared between
+   multiple applications.  The server subscribes to a channel and
+   listens for fencing requests on the channel; the client in the
+   guest OS connects to the channel and issues fencing requests across
+   it.  One interesting thing is that it may be possible to provide
+   unprivileged users the ability to fence using this method (I
+   do not claim to know if this is useful or not).
+
+
+Host to Hypervisor interaction
+------------------------------
+
+Similar to the way we have plugins for guest to host interaction,
+we also have backend plugins which carry out the fencing itself.  These
+plugins are responsible for all of the work performed, including
+tracking VMs if required, forwarding requests to the appropriate hosts
+or management services, and handling the responses.
+
+We propose 5 plugins in this case:
+
+ * Libvirt (local-only).  There is no intracommunication and no
+   migration support is provided.
+
+ * Cluster CPG (+ libvirt).  This is the way fence_xvmd
+   operates today.  This setup has the most requirements on the
+   infrastructure, as it requires guest to host networking _and_
+   host-to-host clustering in order to keep track of virtual
+   machines.  The benefit is that it is self-contained and requires
+   no external management nodes.  VM states are stored so that other
+   CPG group members know the locations of other VMs and can make
+   some decisions about whether a VM is dead based on whether a host
+   is dead (i.e. if fencing is in use or can be performed on the
+   host).
+
+ * Libvirt-QMF ... ???  Subscription to the appropriate cluster
+   specific AMQP channel is required on the host side, but this
+   handles routing the message very easily.  The fencing request
+   is forwarded to the other listeners on the channel, the VM owner
+   takes the action requested and returns a value.  When new VMs
+   are created, the event is broadcast out via the AMQP channel so
+   other hosts know the locations of other VMs and can make some
+   decisions about whether a VM is dead based on whether a host
+   is dead (i.e. if fencing is in use or can be performed on the
+   host).
+
+ * oVirt Manager.  The request is forwarded to the oVirt Manager
+   and the oVirt manager is responsible for taking the appropriate
+   action and responding to the request.
+
+ * RHEV-M.  The request is forwarded to the RHEV-M node, which is
+   responsible for taking the appropriate action and responding to
+   the request.
+
+
+These plugins have no requirements on which guest to host communication
+plugin is used (you could, if you wanted, use 'direct serial' with
+'cluster cpg', or 'multicast' with 'RHEV-M' for example).
+
+These plugins must also be able to detect, where appropriate, whether
+they can be used.  For example, the cpg plugin can only be used if
+corosync/openais is running.  A plugin preference order should be
+defined and documented so that the host daemon behaves predictably in
+the absence of host-side configuration data (about which plugin to use).
+
+
+[1] TCP was also explored; however, security is much better
+    using a Unix domain socket, despite the additional complexity
+    of listening for VM creation events.