diff --git a/ctdb/doc/ctdb.7.xml b/ctdb/doc/ctdb.7.xml
new file mode 100644
index 0000000..0f3fbc6
--- /dev/null
+++ b/ctdb/doc/ctdb.7.xml
@@ -0,0 +1,1182 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!DOCTYPE refentry
+ PUBLIC "-//OASIS//DTD DocBook XML V4.5//EN"
+ "http://www.oasis-open.org/docbook/xml/4.5/docbookx.dtd">
+<refentry id="ctdb.7">
+
+<refmeta>
+ <refentrytitle>ctdb</refentrytitle>
+ <manvolnum>7</manvolnum>
+ <refmiscinfo class="source">ctdb</refmiscinfo>
+ <refmiscinfo class="manual">CTDB - clustered TDB database</refmiscinfo>
+</refmeta>
+
+
+<refnamediv>
+ <refname>ctdb</refname>
+ <refpurpose>Clustered TDB</refpurpose>
+</refnamediv>
+
+<refsect1>
+ <title>DESCRIPTION</title>
+
+ <para>
+ CTDB is a clustered database component in clustered Samba that
+ provides a high-availability load-sharing CIFS server cluster.
+ </para>
+
+ <para>
+ The main functions of CTDB are:
+ </para>
+
+ <itemizedlist>
+ <listitem>
+ <para>
+ Provide a clustered version of the TDB database with automatic
+ rebuild/recovery of the databases upon node failures.
+ </para>
+ </listitem>
+
+ <listitem>
+ <para>
+ Monitor nodes in the cluster and services running on each node.
+ </para>
+ </listitem>
+
+ <listitem>
+ <para>
+ Manage a pool of public IP addresses that are used to provide
+ services to clients. Alternatively, CTDB can be used with
+ LVS.
+ </para>
+ </listitem>
+ </itemizedlist>
+
+ <para>
+    Combined with a cluster filesystem, CTDB provides a full
+    high-availability (HA) environment for services such as
+    clustered Samba and NFS.
+ </para>
+
+ <para>
+ In addition to the CTDB manual pages there is much more
+ information available at
+ <ulink url="https://wiki.samba.org/index.php/CTDB_and_Clustered_Samba"/>.
+ </para>
+</refsect1>
+
+<refsect1>
+ <title>ANATOMY OF A CTDB CLUSTER</title>
+
+ <para>
+ A CTDB cluster is a collection of nodes with 2 or more network
+ interfaces. All nodes provide network (usually file/NAS) services
+ to clients. Data served by file services is stored on shared
+ storage (usually a cluster filesystem) that is accessible by all
+ nodes.
+ </para>
+ <para>
+ CTDB provides an "all active" cluster, where services are load
+ balanced across all nodes.
+ </para>
+</refsect1>
+
+ <refsect1>
+ <title>Cluster leader</title>
+
+ <para>
+ CTDB uses a <emphasis>cluster leader and follower</emphasis>
+ model of cluster management. All nodes in a cluster elect one
+ node to be the leader. The leader node coordinates privileged
+ operations such as database recovery and IP address failover.
+ </para>
+
+ <para>
+ CTDB previously referred to the leader as the <emphasis>recovery
+ master</emphasis> or <emphasis>recmaster</emphasis>. References
+ to these terms may still be found in documentation and code.
+ </para>
+ </refsect1>
+
+ <refsect1>
+ <title>Cluster Lock</title>
+
+ <para>
+ CTDB uses a cluster lock to assert its privileged role in the
+      cluster.  A node takes the cluster lock when it becomes
+ leader and holds the lock until it is no longer leader. The
+ <emphasis>cluster lock</emphasis> helps CTDB to avoid a
+ <emphasis>split brain</emphasis>, where a cluster becomes
+ partitioned and each partition attempts to operate
+ independently. Issues that can result from a split brain
+ include file data corruption, because file locking metadata may
+ not be tracked correctly.
+ </para>
+
+ <para>
+ CTDB previously referred to the cluster lock as the
+ <emphasis>recovery lock</emphasis>. The abbreviation
+ <emphasis>reclock</emphasis> is still used - just "clock" would
+ be confusing.
+ </para>
+
+ <para>
+      <emphasis>CTDB is unable to configure a default cluster
+ lock</emphasis>, because this would depend on factors such as
+ cluster filesystem mountpoints. However, <emphasis>running CTDB
+ without a cluster lock is not recommended</emphasis> as there
+ will be no split brain protection.
+ </para>
+
+ <para>
+ When a cluster lock is configured it is used as the election
+ mechanism. Nodes race to take the cluster lock and the winner
+ is the cluster leader. This avoids problems when a node wins an
+ election but is unable to take the lock - this can occur if a
+ cluster becomes partitioned (for example, due to a communication
+ failure) and a different leader is elected by the nodes in each
+ partition, or if the cluster filesystem has a high failover
+ latency.
+ </para>
+
+ <para>
+ By default, the cluster lock is implemented using a file
+ (specified by <parameter>cluster lock</parameter> in the
+ <literal>[cluster]</literal> section of
+ <citerefentry><refentrytitle>ctdb.conf</refentrytitle>
+ <manvolnum>5</manvolnum></citerefentry>) residing in shared
+ storage (usually) on a cluster filesystem. To support a
+ cluster lock the cluster filesystem must support lock
+ coherence. See
+ <citerefentry><refentrytitle>ping_pong</refentrytitle>
+ <manvolnum>1</manvolnum></citerefentry> for more details.
+ </para>
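+
+    <para>
+      For example, a file-based cluster lock on a cluster filesystem
+      mounted at a hypothetical <filename>/clusterfs</filename> might
+      be configured in
+      <citerefentry><refentrytitle>ctdb.conf</refentrytitle>
+      <manvolnum>5</manvolnum></citerefentry> as follows:
+    </para>
+    <screen format="linespecific">
+[cluster]
+  cluster lock = /clusterfs/.ctdb/cluster.lock
+    </screen>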
+
+ <para>
+ The cluster lock can also be implemented using an arbitrary
+ cluster mutex helper (or call-out). This is indicated by using
+ an exclamation point ('!') as the first character of the
+ <parameter>cluster lock</parameter> parameter. For example, a
+ value of <command>!/usr/local/bin/myhelper cluster</command>
+ would run the given helper with the specified arguments. The
+ helper will continue to run as long as it holds its mutex. See
+ <filename>ctdb/doc/cluster_mutex_helper.txt</filename> in the
+ source tree, and related code, for clues about writing helpers.
+ </para>
+
+ <para>
+ When a file is specified for the <parameter>cluster
+ lock</parameter> parameter (i.e. no leading '!') the file lock
+ is implemented by a default helper
+ (<command>/usr/local/libexec/ctdb/ctdb_mutex_fcntl_helper</command>).
+ This helper has arguments as follows:
+
+ <!-- cmdsynopsis would not require long line but does not work :-( -->
+ <synopsis>
+<command>ctdb_mutex_fcntl_helper</command> <parameter>FILE</parameter> <optional><parameter>RECHECK-INTERVAL</parameter></optional>
+ </synopsis>
+
+ <command>ctdb_mutex_fcntl_helper</command> will take a lock on
+ FILE and then check every RECHECK-INTERVAL seconds to ensure
+ that FILE still exists and that its inode number is unchanged
+ from when the lock was taken. The default value for
+ RECHECK-INTERVAL is 5.
+ </para>
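+
+    <para>
+      For example, to use this helper explicitly with a 10 second
+      recheck interval and a hypothetical lock file on a cluster
+      filesystem, the <parameter>cluster lock</parameter> parameter
+      could be set to:
+    </para>
+    <screen format="linespecific">
+[cluster]
+  cluster lock = !/usr/local/libexec/ctdb/ctdb_mutex_fcntl_helper /clusterfs/.ctdb/cluster.lock 10
+    </screen>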
+
+ <para>
+ CTDB does sanity checks to ensure that the cluster lock is held
+ as expected.
+ </para>
+ </refsect1>
+
+ <refsect1>
+ <title>Private vs Public addresses</title>
+
+ <para>
+ Each node in a CTDB cluster has multiple IP addresses assigned
+ to it:
+
+ <itemizedlist>
+ <listitem>
+ <para>
+ A single private IP address that is used for communication
+ between nodes.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ One or more public IP addresses that are used to provide
+ NAS or other services.
+ </para>
+ </listitem>
+ </itemizedlist>
+ </para>
+
+ <refsect2>
+ <title>Private address</title>
+
+ <para>
+ Each node is configured with a unique, permanently assigned
+ private address. This address is configured by the operating
+ system. This address uniquely identifies a physical node in
+ the cluster and is the address that CTDB daemons will use to
+ communicate with the CTDB daemons on other nodes.
+ </para>
+
+ <para>
+ Private addresses are listed in the file
+      <filename>/usr/local/etc/ctdb/nodes</filename>.  This file
+ contains the list of private addresses for all nodes in the
+ cluster, one per line. This file must be the same on all nodes
+ in the cluster.
+ </para>
+
+ <para>
+ Some users like to put this configuration file in their
+ cluster filesystem. A symbolic link should be used in this
+ case.
+ </para>
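+
+      <para>
+	For example, assuming a hypothetical cluster filesystem
+	mounted at <filename>/clusterfs</filename>, the shared file
+	can be linked into place on each node with something like:
+      </para>
+      <screen format="linespecific">
+ln -s /clusterfs/ctdb/nodes /usr/local/etc/ctdb/nodes
+      </screen>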
+
+ <para>
+ Private addresses should not be used by clients to connect to
+ services provided by the cluster.
+ </para>
+ <para>
+ It is strongly recommended that the private addresses are
+ configured on a private network that is separate from client
+ networks. This is because the CTDB protocol is both
+ unauthenticated and unencrypted. If clients share the private
+ network then steps need to be taken to stop injection of
+ packets to relevant ports on the private addresses. It is
+ also likely that CTDB protocol traffic between nodes could
+ leak sensitive information if it can be intercepted.
+ </para>
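+
+      <para>
+	As an illustration only: CTDB listens on TCP port 4379 by
+	default, so if the private network is 192.168.1.0/24 then
+	packet injection from other networks could be limited with
+	firewall rules similar to the following (adapt these to the
+	local firewall setup):
+      </para>
+      <screen format="linespecific">
+iptables -A INPUT -p tcp --dport 4379 -s 192.168.1.0/24 -j ACCEPT
+iptables -A INPUT -p tcp --dport 4379 -j DROP
+      </screen>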
+
+ <para>
+ Example <filename>/usr/local/etc/ctdb/nodes</filename> for a four node
+ cluster:
+ </para>
+ <screen format="linespecific">
+192.168.1.1
+192.168.1.2
+192.168.1.3
+192.168.1.4
+ </screen>
+ </refsect2>
+
+ <refsect2>
+ <title>Public addresses</title>
+
+ <para>
+ Public addresses are used to provide services to clients.
+ Public addresses are not configured at the operating system
+ level and are not permanently associated with a particular
+ node. Instead, they are managed by CTDB and are assigned to
+ interfaces on physical nodes at runtime.
+ </para>
+ <para>
+ The CTDB cluster will assign/reassign these public addresses
+ across the available healthy nodes in the cluster. When one
+ node fails, its public addresses will be taken over by one or
+ more other nodes in the cluster. This ensures that services
+ provided by all public addresses are always available to
+        clients, as long as there are available nodes capable of
+        hosting these addresses.
+ </para>
+
+ <para>
+ The public address configuration is stored in
+ <filename>/usr/local/etc/ctdb/public_addresses</filename> on
+ each node. This file contains a list of the public addresses
+ that the node is capable of hosting, one per line. Each entry
+ also contains the netmask and the interface to which the
+ address should be assigned. If this file is missing then no
+ public addresses are configured.
+ </para>
+
+ <para>
+ Some users who have the same public addresses on all nodes
+ like to put this configuration file in their cluster
+ filesystem. A symbolic link should be used in this case.
+ </para>
+
+ <para>
+ Example <filename>/usr/local/etc/ctdb/public_addresses</filename> for a
+ node that can host 4 public addresses, on 2 different
+ interfaces:
+ </para>
+ <screen format="linespecific">
+10.1.1.1/24 eth1
+10.1.1.2/24 eth1
+10.1.2.1/24 eth2
+10.1.2.2/24 eth2
+ </screen>
+
+ <para>
+ In many cases the public addresses file will be the same on
+ all nodes. However, it is possible to use different public
+ address configurations on different nodes.
+ </para>
+
+ <para>
+ Example: 4 nodes partitioned into two subgroups:
+ </para>
+ <screen format="linespecific">
+Node 0:/usr/local/etc/ctdb/public_addresses
+ 10.1.1.1/24 eth1
+ 10.1.1.2/24 eth1
+
+Node 1:/usr/local/etc/ctdb/public_addresses
+ 10.1.1.1/24 eth1
+ 10.1.1.2/24 eth1
+
+Node 2:/usr/local/etc/ctdb/public_addresses
+ 10.1.2.1/24 eth2
+ 10.1.2.2/24 eth2
+
+Node 3:/usr/local/etc/ctdb/public_addresses
+ 10.1.2.1/24 eth2
+ 10.1.2.2/24 eth2
+ </screen>
+ <para>
+ In this example nodes 0 and 1 host two public addresses on the
+ 10.1.1.x network while nodes 2 and 3 host two public addresses
+ for the 10.1.2.x network.
+ </para>
+ <para>
+ Public address 10.1.1.1 can be hosted by either of nodes 0 or
+ 1 and will be available to clients as long as at least one of
+        these two nodes is available.
+ </para>
+ <para>
+ If both nodes 0 and 1 become unavailable then public address
+ 10.1.1.1 also becomes unavailable. 10.1.1.1 can not be failed
+ over to nodes 2 or 3 since these nodes do not have this public
+ address configured.
+ </para>
+ <para>
+ The <command>ctdb ip</command> command can be used to view the
+ current assignment of public addresses to physical nodes.
+ </para>
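+
+      <para>
+	For the example above, running <command>ctdb ip</command> on
+	node 0 might produce output similar to the following (the
+	exact format can vary between CTDB versions); the second
+	column is the node currently hosting each address:
+      </para>
+      <screen format="linespecific">
+Public IPs on node 0
+10.1.1.1 0
+10.1.1.2 1
+10.1.2.1 2
+10.1.2.2 3
+      </screen>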
+ </refsect2>
+ </refsect1>
+
+
+ <refsect1>
+ <title>Node status</title>
+
+ <para>
+ The current status of each node in the cluster can be viewed by the
+ <command>ctdb status</command> command.
+ </para>
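+
+    <para>
+      For example, the node summary section of the output might look
+      similar to the following (the exact fields vary between CTDB
+      versions):
+    </para>
+    <screen format="linespecific">
+Number of nodes:4
+pnn:0 192.168.1.1      OK (THIS NODE)
+pnn:1 192.168.1.2      OK
+pnn:2 192.168.1.3      UNHEALTHY
+pnn:3 192.168.1.4      OK
+    </screen>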
+
+ <para>
+ A node can be in one of the following states:
+ </para>
+
+ <variablelist>
+ <varlistentry>
+ <term>OK</term>
+ <listitem>
+ <para>
+ This node is healthy and fully functional. It hosts public
+ addresses to provide services.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term>DISCONNECTED</term>
+ <listitem>
+ <para>
+ This node is not reachable by other nodes via the private
+ network. It is not currently participating in the cluster.
+ It <emphasis>does not</emphasis> host public addresses to
+ provide services. It might be shut down.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term>DISABLED</term>
+ <listitem>
+ <para>
+ This node has been administratively disabled. This node is
+ partially functional and participates in the cluster.
+ However, it <emphasis>does not</emphasis> host public
+ addresses to provide services.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term>UNHEALTHY</term>
+ <listitem>
+ <para>
+ A service provided by this node has failed a health check
+ and should be investigated. This node is partially
+ functional and participates in the cluster. However, it
+ <emphasis>does not</emphasis> host public addresses to
+ provide services. Unhealthy nodes should be investigated
+ and may require an administrative action to rectify.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term>BANNED</term>
+ <listitem>
+ <para>
+ CTDB is not behaving as designed on this node. For example,
+ it may have failed too many recovery attempts. Such nodes
+ are banned from participating in the cluster for a
+ configurable time period before they attempt to rejoin the
+ cluster. A banned node <emphasis>does not</emphasis> host
+ public addresses to provide services. All banned nodes
+ should be investigated and may require an administrative
+ action to rectify.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term>STOPPED</term>
+ <listitem>
+ <para>
+	    This node has been administratively excluded from the
+	    cluster.  A stopped node does not participate in the cluster
+ and <emphasis>does not</emphasis> host public addresses to
+ provide services. This state can be used while performing
+ maintenance on a node.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term>PARTIALLYONLINE</term>
+ <listitem>
+ <para>
+ A node that is partially online participates in a cluster
+ like a healthy (OK) node. Some interfaces to serve public
+ addresses are down, but at least one interface is up. See
+ also <command>ctdb ifaces</command>.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ </variablelist>
+ </refsect1>
+
+ <refsect1>
+ <title>CAPABILITIES</title>
+
+ <para>
+ Cluster nodes can have several different capabilities enabled.
+ These are listed below.
+ </para>
+
+ <variablelist>
+
+ <varlistentry>
+ <term>LEADER</term>
+ <listitem>
+ <para>
+ Indicates that a node can become the CTDB cluster leader.
+ The current leader is decided via an
+ election held by all active nodes with this capability.
+ </para>
+ <para>
+ Default is YES.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term>LMASTER</term>
+ <listitem>
+ <para>
+ Indicates that a node can be the location master (LMASTER)
+ for database records. The LMASTER always knows which node
+ has the latest copy of a record in a volatile database.
+ </para>
+ <para>
+ Default is YES.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ </variablelist>
+
+ <para>
+ The LEADER and LMASTER capabilities can be disabled when CTDB
+      is used to create a cluster spanning WAN links.  In this
+ case CTDB acts as a WAN accelerator.
+ </para>
+
+ </refsect1>
+
+ <refsect1>
+ <title>LVS</title>
+
+ <para>
+ LVS is a mode where CTDB presents one single IP address for the
+ entire cluster. This is an alternative to using public IP
+      addresses and round-robin DNS to load-balance clients across the
+ cluster.
+ </para>
+
+ <para>
+      This is similar to using a layer-4 load-balancing switch but with
+ some restrictions.
+ </para>
+
+ <para>
+ One extra LVS public address is assigned on the public network
+ to each LVS group. Each LVS group is a set of nodes in the
+      cluster that presents the same LVS public address to the
+ outside world. Normally there would only be one LVS group
+ spanning an entire cluster, but in situations where one CTDB
+ cluster spans multiple physical sites it might be useful to have
+ one LVS group for each site. There can be multiple LVS groups
+      in a cluster but each node can only be a member of one LVS group.
+ </para>
+
+ <para>
+ Client access to the cluster is load-balanced across the HEALTHY
+      nodes in an LVS group.  If no HEALTHY nodes exist then all
+      nodes in the group are used, regardless of health status.  CTDB
+      will, however, never load-balance LVS traffic to nodes that are
+ BANNED, STOPPED, DISABLED or DISCONNECTED. The <command>ctdb
+ lvs</command> command is used to show which nodes are currently
+ load-balanced across.
+ </para>
+
+ <para>
+ In each LVS group, one of the nodes is selected by CTDB to be
+ the LVS leader. This node receives all traffic from clients
+ coming in to the LVS public address and multiplexes it across
+ the internal network to one of the nodes that LVS is using.
+ When responding to the client, that node will send the data back
+ directly to the client, bypassing the LVS leader node. The
+ command <command>ctdb lvs leader</command> will show which node
+ is the current LVS leader.
+ </para>
+
+ <para>
+ The path used for a client I/O is:
+ <orderedlist>
+ <listitem>
+ <para>
+ Client sends request packet to LVS leader.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ LVS leader passes the request on to one node across the
+ internal network.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ Selected node processes the request.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ Node responds back to client.
+ </para>
+ </listitem>
+ </orderedlist>
+ </para>
+
+ <para>
+ This means that all incoming traffic to the cluster will pass
+      through one physical node, which limits scalability.  You
+      cannot send more data to the LVS address than one physical
+      node can multiplex.  This means that you should not use LVS if your I/O
+ pattern is write-intensive since you will be limited in the
+ available network bandwidth that node can handle. LVS does work
+ very well for read-intensive workloads where only smallish READ
+ requests are going through the LVS leader bottleneck and the
+ majority of the traffic volume (the data in the read replies)
+ goes straight from the processing node back to the clients. For
+      read-intensive I/O patterns you can achieve very high throughput
+ rates in this mode.
+ </para>
+
+ <para>
+ Note: you can use LVS and public addresses at the same time.
+ </para>
+
+ <para>
+ If you use LVS, you must have a permanent address configured for
+ the public interface on each node. This address must be routable
+ and the cluster nodes must be configured so that all traffic
+      back to client hosts is routed through this interface.  This is
+ also required in order to allow samba/winbind on the node to
+ talk to the domain controller. This LVS IP address can not be
+ used to initiate outgoing traffic.
+ </para>
+ <para>
+ Make sure that the domain controller and the clients are
+ reachable from a node <emphasis>before</emphasis> you enable
+ LVS. Also ensure that outgoing traffic to these hosts is routed
+ out through the configured public interface.
+ </para>
+
+ <refsect2>
+ <title>Configuration</title>
+
+ <para>
+ To activate LVS on a CTDB node you must specify the
+ <varname>CTDB_LVS_PUBLIC_IFACE</varname>,
+ <varname>CTDB_LVS_PUBLIC_IP</varname> and
+ <varname>CTDB_LVS_NODES</varname> configuration variables.
+ <varname>CTDB_LVS_NODES</varname> specifies a file containing
+ the private address of all nodes in the current node's LVS
+ group.
+ </para>
+
+ <para>
+ Example:
+ <screen format="linespecific">
+CTDB_LVS_PUBLIC_IFACE=eth1
+CTDB_LVS_PUBLIC_IP=10.1.1.237
+CTDB_LVS_NODES=/usr/local/etc/ctdb/lvs_nodes
+ </screen>
+ </para>
+
+ <para>
+ Example <filename>/usr/local/etc/ctdb/lvs_nodes</filename>:
+ </para>
+ <screen format="linespecific">
+192.168.1.2
+192.168.1.3
+192.168.1.4
+ </screen>
+
+ <para>
+ Normally any node in an LVS group can act as the LVS leader.
+      Nodes that are highly loaded due to other demands may be
+ flagged with the "follower-only" option in the
+ <varname>CTDB_LVS_NODES</varname> file to limit the LVS
+ functionality of those nodes.
+ </para>
+
+ <para>
+      Example LVS nodes file that excludes 192.168.1.4 from being
+ the LVS leader node:
+ </para>
+ <screen format="linespecific">
+192.168.1.2
+192.168.1.3
+192.168.1.4 follower-only
+ </screen>
+
+ </refsect2>
+ </refsect1>
+
+ <refsect1>
+ <title>TRACKING AND RESETTING TCP CONNECTIONS</title>
+
+ <para>
+ CTDB tracks TCP connections from clients to public IP addresses,
+ on known ports. When an IP address moves from one node to
+ another, all existing TCP connections to that IP address are
+ reset. The node taking over this IP address will also send
+ gratuitous ARPs (for IPv4, or neighbour advertisement, for
+ IPv6). This allows clients to reconnect quickly, rather than
+ waiting for TCP timeouts, which can be very long.
+ </para>
+
+ <para>
+ It is important that established TCP connections do not survive
+ a release and take of a public IP address on the same node.
+ Such connections can get out of sync with sequence and ACK
+ numbers, potentially causing a disruptive ACK storm.
+ </para>
+
+ </refsect1>
+
+ <refsect1>
+ <title>NAT GATEWAY</title>
+
+ <para>
+ NAT gateway (NATGW) is an optional feature that is used to
+ configure fallback routing for nodes. This allows cluster nodes
+ to connect to external services (e.g. DNS, AD, NIS and LDAP)
+ when they do not host any public addresses (e.g. when they are
+ unhealthy).
+ </para>
+ <para>
+ This also applies to node startup because CTDB marks nodes as
+ UNHEALTHY until they have passed a "monitor" event. In this
+ context, NAT gateway helps to avoid a "chicken and egg"
+ situation where a node needs to access an external service to
+ become healthy.
+ </para>
+ <para>
+ Another way of solving this type of problem is to assign an
+ extra static IP address to a public interface on every node.
+ This is simpler but it uses an extra IP address per node, while
+ NAT gateway generally uses only one extra IP address.
+ </para>
+
+ <refsect2>
+ <title>Operation</title>
+
+ <para>
+ One extra NATGW public address is assigned on the public
+ network to each NATGW group. Each NATGW group is a set of
+ nodes in the cluster that shares the same NATGW address to
+ talk to the outside world. Normally there would only be one
+ NATGW group spanning an entire cluster, but in situations
+ where one CTDB cluster spans multiple physical sites it might
+ be useful to have one NATGW group for each site.
+ </para>
+ <para>
+ There can be multiple NATGW groups in a cluster but each node
+      can only be a member of one NATGW group.
+ </para>
+ <para>
+ In each NATGW group, one of the nodes is selected by CTDB to
+      be the NATGW leader and the other nodes are considered to be
+ NATGW followers. NATGW followers establish a fallback default route
+ to the NATGW leader via the private network. When a NATGW
+ follower hosts no public IP addresses then it will use this route
+ for outbound connections. The NATGW leader hosts the NATGW
+ public IP address and routes outgoing connections from
+ follower nodes via this IP address. It also establishes a
+ fallback default route.
+ </para>
+ </refsect2>
+
+ <refsect2>
+ <title>Configuration</title>
+
+ <para>
+      NATGW is usually configured similarly to the following example configuration:
+ </para>
+ <screen format="linespecific">
+CTDB_NATGW_NODES=/usr/local/etc/ctdb/natgw_nodes
+CTDB_NATGW_PRIVATE_NETWORK=192.168.1.0/24
+CTDB_NATGW_PUBLIC_IP=10.0.0.227/24
+CTDB_NATGW_PUBLIC_IFACE=eth0
+CTDB_NATGW_DEFAULT_GATEWAY=10.0.0.1
+ </screen>
+
+ <para>
+ Normally any node in a NATGW group can act as the NATGW
+ leader. Some configurations may have special nodes that lack
+ connectivity to a public network. In such cases, those nodes
+ can be flagged with the "follower-only" option in the
+ <varname>CTDB_NATGW_NODES</varname> file to limit the NATGW
+ functionality of those nodes.
+ </para>
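+
+    <para>
+      Example <filename>/usr/local/etc/ctdb/natgw_nodes</filename>
+      file (addresses are illustrative) where 192.168.1.4 will never
+      become the NATGW leader:
+    </para>
+    <screen format="linespecific">
+192.168.1.1
+192.168.1.2
+192.168.1.3
+192.168.1.4 follower-only
+    </screen>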
+
+ <para>
+ See the <citetitle>NAT GATEWAY</citetitle> section in
+ <citerefentry><refentrytitle>ctdb-script.options</refentrytitle>
+ <manvolnum>5</manvolnum></citerefentry> for more details of
+ NATGW configuration.
+ </para>
+ </refsect2>
+
+
+ <refsect2>
+ <title>Implementation details</title>
+
+ <para>
+ When the NATGW functionality is used, one of the nodes is
+ selected to act as a NAT gateway for all the other nodes in
+ the group when they need to communicate with the external
+ services. The NATGW leader is selected to be a node that is
+ most likely to have usable networks.
+ </para>
+
+ <para>
+ The NATGW leader hosts the NATGW public IP address
+ <varname>CTDB_NATGW_PUBLIC_IP</varname> on the configured public
+      interface <varname>CTDB_NATGW_PUBLIC_IFACE</varname> and acts as
+ a router, masquerading outgoing connections from follower nodes
+ via this IP address. If
+ <varname>CTDB_NATGW_DEFAULT_GATEWAY</varname> is set then it
+ also establishes a fallback default route to the configured
+      gateway with a metric of 10.  A metric 10 route is used
+ so it can co-exist with other default routes that may be
+ available.
+ </para>
+
+ <para>
+ A NATGW follower establishes its fallback default route to the
+ NATGW leader via the private network
+      <varname>CTDB_NATGW_PRIVATE_NETWORK</varname> with a metric of 10.
+ This route is used for outbound connections when no other
+ default route is available because the node hosts no public
+      addresses.  A metric 10 route is used so that it can co-exist
+ with other default routes that may be available when the node
+ is hosting public addresses.
+ </para>
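+
+    <para>
+      With the example configuration above, the resulting routing is
+      roughly equivalent to commands such as the following.  This is
+      an illustration only, not the exact commands run by the
+      <filename>11.natgw</filename> eventscript:
+    </para>
+    <screen format="linespecific">
+# On the NATGW leader
+ip addr add 10.0.0.227/24 dev eth0
+ip route add default via 10.0.0.1 metric 10
+echo 1 > /proc/sys/net/ipv4/ip_forward
+iptables -t nat -A POSTROUTING -s 192.168.1.0/24 -o eth0 -j MASQUERADE
+
+# On a NATGW follower, assuming 192.168.1.1 is the leader's private address
+ip route add default via 192.168.1.1 metric 10
+    </screen>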
+
+ <para>
+ <varname>CTDB_NATGW_STATIC_ROUTES</varname> can be used to
+ have NATGW create more specific routes instead of just default
+ routes.
+ </para>
+
+ <para>
+ This is implemented in the <filename>11.natgw</filename>
+ eventscript. Please see the eventscript file and the
+ <citetitle>NAT GATEWAY</citetitle> section in
+ <citerefentry><refentrytitle>ctdb-script.options</refentrytitle>
+ <manvolnum>5</manvolnum></citerefentry> for more details.
+ </para>
+
+ </refsect2>
+ </refsect1>
+
+ <refsect1>
+ <title>POLICY ROUTING</title>
+
+ <para>
+ Policy routing is an optional CTDB feature to support complex
+ network topologies. Public addresses may be spread across
+ several different networks (or VLANs) and it may not be possible
+ to route packets from these public addresses via the system's
+ default route. Therefore, CTDB has support for policy routing
+ via the <filename>13.per_ip_routing</filename> eventscript.
+ This allows routing to be specified for packets sourced from
+ each public address. The routes are added and removed as CTDB
+ moves public addresses between nodes.
+ </para>
+
+ <refsect2>
+ <title>Configuration variables</title>
+
+ <para>
+ There are 4 configuration variables related to policy routing:
+ <varname>CTDB_PER_IP_ROUTING_CONF</varname>,
+ <varname>CTDB_PER_IP_ROUTING_RULE_PREF</varname>,
+ <varname>CTDB_PER_IP_ROUTING_TABLE_ID_LOW</varname>,
+ <varname>CTDB_PER_IP_ROUTING_TABLE_ID_HIGH</varname>. See the
+ <citetitle>POLICY ROUTING</citetitle> section in
+ <citerefentry><refentrytitle>ctdb-script.options</refentrytitle>
+ <manvolnum>5</manvolnum></citerefentry> for more details.
+ </para>
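+
+    <para>
+      For example, these variables might be set as follows (the
+      routing table ID range values are illustrative):
+    </para>
+    <screen format="linespecific">
+CTDB_PER_IP_ROUTING_CONF=/usr/local/etc/ctdb/policy_routing
+CTDB_PER_IP_ROUTING_RULE_PREF=100
+CTDB_PER_IP_ROUTING_TABLE_ID_LOW=1000
+CTDB_PER_IP_ROUTING_TABLE_ID_HIGH=9000
+    </screen>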
+ </refsect2>
+
+ <refsect2>
+ <title>Configuration</title>
+
+ <para>
+ The format of each line of
+ <varname>CTDB_PER_IP_ROUTING_CONF</varname> is:
+ </para>
+
+ <screen>
+&lt;public_address&gt; &lt;network&gt; [ &lt;gateway&gt; ]
+ </screen>
+
+ <para>
+ Leading whitespace is ignored and arbitrary whitespace may be
+ used as a separator. Lines that have a "public address" item
+ that doesn't match an actual public address are ignored. This
+ means that comment lines can be added using a leading
+ character such as '#', since this will never match an IP
+ address.
+ </para>
+
+ <para>
+ A line without a gateway indicates a link local route.
+ </para>
+
+ <para>
+ For example, consider the configuration line:
+ </para>
+
+ <screen>
+ 192.168.1.99 192.168.1.0/24
+ </screen>
+
+ <para>
+ If the corresponding public_addresses line is:
+ </para>
+
+ <screen>
+ 192.168.1.99/24 eth2,eth3
+ </screen>
+
+ <para>
+ <varname>CTDB_PER_IP_ROUTING_RULE_PREF</varname> is 100, and
+ CTDB adds the address to eth2 then the following routing
+ information is added:
+ </para>
+
+ <screen>
+ ip rule add from 192.168.1.99 pref 100 table ctdb.192.168.1.99
+ ip route add 192.168.1.0/24 dev eth2 table ctdb.192.168.1.99
+ </screen>
+
+ <para>
+      This causes traffic from 192.168.1.99 to 192.168.1.0/24 to go
+      via eth2.
+ </para>
+
+ <para>
+ The <command>ip rule</command> command will show (something
+ like - depending on other public addresses and other routes on
+ the system):
+ </para>
+
+ <screen>
+ 0: from all lookup local
+ 100: from 192.168.1.99 lookup ctdb.192.168.1.99
+ 32766: from all lookup main
+ 32767: from all lookup default
+ </screen>
+
+ <para>
+ <command>ip route show table ctdb.192.168.1.99</command> will show:
+ </para>
+
+ <screen>
+ 192.168.1.0/24 dev eth2 scope link
+ </screen>
+
+ <para>
+ The usual use for a line containing a gateway is to add a
+ default route corresponding to a particular source address.
+ Consider this line of configuration:
+ </para>
+
+ <screen>
+ 192.168.1.99 0.0.0.0/0 192.168.1.1
+ </screen>
+
+ <para>
+ In the situation described above this will cause an extra
+ routing command to be executed:
+ </para>
+
+ <screen>
+ ip route add 0.0.0.0/0 via 192.168.1.1 dev eth2 table ctdb.192.168.1.99
+ </screen>
+
+ <para>
+ With both configuration lines, <command>ip route show table
+ ctdb.192.168.1.99</command> will show:
+ </para>
+
+ <screen>
+ 192.168.1.0/24 dev eth2 scope link
+ default via 192.168.1.1 dev eth2
+ </screen>
+ </refsect2>
+
+ <refsect2>
+ <title>Sample configuration</title>
+
+ <para>
+ Here is a more complete example configuration.
+ </para>
+
+ <screen>
+/usr/local/etc/ctdb/public_addresses:
+
+ 192.168.1.98 eth2,eth3
+ 192.168.1.99 eth2,eth3
+
+/usr/local/etc/ctdb/policy_routing:
+
+ 192.168.1.98 192.168.1.0/24
+ 192.168.1.98 192.168.200.0/24 192.168.1.254
+ 192.168.1.98 0.0.0.0/0 192.168.1.1
+ 192.168.1.99 192.168.1.0/24
+ 192.168.1.99 192.168.200.0/24 192.168.1.254
+ 192.168.1.99 0.0.0.0/0 192.168.1.1
+ </screen>
+
+ <para>
+      This routes local packets as expected, the default route is as
+      previously discussed, and packets to 192.168.200.0/24 are
+      routed via the alternate gateway 192.168.1.254.
+ </para>
+
+ </refsect2>
+ </refsect1>
+
+ <refsect1>
+ <title>NOTIFICATIONS</title>
+
+ <para>
+ When certain state changes occur in CTDB, it can be configured
+ to perform arbitrary actions via notifications. For example,
+      sending SNMP traps or emails when a node becomes unhealthy.
+ </para>
+
+ <para>
+ The notification mechanism runs all executable files ending in
+ ".script" in
+ <filename>/usr/local/etc/ctdb/events/notification/</filename>,
+ ignoring any failures and continuing to run all files.
+ </para>
+
+ <para>
+ CTDB currently generates notifications after CTDB changes to
+ these states:
+ </para>
+
+ <simplelist>
+ <member>init</member>
+ <member>setup</member>
+ <member>startup</member>
+ <member>healthy</member>
+ <member>unhealthy</member>
+ </simplelist>
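+
+    <para>
+      A minimal notification script sketch is shown below.  This is a
+      hypothetical example: it assumes that the state name is passed
+      to the script as its first argument and that a working
+      <command>mail</command> command is available on the node.
+    </para>
+    <screen format="linespecific">
+#!/bin/sh
+# Hypothetical /usr/local/etc/ctdb/events/notification/99.email.script
+
+state="$1"
+
+case "$state" in
+unhealthy)
+	echo "CTDB on $(hostname) is unhealthy" | \
+		mail -s "CTDB alert: $(hostname) unhealthy" admin@example.com
+	;;
+esac
+
+exit 0
+    </screen>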
+
+ </refsect1>
+
+ <refsect1>
+ <title>LOG LEVELS</title>
+
+ <para>
+ Valid log levels, in increasing order of verbosity, are:
+ </para>
+
+ <simplelist>
+ <member>ERROR</member>
+ <member>WARNING</member>
+ <member>NOTICE</member>
+ <member>INFO</member>
+ <member>DEBUG</member>
+ </simplelist>
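+
+    <para>
+      The daemon log level is normally set in the
+      <literal>[logging]</literal> section of
+      <citerefentry><refentrytitle>ctdb.conf</refentrytitle>
+      <manvolnum>5</manvolnum></citerefentry>, for example:
+    </para>
+    <screen format="linespecific">
+[logging]
+  location = syslog
+  log level = NOTICE
+    </screen>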
+ </refsect1>
+
+
+ <refsect1>
+ <title>REMOTE CLUSTER NODES</title>
+ <para>
+It is possible to have a CTDB cluster that spans a WAN link.
+For example, you might have a CTDB cluster in your datacentre but also
+want one additional CTDB node located at a remote branch site.
+This is similar to how a WAN accelerator works but with the difference
+that while a WAN-accelerator often acts as a Proxy or a MitM, in
+the ctdb remote cluster node configuration the Samba instance at the remote site
+IS the genuine server, not a proxy and not a MitM, and thus provides 100%
+correct CIFS semantics to clients.
+ </para>
+
+ <para>
+    Think of the cluster as one single multihomed Samba server where one of
+ the NICs (the remote node) is very far away.
+ </para>
+
+ <para>
+ NOTE: This does require that the cluster filesystem you use can cope
+ with WAN-link latencies. Not all cluster filesystems can handle
+    WAN-link latencies!  Whether this provides very good WAN-accelerator
+    performance or performs very poorly depends entirely
+    on how well your cluster filesystem handles high latency
+ for data and metadata operations.
+ </para>
+
+ <para>
+    To activate a node as a remote cluster node you need to
+ set the following two parameters in
+ /usr/local/etc/ctdb/ctdb.conf for the remote node:
+ <screen format="linespecific">
+[legacy]
+ lmaster capability = false
+ leader capability = false
+ </screen>
+ </para>
+
+ <para>
+    Verify with the command "ctdb getcapabilities" that the node no longer
+ has the leader or the lmaster capabilities.
+ </para>
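+
+  <para>
+    On such a node the output should look something like the
+    following (the exact capability names can differ between CTDB
+    versions):
+  </para>
+  <screen format="linespecific">
+LEADER: NO
+LMASTER: NO
+  </screen>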
+
+ </refsect1>
+
+
+ <refsect1>
+ <title>SEE ALSO</title>
+
+ <para>
+ <citerefentry><refentrytitle>ctdb</refentrytitle>
+ <manvolnum>1</manvolnum></citerefentry>,
+
+ <citerefentry><refentrytitle>ctdbd</refentrytitle>
+ <manvolnum>1</manvolnum></citerefentry>,
+
+ <citerefentry><refentrytitle>ctdb_diagnostics</refentrytitle>
+ <manvolnum>1</manvolnum></citerefentry>,
+
+ <citerefentry><refentrytitle>ltdbtool</refentrytitle>
+ <manvolnum>1</manvolnum></citerefentry>,
+
+ <citerefentry><refentrytitle>onnode</refentrytitle>
+ <manvolnum>1</manvolnum></citerefentry>,
+
+ <citerefentry><refentrytitle>ping_pong</refentrytitle>
+ <manvolnum>1</manvolnum></citerefentry>,
+
+ <citerefentry><refentrytitle>ctdb.conf</refentrytitle>
+ <manvolnum>5</manvolnum></citerefentry>,
+
+ <citerefentry><refentrytitle>ctdb-script.options</refentrytitle>
+ <manvolnum>5</manvolnum></citerefentry>,
+
+ <citerefentry><refentrytitle>ctdb.sysconfig</refentrytitle>
+ <manvolnum>5</manvolnum></citerefentry>,
+
+ <citerefentry><refentrytitle>ctdb-statistics</refentrytitle>
+ <manvolnum>7</manvolnum></citerefentry>,
+
+ <citerefentry><refentrytitle>ctdb-tunables</refentrytitle>
+ <manvolnum>7</manvolnum></citerefentry>,
+
+ <ulink url="https://wiki.samba.org/index.php/CTDB_and_Clustered_Samba"/>,
+
+ <ulink url="http://ctdb.samba.org/"/>
+ </para>
+ </refsect1>
+
+ <refentryinfo>
+ <author>
+ <contrib>
+ This documentation was written by
+ Ronnie Sahlberg,
+ Amitay Isaacs,
+ Martin Schwenke
+ </contrib>
+ </author>
+
+ <copyright>
+ <year>2007</year>
+ <holder>Andrew Tridgell</holder>
+ <holder>Ronnie Sahlberg</holder>
+ </copyright>
+ <legalnotice>
+ <para>
+ This program is free software; you can redistribute it and/or
+ modify it under the terms of the GNU General Public License as
+ published by the Free Software Foundation; either version 3 of
+ the License, or (at your option) any later version.
+ </para>
+ <para>
+ This program is distributed in the hope that it will be
+ useful, but WITHOUT ANY WARRANTY; without even the implied
+ warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR
+ PURPOSE. See the GNU General Public License for more details.
+ </para>
+ <para>
+ You should have received a copy of the GNU General Public
+ License along with this program; if not, see
+ <ulink url="http://www.gnu.org/licenses"/>.
+ </para>
+ </legalnotice>
+ </refentryinfo>
+
+</refentry>