-rw-r--r-- | tools/README.sfex | 336 |
1 files changed, 336 insertions, 0 deletions
diff --git a/tools/README.sfex b/tools/README.sfex
new file mode 100644
index 0000000..ff850d1
--- /dev/null
+++ b/tools/README.sfex
@@ -0,0 +1,336 @@
+Shared Disk File EXclusiveness Control Program version 1.3
+OCF Resource Agent for Heartbeat v2
+FOR USE IN LINUX 2.6 KERNEL OPERATING SYSTEM ENVIRONMENTS ONLY.
+
+Copyright (c) 2007 NIPPON TELEGRAPH AND TELEPHONE CORPORATION
+
+Note: Before using this information and the product it supports,
+read the general information in section 4.0 "Trademarks and Notices"
+in this document.
+
+Last Update Date: 10/10/2007
+
+
+=======================================================================
+CONTENTS
+--------
+1.0 Overview
+2.0 Installation and Setup Instructions
+3.0 Configuration Information
+4.0 Trademarks and Notices
+5.0 Disclaimer
+
+=======================================================================
+
+1.0 Overview
+--------------
+Shared Disk File EXclusiveness Control Program, called "SF-EX" for short,
+prevents the destruction of data on a shared disk file system caused by
+split-brain.
+
+=======================================================================
+
+1.1 Limitations
+---------------------
+This program has been tested in the following environment:
+
+    Heartbeat 2.1.2-2
+    Red Hat Enterprise Linux ES release 4 (Nahant Update 5) EM64T
+
+=======================================================================
+
+2.0 Installation and Setup Instructions
+-----------------------------------------
+
+  2.1.1 Prerequisites
+    SF-EX is released as a source-code package in the format
+    of a gzip-compressed tar file. To unpack the source
+    package, type the following command in the Linux console
+    window:
+
+      $ tar zxf sfex-1.3.tar.gz
+
+    The source files will be unpacked into the "sfex-1.3"
+    directory.
+
+  2.1.2 Build and Installation
+
+    Change to the unpacked directory first.
+
+      $ cd sfex-1.3
+
+    Type the following commands in the Linux console window.
+    Press Enter after each command.
+
+      $ ./configure
+      $ make
+      $ su
+      (you need root's password)
+      # make install
+
+    "make install" copies the modules to /usr/lib64/heartbeat.
+
+    NOTE: "make install" must be run on every node on which
+    Heartbeat will run.
+
+    NOTE: 32-bit systems
+    To run SF-EX on a 32-bit system, the modules must be
+    installed in /usr/lib/heartbeat.
+    Use the following configure option on a 32-bit system:
+
+      $ ./configure --with-lib-dir=/usr/lib/heartbeat
+
+  2.1.3 Initialization of a device
+    Before running SF-EX, one device must be initialized
+    as shown below.
+
+      sfex_init [-b <blocksize>] [-n <numlocks>] <device>
+
+    Example:
+      # /usr/lib/heartbeat/sfex_init -b 512 -n 10 /dev/sdb1
+
+    The initialized device is used as a control area
+    for SF-EX.
+    See 3.2.2 for further information.
+
+  2.1.4 Access without O_DIRECT
+    If you plan to access a device without using
+    O_DIRECT, the following configure option is available.
+
+    Example:
+      $ ./configure --enable-directio=no
+
+    The default value for --enable-directio is "yes".
+
+=======================================================================
+
+3.0 Configuration Information
+-----------------------------
+
+3.1 Configuration Settings
+--------------------------
+
+  3.1.1 Edit your cib.xml
+    The following example shows a typical configuration
+    for SF-EX and Filesystem.
+
+  3.1.2 Example for cib.xml
+
+    /dev/sda1 control area for SF-EX
+    /dev/sda2 Filesystem
+
+--- skip ---
+<resources>
+  <group id="grp">
+    <primitive id="prmEx" class="ocf" type="sfex" provider="heartbeat">
+      <operations>
+        <op id="ex_start" name="start" timeout="180s" on_fail="fence"/>
+        <op id="ex_monitor" name="monitor" timeout="60s" on_fail="fence" interval="10s" />
+        <op id="ex_stop" name="stop" timeout="60s" on_fail="fence"/>
+      </operations>
+      <instance_attributes id="atrEx">
+        <attributes>
+          <nvpair id="dsk" name="device" value="/dev/sda1"/>
+          <nvpair id="idx" name="index" value="1"/>
+          <nvpair id="clt" name="collision_timeout" value="1"/>
+          <nvpair id="lct" name="lock_timeout" value="70"/>
+          <nvpair id="mnt" name="monitor_interval" value="10"/>
+          <nvpair id="fck" name="fsck" value="/sbin/fsck -p /dev/sdb2"/>
+          <nvpair id="fcm" name="fsck_mode" value="check"/>
+          <nvpair id="hlt" name="halt" value="/sbin/halt -f -n -p"/>
+        </attributes>
+      </instance_attributes>
+    </primitive>
+    <primitive id="prmFs" class="ocf" type="Filesystem" provider="heartbeat">
+      <operations>
+        <op id="fs_start" name="start" timeout="60s" on_fail="fence"/>
+        <op id="fs_monitor" name="monitor" timeout="60s" on_fail="fence" interval="10s" />
+        <op id="fs_stop" name="stop" timeout="60s" on_fail="fence"/>
+      </operations>
+      <instance_attributes id="atrFs">
+        <attributes>
+          <nvpair id="dev" name="device" value="/dev/sdb2"/>
+          <nvpair id="dir" name="directory" value="/mnt/shared-disk"/>
+          <nvpair id="fst" name="fstype" value="ext3"/>
+        </attributes>
+      </instance_attributes>
+    </primitive>
+  </group>
+</resources>
+--- skip ---
+
+
+3.2 Outline of each module
+--------------------------
+  3.2.1 sfex
+    Resource Agent script for Heartbeat.
+
+  3.2.2 sfex_init
+    sfex_init [-b <blocksize>] [-n <numlocks>] <device>
+
+    -b <blocksize> --- The block size, specified in bytes.
+    In general, to prevent partial writes to the disk, the
+    block size is set to a value such as 512 bytes.
+    Choose this value carefully, because it is also used to
+    align the program's input/output buffer when direct I/O
+    is used (that is, when the --enable-directio option is
+    enabled for the configure script).
+    (In Linux kernel 2.6, direct I/O does not work if this
+    value is not a multiple of 512.) Default is 512 bytes.
+
+    -n <numlocks> --- The number of lock data entries to
+    store, specified as an integer of one or more. To control
+    two or more resources with one meta-data area, set
+    numlocks to two or more. The disk area required for the
+    meta-data is (blocksize*(1+numlocks)) bytes.
+    Default is 1.
+
+    <device> --- The path of the file that stores the meta-data.
+    It is usually of the form "/dev/...", because it is a
+    partition on the shared disk.
+
+    exit code ---
+      0 - Normal end.
+      3 - An error occurred during processing.
+          The details of the error are written to stderr.
+      4 - An error was found in the command-line parameters.
+
+  3.2.3 sfex_stat
+    sfex_stat [-i <index>] <device>
+
+    -i <index> --- The index is the number of the resource whose
+    lock is displayed. It is specified as an integer of one
+    or more. Use this option when two or more resources are
+    exclusively controlled by one meta-data area.
+    Default is 1.
+
+    <device> --- The path of the file that stores the meta-data.
+    It is usually of the form "/dev/...", because it is a
+    partition on the shared disk.
+
+    exit code ---
+      0 - Normal end. The local node holds the lock.
+      2 - Normal end. The local node does not hold the lock.
+      3 - An error occurred during processing.
+          The details of the error are written to stderr.
+      4 - An error was found in the command-line parameters.
+
+  3.2.4 sfex_lock
+    sfex_lock
+      [-i <index>]
+      [-c <collision_timeout>]
+      [-t <lock_timeout>]
+      <device>
+
+    -i <index> --- The index is the number of the resource for
+    which the lock is acquired. It is specified as an integer
+    of one or more.
+    Use this option when two or more resources are exclusively
+    controlled by one meta-data area.
+    Default is 1.
+
+    -c <collision_timeout> --- The waiting time used to detect
+    a lock collision with other nodes. It is usually set to a
+    value much longer than the time needed for one synchronous
+    read from the device that stores the meta-data plus one
+    synchronous write. Default is 1 second.
+    This value does not usually need to be changed, because a
+    synchronous read and write is not expected to take one
+    second or more.
+
+    -t <lock_timeout> --- The validity period of the lock,
+    in seconds. This timer prevents a resource from remaining
+    locked for a long time when a node crashes while holding
+    the lock. Therefore, the node holding the lock must
+    update the lock data at intervals shorter than this
+    timer. The sfex_update command is used to update the
+    lock. Default is 60 seconds.
+
+    <device> --- The path of the file that stores the meta-data.
+    It is usually of the form "/dev/...", because it is a
+    partition on the shared disk.
+
+    exit code ---
+      0 - Lock acquired from the unlocked state.
+      1 - Lock acquired from the lock-timeout state.
+      2 - Lock acquisition failed.
+      3 - An error occurred during processing. The details of
+          the error are written to stderr.
+      4 - An error was found in the command-line parameters.
+
+  3.2.5 sfex_unlock
+    sfex_unlock [-i <index>] <device>
+
+    -i <index> --- The index is the number of the resource whose
+    lock is released. It is specified as an integer of one
+    or more. Use this option when two or more resources are
+    exclusively controlled by one meta-data area.
+    Default is 1.
+
+    <device> --- The path of the file that stores the meta-data.
+    It is usually of the form "/dev/...", because it is a
+    partition on the shared disk.
+
+    exit code ---
+      0 - Lock released successfully.
+      1 - The lock had already been released.
+          The lock has already been acquired by another node.
+      3 - An error occurred during processing.
+          The details of the error are written to stderr.
+      4 - An error was found in the command-line parameters.
+
+  3.2.6 sfex_update
+    sfex_update [-i <index>] <device>
+
+    -i <index> --- The index is the number of the resource whose
+    lock is updated. It is specified as an integer of one
+    or more. Use this option when two or more resources are
+    exclusively controlled by one meta-data area.
+    Default is 1.
+
+    <device> --- The path of the file that stores the meta-data.
+    It is usually of the form "/dev/...", because it is a
+    partition on the shared disk.
+
+    exit code ---
+      0 - Lock updated successfully.
+      2 - Lock update failed.
+          The lock has been acquired by another node.
+      3 - An error occurred during processing.
+          The details of the error are written to stderr.
+      4 - An error was found in the command-line parameters.
+
+=======================================================================
+
+4.0 Trademarks and Notices
+----------------------------
+
+  Heartbeat is a registered trademark of The High Availability
+  Linux Project.
+
+  Linux is a registered trademark of Linus Torvalds.
+
+  Other company, product, and service names may be
+  trademarks or service marks of others.
+
+=======================================================================
+
+5.0 Disclaimer
+----------------
+
+  THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND
+  CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES,
+  INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF
+  MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE AND
+  PARTICULARLY THE NON-INFRINGEMENT OF ANY THIRD PARTY'S
+  INTELLECTUAL PROPERTY RIGHTS ARE DISCLAIMED.
+  IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
+  LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
+  EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+  LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+  LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
+  HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
+  CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE
+  OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
+  SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH
+  DAMAGE.
+