Shared Disk File EXclusiveness Control Program version 1.3
OCF Resource Agent for Heartbeat v2
FOR USE IN LINUX 2.6 KERNEL OPERATING SYSTEM ENVIRONMENTS ONLY. 

Copyright (c) 2007 NIPPON TELEGRAPH AND TELEPHONE CORPORATION

Note: Before using this information and the product it supports, 
read the general information in section 4.0 "Trademarks and Notices" 
in this document.

Last Update Date:  10/10/2007


=======================================================================
CONTENTS
--------
1.0  Overview
2.0  Installation and Setup Instructions
3.0  Configuration Information
4.0  Trademarks and Notices
5.0  Disclaimer

=======================================================================

1.0   Overview
--------------
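Shared Disk File EXclusiveness Control Program, called "SF-EX" for short,
can prevent the destruction of data on a shared-disk file system caused
by split-brain. It does this by keeping an exclusive lock in a control
area on the shared disk, so that only the node holding the lock uses
the file system.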

=======================================================================

1.1  Limitations
----------------
This program has been tested in the following environment:

	Heartbeat 2.1.2-2
	Red Hat Enterprise Linux ES release 4 (Nahant Update 5) EM64T

=======================================================================

2.0   Installation and Setup Instructions
-----------------------------------------

	2.1.1 Prerequisites
		SF-EX is released as a source-code package in the form
		of a gzip-compressed tar file. To unpack the source
		package, type the following command in the Linux console
		window:
		
		$ tar zxf sfex-1.3.tar.gz

		The source files are extracted into the "sfex-1.3"
		directory.

	2.1.2 Build and Installation

		First, change to the unpacked directory:

		$ cd sfex-1.3

		Then type the following commands in the Linux console
		window, pressing Enter after each one:
		
		$ ./configure
		$ make
		$ su
		(enter the root password when prompted)
		# make install

		"make install" copies the modules to /usr/lib64/heartbeat.

		NOTE: Run "make install" on every node on which
		Heartbeat will run.

		NOTE: On a 32-bit system, the modules should be
		installed in /usr/lib/heartbeat instead. Use the
		following configure option on a 32-bit system:

		$ ./configure --with-lib-dir=/usr/lib/heartbeat
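
		The choice of module directory can also be scripted. The
		following is a minimal sketch, assuming "getconf LONG_BIT"
		reports the word size of the userland that Heartbeat runs
		in and that --with-lib-dir also accepts the 64-bit default
		path. The function name sfex_libdir_for is hypothetical.

```shell
# Hypothetical helper: print the Heartbeat module directory for a
# userland word size (for example, as reported by "getconf LONG_BIT").
sfex_libdir_for() {
    case "$1" in
        64) echo /usr/lib64/heartbeat ;;   # default used by "make install"
        32) echo /usr/lib/heartbeat ;;     # 32-bit location from the NOTE above
        *)  echo "sfex_libdir_for: unsupported word size '$1'" >&2
            return 1 ;;
    esac
}

# Usage (assumes --with-lib-dir also accepts the 64-bit default path):
#   ./configure --with-lib-dir="$(sfex_libdir_for "$(getconf LONG_BIT)")"
```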

	2.1.3 Initialization of a device
		Before running SF-EX, a device must be initialized
		for use as its control area:
		
		sfex_init [-b <blocksize>] [-n <numlocks>] <device>

		Example:
		# /usr/lib/heartbeat/sfex_init -b 512 -n 10 /dev/sdb1

		The example uses the 32-bit module directory; on a
		default 64-bit installation, sfex_init is installed in
		/usr/lib64/heartbeat. The initialized device is used
		as the control area for SF-EX. See 3.2.2 for details
		of the options.

	2.1.4 Access without O_DIRECT
		If you plan to access the device without O_DIRECT
		(direct I/O), configure the build with the following
		option:

		Example:
		$ ./configure --enable-directio=no

		The default value for --enable-directio is "yes".

=======================================================================

3.0 Configuration Information
-----------------------------

3.1 Configuration Settings
--------------------------

	3.1.1 Edit your cib.xml
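		The following example shows a typical configuration
		of SF-EX and a Filesystem resource. The SF-EX resource
		is placed before the Filesystem resource in the same
		group, so that the lock is acquired before the file
		system is mounted and released after it is unmounted.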
		
	3.1.2 Example for cib.xml
		
		/dev/sda1	control area for SF-EX
		/dev/sdb2	Filesystem

--- skip ---
<resources>
 <group id="grp">
  <primitive id="prmEx" class="ocf" type="sfex" provider="heartbeat">
   <operations>
    <op id="ex_start"   name="start"   timeout="180s" on_fail="fence"/>
    <op id="ex_monitor" name="monitor" timeout="60s"  on_fail="fence" interval="10s" />
    <op id="ex_stop"    name="stop"    timeout="60s"  on_fail="fence"/>
   </operations>
   <instance_attributes id="atrEx">
    <attributes>
     <nvpair id="dsk" name="device"            value="/dev/sda1"/>
     <nvpair id="idx" name="index"             value="1"/>
     <nvpair id="clt" name="collision_timeout" value="1"/>
     <nvpair id="lct" name="lock_timeout"      value="70"/>
     <nvpair id="mnt" name="monitor_interval"  value="10"/>
     <nvpair id="fck" name="fsck"              value="/sbin/fsck -p /dev/sdb2"/>
     <nvpair id="fcm" name="fsck_mode"         value="check"/>
     <nvpair id="hlt" name="halt"              value="/sbin/halt -f -n -p"/>
    </attributes>
   </instance_attributes>
  </primitive>
  <primitive id="prmFs" class="ocf" type="Filesystem" provider="heartbeat">
   <operations>
    <op id="fs_start"   name="start"   timeout="60s"  on_fail="fence"/>
    <op id="fs_monitor" name="monitor" timeout="60s"  on_fail="fence" interval="10s" />
    <op id="fs_stop"    name="stop"    timeout="60s"  on_fail="fence"/>
   </operations>
   <instance_attributes id="atrFs">
    <attributes>
     <nvpair id="dev"  name="device"           value="/dev/sdb2"/>
     <nvpair id="dir"  name="directory"        value="/mnt/shared-disk"/>
     <nvpair id="fst"  name="fstype"           value="ext3"/>
    </attributes>
   </instance_attributes>
  </primitive>
 </group>
</resources>
--- skip ---


3.2 Outline of each module
--------------------------
	3.2.1 sfex
		Resource Agent script for Heartbeat.

	3.2.2 sfex_init
		sfex_init [-b <blocksize>] [-n <numlocks>] <device>

		-b <blocksize> --- The block size, in bytes. To prevent
		partial writes to the disk, the block size is generally
		set to a value such as 512 bytes. Choose this value with
		care: when direct I/O is used (the default; see the
		--enable-directio configure option in 2.1.4), it is also
		used to align the program's I/O buffers. On Linux 2.6,
		direct I/O does not work if this value is not a multiple
		of 512. Default is 512 bytes.
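
		Because direct I/O is enabled by default, a value for -b
		can be checked before running sfex_init. A minimal sketch;
		the helper name sfex_blocksize_ok is hypothetical, and it
		checks only the multiple-of-512 rule stated above, not
		the logical sector size of the actual device.

```shell
# Hypothetical helper: succeed if $1 is a blocksize usable with direct
# I/O on Linux 2.6, i.e. a positive decimal multiple of 512 bytes.
sfex_blocksize_ok() {
    case "$1" in
        ''|0*|*[!0-9]*) return 1 ;;   # empty, zero-padded or non-numeric
    esac
    [ $(( $1 % 512 )) -eq 0 ]
}

# Usage:
#   bs=512
#   sfex_blocksize_ok "$bs" || echo "blocksize $bs is not a multiple of 512" >&2
```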

		-n <numlocks> --- The number of lock data slots to
		store, an integer of 1 or more. To control two or more
		resources with one metadata area, set numlocks to 2 or
		more. The metadata requires blocksize * (1 + numlocks)
		bytes of disk space. Default is 1.
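
		The metadata size formula above can be evaluated before
		choosing a partition. A minimal sketch; sfex_meta_bytes
		is a hypothetical helper, and the usage comment assumes
		the util-linux blockdev command is available. For the
		sfex_init example in 2.1.3 (-b 512 -n 10) it gives
		512 * 11 = 5632 bytes.

```shell
# Hypothetical helper: print the metadata size in bytes,
# blocksize * (1 + numlocks), using the sfex_init defaults
# (-b 512, -n 1) for any argument that is omitted.
sfex_meta_bytes() {
    echo $(( ${1:-512} * (1 + ${2:-1}) ))
}

# Usage: compare with the partition size (util-linux blockdev).
#   need=$(sfex_meta_bytes 512 10)
#   have=$(blockdev --getsize64 /dev/sdb1)
#   [ "$have" -ge "$need" ] || echo "partition too small" >&2
```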

		<device> --- The path of the device that stores the
		metadata. This is usually a "/dev/..." path, because
		the metadata is kept on a partition of the shared disk.

		exit code ---
		0 - Normal end.
		3 - An error occurred during processing.
		    The error message is written to stderr.
		4 - An error was found in the command-line parameters.

	3.2.3 sfex_stat
		sfex_stat [-i <index>] <device>

		-i <index> --- The index of the resource whose lock is
		displayed, an integer of 1 or more. This option is used
		when two or more resources are exclusively controlled
		with one metadata area. Default is 1.

		<device> --- The path of the device that stores the
		metadata. This is usually a "/dev/..." path, because
		the metadata is kept on a partition of the shared disk.

		exit code ---
		0 - Normal end. This node holds the lock.
		2 - Normal end. This node does not hold the lock.
		3 - An error occurred during processing.
		    The error message is written to stderr.
		4 - An error was found in the command-line parameters.
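
		The exit codes above make sfex_stat usable as a test in
		shell scripts. A minimal sketch; sfex_holds_lock and the
		SFEX_STAT variable (which overrides the command path)
		are hypothetical, and the default path assumes a 64-bit
		installation.

```shell
# Hypothetical wrapper: sfex_holds_lock <index> <device>
# Returns 0 if this node holds the lock, 1 if it does not,
# and 2 if sfex_stat reported an error (exit code 3 or 4).
sfex_holds_lock() {
    sfex_rc=0
    # Status output is discarded; errors still go to stderr.
    "${SFEX_STAT:-/usr/lib64/heartbeat/sfex_stat}" -i "$1" "$2" >/dev/null || sfex_rc=$?
    case $sfex_rc in
        0) return 0 ;;   # normal end: this node holds the lock
        2) return 1 ;;   # normal end: this node does not hold the lock
        *) return 2 ;;   # 3 = processing error, 4 = parameter error
    esac
}

# Usage:
#   sfex_holds_lock 1 /dev/sda1 && echo "this node holds lock 1"
```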

	3.2.4 sfex_lock
		sfex_lock 
			[-i <index>] 
			[-c <collision_timeout>] 
			[-t <lock_timeout>] 
			<device>

		-i <index> --- The index of the resource whose lock is
		acquired, an integer of 1 or more. This option is used
		when two or more resources are exclusively controlled
		with one metadata area. Default is 1.

		-c <collision_timeout> --- The time, in seconds, to wait
		in order to detect a lock collision with other nodes.
		It is usually set much longer than one synchronous read
		from the metadata device plus one synchronous write.
		Default is 1 second. Normally there is no need to change
		this value, because a synchronous read and write are not
		expected to take one second or more.

		-t <lock_timeout> --- The validity period of the lock,
		in seconds. This timer prevents a resource from staying
		locked for a long time if a node crashes while holding
		the lock. The node holding the lock must therefore
		update the lock data at intervals shorter than this
		timeout, using the sfex_update command. Default is 60
		seconds.
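
		The rule above can be checked against the configured
		values. A minimal sketch, assuming the lock is refreshed
		once per monitor_interval, as the parameters in the
		cib.xml example in 3.1.2 suggest; sfex_intervals_ok is a
		hypothetical helper. It checks only the ordering; how
		much margin to leave for slow I/O is a site decision.

```shell
# Hypothetical check: sfex_intervals_ok <update_interval> <lock_timeout>
# Succeeds only if the lock is refreshed more often than it expires,
# i.e. 0 < update_interval < lock_timeout (both in seconds).
sfex_intervals_ok() {
    [ "$1" -gt 0 ] && [ "$1" -lt "$2" ]
}

# Example with the cib.xml values in 3.1.2
# (monitor_interval 10, lock_timeout 70):
#   sfex_intervals_ok 10 70 && echo ok
```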

		<device> --- The path of the device that stores the
		metadata. This is usually a "/dev/..." path, because
		the metadata is kept on a partition of the shared disk.

		exit code ---
		0 - The lock was acquired from the unlocked state.
		1 - The lock was acquired from the lock-timeout state.
		2 - Lock acquisition failed.
		3 - An error occurred during processing. The error
		    message is written to stderr.
		4 - An error was found in the command-line parameters.
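
		The following minimal sketch shows one way a script
		might act on these exit codes, treating 0 and 1 as
		success. sfex_acquire and the SFEX_LOCK variable are
		hypothetical, and the default path assumes a 64-bit
		installation. Exit code 1 is reported separately because
		the previous holder may have crashed without releasing
		the lock, so the shared data may need checking.

```shell
# Hypothetical wrapper: sfex_acquire <index> <lock_timeout> <device>
# Returns 0 if the lock was acquired (exit code 0 or 1), 1 otherwise.
sfex_acquire() {
    sfex_rc=0
    "${SFEX_LOCK:-/usr/lib64/heartbeat/sfex_lock}" -i "$1" -t "$2" "$3" || sfex_rc=$?
    case $sfex_rc in
        0) return 0 ;;   # acquired from the unlocked state
        1) echo "sfex: lock $1 on $3 acquired after lock timeout; previous holder may have crashed" >&2
           return 0 ;;
        2) echo "sfex: lock $1 on $3 could not be acquired" >&2
           return 1 ;;
        *) echo "sfex: sfex_lock failed with exit code $sfex_rc" >&2
           return 1 ;;
    esac
}

# Usage, with the values from the cib.xml example:
#   sfex_acquire 1 70 /dev/sda1 && echo "lock 1 acquired"
```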

	3.2.5 sfex_unlock
		sfex_unlock [-i <index>] <device>

		-i <index> --- The index of the resource whose lock is
		released, an integer of 1 or more. This option is used
		when two or more resources are exclusively controlled
		with one metadata area. Default is 1.

		<device> --- The path of the device that stores the
		metadata. This is usually a "/dev/..." path, because
		the metadata is kept on a partition of the shared disk.

		exit code ---
		0 - The lock was released successfully.
		1 - The lock had already been released; it has
		    already been acquired by another node.
		3 - An error occurred during processing.
		    The error message is written to stderr.
		4 - An error was found in the command-line parameters.
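
		For a stop action, a script might treat exit codes 0
		and 1 alike, since in both cases this node no longer
		holds the lock. A minimal sketch; sfex_release and the
		SFEX_UNLOCK variable are hypothetical, the default path
		assumes a 64-bit installation, and whether exit code 1
		should count as success is a policy choice, since it
		means another node has taken the lock.

```shell
# Hypothetical wrapper: sfex_release <index> <device>
# Exit codes 0 and 1 both leave this node without the lock, so both are
# treated as success here; 3 and 4 are reported as failure.
sfex_release() {
    sfex_rc=0
    "${SFEX_UNLOCK:-/usr/lib64/heartbeat/sfex_unlock}" -i "$1" "$2" || sfex_rc=$?
    case $sfex_rc in
        0) return 0 ;;   # lock released
        1) echo "sfex: lock $1 on $2 was already released and is held by another node" >&2
           return 0 ;;
        *) return 1 ;;   # 3 = processing error, 4 = parameter error
    esac
}
```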

	3.2.6 sfex_update
		sfex_update [-i <index>] <device>

		-i <index> --- The index of the resource whose lock is
		updated, an integer of 1 or more. This option is used
		when two or more resources are exclusively controlled
		with one metadata area. Default is 1.

		<device> --- The path of the device that stores the
		metadata. This is usually a "/dev/..." path, because
		the metadata is kept on a partition of the shared disk.

		exit code ---
		0 - The lock was updated successfully.
		2 - Lock update failed: the lock has been acquired
		    by another node.
		3 - An error occurred during processing.
		    The error message is written to stderr.
		4 - An error was found in the command-line parameters.
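
		The requirement in 3.2.4 that the lock holder update the
		lock more often than lock_timeout can be illustrated
		with a keep-alive loop. This is a minimal sketch, not
		necessarily how the resource agent does it; the names
		sfex_keep_lock and SFEX_UPDATE are hypothetical, and the
		default path assumes a 64-bit installation.

```shell
# Hypothetical keep-alive loop: sfex_keep_lock <interval> <index> <device>
# Refreshes the lock every <interval> seconds and returns sfex_update's
# exit code as soon as an update fails (2 = lock held by another node,
# 3 = processing error, 4 = parameter error).
sfex_keep_lock() {
    while :; do
        sfex_rc=0
        "${SFEX_UPDATE:-/usr/lib64/heartbeat/sfex_update}" -i "$2" "$3" || sfex_rc=$?
        [ "$sfex_rc" -eq 0 ] || return "$sfex_rc"
        sleep "$1"
    done
}

# Usage: refresh lock 1 every 10 seconds in the background.
#   sfex_keep_lock 10 1 /dev/sda1 &
```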

=======================================================================

4.0   Trademarks and Notices
----------------------------

	Heartbeat is a registered trademark of The High Availability
	Linux Project.

	Linux is a registered trademark of Linus Torvalds.

	Other company, product, and service names may be 
	trademarks or service marks of others.

=======================================================================

5.0   Disclaimer
----------------

	THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND 
	CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, 
	INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF 
	MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE AND 
	PARTICULARLY THE NON-INFRINGEMENT OF ANY THIRD PARTY'S 
	INTELLECTUAL PROPERTY RIGHTS ARE DISCLAIMED. IN NO EVENT 
	SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY 
	DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR 
	CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT 
	OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; 
	OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF 
	LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT 
	(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE 
	USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH 
	DAMAGE.