diff options
Diffstat (limited to 'man/sam_overview.3')
-rw-r--r-- | man/sam_overview.3 | 181 |
1 files changed, 181 insertions, 0 deletions
diff --git a/man/sam_overview.3 b/man/sam_overview.3 new file mode 100644 index 0000000..13b45ad --- /dev/null +++ b/man/sam_overview.3 @@ -0,0 +1,181 @@ +.\"/* +.\" * Copyright (c) 2009-2010 Red Hat, Inc. +.\" * +.\" * All rights reserved. +.\" * +.\" * Author: Jan Friesse (jfriesse@redhat.com) +.\" * Author: Steven Dake (sdake@redhat.com) +.\" * +.\" * This software licensed under BSD license, the text of which follows: +.\" * +.\" * Redistribution and use in source and binary forms, with or without +.\" * modification, are permitted provided that the following conditions are met: +.\" * +.\" * - Redistributions of source code must retain the above copyright notice, +.\" * this list of conditions and the following disclaimer. +.\" * - Redistributions in binary form must reproduce the above copyright notice, +.\" * this list of conditions and the following disclaimer in the documentation +.\" * and/or other materials provided with the distribution. +.\" * - Neither the name of the Red Hat, Inc. nor the names of its +.\" * contributors may be used to endorse or promote products derived from this +.\" * software without specific prior written permission. +.\" * +.\" * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" +.\" * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +.\" * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +.\" * ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE +.\" * LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR +.\" * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF +.\" * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS +.\" * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN +.\" * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) +.\" * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF +.\" * THE POSSIBILITY OF SUCH DAMAGE. +.\" */ +.TH "SAM_OVERVIEW" 3 "21/05/2010" "corosync Man Page" "Corosync Cluster Engine Programmer's Manual" + +.SH NAME +.P +sam_overview \- Overview of the Simple Availability Manager + +.SH OVERVIEW +.P +The SAM library provide a tool to check the health of an application. +The main purpose of SAM is to restart a local process when it fails to respond +to a healthcheck request in a configured time interval. + +.P +During \fBsam_initialize(3)\fR, a duplicate copy of the process is created using +the \fBfork(3)\fR system call. This duplicate process copy contains the logic +for executing the SAM server. The SAM server is responsible for requesting +healthchecks from the active process, and controlling the lifecycle of the +active process when it fails. If the active process fails to respond to the +healthcheck request sent by the SAM server, it will be sent a user configurable +signal (default SIGTERM) to request shutdown of the application. After a configured time interval, the +process will be forcibly killed by being sent a SIGKILL signal. Once the +active process terminates, the SAM server will create a new active process. + +.P +The Simple Availability Manager is meant to be used in conjunction with the +cpg service. Used together, it is possible to restart a cpg process that fails +healthchecking during operation. + +.P +The main features of SAM include: + +.RS +.IP \(bu 3 +A configurable recovery policy. +.IP \(bu 3 +A configurable time interval for health check operations. +.IP \(bu 3 +A notification via signal before recovery action is taken. +.IP \(bu 3 +A mechanism to indicate to the application the number of times an active +process has been created by the SAM server. +.IP \(bu 3 +Both application driven health checking and event driven health checking. +.RE + +.SH Initializing SAM +.P +The SAM library is initialized by \fBsam_initialize(3)\fR. +\fBsam_initalize(3)\fR may only be called once per process. Calling it more +then once has undefined results and is not recommended or tested. + +.SH Setting warning callback +.P +User configurable signal (default \fISIGTERM\fR) is sent to the application when a recovery action is +planned. The application can use the \fBsignal(3)\fR system call to monitor +for this signal. + +.P +There are no special constraints on what SAM apis may be called in a warning +callback. After \fItime_interval\fR expires, a SIGKILL signal is sent to the +active process to force its termination. + +.SH Registering the active process +.P +The active process is registered with SAM by calling \fBsam_register(3)\fR. +This function should only be called one time in a process. After a recovery +action is taken, the new active process will begin execution at the next line +of code in a user process after \fBsam_register(3)\fR. + +.SH Enabling event driven healthchecking +.P +Two types of healthchecking are available to the user. The first model is one +where the user application healthchecks during its normal operation. It is +never requested to healtcheck, and if the active process doesn't respond within +the time interval, the process will be restarted. + +.P +A more useful mechanism for healthchecking is event driven healthchecking. +Because this model is directed by the SAM server, It isn't necessary to guess +or add timers to the active process to signal a healthcheck operation is +successful. To use event driven healthchecking, +the \fBsam_hc_callback_register(3)\fR function should be executed. + +.SH Quorum integration +.P +SAM has special policies (\fISAM_RECOVERY_POLICY_QUIT\fR and \fISAM_RECOVERY_POLICY_RESTART\fR) +for integration with quorum service. This policies changes SAM behaviour in two aspects. +.RS +.IP \(bu 3 +Call of \fBsam_start(3)\fR blocks until corosync becomes quorate +.IP \(bu 3 +User selected recovery action is taken immediately after lost of quorum. +.RE + +.SH Storing user data +.P +Sometimes there is need to store some data, which survives between instances. +One can in such case use files, databases, ... or much simpler in memory solution +presented by \fBsam_data_store(3)\fR, \fBsam_data_restore(3)\fR and \fBsam_data_getsize(3)\fR +functions. + +.SH Confdb integration +.P +SAM has policy flag used for confdb system integration (\fISAM_RECOVERY_POLICY_CONFDB\fR). +If process is registered with this flag, new confdb object PROCESS_NAME:PID is created with following +keys: +.RS +.IP \(bu 3 +\fIrecovery\fR - will be quit or restart depending on policy +.IP \(bu 3 +\fIpoll_period\fR - period of health checking in milliseconds +.IP \(bu 3 +\fIlast_updated\fR - Timestamp (in nanoseconds) of the last health check. +.IP \(bu 3 +\fIstate\fR - state of process (can be one of registered, started, failed, waiting for quorum) +.RE + +.P +Object is automatically deleted if process exits with stopped health checking. + +.P +Confdb integration with corosync watchdog can be used in implicit and explicit way. + +.P +Implicit way is achieved by setting recovery policy to QUIT and let process exit with started health checking. +If this happened, object is not deleted and corosync watchdog will take required action. + +.P +Explicit way is useful for situations, when developer can deal with some non-fatal fall of application. +This mode is achieved by setting policy to RESTART and using SAM same as without Confdb integration. +If real fail is needed (like too many restarts at all, per/sec, ...), it's possible to use \fBsam_mark_failed(3)\fR +and let corosync watchdog take required action. + +.SH BUGS +.SH "SEE ALSO" +.BR sam_initialize (3), +.BR sam_data_getsize (3), +.BR sam_data_restore (3), +.BR sam_data_store (3), +.BR sam_finalize (3), +.BR sam_mark_failed (3), +.BR sam_start (3), +.BR sam_stop (3), +.BR sam_register (3), +.BR sam_warn_signal_set (3), +.BR sam_hc_send (3), +.BR sam_hc_callback_register (3) |