summaryrefslogtreecommitdiffstats
path: root/man/sam_overview.3
diff options
context:
space:
mode:
Diffstat (limited to 'man/sam_overview.3')
-rw-r--r--man/sam_overview.3181
1 files changed, 181 insertions, 0 deletions
diff --git a/man/sam_overview.3 b/man/sam_overview.3
new file mode 100644
index 0000000..13b45ad
--- /dev/null
+++ b/man/sam_overview.3
@@ -0,0 +1,181 @@
+.\"/*
+.\" * Copyright (c) 2009-2010 Red Hat, Inc.
+.\" *
+.\" * All rights reserved.
+.\" *
+.\" * Author: Jan Friesse (jfriesse@redhat.com)
+.\" * Author: Steven Dake (sdake@redhat.com)
+.\" *
+.\" * This software licensed under BSD license, the text of which follows:
+.\" *
+.\" * Redistribution and use in source and binary forms, with or without
+.\" * modification, are permitted provided that the following conditions are met:
+.\" *
+.\" * - Redistributions of source code must retain the above copyright notice,
+.\" * this list of conditions and the following disclaimer.
+.\" * - Redistributions in binary form must reproduce the above copyright notice,
+.\" * this list of conditions and the following disclaimer in the documentation
+.\" * and/or other materials provided with the distribution.
+.\" * - Neither the name of the Red Hat, Inc. nor the names of its
+.\" * contributors may be used to endorse or promote products derived from this
+.\" * software without specific prior written permission.
+.\" *
+.\" * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
+.\" * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+.\" * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+.\" * ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
+.\" * LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
+.\" * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
+.\" * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
+.\" * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
+.\" * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
+.\" * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF
+.\" * THE POSSIBILITY OF SUCH DAMAGE.
+.\" */
+.TH "SAM_OVERVIEW" 3 "21/05/2010" "corosync Man Page" "Corosync Cluster Engine Programmer's Manual"
+
+.SH NAME
+.P
+sam_overview \- Overview of the Simple Availability Manager
+
+.SH OVERVIEW
+.P
+The SAM library provide a tool to check the health of an application.
+The main purpose of SAM is to restart a local process when it fails to respond
+to a healthcheck request in a configured time interval.
+
+.P
+During \fBsam_initialize(3)\fR, a duplicate copy of the process is created using
+the \fBfork(3)\fR system call. This duplicate process copy contains the logic
+for executing the SAM server. The SAM server is responsible for requesting
+healthchecks from the active process, and controlling the lifecycle of the
+active process when it fails. If the active process fails to respond to the
+healthcheck request sent by the SAM server, it will be sent a user configurable
+signal (default SIGTERM) to request shutdown of the application. After a configured time interval, the
+process will be forcibly killed by being sent a SIGKILL signal. Once the
+active process terminates, the SAM server will create a new active process.
+
+.P
+The Simple Availability Manager is meant to be used in conjunction with the
+cpg service. Used together, it is possible to restart a cpg process that fails
+healthchecking during operation.
+
+.P
+The main features of SAM include:
+
+.RS
+.IP \(bu 3
+A configurable recovery policy.
+.IP \(bu 3
+A configurable time interval for health check operations.
+.IP \(bu 3
+A notification via signal before recovery action is taken.
+.IP \(bu 3
+A mechanism to indicate to the application the number of times an active
+process has been created by the SAM server.
+.IP \(bu 3
+Both application driven health checking and event driven health checking.
+.RE
+
+.SH Initializing SAM
+.P
+The SAM library is initialized by \fBsam_initialize(3)\fR.
+\fBsam_initalize(3)\fR may only be called once per process. Calling it more
+then once has undefined results and is not recommended or tested.
+
+.SH Setting warning callback
+.P
+User configurable signal (default \fISIGTERM\fR) is sent to the application when a recovery action is
+planned. The application can use the \fBsignal(3)\fR system call to monitor
+for this signal.
+
+.P
+There are no special constraints on what SAM apis may be called in a warning
+callback. After \fItime_interval\fR expires, a SIGKILL signal is sent to the
+active process to force its termination.
+
+.SH Registering the active process
+.P
+The active process is registered with SAM by calling \fBsam_register(3)\fR.
+This function should only be called one time in a process. After a recovery
+action is taken, the new active process will begin execution at the next line
+of code in a user process after \fBsam_register(3)\fR.
+
+.SH Enabling event driven healthchecking
+.P
+Two types of healthchecking are available to the user. The first model is one
+where the user application healthchecks during its normal operation. It is
+never requested to healtcheck, and if the active process doesn't respond within
+the time interval, the process will be restarted.
+
+.P
+A more useful mechanism for healthchecking is event driven healthchecking.
+Because this model is directed by the SAM server, It isn't necessary to guess
+or add timers to the active process to signal a healthcheck operation is
+successful. To use event driven healthchecking,
+the \fBsam_hc_callback_register(3)\fR function should be executed.
+
+.SH Quorum integration
+.P
+SAM has special policies (\fISAM_RECOVERY_POLICY_QUIT\fR and \fISAM_RECOVERY_POLICY_RESTART\fR)
+for integration with quorum service. This policies changes SAM behaviour in two aspects.
+.RS
+.IP \(bu 3
+Call of \fBsam_start(3)\fR blocks until corosync becomes quorate
+.IP \(bu 3
+User selected recovery action is taken immediately after lost of quorum.
+.RE
+
+.SH Storing user data
+.P
+Sometimes there is need to store some data, which survives between instances.
+One can in such case use files, databases, ... or much simpler in memory solution
+presented by \fBsam_data_store(3)\fR, \fBsam_data_restore(3)\fR and \fBsam_data_getsize(3)\fR
+functions.
+
+.SH Confdb integration
+.P
+SAM has policy flag used for confdb system integration (\fISAM_RECOVERY_POLICY_CONFDB\fR).
+If process is registered with this flag, new confdb object PROCESS_NAME:PID is created with following
+keys:
+.RS
+.IP \(bu 3
+\fIrecovery\fR - will be quit or restart depending on policy
+.IP \(bu 3
+\fIpoll_period\fR - period of health checking in milliseconds
+.IP \(bu 3
+\fIlast_updated\fR - Timestamp (in nanoseconds) of the last health check.
+.IP \(bu 3
+\fIstate\fR - state of process (can be one of registered, started, failed, waiting for quorum)
+.RE
+
+.P
+Object is automatically deleted if process exits with stopped health checking.
+
+.P
+Confdb integration with corosync watchdog can be used in implicit and explicit way.
+
+.P
+Implicit way is achieved by setting recovery policy to QUIT and let process exit with started health checking.
+If this happened, object is not deleted and corosync watchdog will take required action.
+
+.P
+Explicit way is useful for situations, when developer can deal with some non-fatal fall of application.
+This mode is achieved by setting policy to RESTART and using SAM same as without Confdb integration.
+If real fail is needed (like too many restarts at all, per/sec, ...), it's possible to use \fBsam_mark_failed(3)\fR
+and let corosync watchdog take required action.
+
+.SH BUGS
+.SH "SEE ALSO"
+.BR sam_initialize (3),
+.BR sam_data_getsize (3),
+.BR sam_data_restore (3),
+.BR sam_data_store (3),
+.BR sam_finalize (3),
+.BR sam_mark_failed (3),
+.BR sam_start (3),
+.BR sam_stop (3),
+.BR sam_register (3),
+.BR sam_warn_signal_set (3),
+.BR sam_hc_send (3),
+.BR sam_hc_callback_register (3)