author     Daniel Baumann <daniel.baumann@progress-linux.org>  2024-04-13 13:44:03 +0000
committer  Daniel Baumann <daniel.baumann@progress-linux.org>  2024-04-13 13:44:03 +0000
commit     293913568e6a7a86fd1479e1cff8e2ecb58d6568
tree       fc3b469a3ec5ab71b36ea97cc7aaddb838423a0c  /doc/src/sgml/high-availability.sgml
parent     Initial commit.
Adding upstream version 16.2.
Signed-off-by: Daniel Baumann <daniel.baumann@progress-linux.org>
Diffstat (limited to 'doc/src/sgml/high-availability.sgml')
-rw-r--r--  doc/src/sgml/high-availability.sgml  2328
1 file changed, 2328 insertions(+), 0 deletions(-)
diff --git a/doc/src/sgml/high-availability.sgml b/doc/src/sgml/high-availability.sgml new file mode 100644 index 0000000..40b37c7 --- /dev/null +++ b/doc/src/sgml/high-availability.sgml @@ -0,0 +1,2328 @@ +<!-- doc/src/sgml/high-availability.sgml --> + +<chapter id="high-availability"> + <title>High Availability, Load Balancing, and Replication</title> + + <indexterm><primary>high availability</primary></indexterm> + <indexterm><primary>failover</primary></indexterm> + <indexterm><primary>replication</primary></indexterm> + <indexterm><primary>load balancing</primary></indexterm> + <indexterm><primary>clustering</primary></indexterm> + <indexterm><primary>data partitioning</primary></indexterm> + + <para> + Database servers can work together to allow a second server to + take over quickly if the primary server fails (high + availability), or to allow several computers to serve the same + data (load balancing). Ideally, database servers could work + together seamlessly. Web servers serving static web pages can + be combined quite easily by merely load-balancing web requests + to multiple machines. In fact, read-only database servers can + be combined relatively easily too. Unfortunately, most database + servers have a read/write mix of requests, and read/write servers + are much harder to combine. This is because though read-only + data needs to be placed on each server only once, a write to any + server has to be propagated to all servers so that future read + requests to those servers return consistent results. + </para> + + <para> + This synchronization problem is the fundamental difficulty for + servers working together. Because there is no single solution + that eliminates the impact of the sync problem for all use cases, + there are multiple solutions. Each solution addresses this + problem in a different way, and minimizes its impact for a specific + workload. + </para> + + <para> + Some solutions deal with synchronization by allowing only one + server to modify the data. Servers that can modify data are + called read/write, <firstterm>master</firstterm> or <firstterm>primary</firstterm> servers. + Servers that track changes in the primary are called <firstterm>standby</firstterm> + or <firstterm>secondary</firstterm> servers. A standby server that cannot be connected + to until it is promoted to a primary server is called a <firstterm>warm + standby</firstterm> server, and one that can accept connections and serves read-only + queries is called a <firstterm>hot standby</firstterm> server. + </para> + + <para> + Some solutions are synchronous, + meaning that a data-modifying transaction is not considered + committed until all servers have committed the transaction. This + guarantees that a failover will not lose any data and that all + load-balanced servers will return consistent results no matter + which server is queried. In contrast, asynchronous solutions allow some + delay between the time of a commit and its propagation to the other servers, + opening the possibility that some transactions might be lost in + the switch to a backup server, and that load balanced servers + might return slightly stale results. Asynchronous communication + is used when synchronous would be too slow. + </para> + + <para> + Solutions can also be categorized by their granularity. Some solutions + can deal only with an entire database server, while others allow control + at the per-table or per-database level. + </para> + + <para> + Performance must be considered in any choice. 
There is usually a + trade-off between functionality and + performance. For example, a fully synchronous solution over a slow + network might cut performance by more than half, while an asynchronous + one might have a minimal performance impact. + </para> + + <para> + The remainder of this section outlines various failover, replication, + and load balancing solutions. + </para> + + <sect1 id="different-replication-solutions"> + <title>Comparison of Different Solutions</title> + + <variablelist> + + <varlistentry> + <term>Shared Disk Failover</term> + <listitem> + + <para> + Shared disk failover avoids synchronization overhead by having only one + copy of the database. It uses a single disk array that is shared by + multiple servers. If the main database server fails, the standby server + is able to mount and start the database as though it were recovering from + a database crash. This allows rapid failover with no data loss. + </para> + + <para> + Shared hardware functionality is common in network storage devices. + Using a network file system is also possible, though care must be + taken that the file system has full <acronym>POSIX</acronym> behavior (see <xref + linkend="creating-cluster-nfs"/>). One significant limitation of this + method is that if the shared disk array fails or becomes corrupt, the + primary and standby servers are both nonfunctional. Another issue is + that the standby server should never access the shared storage while + the primary server is running. + </para> + + </listitem> + </varlistentry> + + <varlistentry> + <term>File System (Block Device) Replication</term> + <listitem> + + <para> + A modified version of shared hardware functionality is file system + replication, where all changes to a file system are mirrored to a file + system residing on another computer. The only restriction is that + the mirroring must be done in a way that ensures the standby server + has a consistent copy of the file system — specifically, writes + to the standby must be done in the same order as those on the primary. + <productname>DRBD</productname> is a popular file system replication solution + for Linux. + </para> + +<!-- +https://forge.continuent.org/pipermail/sequoia/2006-November/004070.html + +Oracle RAC is a shared disk approach and just send cache invalidations +to other nodes but not actual data. As the disk is shared, data is +only committed once to disk and there is a distributed locking +protocol to make nodes agree on a serializable transactional order. +--> + + </listitem> + </varlistentry> + + <varlistentry> + <term>Write-Ahead Log Shipping</term> + <listitem> + + <para> + Warm and hot standby servers can be kept current by reading a + stream of write-ahead log (<acronym>WAL</acronym>) + records. If the main server fails, the standby contains + almost all of the data of the main server, and can be quickly + made the new primary database server. This can be synchronous or + asynchronous and can only be done for the entire database server. + </para> + <para> + A standby server can be implemented using file-based log shipping + (<xref linkend="warm-standby"/>) or streaming replication (see + <xref linkend="streaming-replication"/>), or a combination of both. For + information on hot standby, see <xref linkend="hot-standby"/>. + </para> + </listitem> + </varlistentry> + + <varlistentry> + <term>Logical Replication</term> + <listitem> + <para> + Logical replication allows a database server to send a stream of data + modifications to another server. 
<productname>PostgreSQL</productname> + logical replication constructs a stream of logical data modifications + from the WAL. Logical replication allows replication of data changes on + a per-table basis. In addition, a server that is publishing its own + changes can also subscribe to changes from another server, allowing data + to flow in multiple directions. For more information on logical + replication, see <xref linkend="logical-replication"/>. Through the + logical decoding interface (<xref linkend="logicaldecoding"/>), + third-party extensions can also provide similar functionality. + </para> + </listitem> + </varlistentry> + + <varlistentry> + <term>Trigger-Based Primary-Standby Replication</term> + <listitem> + + <para> + A trigger-based replication setup typically funnels data modification + queries to a designated primary server. Operating on a per-table basis, + the primary server sends data changes (typically) asynchronously to the + standby servers. Standby servers can answer queries while the primary is + running, and may allow some local data changes or write activity. This + form of replication is often used for offloading large analytical or data + warehouse queries. + </para> + + <para> + <productname>Slony-I</productname> is an example of this type of + replication, with per-table granularity, and support for multiple standby + servers. Because it updates the standby server asynchronously (in + batches), there is possible data loss during fail over. + </para> + </listitem> + </varlistentry> + + <varlistentry> + <term>SQL-Based Replication Middleware</term> + <listitem> + + <para> + With SQL-based replication middleware, a program intercepts + every SQL query and sends it to one or all servers. Each server + operates independently. Read-write queries must be sent to all servers, + so that every server receives any changes. But read-only queries can be + sent to just one server, allowing the read workload to be distributed + among them. + </para> + + <para> + If queries are simply broadcast unmodified, functions like + <function>random()</function>, <function>CURRENT_TIMESTAMP</function>, and + sequences can have different values on different servers. + This is because each server operates independently, and because + SQL queries are broadcast rather than actual data changes. If + this is unacceptable, either the middleware or the application + must determine such values from a single source and then use those + values in write queries. Care must also be taken that all + transactions either commit or abort on all servers, perhaps + using two-phase commit (<xref linkend="sql-prepare-transaction"/> + and <xref linkend="sql-commit-prepared"/>). + <productname>Pgpool-II</productname> and <productname>Continuent Tungsten</productname> + are examples of this type of replication. + </para> + </listitem> + </varlistentry> + + <varlistentry> + <term>Asynchronous Multimaster Replication</term> + <listitem> + + <para> + For servers that are not regularly connected or have slow + communication links, like laptops or + remote servers, keeping data consistent among servers is a + challenge. Using asynchronous multimaster replication, each + server works independently, and periodically communicates with + the other servers to identify conflicting transactions. The + conflicts can be resolved by users or conflict resolution rules. + Bucardo is an example of this type of replication. 
+ </para> + </listitem> + </varlistentry> + + <varlistentry> + <term>Synchronous Multimaster Replication</term> + <listitem> + + <para> + In synchronous multimaster replication, each server can accept + write requests, and modified data is transmitted from the + original server to every other server before each transaction + commits. Heavy write activity can cause excessive locking and + commit delays, leading to poor performance. Read requests can + be sent to any server. Some implementations use shared disk + to reduce the communication overhead. Synchronous multimaster + replication is best for mostly read workloads, though its big + advantage is that any server can accept write requests — + there is no need to partition workloads between primary and + standby servers, and because the data changes are sent from one + server to another, there is no problem with non-deterministic + functions like <function>random()</function>. + </para> + + <para> + <productname>PostgreSQL</productname> does not offer this type of replication, + though <productname>PostgreSQL</productname> two-phase commit (<xref + linkend="sql-prepare-transaction"/> and <xref + linkend="sql-commit-prepared"/>) + can be used to implement this in application code or middleware. + </para> + </listitem> + </varlistentry> + + </variablelist> + + <para> + <xref linkend="high-availability-matrix"/> summarizes + the capabilities of the various solutions listed above. + </para> + + <table id="high-availability-matrix"> + <title>High Availability, Load Balancing, and Replication Feature Matrix</title> + <tgroup cols="9"> + <colspec colname="col1" colwidth="1.1*"/> + <colspec colname="col2" colwidth="1*"/> + <colspec colname="col3" colwidth="1*"/> + <colspec colname="col4" colwidth="1*"/> + <colspec colname="col5" colwidth="1*"/> + <colspec colname="col6" colwidth="1*"/> + <colspec colname="col7" colwidth="1*"/> + <colspec colname="col8" colwidth="1*"/> + <colspec colname="col9" colwidth="1*"/> + <thead> + <row> + <entry>Feature</entry> + <entry>Shared Disk</entry> + <entry>File System Repl.</entry> + <entry>Write-Ahead Log Shipping</entry> + <entry>Logical Repl.</entry> + <entry>Trigger-&zwsp;Based Repl.</entry> + <entry>SQL Repl. Middle-ware</entry> + <entry>Async. MM Repl.</entry> + <entry>Sync. MM Repl.</entry> + </row> + </thead> + + <tbody> + + <row> + <entry>Popular examples</entry> + <entry align="center">NAS</entry> + <entry align="center">DRBD</entry> + <entry align="center">built-in streaming repl.</entry> + <entry align="center">built-in logical repl., pglogical</entry> + <entry align="center">Londiste, Slony</entry> + <entry align="center">pgpool-II</entry> + <entry align="center">Bucardo</entry> + <entry align="center"></entry> + </row> + + <row> + <entry>Comm. 
method</entry> + <entry align="center">shared disk</entry> + <entry align="center">disk blocks</entry> + <entry align="center">WAL</entry> + <entry align="center">logical decoding</entry> + <entry align="center">table rows</entry> + <entry align="center">SQL</entry> + <entry align="center">table rows</entry> + <entry align="center">table rows and row locks</entry> + </row> + + <row> + <entry>No special hardware required</entry> + <entry align="center"></entry> + <entry align="center">•</entry> + <entry align="center">•</entry> + <entry align="center">•</entry> + <entry align="center">•</entry> + <entry align="center">•</entry> + <entry align="center">•</entry> + <entry align="center">•</entry> + </row> + + <row> + <entry>Allows multiple primary servers</entry> + <entry align="center"></entry> + <entry align="center"></entry> + <entry align="center"></entry> + <entry align="center">•</entry> + <entry align="center"></entry> + <entry align="center">•</entry> + <entry align="center">•</entry> + <entry align="center">•</entry> + </row> + + <row> + <entry>No overhead on primary</entry> + <entry align="center">•</entry> + <entry align="center"></entry> + <entry align="center">•</entry> + <entry align="center">•</entry> + <entry align="center"></entry> + <entry align="center">•</entry> + <entry align="center"></entry> + <entry align="center"></entry> + </row> + + <row> + <entry>No waiting for multiple servers</entry> + <entry align="center">•</entry> + <entry align="center"></entry> + <entry align="center">with sync off</entry> + <entry align="center">with sync off</entry> + <entry align="center">•</entry> + <entry align="center"></entry> + <entry align="center">•</entry> + <entry align="center"></entry> + </row> + + <row> + <entry>Primary failure will never lose data</entry> + <entry align="center">•</entry> + <entry align="center">•</entry> + <entry align="center">with sync on</entry> + <entry align="center">with sync on</entry> + <entry align="center"></entry> + <entry align="center">•</entry> + <entry align="center"></entry> + <entry align="center">•</entry> + </row> + + <row> + <entry>Replicas accept read-only queries</entry> + <entry align="center"></entry> + <entry align="center"></entry> + <entry align="center">with hot standby</entry> + <entry align="center">•</entry> + <entry align="center">•</entry> + <entry align="center">•</entry> + <entry align="center">•</entry> + <entry align="center">•</entry> + </row> + + <row> + <entry>Per-table granularity</entry> + <entry align="center"></entry> + <entry align="center"></entry> + <entry align="center"></entry> + <entry align="center">•</entry> + <entry align="center">•</entry> + <entry align="center"></entry> + <entry align="center">•</entry> + <entry align="center">•</entry> + </row> + + <row> + <entry>No conflict resolution necessary</entry> + <entry align="center">•</entry> + <entry align="center">•</entry> + <entry align="center">•</entry> + <entry align="center"></entry> + <entry align="center">•</entry> + <entry align="center">•</entry> + <entry align="center"></entry> + <entry align="center">•</entry> + </row> + + </tbody> + </tgroup> + </table> + + <para> + There are a few solutions that do not fit into the above categories: + </para> + + <variablelist> + + <varlistentry> + <term>Data Partitioning</term> + <listitem> + + <para> + Data partitioning splits tables into data sets. Each set can + be modified by only one server. For example, data can be + partitioned by offices, e.g., London and Paris, with a server + in each office. 
If queries combining London and Paris data + are necessary, an application can query both servers, or + primary/standby replication can be used to keep a read-only copy + of the other office's data on each server. + </para> + </listitem> + </varlistentry> + + <varlistentry> + <term>Multiple-Server Parallel Query Execution</term> + <listitem> + + <para> + Many of the above solutions allow multiple servers to handle multiple + queries, but none allow a single query to use multiple servers to + complete faster. This solution allows multiple servers to work + concurrently on a single query. It is usually accomplished by + splitting the data among servers and having each server execute its + part of the query and return results to a central server where they + are combined and returned to the user. This can be implemented using the + <productname>PL/Proxy</productname> tool set. + </para> + + </listitem> + </varlistentry> + + </variablelist> + + <para> + It should also be noted that because <productname>PostgreSQL</productname> + is open source and easily extended, a number of companies have + taken <productname>PostgreSQL</productname> and created commercial + closed-source solutions with unique failover, replication, and load + balancing capabilities. These are not discussed here. + </para> + + </sect1> + + + <sect1 id="warm-standby"> + <title>Log-Shipping Standby Servers</title> + + + <para> + Continuous archiving can be used to create a <firstterm>high + availability</firstterm> (HA) cluster configuration with one or more + <firstterm>standby servers</firstterm> ready to take over operations if the + primary server fails. This capability is widely referred to as + <firstterm>warm standby</firstterm> or <firstterm>log shipping</firstterm>. + </para> + + <para> + The primary and standby server work together to provide this capability, + though the servers are only loosely coupled. The primary server operates + in continuous archiving mode, while each standby server operates in + continuous recovery mode, reading the WAL files from the primary. No + changes to the database tables are required to enable this capability, + so it offers low administration overhead compared to some other + replication solutions. This configuration also has relatively low + performance impact on the primary server. + </para> + + <para> + Directly moving WAL records from one database server to another + is typically described as log shipping. <productname>PostgreSQL</productname> + implements file-based log shipping by transferring WAL records + one file (WAL segment) at a time. WAL files (16MB) can be + shipped easily and cheaply over any distance, whether it be to an + adjacent system, another system at the same site, or another system on + the far side of the globe. The bandwidth required for this technique + varies according to the transaction rate of the primary server. + Record-based log shipping is more granular and streams WAL changes + incrementally over a network connection (see <xref + linkend="streaming-replication"/>). + </para> + + <para> + It should be noted that log shipping is asynchronous, i.e., the WAL + records are shipped after transaction commit. As a result, there is a + window for data loss should the primary server suffer a catastrophic + failure; transactions not yet shipped will be lost. The size of the + data loss window in file-based log shipping can be limited by use of the + <varname>archive_timeout</varname> parameter, which can be set as low + as a few seconds. 
However such a low setting will + substantially increase the bandwidth required for file shipping. + Streaming replication (see <xref linkend="streaming-replication"/>) + allows a much smaller window of data loss. + </para> + + <para> + Recovery performance is sufficiently good that the standby will + typically be only moments away from full + availability once it has been activated. As a result, this is called + a warm standby configuration which offers high + availability. Restoring a server from an archived base backup and + rollforward will take considerably longer, so that technique only + offers a solution for disaster recovery, not high availability. + A standby server can also be used for read-only queries, in which case + it is called a <firstterm>hot standby</firstterm> server. See + <xref linkend="hot-standby"/> for more information. + </para> + + <indexterm zone="high-availability"> + <primary>warm standby</primary> + </indexterm> + + <indexterm zone="high-availability"> + <primary>PITR standby</primary> + </indexterm> + + <indexterm zone="high-availability"> + <primary>standby server</primary> + </indexterm> + + <indexterm zone="high-availability"> + <primary>log shipping</primary> + </indexterm> + + <indexterm zone="high-availability"> + <primary>witness server</primary> + </indexterm> + + <indexterm zone="high-availability"> + <primary>STONITH</primary> + </indexterm> + + <sect2 id="standby-planning"> + <title>Planning</title> + + <para> + It is usually wise to create the primary and standby servers + so that they are as similar as possible, at least from the + perspective of the database server. In particular, the path names + associated with tablespaces will be passed across unmodified, so both + primary and standby servers must have the same mount paths for + tablespaces if that feature is used. Keep in mind that if + <xref linkend="sql-createtablespace"/> + is executed on the primary, any new mount point needed for it must + be created on the primary and all standby servers before the command + is executed. Hardware need not be exactly the same, but experience shows + that maintaining two identical systems is easier than maintaining two + dissimilar ones over the lifetime of the application and system. + In any case the hardware architecture must be the same — shipping + from, say, a 32-bit to a 64-bit system will not work. + </para> + + <para> + In general, log shipping between servers running different major + <productname>PostgreSQL</productname> release + levels is not possible. It is the policy of the PostgreSQL Global + Development Group not to make changes to disk formats during minor release + upgrades, so it is likely that running different minor release levels + on primary and standby servers will work successfully. However, no + formal support for that is offered and you are advised to keep primary + and standby servers at the same release level as much as possible. + When updating to a new minor release, the safest policy is to update + the standby servers first — a new minor release is more likely + to be able to read WAL files from a previous minor release than vice + versa. 
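+ As a quick check (the host names here are placeholders), the release level of
+ each server can be compared directly:
+<programlisting>
+psql -h primary.example.com -Atc 'SELECT version();'
+psql -h standby.example.com -Atc 'SELECT version();'
+</programlisting>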
+ </para> + + </sect2> + + <sect2 id="standby-server-operation" xreflabel="Standby Server Operation"> + <title>Standby Server Operation</title> + + <para> + A server enters standby mode if a + <anchor id="file-standby-signal" xreflabel="standby.signal"/> + <filename>standby.signal</filename> + <indexterm><primary><filename>standby.signal</filename></primary></indexterm> + file exists in the data directory when the server is started. + </para> + + <para> + In standby mode, the server continuously applies WAL received from the + primary server. The standby server can read WAL from a WAL archive + (see <xref linkend="guc-restore-command"/>) or directly from the primary + over a TCP connection (streaming replication). The standby server will + also attempt to restore any WAL found in the standby cluster's + <filename>pg_wal</filename> directory. That typically happens after a server + restart, when the standby replays again WAL that was streamed from the + primary before the restart, but you can also manually copy files to + <filename>pg_wal</filename> at any time to have them replayed. + </para> + + <para> + At startup, the standby begins by restoring all WAL available in the + archive location, calling <varname>restore_command</varname>. Once it + reaches the end of WAL available there and <varname>restore_command</varname> + fails, it tries to restore any WAL available in the <filename>pg_wal</filename> directory. + If that fails, and streaming replication has been configured, the + standby tries to connect to the primary server and start streaming WAL + from the last valid record found in archive or <filename>pg_wal</filename>. If that fails + or streaming replication is not configured, or if the connection is + later disconnected, the standby goes back to step 1 and tries to + restore the file from the archive again. This loop of retries from the + archive, <filename>pg_wal</filename>, and via streaming replication goes on until the server + is stopped or is promoted. + </para> + + <para> + Standby mode is exited and the server switches to normal operation + when <command>pg_ctl promote</command> is run, or + <function>pg_promote()</function> is called. Before failover, + any WAL immediately available in the archive or in <filename>pg_wal</filename> + will be restored, but no attempt is made to connect to the primary. + </para> + </sect2> + + <sect2 id="preparing-primary-for-standby"> + <title>Preparing the Primary for Standby Servers</title> + + <para> + Set up continuous archiving on the primary to an archive directory + accessible from the standby, as described + in <xref linkend="continuous-archiving"/>. The archive location should be + accessible from the standby even when the primary is down, i.e., it should + reside on the standby server itself or another trusted server, not on + the primary server. + </para> + + <para> + If you want to use streaming replication, set up authentication on the + primary server to allow replication connections from the standby + server(s); that is, create a role and provide a suitable entry or + entries in <filename>pg_hba.conf</filename> with the database field set to + <literal>replication</literal>. Also ensure <varname>max_wal_senders</varname> is set + to a sufficiently large value in the configuration file of the primary + server. If replication slots will be used, + ensure that <varname>max_replication_slots</varname> is set sufficiently + high as well. 
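+ As an illustrative sketch only (the archive path and values are placeholders,
+ and the defaults for several of these settings may already be adequate), the
+ primary-side configuration might look like this:
+<programlisting>
+# postgresql.conf on the primary
+wal_level = replica
+archive_mode = on
+archive_command = 'cp %p /mnt/server/archive/%f'
+max_wal_senders = 10
+max_replication_slots = 10
+</programlisting>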
+ </para> + + <para> + Take a base backup as described in <xref linkend="backup-base-backup"/> + to bootstrap the standby server. + </para> + </sect2> + + <sect2 id="standby-server-setup"> + <title>Setting Up a Standby Server</title> + + <para> + To set up the standby server, restore the base backup taken from primary + server (see <xref linkend="backup-pitr-recovery"/>). Create a file + <link linkend="file-standby-signal"><filename>standby.signal</filename></link><indexterm><primary>standby.signal</primary></indexterm> + in the standby's cluster data + directory. Set <xref linkend="guc-restore-command"/> to a simple command to copy files from + the WAL archive. If you plan to have multiple standby servers for high + availability purposes, make sure that <varname>recovery_target_timeline</varname> is set to + <literal>latest</literal> (the default), to make the standby server follow the timeline change + that occurs at failover to another standby. + </para> + + <note> + <para> + <xref linkend="guc-restore-command"/> should return immediately + if the file does not exist; the server will retry the command again if + necessary. + </para> + </note> + + <para> + If you want to use streaming replication, fill in + <xref linkend="guc-primary-conninfo"/> with a libpq connection string, including + the host name (or IP address) and any additional details needed to + connect to the primary server. If the primary needs a password for + authentication, the password needs to be specified in + <xref linkend="guc-primary-conninfo"/> as well. + </para> + + <para> + If you're setting up the standby server for high availability purposes, + set up WAL archiving, connections and authentication like the primary + server, because the standby server will work as a primary server after + failover. + </para> + + <para> + If you're using a WAL archive, its size can be minimized using the <xref + linkend="guc-archive-cleanup-command"/> parameter to remove files that are no + longer required by the standby server. + The <application>pg_archivecleanup</application> utility is designed specifically to + be used with <varname>archive_cleanup_command</varname> in typical single-standby + configurations, see <xref linkend="pgarchivecleanup"/>. + Note however, that if you're using the archive for backup purposes, you + need to retain files needed to recover from at least the latest base + backup, even if they're no longer needed by the standby. + </para> + + <para> + A simple example of configuration is: +<programlisting> +primary_conninfo = 'host=192.168.1.50 port=5432 user=foo password=foopass options=''-c wal_sender_timeout=5000''' +restore_command = 'cp /path/to/archive/%f %p' +archive_cleanup_command = 'pg_archivecleanup /path/to/archive %r' +</programlisting> + </para> + + <para> + You can have any number of standby servers, but if you use streaming + replication, make sure you set <varname>max_wal_senders</varname> high enough in + the primary to allow them to be connected simultaneously. + </para> + + </sect2> + + <sect2 id="streaming-replication"> + <title>Streaming Replication</title> + + <indexterm zone="high-availability"> + <primary>Streaming Replication</primary> + </indexterm> + + <para> + Streaming replication allows a standby server to stay more up-to-date + than is possible with file-based log shipping. The standby connects + to the primary, which streams WAL records to the standby as they're + generated, without waiting for the WAL file to be filled. 
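+ As a sketch (host, user, and target directory are placeholders), such a
+ streaming standby is often bootstrapped with
+ <application>pg_basebackup</application>, whose <option>-R</option> option
+ also creates <filename>standby.signal</filename> and fills in
+ <varname>primary_conninfo</varname>:
+<programlisting>
+pg_basebackup -h 192.168.1.50 -U foo -D /path/to/standby/data -R -X stream -P
+</programlisting>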
+ </para> + + <para> + Streaming replication is asynchronous by default + (see <xref linkend="synchronous-replication"/>), in which case there is + a small delay between committing a transaction in the primary and the + changes becoming visible in the standby. This delay is however much + smaller than with file-based log shipping, typically under one second + assuming the standby is powerful enough to keep up with the load. With + streaming replication, <varname>archive_timeout</varname> is not required to + reduce the data loss window. + </para> + + <para> + If you use streaming replication without file-based continuous + archiving, the server might recycle old WAL segments before the standby + has received them. If this occurs, the standby will need to be + reinitialized from a new base backup. You can avoid this by setting + <varname>wal_keep_size</varname> to a value large enough to ensure that + WAL segments are not recycled too early, or by configuring a replication + slot for the standby. If you set up a WAL archive that's accessible from + the standby, these solutions are not required, since the standby can + always use the archive to catch up provided it retains enough segments. + </para> + + <para> + To use streaming replication, set up a file-based log-shipping standby + server as described in <xref linkend="warm-standby"/>. The step that + turns a file-based log-shipping standby into streaming replication + standby is setting the <varname>primary_conninfo</varname> setting + to point to the primary server. Set + <xref linkend="guc-listen-addresses"/> and authentication options + (see <filename>pg_hba.conf</filename>) on the primary so that the standby server + can connect to the <literal>replication</literal> pseudo-database on the primary + server (see <xref linkend="streaming-replication-authentication"/>). + </para> + + <para> + On systems that support the keepalive socket option, setting + <xref linkend="guc-tcp-keepalives-idle"/>, + <xref linkend="guc-tcp-keepalives-interval"/> and + <xref linkend="guc-tcp-keepalives-count"/> helps the primary promptly + notice a broken connection. + </para> + + <para> + Set the maximum number of concurrent connections from the standby servers + (see <xref linkend="guc-max-wal-senders"/> for details). + </para> + + <para> + When the standby is started and <varname>primary_conninfo</varname> is set + correctly, the standby will connect to the primary after replaying all + WAL files available in the archive. If the connection is established + successfully, you will see a <literal>walreceiver</literal> in the standby, and + a corresponding <literal>walsender</literal> process in the primary. + </para> + + <sect3 id="streaming-replication-authentication"> + <title>Authentication</title> + <para> + It is very important that the access privileges for replication be set up + so that only trusted users can read the WAL stream, because it is + easy to extract privileged information from it. Standby servers must + authenticate to the primary as an account that has the + <literal>REPLICATION</literal> privilege or a superuser. It is + recommended to create a dedicated user account with + <literal>REPLICATION</literal> and <literal>LOGIN</literal> + privileges for replication. While <literal>REPLICATION</literal> + privilege gives very high permissions, it does not allow the user to + modify any data on the primary system, which the + <literal>SUPERUSER</literal> privilege does. 
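+ For example, such a dedicated role could be created like this (the role name
+ and password are placeholders, matching the examples used below):
+<programlisting>
+CREATE ROLE foo WITH REPLICATION LOGIN PASSWORD 'foopass';
+</programlisting>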
+ </para> + + <para> + Client authentication for replication is controlled by a + <filename>pg_hba.conf</filename> record specifying <literal>replication</literal> in the + <replaceable>database</replaceable> field. For example, if the standby is running on + host IP <literal>192.168.1.100</literal> and the account name for replication + is <literal>foo</literal>, the administrator can add the following line to the + <filename>pg_hba.conf</filename> file on the primary: + +<programlisting> +# Allow the user "foo" from host 192.168.1.100 to connect to the primary +# as a replication standby if the user's password is correctly supplied. +# +# TYPE DATABASE USER ADDRESS METHOD +host replication foo 192.168.1.100/32 md5 +</programlisting> + </para> + <para> + The host name and port number of the primary, connection user name, + and password are specified in the <xref linkend="guc-primary-conninfo"/>. + The password can also be set in the <filename>~/.pgpass</filename> file on the + standby (specify <literal>replication</literal> in the <replaceable>database</replaceable> + field). + For example, if the primary is running on host IP <literal>192.168.1.50</literal>, + port <literal>5432</literal>, the account name for replication is + <literal>foo</literal>, and the password is <literal>foopass</literal>, the administrator + can add the following line to the <filename>postgresql.conf</filename> file on the + standby: + +<programlisting> +# The standby connects to the primary that is running on host 192.168.1.50 +# and port 5432 as the user "foo" whose password is "foopass". +primary_conninfo = 'host=192.168.1.50 port=5432 user=foo password=foopass' +</programlisting> + </para> + </sect3> + + <sect3 id="streaming-replication-monitoring"> + <title>Monitoring</title> + <para> + An important health indicator of streaming replication is the amount + of WAL records generated in the primary, but not yet applied in the + standby. You can calculate this lag by comparing the current WAL write + location on the primary with the last WAL location received by the + standby. These locations can be retrieved using + <function>pg_current_wal_lsn</function> on the primary and + <function>pg_last_wal_receive_lsn</function> on the standby, + respectively (see <xref linkend="functions-admin-backup-table"/> and + <xref linkend="functions-recovery-info-table"/> for details). + The last WAL receive location in the standby is also displayed in the + process status of the WAL receiver process, displayed using the + <command>ps</command> command (see <xref linkend="monitoring-ps"/> for details). + </para> + <para> + You can retrieve a list of WAL sender processes via the + <link linkend="monitoring-pg-stat-replication-view"><structname> + pg_stat_replication</structname></link> view. Large differences between + <function>pg_current_wal_lsn</function> and the view's <literal>sent_lsn</literal> field + might indicate that the primary server is under heavy load, while + differences between <literal>sent_lsn</literal> and + <function>pg_last_wal_receive_lsn</function> on the standby might indicate + network delay, or that the standby is under heavy load. + </para> + <para> + On a hot standby, the status of the WAL receiver process can be retrieved + via the <link linkend="monitoring-pg-stat-wal-receiver-view"> + <structname>pg_stat_wal_receiver</structname></link> view. 
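+ For instance, the following query on the standby shows the receiver's status
+ and WAL positions (column names as in recent releases):
+<programlisting>
+SELECT status, sender_host, sender_port, flushed_lsn, latest_end_lsn
+FROM pg_stat_wal_receiver;
+</programlisting>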
A large + difference between <function>pg_last_wal_replay_lsn</function> and the + view's <literal>flushed_lsn</literal> indicates that WAL is being + received faster than it can be replayed. + </para> + </sect3> + </sect2> + + <sect2 id="streaming-replication-slots"> + <title>Replication Slots</title> + <indexterm> + <primary>replication slot</primary> + <secondary>streaming replication</secondary> + </indexterm> + <para> + Replication slots provide an automated way to ensure that the primary does + not remove WAL segments until they have been received by all standbys, + and that the primary does not remove rows which could cause a + <link linkend="hot-standby-conflict">recovery conflict</link> even when the + standby is disconnected. + </para> + <para> + In lieu of using replication slots, it is possible to prevent the removal + of old WAL segments using <xref linkend="guc-wal-keep-size"/>, or by + storing the segments in an archive using + <xref linkend="guc-archive-command"/> or <xref linkend="guc-archive-library"/>. + However, these methods often result in retaining more WAL segments than + required, whereas replication slots retain only the number of segments + known to be needed. On the other hand, replication slots can retain so + many WAL segments that they fill up the space allocated + for <literal>pg_wal</literal>; + <xref linkend="guc-max-slot-wal-keep-size"/> limits the size of WAL files + retained by replication slots. + </para> + <para> + Similarly, <xref linkend="guc-hot-standby-feedback"/> on its own, without + also using a replication slot, provides protection against relevant rows + being removed by vacuum, but provides no protection during any time period + when the standby is not connected. Replication slots overcome these + disadvantages. + </para> + <sect3 id="streaming-replication-slots-manipulation"> + <title>Querying and Manipulating Replication Slots</title> + <para> + Each replication slot has a name, which can contain lower-case letters, + numbers, and the underscore character. + </para> + <para> + Existing replication slots and their state can be seen in the + <link linkend="view-pg-replication-slots"><structname>pg_replication_slots</structname></link> + view. + </para> + <para> + Slots can be created and dropped either via the streaming replication + protocol (see <xref linkend="protocol-replication"/>) or via SQL + functions (see <xref linkend="functions-replication"/>). + </para> + </sect3> + <sect3 id="streaming-replication-slots-config"> + <title>Configuration Example</title> + <para> + You can create a replication slot like this: +<programlisting> +postgres=# SELECT * FROM pg_create_physical_replication_slot('node_a_slot'); + slot_name | lsn +-------------+----- + node_a_slot | + +postgres=# SELECT slot_name, slot_type, active FROM pg_replication_slots; + slot_name | slot_type | active +-------------+-----------+-------- + node_a_slot | physical | f +(1 row) +</programlisting> + To configure the standby to use this slot, <varname>primary_slot_name</varname> + should be configured on the standby. 
Here is a simple example: +<programlisting> +primary_conninfo = 'host=192.168.1.50 port=5432 user=foo password=foopass' +primary_slot_name = 'node_a_slot' +</programlisting> + </para> + </sect3> + </sect2> + + <sect2 id="cascading-replication"> + <title>Cascading Replication</title> + + <indexterm zone="high-availability"> + <primary>Cascading Replication</primary> + </indexterm> + + <para> + The cascading replication feature allows a standby server to accept replication + connections and stream WAL records to other standbys, acting as a relay. + This can be used to reduce the number of direct connections to the primary + and also to minimize inter-site bandwidth overheads. + </para> + + <para> + A standby acting as both a receiver and a sender is known as a cascading + standby. Standbys that are more directly connected to the primary are known + as upstream servers, while those standby servers further away are downstream + servers. Cascading replication does not place limits on the number or + arrangement of downstream servers, though each standby connects to only + one upstream server which eventually links to a single primary server. + </para> + + <para> + A cascading standby sends not only WAL records received from the + primary but also those restored from the archive. So even if the replication + connection in some upstream connection is terminated, streaming replication + continues downstream for as long as new WAL records are available. + </para> + + <para> + Cascading replication is currently asynchronous. Synchronous replication + (see <xref linkend="synchronous-replication"/>) settings have no effect on + cascading replication at present. + </para> + + <para> + Hot standby feedback propagates upstream, whatever the cascaded arrangement. + </para> + + <para> + If an upstream standby server is promoted to become the new primary, downstream + servers will continue to stream from the new primary if + <varname>recovery_target_timeline</varname> is set to <literal>'latest'</literal> (the default). + </para> + + <para> + To use cascading replication, set up the cascading standby so that it can + accept replication connections (that is, set + <xref linkend="guc-max-wal-senders"/> and <xref linkend="guc-hot-standby"/>, + and configure + <link linkend="auth-pg-hba-conf">host-based authentication</link>). + You will also need to set <varname>primary_conninfo</varname> in the downstream + standby to point to the cascading standby. + </para> + </sect2> + + <sect2 id="synchronous-replication"> + <title>Synchronous Replication</title> + + <indexterm zone="high-availability"> + <primary>Synchronous Replication</primary> + </indexterm> + + <para> + <productname>PostgreSQL</productname> streaming replication is asynchronous by + default. If the primary server + crashes then some transactions that were committed may not have been + replicated to the standby server, causing data loss. The amount + of data loss is proportional to the replication delay at the time of + failover. + </para> + + <para> + Synchronous replication offers the ability to confirm that all changes + made by a transaction have been transferred to one or more synchronous + standby servers. This extends that standard level of durability + offered by a transaction commit. This level of protection is referred + to as 2-safe replication in computer science theory, and group-1-safe + (group-safe and 1-safe) when <varname>synchronous_commit</varname> is set to + <literal>remote_write</literal>. 
+ </para> + + <para> + When requesting synchronous replication, each commit of a + write transaction will wait until confirmation is + received that the commit has been written to the write-ahead log on disk + of both the primary and standby server. The only possibility that data + can be lost is if both the primary and the standby suffer crashes at the + same time. This can provide a much higher level of durability, though only + if the sysadmin is cautious about the placement and management of the two + servers. Waiting for confirmation increases the user's confidence that the + changes will not be lost in the event of server crashes but it also + necessarily increases the response time for the requesting transaction. + The minimum wait time is the round-trip time between primary and standby. + </para> + + <para> + Read-only transactions and transaction rollbacks need not wait for + replies from standby servers. Subtransaction commits do not wait for + responses from standby servers, only top-level commits. Long + running actions such as data loading or index building do not wait + until the very final commit message. All two-phase commit actions + require commit waits, including both prepare and commit. + </para> + + <para> + A synchronous standby can be a physical replication standby or a logical + replication subscriber. It can also be any other physical or logical WAL + replication stream consumer that knows how to send the appropriate + feedback messages. Besides the built-in physical and logical replication + systems, this includes special programs such + as <command>pg_receivewal</command> and <command>pg_recvlogical</command> + as well as some third-party replication systems and custom programs. + Check the respective documentation for details on synchronous replication + support. + </para> + + <sect3 id="synchronous-replication-config"> + <title>Basic Configuration</title> + + <para> + Once streaming replication has been configured, configuring synchronous + replication requires only one additional configuration step: + <xref linkend="guc-synchronous-standby-names"/> must be set to + a non-empty value. <varname>synchronous_commit</varname> must also be set to + <literal>on</literal>, but since this is the default value, typically no change is + required. (See <xref linkend="runtime-config-wal-settings"/> and + <xref linkend="runtime-config-replication-primary"/>.) + This configuration will cause each commit to wait for + confirmation that the standby has written the commit record to durable + storage. + <varname>synchronous_commit</varname> can be set by individual + users, so it can be configured in the configuration file, for particular + users or databases, or dynamically by applications, in order to control + the durability guarantee on a per-transaction basis. + </para> + + <para> + After a commit record has been written to disk on the primary, the + WAL record is then sent to the standby. The standby sends reply + messages each time a new batch of WAL data is written to disk, unless + <varname>wal_receiver_status_interval</varname> is set to zero on the standby. + In the case that <varname>synchronous_commit</varname> is set to + <literal>remote_apply</literal>, the standby sends reply messages when the commit + record is replayed, making the transaction visible. 
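+ As a concrete sketch (the standby name <literal>s1</literal> is a
+ placeholder), the standby name the primary waits for is the
+ <varname>application_name</varname> given in the standby's
+ <varname>primary_conninfo</varname>:
+<programlisting>
+# On the primary (postgresql.conf):
+synchronous_standby_names = 'FIRST 1 (s1)'
+
+# On the standby, the application_name supplies the standby name:
+primary_conninfo = 'host=192.168.1.50 port=5432 user=foo password=foopass application_name=s1'
+</programlisting>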
+ If the standby is chosen as a synchronous standby, according to the setting + of <varname>synchronous_standby_names</varname> on the primary, the reply + messages from that standby will be considered along with those from other + synchronous standbys to decide when to release transactions waiting for + confirmation that the commit record has been received. These parameters + allow the administrator to specify which standby servers should be + synchronous standbys. Note that the configuration of synchronous + replication is mainly on the primary. Named standbys must be directly + connected to the primary; the primary knows nothing about downstream + standby servers using cascaded replication. + </para> + + <para> + Setting <varname>synchronous_commit</varname> to <literal>remote_write</literal> will + cause each commit to wait for confirmation that the standby has received + the commit record and written it out to its own operating system, but not + for the data to be flushed to disk on the standby. This + setting provides a weaker guarantee of durability than <literal>on</literal> + does: the standby could lose the data in the event of an operating system + crash, though not a <productname>PostgreSQL</productname> crash. + However, it's a useful setting in practice + because it can decrease the response time for the transaction. + Data loss could only occur if both the primary and the standby crash and + the database of the primary gets corrupted at the same time. + </para> + + <para> + Setting <varname>synchronous_commit</varname> to <literal>remote_apply</literal> will + cause each commit to wait until the current synchronous standbys report + that they have replayed the transaction, making it visible to user + queries. In simple cases, this allows for load balancing with causal + consistency. + </para> + + <para> + Users will stop waiting if a fast shutdown is requested. However, as + when using asynchronous replication, the server will not fully + shutdown until all outstanding WAL records are transferred to the currently + connected standby servers. + </para> + + </sect3> + + <sect3 id="synchronous-replication-multiple-standbys"> + <title>Multiple Synchronous Standbys</title> + + <para> + Synchronous replication supports one or more synchronous standby servers; + transactions will wait until all the standby servers which are considered + as synchronous confirm receipt of their data. The number of synchronous + standbys that transactions must wait for replies from is specified in + <varname>synchronous_standby_names</varname>. This parameter also specifies + a list of standby names and the method (<literal>FIRST</literal> and + <literal>ANY</literal>) to choose synchronous standbys from the listed ones. + </para> + <para> + The method <literal>FIRST</literal> specifies a priority-based synchronous + replication and makes transaction commits wait until their WAL records are + replicated to the requested number of synchronous standbys chosen based on + their priorities. The standbys whose names appear earlier in the list are + given higher priority and will be considered as synchronous. Other standby + servers appearing later in this list represent potential synchronous + standbys. If any of the current synchronous standbys disconnects for + whatever reason, it will be replaced immediately with the + next-highest-priority standby. 
+ </para> + <para> + An example of <varname>synchronous_standby_names</varname> for + a priority-based multiple synchronous standbys is: +<programlisting> +synchronous_standby_names = 'FIRST 2 (s1, s2, s3)' +</programlisting> + In this example, if four standby servers <literal>s1</literal>, <literal>s2</literal>, + <literal>s3</literal> and <literal>s4</literal> are running, the two standbys + <literal>s1</literal> and <literal>s2</literal> will be chosen as synchronous standbys + because their names appear early in the list of standby names. + <literal>s3</literal> is a potential synchronous standby and will take over + the role of synchronous standby when either of <literal>s1</literal> or + <literal>s2</literal> fails. <literal>s4</literal> is an asynchronous standby since + its name is not in the list. + </para> + <para> + The method <literal>ANY</literal> specifies a quorum-based synchronous + replication and makes transaction commits wait until their WAL records + are replicated to <emphasis>at least</emphasis> the requested number of + synchronous standbys in the list. + </para> + <para> + An example of <varname>synchronous_standby_names</varname> for + a quorum-based multiple synchronous standbys is: +<programlisting> +synchronous_standby_names = 'ANY 2 (s1, s2, s3)' +</programlisting> + In this example, if four standby servers <literal>s1</literal>, <literal>s2</literal>, + <literal>s3</literal> and <literal>s4</literal> are running, transaction commits will + wait for replies from at least any two standbys of <literal>s1</literal>, + <literal>s2</literal> and <literal>s3</literal>. <literal>s4</literal> is an asynchronous + standby since its name is not in the list. + </para> + <para> + The synchronous states of standby servers can be viewed using + the <structname>pg_stat_replication</structname> view. + </para> + </sect3> + + <sect3 id="synchronous-replication-performance"> + <title>Planning for Performance</title> + + <para> + Synchronous replication usually requires carefully planned and placed + standby servers to ensure applications perform acceptably. Waiting + doesn't utilize system resources, but transaction locks continue to be + held until the transfer is confirmed. As a result, incautious use of + synchronous replication will reduce performance for database + applications because of increased response times and higher contention. + </para> + + <para> + <productname>PostgreSQL</productname> allows the application developer + to specify the durability level required via replication. This can be + specified for the system overall, though it can also be specified for + specific users or connections, or even individual transactions. + </para> + + <para> + For example, an application workload might consist of: + 10% of changes are important customer details, while + 90% of changes are less important data that the business can more + easily survive if it is lost, such as chat messages between users. + </para> + + <para> + With synchronous replication options specified at the application level + (on the primary) we can offer synchronous replication for the most + important changes, without slowing down the bulk of the total workload. + Application level options are an important and practical tool for allowing + the benefits of synchronous replication for high performance applications. + </para> + + <para> + You should consider that the network bandwidth must be higher than + the rate of generation of WAL data. 
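+ As an illustration of the per-transaction and per-role control described
+ above (the table and role names here are hypothetical):
+<programlisting>
+-- Relax durability only for this transaction's low-value writes
+BEGIN;
+SET LOCAL synchronous_commit TO local;
+INSERT INTO chat_messages (sender, body) VALUES ('alice', 'hi');
+COMMIT;
+
+-- Or configure it once for a role that only writes low-value data
+ALTER ROLE chat_writer SET synchronous_commit = local;
+</programlisting>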
+ </para> + + </sect3> + + <sect3 id="synchronous-replication-ha"> + <title>Planning for High Availability</title> + + <para> + <varname>synchronous_standby_names</varname> specifies the number and + names of synchronous standbys that transaction commits made when + <varname>synchronous_commit</varname> is set to <literal>on</literal>, + <literal>remote_apply</literal> or <literal>remote_write</literal> will wait for + responses from. Such transaction commits may never be completed + if any one of synchronous standbys should crash. + </para> + + <para> + The best solution for high availability is to ensure you keep as many + synchronous standbys as requested. This can be achieved by naming multiple + potential synchronous standbys using <varname>synchronous_standby_names</varname>. + </para> + + <para> + In a priority-based synchronous replication, the standbys whose names + appear earlier in the list will be used as synchronous standbys. + Standbys listed after these will take over the role of synchronous standby + if one of current ones should fail. + </para> + + <para> + In a quorum-based synchronous replication, all the standbys appearing + in the list will be used as candidates for synchronous standbys. + Even if one of them should fail, the other standbys will keep performing + the role of candidates of synchronous standby. + </para> + + <para> + When a standby first attaches to the primary, it will not yet be properly + synchronized. This is described as <literal>catchup</literal> mode. Once + the lag between standby and primary reaches zero for the first time + we move to real-time <literal>streaming</literal> state. + The catch-up duration may be long immediately after the standby has + been created. If the standby is shut down, then the catch-up period + will increase according to the length of time the standby has been down. + The standby is only able to become a synchronous standby + once it has reached <literal>streaming</literal> state. + This state can be viewed using + the <structname>pg_stat_replication</structname> view. + </para> + + <para> + If primary restarts while commits are waiting for acknowledgment, those + waiting transactions will be marked fully committed once the primary + database recovers. + There is no way to be certain that all standbys have received all + outstanding WAL data at time of the crash of the primary. Some + transactions may not show as committed on the standby, even though + they show as committed on the primary. The guarantee we offer is that + the application will not receive explicit acknowledgment of the + successful commit of a transaction until the WAL data is known to be + safely received by all the synchronous standbys. + </para> + + <para> + If you really cannot keep as many synchronous standbys as requested + then you should decrease the number of synchronous standbys that + transaction commits must wait for responses from + in <varname>synchronous_standby_names</varname> (or disable it) and + reload the configuration file on the primary server. + </para> + + <para> + If the primary is isolated from remaining standby servers you should + fail over to the best candidate of those other remaining standby servers. 
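+ If you instead choose to keep the isolated primary in service, the
+ reconfiguration described in the previous paragraph can be performed, for
+ example, like this (assuming superuser access):
+<programlisting>
+ALTER SYSTEM SET synchronous_standby_names = '';
+SELECT pg_reload_conf();
+</programlisting>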
+ </para> + + <para> + If you need to re-create a standby server while transactions are + waiting, make sure that the commands pg_backup_start() and + pg_backup_stop() are run in a session with + <varname>synchronous_commit</varname> = <literal>off</literal>, otherwise those + requests will wait forever for the standby to appear. + </para> + + </sect3> + </sect2> + + <sect2 id="continuous-archiving-in-standby"> + <title>Continuous Archiving in Standby</title> + + <indexterm> + <primary>continuous archiving</primary> + <secondary>in standby</secondary> + </indexterm> + + <para> + When continuous WAL archiving is used in a standby, there are two + different scenarios: the WAL archive can be shared between the primary + and the standby, or the standby can have its own WAL archive. When + the standby has its own WAL archive, set <varname>archive_mode</varname> + to <literal>always</literal>, and the standby will call the archive + command for every WAL segment it receives, whether it's by restoring + from the archive or by streaming replication. The shared archive can + be handled similarly, but the <varname>archive_command</varname> or <varname>archive_library</varname> must + test if the file being archived exists already, and if the existing file + has identical contents. This requires more care in the + <varname>archive_command</varname> or <varname>archive_library</varname>, as it must + be careful to not overwrite an existing file with different contents, + but return success if the exactly same file is archived twice. And + all that must be done free of race conditions, if two servers attempt + to archive the same file at the same time. + </para> + + <para> + If <varname>archive_mode</varname> is set to <literal>on</literal>, the + archiver is not enabled during recovery or standby mode. If the standby + server is promoted, it will start archiving after the promotion, but + will not archive any WAL or timeline history files that + it did not generate itself. To get a complete + series of WAL files in the archive, you must ensure that all WAL is + archived, before it reaches the standby. This is inherently true with + file-based log shipping, as the standby can only restore files that + are found in the archive, but not if streaming replication is enabled. + When a server is not in recovery mode, there is no difference between + <literal>on</literal> and <literal>always</literal> modes. + </para> + </sect2> + </sect1> + + <sect1 id="warm-standby-failover"> + <title>Failover</title> + + <para> + If the primary server fails then the standby server should begin + failover procedures. + </para> + + <para> + If the standby server fails then no failover need take place. If the + standby server can be restarted, even some time later, then the recovery + process can also be restarted immediately, taking advantage of + restartable recovery. If the standby server cannot be restarted, then a + full new standby server instance should be created. + </para> + + <para> + If the primary server fails and the standby server becomes the + new primary, and then the old primary restarts, you must have + a mechanism for informing the old primary that it is no longer the primary. This is + sometimes known as <acronym>STONITH</acronym> (Shoot The Other Node In The Head), which is + necessary to avoid situations where both systems think they are the + primary, which will lead to confusion and ultimately data loss. 
+ </para> + + <para> + Many failover systems use just two systems, the primary and the standby, + connected by some kind of heartbeat mechanism to continually verify the + connectivity between the two and the viability of the primary. It is + also possible to use a third system (called a witness server) to prevent + some cases of inappropriate failover, but the additional complexity + might not be worthwhile unless it is set up with sufficient care and + rigorous testing. + </para> + + <para> + <productname>PostgreSQL</productname> does not provide the system + software required to identify a failure on the primary and notify + the standby database server. Many such tools exist and are well + integrated with the operating system facilities required for + successful failover, such as IP address migration. + </para> + + <para> + Once failover to the standby occurs, there is only a + single server in operation. This is known as a degenerate state. + The former standby is now the primary, but the former primary is down + and might stay down. To return to normal operation, a standby server + must be recreated, + either on the former primary system when it comes up, or on a third, + possibly new, system. The <xref linkend="app-pgrewind"/> utility can be + used to speed up this process on large clusters. + Once complete, the primary and standby can be + considered to have switched roles. Some people choose to use a third + server to provide backup for the new primary until the new standby + server is recreated, + though clearly this complicates the system configuration and + operational processes. + </para> + + <para> + So, switching from primary to standby server can be fast but requires + some time to re-prepare the failover cluster. Regular switching from + primary to standby is useful, since it allows regular downtime on + each system for maintenance. This also serves as a test of the + failover mechanism to ensure that it will really work when you need it. + Written administration procedures are advised. + </para> + + <para> + To trigger failover of a log-shipping standby server, run + <command>pg_ctl promote</command> or call <function>pg_promote()</function>. + If you're setting up reporting servers that are only used to offload + read-only queries from the primary, not for high availability purposes, + you don't need to promote. + </para> + </sect1> + + <sect1 id="hot-standby"> + <title>Hot Standby</title> + + <indexterm zone="high-availability"> + <primary>hot standby</primary> + </indexterm> + + <para> + Hot standby is the term used to describe the ability to connect to + the server and run read-only queries while the server is in archive + recovery or standby mode. This + is useful both for replication purposes and for restoring a backup + to a desired state with great precision. + The term hot standby also refers to the ability of the server to move + from recovery through to normal operation while users continue running + queries and/or keep their connections open. + </para> + + <para> + Running queries in hot standby mode is similar to normal query operation, + though there are several usage and administrative differences + explained below. + </para> + + <sect2 id="hot-standby-users"> + <title>User's Overview</title> + + <para> + When the <xref linkend="guc-hot-standby"/> parameter is set to true on a + standby server, it will begin accepting connections once the recovery has + brought the system to a consistent state. 
All such connections are + strictly read-only; not even temporary tables may be written. + </para> + + <para> + The data on the standby takes some time to arrive from the primary server + so there will be a measurable delay between primary and standby. Running the + same query nearly simultaneously on both primary and standby might therefore + return differing results. We say that data on the standby is + <firstterm>eventually consistent</firstterm> with the primary. Once the + commit record for a transaction is replayed on the standby, the changes + made by that transaction will be visible to any new snapshots taken on + the standby. Snapshots may be taken at the start of each query or at the + start of each transaction, depending on the current transaction isolation + level. For more details, see <xref linkend="transaction-iso"/>. + </para> + + <para> + Transactions started during hot standby may issue the following commands: + + <itemizedlist> + <listitem> + <para> + Query access: <command>SELECT</command>, <command>COPY TO</command> + </para> + </listitem> + <listitem> + <para> + Cursor commands: <command>DECLARE</command>, <command>FETCH</command>, <command>CLOSE</command> + </para> + </listitem> + <listitem> + <para> + Settings: <command>SHOW</command>, <command>SET</command>, <command>RESET</command> + </para> + </listitem> + <listitem> + <para> + Transaction management commands: + <itemizedlist> + <listitem> + <para> + <command>BEGIN</command>, <command>END</command>, <command>ABORT</command>, <command>START TRANSACTION</command> + </para> + </listitem> + <listitem> + <para> + <command>SAVEPOINT</command>, <command>RELEASE</command>, <command>ROLLBACK TO SAVEPOINT</command> + </para> + </listitem> + <listitem> + <para> + <command>EXCEPTION</command> blocks and other internal subtransactions + </para> + </listitem> + </itemizedlist> + </para> + </listitem> + <listitem> + <para> + <command>LOCK TABLE</command>, though only when explicitly in one of these modes: + <literal>ACCESS SHARE</literal>, <literal>ROW SHARE</literal> or <literal>ROW EXCLUSIVE</literal>. + </para> + </listitem> + <listitem> + <para> + Plans and resources: <command>PREPARE</command>, <command>EXECUTE</command>, + <command>DEALLOCATE</command>, <command>DISCARD</command> + </para> + </listitem> + <listitem> + <para> + Plugins and extensions: <command>LOAD</command> + </para> + </listitem> + <listitem> + <para> + <command>UNLISTEN</command> + </para> + </listitem> + </itemizedlist> + </para> + + <para> + Transactions started during hot standby will never be assigned a + transaction ID and cannot write to the system write-ahead log. + Therefore, the following actions will produce error messages: + + <itemizedlist> + <listitem> + <para> + Data Manipulation Language (DML): <command>INSERT</command>, + <command>UPDATE</command>, <command>DELETE</command>, + <command>MERGE</command>, <command>COPY FROM</command>, + <command>TRUNCATE</command>. + Note that there are no allowed actions that result in a trigger + being executed during recovery. This restriction applies even to + temporary tables, because table rows cannot be read or written without + assigning a transaction ID, which is currently not possible in a + hot standby environment. + </para> + </listitem> + <listitem> + <para> + Data Definition Language (DDL): <command>CREATE</command>, + <command>DROP</command>, <command>ALTER</command>, <command>COMMENT</command>. 
+ This restriction applies even to temporary tables, because carrying + out these operations would require updating the system catalog tables. + </para> + </listitem> + <listitem> + <para> + <command>SELECT ... FOR SHARE | UPDATE</command>, because row locks cannot be + taken without updating the underlying data files. + </para> + </listitem> + <listitem> + <para> + Rules on <command>SELECT</command> statements that generate DML commands. + </para> + </listitem> + <listitem> + <para> + <command>LOCK</command> that explicitly requests a mode higher than <literal>ROW EXCLUSIVE MODE</literal>. + </para> + </listitem> + <listitem> + <para> + <command>LOCK</command> in short default form, since it requests <literal>ACCESS EXCLUSIVE MODE</literal>. + </para> + </listitem> + <listitem> + <para> + Transaction management commands that explicitly set non-read-only state: + <itemizedlist> + <listitem> + <para> + <command>BEGIN READ WRITE</command>, + <command>START TRANSACTION READ WRITE</command> + </para> + </listitem> + <listitem> + <para> + <command>SET TRANSACTION READ WRITE</command>, + <command>SET SESSION CHARACTERISTICS AS TRANSACTION READ WRITE</command> + </para> + </listitem> + <listitem> + <para> + <command>SET transaction_read_only = off</command> + </para> + </listitem> + </itemizedlist> + </para> + </listitem> + <listitem> + <para> + Two-phase commit commands: <command>PREPARE TRANSACTION</command>, + <command>COMMIT PREPARED</command>, <command>ROLLBACK PREPARED</command> + because even read-only transactions need to write WAL in the + prepare phase (the first phase of two phase commit). + </para> + </listitem> + <listitem> + <para> + Sequence updates: <function>nextval()</function>, <function>setval()</function> + </para> + </listitem> + <listitem> + <para> + <command>LISTEN</command>, <command>NOTIFY</command> + </para> + </listitem> + </itemizedlist> + </para> + + <para> + In normal operation, <quote>read-only</quote> transactions are allowed to + use <command>LISTEN</command> and <command>NOTIFY</command>, + so hot standby sessions operate under slightly tighter + restrictions than ordinary read-only sessions. It is possible that some + of these restrictions might be loosened in a future release. + </para> + + <para> + During hot standby, the parameter <varname>transaction_read_only</varname> is always + true and may not be changed. But as long as no attempt is made to modify + the database, connections during hot standby will act much like any other + database connection. If failover or switchover occurs, the database will + switch to normal processing mode. Sessions will remain connected while the + server changes mode. Once hot standby finishes, it will be possible to + initiate read-write transactions (even from a session begun during + hot standby). + </para> + + <para> + Users can determine whether hot standby is currently active for their + session by issuing <command>SHOW in_hot_standby</command>. + (In server versions before 14, the <varname>in_hot_standby</varname> + parameter did not exist; a workable substitute method for older servers + is <command>SHOW transaction_read_only</command>.) In addition, a set of + functions (<xref linkend="functions-recovery-info-table"/>) allow users to + access information about the standby server. These allow you to write + programs that are aware of the current state of the database. These + can be used to monitor the progress of recovery, or to allow you to + write complex programs that restore the database to particular states. 
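+    For example, a monitoring session on a standby might combine these
+    (a minimal sketch; all of the functions shown are standard
+    recovery-information functions):
+<programlisting>
+SHOW in_hot_standby;
+
+SELECT pg_is_in_recovery(),
+       pg_last_wal_replay_lsn(),
+       pg_last_xact_replay_timestamp();
+</programlisting>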
+ </para> + </sect2> + + <sect2 id="hot-standby-conflict"> + <title>Handling Query Conflicts</title> + + <para> + The primary and standby servers are in many ways loosely connected. Actions + on the primary will have an effect on the standby. As a result, there is + potential for negative interactions or conflicts between them. The easiest + conflict to understand is performance: if a huge data load is taking place + on the primary then this will generate a similar stream of WAL records on the + standby, so standby queries may contend for system resources, such as I/O. + </para> + + <para> + There are also additional types of conflict that can occur with hot standby. + These conflicts are <emphasis>hard conflicts</emphasis> in the sense that queries + might need to be canceled and, in some cases, sessions disconnected to resolve them. + The user is provided with several ways to handle these + conflicts. Conflict cases include: + + <itemizedlist> + <listitem> + <para> + Access Exclusive locks taken on the primary server, including both + explicit <command>LOCK</command> commands and various <acronym>DDL</acronym> + actions, conflict with table accesses in standby queries. + </para> + </listitem> + <listitem> + <para> + Dropping a tablespace on the primary conflicts with standby queries + using that tablespace for temporary work files. + </para> + </listitem> + <listitem> + <para> + Dropping a database on the primary conflicts with sessions connected + to that database on the standby. + </para> + </listitem> + <listitem> + <para> + Application of a vacuum cleanup record from WAL conflicts with + standby transactions whose snapshots can still <quote>see</quote> any of + the rows to be removed. + </para> + </listitem> + <listitem> + <para> + Application of a vacuum cleanup record from WAL conflicts with + queries accessing the target page on the standby, whether or not + the data to be removed is visible. + </para> + </listitem> + </itemizedlist> + </para> + + <para> + On the primary server, these cases simply result in waiting; and the + user might choose to cancel either of the conflicting actions. However, + on the standby there is no choice: the WAL-logged action already occurred + on the primary so the standby must not fail to apply it. Furthermore, + allowing WAL application to wait indefinitely may be very undesirable, + because the standby's state will become increasingly far behind the + primary's. Therefore, a mechanism is provided to forcibly cancel standby + queries that conflict with to-be-applied WAL records. + </para> + + <para> + An example of the problem situation is an administrator on the primary + server running <command>DROP TABLE</command> on a table that is currently being + queried on the standby server. Clearly the standby query cannot continue + if the <command>DROP TABLE</command> is applied on the standby. If this situation + occurred on the primary, the <command>DROP TABLE</command> would wait until the + other query had finished. But when <command>DROP TABLE</command> is run on the + primary, the primary doesn't have information about what queries are + running on the standby, so it will not wait for any such standby + queries. The WAL change records come through to the standby while the + standby query is still running, causing a conflict. The standby server + must either delay application of the WAL records (and everything after + them, too) or else cancel the conflicting query so that the <command>DROP + TABLE</command> can be applied. 
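+   </para>
+
+   <para>
+    From the user's point of view, a standby query canceled to resolve such
+    a conflict fails with an error along these lines (shown here for a
+    snapshot conflict; the <literal>DETAIL</literal> varies with the type of
+    conflict):
+<screen>
+ERROR:  canceling statement due to conflict with recovery
+DETAIL:  User query might have needed to see row versions that must be removed.
+</screen>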
+ </para> + + <para> + When a conflicting query is short, it's typically desirable to allow it to + complete by delaying WAL application for a little bit; but a long delay in + WAL application is usually not desirable. So the cancel mechanism has + parameters, <xref linkend="guc-max-standby-archive-delay"/> and <xref + linkend="guc-max-standby-streaming-delay"/>, that define the maximum + allowed delay in WAL application. Conflicting queries will be canceled + once it has taken longer than the relevant delay setting to apply any + newly-received WAL data. There are two parameters so that different delay + values can be specified for the case of reading WAL data from an archive + (i.e., initial recovery from a base backup or <quote>catching up</quote> a + standby server that has fallen far behind) versus reading WAL data via + streaming replication. + </para> + + <para> + In a standby server that exists primarily for high availability, it's + best to set the delay parameters relatively short, so that the server + cannot fall far behind the primary due to delays caused by standby + queries. However, if the standby server is meant for executing + long-running queries, then a high or even infinite delay value may be + preferable. Keep in mind however that a long-running query could + cause other sessions on the standby server to not see recent changes + on the primary, if it delays application of WAL records. + </para> + + <para> + Once the delay specified by <varname>max_standby_archive_delay</varname> or + <varname>max_standby_streaming_delay</varname> has been exceeded, conflicting + queries will be canceled. This usually results just in a cancellation + error, although in the case of replaying a <command>DROP DATABASE</command> + the entire conflicting session will be terminated. Also, if the conflict + is over a lock held by an idle transaction, the conflicting session is + terminated (this behavior might change in the future). + </para> + + <para> + Canceled queries may be retried immediately (after beginning a new + transaction, of course). Since query cancellation depends on + the nature of the WAL records being replayed, a query that was + canceled may well succeed if it is executed again. + </para> + + <para> + Keep in mind that the delay parameters are compared to the elapsed time + since the WAL data was received by the standby server. Thus, the grace + period allowed to any one query on the standby is never more than the + delay parameter, and could be considerably less if the standby has already + fallen behind as a result of waiting for previous queries to complete, or + as a result of being unable to keep up with a heavy update load. + </para> + + <para> + The most common reason for conflict between standby queries and WAL replay + is <quote>early cleanup</quote>. Normally, <productname>PostgreSQL</productname> allows + cleanup of old row versions when there are no transactions that need to + see them to ensure correct visibility of data according to MVCC rules. + However, this rule can only be applied for transactions executing on the + primary. So it is possible that cleanup on the primary will remove row + versions that are still visible to a transaction on the standby. + </para> + + <para> + Row version cleanup isn't the only potential cause of conflicts with + standby queries. All index-only scans (including those that run on + standbys) must use an <acronym>MVCC</acronym> snapshot that + <quote>agrees</quote> with the visibility map. 
Conflicts are therefore + required whenever <command>VACUUM</command> <link + linkend="vacuum-for-visibility-map">sets a page as all-visible in the + visibility map</link> containing one or more rows + <emphasis>not</emphasis> visible to all standby queries. So even running + <command>VACUUM</command> against a table with no updated or deleted rows + requiring cleanup might lead to conflicts. + </para> + + <para> + Users should be clear that tables that are regularly and heavily updated + on the primary server will quickly cause cancellation of longer running + queries on the standby. In such cases the setting of a finite value for + <varname>max_standby_archive_delay</varname> or + <varname>max_standby_streaming_delay</varname> can be considered similar to + setting <varname>statement_timeout</varname>. + </para> + + <para> + Remedial possibilities exist if the number of standby-query cancellations + is found to be unacceptable. The first option is to set the parameter + <varname>hot_standby_feedback</varname>, which prevents <command>VACUUM</command> from + removing recently-dead rows and so cleanup conflicts do not occur. + If you do this, you + should note that this will delay cleanup of dead rows on the primary, + which may result in undesirable table bloat. However, the cleanup + situation will be no worse than if the standby queries were running + directly on the primary server, and you are still getting the benefit of + off-loading execution onto the standby. + If standby servers connect and disconnect frequently, you + might want to make adjustments to handle the period when + <varname>hot_standby_feedback</varname> feedback is not being provided. + For example, consider increasing <varname>max_standby_archive_delay</varname> + so that queries are not rapidly canceled by conflicts in WAL archive + files during disconnected periods. You should also consider increasing + <varname>max_standby_streaming_delay</varname> to avoid rapid cancellations + by newly-arrived streaming WAL entries after reconnection. + </para> + + <para> + The number of query cancels and the reason for them can be viewed using + the <structname>pg_stat_database_conflicts</structname> system view on the standby + server. The <structname>pg_stat_database</structname> system view also contains + summary information. + </para> + + <para> + Users can control whether a log message is produced when WAL replay is waiting + longer than <varname>deadlock_timeout</varname> for conflicts. This + is controlled by the <xref linkend="guc-log-recovery-conflict-waits"/> parameter. + </para> + </sect2> + + <sect2 id="hot-standby-admin"> + <title>Administrator's Overview</title> + + <para> + If <varname>hot_standby</varname> is <literal>on</literal> in <filename>postgresql.conf</filename> + (the default value) and there is a + <link linkend="file-standby-signal"><filename>standby.signal</filename></link><indexterm><primary>standby.signal</primary><secondary>for hot standby</secondary></indexterm> + file present, the server will run in hot standby mode. + However, it may take some time for hot standby connections to be allowed, + because the server will not accept connections until it has completed + sufficient recovery to provide a consistent state against which queries + can run. During this period, + clients that attempt to connect will be refused with an error message. 
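+   </para>
+
+   <para>
+    Putting these pieces together, a standby dedicated to long-running
+    reporting queries might be tuned as follows, and the resulting conflicts
+    inspected afterwards (an illustrative sketch; the values shown are
+    examples rather than recommendations):
+<programlisting>
+-- On the standby: ask the primary to retain rows the standby still needs,
+-- and tolerate longer replay delays for streamed WAL.
+ALTER SYSTEM SET hot_standby_feedback = on;
+ALTER SYSTEM SET max_standby_streaming_delay = '30min';
+SELECT pg_reload_conf();
+
+-- Later: see how many queries were canceled on each database, and why.
+SELECT datname, confl_snapshot, confl_lock, confl_tablespace
+FROM pg_stat_database_conflicts;
+</programlisting>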
+ To confirm the server has come up, either loop trying to connect from + the application, or look for these messages in the server logs: + +<programlisting> +LOG: entering standby mode + +... then some time later ... + +LOG: consistent recovery state reached +LOG: database system is ready to accept read-only connections +</programlisting> + + Consistency information is recorded once per checkpoint on the primary. + It is not possible to enable hot standby when reading WAL + written during a period when <varname>wal_level</varname> was not set to + <literal>replica</literal> or <literal>logical</literal> on the primary. Reaching + a consistent state can also be delayed in the presence of both of these + conditions: + + <itemizedlist> + <listitem> + <para> + A write transaction has more than 64 subtransactions + </para> + </listitem> + <listitem> + <para> + Very long-lived write transactions + </para> + </listitem> + </itemizedlist> + + If you are running file-based log shipping ("warm standby"), you might need + to wait until the next WAL file arrives, which could be as long as the + <varname>archive_timeout</varname> setting on the primary. + </para> + + <para> + The settings of some parameters determine the size of shared memory for + tracking transaction IDs, locks, and prepared transactions. These shared + memory structures must be no smaller on a standby than on the primary in + order to ensure that the standby does not run out of shared memory during + recovery. For example, if the primary had used a prepared transaction but + the standby had not allocated any shared memory for tracking prepared + transactions, then recovery could not continue until the standby's + configuration is changed. The parameters affected are: + + <itemizedlist> + <listitem> + <para> + <varname>max_connections</varname> + </para> + </listitem> + <listitem> + <para> + <varname>max_prepared_transactions</varname> + </para> + </listitem> + <listitem> + <para> + <varname>max_locks_per_transaction</varname> + </para> + </listitem> + <listitem> + <para> + <varname>max_wal_senders</varname> + </para> + </listitem> + <listitem> + <para> + <varname>max_worker_processes</varname> + </para> + </listitem> + </itemizedlist> + + The easiest way to ensure this does not become a problem is to have these + parameters set on the standbys to values equal to or greater than on the + primary. Therefore, if you want to increase these values, you should do + so on all standby servers first, before applying the changes to the + primary server. Conversely, if you want to decrease these values, you + should do so on the primary server first, before applying the changes to + all standby servers. Keep in mind that when a standby is promoted, it + becomes the new reference for the required parameter settings for the + standbys that follow it. Therefore, to avoid this becoming a problem + during a switchover or failover, it is recommended to keep these settings + the same on all standby servers. + </para> + + <para> + The WAL tracks changes to these parameters on the + primary. If a hot standby processes WAL that indicates that the current + value on the primary is higher than its own value, it will log a warning + and pause recovery, for example: +<screen> +WARNING: hot standby is not possible because of insufficient parameter settings +DETAIL: max_connections = 80 is a lower setting than on the primary server, where its value was 100. +LOG: recovery has paused +DETAIL: If recovery is unpaused, the server will shut down. 
+HINT: You can then restart the server after making the necessary configuration changes. +</screen> + At that point, the settings on the standby need to be updated and the + instance restarted before recovery can continue. If the standby is not a + hot standby, then when it encounters the incompatible parameter change, it + will shut down immediately without pausing, since there is then no value + in keeping it up. + </para> + + <para> + It is important that the administrator select appropriate settings for + <xref linkend="guc-max-standby-archive-delay"/> and <xref + linkend="guc-max-standby-streaming-delay"/>. The best choices vary + depending on business priorities. For example if the server is primarily + tasked as a High Availability server, then you will want low delay + settings, perhaps even zero, though that is a very aggressive setting. If + the standby server is tasked as an additional server for decision support + queries then it might be acceptable to set the maximum delay values to + many hours, or even -1 which means wait forever for queries to complete. + </para> + + <para> + Transaction status "hint bits" written on the primary are not WAL-logged, + so data on the standby will likely re-write the hints again on the standby. + Thus, the standby server will still perform disk writes even though + all users are read-only; no changes occur to the data values + themselves. Users will still write large sort temporary files and + re-generate relcache info files, so no part of the database + is truly read-only during hot standby mode. + Note also that writes to remote databases using + <application>dblink</application> module, and other operations outside the + database using PL functions will still be possible, even though the + transaction is read-only locally. + </para> + + <para> + The following types of administration commands are not accepted + during recovery mode: + + <itemizedlist> + <listitem> + <para> + Data Definition Language (DDL): e.g., <command>CREATE INDEX</command> + </para> + </listitem> + <listitem> + <para> + Privilege and Ownership: <command>GRANT</command>, <command>REVOKE</command>, + <command>REASSIGN</command> + </para> + </listitem> + <listitem> + <para> + Maintenance commands: <command>ANALYZE</command>, <command>VACUUM</command>, + <command>CLUSTER</command>, <command>REINDEX</command> + </para> + </listitem> + </itemizedlist> + </para> + + <para> + Again, note that some of these commands are actually allowed during + "read only" mode transactions on the primary. + </para> + + <para> + As a result, you cannot create additional indexes that exist solely + on the standby, nor statistics that exist solely on the standby. + If these administration commands are needed, they should be executed + on the primary, and eventually those changes will propagate to the + standby. + </para> + + <para> + <function>pg_cancel_backend()</function> + and <function>pg_terminate_backend()</function> will work on user backends, + but not the startup process, which performs + recovery. <structname>pg_stat_activity</structname> does not show + recovering transactions as active. As a result, + <structname>pg_prepared_xacts</structname> is always empty during + recovery. If you wish to resolve in-doubt prepared transactions, view + <literal>pg_prepared_xacts</literal> on the primary and issue commands to + resolve transactions there or resolve them after the end of recovery. + </para> + + <para> + <structname>pg_locks</structname> will show locks held by backends, + as normal. 
<structname>pg_locks</structname> also shows + a virtual transaction managed by the startup process that owns all + <literal>AccessExclusiveLocks</literal> held by transactions being replayed by recovery. + Note that the startup process does not acquire locks to + make database changes, and thus locks other than <literal>AccessExclusiveLocks</literal> + do not show in <structname>pg_locks</structname> for the Startup + process; they are just presumed to exist. + </para> + + <para> + The <productname>Nagios</productname> plugin <productname>check_pgsql</productname> will + work, because the simple information it checks for exists. + The <productname>check_postgres</productname> monitoring script will also work, + though some reported values could give different or confusing results. + For example, last vacuum time will not be maintained, since no + vacuum occurs on the standby. Vacuums running on the primary + do still send their changes to the standby. + </para> + + <para> + WAL file control commands will not work during recovery, + e.g., <function>pg_backup_start</function>, <function>pg_switch_wal</function> etc. + </para> + + <para> + Dynamically loadable modules work, including <structname>pg_stat_statements</structname>. + </para> + + <para> + Advisory locks work normally in recovery, including deadlock detection. + Note that advisory locks are never WAL logged, so it is impossible for + an advisory lock on either the primary or the standby to conflict with WAL + replay. Nor is it possible to acquire an advisory lock on the primary + and have it initiate a similar advisory lock on the standby. Advisory + locks relate only to the server on which they are acquired. + </para> + + <para> + Trigger-based replication systems such as <productname>Slony</productname>, + <productname>Londiste</productname> and <productname>Bucardo</productname> won't run on the + standby at all, though they will run happily on the primary server as + long as the changes are not sent to standby servers to be applied. + WAL replay is not trigger-based so you cannot relay from the + standby to any system that requires additional database writes or + relies on the use of triggers. + </para> + + <para> + New OIDs cannot be assigned, though some <acronym>UUID</acronym> generators may still + work as long as they do not rely on writing new status to the database. + </para> + + <para> + Currently, temporary table creation is not allowed during read-only + transactions, so in some cases existing scripts will not run correctly. + This restriction might be relaxed in a later release. This is + both an SQL standard compliance issue and a technical issue. + </para> + + <para> + <command>DROP TABLESPACE</command> can only succeed if the tablespace is empty. + Some standby users may be actively using the tablespace via their + <varname>temp_tablespaces</varname> parameter. If there are temporary files in the + tablespace, all active queries are canceled to ensure that temporary + files are removed, so the tablespace can be removed and WAL replay + can continue. + </para> + + <para> + Running <command>DROP DATABASE</command> or <command>ALTER DATABASE ... SET + TABLESPACE</command> on the primary + will generate a WAL entry that will cause all users connected to that + database on the standby to be forcibly disconnected. This action occurs + immediately, whatever the setting of + <varname>max_standby_streaming_delay</varname>. Note that + <command>ALTER DATABASE ... 
RENAME</command> does not disconnect users, which + in most cases will go unnoticed, though might in some cases cause a + program confusion if it depends in some way upon database name. + </para> + + <para> + In normal (non-recovery) mode, if you issue <command>DROP USER</command> or <command>DROP ROLE</command> + for a role with login capability while that user is still connected then + nothing happens to the connected user — they remain connected. The user cannot + reconnect however. This behavior applies in recovery also, so a + <command>DROP USER</command> on the primary does not disconnect that user on the standby. + </para> + + <para> + The cumulative statistics system is active during recovery. All scans, + reads, blocks, index usage, etc., will be recorded normally on the + standby. However, WAL replay will not increment relation and database + specific counters. I.e. replay will not increment pg_stat_all_tables + columns (like n_tup_ins), nor will reads or writes performed by the + startup process be tracked in the pg_statio views, nor will associated + pg_stat_database columns be incremented. + </para> + + <para> + Autovacuum is not active during recovery. It will start normally at the + end of recovery. + </para> + + <para> + The checkpointer process and the background writer process are active during + recovery. The checkpointer process will perform restartpoints (similar to + checkpoints on the primary) and the background writer process will perform + normal block cleaning activities. This can include updates of the hint bit + information stored on the standby server. + The <command>CHECKPOINT</command> command is accepted during recovery, + though it performs a restartpoint rather than a new checkpoint. + </para> + </sect2> + + <sect2 id="hot-standby-parameters"> + <title>Hot Standby Parameter Reference</title> + + <para> + Various parameters have been mentioned above in + <xref linkend="hot-standby-conflict"/> and + <xref linkend="hot-standby-admin"/>. + </para> + + <para> + On the primary, the <xref linkend="guc-wal-level"/> parameter can be used. + <xref linkend="guc-max-standby-archive-delay"/> and + <xref linkend="guc-max-standby-streaming-delay"/> have no effect if set on + the primary. + </para> + + <para> + On the standby, parameters <xref linkend="guc-hot-standby"/>, + <xref linkend="guc-max-standby-archive-delay"/> and + <xref linkend="guc-max-standby-streaming-delay"/> can be used. + </para> + </sect2> + + <sect2 id="hot-standby-caveats"> + <title>Caveats</title> + + <para> + There are several limitations of hot standby. + These can and probably will be fixed in future releases: + + <itemizedlist> + <listitem> + <para> + Full knowledge of running transactions is required before snapshots + can be taken. Transactions that use large numbers of subtransactions + (currently greater than 64) will delay the start of read-only + connections until the completion of the longest running write transaction. + If this situation occurs, explanatory messages will be sent to the server log. + </para> + </listitem> + <listitem> + <para> + Valid starting points for standby queries are generated at each + checkpoint on the primary. If the standby is shut down while the primary + is in a shutdown state, it might not be possible to re-enter hot standby + until the primary is started up, so that it generates further starting + points in the WAL logs. This situation isn't a problem in the most + common situations where it might happen. 
Generally, if the primary is + shut down and not available anymore, that's likely due to a serious + failure that requires the standby being converted to operate as + the new primary anyway. And in situations where the primary is + being intentionally taken down, coordinating to make sure the standby + becomes the new primary smoothly is also standard procedure. + </para> + </listitem> + <listitem> + <para> + At the end of recovery, <literal>AccessExclusiveLocks</literal> held by prepared transactions + will require twice the normal number of lock table entries. If you plan + on running either a large number of concurrent prepared transactions + that normally take <literal>AccessExclusiveLocks</literal>, or you plan on having one + large transaction that takes many <literal>AccessExclusiveLocks</literal>, you are + advised to select a larger value of <varname>max_locks_per_transaction</varname>, + perhaps as much as twice the value of the parameter on + the primary server. You need not consider this at all if + your setting of <varname>max_prepared_transactions</varname> is 0. + </para> + </listitem> + <listitem> + <para> + The Serializable transaction isolation level is not yet available in hot + standby. (See <xref linkend="xact-serializable"/> and + <xref linkend="serializable-consistency"/> for details.) + An attempt to set a transaction to the serializable isolation level in + hot standby mode will generate an error. + </para> + </listitem> + </itemizedlist> + + </para> + </sect2> + + </sect1> + +</chapter> |