1 files changed, 1941 insertions, 0 deletions
diff --git a/doc/src/sgml/mvcc.sgml b/doc/src/sgml/mvcc.sgml
new file mode 100644
index 0000000..512e8b7
--- /dev/null
+++ b/doc/src/sgml/mvcc.sgml
@@ -0,0 +1,1941 @@
+<!-- doc/src/sgml/mvcc.sgml -->
+
+ <chapter id="mvcc">
+  <title>Concurrency Control</title>
+
+  <indexterm>
+   <primary>concurrency</primary>
+  </indexterm>
+
+  <para>
+   This chapter describes the behavior of the
+   <productname>PostgreSQL</productname> database system when two or
+   more sessions try to access the same data at the same time.  The
+   goals in that situation are to allow efficient access for all
+   sessions while maintaining strict data integrity.  Every developer
+   of database applications should be familiar with the topics covered
+   in this chapter.
+  </para>
+
+  <sect1 id="mvcc-intro">
+   <title>Introduction</title>
+
+   <indexterm>
+    <primary>Multiversion Concurrency Control</primary>
+   </indexterm>
+
+   <indexterm>
+    <primary>MVCC</primary>
+   </indexterm>
+
+   <indexterm>
+    <primary>Serializable Snapshot Isolation</primary>
+   </indexterm>
+
+   <indexterm>
+    <primary>SSI</primary>
+   </indexterm>
+
+   <para>
+    <productname>PostgreSQL</productname> provides a rich set of tools
+    for developers to manage concurrent access to data.  Internally,
+    data consistency is maintained by using a multiversion
+    model (Multiversion Concurrency Control, <acronym>MVCC</acronym>).
+    This means that each SQL statement sees
+    a snapshot of data (a <firstterm>database version</firstterm>)
+    as it was some
+    time ago, regardless of the current state of the underlying data.
+    This prevents statements from viewing inconsistent data produced
+    by concurrent transactions performing updates on the same
+    data rows, providing <firstterm>transaction isolation</firstterm>
+    for each database session.  <acronym>MVCC</acronym>, by eschewing
+    the locking methodologies of traditional database systems,
+    minimizes lock contention in order to allow for reasonable
+    performance in multiuser environments.
+   </para>
+
+   <para>
+    The main advantage of using the <acronym>MVCC</acronym> model of
+    concurrency control rather than locking is that in
+    <acronym>MVCC</acronym> locks acquired for querying (reading) data
+    do not conflict with locks acquired for writing data, and so
+    reading never blocks writing and writing never blocks reading.
+    <productname>PostgreSQL</productname> maintains this guarantee
+    even when providing the strictest level of transaction
+    isolation through the use of an innovative <firstterm>Serializable
+    Snapshot Isolation</firstterm> (<acronym>SSI</acronym>) level.
+   </para>
+
+   <para>
+    Table- and row-level locking facilities are also available in
+    <productname>PostgreSQL</productname> for applications which don't
+    generally need full transaction isolation and prefer to explicitly
+    manage particular points of conflict.  However, proper
+    use of <acronym>MVCC</acronym> will generally provide better
+    performance than locks.  In addition, application-defined advisory
+    locks provide a mechanism for acquiring locks that are not tied
+    to a single transaction.
+   </para>
+  </sect1>
+
+  <sect1 id="transaction-iso">
+   <title>Transaction Isolation</title>
+
+   <indexterm>
+    <primary>transaction isolation</primary>
+   </indexterm>
+
+   <para>
+    The <acronym>SQL</acronym> standard defines four levels of
+    transaction isolation.  The most strict is Serializable,
+    which is defined by the standard in a paragraph which says that any
+    concurrent execution of a set of Serializable transactions is guaranteed
+    to produce the same effect as running them one at a time in some order.
+    The other three levels are defined in terms of phenomena, resulting from
+    interaction between concurrent transactions, which must not occur at
+    each level.  The standard notes that due to the definition of
+    Serializable, none of these phenomena are possible at that level.  (This
+    is hardly surprising: if the effect of the transactions must be
+    consistent with having been run one at a time, how could you see any
+    phenomena caused by interactions?)
+   </para>
+
+   <para>
+    The phenomena which are prohibited at various levels are:
+
+    <variablelist>
+     <varlistentry>
+      <term>
+       dirty read
+       <indexterm><primary>dirty read</primary></indexterm>
+      </term>
+      <listitem>
+       <para>
+        A transaction reads data written by a concurrent uncommitted transaction.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
+      <term>
+       nonrepeatable read
+       <indexterm><primary>nonrepeatable read</primary></indexterm>
+      </term>
+      <listitem>
+       <para>
+        A transaction re-reads data it has previously read and finds that data
+        has been modified by another transaction (that committed since the
+        initial read).
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
+      <term>
+       phantom read
+       <indexterm><primary>phantom read</primary></indexterm>
+      </term>
+      <listitem>
+       <para>
+        A transaction re-executes a query returning a set of rows that satisfy a
+        search condition and finds that the set of rows satisfying the condition
+        has changed due to another recently-committed transaction.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
+      <term>
+       serialization anomaly
+       <indexterm><primary>serialization anomaly</primary></indexterm>
+      </term>
+      <listitem>
+       <para>
+        The result of successfully committing a group of transactions
+        is inconsistent with all possible orderings of running those
+        transactions one at a time.
+       </para>
+      </listitem>
+     </varlistentry>
+    </variablelist>
+   </para>
+
+   <para>
+    <indexterm>
+     <primary>transaction isolation level</primary>
+    </indexterm>
+    The SQL standard and PostgreSQL-implemented transaction isolation levels
+    are described in <xref linkend="mvcc-isolevel-table"/>.
+   </para>
+
+    <table tocentry="1" id="mvcc-isolevel-table">
+     <title>Transaction Isolation Levels</title>
+     <tgroup cols="5">
+      <thead>
+       <row>
+        <entry>
+         Isolation Level
+        </entry>
+        <entry>
+         Dirty Read
+        </entry>
+        <entry>
+         Nonrepeatable Read
+        </entry>
+        <entry>
+         Phantom Read
+        </entry>
+        <entry>
+         Serialization Anomaly
+        </entry>
+       </row>
+      </thead>
+      <tbody>
+       <row>
+        <entry>
+         Read uncommitted
+        </entry>
+        <entry>
+         Allowed, but not in PG
+        </entry>
+        <entry>
+         Possible
+        </entry>
+        <entry>
+         Possible
+        </entry>
+        <entry>
+         Possible
+        </entry>
+       </row>
+
+       <row>
+        <entry>
+         Read committed
+        </entry>
+        <entry>
+         Not possible
+        </entry>
+        <entry>
+         Possible
+        </entry>
+        <entry>
+         Possible
+        </entry>
+        <entry>
+         Possible
+        </entry>
+       </row>
+
+       <row>
+        <entry>
+         Repeatable read
+        </entry>
+        <entry>
+         Not possible
+        </entry>
+        <entry>
+         Not possible
+        </entry>
+        <entry>
+         Allowed, but not in PG
+        </entry>
+        <entry>
+         Possible
+        </entry>
+       </row>
+
+       <row>
+        <entry>
+         Serializable
+        </entry>
+        <entry>
+         Not possible
+        </entry>
+        <entry>
+         Not possible
+        </entry>
+        <entry>
+         Not possible
+        </entry>
+        <entry>
+         Not possible
+        </entry>
+       </row>
+      </tbody>
+     </tgroup>
+    </table>
+
+   <para>
+    In <productname>PostgreSQL</productname>, you can request any of
+    the four standard transaction isolation levels, but internally only
+    three distinct isolation levels are implemented, i.e., PostgreSQL's
+    Read Uncommitted mode behaves like Read Committed.  This is because
+    it is the only sensible way to map the standard isolation levels to
+    PostgreSQL's multiversion concurrency control architecture.
+   </para>
+
+   <para>
+    The table also shows that PostgreSQL's Repeatable Read implementation
+    does not allow phantom reads.  This is acceptable under the SQL
+    standard because the standard specifies which anomalies must
+    <emphasis>not</emphasis> occur at certain isolation levels;  higher
+    guarantees are acceptable.
+    The behavior of the available isolation levels is detailed in the
+    following subsections.
+   </para>
+
+   <para>
+    To set the transaction isolation level of a transaction, use the
+    command <xref linkend="sql-set-transaction"/>.
+   </para>
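+
+   <para>
+    As a minimal sketch, the isolation level can be chosen for a single
+    transaction, either with <command>SET TRANSACTION</command> before the
+    transaction's first query or directly in <command>BEGIN</command>, or it
+    can be set as the default for the rest of the session:
+
+<programlisting>
+-- per transaction: SET TRANSACTION must come before the first query
+BEGIN;
+SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;
+SHOW transaction_isolation;     -- repeatable read
+COMMIT;
+
+-- equivalent shorthand
+BEGIN ISOLATION LEVEL SERIALIZABLE;
+COMMIT;
+
+-- default for all subsequent transactions in this session
+SET SESSION CHARACTERISTICS AS TRANSACTION ISOLATION LEVEL REPEATABLE READ;
+</programlisting>
+   </para>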
+
+   <important>
+     <para>
+       Some <productname>PostgreSQL</productname> data types and functions have
+       special rules regarding transactional behavior.  In particular, changes
+       made to a sequence (and therefore the counter of a
+       column declared using <type>serial</type>) are immediately visible
+       to all other transactions and are not rolled back if the transaction
+       that made the changes aborts.  See <xref linkend="functions-sequence"/>
+       and <xref linkend="datatype-serial"/>.
+     </para>
+   </important>
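+
+   <para>
+    A sketch of this behavior, assuming a freshly created sequence with
+    default options and the hypothetical name <literal>demo_seq</literal>:
+    a value consumed inside an aborted transaction is not handed out again.
+
+<programlisting>
+CREATE SEQUENCE demo_seq;
+
+BEGIN;
+SELECT nextval('demo_seq');   -- returns 1
+ROLLBACK;
+
+SELECT nextval('demo_seq');   -- returns 2; the rollback did not undo the increment
+</programlisting>
+
+    This is why a <type>serial</type> column can contain gaps.
+   </para>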
+
+  <sect2 id="xact-read-committed">
+   <title>Read Committed Isolation Level</title>
+
+   <indexterm>
+    <primary>transaction isolation level</primary>
+    <secondary>read committed</secondary>
+   </indexterm>
+
+   <indexterm>
+    <primary>read committed</primary>
+   </indexterm>
+
+   <para>
+    <firstterm>Read Committed</firstterm> is the default isolation
+    level in <productname>PostgreSQL</productname>.  When a transaction
+    uses this isolation level, a <command>SELECT</command> query
+    (without a <literal>FOR UPDATE/SHARE</literal> clause) sees only data
+    committed before the query began; it never sees either uncommitted
+    data or changes committed during query execution by concurrent
+    transactions.  In effect, a <command>SELECT</command> query sees
+    a snapshot of the database as of the instant the query begins to
+    run.   However, <command>SELECT</command> does see the effects
+    of previous updates executed within its own transaction, even
+    though they are not yet committed.  Also note that two successive
+    <command>SELECT</command> commands can see different data, even
+    though they are within a single transaction, if other transactions
+    commit changes after the first <command>SELECT</command> starts and
+    before the second <command>SELECT</command> starts.
+   </para>
+
+   <para>
+    <command>UPDATE</command>, <command>DELETE</command>, <command>SELECT
+    FOR UPDATE</command>, and <command>SELECT FOR SHARE</command> commands
+    behave the same as <command>SELECT</command>
+    in terms of searching for target rows: they will only find target rows
+    that were committed as of the command start time.  However, such a target
+    row might have already been updated (or deleted or locked) by
+    another concurrent transaction by the time it is found.  In this case, the
+    would-be updater will wait for the first updating transaction to commit or
+    roll back (if it is still in progress).  If the first updater rolls back,
+    then its effects are negated and the second updater can proceed with
+    updating the originally found row.  If the first updater commits, the
+    second updater will ignore the row if the first updater deleted it,
+    otherwise it will attempt to apply its operation to the updated version of
+    the row.  The search condition of the command (the <literal>WHERE</literal> clause) is
+    re-evaluated to see if the updated version of the row still matches the
+    search condition.  If so, the second updater proceeds with its operation
+    using the updated version of the row.  In the case of
+    <command>SELECT FOR UPDATE</command> and <command>SELECT FOR
+    SHARE</command>, this means it is the updated version of the row that is
+    locked and returned to the client.
+   </para>
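+
+   <para>
+    The following two-session sketch illustrates this, assuming a
+    hypothetical table <structname>accounts</structname> with columns
+    <structfield>acctnum</structfield> and <structfield>balance</structfield>,
+    and both sessions running in Read Committed mode:
+
+<programlisting>
+-- Session 1
+BEGIN;
+UPDATE accounts SET balance = balance - 50.00 WHERE acctnum = 12345;
+
+-- Session 2: waits, because session 1 holds a lock on the row
+BEGIN;
+SELECT balance FROM accounts WHERE acctnum = 12345 FOR UPDATE;
+
+-- Session 1
+COMMIT;
+
+-- Session 2's SELECT now returns the balance as updated by session 1
+-- and holds a lock on that updated row version
+COMMIT;
+</programlisting>
+   </para>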
+
+   <para>
+    <command>INSERT</command> with an <literal>ON CONFLICT DO UPDATE</literal> clause
+    behaves similarly. In Read Committed mode, each row proposed for insertion
+    will either insert or update. Unless there are unrelated errors, one of
+    those two outcomes is guaranteed.  If a conflict originates in another
+    transaction whose effects are not yet visible to the <command>INSERT</command>,
+    the <command>UPDATE</command> clause will affect that row,
+    even though possibly <emphasis>no</emphasis> version of that row is
+    conventionally visible to the command.
+   </para>
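+
+   <para>
+    A minimal sketch, assuming a hypothetical table
+    <structname>counters</structname> with a unique key on
+    <structfield>name</structfield>:
+
+<programlisting>
+CREATE TABLE counters (name text PRIMARY KEY, hits integer NOT NULL);
+
+-- inserts a new row or increments the existing one; barring unrelated
+-- errors, one of the two happens even when sessions race on the same key
+INSERT INTO counters (name, hits) VALUES ('home', 1)
+    ON CONFLICT (name) DO UPDATE SET hits = counters.hits + EXCLUDED.hits;
+</programlisting>
+   </para>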
+
+   <para>
+    <command>INSERT</command> with an <literal>ON CONFLICT DO
+    NOTHING</literal> clause may skip inserting a row because of the
+    outcome of another transaction whose effects are not visible to the
+    <command>INSERT</command> snapshot.  Again, this is only the case in
+    Read Committed mode.
+   </para>
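+
+   <para>
+    With <literal>RETURNING</literal>, a skipped insertion can be detected
+    because no row is returned.  A sketch, assuming a hypothetical table
+    <structname>counters</structname> with a unique key on
+    <structfield>name</structfield>:
+
+<programlisting>
+INSERT INTO counters (name, hits) VALUES ('home', 1)
+    ON CONFLICT (name) DO NOTHING
+    RETURNING name;   -- zero rows if a conflicting row already exists
+</programlisting>
+   </para>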
+
+   <para>
+    <command>MERGE</command> allows the user to specify various
+    combinations of <command>INSERT</command>, <command>UPDATE</command>
+    and <command>DELETE</command> subcommands. A <command>MERGE</command>
+    command with both <command>INSERT</command> and <command>UPDATE</command>
+    subcommands looks similar to <command>INSERT</command> with an
+    <literal>ON CONFLICT DO UPDATE</literal> clause but does not
+    guarantee that either <command>INSERT</command> or
+    <command>UPDATE</command> will occur.
+    If <command>MERGE</command> attempts an <command>UPDATE</command> or
+    <command>DELETE</command> and the row is concurrently updated but
+    the join condition still passes for the current target and the
+    current source tuple, then <command>MERGE</command> will behave
+    the same as the <command>UPDATE</command> or
+    <command>DELETE</command> commands and perform its action on the
+    updated version of the row.  However, because <command>MERGE</command>
+    can specify several actions and they can be conditional, the
+    conditions for each action are re-evaluated on the updated version of
+    the row, starting from the first action, even if the action that had
+    originally matched appears later in the list of actions.
+    On the other hand, if the row is concurrently updated or deleted so
+    that the join condition fails, then <command>MERGE</command> will
+    evaluate its <literal>NOT MATCHED</literal> actions next, and execute
+    the first one whose condition passes.
+    If <command>MERGE</command> attempts an <command>INSERT</command>
+    and a unique index is present and a duplicate row is concurrently
+    inserted, then a uniqueness violation error is raised;
+    <command>MERGE</command> does not attempt to avoid such
+    errors by restarting evaluation of <literal>MATCHED</literal>
+    conditions.
+   </para>
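+
+   <para>
+    For comparison with <literal>ON CONFLICT DO UPDATE</literal>, a
+    <command>MERGE</command> with both actions might look like this sketch,
+    assuming a hypothetical table <structname>counters</structname> with a
+    unique key on <structfield>name</structfield>.  If another session
+    concurrently inserts the same key, this command can fail with a
+    uniqueness violation instead of falling back to the update:
+
+<programlisting>
+MERGE INTO counters AS c
+USING (VALUES ('home', 1)) AS s (name, hits)
+ON c.name = s.name
+WHEN MATCHED THEN
+    UPDATE SET hits = c.hits + s.hits
+WHEN NOT MATCHED THEN
+    INSERT (name, hits) VALUES (s.name, s.hits);
+</programlisting>
+   </para>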
+
+   <para>
+    Because of the above rules, it is possible for an updating command to see
+    an inconsistent snapshot: it can see the effects of concurrent updating
+    commands on the same rows it is trying to update, but it
+    does not see effects of those commands on other rows in the database.
+    This behavior makes Read Committed mode unsuitable for commands that
+    involve complex search conditions; however, it is just right for simpler
+    cases.  For example, consider updating bank balances with transactions
+    like:
+
+<screen>
+BEGIN;
+UPDATE accounts SET balance = balance + 100.00 WHERE acctnum = 12345;
+UPDATE accounts SET balance = balance - 100.00 WHERE acctnum = 7534;
+COMMIT;
+</screen>
+
+    If two such transactions concurrently try to change the balance of account
+    12345, we clearly want the second transaction to start with the updated
+    version of the account's row.  Because each command is affecting only a
+    predetermined row, letting it see the updated version of the row does
+    not create any troublesome inconsistency.
+   </para>
+
+   <para>
+    More complex usage can produce undesirable results in Read Committed
+    mode.  For example, consider a <command>DELETE</command> command
+    operating on data that is being both added and removed from its
+    restriction criteria by another command, e.g., assume
+    <literal>website</literal> is a two-row table with
+    <literal>website.hits</literal> equaling <literal>9</literal> and
+    <literal>10</literal>:
+
+<screen>
+BEGIN;
+UPDATE website SET hits = hits + 1;
+-- run from another session:  DELETE FROM website WHERE hits = 10;
+COMMIT;
+</screen>
+
+    The <command>DELETE</command> will have no effect even though
+    there is a <literal>website.hits = 10</literal> row before and
+    after the <command>UPDATE</command>. This occurs because the
+    pre-update row value <literal>9</literal> is skipped, and when the
+    <command>UPDATE</command> completes and <command>DELETE</command>
+    obtains a lock, the new row value is no longer <literal>10</literal> but
+    <literal>11</literal>, which no longer matches the criteria.
+   </para>
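+
+   <para>
+    The same scenario written out step by step, as a sketch with a minimal
+    one-column <structname>website</structname> table and both sessions in
+    Read Committed mode:
+
+<programlisting>
+CREATE TABLE website (hits integer);
+INSERT INTO website VALUES (9), (10);
+
+-- Session 1
+BEGIN;
+UPDATE website SET hits = hits + 1;
+
+-- Session 2: waits for the lock on the row whose hits = 10
+DELETE FROM website WHERE hits = 10;
+
+-- Session 1
+COMMIT;
+
+-- Session 2's DELETE now completes, reporting DELETE 0
+SELECT hits FROM website ORDER BY hits;   -- 10 and 11
+</programlisting>
+   </para>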
+
+   <para>
+    Because Read Committed mode starts each command with a new snapshot
+    that includes all transactions committed up to that instant,
+    subsequent commands in the same transaction will see the effects
+    of the committed concurrent transaction in any case.  The point
+    at issue above is whether or not a <emphasis>single</emphasis> command
+    sees an absolutely consistent view of the database.
+   </para>
+
+   <para>
+    The partial transaction isolation provided by Read Committed mode
+    is adequate for many applications, and this mode is fast and simple
+    to use;  however, it is not sufficient for all cases.  Applications
+    that do complex queries and updates might require a more rigorously
+    consistent view of the database than Read Committed mode provides.
+   </para>
+  </sect2>
+
+  <sect2 id="xact-repeatable-read">
+   <title>Repeatable Read Isolation Level</title>
+
+   <indexterm>
+    <primary>transaction isolation level</primary>
+    <secondary>repeatable read</secondary>
+   </indexterm>
+
+   <indexterm>
+    <primary>repeatable read</primary>
+   </indexterm>
+
+   <para>
+    The <firstterm>Repeatable Read</firstterm> isolation level only sees
+    data committed before the transaction began; it never sees either
+    uncommitted data or changes committed during transaction execution
+    by concurrent transactions.  (However, each query does see the
+    effects of previous updates executed within its own transaction,
+    even though they are not yet committed.)  This is a stronger
+    guarantee than is required by the <acronym>SQL</acronym> standard
+    for this isolation level, and prevents all of the phenomena described
+    in <xref linkend="mvcc-isolevel-table"/> except for serialization
+    anomalies.  As mentioned above, this is
+    specifically allowed by the standard, which only describes the
+    <emphasis>minimum</emphasis> protections each isolation level must
+    provide.
+   </para>
+
+   <para>
+    This level is different from Read Committed in that a query in a
+    repeatable read transaction sees a snapshot as of the start of the
+    first non-transaction-control statement in the
+    <emphasis>transaction</emphasis>, not as of the start
+    of the current statement within the transaction.  Thus, successive
+    <command>SELECT</command> commands within a <emphasis>single</emphasis>
+    transaction see the same data, i.e., they do not see changes made by
+    other transactions that committed after their own transaction started.
+   </para>
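+
+   <para>
+    A sketch of this behavior, assuming a hypothetical one-column integer
+    table <structname>website</structname> that another session modifies
+    concurrently:
+
+<programlisting>
+BEGIN ISOLATION LEVEL REPEATABLE READ;
+SELECT count(*) FROM website;   -- the transaction's snapshot is taken here
+
+-- meanwhile, another session runs and commits:
+--   INSERT INTO website VALUES (42);
+
+SELECT count(*) FROM website;   -- same count as the first SELECT
+COMMIT;
+</programlisting>
+   </para>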
+
+   <para>
+    Applications using this level must be prepared to retry transactions
+    due to serialization failures.
+   </para>
+
+   <para>
+    <command>UPDATE</command>, <command>DELETE</command>,
+    <command>MERGE</command>, <command>SELECT FOR UPDATE</command>,
+    and <command>SELECT FOR SHARE</command> commands
+    behave the same as <command>SELECT</command>
+    in terms of searching for target rows: they will only find target rows
+    that were committed as of the transaction start time.  However, such a
+    target row might have already been updated (or deleted or locked) by
+    another concurrent transaction by the time it is found.  In this case, the
+    repeatable read transaction will wait for the first updating transaction to commit or
+    roll back (if it is still in progress).  If the first updater rolls back,
+    then its effects are negated and the repeatable read transaction can proceed
+    with updating the originally found row.  But if the first updater commits
+    (and actually updated or deleted the row, not just locked it)
+    then the repeatable read transaction will be rolled back with the message
+
+<screen>
+ERROR:  could not serialize access due to concurrent update
+</screen>
+
+    because a repeatable read transaction cannot modify or lock rows changed by
+    other transactions after the repeatable read transaction began.
+   </para>
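+
+   <para>
+    The following two-session sketch reproduces this error, assuming a
+    hypothetical table <structname>accounts</structname> with columns
+    <structfield>acctnum</structfield> and <structfield>balance</structfield>:
+
+<programlisting>
+-- Session 1
+BEGIN ISOLATION LEVEL REPEATABLE READ;
+SELECT balance FROM accounts WHERE acctnum = 12345;   -- snapshot taken
+
+-- Session 2 (autocommit)
+UPDATE accounts SET balance = balance + 100.00 WHERE acctnum = 12345;
+
+-- Session 1
+UPDATE accounts SET balance = balance - 50.00 WHERE acctnum = 12345;
+-- ERROR:  could not serialize access due to concurrent update
+ROLLBACK;
+</programlisting>
+   </para>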
+
+   <para>
+    When an application receives this error message, it should abort
+    the current transaction and retry the whole transaction from
+    the beginning.  The second time through, the transaction will see the
+    previously-committed change as part of its initial view of the database,
+    so there is no logical conflict in using the new version of the row
+    as the starting point for the new transaction's update.
+   </para>
+
+   <para>
+    Note that only updating transactions might need to be retried; read-only
+    transactions will never have serialization conflicts.
+   </para>
+
+   <para>
+    The Repeatable Read mode provides a rigorous guarantee that each
+    transaction sees a completely stable view of the database.  However,
+    this view will not necessarily always be consistent with some serial
+    (one at a time) execution of concurrent transactions of the same level.
+    For example, even a read-only transaction at this level may see a
+    control record updated to show that a batch has been completed but
+    <emphasis>not</emphasis> see one of the detail records which is logically
+    part of the batch because it read an earlier revision of the control
+    record.  Attempts to enforce business rules by transactions running at
+    this isolation level are not likely to work correctly without careful use
+    of explicit locks to block conflicting transactions.
+   </para>
+
+   <para>
+    The Repeatable Read isolation level is implemented using a technique
+    known in academic database literature and in some other database products
+    as <firstterm>Snapshot Isolation</firstterm>.  Differences in behavior
+    and performance may be observed when compared with systems that use a
+    traditional locking technique that reduces concurrency.  Some other
+    systems may even offer Repeatable Read and Snapshot Isolation as distinct
+    isolation levels with different behavior.  The permitted phenomena that
+    distinguish the two techniques were not formalized by database researchers
+    until after the SQL standard was developed, and are outside the scope of
+    this manual.  For a full treatment, please see
+    <xref linkend="berenson95"/>.
+   </para>
+
+   <note>
+    <para>
+     Prior to <productname>PostgreSQL</productname> version 9.1, a request
+     for the Serializable transaction isolation level provided exactly the
+     same behavior described here.  To retain the legacy Serializable
+     behavior, Repeatable Read should now be requested.
+    </para>
+   </note>
+  </sect2>
+
+  <sect2 id="xact-serializable">
+   <title>Serializable Isolation Level</title>
+
+   <indexterm>
+    <primary>transaction isolation level</primary>
+    <secondary>serializable</secondary>
+   </indexterm>
+
+   <indexterm>
+    <primary>serializable</primary>
+   </indexterm>
+
+   <indexterm>
+    <primary>predicate locking</primary>
+   </indexterm>
+
+   <indexterm>
+    <primary>serialization anomaly</primary>
+   </indexterm>
+
+   <para>
+    The <firstterm>Serializable</firstterm> isolation level provides
+    the strictest transaction isolation.  This level emulates serial
+    transaction execution for all committed transactions,
+    as if transactions had been executed one after another, serially,
+    rather than concurrently.  However, like the Repeatable Read level,
+    applications using this level must
+    be prepared to retry transactions due to serialization failures.
+    In fact, this isolation level works exactly the same as Repeatable
+    Read except that it also monitors for conditions which could make
+    execution of a concurrent set of serializable transactions behave
+    in a manner inconsistent with all possible serial (one at a time)
+    executions of those transactions.  This monitoring does not
+    introduce any blocking beyond that present in repeatable read, but
+    there is some overhead to the monitoring, and detection of the
+    conditions which could cause a
+    <firstterm>serialization anomaly</firstterm> will trigger a
+    <firstterm>serialization failure</firstterm>.
+   </para>
+
+   <para>
+    As an example,
+    consider a table <structname>mytab</structname>, initially containing:
+<screen>
+ class | value
+-------+-------
+     1 |    10
+     1 |    20
+     2 |   100
+     2 |   200
+</screen>
+    Suppose that serializable transaction A computes:
+<screen>
+SELECT SUM(value) FROM mytab WHERE class = 1;
+</screen>
+    and then inserts the result (30) as the <structfield>value</structfield> in a
+    new row with <structfield>class</structfield><literal> = 2</literal>.  Concurrently, serializable
+    transaction B computes:
+<screen>
+SELECT SUM(value) FROM mytab WHERE class = 2;
+</screen>
+    and obtains the result 300, which it inserts in a new row with
+    <structfield>class</structfield><literal> = 1</literal>.  Then both transactions try to commit.
+    If either transaction were running at the Repeatable Read isolation level,
+    both would be allowed to commit; but since there is no serial order of execution
+    consistent with the result, using Serializable transactions will allow one
+    transaction to commit and will roll the other back with this message:
+
+<screen>
+ERROR:  could not serialize access due to read/write dependencies among transactions
+</screen>
+
+    This is because if A had
+    executed before B, B would have computed the sum 330, not 300, and
+    similarly the other order would have resulted in a different sum
+    computed by A.
+   </para>
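+
+   <para>
+    The example can be reproduced with two sessions roughly as sketched
+    below.  Which transaction is rolled back, and at which statement the
+    error is reported, may depend on timing, so the final comment only
+    indicates that one of them fails:
+
+<programlisting>
+CREATE TABLE mytab (class integer, value integer);
+INSERT INTO mytab VALUES (1, 10), (1, 20), (2, 100), (2, 200);
+
+-- Session A
+BEGIN ISOLATION LEVEL SERIALIZABLE;
+SELECT SUM(value) FROM mytab WHERE class = 1;   -- 30
+INSERT INTO mytab VALUES (2, 30);
+
+-- Session B
+BEGIN ISOLATION LEVEL SERIALIZABLE;
+SELECT SUM(value) FROM mytab WHERE class = 2;   -- 300
+INSERT INTO mytab VALUES (1, 300);
+
+-- Session A
+COMMIT;
+
+-- Session B
+COMMIT;
+-- one of the two transactions is rolled back with
+-- ERROR:  could not serialize access due to read/write dependencies among transactions
+</programlisting>
+   </para>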
+
+   <para>
+    When relying on Serializable transactions to prevent anomalies, it is
+    important that any data read from a permanent user table not be
+    considered valid until the transaction which read it has successfully
+    committed.  This is true even for read-only transactions, except that
+    data read within a <firstterm>deferrable</firstterm> read-only
+    transaction is known to be valid as soon as it is read, because such a
+    transaction waits until it can acquire a snapshot guaranteed to be free
+    from such problems before starting to read any data.  In all other cases
+    applications must not depend on results read during a transaction that
+    later aborted; instead, they should retry the transaction until it
+    succeeds.
+   </para>
+
+   <para>
+    To guarantee true serializability <productname>PostgreSQL</productname>
+    uses <firstterm>predicate locking</firstterm>, which means that it keeps locks
+    which allow it to determine when a write would have had an impact on
+    the result of a previous read from a concurrent transaction, had it run
+    first.  In <productname>PostgreSQL</productname> these locks do not
+    cause any blocking and therefore can <emphasis>not</emphasis> play any part in
+    causing a deadlock.  They are used to identify and flag dependencies
+    among concurrent Serializable transactions which in certain combinations
+    can lead to serialization anomalies.  In contrast, a Read Committed or
+    Repeatable Read transaction which wants to ensure data consistency may
+    need to take out a lock on an entire table, which could block other
+    users attempting to use that table, or it may use <literal>SELECT FOR
+    UPDATE</literal> or <literal>SELECT FOR SHARE</literal>, which can not
+    only block other transactions but also cause disk access.
+   </para>
+
+   <para>
+    Predicate locks in <productname>PostgreSQL</productname>, as in most
+    other database systems, are based on data actually accessed by a
+    transaction.  These will show up in the
+    <link linkend="view-pg-locks"><structname>pg_locks</structname></link>
+    system view with a <literal>mode</literal> of <literal>SIReadLock</literal>.  The
+    particular locks
+    acquired during execution of a query will depend on the plan used by
+    the query, and multiple finer-grained locks (e.g., tuple locks) may be
+    combined into fewer coarser-grained locks (e.g., page locks) during the
+    course of the transaction to prevent exhaustion of the memory used to
+    track the locks.  A <literal>READ ONLY</literal> transaction may be able to
+    release its SIRead locks before completion, if it detects that no
+    conflicts can still occur which could lead to a serialization anomaly.
+    In fact, <literal>READ ONLY</literal> transactions will often be able to
+    establish that fact at startup and avoid taking any predicate locks.
+    If you explicitly request a <literal>SERIALIZABLE READ ONLY DEFERRABLE</literal>
+    transaction, it will block until it can establish this fact.  (This is
+    the <emphasis>only</emphasis> case where Serializable transactions block but
+    Repeatable Read transactions don't.)  On the other hand, SIRead locks
+    often need to be kept past transaction commit, until overlapping
+    read-write transactions complete.
+   </para>
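+
+   <para>
+    For instance, the predicate locks currently held can be listed with a
+    query along these lines:
+
+<programlisting>
+SELECT locktype, relation::regclass, page, tuple, pid
+FROM pg_locks
+WHERE mode = 'SIReadLock';
+</programlisting>
+
+    The <structfield>locktype</structfield> column shows the granularity
+    (<literal>relation</literal>, <literal>page</literal>, or
+    <literal>tuple</literal>) of each lock.
+   </para>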
+
+   <para>
+    Consistent use of Serializable transactions can simplify development.
+    The guarantee that any set of successfully committed concurrent
+    Serializable transactions will have the same effect as if they were run
+    one at a time means that if you can demonstrate that a single transaction,
+    as written, will do the right thing when run by itself, you can have
+    confidence that it will do the right thing in any mix of Serializable
+    transactions, even without any information about what those other
+    transactions might do, or it will not successfully commit.  It is
+    important that an environment which uses this technique have a
+    generalized way of handling serialization failures (which always return
+    with an SQLSTATE value of '40001'), because it will be very hard to
+    predict exactly which transactions might contribute to the read/write
+    dependencies and need to be rolled back to prevent serialization
+    anomalies.  The monitoring of read/write dependencies has a cost, as does
+    the restart of transactions which are terminated with a serialization
+    failure, but balanced against the cost and blocking involved in use of
+    explicit locks and <literal>SELECT FOR UPDATE</literal> or <literal>SELECT FOR
+    SHARE</literal>, Serializable transactions are the best performance choice
+    for some environments.
+   </para>
+
+   <para>
+    While <productname>PostgreSQL</productname>'s Serializable transaction isolation
+    level only allows concurrent transactions to commit if it can prove there
+    is a serial order of execution that would produce the same effect, it
+    doesn't always prevent errors from being raised that would not occur in
+    true serial execution.  In particular, it is possible to see unique
+    constraint violations caused by conflicts with overlapping Serializable
+    transactions even after explicitly checking that the key isn't present
+    before attempting to insert it.  This can be avoided by making sure
+    that <emphasis>all</emphasis> Serializable transactions that insert potentially
+    conflicting keys explicitly check if they can do so first.  For example,
+    imagine an application that asks the user for a new key and then checks
+    that it doesn't exist already by trying to select it first, or generates
+    a new key by selecting the maximum existing key and adding one.  If some
+    Serializable transactions insert new keys directly without following this
+    protocol, unique constraint violations might be reported even in cases
+    where they could not occur in a serial execution of the concurrent
+    transactions.
+   </para>
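+
+   <para>
+    The check-first protocol described above might look like the following
+    sketch.  It assumes a unique constraint on
+    <structfield>accounts.acctnum</structfield>, and the key value
+    <literal>33333</literal> is arbitrary.  When every inserting transaction
+    follows this pattern, a concurrent duplicate should normally be reported
+    as a serialization failure (SQLSTATE '40001'), which can be retried,
+    rather than as a unique constraint violation.
+<programlisting>
+BEGIN ISOLATION LEVEL SERIALIZABLE;
+-- Read the key first, so that this read is tracked for serialization conflicts.
+SELECT 1 FROM accounts WHERE acctnum = 33333;
+-- Only if the SELECT returned no row:
+INSERT INTO accounts (acctnum, balance) VALUES (33333, 0.00);
+COMMIT;
+</programlisting>
+   </para>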
+
+   <para>
+    For optimal performance when relying on Serializable transactions for
+    concurrency control, these issues should be considered:
+
+    <itemizedlist>
+     <listitem>
+      <para>
+       Declare transactions as <literal>READ ONLY</literal> when possible.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       Control the number of active connections, using a connection pool if
+       needed.  This is always an important performance consideration, but
+       it can be particularly important in a busy system using Serializable
+       transactions.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       Don't put more into a single transaction than needed for integrity
+       purposes.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       Don't leave connections dangling <quote>idle in transaction</quote>
+       longer than necessary.  The configuration parameter
+       <xref linkend="guc-idle-in-transaction-session-timeout"/> may be used to
+       automatically disconnect lingering sessions.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       Eliminate explicit locks, <literal>SELECT FOR UPDATE</literal>, and
+       <literal>SELECT FOR SHARE</literal> where no longer needed due to the
+       protections automatically provided by Serializable transactions.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       When the system is forced to combine multiple page-level predicate
+       locks into a single relation-level predicate lock because the predicate
+       lock table is short of memory, an increase in the rate of serialization
+       failures may occur.  You can avoid this by increasing
+       <xref linkend="guc-max-pred-locks-per-transaction"/>,
+       <xref linkend="guc-max-pred-locks-per-relation"/>, and/or
+       <xref linkend="guc-max-pred-locks-per-page"/>.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       A sequential scan will always necessitate a relation-level predicate
+       lock.  This can result in an increased rate of serialization failures.
+       It may be helpful to encourage the use of index scans by reducing
+       <xref linkend="guc-random-page-cost"/> and/or increasing
+       <xref linkend="guc-cpu-tuple-cost"/>.  Be sure to weigh any decrease
+       in transaction rollbacks and restarts against any overall change in
+       query execution time.
+      </para>
+     </listitem>
+    </itemizedlist>
+   </para>
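+
+   <para>
+    Several of these suggestions map directly onto SQL.  The statements below
+    are an illustrative sketch.  The timeout, lock-table size, and cost values
+    are placeholders, not recommendations; suitable settings depend on the
+    workload.
+<programlisting>
+-- Declare a transaction READ ONLY when it does not write.
+BEGIN ISOLATION LEVEL SERIALIZABLE, READ ONLY;
+SELECT sum(balance) FROM accounts;
+COMMIT;
+
+-- Terminate sessions left idle inside a transaction for too long.
+ALTER SYSTEM SET idle_in_transaction_session_timeout = '5min';
+
+-- Give the predicate lock table more room.  This setting takes effect only
+-- after a server restart.  With the default max_pred_locks_per_relation
+-- of -2, the per-relation limit scales up along with it.
+ALTER SYSTEM SET max_pred_locks_per_transaction = 128;
+SELECT pg_reload_conf();
+
+-- Encourage index scans in the current session only.
+SET random_page_cost = 1.1;
+</programlisting>
+   </para>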
+
+   <para>
+    The Serializable isolation level is implemented using a technique known
+    in academic database literature as Serializable Snapshot Isolation, which
+    builds on Snapshot Isolation by adding checks for serialization anomalies.
+    Some differences in behavior and performance may be observed when compared
+    with other systems that use a traditional locking technique.  Please see
+    <xref linkend="ports12"/> for detailed information.
+   </para>
+  </sect2>
+ </sect1>
+
+  <sect1 id="explicit-locking">
+   <title>Explicit Locking</title>
+
+   <indexterm>
+    <primary>lock</primary>
+   </indexterm>
+
+   <para>
+    <productname>PostgreSQL</productname> provides various lock modes
+    to control concurrent access to data in tables.  These modes can
+    be used for application-controlled locking in situations where
+    <acronym>MVCC</acronym> does not give the desired behavior.  Also,
+    most <productname>PostgreSQL</productname> commands automatically
+    acquire locks of appropriate modes to ensure that referenced
+    tables are not dropped or modified in incompatible ways while the
+    command executes.  (For example, <command>TRUNCATE</command> cannot safely be
+    executed concurrently with other operations on the same table, so it
+    obtains an <literal>ACCESS EXCLUSIVE</literal> lock on the table to
+    enforce that.)
+   </para>
+
+   <para>
+    To examine a list of the currently outstanding locks in a database
+    server, use the
+    <link linkend="view-pg-locks"><structname>pg_locks</structname></link>
+    system view. For more information on monitoring the status of the lock
+    manager subsystem, refer to <xref linkend="monitoring"/>.
+   </para>
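+
+   <para>
+    For example, the following queries (a sketch; adjust the filters as
+    needed) list the table-level locks currently held or awaited in the
+    current database, and the sessions that are waiting for another
+    session's lock:
+<programlisting>
+SELECT l.pid, l.relation::regclass AS relation, l.mode, l.granted
+FROM pg_locks l
+JOIN pg_database d ON d.oid = l.database
+WHERE l.locktype = 'relation' AND d.datname = current_database()
+ORDER BY l.relation, l.granted DESC;
+
+SELECT pid, pg_blocking_pids(pid) AS blocked_by, query
+FROM pg_stat_activity
+WHERE cardinality(pg_blocking_pids(pid)) &gt; 0;
+</programlisting>
+   </para>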
+
+  <sect2 id="locking-tables">
+   <title>Table-Level Locks</title>
+
+   <indexterm zone="locking-tables">
+    <primary>LOCK</primary>
+   </indexterm>
+
+   <para>
+    The list below shows the available lock modes and the contexts in
+    which they are used automatically by
+    <productname>PostgreSQL</productname>.  You can also acquire any
+    of these locks explicitly with the command <xref
+    linkend="sql-lock"/>.
+    Remember that all of these lock modes are table-level locks,
+    even if the name contains the word
+    <quote>row</quote>; the names of the lock modes are historical.
+    To some extent the names reflect the typical usage of each lock
+    mode &mdash; but the semantics are all the same.  The only real difference
+    between one lock mode and another is the set of lock modes with
+    which each conflicts (see <xref linkend="table-lock-compatibility"/>).
+    Two transactions cannot hold locks of conflicting
+    modes on the same table at the same time.  (However, a transaction
+    never conflicts with itself.  For example, it might acquire an
+    <literal>ACCESS EXCLUSIVE</literal> lock and later acquire an
+    <literal>ACCESS SHARE</literal> lock on the same table.)  Non-conflicting
+    lock modes can be held concurrently by many transactions.  Notice in
+    particular that some lock modes are self-conflicting (for example,
+    an <literal>ACCESS EXCLUSIVE</literal> lock cannot be held by more than one
+    transaction at a time) while others are not self-conflicting (for example,
+    an <literal>ACCESS SHARE</literal> lock can be held by multiple transactions).
+   </para>
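+
+   <para>
+    As an illustration, consider three sessions using the hypothetical
+    <structname>accounts</structname> table from this chapter's examples.  A
+    <literal>SHARE</literal> lock conflicts with <literal>ROW EXCLUSIVE</literal>
+    but not with <literal>ACCESS SHARE</literal> or with itself:
+<programlisting>
+-- Session 1
+BEGIN;
+LOCK TABLE accounts IN SHARE MODE;
+
+-- Session 2
+BEGIN;
+LOCK TABLE accounts IN SHARE MODE;   -- SHARE does not conflict with SHARE: granted
+SELECT count(*) FROM accounts;       -- ACCESS SHARE: proceeds
+COMMIT;
+
+-- Session 3
+UPDATE accounts SET balance = balance + 1.00
+  WHERE acctnum = 11111;             -- ROW EXCLUSIVE conflicts with SHARE:
+                                     -- waits until session 1 ends
+</programlisting>
+   </para>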
+
+     <variablelist>
+      <title>Table-Level Lock Modes</title>
+      <varlistentry>
+       <term>
+        <literal>ACCESS SHARE</literal> (<literal>AccessShareLock</literal>)
+       </term>
+       <listitem>
+        <para>
+         Conflicts with the <literal>ACCESS EXCLUSIVE</literal> lock
+         mode only.
+        </para>
+
+        <para>
+         The <command>SELECT</command> command acquires a lock of this mode on
+         referenced tables.  In general, any query that only <emphasis>reads</emphasis> a table
+         and does not modify it will acquire this lock mode.
+        </para>
+       </listitem>
+      </varlistentry>
+
+      <varlistentry>
+       <term>
+        <literal>ROW SHARE</literal> (<literal>RowShareLock</literal>)
+       </term>
+       <listitem>
+        <para>
+         Conflicts with the <literal>EXCLUSIVE</literal> and
+         <literal>ACCESS EXCLUSIVE</literal> lock modes.
+        </para>
+
+        <para>
+         The <command>SELECT</command> command acquires a lock of this mode
+         on all tables on which one of the <option>FOR UPDATE</option>,
+         <option>FOR NO KEY UPDATE</option>,
+         <option>FOR SHARE</option>, or
+         <option>FOR KEY SHARE</option> options is specified
+         (in addition to <literal>ACCESS SHARE</literal> locks on any other
+         tables that are referenced without any explicit
+         <option>FOR ...</option> locking option).
+        </para>
+       </listitem>
+      </varlistentry>
+
+      <varlistentry>
+       <term>
+        <literal>ROW EXCLUSIVE</literal> (<literal>RowExclusiveLock</literal>)
+       </term>
+       <listitem>
+        <para>
+         Conflicts with the <literal>SHARE</literal>, <literal>SHARE ROW
+         EXCLUSIVE</literal>, <literal>EXCLUSIVE</literal>, and
+         <literal>ACCESS EXCLUSIVE</literal> lock modes.
+        </para>
+
+        <para>
+         The commands <command>UPDATE</command>,
+         <command>DELETE</command>, <command>INSERT</command>, and
+         <command>MERGE</command>
+         acquire this lock mode on the target table (in addition to
+         <literal>ACCESS SHARE</literal> locks on any other referenced
+         tables).  In general, this lock mode will be acquired by any
+         command that <emphasis>modifies data</emphasis> in a table.
+        </para>
+       </listitem>
+      </varlistentry>
+
+      <varlistentry>
+       <term>
+        <literal>SHARE UPDATE EXCLUSIVE</literal> (<literal>ShareUpdateExclusiveLock</literal>)
+       </term>
+       <listitem>
+        <para>
+         Conflicts with the <literal>SHARE UPDATE EXCLUSIVE</literal>,
+         <literal>SHARE</literal>, <literal>SHARE ROW
+         EXCLUSIVE</literal>, <literal>EXCLUSIVE</literal>, and
+         <literal>ACCESS EXCLUSIVE</literal> lock modes.
+         This mode protects a table against
+         concurrent schema changes and <command>VACUUM</command> runs.
+        </para>
+
+        <para>
+         Acquired by <command>VACUUM</command> (without <option>FULL</option>),
+         <command>ANALYZE</command>, <command>CREATE INDEX CONCURRENTLY</command>,
+         <command>CREATE STATISTICS</command>, <command>COMMENT ON</command>,
+         <command>REINDEX CONCURRENTLY</command>,
+         and certain <link linkend="sql-alterindex"><command>ALTER INDEX</command></link>
+         and <link linkend="sql-altertable"><command>ALTER TABLE</command></link> variants
+         (for full details see the documentation of these commands).
+        </para>
+       </listitem>
+      </varlistentry>
+
+      <varlistentry>
+       <term>
+        <literal>SHARE</literal> (<literal>ShareLock</literal>)
+       </term>
+       <listitem>
+        <para>
+         Conflicts with the <literal>ROW EXCLUSIVE</literal>,
+         <literal>SHARE UPDATE EXCLUSIVE</literal>, <literal>SHARE ROW
+         EXCLUSIVE</literal>, <literal>EXCLUSIVE</literal>, and
+         <literal>ACCESS EXCLUSIVE</literal> lock modes.
+         This mode protects a table against concurrent data changes.
+        </para>
+
+        <para>
+         Acquired by <command>CREATE INDEX</command>
+         (without <option>CONCURRENTLY</option>).
+        </para>
+       </listitem>
+      </varlistentry>
+
+      <varlistentry>
+       <term>
+        <literal>SHARE ROW EXCLUSIVE</literal> (<literal>ShareRowExclusiveLock</literal>)
+       </term>
+       <listitem>
+        <para>
+         Conflicts with the <literal>ROW EXCLUSIVE</literal>,
+         <literal>SHARE UPDATE EXCLUSIVE</literal>,
+         <literal>SHARE</literal>, <literal>SHARE ROW
+         EXCLUSIVE</literal>, <literal>EXCLUSIVE</literal>, and
+         <literal>ACCESS EXCLUSIVE</literal> lock modes.
+         This mode protects a table against concurrent data changes, and
+         is self-exclusive so that only one session can hold it at a time.
+        </para>
+
+        <para>
+         Acquired by <command>CREATE TRIGGER</command> and some forms of
+         <link linkend="sql-altertable"><command>ALTER TABLE</command></link>.
+        </para>
+       </listitem>
+      </varlistentry>
+
+      <varlistentry>
+       <term>
+        <literal>EXCLUSIVE</literal> (<literal>ExclusiveLock</literal>)
+       </term>
+       <listitem>
+        <para>
+         Conflicts with the <literal>ROW SHARE</literal>, <literal>ROW
+         EXCLUSIVE</literal>, <literal>SHARE UPDATE
+         EXCLUSIVE</literal>, <literal>SHARE</literal>, <literal>SHARE
+         ROW EXCLUSIVE</literal>, <literal>EXCLUSIVE</literal>, and
+         <literal>ACCESS EXCLUSIVE</literal> lock modes.
+         This mode allows only concurrent <literal>ACCESS SHARE</literal> locks,
+         i.e., only reads from the table can proceed in parallel with a
+         transaction holding this lock mode.
+        </para>
+
+        <para>
+         Acquired by <command>REFRESH MATERIALIZED VIEW CONCURRENTLY</command>.
+        </para>
+       </listitem>
+      </varlistentry>
+
+      <varlistentry>
+       <term>
+        <literal>ACCESS EXCLUSIVE</literal> (<literal>AccessExclusiveLock</literal>)
+       </term>
+       <listitem>
+        <para>
+         Conflicts with locks of all modes (<literal>ACCESS
+         SHARE</literal>, <literal>ROW SHARE</literal>, <literal>ROW
+         EXCLUSIVE</literal>, <literal>SHARE UPDATE
+         EXCLUSIVE</literal>, <literal>SHARE</literal>, <literal>SHARE
+         ROW EXCLUSIVE</literal>, <literal>EXCLUSIVE</literal>, and
+         <literal>ACCESS EXCLUSIVE</literal>).
+         This mode guarantees that the
+         holder is the only transaction accessing the table in any way.
+        </para>
+
+        <para>
+         Acquired by the <command>DROP TABLE</command>,
+         <command>TRUNCATE</command>, <command>REINDEX</command>,
+         <command>CLUSTER</command>, <command>VACUUM FULL</command>,
+         and <command>REFRESH MATERIALIZED VIEW</command> (without
+         <option>CONCURRENTLY</option>)
+         commands. Many forms of <command>ALTER INDEX</command> and <command>ALTER TABLE</command> also acquire
+         a lock at this level. This is also the default lock mode for
+         <command>LOCK TABLE</command> statements that do not specify
+         a mode explicitly.
+        </para>
+       </listitem>
+      </varlistentry>
+     </variablelist>
+
+     <tip>
+      <para>
+       Only an <literal>ACCESS EXCLUSIVE</literal> lock blocks a
+       <command>SELECT</command> (without <option>FOR UPDATE/SHARE</option>)
+       statement.
+      </para>
+     </tip>
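+
+   <para>
+    One way to see which mode a command acquires is to run it inside a
+    transaction and query <structname>pg_locks</structname> before the
+    transaction ends.  The sketch below assumes the hypothetical
+    <structname>accounts</structname> table from this chapter's examples; the
+    index name is also hypothetical.
+<programlisting>
+BEGIN;
+CREATE INDEX accounts_balance_idx ON accounts (balance);
+SELECT mode, granted
+FROM pg_locks
+WHERE relation = 'accounts'::regclass AND pid = pg_backend_pid();
+-- The result should include ShareLock, as listed for CREATE INDEX above.
+ROLLBACK;   -- releases the lock and discards the index
+</programlisting>
+   </para>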
+
+   <para>
+    Once acquired, a lock is normally held until the end of the transaction.  But if a
+    lock is acquired after establishing a savepoint, the lock is released
+    immediately if the transaction rolls back to that savepoint.  This is consistent with
+    the principle that <command>ROLLBACK</command> cancels all effects of the
+    commands since the savepoint.  The same holds for locks acquired within a
+    <application>PL/pgSQL</application> exception block: an error escape from the block
+    releases locks acquired within it.
+   </para>
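+
+   <para>
+    For example (a sketch using the hypothetical <structname>accounts</structname>
+    table from this chapter's examples):
+<programlisting>
+BEGIN;
+SAVEPOINT before_lock;
+LOCK TABLE accounts IN ACCESS EXCLUSIVE MODE;   -- now blocks all other access
+ROLLBACK TO SAVEPOINT before_lock;              -- the lock is released here
+SELECT count(*) FROM accounts;                  -- takes only ACCESS SHARE
+COMMIT;
+</programlisting>
+   </para>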
+
+
+
+    <table tocentry="1" id="table-lock-compatibility">
+     <title>Conflicting Lock Modes</title>
+     <tgroup cols="9">
+      <colspec colnum="1" colwidth="1.25*"/>
+      <colspec colnum="2" colwidth="1*" colname="lockst"/>
+      <colspec colnum="3" colwidth="1*"/>
+      <colspec colnum="4" colwidth="1*"/>
+      <colspec colnum="5" colwidth="1*"/>
+      <colspec colnum="6" colwidth="1*"/>
+      <colspec colnum="7" colwidth="1*"/>
+      <colspec colnum="8" colwidth="1*"/>
+      <colspec colnum="9" colwidth="1*" colname="lockend"/>
+      <spanspec spanname="lockreq" namest="lockst" nameend="lockend" align="center"/>
+      <thead>
+       <row>
+        <entry morerows="1">Requested Lock Mode</entry>
+        <entry spanname="lockreq">Existing Lock Mode</entry>
+       </row>
+       <row>
+        <entry><literal>ACCESS SHARE</literal></entry>
+        <entry><literal>ROW SHARE</literal></entry>
+        <entry><literal>ROW EXCL.</literal></entry>
+        <entry><literal>SHARE UPDATE EXCL.</literal></entry>
+        <entry><literal>SHARE</literal></entry>
+        <entry><literal>SHARE ROW EXCL.</literal></entry>
+        <entry><literal>EXCL.</literal></entry>
+        <entry><literal>ACCESS EXCL.</literal></entry>
+       </row>
+      </thead>
+      <tbody>
+       <row>
+        <entry><literal>ACCESS SHARE</literal></entry>
+        <entry align="center"></entry>
+        <entry align="center"></entry>
+        <entry align="center"></entry>
+        <entry align="center"></entry>
+        <entry align="center"></entry>
+        <entry align="center"></entry>
+        <entry align="center"></entry>
+        <entry align="center">X</entry>
+       </row>
+       <row>
+        <entry><literal>ROW SHARE</literal></entry>
+        <entry align="center"></entry>
+        <entry align="center"></entry>
+        <entry align="center"></entry>
+        <entry align="center"></entry>
+        <entry align="center"></entry>
+        <entry align="center"></entry>
+        <entry align="center">X</entry>
+        <entry align="center">X</entry>
+       </row>
+       <row>
+        <entry><literal>ROW EXCL.</literal></entry>
+        <entry align="center"></entry>
+        <entry align="center"></entry>
+        <entry align="center"></entry>
+        <entry align="center"></entry>
+        <entry align="center">X</entry>
+        <entry align="center">X</entry>
+        <entry align="center">X</entry>
+        <entry align="center">X</entry>
+       </row>
+       <row>
+        <entry><literal>SHARE UPDATE EXCL.</literal></entry>
+        <entry align="center"></entry>
+        <entry align="center"></entry>
+        <entry align="center"></entry>
+        <entry align="center">X</entry>
+        <entry align="center">X</entry>
+        <entry align="center">X</entry>
+        <entry align="center">X</entry>
+        <entry align="center">X</entry>
+       </row>
+       <row>
+        <entry><literal>SHARE</literal></entry>
+        <entry align="center"></entry>
+        <entry align="center"></entry>
+        <entry align="center">X</entry>
+        <entry align="center">X</entry>
+        <entry align="center"></entry>
+        <entry align="center">X</entry>
+        <entry align="center">X</entry>
+        <entry align="center">X</entry>
+       </row>
+       <row>
+        <entry><literal>SHARE ROW EXCL.</literal></entry>
+        <entry align="center"></entry>
+        <entry align="center"></entry>
+        <entry align="center">X</entry>
+        <entry align="center">X</entry>
+        <entry align="center">X</entry>
+        <entry align="center">X</entry>
+        <entry align="center">X</entry>
+        <entry align="center">X</entry>
+       </row>
+       <row>
+        <entry><literal>EXCL.</literal></entry>
+        <entry align="center"></entry>
+        <entry align="center">X</entry>
+        <entry align="center">X</entry>
+        <entry align="center">X</entry>
+        <entry align="center">X</entry>
+        <entry align="center">X</entry>
+        <entry align="center">X</entry>
+        <entry align="center">X</entry>
+       </row>
+       <row>
+        <entry><literal>ACCESS EXCL.</literal></entry>
+        <entry align="center">X</entry>
+        <entry align="center">X</entry>
+        <entry align="center">X</entry>
+        <entry align="center">X</entry>
+        <entry align="center">X</entry>
+        <entry align="center">X</entry>
+        <entry align="center">X</entry>
+        <entry align="center">X</entry>
+       </row>
+      </tbody>
+     </tgroup>
+    </table>
+   </sect2>
+
+   <sect2 id="locking-rows">
+    <title>Row-Level Locks</title>
+
+    <para>
+     In addition to table-level locks, there are row-level locks, which
+     are listed below with the contexts in which they are used
+     automatically by <productname>PostgreSQL</productname>.  See
+     <xref linkend="row-lock-compatibility"/> for a complete table of
+     row-level lock conflicts.  Note that a transaction can hold
+     conflicting locks on the same row, even in different subtransactions;
+     but other than that, two transactions can never hold conflicting locks
+     on the same row.  Row-level locks do not affect data querying; they
+     block only <emphasis>writers and lockers</emphasis> to the same
+     row.  Row-level locks are released at transaction end or during
+     savepoint rollback, just like table-level locks.
+    </para>
+
+     <variablelist>
+      <title>Row-Level Lock Modes</title>
+      <varlistentry>
+       <term>
+        <literal>FOR UPDATE</literal>
+       </term>
+       <listitem>
+        <para>
+         <literal>FOR UPDATE</literal> causes the rows retrieved by the
+         <command>SELECT</command> statement to be locked as though for
+         update.  This prevents them from being locked, modified or deleted by
+         other transactions until the current transaction ends.  That is,
+         other transactions that attempt <command>UPDATE</command>,
+         <command>DELETE</command>,
+         <command>SELECT FOR UPDATE</command>,
+         <command>SELECT FOR NO KEY UPDATE</command>,
+         <command>SELECT FOR SHARE</command> or
+         <command>SELECT FOR KEY SHARE</command>
+         of these rows will be blocked until the current transaction ends;
+         conversely, <command>SELECT FOR UPDATE</command> will wait for a
+         concurrent transaction that has run any of those commands on the
+         same row,
+         and will then lock and return the updated row (or no row, if the
+         row was deleted).  Within a <literal>REPEATABLE READ</literal> or
+         <literal>SERIALIZABLE</literal> transaction,
+         however, an error will be thrown if a row to be locked has changed
+         since the transaction started.  For further discussion see
+         <xref linkend="applevel-consistency"/>.
+        </para>
+        <para>
+         The <literal>FOR UPDATE</literal> lock mode
+         is also acquired by any <command>DELETE</command> on a row, and also by an
+         <command>UPDATE</command> that modifies the values of certain columns.  Currently,
+         the columns considered for the <command>UPDATE</command> case are those that
+         have a unique index on them that can be used in a foreign key (so partial
+         indexes and expressional indexes are not considered), but this may change
+         in the future.
+        </para>
+       </listitem>
+      </varlistentry>
+
+      <varlistentry>
+       <term>
+        <literal>FOR NO KEY UPDATE</literal>
+       </term>
+       <listitem>
+        <para>
+         Behaves similarly to <literal>FOR UPDATE</literal>, except that the lock
+         acquired is weaker: this lock will not block
+         <literal>SELECT FOR KEY SHARE</literal> commands that attempt to acquire
+         a lock on the same rows. This lock mode is also acquired by any
+         <command>UPDATE</command> that does not acquire a <literal>FOR UPDATE</literal> lock.
+        </para>
+       </listitem>
+      </varlistentry>
+
+      <varlistentry>
+       <term>
+        <literal>FOR SHARE</literal>
+       </term>
+       <listitem>
+        <para>
+         Behaves similarly to <literal>FOR NO KEY UPDATE</literal>, except that it
+         acquires a shared lock rather than an exclusive lock on each retrieved
+         row.  A shared lock blocks other transactions from performing
+         <command>UPDATE</command>, <command>DELETE</command>,
+         <command>SELECT FOR UPDATE</command> or
+         <command>SELECT FOR NO KEY UPDATE</command> on these rows, but it does not
+         prevent them from performing <command>SELECT FOR SHARE</command> or
+         <command>SELECT FOR KEY SHARE</command>.
+        </para>
+       </listitem>
+      </varlistentry>
+
+      <varlistentry>
+       <term>
+        <literal>FOR KEY SHARE</literal>
+       </term>
+       <listitem>
+        <para>
+         Behaves similarly to <literal>FOR SHARE</literal>, except that the
+         lock is weaker: <literal>SELECT FOR UPDATE</literal> is blocked, but not
+         <literal>SELECT FOR NO KEY UPDATE</literal>.  A key-shared lock blocks
+         other transactions from performing <command>DELETE</command> or
+         any <command>UPDATE</command> that changes the key values, but not
+         other <command>UPDATE</command>, and neither does it prevent
+         <command>SELECT FOR NO KEY UPDATE</command>, <command>SELECT FOR SHARE</command>,
+         or <command>SELECT FOR KEY SHARE</command>.
+        </para>
+       </listitem>
+      </varlistentry>
+     </variablelist>
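+
+    <para>
+     The difference between the key and non-key modes can be seen with two
+     sessions.  This sketch assumes that <structfield>acctnum</structfield> is the
+     primary key of the hypothetical <structname>accounts</structname> table
+     and that <structfield>balance</structfield> is not covered by any unique
+     index, so an update of <structfield>balance</structfield> takes only a
+     <literal>FOR NO KEY UPDATE</literal> lock.
+<programlisting>
+-- Session 1 (foreign key checks take this mode on referenced rows)
+BEGIN;
+SELECT * FROM accounts WHERE acctnum = 11111 FOR KEY SHARE;
+
+-- Session 2
+UPDATE accounts SET balance = balance + 1.00
+  WHERE acctnum = 11111;                      -- FOR NO KEY UPDATE: proceeds
+DELETE FROM accounts WHERE acctnum = 11111;   -- FOR UPDATE: waits for session 1
+</programlisting>
+    </para>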
+
+    <para>
+     <productname>PostgreSQL</productname> doesn't remember any
+     information about modified rows in memory, so there is no limit on
+     the number of rows locked at one time.  However, locking a row
+     might cause a disk write, e.g., <command>SELECT FOR
+     UPDATE</command> modifies selected rows to mark them locked, and so
+     will result in disk writes.
+    </para>
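+
+    <para>
+     Because row-level locks are recorded in the rows themselves rather than
+     in memory, they normally do not appear in
+     <structname>pg_locks</structname>.  If the <filename>pgrowlocks</filename>
+     contrib module is available, it can report them.  A sketch, again using
+     the hypothetical <structname>accounts</structname> table:
+<programlisting>
+CREATE EXTENSION pgrowlocks;
+SELECT * FROM pgrowlocks('accounts');
+</programlisting>
+    </para>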
+
+    <table tocentry="1" id="row-lock-compatibility">
+     <title>Conflicting Row-Level Locks</title>
+     <tgroup cols="5">
+      <colspec colname="col1"    colwidth="1.5*"/>
+      <colspec colname="lockst"  colwidth="1*"/>
+      <colspec colname="col3"    colwidth="1*"/>
+      <colspec colname="col4"    colwidth="1*"/>
+      <colspec colname="lockend" colwidth="1*"/>
+      <spanspec namest="lockst" nameend="lockend" spanname="lockreq"/>
+      <thead>
+       <row>
+        <entry morerows="1">Requested Lock Mode</entry>
+        <entry spanname="lockreq">Current Lock Mode</entry>
+       </row>
+       <row>
+        <entry>FOR KEY SHARE</entry>
+        <entry>FOR SHARE</entry>
+        <entry>FOR NO KEY UPDATE</entry>
+        <entry>FOR UPDATE</entry>
+       </row>
+      </thead>
+      <tbody>
+       <row>
+        <entry>FOR KEY SHARE</entry>
+        <entry align="center"></entry>
+        <entry align="center"></entry>
+        <entry align="center"></entry>
+        <entry align="center">X</entry>
+       </row>
+       <row>
+        <entry>FOR SHARE</entry>
+        <entry align="center"></entry>
+        <entry align="center"></entry>
+        <entry align="center">X</entry>
+        <entry align="center">X</entry>
+       </row>
+       <row>
+        <entry>FOR NO KEY UPDATE</entry>
+        <entry align="center"></entry>
+        <entry align="center">X</entry>
+        <entry align="center">X</entry>
+        <entry align="center">X</entry>
+       </row>
+       <row>
+        <entry>FOR UPDATE</entry>
+        <entry align="center">X</entry>
+        <entry align="center">X</entry>
+        <entry align="center">X</entry>
+        <entry align="center">X</entry>
+       </row>
+      </tbody>
+     </tgroup>
+    </table>
+   </sect2>
+
+   <sect2 id="locking-pages">
+    <title>Page-Level Locks</title>
+
+    <para>
+     In addition to table and row locks, page-level share/exclusive locks are
+     used to control read/write access to table pages in the shared buffer
+     pool.  These locks are released immediately after a row is fetched or
+     updated.  Application developers normally need not be concerned with
+     page-level locks, but they are mentioned here for completeness.
+    </para>
+
+   </sect2>
+
+   <sect2 id="locking-deadlocks">
+    <title>Deadlocks</title>
+
+    <indexterm zone="locking-deadlocks">
+     <primary>deadlock</primary>
+    </indexterm>
+
+    <para>
+     The use of explicit locking can increase the likelihood of
+     <firstterm>deadlocks</firstterm>, wherein two (or more) transactions each
+     hold locks that the other wants.  For example, if transaction 1
+     acquires an exclusive lock on table A and then tries to acquire
+     an exclusive lock on table B, while transaction 2 has already
+     exclusive-locked table B and now wants an exclusive lock on table
+     A, then neither one can proceed.
+     <productname>PostgreSQL</productname> automatically detects
+     deadlock situations and resolves them by aborting one of the
+     transactions involved, allowing the other(s) to complete.
+     (Exactly which transaction will be aborted is difficult to
+     predict and should not be relied upon.)
+    </para>
+
+    <para>
+     Note that deadlocks can also occur as the result of row-level
+     locks (and thus, they can occur even if explicit locking is not
+     used). Consider the case in which two concurrent
+     transactions modify a table. The first transaction executes:
+
+<screen>
+UPDATE accounts SET balance = balance + 100.00 WHERE acctnum = 11111;
+</screen>
+
+     This acquires a row-level lock on the row with the specified
+     account number. Then, the second transaction executes:
+
+<screen>
+UPDATE accounts SET balance = balance + 100.00 WHERE acctnum = 22222;
+UPDATE accounts SET balance = balance - 100.00 WHERE acctnum = 11111;
+</screen>
+
+     The first <command>UPDATE</command> statement successfully
+     acquires a row-level lock on the specified row, so it succeeds in
+     updating that row. However, the second <command>UPDATE</command>
+     statement finds that the row it is attempting to update has
+     already been locked, so it waits for the transaction that
+     acquired the lock to complete. Transaction two is now waiting on
+     transaction one to complete before it continues execution. Now,
+     transaction one executes:
+
+<screen>
+UPDATE accounts SET balance = balance - 100.00 WHERE acctnum = 22222;
+</screen>
+
+     Transaction one attempts to acquire a row-level lock on the
+     specified row, but it cannot: transaction two already holds such
+     a lock. So it waits for transaction two to complete. Thus,
+     transaction one is blocked on transaction two, and transaction
+     two is blocked on transaction one: a deadlock
+     condition. <productname>PostgreSQL</productname> will detect this
+     situation and abort one of the transactions.
+    </para>
+
+    <para>
+     The best defense against deadlocks is generally to avoid them by
+     being certain that all applications using a database acquire
+     locks on multiple objects in a consistent order. In the example
+     above, if both transactions
+     had updated the rows in the same order, no deadlock would have
+     occurred. One should also ensure that the first lock acquired on
+     an object in a transaction is the most restrictive mode that will be
+     needed for that object.  If it is not feasible to verify this in
+     advance, then deadlocks can be handled on-the-fly by retrying
+     transactions that abort due to deadlocks.
+    </para>
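+
+    <para>
+     Applied to the example above, a sketch: if each transaction first locks
+     every row it will update, always in ascending
+     <structfield>acctnum</structfield> order, and only then modifies them,
+     the two transactions can no longer wait on each other in a cycle.  If a
+     deadlock is still reported (SQLSTATE '40P01'), the whole transaction can
+     be retried.
+<programlisting>
+BEGIN;
+SELECT acctnum FROM accounts
+  WHERE acctnum IN (11111, 22222)
+  ORDER BY acctnum
+  FOR UPDATE;
+UPDATE accounts SET balance = balance + 100.00 WHERE acctnum = 22222;
+UPDATE accounts SET balance = balance - 100.00 WHERE acctnum = 11111;
+COMMIT;
+</programlisting>
+    </para>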
+
+    <para>
+     So long as no deadlock situation is detected, a transaction seeking
+     either a table-level or row-level lock will wait indefinitely for
+     conflicting locks to be released.  This means it is a bad idea for
+     applications to hold transactions open for long periods of time
+     (e.g., while waiting for user input).
+    </para>
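+
+    <para>
+     When waiting indefinitely is not acceptable, lock waits can be bounded.
+     In this sketch the two-second limit is arbitrary.
+     <varname>lock_timeout</varname> makes any single lock wait in the
+     transaction fail with an error once it exceeds the limit, and
+     <literal>NOWAIT</literal> makes one row-locking request fail immediately
+     instead of waiting.
+<programlisting>
+BEGIN;
+SET LOCAL lock_timeout = '2s';
+LOCK TABLE accounts IN SHARE ROW EXCLUSIVE MODE;
+SELECT * FROM accounts WHERE acctnum = 11111 FOR UPDATE NOWAIT;
+COMMIT;
+</programlisting>
+    </para>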
+   </sect2>
+
+   <sect2 id="advisory-locks">
+    <title>Advisory Locks</title>
+
+    <indexterm zone="advisory-locks">
+     <primary>advisory lock</primary>
+    </indexterm>
+
+    <indexterm zone="advisory-locks">
+     <primary>lock</primary>
+     <secondary>advisory</secondary>
+    </indexterm>
+
+    <para>
+     <productname>PostgreSQL</productname> provides a means for
+     creating locks that have application-defined meanings.  These are
+     called <firstterm>advisory locks</firstterm>, because the system does not
+     enforce their use &mdash; it is up to the application to use them
+     correctly.  Advisory locks can be useful for locking strategies
+     that are an awkward fit for the MVCC model.
+     For example, a common use of advisory locks is to emulate pessimistic
+     locking strategies typical of so-called <quote>flat file</quote> data
+     management systems.
+     While a flag stored in a table could be used for the same purpose,
+     advisory locks are faster, avoid table bloat, and are automatically
+     cleaned up by the server at the end of the session.
+    </para>
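+
+    <para>
+     As an illustration of the flat-file style of use, an application might
+     <quote>check out</quote> a record for editing without keeping a
+     transaction open while the user works.  The two-key convention used
+     here (first key <literal>1</literal> meaning <quote>rows of table
+     <structname>foo</structname></quote>, second key the row's
+     <structfield>id</structfield>) is purely hypothetical; the server
+     attaches no meaning to the key values.
+    </para>
+<programlisting>
+-- Returns true if this session now holds the lock, false if another does
+SELECT pg_try_advisory_lock(1, 12345);
+-- ... the user edits record 12345; no transaction needs to stay open ...
+SELECT pg_advisory_unlock(1, 12345);
+</programlisting>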
+
+    <para>
+     There are two ways to acquire an advisory lock in
+     <productname>PostgreSQL</productname>: at session level or at
+     transaction level.
+     Once acquired at session level, an advisory lock is held until
+     explicitly released or the session ends.  Unlike standard lock requests,
+     session-level advisory lock requests do not honor transaction semantics:
+     a lock acquired during a transaction that is later rolled back will still
+     be held following the rollback, and likewise an unlock is effective even
+     if the calling transaction fails later.  A lock can be acquired multiple
+     times by its owning process; for each completed lock request there must
+     be a corresponding unlock request before the lock is actually released.
+     Transaction-level lock requests, on the other hand, behave more like
+     regular lock requests: they are automatically released at the end of the
+     transaction, and there is no explicit unlock operation.  This behavior
+     is often more convenient than the session-level behavior for short-term
+     usage of an advisory lock.
+     Session-level and transaction-level lock requests for the same advisory
+     lock identifier will block each other in the expected way.
+     If a session already holds a given advisory lock, additional requests by
+     it will always succeed, even if other sessions are awaiting the lock; this
+     statement is true regardless of whether the existing lock hold and new
+     request are at session level or transaction level.
+    </para>
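+
+    <para>
+     The following sketch contrasts the two levels using an arbitrary key,
+     <literal>42</literal>.  It shows that session-level locks stack and
+     survive a rollback, while a transaction-level lock needs no unlock
+     call.
+    </para>
+<programlisting>
+-- Session level: requests stack, so each lock call needs its own unlock
+SELECT pg_advisory_lock(42);
+SELECT pg_advisory_lock(42);    -- succeeds at once; now held twice
+SELECT pg_advisory_unlock(42);  -- still held once
+SELECT pg_advisory_unlock(42);  -- now released
+
+-- Session level ignores transaction semantics
+BEGIN;
+SELECT pg_advisory_lock(42);
+ROLLBACK;                       -- the lock on 42 is still held here
+SELECT pg_advisory_unlock(42);
+
+-- Transaction level: released automatically at COMMIT or ROLLBACK
+BEGIN;
+SELECT pg_advisory_xact_lock(42);
+COMMIT;                         -- released; no unlock call is needed
+</programlisting>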
+
+    <para>
+     As with all locks in
+     <productname>PostgreSQL</productname>, a complete list of advisory locks
+     currently held by any session can be found in the <link
+     linkend="view-pg-locks"><structname>pg_locks</structname></link> system
+     view.
+    </para>
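+
+    <para>
+     For example, a query along these lines lists the advisory locks and the
+     processes holding or awaiting them.  For a single
+     <type>bigint</type> key, <structfield>classid</structfield> and
+     <structfield>objid</structfield> hold its high-order and low-order
+     halves and <structfield>objsubid</structfield> is 1; for two
+     <type>integer</type> keys they hold the first and second key and
+     <structfield>objsubid</structfield> is 2.
+    </para>
+<programlisting>
+SELECT pid, mode, granted,
+       -- Reassemble the original key when a single bigint was used
+       CASE WHEN objsubid = 1
+            THEN (classid::bigint &lt;&lt; 32) | objid::bigint
+       END AS bigint_key,
+       classid, objid, objsubid
+  FROM pg_locks
+ WHERE locktype = 'advisory';
+</programlisting>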
+
+    <para>
+     Both advisory locks and regular locks are stored in a shared memory
+     pool whose size is defined by the configuration variables
+     <xref linkend="guc-max-locks-per-transaction"/> and
+     <xref linkend="guc-max-connections"/>.
+     Care must be taken not to exhaust this
+     memory or the server will be unable to grant any locks at all.
+     This imposes an upper limit on the number of advisory locks
+     grantable by the server, typically in the tens to hundreds of thousands
+     depending on how the server is configured.
+    </para>
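+
+    <para>
+     A rough estimate of the lock table's capacity can be computed from the
+     current settings.  This is only an approximation: the server also
+     reserves entries for background processes and for prepared
+     transactions, so the true capacity is somewhat larger than this
+     product.
+    </para>
+<programlisting>
+-- Approximate lower bound on the number of lock entries in shared memory
+SELECT current_setting('max_locks_per_transaction')::int
+       * current_setting('max_connections')::int AS approx_lock_slots;
+</programlisting>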
+
+    <para>
+     When advisory-lock functions are called within queries, especially
+     queries involving explicit ordering and <literal>LIMIT</literal> clauses,
+     care must be taken to control which locks are acquired, because of the
+     order in which SQL expressions are evaluated.  For example:
+<screen>
+SELECT pg_advisory_lock(id) FROM foo WHERE id = 12345; -- ok
+SELECT pg_advisory_lock(id) FROM foo WHERE id &gt; 12345 LIMIT 100; -- danger!
+SELECT pg_advisory_lock(q.id) FROM
+(
+  SELECT id FROM foo WHERE id &gt; 12345 LIMIT 100
+) q; -- ok
+</screen>
+     In the above queries, the second form is dangerous because the
+     <literal>LIMIT</literal> is not guaranteed to be applied before the locking
+     function is executed.  This might cause some locks to be acquired
+     that the application was not expecting, and hence would fail to release
+     (until it ends the session).
+     From the point of view of the application, such locks
+     would be dangling, although still viewable in
+     <structname>pg_locks</structname>.
+    </para>
+
+    <para>
+     The functions provided to manipulate advisory locks are described in
+     <xref linkend="functions-advisory-locks"/>.
+    </para>
+   </sect2>
+
+  </sect1>
+
+  <sect1 id="applevel-consistency">
+   <title>Data Consistency Checks at the Application Level</title>
+
+   <para>
+    It is very difficult to enforce business rules regarding data integrity
+    using Read Committed transactions because the view of the data is
+    shifting with each statement, and even a single statement may not
+    restrict itself to the statement's snapshot if a write conflict occurs.
+   </para>
+
+   <para>
+    While a Repeatable Read transaction has a stable view of the data
+    throughout its execution, there is a subtle issue with using
+    <acronym>MVCC</acronym> snapshots for data consistency checks, involving
+    something known as <firstterm>read/write conflicts</firstterm>.
+    If one transaction writes data and a concurrent transaction attempts
+    to read the same data (whether before or after the write), it cannot
+    see the work of the other transaction.  The reader then appears to have
+    executed first regardless of which started first or which committed
+    first.  If that is as far as it goes, there is no problem, but
+    if the reader also writes data which is read by a concurrent transaction
+    there is now a transaction which appears to have run before either of
+    the previously mentioned transactions.  If the transaction which appears
+    to have executed last actually commits first, it is very easy for a
+    cycle to appear in a graph of the order of execution of the transactions.
+    When such a cycle appears, integrity checks will not work correctly
+    without some help.
+   </para>
+
+   <para>
+    As mentioned in <xref linkend="xact-serializable"/>, Serializable
+    transactions are just Repeatable Read transactions which add
+    nonblocking monitoring for dangerous patterns of read/write conflicts.
+    When a pattern is detected which could cause a cycle in the apparent
+    order of execution, one of the transactions involved is rolled back to
+    break the cycle.
+   </para>
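+
+   <para>
+    The classic illustration is <quote>write skew</quote>.  In the sketch
+    below, which uses a hypothetical <structname>doctors</structname> table
+    with the business rule that at least one doctor must remain on call, each
+    session reads both rows but updates a different one.  There is no write
+    conflict, so both Repeatable Read transactions commit and the rule is
+    violated.
+   </para>
+<programlisting>
+CREATE TABLE doctors (name text PRIMARY KEY, on_call boolean NOT NULL);
+INSERT INTO doctors VALUES ('alice', true), ('bob', true);
+
+-- Session A
+BEGIN ISOLATION LEVEL REPEATABLE READ;
+SELECT count(*) FROM doctors WHERE on_call;   -- 2, so alice may go off call
+-- Session B
+BEGIN ISOLATION LEVEL REPEATABLE READ;
+SELECT count(*) FROM doctors WHERE on_call;   -- 2, so bob may go off call
+UPDATE doctors SET on_call = false WHERE name = 'bob';
+COMMIT;
+-- Session A: its snapshot still shows two doctors on call
+UPDATE doctors SET on_call = false WHERE name = 'alice';
+COMMIT;                                       -- succeeds; nobody is on call
+</programlisting>
+   <para>
+    Each transaction read a row that the other wrote, which forms a cycle of
+    the kind described above.  If both sessions instead use
+    <literal>ISOLATION LEVEL SERIALIZABLE</literal>, session A is expected to
+    fail with a serialization failure at its <command>UPDATE</command> or
+    <command>COMMIT</command>, since B has already committed.
+   </para>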
+
+   <sect2 id="serializable-consistency">
+    <title>Enforcing Consistency with Serializable Transactions</title>
+
+    <para>
+     If the Serializable transaction isolation level is used for all writes
+     and for all reads which need a consistent view of the data, no other
+     effort is required to ensure consistency.  Software from other
+     environments which is written to use serializable transactions to
+     ensure consistency should <quote>just work</quote> in this regard in
+     <productname>PostgreSQL</productname>.
+    </para>
+
+    <para>
+     When using this technique, application programmers are spared an
+     unnecessary burden if the application software goes through a
+     framework that automatically retries transactions rolled back with a
+     serialization failure.  It may be a good idea to set
+     <literal>default_transaction_isolation</literal> to <literal>serializable</literal>.
+     It would also be wise to ensure, through checks of the transaction
+     isolation level in triggers, that no other transaction isolation level
+     is used, either inadvertently or to subvert integrity checks.
+    </para>
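+
+    <para>
+     A minimal sketch of both suggestions follows.  The database name
+     <literal>mydb</literal>, the table <structname>accounts</structname>,
+     and the function and trigger names are hypothetical.  The trigger
+     reads the <varname>transaction_isolation</varname> setting and rejects
+     any write statement made at another isolation level.
+    </para>
+<programlisting>
+-- Make Serializable the default for new sessions in this database
+ALTER DATABASE mydb SET default_transaction_isolation = 'serializable';
+
+-- Reject writes to accounts made at any other isolation level
+CREATE FUNCTION require_serializable() RETURNS trigger
+LANGUAGE plpgsql AS $$
+BEGIN
+  IF current_setting('transaction_isolation') &lt;&gt; 'serializable' THEN
+    RAISE EXCEPTION 'writes to % require serializable isolation, not %',
+      TG_TABLE_NAME, current_setting('transaction_isolation');
+  END IF;
+  RETURN NULL;  -- ignored for statement-level triggers
+END;
+$$;
+
+CREATE TRIGGER accounts_require_serializable
+  BEFORE INSERT OR UPDATE OR DELETE ON accounts
+  FOR EACH STATEMENT EXECUTE FUNCTION require_serializable();
+</programlisting>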
+
+    <para>
+     See <xref linkend="xact-serializable"/> for performance suggestions.
+    </para>
+
+    <warning>
+     <para>
+      This level of integrity protection using Serializable transactions
+      does not yet extend to hot standby mode (<xref linkend="hot-standby"/>).
+      Because of that, those using hot standby may want to use Repeatable
+      Read and explicit locking on the primary.
+     </para>
+    </warning>
+   </sect2>
+
+   <sect2 id="non-serializable-consistency">
+    <title>Enforcing Consistency with Explicit Blocking Locks</title>
+
+    <para>
+     When non-serializable writes are possible,
+     to ensure the current validity of a row and protect it against
+     concurrent updates, one must use <command>SELECT FOR UPDATE</command>,
+     <command>SELECT FOR SHARE</command>, or an appropriate <command>LOCK
+     TABLE</command> statement.  (<command>SELECT FOR UPDATE</command>
+     and <command>SELECT FOR SHARE</command> lock just the
+     returned rows against concurrent updates, while <command>LOCK
+     TABLE</command> locks the whole table.)  This should be taken into
+     account when porting applications to
+     <productname>PostgreSQL</productname> from other environments.
+    </para>
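+
+    <para>
+     For example, a read-check-write sequence can lock the row it reads, so
+     that a concurrent <command>UPDATE</command>, <command>DELETE</command>,
+     or <command>SELECT FOR UPDATE</command> of that row waits until the
+     transaction ends.  The sketch assumes a hypothetical
+     <structname>accounts</structname> table with
+     <structfield>acctnum</structfield> and <structfield>balance</structfield>
+     columns.
+    </para>
+<programlisting>
+BEGIN;
+-- Lock the row; concurrent writers of this row now wait for us
+SELECT balance FROM accounts WHERE acctnum = 11111 FOR UPDATE;
+-- ... the application verifies that the balance is sufficient ...
+UPDATE accounts SET balance = balance - 100.00 WHERE acctnum = 11111;
+COMMIT;
+</programlisting>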
+
+    <para>
+     Also of note to those converting from other environments is the fact
+     that <command>SELECT FOR UPDATE</command> does not ensure that a
+     concurrent transaction will not update or delete a selected row.
+     To do that in <productname>PostgreSQL</productname> you must actually
+     update the row, even if no values need to be changed.
+     <command>SELECT FOR UPDATE</command> <emphasis>temporarily blocks</emphasis>
+     other transactions from acquiring the same lock or executing an
+     <command>UPDATE</command> or <command>DELETE</command> which would
+     affect the locked row, but once the transaction holding this lock
+     commits or rolls back, a blocked transaction will proceed with the
+     conflicting operation unless an actual <command>UPDATE</command> of
+     the row was performed while the lock was held.
+    </para>
+
+    <para>
+     Global validity checks require extra thought under
+     non-serializable <acronym>MVCC</acronym>.
+     For example, a banking application might wish to check that the sum of
+     all credits in one table equals the sum of debits in another table,
+     when both tables are being actively updated.  Comparing the results of two
+     successive <literal>SELECT sum(...)</literal> commands will not work reliably in
+     Read Committed mode, since the second query will likely include the results
+     of transactions not counted by the first.  Doing the two sums in a
+     single repeatable read transaction will give an accurate picture of only the
+     effects of transactions that committed before the repeatable read transaction
+     started &mdash; but one might legitimately wonder whether the answer is still
+     relevant by the time it is delivered.  If the repeatable read transaction
+     itself applied some changes before trying to make the consistency check,
+     the usefulness of the check becomes even more debatable, since now it
+     includes some but not all post-transaction-start changes.  In such cases
+     a careful person might wish to lock all tables needed for the check,
+     in order to get an indisputable picture of current reality.  A
+     <literal>SHARE</literal> mode (or higher) lock guarantees that there are no
+     uncommitted changes in the locked table, other than those of the current
+     transaction.
+    </para>
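+
+    <para>
+     A sketch of such a check, run in the default Read Committed mode and
+     assuming hypothetical <structname>credits</structname> and
+     <structname>debits</structname> tables with an
+     <structfield>amount</structfield> column.  <literal>SHARE</literal>
+     mode still allows other sessions to read the tables, but makes writers
+     wait until the checking transaction ends.
+    </para>
+<programlisting>
+BEGIN;
+-- Wait for in-progress writers to finish, and block new ones
+LOCK TABLE credits, debits IN SHARE MODE;
+-- A single statement, so both sums come from the same snapshot
+SELECT (SELECT sum(amount) FROM credits) =
+       (SELECT sum(amount) FROM debits) AS balanced;
+COMMIT;
+</programlisting>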
+
+    <para>
+     Note also that if one is relying on explicit locking to prevent concurrent
+     changes, one should either use Read Committed mode, or in Repeatable Read
+     mode be careful to obtain
+     locks before performing queries.  A lock obtained by a
+     repeatable read transaction guarantees that no other transactions modifying
+     the table are still running, but if the snapshot seen by the
+     transaction predates obtaining the lock, it might predate some now-committed
+     changes in the table.  A repeatable read transaction's snapshot is actually
+     frozen at the start of its first query or data-modification command
+     (<literal>SELECT</literal>, <literal>INSERT</literal>,
+     <literal>UPDATE</literal>, <literal>DELETE</literal>, or
+     <literal>MERGE</literal>), so it is possible to obtain locks explicitly
+     before the snapshot is frozen.
+    </para>
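+
+    <para>
+     The following sketch, using a hypothetical
+     <structname>credits</structname> table, contrasts a safe and an unsafe
+     ordering in Repeatable Read mode.  <command>LOCK TABLE</command> does
+     not freeze the snapshot, but the first query does, even a trivial
+     <command>SELECT</command>.
+    </para>
+<programlisting>
+-- Safe: the lock is acquired before the snapshot is frozen
+BEGIN ISOLATION LEVEL REPEATABLE READ;
+LOCK TABLE credits IN SHARE MODE;
+SELECT sum(amount) FROM credits;   -- snapshot frozen here, after the lock
+COMMIT;
+
+-- Unsafe: the snapshot predates the lock
+BEGIN ISOLATION LEVEL REPEATABLE READ;
+SELECT 1;                          -- snapshot frozen here
+LOCK TABLE credits IN SHARE MODE;  -- may wait for writers that then commit
+SELECT sum(amount) FROM credits;   -- cannot see those writers' changes
+COMMIT;
+</programlisting>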
+   </sect2>
+  </sect1>
+
+  <sect1 id="mvcc-serialization-failure-handling">
+   <title>Serialization Failure Handling</title>
+
+   <indexterm>
+    <primary>serialization failure</primary>
+   </indexterm>
+   <indexterm>
+    <primary>retryable error</primary>
+   </indexterm>
+
+   <para>
+    Both Repeatable Read and Serializable isolation levels can produce
+    errors that are designed to prevent serialization anomalies.  As
+    previously stated, applications using these levels must be prepared to
+    retry transactions that fail due to serialization errors.  Such an
+    error's message text will vary according to the precise circumstances,
+    but it will always have the SQLSTATE code <literal>40001</literal>
+    (<literal>serialization_failure</literal>).
+   </para>
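+
+   <para>
+    A simple way to provoke this error, sketched here with a hypothetical
+    <structname>accounts</structname> table, is a Repeatable Read
+    transaction that tries to update a row that another session changed and
+    committed after the transaction's snapshot was taken.  Retrying means
+    rerunning everything from <command>BEGIN</command>, including the
+    initial <command>SELECT</command>, so that the new attempt works from a
+    fresh snapshot.
+   </para>
+<programlisting>
+-- Session A
+BEGIN ISOLATION LEVEL REPEATABLE READ;
+SELECT balance FROM accounts WHERE acctnum = 11111;  -- snapshot taken
+-- Session B (autocommit)
+UPDATE accounts SET balance = balance + 1.00 WHERE acctnum = 11111;
+-- Session A
+UPDATE accounts SET balance = balance - 1.00 WHERE acctnum = 11111;
+-- ERROR:  could not serialize access due to concurrent update
+ROLLBACK;
+</programlisting>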
+
+   <para>
+    It may also be advisable to retry deadlock failures.
+    These have the SQLSTATE code <literal>40P01</literal>
+    (<literal>deadlock_detected</literal>).
+   </para>
+
+   <para>
+    In some cases it is also appropriate to retry unique-key failures,
+    which have SQLSTATE code <literal>23505</literal>
+    (<literal>unique_violation</literal>), and exclusion constraint
+    failures, which have SQLSTATE code <literal>23P01</literal>
+    (<literal>exclusion_violation</literal>).  For example, if the
+    application selects a new value for a primary key column after
+    inspecting the currently stored keys, it could get a unique-key
+    failure because another application instance selected the same new key
+    concurrently.  This is effectively a serialization failure, but the
+    server will not detect it as such because it cannot <quote>see</quote>
+    the connection between the inserted value and the previous reads.
+    There are also some corner cases in which the server will issue a
+    unique-key or exclusion constraint error even though in principle it
+    has enough information to determine that a serialization problem
+    is the underlying cause.  While it is reasonable to simply
+    retry <literal>serialization_failure</literal> errors unconditionally,
+    more care is needed when retrying these other error codes, since they
+    might represent persistent error conditions rather than transient
+    failures.
+   </para>
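+
+   <para>
+    The primary-key case might look like the following sketch, in which the
+    <structname>invoices</structname> table is hypothetical.  If two
+    sessions run the <command>INSERT</command> concurrently in Read
+    Committed mode, both can compute the same next number; the later
+    inserter then waits for the earlier one and fails with
+    <literal>23505</literal> once it commits.  A retry recomputes the maximum
+    and will normally succeed.  Generating keys from a sequence or identity
+    column avoids this particular race altogether.
+   </para>
+<programlisting>
+CREATE TABLE invoices (invoice_no integer PRIMARY KEY, amount numeric);
+
+-- Run concurrently in two sessions
+INSERT INTO invoices (invoice_no, amount)
+SELECT coalesce(max(invoice_no), 0) + 1, 100.00 FROM invoices;
+</programlisting>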
+
+   <para>
+    It is important to retry the complete transaction, including all logic
+    that decides which SQL to issue and/or which values to use.
+    Therefore, <productname>PostgreSQL</productname> does not offer an
+    automatic retry facility, since it cannot do so with any guarantee of
+    correctness.
+   </para>
+
+   <para>
+    Transaction retry does not guarantee that the retried transaction will
+    complete; multiple retries may be needed.  In cases with very high
+    contention, completing a transaction may take many attempts.  In cases
+    involving a conflicting prepared transaction,
+    it may not be possible to make progress until the prepared transaction
+    commits or rolls back.
+   </para>
+  </sect1>
+
+  <sect1 id="mvcc-caveats">
+   <title>Caveats</title>
+
+   <para>
+    Some DDL commands, currently only <link linkend="sql-truncate"><command>TRUNCATE</command></link> and the
+    table-rewriting forms of <link linkend="sql-altertable"><command>ALTER TABLE</command></link>, are not
+    MVCC-safe.  This means that after the truncation or rewrite commits, the
+    table will appear empty to concurrent transactions, if they are using a
+    snapshot taken before the DDL command committed.  This will only be an
+    issue for a transaction that did not access the table in question
+    before the DDL command started &mdash; any transaction that has done so
+    would hold at least an <literal>ACCESS SHARE</literal> table lock,
+    which would block the DDL command until that transaction completes.
+    So these commands will not cause any apparent inconsistency in the
+    table contents for successive queries on the target table, but they
+    could cause visible inconsistency between the contents of the target
+    table and other tables in the database.
+   </para>
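+
+   <para>
+    The effect can be seen in a sketch like the following, with hypothetical
+    tables <structname>t1</structname> and <structname>t2</structname>,
+    where <structname>t1</structname> is not empty.  Session A's snapshot is
+    taken before the <command>TRUNCATE</command> commits, yet because A has
+    not yet accessed <structname>t1</structname>, it holds no lock that
+    would block the truncation.
+   </para>
+<programlisting>
+-- Session A
+BEGIN ISOLATION LEVEL REPEATABLE READ;
+SELECT count(*) FROM t2;   -- snapshot taken; t1 not yet accessed
+-- Session B
+TRUNCATE t1;               -- not blocked: A holds no lock on t1
+-- Session A
+SELECT count(*) FROM t1;   -- 0, although A's snapshot predates the TRUNCATE
+COMMIT;
+</programlisting>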
+
+   <para>
+    Support for the Serializable transaction isolation level has not yet
+    been added to hot standby replication targets (described in
+    <xref linkend="hot-standby"/>).  The strictest isolation level currently
+    supported in hot standby mode is Repeatable Read.  While performing all
+    permanent database writes within Serializable transactions on the
+    primary will ensure that all standbys will eventually reach a consistent
+    state, a Repeatable Read transaction run on the standby can sometimes
+    see a transient state that is inconsistent with any serial execution
+    of the transactions on the primary.
+   </para>
+
+   <para>
+    Internal access to the system catalogs is not done using the isolation
+    level of the current transaction.  This means that newly created database
+    objects such as tables are visible to concurrent Repeatable Read and
+    Serializable transactions, even though the rows they contain are not.  In
+    contrast, in the higher isolation levels, queries that explicitly
+    examine the system catalogs do not see rows representing concurrently
+    created database objects.
+   </para>
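+
+   <para>
+    For example, in the sketch below (the table name
+    <structname>newtab</structname> is hypothetical), a Repeatable Read
+    transaction can access a table created after its snapshot was taken,
+    but sees none of the table's rows and does not find it in
+    <structname>pg_class</structname>.
+   </para>
+<programlisting>
+-- Session A
+BEGIN ISOLATION LEVEL REPEATABLE READ;
+SELECT 1;                                 -- snapshot frozen
+-- Session B (autocommit)
+CREATE TABLE newtab (x int);
+INSERT INTO newtab VALUES (1);
+-- Session A
+SELECT * FROM newtab;                     -- table found, but no rows returned
+SELECT count(*) FROM pg_class WHERE relname = 'newtab';  -- 0
+COMMIT;
+</programlisting>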
+  </sect1>
+
+  <sect1 id="locking-indexes">
+   <title>Locking and Indexes</title>
+
+   <indexterm zone="locking-indexes">
+    <primary>index</primary>
+    <secondary>locks</secondary>
+   </indexterm>
+
+   <para>
+    Though <productname>PostgreSQL</productname>
+    provides nonblocking read/write access to table
+    data, nonblocking read/write access is not currently offered for every
+    index access method implemented
+    in <productname>PostgreSQL</productname>.
+    The various index types are handled as follows:
+
+    <variablelist>
+     <varlistentry>
+      <term>
+       B-tree, <acronym>GiST</acronym> and <acronym>SP-GiST</acronym> indexes
+      </term>
+      <listitem>
+       <para>
+        Short-term share/exclusive page-level locks are used for
+        read/write access. Locks are released immediately after each
+        index row is fetched or inserted.  These index types provide
+        the highest concurrency without deadlock conditions.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
+      <term>
+       Hash indexes
+      </term>
+      <listitem>
+       <para>
+        Share/exclusive hash-bucket-level locks are used for read/write
+        access.  Locks are released after the whole bucket is processed.
+        Bucket-level locks provide better concurrency than index-level
+        ones, but deadlock is possible since the locks are held longer
+        than one index operation.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
+      <term>
+       <acronym>GIN</acronym> indexes
+      </term>
+      <listitem>
+       <para>
+        Short-term share/exclusive page-level locks are used for
+        read/write access. Locks are released immediately after each
+        index row is fetched or inserted. But note that insertion of a
+        GIN-indexed value usually produces several index key insertions
+        per row, so GIN might do substantial work for a single value's
+        insertion.
+       </para>
+      </listitem>
+     </varlistentry>
+    </variablelist>
+   </para>
+
+   <para>
+    Currently, B-tree indexes offer the best performance for concurrent
+    applications; since they also have more features than hash
+    indexes, they are the recommended index type for concurrent
+    applications that need to index scalar data. When dealing with
+    non-scalar data, B-trees are not useful, and GiST, SP-GiST or GIN
+    indexes should be used instead.
+   </para>
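+
+   <para>
+    As a minimal illustration, assume a hypothetical
+    <structname>orders</structname> table whose
+    <structfield>customer_id</structfield> column holds scalar values and
+    whose <structfield>tags</structfield> column is a
+    <type>text[]</type> array:
+   </para>
+<programlisting>
+-- Scalar column: the default access method is B-tree
+CREATE INDEX orders_customer_id_idx ON orders (customer_id);
+-- Array column searched by containment (e.g., tags @&gt; '{urgent}'): use GIN
+CREATE INDEX orders_tags_idx ON orders USING gin (tags);
+</programlisting>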
+  </sect1>
+ </chapter>