1 files changed, 1116 insertions, 0 deletions
diff --git a/doc/src/sgml/maintenance.sgml b/doc/src/sgml/maintenance.sgml
new file mode 100644
index 0000000..759ea5a
--- /dev/null
+++ b/doc/src/sgml/maintenance.sgml
@@ -0,0 +1,1116 @@
+<!-- doc/src/sgml/maintenance.sgml -->
+
+<chapter id="maintenance">
+ <title>Routine Database Maintenance Tasks</title>
+
+ <indexterm zone="maintenance">
+  <primary>maintenance</primary>
+ </indexterm>
+
+ <indexterm zone="maintenance">
+  <primary>routine maintenance</primary>
+ </indexterm>
+
+  <para>
+   <productname>PostgreSQL</productname>, like any database software, requires that certain tasks
+   be performed regularly to achieve optimum performance. The tasks
+   discussed here are <emphasis>required</emphasis>, but they
+   are repetitive in nature and can easily be automated using standard
+   tools such as <application>cron</application> scripts or
+   Windows' <application>Task Scheduler</application>.  It is the database
+   administrator's responsibility to set up appropriate scripts, and to
+   check that they execute successfully.
+  </para>
+
+  <para>
+   One obvious maintenance task is the creation of backup copies of the data on a
+   regular schedule.  Without a recent backup, you have no chance of recovery
+   after a catastrophe (disk failure, fire, mistakenly dropping a critical
+   table, etc.).  The backup and recovery mechanisms available in
+   <productname>PostgreSQL</productname> are discussed at length in
+   <xref linkend="backup"/>.
+  </para>
+
+  <para>
+   The other main category of maintenance task is periodic <quote>vacuuming</quote>
+   of the database.  This activity is discussed in
+   <xref linkend="routine-vacuuming"/>.  Closely related to this is updating
+   the statistics that will be used by the query planner, as discussed in
+   <xref linkend="vacuum-for-statistics"/>.
+  </para>
+
+  <para>
+   Another task that might need periodic attention is log file management.
+   This is discussed in <xref linkend="logfile-maintenance"/>.
+  </para>
+
+  <para>
+   <ulink
+   url="https://bucardo.org/check_postgres/"><application>check_postgres</application></ulink>
+   is available for monitoring database health and reporting unusual
+   conditions.  <application>check_postgres</application> integrates with
+   Nagios and MRTG, but can be run standalone too.
+  </para>
+
+  <para>
+   <productname>PostgreSQL</productname> is low-maintenance compared
+   to some other database management systems.  Nonetheless,
+   appropriate attention to these tasks will go far towards ensuring a
+   pleasant and productive experience with the system.
+  </para>
+
+ <sect1 id="routine-vacuuming">
+  <title>Routine Vacuuming</title>
+
+  <indexterm zone="routine-vacuuming">
+   <primary>vacuum</primary>
+  </indexterm>
+
+  <para>
+   <productname>PostgreSQL</productname> databases require periodic
+   maintenance known as <firstterm>vacuuming</firstterm>.  For many installations, it
+   is sufficient to let vacuuming be performed by the <firstterm>autovacuum
+   daemon</firstterm>, which is described in <xref linkend="autovacuum"/>.  You might
+   need to adjust the autovacuuming parameters described there to obtain best
+   results for your situation.  Some database administrators will want to
+   supplement or replace the daemon's activities with manually-managed
+   <command>VACUUM</command> commands, which typically are executed according to a
+   schedule by <application>cron</application> or <application>Task
+   Scheduler</application> scripts.  To set up manually-managed vacuuming properly,
+   it is essential to understand the issues discussed in the next few
+   subsections.  Administrators who rely on autovacuuming may still wish
+   to skim this material to help them understand and adjust autovacuuming.
+  </para>
+
+  <sect2 id="vacuum-basics">
+   <title>Vacuuming Basics</title>
+
+   <para>
+    <productname>PostgreSQL</productname>'s
+    <link linkend="sql-vacuum"><command>VACUUM</command></link> command has to
+    process each table on a regular basis for several reasons:
+
+    <orderedlist>
+     <listitem>
+      <simpara>To recover or reuse disk space occupied by updated or deleted
+      rows.</simpara>
+     </listitem>
+
+     <listitem>
+      <simpara>To update data statistics used by the
+      <productname>PostgreSQL</productname> query planner.</simpara>
+     </listitem>
+
+     <listitem>
+      <simpara>To update the visibility map, which speeds
+      up <link linkend="indexes-index-only-scans">index-only
+      scans</link>.</simpara>
+     </listitem>
+
+     <listitem>
+      <simpara>To protect against loss of very old data due to
+      <firstterm>transaction ID wraparound</firstterm> or
+      <firstterm>multixact ID wraparound</firstterm>.</simpara>
+     </listitem>
+    </orderedlist>
+
+    Each of these reasons dictates performing <command>VACUUM</command> operations
+    of varying frequency and scope, as explained in the following subsections.
+   </para>
+
+   <para>
+    There are two variants of <command>VACUUM</command>: standard <command>VACUUM</command>
+    and <command>VACUUM FULL</command>.  <command>VACUUM FULL</command> can reclaim more
+    disk space but runs much more slowly.  Also,
+    the standard form of <command>VACUUM</command> can run in parallel with production
+    database operations.  (Commands such as <command>SELECT</command>,
+    <command>INSERT</command>, <command>UPDATE</command>, and
+    <command>DELETE</command> will continue to function normally, though you
+    will not be able to modify the definition of a table with commands such as
+    <command>ALTER TABLE</command> while it is being vacuumed.)
+    <command>VACUUM FULL</command> requires an
+    <literal>ACCESS EXCLUSIVE</literal> lock on the table it is
+    working on, and therefore cannot be done in parallel with other use
+    of the table.  Generally, therefore,
+    administrators should strive to use standard <command>VACUUM</command> and
+    avoid <command>VACUUM FULL</command>.
+   </para>
+
+   <para>
+    <command>VACUUM</command> creates a substantial amount of I/O
+    traffic, which can cause poor performance for other active sessions.
+    There are configuration parameters that can be adjusted to reduce the
+    performance impact of background vacuuming &mdash; see
+    <xref linkend="runtime-config-resource-vacuum-cost"/>.
+   </para>
+  </sect2>
+
+  <sect2 id="vacuum-for-space-recovery">
+   <title>Recovering Disk Space</title>
+
+   <indexterm zone="vacuum-for-space-recovery">
+    <primary>disk space</primary>
+   </indexterm>
+
+   <para>
+    In <productname>PostgreSQL</productname>, an
+    <command>UPDATE</command> or <command>DELETE</command> of a row does not
+    immediately remove the old version of the row.
+    This approach is necessary to gain the benefits of multiversion
+    concurrency control (<acronym>MVCC</acronym>, see <xref linkend="mvcc"/>): the row version
+    must not be deleted while it is still potentially visible to other
+    transactions. But eventually, an outdated or deleted row version is no
+    longer of interest to any transaction. The space it occupies must then be
+    reclaimed for reuse by new rows, to avoid unbounded growth of disk
+    space requirements. This is done by running <command>VACUUM</command>.
+   </para>
+
+   <para>
+    The standard form of <command>VACUUM</command> removes dead row
+    versions in tables and indexes and marks the space available for
+    future reuse.  However, it will not return the space to the operating
+    system, except in the special case where one or more pages at the
+    end of a table become entirely free and an exclusive table lock can be
+    easily obtained.  In contrast, <command>VACUUM FULL</command> actively compacts
+    tables by writing a complete new version of the table file with no dead
+    space.  This minimizes the size of the table, but can take a long time.
+    It also requires extra disk space for the new copy of the table, until
+    the operation completes.
+   </para>
+
+   <para>
+    The usual goal of routine vacuuming is to do standard <command>VACUUM</command>s
+    often enough to avoid needing <command>VACUUM FULL</command>.  The
+    autovacuum daemon attempts to work this way, and in fact will
+    never issue <command>VACUUM FULL</command>.  In this approach, the idea
+    is not to keep tables at their minimum size, but to maintain steady-state
+    usage of disk space: each table occupies space equivalent to its
+    minimum size plus however much space gets used up between vacuum runs.
+    Although <command>VACUUM FULL</command> can be used to shrink a table back
+    to its minimum size and return the disk space to the operating system,
+    there is not much point in this if the table will just grow again in the
+    future.  Thus, moderately-frequent standard <command>VACUUM</command> runs are a
+    better approach than infrequent <command>VACUUM FULL</command> runs for
+    maintaining heavily-updated tables.
+   </para>
+
+   <para>
+    Some administrators prefer to schedule vacuuming themselves, for example
+    doing all the work at night when load is low.
+    The difficulty with doing vacuuming according to a fixed schedule
+    is that if a table has an unexpected spike in update activity, it may
+    get bloated to the point that <command>VACUUM FULL</command> is really necessary
+    to reclaim space.  Using the autovacuum daemon alleviates this problem,
+    since the daemon schedules vacuuming dynamically in response to update
+    activity.  It is unwise to disable the daemon completely unless you
+    have an extremely predictable workload.  One possible compromise is
+    to set the daemon's parameters so that it will only react to unusually
+    heavy update activity, thus keeping things from getting out of hand,
+    while scheduled <command>VACUUM</command>s are expected to do the bulk of the
+    work when the load is typical.
+   </para>
+
+   <para>
+    For those not using autovacuum, a typical approach is to schedule a
+    database-wide <command>VACUUM</command> once a day during a low-usage period,
+    supplemented by more frequent vacuuming of heavily-updated tables as
+    necessary. (Some installations with extremely high update rates vacuum
+    their busiest tables as often as once every few minutes.) If you have
+    multiple databases in a cluster, don't forget to
+    <command>VACUUM</command> each one; the program <xref
+    linkend="app-vacuumdb"/> might be helpful.
+   </para>
+
+   <tip>
+   <para>
+    Plain <command>VACUUM</command> may not be satisfactory when
+    a table contains large numbers of dead row versions as a result of
+    massive update or delete activity.  If you have such a table and
+    you need to reclaim the excess disk space it occupies, you will need
+    to use <command>VACUUM FULL</command>, or alternatively
+    <link linkend="sql-cluster"><command>CLUSTER</command></link>
+    or one of the table-rewriting variants of
+    <link linkend="sql-altertable"><command>ALTER TABLE</command></link>.
+    These commands rewrite an entire new copy of the table and build
+    new indexes for it.  All these options require an
+    <literal>ACCESS EXCLUSIVE</literal> lock.  Note that
+    they also temporarily use extra disk space approximately equal to the size
+    of the table, since the old copies of the table and indexes can't be
+    released until the new ones are complete.
+   </para>
+   </tip>
+
+   <tip>
+   <para>
+    If you have a table whose entire contents are deleted on a periodic
+    basis, consider doing it with
+    <link linkend="sql-truncate"><command>TRUNCATE</command></link> rather
+    than using <command>DELETE</command> followed by
+    <command>VACUUM</command>. <command>TRUNCATE</command> removes the
+    entire content of the table immediately, without requiring a
+    subsequent <command>VACUUM</command> or <command>VACUUM
+    FULL</command> to reclaim the now-unused disk space.
+    The disadvantage is that strict MVCC semantics are violated.
+   </para>
+   </tip>
+  </sect2>
+
+  <sect2 id="vacuum-for-statistics">
+   <title>Updating Planner Statistics</title>
+
+   <indexterm zone="vacuum-for-statistics">
+    <primary>statistics</primary>
+    <secondary>of the planner</secondary>
+   </indexterm>
+
+   <indexterm zone="vacuum-for-statistics">
+    <primary>ANALYZE</primary>
+   </indexterm>
+
+   <para>
+    The <productname>PostgreSQL</productname> query planner relies on
+    statistical information about the contents of tables in order to
+    generate good plans for queries.  These statistics are gathered by
+    the <link linkend="sql-analyze"><command>ANALYZE</command></link> command,
+    which can be invoked by itself or
+    as an optional step in <command>VACUUM</command>.  It is important to have
+    reasonably accurate statistics, otherwise poor choices of plans might
+    degrade database performance.
+   </para>
+
+   <para>
+    The autovacuum daemon, if enabled, will automatically issue
+    <command>ANALYZE</command> commands whenever the content of a table has
+    changed sufficiently.  However, administrators might prefer to rely
+    on manually-scheduled <command>ANALYZE</command> operations, particularly
+    if it is known that update activity on a table will not affect the
+    statistics of <quote>interesting</quote> columns.  The daemon schedules
+    <command>ANALYZE</command> strictly as a function of the number of rows
+    inserted or updated; it has no knowledge of whether that will lead
+    to meaningful statistical changes.
+   </para>
+
+   <para>
+    Tuples changed in partitions and inheritance children do not trigger
+    analyze on the parent table.  If the parent table is empty or rarely
+    changed, it may never be processed by autovacuum, and the statistics for
+    the inheritance tree as a whole won't be collected. It is necessary to
+    run <command>ANALYZE</command> on the parent table manually in order to
+    keep the statistics up to date.
+   </para>
+
+   <para>
+    As with vacuuming for space recovery, frequent updates of statistics
+    are more useful for heavily-updated tables than for seldom-updated
+    ones. But even for a heavily-updated table, there might be no need for
+    statistics updates if the statistical distribution of the data is
+    not changing much. A simple rule of thumb is to think about how much
+    the minimum and maximum values of the columns in the table change.
+    For example, a <type>timestamp</type> column that contains the time
+    of row update will have a constantly-increasing maximum value as
+    rows are added and updated; such a column will probably need more
+    frequent statistics updates than, say, a column containing URLs for
+    pages accessed on a website. The URL column might receive changes just
+    as often, but the statistical distribution of its values probably
+    changes relatively slowly.
+   </para>
+
+   <para>
+    It is possible to run <command>ANALYZE</command> on specific tables and even
+    just specific columns of a table, so the flexibility exists to update some
+    statistics more frequently than others if your application requires it.
+    In practice, however, it is usually best to just analyze the entire
+    database, because it is a fast operation.  <command>ANALYZE</command> uses a
+    statistically random sampling of the rows of a table rather than reading
+    every single row.
+   </para>
+
+   <tip>
+    <para>
+     Although per-column tweaking of <command>ANALYZE</command> frequency might not be
+     very productive, you might find it worthwhile to do per-column
+     adjustment of the level of detail of the statistics collected by
+     <command>ANALYZE</command>.  Columns that are heavily used in <literal>WHERE</literal>
+     clauses and have highly irregular data distributions might require a
+     finer-grain data histogram than other columns.  See <command>ALTER TABLE
+     SET STATISTICS</command>, or change the database-wide default using the <xref
+     linkend="guc-default-statistics-target"/> configuration parameter.
+    </para>
+
+    <para>
+     Also, by default there is limited information available about
+     the selectivity of functions.  However, if you create a statistics
+     object or an expression
+     index that uses a function call, useful statistics will be
+     gathered about the function, which can greatly improve query
+     plans that use the expression index.
+    </para>
+   </tip>
+
+   <tip>
+    <para>
+     The autovacuum daemon does not issue <command>ANALYZE</command> commands for
+     foreign tables, since it has no means of determining how often that
+     might be useful.  If your queries require statistics on foreign tables
+     for proper planning, it's a good idea to run manually-managed
+     <command>ANALYZE</command> commands on those tables on a suitable schedule.
+    </para>
+   </tip>
+
+   <tip>
+    <para>
+     The autovacuum daemon does not issue <command>ANALYZE</command> commands
+     for partitioned tables.  Inheritance parents will only be analyzed if the
+     parent itself is changed - changes to child tables do not trigger
+     autoanalyze on the parent table.  If your queries require statistics on
+     parent tables for proper planning, it is necessary to periodically run
+     a manual <command>ANALYZE</command> on those tables to keep the statistics
+     up to date.
+    </para>
+   </tip>
+
+  </sect2>
+
+  <sect2 id="vacuum-for-visibility-map">
+   <title>Updating the Visibility Map</title>
+
+   <para>
+    Vacuum maintains a <link linkend="storage-vm">visibility map</link> for each
+    table to keep track of which pages contain only tuples that are known to be
+    visible to all active transactions (and all future transactions, until the
+    page is again modified).  This has two purposes.  First, vacuum
+    itself can skip such pages on the next run, since there is nothing to
+    clean up.
+   </para>
+
+   <para>
+    Second, it allows <productname>PostgreSQL</productname> to answer some
+    queries using only the index, without reference to the underlying table.
+    Since <productname>PostgreSQL</productname> indexes don't contain tuple
+    visibility information, a normal index scan fetches the heap tuple for each
+    matching index entry, to check whether it should be seen by the current
+    transaction.
+    An <link linkend="indexes-index-only-scans"><firstterm>index-only
+    scan</firstterm></link>, on the other hand, checks the visibility map first.
+    If it's known that all tuples on the page are
+    visible, the heap fetch can be skipped.  This is most useful on
+    large data sets where the visibility map can prevent disk accesses.
+    The visibility map is vastly smaller than the heap, so it can easily be
+    cached even when the heap is very large.
+   </para>
+  </sect2>
+
+  <sect2 id="vacuum-for-wraparound">
+   <title>Preventing Transaction ID Wraparound Failures</title>
+
+   <indexterm zone="vacuum-for-wraparound">
+    <primary>transaction ID</primary>
+    <secondary>wraparound</secondary>
+   </indexterm>
+
+    <indexterm>
+     <primary>wraparound</primary>
+     <secondary>of transaction IDs</secondary>
+    </indexterm>
+
+   <para>
+    <productname>PostgreSQL</productname>'s
+    <link linkend="mvcc-intro">MVCC</link> transaction semantics
+    depend on being able to compare transaction ID (<acronym>XID</acronym>)
+    numbers: a row version with an insertion XID greater than the current
+    transaction's XID is <quote>in the future</quote> and should not be visible
+    to the current transaction.  But since transaction IDs have limited size
+    (32 bits) a cluster that runs for a long time (more
+    than 4 billion transactions) would suffer <firstterm>transaction ID
+    wraparound</firstterm>: the XID counter wraps around to zero, and all of a sudden
+    transactions that were in the past appear to be in the future &mdash; which
+    means their output become invisible.  In short, catastrophic data loss.
+    (Actually the data is still there, but that's cold comfort if you cannot
+    get at it.)  To avoid this, it is necessary to vacuum every table
+    in every database at least once every two billion transactions.
+   </para>
+
+   <para>
+    The reason that periodic vacuuming solves the problem is that
+    <command>VACUUM</command> will mark rows as <emphasis>frozen</emphasis>, indicating that
+    they were inserted by a transaction that committed sufficiently far in
+    the past that the effects of the inserting transaction are certain to be
+    visible to all current and future transactions.
+    Normal XIDs are
+    compared using modulo-2<superscript>32</superscript> arithmetic. This means
+    that for every normal XID, there are two billion XIDs that are
+    <quote>older</quote> and two billion that are <quote>newer</quote>; another
+    way to say it is that the normal XID space is circular with no
+    endpoint. Therefore, once a row version has been created with a particular
+    normal XID, the row version will appear to be <quote>in the past</quote> for
+    the next two billion transactions, no matter which normal XID we are
+    talking about. If the row version still exists after more than two billion
+    transactions, it will suddenly appear to be in the future. To
+    prevent this, <productname>PostgreSQL</productname> reserves a special XID,
+    <literal>FrozenTransactionId</literal>, which does not follow the normal XID
+    comparison rules and is always considered older
+    than every normal XID.
+    Frozen row versions are treated as if the inserting XID were
+    <literal>FrozenTransactionId</literal>, so that they will appear to be
+    <quote>in the past</quote> to all normal transactions regardless of wraparound
+    issues, and so such row versions will be valid until deleted, no matter
+    how long that is.
+   </para>
+
+   <note>
+    <para>
+     In <productname>PostgreSQL</productname> versions before 9.4, freezing was
+     implemented by actually replacing a row's insertion XID
+     with <literal>FrozenTransactionId</literal>, which was visible in the
+     row's <structname>xmin</structname> system column.  Newer versions just set a flag
+     bit, preserving the row's original <structname>xmin</structname> for possible
+     forensic use.  However, rows with <structname>xmin</structname> equal
+     to <literal>FrozenTransactionId</literal> (2) may still be found
+     in databases <application>pg_upgrade</application>'d from pre-9.4 versions.
+    </para>
+    <para>
+     Also, system catalogs may contain rows with <structname>xmin</structname> equal
+     to <literal>BootstrapTransactionId</literal> (1), indicating that they were
+     inserted during the first phase of <application>initdb</application>.
+     Like <literal>FrozenTransactionId</literal>, this special XID is treated as
+     older than every normal XID.
+    </para>
+   </note>
+
+   <para>
+    <xref linkend="guc-vacuum-freeze-min-age"/>
+    controls how old an XID value has to be before rows bearing that XID will be
+    frozen.  Increasing this setting may avoid unnecessary work if the
+    rows that would otherwise be frozen will soon be modified again,
+    but decreasing this setting increases
+    the number of transactions that can elapse before the table must be
+    vacuumed again.
+   </para>
+
+   <para>
+    <command>VACUUM</command> uses the <link linkend="storage-vm">visibility map</link>
+    to determine which pages of a table must be scanned.  Normally, it
+    will skip pages that don't have any dead row versions even if those pages
+    might still have row versions with old XID values.  Therefore, normal
+    <command>VACUUM</command>s won't always freeze every old row version in the table.
+    When that happens, <command>VACUUM</command> will eventually need to perform an
+    <firstterm>aggressive vacuum</firstterm>, which will freeze all eligible unfrozen
+    XID and MXID values, including those from all-visible but not all-frozen pages.
+    In practice most tables require periodic aggressive vacuuming.
+    <xref linkend="guc-vacuum-freeze-table-age"/>
+    controls when <command>VACUUM</command> does that: all-visible but not all-frozen
+    pages are scanned if the number of transactions that have passed since the
+    last such scan is greater than <varname>vacuum_freeze_table_age</varname> minus
+    <varname>vacuum_freeze_min_age</varname>. Setting
+    <varname>vacuum_freeze_table_age</varname> to 0 forces <command>VACUUM</command> to
+    always use its aggressive strategy.
+   </para>
+
+   <para>
+    The maximum time that a table can go unvacuumed is two billion
+    transactions minus the <varname>vacuum_freeze_min_age</varname> value at
+    the time of the last aggressive vacuum. If it were to go
+    unvacuumed for longer than
+    that, data loss could result.  To ensure that this does not happen,
+    autovacuum is invoked on any table that might contain unfrozen rows with
+    XIDs older than the age specified by the configuration parameter <xref
+    linkend="guc-autovacuum-freeze-max-age"/>.  (This will happen even if
+    autovacuum is disabled.)
+   </para>
+
+   <para>
+    This implies that if a table is not otherwise vacuumed,
+    autovacuum will be invoked on it approximately once every
+    <varname>autovacuum_freeze_max_age</varname> minus
+    <varname>vacuum_freeze_min_age</varname> transactions.
+    For tables that are regularly vacuumed for space reclamation purposes,
+    this is of little importance.  However, for static tables
+    (including tables that receive inserts, but no updates or deletes),
+    there is no need to vacuum for space reclamation, so it can
+    be useful to try to maximize the interval between forced autovacuums
+    on very large static tables.  Obviously one can do this either by
+    increasing <varname>autovacuum_freeze_max_age</varname> or decreasing
+    <varname>vacuum_freeze_min_age</varname>.
+   </para>
+
+   <para>
+    The effective maximum for <varname>vacuum_freeze_table_age</varname> is 0.95 *
+    <varname>autovacuum_freeze_max_age</varname>; a setting higher than that will be
+    capped to the maximum. A value higher than
+    <varname>autovacuum_freeze_max_age</varname> wouldn't make sense because an
+    anti-wraparound autovacuum would be triggered at that point anyway, and
+    the 0.95 multiplier leaves some breathing room to run a manual
+    <command>VACUUM</command> before that happens.  As a rule of thumb,
+    <command>vacuum_freeze_table_age</command> should be set to a value somewhat
+    below <varname>autovacuum_freeze_max_age</varname>, leaving enough gap so that
+    a regularly scheduled <command>VACUUM</command> or an autovacuum triggered by
+    normal delete and update activity is run in that window.  Setting it too
+    close could lead to anti-wraparound autovacuums, even though the table
+    was recently vacuumed to reclaim space, whereas lower values lead to more
+    frequent aggressive vacuuming.
+   </para>
+
+   <para>
+    The sole disadvantage of increasing <varname>autovacuum_freeze_max_age</varname>
+    (and <varname>vacuum_freeze_table_age</varname> along with it) is that
+    the <filename>pg_xact</filename> and <filename>pg_commit_ts</filename>
+    subdirectories of the database cluster will take more space, because it
+    must store the commit status and (if <varname>track_commit_timestamp</varname> is
+    enabled) timestamp of all transactions back to
+    the <varname>autovacuum_freeze_max_age</varname> horizon.  The commit status uses
+    two bits per transaction, so if
+    <varname>autovacuum_freeze_max_age</varname> is set to its maximum allowed value
+    of two billion, <filename>pg_xact</filename> can be expected to grow to about half
+    a gigabyte and <filename>pg_commit_ts</filename> to about 20GB.  If this
+    is trivial compared to your total database size,
+    setting <varname>autovacuum_freeze_max_age</varname> to its maximum allowed value
+    is recommended.  Otherwise, set it depending on what you are willing to
+    allow for <filename>pg_xact</filename> and <filename>pg_commit_ts</filename> storage.
+    (The default, 200 million transactions, translates to about 50MB
+    of <filename>pg_xact</filename> storage and about 2GB of <filename>pg_commit_ts</filename>
+    storage.)
+   </para>
+
+   <para>
+    One disadvantage of decreasing <varname>vacuum_freeze_min_age</varname> is that
+    it might cause <command>VACUUM</command> to do useless work: freezing a row
+    version is a waste of time if the row is modified
+    soon thereafter (causing it to acquire a new XID).  So the setting should
+    be large enough that rows are not frozen until they are unlikely to change
+    any more.
+   </para>
+
+   <para>
+    To track the age of the oldest unfrozen XIDs in a database,
+    <command>VACUUM</command> stores XID
+    statistics in the system tables <structname>pg_class</structname> and
+    <structname>pg_database</structname>.  In particular,
+    the <structfield>relfrozenxid</structfield> column of a table's
+    <structname>pg_class</structname> row contains the oldest remaining unfrozen
+    XID at the end of the most recent <command>VACUUM</command> that successfully
+    advanced <structfield>relfrozenxid</structfield> (typically the most recent
+    aggressive VACUUM).  Similarly, the
+    <structfield>datfrozenxid</structfield> column of a database's
+    <structname>pg_database</structname> row is a lower bound on the unfrozen XIDs
+    appearing in that database &mdash; it is just the minimum of the
+    per-table <structfield>relfrozenxid</structfield> values within the database.
+    A convenient way to
+    examine this information is to execute queries such as:
+
+<programlisting>
+SELECT c.oid::regclass as table_name,
+       greatest(age(c.relfrozenxid),age(t.relfrozenxid)) as age
+FROM pg_class c
+LEFT JOIN pg_class t ON c.reltoastrelid = t.oid
+WHERE c.relkind IN ('r', 'm');
+
+SELECT datname, age(datfrozenxid) FROM pg_database;
+</programlisting>
+
+    The <literal>age</literal> column measures the number of transactions from the
+    cutoff XID to the current transaction's XID.
+   </para>
+
+   <tip>
+    <para>
+     When the <command>VACUUM</command> command's <literal>VERBOSE</literal>
+     parameter is specified, <command>VACUUM</command> prints various
+     statistics about the table.  This includes information about how
+     <structfield>relfrozenxid</structfield> and
+     <structfield>relminmxid</structfield> advanced.  The same details appear
+     in the server log when autovacuum logging (controlled by <xref
+      linkend="guc-log-autovacuum-min-duration"/>) reports on a
+     <command>VACUUM</command> operation executed by autovacuum.
+    </para>
+   </tip>
+
+   <para>
+    <command>VACUUM</command> normally only scans pages that have been modified
+    since the last vacuum, but <structfield>relfrozenxid</structfield> can only be
+    advanced when every page of the table
+    that might contain unfrozen XIDs is scanned.  This happens when
+    <structfield>relfrozenxid</structfield> is more than
+    <varname>vacuum_freeze_table_age</varname> transactions old, when
+    <command>VACUUM</command>'s <literal>FREEZE</literal> option is used, or when all
+    pages that are not already all-frozen happen to
+    require vacuuming to remove dead row versions. When <command>VACUUM</command>
+    scans every page in the table that is not already all-frozen, it should
+    set <literal>age(relfrozenxid)</literal> to a value just a little more than the
+    <varname>vacuum_freeze_min_age</varname> setting
+    that was used (more by the number of transactions started since the
+    <command>VACUUM</command> started).  <command>VACUUM</command>
+    will set <structfield>relfrozenxid</structfield> to the oldest XID
+    that remains in the table, so it's possible that the final value
+    will be much more recent than strictly required.
+    If no <structfield>relfrozenxid</structfield>-advancing
+    <command>VACUUM</command> is issued on the table until
+    <varname>autovacuum_freeze_max_age</varname> is reached, an autovacuum will soon
+    be forced for the table.
+   </para>
+
+   <para>
+    If for some reason autovacuum fails to clear old XIDs from a table, the
+    system will begin to emit warning messages like this when the database's
+    oldest XIDs reach forty million transactions from the wraparound point:
+
+<programlisting>
+WARNING:  database "mydb" must be vacuumed within 39985967 transactions
+HINT:  To avoid a database shutdown, execute a database-wide VACUUM in that database.
+</programlisting>
+
+    (A manual <command>VACUUM</command> should fix the problem, as suggested by the
+    hint; but note that the <command>VACUUM</command> must be performed by a
+    superuser, else it will fail to process system catalogs and thus not
+    be able to advance the database's <structfield>datfrozenxid</structfield>.)
+    If these warnings are
+    ignored, the system will shut down and refuse to start any new
+    transactions once there are fewer than three million transactions left
+    until wraparound:
+
+<programlisting>
+ERROR:  database is not accepting commands to avoid wraparound data loss in database "mydb"
+HINT:  Stop the postmaster and vacuum that database in single-user mode.
+</programlisting>
+
+    The three-million-transaction safety margin exists to let the
+    administrator recover without data loss, by manually executing the
+    required <command>VACUUM</command> commands.  However, since the system will not
+    execute commands once it has gone into the safety shutdown mode,
+    the only way to do this is to stop the server and start the server in single-user
+    mode to execute <command>VACUUM</command>.  The shutdown mode is not enforced
+    in single-user mode.  See the <xref linkend="app-postgres"/> reference
+    page for details about using single-user mode.
+   </para>
+
+   <sect3 id="vacuum-for-multixact-wraparound">
+    <title>Multixacts and Wraparound</title>
+
+    <indexterm>
+     <primary>MultiXactId</primary>
+    </indexterm>
+
+    <indexterm>
+     <primary>wraparound</primary>
+     <secondary>of multixact IDs</secondary>
+    </indexterm>
+
+    <para>
+     <firstterm>Multixact IDs</firstterm> are used to support row locking by
+     multiple transactions.  Since there is only limited space in a tuple
+     header to store lock information, that information is encoded as
+     a <quote>multiple transaction ID</quote>, or multixact ID for short,
+     whenever there is more than one transaction concurrently locking a
+     row.  Information about which transaction IDs are included in any
+     particular multixact ID is stored separately in
+     the <filename>pg_multixact</filename> subdirectory, and only the multixact ID
+     appears in the <structfield>xmax</structfield> field in the tuple header.
+     Like transaction IDs, multixact IDs are implemented as a
+     32-bit counter and corresponding storage, all of which requires
+     careful aging management, storage cleanup, and wraparound handling.
+     There is a separate storage area which holds the list of members in
+     each multixact, which also uses a 32-bit counter and which must also
+     be managed.
+    </para>
+
+    <para>
+     Whenever <command>VACUUM</command> scans any part of a table, it will replace
+     any multixact ID it encounters which is older than
+     <xref linkend="guc-vacuum-multixact-freeze-min-age"/>
+     by a different value, which can be the zero value, a single
+     transaction ID, or a newer multixact ID.  For each table,
+     <structname>pg_class</structname>.<structfield>relminmxid</structfield> stores the oldest
+     possible multixact ID still appearing in any tuple of that table.
+     If this value is older than
+     <xref linkend="guc-vacuum-multixact-freeze-table-age"/>, an aggressive
+     vacuum is forced.  As discussed in the previous section, an aggressive
+     vacuum means that only those pages which are known to be all-frozen will
+     be skipped.  <function>mxid_age()</function> can be used on
+     <structname>pg_class</structname>.<structfield>relminmxid</structfield> to find its age.
+    </para>
+
+    <para>
+     Aggressive <command>VACUUM</command>s, regardless of what causes
+     them, are <emphasis>guaranteed</emphasis> to be able to advance
+     the table's <structfield>relminmxid</structfield>.
+     Eventually, as all tables in all databases are scanned and their
+     oldest multixact values are advanced, on-disk storage for older
+     multixacts can be removed.
+    </para>
+
+    <para>
+     As a safety device, an aggressive vacuum scan will
+     occur for any table whose multixact-age is greater than <xref
+     linkend="guc-autovacuum-multixact-freeze-max-age"/>.  Also, if the
+     storage occupied by multixacts members exceeds 2GB, aggressive vacuum
+     scans will occur more often for all tables, starting with those that
+     have the oldest multixact-age.  Both of these kinds of aggressive
+     scans will occur even if autovacuum is nominally disabled.
+    </para>
+   </sect3>
+  </sect2>
+
+  <sect2 id="autovacuum">
+   <title>The Autovacuum Daemon</title>
+
+   <indexterm>
+    <primary>autovacuum</primary>
+    <secondary>general information</secondary>
+   </indexterm>
+   <para>
+    <productname>PostgreSQL</productname> has an optional but highly
+    recommended feature called <firstterm>autovacuum</firstterm>,
+    whose purpose is to automate the execution of
+    <command>VACUUM</command> and <command>ANALYZE</command> commands.
+    When enabled, autovacuum checks for
+    tables that have had a large number of inserted, updated or deleted
+    tuples.  These checks use the statistics collection facility;
+    therefore, autovacuum cannot be used unless <xref
+    linkend="guc-track-counts"/> is set to <literal>true</literal>.
+    In the default configuration, autovacuuming is enabled and the related
+    configuration parameters are appropriately set.
+   </para>
+
+   <para>
+    The <quote>autovacuum daemon</quote> actually consists of multiple processes.
+    There is a persistent daemon process, called the
+    <firstterm>autovacuum launcher</firstterm>, which is in charge of starting
+    <firstterm>autovacuum worker</firstterm> processes for all databases. The
+    launcher will distribute the work across time, attempting to start one
+    worker within each database every <xref linkend="guc-autovacuum-naptime"/>
+    seconds.  (Therefore, if the installation has <replaceable>N</replaceable> databases,
+    a new worker will be launched every
+    <varname>autovacuum_naptime</varname>/<replaceable>N</replaceable> seconds.)
+    A maximum of <xref linkend="guc-autovacuum-max-workers"/> worker processes
+    are allowed to run at the same time. If there are more than
+    <varname>autovacuum_max_workers</varname> databases to be processed,
+    the next database will be processed as soon as the first worker finishes.
+    Each worker process will check each table within its database and
+    execute <command>VACUUM</command> and/or <command>ANALYZE</command> as needed.
+    <xref linkend="guc-log-autovacuum-min-duration"/> can be set to monitor
+    autovacuum workers' activity.
+   </para>
+
+   <para>
+    If several large tables all become eligible for vacuuming in a short
+    amount of time, all autovacuum workers might become occupied with
+    vacuuming those tables for a long period.  This would result
+    in other tables and databases not being vacuumed until a worker becomes
+    available. There is no limit on how many workers might be in a
+    single database, but workers do try to avoid repeating work that has
+    already been done by other workers. Note that the number of running
+    workers does not count towards <xref linkend="guc-max-connections"/> or
+    <xref linkend="guc-superuser-reserved-connections"/> limits.
+   </para>
+
+   <para>
+    Tables whose <structfield>relfrozenxid</structfield> value is more than
+    <xref linkend="guc-autovacuum-freeze-max-age"/> transactions old are always
+    vacuumed (this also applies to those tables whose freeze max age has
+    been modified via storage parameters; see below).  Otherwise, if the
+    number of tuples obsoleted since the last
+    <command>VACUUM</command> exceeds the <quote>vacuum threshold</quote>, the
+    table is vacuumed.  The vacuum threshold is defined as:
+<programlisting>
+vacuum threshold = vacuum base threshold + vacuum scale factor * number of tuples
+</programlisting>
+    where the vacuum base threshold is
+    <xref linkend="guc-autovacuum-vacuum-threshold"/>,
+    the vacuum scale factor is
+    <xref linkend="guc-autovacuum-vacuum-scale-factor"/>,
+    and the number of tuples is
+    <structname>pg_class</structname>.<structfield>reltuples</structfield>.
+   </para>
+
+   <para>
+    The table is also vacuumed if the number of tuples inserted since the last
+    vacuum has exceeded the defined insert threshold, which is defined as:
+<programlisting>
+vacuum insert threshold = vacuum base insert threshold + vacuum insert scale factor * number of tuples
+</programlisting>
+    where the vacuum insert base threshold is
+    <xref linkend="guc-autovacuum-vacuum-insert-threshold"/>,
+    and vacuum insert scale factor is
+    <xref linkend="guc-autovacuum-vacuum-insert-scale-factor"/>.
+    Such vacuums may allow portions of the table to be marked as
+    <firstterm>all visible</firstterm> and also allow tuples to be frozen, which
+    can reduce the work required in subsequent vacuums.
+    For tables which receive <command>INSERT</command> operations but no or
+    almost no <command>UPDATE</command>/<command>DELETE</command> operations,
+    it may be beneficial to lower the table's
+    <xref linkend="reloption-autovacuum-freeze-min-age"/> as this may allow
+    tuples to be frozen by earlier vacuums.  The number of obsolete tuples and
+    the number of inserted tuples are obtained from the cumulative statistics system;
+    it is a semi-accurate count updated by each <command>UPDATE</command>,
+    <command>DELETE</command> and <command>INSERT</command> operation.  (It is
+    only semi-accurate because some information might be lost under heavy
+    load.)  If the <structfield>relfrozenxid</structfield> value of the table
+    is more than <varname>vacuum_freeze_table_age</varname> transactions old,
+    an aggressive vacuum is performed to freeze old tuples and advance
+    <structfield>relfrozenxid</structfield>; otherwise, only pages that have been modified
+    since the last vacuum are scanned.
+   </para>
+
+   <para>
+    For analyze, a similar condition is used: the threshold, defined as:
+<programlisting>
+analyze threshold = analyze base threshold + analyze scale factor * number of tuples
+</programlisting>
+    is compared to the total number of tuples inserted, updated, or deleted
+    since the last <command>ANALYZE</command>.
+   </para>
+
+   <para>
+    Partitioned tables are not processed by autovacuum.  Statistics
+    should be collected by running a manual <command>ANALYZE</command> when it is
+    first populated, and again whenever the distribution of data in its
+    partitions changes significantly.
+   </para>
+
+   <para>
+    Temporary tables cannot be accessed by autovacuum.  Therefore,
+    appropriate vacuum and analyze operations should be performed via
+    session SQL commands.
+   </para>
+
+   <para>
+    The default thresholds and scale factors are taken from
+    <filename>postgresql.conf</filename>, but it is possible to override them
+    (and many other autovacuum control parameters) on a per-table basis; see
+    <xref linkend="sql-createtable-storage-parameters"/> for more information.
+    If a setting has been changed via a table's storage parameters, that value
+    is used when processing that table; otherwise the global settings are
+    used. See <xref linkend="runtime-config-autovacuum"/> for more details on
+    the global settings.
+   </para>
+
+   <para>
+    When multiple workers are running, the autovacuum cost delay parameters
+    (see <xref linkend="runtime-config-resource-vacuum-cost"/>) are
+    <quote>balanced</quote> among all the running workers, so that the
+    total I/O impact on the system is the same regardless of the number
+    of workers actually running.  However, any workers processing tables whose
+    per-table <literal>autovacuum_vacuum_cost_delay</literal> or
+    <literal>autovacuum_vacuum_cost_limit</literal> storage parameters have been set
+    are not considered in the balancing algorithm.
+   </para>
+
+   <para>
+    Autovacuum workers generally don't block other commands.  If a process
+    attempts to acquire a lock that conflicts with the
+    <literal>SHARE UPDATE EXCLUSIVE</literal> lock held by autovacuum, lock
+    acquisition will interrupt the autovacuum.  For conflicting lock modes,
+    see <xref linkend="table-lock-compatibility"/>.  However, if the autovacuum
+    is running to prevent transaction ID wraparound (i.e., the autovacuum query
+    name in the <structname>pg_stat_activity</structname> view ends with
+    <literal>(to prevent wraparound)</literal>), the autovacuum is not
+    automatically interrupted.
+   </para>
+
+   <warning>
+    <para>
+     Regularly running commands that acquire locks conflicting with a
+     <literal>SHARE UPDATE EXCLUSIVE</literal> lock (e.g., ANALYZE) can
+     effectively prevent autovacuums from ever completing.
+    </para>
+   </warning>
+  </sect2>
+ </sect1>
+
+
+ <sect1 id="routine-reindex">
+  <title>Routine Reindexing</title>
+
+  <indexterm zone="routine-reindex">
+   <primary>reindex</primary>
+  </indexterm>
+
+  <para>
+   In some situations it is worthwhile to rebuild indexes periodically
+   with the <xref linkend="sql-reindex"/> command or a series of individual
+   rebuilding steps.
+
+  </para>
+
+  <para>
+   B-tree index pages that have become completely empty are reclaimed for
+   re-use.  However, there is still a possibility
+   of inefficient use of space: if all but a few index keys on a page have
+   been deleted, the page remains allocated.  Therefore, a usage
+   pattern in which most, but not all, keys in each range are eventually
+   deleted will see poor use of space.  For such usage patterns,
+   periodic reindexing is recommended.
+  </para>
+
+  <para>
+   The potential for bloat in non-B-tree indexes has not been well
+   researched.  It is a good idea to periodically monitor the index's physical
+   size when using any non-B-tree index type.
+  </para>
+
+  <para>
+   Also, for B-tree indexes, a freshly-constructed index is slightly faster to
+   access than one that has been updated many times because logically
+   adjacent pages are usually also physically adjacent in a newly built index.
+   (This consideration does not apply to non-B-tree indexes.)  It
+   might be worthwhile to reindex periodically just to improve access speed.
+  </para>
+
+  <para>
+   <xref linkend="sql-reindex"/> can be used safely and easily in all cases.
+   This command requires an <literal>ACCESS EXCLUSIVE</literal> lock by
+   default, hence it is often preferable to execute it with its
+   <literal>CONCURRENTLY</literal> option, which requires only a
+   <literal>SHARE UPDATE EXCLUSIVE</literal> lock.
+  </para>
+ </sect1>
+
+
+ <sect1 id="logfile-maintenance">
+  <title>Log File Maintenance</title>
+
+  <indexterm zone="logfile-maintenance">
+   <primary>server log</primary>
+   <secondary>log file maintenance</secondary>
+  </indexterm>
+
+  <para>
+   It is a good idea to save the database server's log output
+   somewhere, rather than just discarding it via <filename>/dev/null</filename>.
+   The log output is invaluable when diagnosing
+   problems.
+  </para>
+
+  <note>
+   <para>
+    The server log can contain sensitive information and needs to be protected,
+    no matter how or where it is stored, or the destination to which it is routed.
+    For example, some DDL statements might contain plaintext passwords or other
+    authentication details. Logged statements at the <literal>ERROR</literal>
+    level might show the SQL source code for applications
+    and might also contain some parts of data rows. Recording data, events and
+    related information is the intended function of this facility, so this is
+    not a leakage or a bug. Please ensure the server logs are visible only to
+    appropriately authorized people.
+   </para>
+  </note>
+
+  <para>
+   Log output tends to be voluminous
+   (especially at higher debug levels) so you won't want to save it
+   indefinitely.  You need to <emphasis>rotate</emphasis> the log files so that
+   new log files are started and old ones removed after a reasonable
+   period of time.
+  </para>
+
+  <para>
+   If you simply direct the <systemitem>stderr</systemitem> of
+   <command>postgres</command> into a
+   file, you will have log output, but
+   the only way to truncate the log file is to stop and restart
+   the server. This might be acceptable if you are using
+   <productname>PostgreSQL</productname> in a development environment,
+   but few production servers would find this behavior acceptable.
+  </para>
+
+  <para>
+   A better approach is to send the server's
+   <systemitem>stderr</systemitem> output to some type of log rotation program.
+   There is a built-in log rotation facility, which you can use by
+   setting the configuration parameter <varname>logging_collector</varname> to
+   <literal>true</literal> in <filename>postgresql.conf</filename>.  The control
+   parameters for this program are described in <xref
+   linkend="runtime-config-logging-where"/>. You can also use this approach
+   to capture the log data in machine readable <acronym>CSV</acronym>
+   (comma-separated values) format.
+  </para>
+
+  <para>
+   Alternatively, you might prefer to use an external log rotation
+   program if you have one that you are already using with other
+   server software. For example, the <application>rotatelogs</application>
+   tool included in the <productname>Apache</productname> distribution
+   can be used with <productname>PostgreSQL</productname>.  One way to
+   do this is to pipe the server's
+   <systemitem>stderr</systemitem> output to the desired program.
+   If you start the server with
+   <command>pg_ctl</command>, then <systemitem>stderr</systemitem>
+   is already redirected to <systemitem>stdout</systemitem>, so you just need a
+   pipe command, for example:
+
+<programlisting>
+pg_ctl start | rotatelogs /var/log/pgsql_log 86400
+</programlisting>
+  </para>
+
+  <para>
+   You can combine these approaches by setting up <application>logrotate</application>
+   to collect log files produced by <productname>PostgreSQL</productname> built-in
+   logging collector.  In this case, the logging collector defines the names and
+   location of the log files, while <application>logrotate</application>
+   periodically archives these files.  When initiating log rotation,
+   <application>logrotate</application> must ensure that the application
+   sends further output to the new file.  This is commonly done with a
+   <literal>postrotate</literal> script that sends a <literal>SIGHUP</literal>
+   signal to the application, which then reopens the log file.
+   In <productname>PostgreSQL</productname>, you can run <command>pg_ctl</command>
+   with the <literal>logrotate</literal> option instead.  When the server receives
+   this command, the server either switches to a new log file or reopens the
+   existing file, depending on the logging configuration
+   (see <xref linkend="runtime-config-logging-where"/>).
+  </para>
+
+  <note>
+   <para>
+    When using static log file names, the server might fail to reopen the log
+    file if the max open file limit is reached or a file table overflow occurs.
+    In this case, log messages are sent to the old log file until a
+    successful log rotation. If <application>logrotate</application> is
+    configured to compress the log file and delete it, the server may lose
+    the messages logged in this time frame. To avoid this issue, you can
+    configure the logging collector to dynamically assign log file names
+    and use a <literal>prerotate</literal> script to ignore open log files.
+    </para>
+  </note>
+
+  <para>
+   Another production-grade approach to managing log output is to
+   send it to <application>syslog</application> and let
+   <application>syslog</application> deal with file rotation. To do this, set the
+   configuration parameter <varname>log_destination</varname> to <literal>syslog</literal>
+   (to log to <application>syslog</application> only) in
+   <filename>postgresql.conf</filename>. Then you can send a <literal>SIGHUP</literal>
+   signal to the <application>syslog</application> daemon whenever you want to force it
+   to start writing a new log file.  If you want to automate log
+   rotation, the <application>logrotate</application> program can be
+   configured to work with log files from
+   <application>syslog</application>.
+  </para>
+
+  <para>
+   On many systems, however, <application>syslog</application> is not very reliable,
+   particularly with large log messages; it might truncate or drop messages
+   just when you need them the most.  Also, on <productname>Linux</productname>,
+   <application>syslog</application> will flush each message to disk, yielding poor
+   performance.  (You can use a <quote><literal>-</literal></quote> at the start of the file name
+   in the <application>syslog</application> configuration file to disable syncing.)
+  </para>
+
+  <para>
+   Note that all the solutions described above take care of starting new
+   log files at configurable intervals, but they do not handle deletion
+   of old, no-longer-useful log files.  You will probably want to set
+   up a batch job to periodically delete old log files.  Another possibility
+   is to configure the rotation program so that old log files are overwritten
+   cyclically.
+  </para>
+
+  <para>
+   <ulink url="https://pgbadger.darold.net/"><productname>pgBadger</productname></ulink>
+   is an external project that does sophisticated log file analysis.
+   <ulink
+   url="https://bucardo.org/check_postgres/"><productname>check_postgres</productname></ulink>
+   provides Nagios alerts when important messages appear in the log
+   files, as well as detection of many other extraordinary conditions.
+  </para>
+ </sect1>
+</chapter>