summaryrefslogtreecommitdiffstats
path: root/doc/src/sgml/html/routine-vacuuming.html
diff options
context:
space:
mode:
authorDaniel Baumann <daniel.baumann@progress-linux.org>2024-05-04 12:15:05 +0000
committerDaniel Baumann <daniel.baumann@progress-linux.org>2024-05-04 12:15:05 +0000
commit46651ce6fe013220ed397add242004d764fc0153 (patch)
tree6e5299f990f88e60174a1d3ae6e48eedd2688b2b /doc/src/sgml/html/routine-vacuuming.html
parentInitial commit. (diff)
downloadpostgresql-14-46651ce6fe013220ed397add242004d764fc0153.tar.xz
postgresql-14-46651ce6fe013220ed397add242004d764fc0153.zip
Adding upstream version 14.5.upstream/14.5upstream
Signed-off-by: Daniel Baumann <daniel.baumann@progress-linux.org>
Diffstat (limited to 'doc/src/sgml/html/routine-vacuuming.html')
-rw-r--r--doc/src/sgml/html/routine-vacuuming.html629
1 files changed, 629 insertions, 0 deletions
diff --git a/doc/src/sgml/html/routine-vacuuming.html b/doc/src/sgml/html/routine-vacuuming.html
new file mode 100644
index 0000000..56f92b6
--- /dev/null
+++ b/doc/src/sgml/html/routine-vacuuming.html
@@ -0,0 +1,629 @@
+<?xml version="1.0" encoding="UTF-8" standalone="no"?>
+<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"><html xmlns="http://www.w3.org/1999/xhtml"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8" /><title>25.1. Routine Vacuuming</title><link rel="stylesheet" type="text/css" href="stylesheet.css" /><link rev="made" href="pgsql-docs@lists.postgresql.org" /><meta name="generator" content="DocBook XSL Stylesheets Vsnapshot" /><link rel="prev" href="maintenance.html" title="Chapter 25. Routine Database Maintenance Tasks" /><link rel="next" href="routine-reindex.html" title="25.2. Routine Reindexing" /></head><body id="docContent" class="container-fluid col-10"><div xmlns="http://www.w3.org/TR/xhtml1/transitional" class="navheader"><table width="100%" summary="Navigation header"><tr><th colspan="5" align="center">25.1. Routine Vacuuming</th></tr><tr><td width="10%" align="left"><a accesskey="p" href="maintenance.html" title="Chapter 25. Routine Database Maintenance Tasks">Prev</a> </td><td width="10%" align="left"><a accesskey="u" href="maintenance.html" title="Chapter 25. Routine Database Maintenance Tasks">Up</a></td><th width="60%" align="center">Chapter 25. Routine Database Maintenance Tasks</th><td width="10%" align="right"><a accesskey="h" href="index.html" title="PostgreSQL 14.5 Documentation">Home</a></td><td width="10%" align="right"> <a accesskey="n" href="routine-reindex.html" title="25.2. Routine Reindexing">Next</a></td></tr></table><hr></hr></div><div class="sect1" id="ROUTINE-VACUUMING"><div class="titlepage"><div><div><h2 class="title" style="clear: both">25.1. Routine Vacuuming</h2></div></div></div><div class="toc"><dl class="toc"><dt><span class="sect2"><a href="routine-vacuuming.html#VACUUM-BASICS">25.1.1. Vacuuming Basics</a></span></dt><dt><span class="sect2"><a href="routine-vacuuming.html#VACUUM-FOR-SPACE-RECOVERY">25.1.2. Recovering Disk Space</a></span></dt><dt><span class="sect2"><a href="routine-vacuuming.html#VACUUM-FOR-STATISTICS">25.1.3. Updating Planner Statistics</a></span></dt><dt><span class="sect2"><a href="routine-vacuuming.html#VACUUM-FOR-VISIBILITY-MAP">25.1.4. Updating the Visibility Map</a></span></dt><dt><span class="sect2"><a href="routine-vacuuming.html#VACUUM-FOR-WRAPAROUND">25.1.5. Preventing Transaction ID Wraparound Failures</a></span></dt><dt><span class="sect2"><a href="routine-vacuuming.html#AUTOVACUUM">25.1.6. The Autovacuum Daemon</a></span></dt></dl></div><a id="id-1.6.12.10.2" class="indexterm"></a><p>
+ <span class="productname">PostgreSQL</span> databases require periodic
+ maintenance known as <em class="firstterm">vacuuming</em>. For many installations, it
+ is sufficient to let vacuuming be performed by the <em class="firstterm">autovacuum
+ daemon</em>, which is described in <a class="xref" href="routine-vacuuming.html#AUTOVACUUM" title="25.1.6. The Autovacuum Daemon">Section 25.1.6</a>. You might
+ need to adjust the autovacuuming parameters described there to obtain best
+ results for your situation. Some database administrators will want to
+ supplement or replace the daemon's activities with manually-managed
+ <code class="command">VACUUM</code> commands, which typically are executed according to a
+ schedule by <span class="application">cron</span> or <span class="application">Task
+ Scheduler</span> scripts. To set up manually-managed vacuuming properly,
+ it is essential to understand the issues discussed in the next few
+ subsections. Administrators who rely on autovacuuming may still wish
+ to skim this material to help them understand and adjust autovacuuming.
+ </p><div class="sect2" id="VACUUM-BASICS"><div class="titlepage"><div><div><h3 class="title">25.1.1. Vacuuming Basics</h3></div></div></div><p>
+ <span class="productname">PostgreSQL</span>'s
+ <a class="link" href="sql-vacuum.html" title="VACUUM"><code class="command">VACUUM</code></a> command has to
+ process each table on a regular basis for several reasons:
+
+ </p><div class="orderedlist"><ol class="orderedlist" type="1"><li class="listitem">To recover or reuse disk space occupied by updated or deleted
+ rows.</li><li class="listitem">To update data statistics used by the
+ <span class="productname">PostgreSQL</span> query planner.</li><li class="listitem">To update the visibility map, which speeds
+ up <a class="link" href="indexes-index-only-scans.html" title="11.9. Index-Only Scans and Covering Indexes">index-only
+ scans</a>.</li><li class="listitem">To protect against loss of very old data due to
+ <em class="firstterm">transaction ID wraparound</em> or
+ <em class="firstterm">multixact ID wraparound</em>.</li></ol></div><p>
+
+ Each of these reasons dictates performing <code class="command">VACUUM</code> operations
+ of varying frequency and scope, as explained in the following subsections.
+ </p><p>
+ There are two variants of <code class="command">VACUUM</code>: standard <code class="command">VACUUM</code>
+ and <code class="command">VACUUM FULL</code>. <code class="command">VACUUM FULL</code> can reclaim more
+ disk space but runs much more slowly. Also,
+ the standard form of <code class="command">VACUUM</code> can run in parallel with production
+ database operations. (Commands such as <code class="command">SELECT</code>,
+ <code class="command">INSERT</code>, <code class="command">UPDATE</code>, and
+ <code class="command">DELETE</code> will continue to function normally, though you
+ will not be able to modify the definition of a table with commands such as
+ <code class="command">ALTER TABLE</code> while it is being vacuumed.)
+ <code class="command">VACUUM FULL</code> requires an
+ <code class="literal">ACCESS EXCLUSIVE</code> lock on the table it is
+ working on, and therefore cannot be done in parallel with other use
+ of the table. Generally, therefore,
+ administrators should strive to use standard <code class="command">VACUUM</code> and
+ avoid <code class="command">VACUUM FULL</code>.
+ </p><p>
+ <code class="command">VACUUM</code> creates a substantial amount of I/O
+ traffic, which can cause poor performance for other active sessions.
+ There are configuration parameters that can be adjusted to reduce the
+ performance impact of background vacuuming — see
+ <a class="xref" href="runtime-config-resource.html#RUNTIME-CONFIG-RESOURCE-VACUUM-COST" title="20.4.4. Cost-based Vacuum Delay">Section 20.4.4</a>.
+ </p></div><div class="sect2" id="VACUUM-FOR-SPACE-RECOVERY"><div class="titlepage"><div><div><h3 class="title">25.1.2. Recovering Disk Space</h3></div></div></div><a id="id-1.6.12.10.5.2" class="indexterm"></a><p>
+ In <span class="productname">PostgreSQL</span>, an
+ <code class="command">UPDATE</code> or <code class="command">DELETE</code> of a row does not
+ immediately remove the old version of the row.
+ This approach is necessary to gain the benefits of multiversion
+ concurrency control (<acronym class="acronym">MVCC</acronym>, see <a class="xref" href="mvcc.html" title="Chapter 13. Concurrency Control">Chapter 13</a>): the row version
+ must not be deleted while it is still potentially visible to other
+ transactions. But eventually, an outdated or deleted row version is no
+ longer of interest to any transaction. The space it occupies must then be
+ reclaimed for reuse by new rows, to avoid unbounded growth of disk
+ space requirements. This is done by running <code class="command">VACUUM</code>.
+ </p><p>
+ The standard form of <code class="command">VACUUM</code> removes dead row
+ versions in tables and indexes and marks the space available for
+ future reuse. However, it will not return the space to the operating
+ system, except in the special case where one or more pages at the
+ end of a table become entirely free and an exclusive table lock can be
+ easily obtained. In contrast, <code class="command">VACUUM FULL</code> actively compacts
+ tables by writing a complete new version of the table file with no dead
+ space. This minimizes the size of the table, but can take a long time.
+ It also requires extra disk space for the new copy of the table, until
+ the operation completes.
+ </p><p>
+ The usual goal of routine vacuuming is to do standard <code class="command">VACUUM</code>s
+ often enough to avoid needing <code class="command">VACUUM FULL</code>. The
+ autovacuum daemon attempts to work this way, and in fact will
+ never issue <code class="command">VACUUM FULL</code>. In this approach, the idea
+ is not to keep tables at their minimum size, but to maintain steady-state
+ usage of disk space: each table occupies space equivalent to its
+ minimum size plus however much space gets used up between vacuum runs.
+ Although <code class="command">VACUUM FULL</code> can be used to shrink a table back
+ to its minimum size and return the disk space to the operating system,
+ there is not much point in this if the table will just grow again in the
+ future. Thus, moderately-frequent standard <code class="command">VACUUM</code> runs are a
+ better approach than infrequent <code class="command">VACUUM FULL</code> runs for
+ maintaining heavily-updated tables.
+ </p><p>
+ Some administrators prefer to schedule vacuuming themselves, for example
+ doing all the work at night when load is low.
+ The difficulty with doing vacuuming according to a fixed schedule
+ is that if a table has an unexpected spike in update activity, it may
+ get bloated to the point that <code class="command">VACUUM FULL</code> is really necessary
+ to reclaim space. Using the autovacuum daemon alleviates this problem,
+ since the daemon schedules vacuuming dynamically in response to update
+ activity. It is unwise to disable the daemon completely unless you
+ have an extremely predictable workload. One possible compromise is
+ to set the daemon's parameters so that it will only react to unusually
+ heavy update activity, thus keeping things from getting out of hand,
+ while scheduled <code class="command">VACUUM</code>s are expected to do the bulk of the
+ work when the load is typical.
+ </p><p>
+ For those not using autovacuum, a typical approach is to schedule a
+ database-wide <code class="command">VACUUM</code> once a day during a low-usage period,
+ supplemented by more frequent vacuuming of heavily-updated tables as
+ necessary. (Some installations with extremely high update rates vacuum
+ their busiest tables as often as once every few minutes.) If you have
+ multiple databases in a cluster, don't forget to
+ <code class="command">VACUUM</code> each one; the program <a class="xref" href="app-vacuumdb.html" title="vacuumdb"><span class="refentrytitle"><span class="application">vacuumdb</span></span></a> might be helpful.
+ </p><div class="tip"><h3 class="title">Tip</h3><p>
+ Plain <code class="command">VACUUM</code> may not be satisfactory when
+ a table contains large numbers of dead row versions as a result of
+ massive update or delete activity. If you have such a table and
+ you need to reclaim the excess disk space it occupies, you will need
+ to use <code class="command">VACUUM FULL</code>, or alternatively
+ <a class="link" href="sql-cluster.html" title="CLUSTER"><code class="command">CLUSTER</code></a>
+ or one of the table-rewriting variants of
+ <a class="link" href="sql-altertable.html" title="ALTER TABLE"><code class="command">ALTER TABLE</code></a>.
+ These commands rewrite an entire new copy of the table and build
+ new indexes for it. All these options require an
+ <code class="literal">ACCESS EXCLUSIVE</code> lock. Note that
+ they also temporarily use extra disk space approximately equal to the size
+ of the table, since the old copies of the table and indexes can't be
+ released until the new ones are complete.
+ </p></div><div class="tip"><h3 class="title">Tip</h3><p>
+ If you have a table whose entire contents are deleted on a periodic
+ basis, consider doing it with
+ <a class="link" href="sql-truncate.html" title="TRUNCATE"><code class="command">TRUNCATE</code></a> rather
+ than using <code class="command">DELETE</code> followed by
+ <code class="command">VACUUM</code>. <code class="command">TRUNCATE</code> removes the
+ entire content of the table immediately, without requiring a
+ subsequent <code class="command">VACUUM</code> or <code class="command">VACUUM
+ FULL</code> to reclaim the now-unused disk space.
+ The disadvantage is that strict MVCC semantics are violated.
+ </p></div></div><div class="sect2" id="VACUUM-FOR-STATISTICS"><div class="titlepage"><div><div><h3 class="title">25.1.3. Updating Planner Statistics</h3></div></div></div><a id="id-1.6.12.10.6.2" class="indexterm"></a><a id="id-1.6.12.10.6.3" class="indexterm"></a><p>
+ The <span class="productname">PostgreSQL</span> query planner relies on
+ statistical information about the contents of tables in order to
+ generate good plans for queries. These statistics are gathered by
+ the <a class="link" href="sql-analyze.html" title="ANALYZE"><code class="command">ANALYZE</code></a> command,
+ which can be invoked by itself or
+ as an optional step in <code class="command">VACUUM</code>. It is important to have
+ reasonably accurate statistics, otherwise poor choices of plans might
+ degrade database performance.
+ </p><p>
+ The autovacuum daemon, if enabled, will automatically issue
+ <code class="command">ANALYZE</code> commands whenever the content of a table has
+ changed sufficiently. However, administrators might prefer to rely
+ on manually-scheduled <code class="command">ANALYZE</code> operations, particularly
+ if it is known that update activity on a table will not affect the
+ statistics of <span class="quote">“<span class="quote">interesting</span>”</span> columns. The daemon schedules
+ <code class="command">ANALYZE</code> strictly as a function of the number of rows
+ inserted or updated; it has no knowledge of whether that will lead
+ to meaningful statistical changes.
+ </p><p>
+ Tuples changed in partitions and inheritance children do not trigger
+ analyze on the parent table. If the parent table is empty or rarely
+ changed, it may never be processed by autovacuum, and the statistics for
+ the inheritance tree as a whole won't be collected. It is necessary to
+ run <code class="command">ANALYZE</code> on the parent table manually in order to
+ keep the statistics up to date.
+ </p><p>
+ As with vacuuming for space recovery, frequent updates of statistics
+ are more useful for heavily-updated tables than for seldom-updated
+ ones. But even for a heavily-updated table, there might be no need for
+ statistics updates if the statistical distribution of the data is
+ not changing much. A simple rule of thumb is to think about how much
+ the minimum and maximum values of the columns in the table change.
+ For example, a <code class="type">timestamp</code> column that contains the time
+ of row update will have a constantly-increasing maximum value as
+ rows are added and updated; such a column will probably need more
+ frequent statistics updates than, say, a column containing URLs for
+ pages accessed on a website. The URL column might receive changes just
+ as often, but the statistical distribution of its values probably
+ changes relatively slowly.
+ </p><p>
+ It is possible to run <code class="command">ANALYZE</code> on specific tables and even
+ just specific columns of a table, so the flexibility exists to update some
+ statistics more frequently than others if your application requires it.
+ In practice, however, it is usually best to just analyze the entire
+ database, because it is a fast operation. <code class="command">ANALYZE</code> uses a
+ statistically random sampling of the rows of a table rather than reading
+ every single row.
+ </p><div class="tip"><h3 class="title">Tip</h3><p>
+ Although per-column tweaking of <code class="command">ANALYZE</code> frequency might not be
+ very productive, you might find it worthwhile to do per-column
+ adjustment of the level of detail of the statistics collected by
+ <code class="command">ANALYZE</code>. Columns that are heavily used in <code class="literal">WHERE</code>
+ clauses and have highly irregular data distributions might require a
+ finer-grain data histogram than other columns. See <code class="command">ALTER TABLE
+ SET STATISTICS</code>, or change the database-wide default using the <a class="xref" href="runtime-config-query.html#GUC-DEFAULT-STATISTICS-TARGET">default_statistics_target</a> configuration parameter.
+ </p><p>
+ Also, by default there is limited information available about
+ the selectivity of functions. However, if you create a statistics
+ object or an expression
+ index that uses a function call, useful statistics will be
+ gathered about the function, which can greatly improve query
+ plans that use the expression index.
+ </p></div><div class="tip"><h3 class="title">Tip</h3><p>
+ The autovacuum daemon does not issue <code class="command">ANALYZE</code> commands for
+ foreign tables, since it has no means of determining how often that
+ might be useful. If your queries require statistics on foreign tables
+ for proper planning, it's a good idea to run manually-managed
+ <code class="command">ANALYZE</code> commands on those tables on a suitable schedule.
+ </p></div><div class="tip"><h3 class="title">Tip</h3><p>
+ The autovacuum daemon does not issue <code class="command">ANALYZE</code> commands
+ for partitioned tables. Inheritance parents will only be analyzed if the
+ parent itself is changed - changes to child tables do not trigger
+ autoanalyze on the parent table. If your queries require statistics on
+ parent tables for proper planning, it is necessary to periodically run
+ a manual <code class="command">ANALYZE</code> on those tables to keep the statistics
+ up to date.
+ </p></div></div><div class="sect2" id="VACUUM-FOR-VISIBILITY-MAP"><div class="titlepage"><div><div><h3 class="title">25.1.4. Updating the Visibility Map</h3></div></div></div><p>
+ Vacuum maintains a <a class="link" href="storage-vm.html" title="70.4. Visibility Map">visibility map</a> for each
+ table to keep track of which pages contain only tuples that are known to be
+ visible to all active transactions (and all future transactions, until the
+ page is again modified). This has two purposes. First, vacuum
+ itself can skip such pages on the next run, since there is nothing to
+ clean up.
+ </p><p>
+ Second, it allows <span class="productname">PostgreSQL</span> to answer some
+ queries using only the index, without reference to the underlying table.
+ Since <span class="productname">PostgreSQL</span> indexes don't contain tuple
+ visibility information, a normal index scan fetches the heap tuple for each
+ matching index entry, to check whether it should be seen by the current
+ transaction.
+ An <a class="link" href="indexes-index-only-scans.html" title="11.9. Index-Only Scans and Covering Indexes"><em class="firstterm">index-only
+ scan</em></a>, on the other hand, checks the visibility map first.
+ If it's known that all tuples on the page are
+ visible, the heap fetch can be skipped. This is most useful on
+ large data sets where the visibility map can prevent disk accesses.
+ The visibility map is vastly smaller than the heap, so it can easily be
+ cached even when the heap is very large.
+ </p></div><div class="sect2" id="VACUUM-FOR-WRAPAROUND"><div class="titlepage"><div><div><h3 class="title">25.1.5. Preventing Transaction ID Wraparound Failures</h3></div></div></div><a id="id-1.6.12.10.8.2" class="indexterm"></a><a id="id-1.6.12.10.8.3" class="indexterm"></a><p>
+ <span class="productname">PostgreSQL</span>'s
+ <a class="link" href="mvcc-intro.html" title="13.1. Introduction">MVCC</a> transaction semantics
+ depend on being able to compare transaction ID (<acronym class="acronym">XID</acronym>)
+ numbers: a row version with an insertion XID greater than the current
+ transaction's XID is <span class="quote">“<span class="quote">in the future</span>”</span> and should not be visible
+ to the current transaction. But since transaction IDs have limited size
+ (32 bits) a cluster that runs for a long time (more
+ than 4 billion transactions) would suffer <em class="firstterm">transaction ID
+ wraparound</em>: the XID counter wraps around to zero, and all of a sudden
+ transactions that were in the past appear to be in the future — which
+ means their output become invisible. In short, catastrophic data loss.
+ (Actually the data is still there, but that's cold comfort if you cannot
+ get at it.) To avoid this, it is necessary to vacuum every table
+ in every database at least once every two billion transactions.
+ </p><p>
+ The reason that periodic vacuuming solves the problem is that
+ <code class="command">VACUUM</code> will mark rows as <span class="emphasis"><em>frozen</em></span>, indicating that
+ they were inserted by a transaction that committed sufficiently far in
+ the past that the effects of the inserting transaction are certain to be
+ visible to all current and future transactions.
+ Normal XIDs are
+ compared using modulo-2<sup>32</sup> arithmetic. This means
+ that for every normal XID, there are two billion XIDs that are
+ <span class="quote">“<span class="quote">older</span>”</span> and two billion that are <span class="quote">“<span class="quote">newer</span>”</span>; another
+ way to say it is that the normal XID space is circular with no
+ endpoint. Therefore, once a row version has been created with a particular
+ normal XID, the row version will appear to be <span class="quote">“<span class="quote">in the past</span>”</span> for
+ the next two billion transactions, no matter which normal XID we are
+ talking about. If the row version still exists after more than two billion
+ transactions, it will suddenly appear to be in the future. To
+ prevent this, <span class="productname">PostgreSQL</span> reserves a special XID,
+ <code class="literal">FrozenTransactionId</code>, which does not follow the normal XID
+ comparison rules and is always considered older
+ than every normal XID.
+ Frozen row versions are treated as if the inserting XID were
+ <code class="literal">FrozenTransactionId</code>, so that they will appear to be
+ <span class="quote">“<span class="quote">in the past</span>”</span> to all normal transactions regardless of wraparound
+ issues, and so such row versions will be valid until deleted, no matter
+ how long that is.
+ </p><div class="note"><h3 class="title">Note</h3><p>
+ In <span class="productname">PostgreSQL</span> versions before 9.4, freezing was
+ implemented by actually replacing a row's insertion XID
+ with <code class="literal">FrozenTransactionId</code>, which was visible in the
+ row's <code class="structname">xmin</code> system column. Newer versions just set a flag
+ bit, preserving the row's original <code class="structname">xmin</code> for possible
+ forensic use. However, rows with <code class="structname">xmin</code> equal
+ to <code class="literal">FrozenTransactionId</code> (2) may still be found
+ in databases <span class="application">pg_upgrade</span>'d from pre-9.4 versions.
+ </p><p>
+ Also, system catalogs may contain rows with <code class="structname">xmin</code> equal
+ to <code class="literal">BootstrapTransactionId</code> (1), indicating that they were
+ inserted during the first phase of <span class="application">initdb</span>.
+ Like <code class="literal">FrozenTransactionId</code>, this special XID is treated as
+ older than every normal XID.
+ </p></div><p>
+ <a class="xref" href="runtime-config-client.html#GUC-VACUUM-FREEZE-MIN-AGE">vacuum_freeze_min_age</a>
+ controls how old an XID value has to be before rows bearing that XID will be
+ frozen. Increasing this setting may avoid unnecessary work if the
+ rows that would otherwise be frozen will soon be modified again,
+ but decreasing this setting increases
+ the number of transactions that can elapse before the table must be
+ vacuumed again.
+ </p><p>
+ <code class="command">VACUUM</code> uses the <a class="link" href="storage-vm.html" title="70.4. Visibility Map">visibility map</a>
+ to determine which pages of a table must be scanned. Normally, it
+ will skip pages that don't have any dead row versions even if those pages
+ might still have row versions with old XID values. Therefore, normal
+ <code class="command">VACUUM</code>s won't always freeze every old row version in the table.
+ Periodically, <code class="command">VACUUM</code> will perform an <em class="firstterm">aggressive
+ vacuum</em>, skipping only those pages which contain neither dead rows nor
+ any unfrozen XID or MXID values.
+ <a class="xref" href="runtime-config-client.html#GUC-VACUUM-FREEZE-TABLE-AGE">vacuum_freeze_table_age</a>
+ controls when <code class="command">VACUUM</code> does that: all-visible but not all-frozen
+ pages are scanned if the number of transactions that have passed since the
+ last such scan is greater than <code class="varname">vacuum_freeze_table_age</code> minus
+ <code class="varname">vacuum_freeze_min_age</code>. Setting
+ <code class="varname">vacuum_freeze_table_age</code> to 0 forces <code class="command">VACUUM</code> to
+ use this more aggressive strategy for all scans.
+ </p><p>
+ The maximum time that a table can go unvacuumed is two billion
+ transactions minus the <code class="varname">vacuum_freeze_min_age</code> value at
+ the time of the last aggressive vacuum. If it were to go
+ unvacuumed for longer than
+ that, data loss could result. To ensure that this does not happen,
+ autovacuum is invoked on any table that might contain unfrozen rows with
+ XIDs older than the age specified by the configuration parameter <a class="xref" href="runtime-config-autovacuum.html#GUC-AUTOVACUUM-FREEZE-MAX-AGE">autovacuum_freeze_max_age</a>. (This will happen even if
+ autovacuum is disabled.)
+ </p><p>
+ This implies that if a table is not otherwise vacuumed,
+ autovacuum will be invoked on it approximately once every
+ <code class="varname">autovacuum_freeze_max_age</code> minus
+ <code class="varname">vacuum_freeze_min_age</code> transactions.
+ For tables that are regularly vacuumed for space reclamation purposes,
+ this is of little importance. However, for static tables
+ (including tables that receive inserts, but no updates or deletes),
+ there is no need to vacuum for space reclamation, so it can
+ be useful to try to maximize the interval between forced autovacuums
+ on very large static tables. Obviously one can do this either by
+ increasing <code class="varname">autovacuum_freeze_max_age</code> or decreasing
+ <code class="varname">vacuum_freeze_min_age</code>.
+ </p><p>
+ The effective maximum for <code class="varname">vacuum_freeze_table_age</code> is 0.95 *
+ <code class="varname">autovacuum_freeze_max_age</code>; a setting higher than that will be
+ capped to the maximum. A value higher than
+ <code class="varname">autovacuum_freeze_max_age</code> wouldn't make sense because an
+ anti-wraparound autovacuum would be triggered at that point anyway, and
+ the 0.95 multiplier leaves some breathing room to run a manual
+ <code class="command">VACUUM</code> before that happens. As a rule of thumb,
+ <code class="command">vacuum_freeze_table_age</code> should be set to a value somewhat
+ below <code class="varname">autovacuum_freeze_max_age</code>, leaving enough gap so that
+ a regularly scheduled <code class="command">VACUUM</code> or an autovacuum triggered by
+ normal delete and update activity is run in that window. Setting it too
+ close could lead to anti-wraparound autovacuums, even though the table
+ was recently vacuumed to reclaim space, whereas lower values lead to more
+ frequent aggressive vacuuming.
+ </p><p>
+ The sole disadvantage of increasing <code class="varname">autovacuum_freeze_max_age</code>
+ (and <code class="varname">vacuum_freeze_table_age</code> along with it) is that
+ the <code class="filename">pg_xact</code> and <code class="filename">pg_commit_ts</code>
+ subdirectories of the database cluster will take more space, because it
+ must store the commit status and (if <code class="varname">track_commit_timestamp</code> is
+ enabled) timestamp of all transactions back to
+ the <code class="varname">autovacuum_freeze_max_age</code> horizon. The commit status uses
+ two bits per transaction, so if
+ <code class="varname">autovacuum_freeze_max_age</code> is set to its maximum allowed value
+ of two billion, <code class="filename">pg_xact</code> can be expected to grow to about half
+ a gigabyte and <code class="filename">pg_commit_ts</code> to about 20GB. If this
+ is trivial compared to your total database size,
+ setting <code class="varname">autovacuum_freeze_max_age</code> to its maximum allowed value
+ is recommended. Otherwise, set it depending on what you are willing to
+ allow for <code class="filename">pg_xact</code> and <code class="filename">pg_commit_ts</code> storage.
+ (The default, 200 million transactions, translates to about 50MB
+ of <code class="filename">pg_xact</code> storage and about 2GB of <code class="filename">pg_commit_ts</code>
+ storage.)
+ </p><p>
+ One disadvantage of decreasing <code class="varname">vacuum_freeze_min_age</code> is that
+ it might cause <code class="command">VACUUM</code> to do useless work: freezing a row
+ version is a waste of time if the row is modified
+ soon thereafter (causing it to acquire a new XID). So the setting should
+ be large enough that rows are not frozen until they are unlikely to change
+ any more.
+ </p><p>
+ To track the age of the oldest unfrozen XIDs in a database,
+ <code class="command">VACUUM</code> stores XID
+ statistics in the system tables <code class="structname">pg_class</code> and
+ <code class="structname">pg_database</code>. In particular,
+ the <code class="structfield">relfrozenxid</code> column of a table's
+ <code class="structname">pg_class</code> row contains the freeze cutoff XID that was used
+ by the last aggressive <code class="command">VACUUM</code> for that table. All rows
+ inserted by transactions with XIDs older than this cutoff XID are
+ guaranteed to have been frozen. Similarly,
+ the <code class="structfield">datfrozenxid</code> column of a database's
+ <code class="structname">pg_database</code> row is a lower bound on the unfrozen XIDs
+ appearing in that database — it is just the minimum of the
+ per-table <code class="structfield">relfrozenxid</code> values within the database.
+ A convenient way to
+ examine this information is to execute queries such as:
+
+</p><pre class="programlisting">
+SELECT c.oid::regclass as table_name,
+ greatest(age(c.relfrozenxid),age(t.relfrozenxid)) as age
+FROM pg_class c
+LEFT JOIN pg_class t ON c.reltoastrelid = t.oid
+WHERE c.relkind IN ('r', 'm');
+
+SELECT datname, age(datfrozenxid) FROM pg_database;
+</pre><p>
+
+ The <code class="literal">age</code> column measures the number of transactions from the
+ cutoff XID to the current transaction's XID.
+ </p><p>
+ <code class="command">VACUUM</code> normally only scans pages that have been modified
+ since the last vacuum, but <code class="structfield">relfrozenxid</code> can only be
+ advanced when every page of the table
+ that might contain unfrozen XIDs is scanned. This happens when
+ <code class="structfield">relfrozenxid</code> is more than
+ <code class="varname">vacuum_freeze_table_age</code> transactions old, when
+ <code class="command">VACUUM</code>'s <code class="literal">FREEZE</code> option is used, or when all
+ pages that are not already all-frozen happen to
+ require vacuuming to remove dead row versions. When <code class="command">VACUUM</code>
+ scans every page in the table that is not already all-frozen, it should
+ set <code class="literal">age(relfrozenxid)</code> to a value just a little more than the
+ <code class="varname">vacuum_freeze_min_age</code> setting
+ that was used (more by the number of transactions started since the
+ <code class="command">VACUUM</code> started). If no <code class="structfield">relfrozenxid</code>-advancing
+ <code class="command">VACUUM</code> is issued on the table until
+ <code class="varname">autovacuum_freeze_max_age</code> is reached, an autovacuum will soon
+ be forced for the table.
+ </p><p>
+ If for some reason autovacuum fails to clear old XIDs from a table, the
+ system will begin to emit warning messages like this when the database's
+ oldest XIDs reach forty million transactions from the wraparound point:
+
+</p><pre class="programlisting">
+WARNING: database "mydb" must be vacuumed within 39985967 transactions
+HINT: To avoid a database shutdown, execute a database-wide VACUUM in that database.
+</pre><p>
+
+ (A manual <code class="command">VACUUM</code> should fix the problem, as suggested by the
+ hint; but note that the <code class="command">VACUUM</code> must be performed by a
+ superuser, else it will fail to process system catalogs and thus not
+ be able to advance the database's <code class="structfield">datfrozenxid</code>.)
+ If these warnings are
+ ignored, the system will shut down and refuse to start any new
+ transactions once there are fewer than three million transactions left
+ until wraparound:
+
+</p><pre class="programlisting">
+ERROR: database is not accepting commands to avoid wraparound data loss in database "mydb"
+HINT: Stop the postmaster and vacuum that database in single-user mode.
+</pre><p>
+
+ The three-million-transaction safety margin exists to let the
+ administrator recover without data loss, by manually executing the
+ required <code class="command">VACUUM</code> commands. However, since the system will not
+ execute commands once it has gone into the safety shutdown mode,
+ the only way to do this is to stop the server and start the server in single-user
+ mode to execute <code class="command">VACUUM</code>. The shutdown mode is not enforced
+ in single-user mode. See the <a class="xref" href="app-postgres.html" title="postgres"><span class="refentrytitle"><span class="application">postgres</span></span></a> reference
+ page for details about using single-user mode.
+ </p><div class="sect3" id="VACUUM-FOR-MULTIXACT-WRAPAROUND"><div class="titlepage"><div><div><h4 class="title">25.1.5.1. Multixacts and Wraparound</h4></div></div></div><a id="id-1.6.12.10.8.17.2" class="indexterm"></a><a id="id-1.6.12.10.8.17.3" class="indexterm"></a><p>
+ <em class="firstterm">Multixact IDs</em> are used to support row locking by
+ multiple transactions. Since there is only limited space in a tuple
+ header to store lock information, that information is encoded as
+ a <span class="quote">“<span class="quote">multiple transaction ID</span>”</span>, or multixact ID for short,
+ whenever there is more than one transaction concurrently locking a
+ row. Information about which transaction IDs are included in any
+ particular multixact ID is stored separately in
+ the <code class="filename">pg_multixact</code> subdirectory, and only the multixact ID
+ appears in the <code class="structfield">xmax</code> field in the tuple header.
+ Like transaction IDs, multixact IDs are implemented as a
+ 32-bit counter and corresponding storage, all of which requires
+ careful aging management, storage cleanup, and wraparound handling.
+ There is a separate storage area which holds the list of members in
+ each multixact, which also uses a 32-bit counter and which must also
+ be managed.
+ </p><p>
+ Whenever <code class="command">VACUUM</code> scans any part of a table, it will replace
+ any multixact ID it encounters which is older than
+ <a class="xref" href="runtime-config-client.html#GUC-VACUUM-MULTIXACT-FREEZE-MIN-AGE">vacuum_multixact_freeze_min_age</a>
+ by a different value, which can be the zero value, a single
+ transaction ID, or a newer multixact ID. For each table,
+ <code class="structname">pg_class</code>.<code class="structfield">relminmxid</code> stores the oldest
+ possible multixact ID still appearing in any tuple of that table.
+ If this value is older than
+ <a class="xref" href="runtime-config-client.html#GUC-VACUUM-MULTIXACT-FREEZE-TABLE-AGE">vacuum_multixact_freeze_table_age</a>, an aggressive
+ vacuum is forced. As discussed in the previous section, an aggressive
+ vacuum means that only those pages which are known to be all-frozen will
+ be skipped. <code class="function">mxid_age()</code> can be used on
+ <code class="structname">pg_class</code>.<code class="structfield">relminmxid</code> to find its age.
+ </p><p>
+ Aggressive <code class="command">VACUUM</code> scans, regardless of
+ what causes them, enable advancing the value for that table.
+ Eventually, as all tables in all databases are scanned and their
+ oldest multixact values are advanced, on-disk storage for older
+ multixacts can be removed.
+ </p><p>
+ As a safety device, an aggressive vacuum scan will
+ occur for any table whose multixact-age is greater than <a class="xref" href="runtime-config-autovacuum.html#GUC-AUTOVACUUM-MULTIXACT-FREEZE-MAX-AGE">autovacuum_multixact_freeze_max_age</a>. Also, if the
+ storage occupied by multixacts members exceeds 2GB, aggressive vacuum
+ scans will occur more often for all tables, starting with those that
+ have the oldest multixact-age. Both of these kinds of aggressive
+ scans will occur even if autovacuum is nominally disabled.
+ </p></div></div><div class="sect2" id="AUTOVACUUM"><div class="titlepage"><div><div><h3 class="title">25.1.6. The Autovacuum Daemon</h3></div></div></div><a id="id-1.6.12.10.9.2" class="indexterm"></a><p>
+ <span class="productname">PostgreSQL</span> has an optional but highly
+ recommended feature called <em class="firstterm">autovacuum</em>,
+ whose purpose is to automate the execution of
+ <code class="command">VACUUM</code> and <code class="command">ANALYZE</code> commands.
+ When enabled, autovacuum checks for
+ tables that have had a large number of inserted, updated or deleted
+ tuples. These checks use the statistics collection facility;
+ therefore, autovacuum cannot be used unless <a class="xref" href="runtime-config-statistics.html#GUC-TRACK-COUNTS">track_counts</a> is set to <code class="literal">true</code>.
+ In the default configuration, autovacuuming is enabled and the related
+ configuration parameters are appropriately set.
+ </p><p>
+ The <span class="quote">“<span class="quote">autovacuum daemon</span>”</span> actually consists of multiple processes.
+ There is a persistent daemon process, called the
+ <em class="firstterm">autovacuum launcher</em>, which is in charge of starting
+ <em class="firstterm">autovacuum worker</em> processes for all databases. The
+ launcher will distribute the work across time, attempting to start one
+ worker within each database every <a class="xref" href="runtime-config-autovacuum.html#GUC-AUTOVACUUM-NAPTIME">autovacuum_naptime</a>
+ seconds. (Therefore, if the installation has <em class="replaceable"><code>N</code></em> databases,
+ a new worker will be launched every
+ <code class="varname">autovacuum_naptime</code>/<em class="replaceable"><code>N</code></em> seconds.)
+ A maximum of <a class="xref" href="runtime-config-autovacuum.html#GUC-AUTOVACUUM-MAX-WORKERS">autovacuum_max_workers</a> worker processes
+ are allowed to run at the same time. If there are more than
+ <code class="varname">autovacuum_max_workers</code> databases to be processed,
+ the next database will be processed as soon as the first worker finishes.
+ Each worker process will check each table within its database and
+ execute <code class="command">VACUUM</code> and/or <code class="command">ANALYZE</code> as needed.
+ <a class="xref" href="runtime-config-logging.html#GUC-LOG-AUTOVACUUM-MIN-DURATION">log_autovacuum_min_duration</a> can be set to monitor
+ autovacuum workers' activity.
+ </p><p>
+ If several large tables all become eligible for vacuuming in a short
+ amount of time, all autovacuum workers might become occupied with
+ vacuuming those tables for a long period. This would result
+ in other tables and databases not being vacuumed until a worker becomes
+ available. There is no limit on how many workers might be in a
+ single database, but workers do try to avoid repeating work that has
+ already been done by other workers. Note that the number of running
+ workers does not count towards <a class="xref" href="runtime-config-connection.html#GUC-MAX-CONNECTIONS">max_connections</a> or
+ <a class="xref" href="runtime-config-connection.html#GUC-SUPERUSER-RESERVED-CONNECTIONS">superuser_reserved_connections</a> limits.
+ </p><p>
+ Tables whose <code class="structfield">relfrozenxid</code> value is more than
+ <a class="xref" href="runtime-config-autovacuum.html#GUC-AUTOVACUUM-FREEZE-MAX-AGE">autovacuum_freeze_max_age</a> transactions old are always
+ vacuumed (this also applies to those tables whose freeze max age has
+ been modified via storage parameters; see below). Otherwise, if the
+ number of tuples obsoleted since the last
+ <code class="command">VACUUM</code> exceeds the <span class="quote">“<span class="quote">vacuum threshold</span>”</span>, the
+ table is vacuumed. The vacuum threshold is defined as:
+</p><pre class="programlisting">
+vacuum threshold = vacuum base threshold + vacuum scale factor * number of tuples
+</pre><p>
+ where the vacuum base threshold is
+ <a class="xref" href="runtime-config-autovacuum.html#GUC-AUTOVACUUM-VACUUM-THRESHOLD">autovacuum_vacuum_threshold</a>,
+ the vacuum scale factor is
+ <a class="xref" href="runtime-config-autovacuum.html#GUC-AUTOVACUUM-VACUUM-SCALE-FACTOR">autovacuum_vacuum_scale_factor</a>,
+ and the number of tuples is
+ <code class="structname">pg_class</code>.<code class="structfield">reltuples</code>.
+ </p><p>
+ The table is also vacuumed if the number of tuples inserted since the last
+ vacuum has exceeded the defined insert threshold, which is defined as:
+</p><pre class="programlisting">
+vacuum insert threshold = vacuum base insert threshold + vacuum insert scale factor * number of tuples
+</pre><p>
+ where the vacuum insert base threshold is
+ <a class="xref" href="runtime-config-autovacuum.html#GUC-AUTOVACUUM-VACUUM-INSERT-THRESHOLD">autovacuum_vacuum_insert_threshold</a>,
+ and vacuum insert scale factor is
+ <a class="xref" href="runtime-config-autovacuum.html#GUC-AUTOVACUUM-VACUUM-INSERT-SCALE-FACTOR">autovacuum_vacuum_insert_scale_factor</a>.
+ Such vacuums may allow portions of the table to be marked as
+ <em class="firstterm">all visible</em> and also allow tuples to be frozen, which
+ can reduce the work required in subsequent vacuums.
+ For tables which receive <code class="command">INSERT</code> operations but no or
+ almost no <code class="command">UPDATE</code>/<code class="command">DELETE</code> operations,
+ it may be beneficial to lower the table's
+ <a class="xref" href="sql-createtable.html#RELOPTION-AUTOVACUUM-FREEZE-MIN-AGE">autovacuum_freeze_min_age</a> as this may allow
+ tuples to be frozen by earlier vacuums. The number of obsolete tuples and
+ the number of inserted tuples are obtained from the statistics collector;
+ it is a semi-accurate count updated by each <code class="command">UPDATE</code>,
+ <code class="command">DELETE</code> and <code class="command">INSERT</code> operation. (It is
+ only semi-accurate because some information might be lost under heavy
+ load.) If the <code class="structfield">relfrozenxid</code> value of the table
+ is more than <code class="varname">vacuum_freeze_table_age</code> transactions old,
+ an aggressive vacuum is performed to freeze old tuples and advance
+ <code class="structfield">relfrozenxid</code>; otherwise, only pages that have been modified
+ since the last vacuum are scanned.
+ </p><p>
+ For analyze, a similar condition is used: the threshold, defined as:
+</p><pre class="programlisting">
+analyze threshold = analyze base threshold + analyze scale factor * number of tuples
+</pre><p>
+ is compared to the total number of tuples inserted, updated, or deleted
+ since the last <code class="command">ANALYZE</code>.
+ </p><p>
+ Partitioned tables are not processed by autovacuum. Statistics
+ should be collected by running a manual <code class="command">ANALYZE</code> when it is
+ first populated, and again whenever the distribution of data in its
+ partitions changes significantly.
+ </p><p>
+ Temporary tables cannot be accessed by autovacuum. Therefore,
+ appropriate vacuum and analyze operations should be performed via
+ session SQL commands.
+ </p><p>
+ The default thresholds and scale factors are taken from
+ <code class="filename">postgresql.conf</code>, but it is possible to override them
+ (and many other autovacuum control parameters) on a per-table basis; see
+ <a class="xref" href="sql-createtable.html#SQL-CREATETABLE-STORAGE-PARAMETERS" title="Storage Parameters">Storage Parameters</a> for more information.
+ If a setting has been changed via a table's storage parameters, that value
+ is used when processing that table; otherwise the global settings are
+ used. See <a class="xref" href="runtime-config-autovacuum.html" title="20.10. Automatic Vacuuming">Section 20.10</a> for more details on
+ the global settings.
+ </p><p>
+ When multiple workers are running, the autovacuum cost delay parameters
+ (see <a class="xref" href="runtime-config-resource.html#RUNTIME-CONFIG-RESOURCE-VACUUM-COST" title="20.4.4. Cost-based Vacuum Delay">Section 20.4.4</a>) are
+ <span class="quote">“<span class="quote">balanced</span>”</span> among all the running workers, so that the
+ total I/O impact on the system is the same regardless of the number
+ of workers actually running. However, any workers processing tables whose
+ per-table <code class="literal">autovacuum_vacuum_cost_delay</code> or
+ <code class="literal">autovacuum_vacuum_cost_limit</code> storage parameters have been set
+ are not considered in the balancing algorithm.
+ </p><p>
+ Autovacuum workers generally don't block other commands. If a process
+ attempts to acquire a lock that conflicts with the
+ <code class="literal">SHARE UPDATE EXCLUSIVE</code> lock held by autovacuum, lock
+ acquisition will interrupt the autovacuum. For conflicting lock modes,
+ see <a class="xref" href="explicit-locking.html#TABLE-LOCK-COMPATIBILITY" title="Table 13.2. Conflicting Lock Modes">Table 13.2</a>. However, if the autovacuum
+ is running to prevent transaction ID wraparound (i.e., the autovacuum query
+ name in the <code class="structname">pg_stat_activity</code> view ends with
+ <code class="literal">(to prevent wraparound)</code>), the autovacuum is not
+ automatically interrupted.
+ </p><div class="warning"><h3 class="title">Warning</h3><p>
+ Regularly running commands that acquire locks conflicting with a
+ <code class="literal">SHARE UPDATE EXCLUSIVE</code> lock (e.g., ANALYZE) can
+ effectively prevent autovacuums from ever completing.
+ </p></div></div></div><div xmlns="http://www.w3.org/TR/xhtml1/transitional" class="navfooter"><hr></hr><table width="100%" summary="Navigation footer"><tr><td width="40%" align="left"><a accesskey="p" href="maintenance.html" title="Chapter 25. Routine Database Maintenance Tasks">Prev</a> </td><td width="20%" align="center"><a accesskey="u" href="maintenance.html" title="Chapter 25. Routine Database Maintenance Tasks">Up</a></td><td width="40%" align="right"> <a accesskey="n" href="routine-reindex.html" title="25.2. Routine Reindexing">Next</a></td></tr><tr><td width="40%" align="left" valign="top">Chapter 25. Routine Database Maintenance Tasks </td><td width="20%" align="center"><a accesskey="h" href="index.html" title="PostgreSQL 14.5 Documentation">Home</a></td><td width="40%" align="right" valign="top"> 25.2. Routine Reindexing</td></tr></table></div></body></html> \ No newline at end of file