diff options
Diffstat (limited to 'doc/src/sgml/maintenance.sgml')
-rw-r--r-- | doc/src/sgml/maintenance.sgml | 1204 |
1 files changed, 1204 insertions, 0 deletions
diff --git a/doc/src/sgml/maintenance.sgml b/doc/src/sgml/maintenance.sgml new file mode 100644 index 0000000..66f2c6a --- /dev/null +++ b/doc/src/sgml/maintenance.sgml @@ -0,0 +1,1204 @@ +<!-- doc/src/sgml/maintenance.sgml --> + +<chapter id="maintenance"> + <title>Routine Database Maintenance Tasks</title> + + <indexterm zone="maintenance"> + <primary>maintenance</primary> + </indexterm> + + <indexterm zone="maintenance"> + <primary>routine maintenance</primary> + </indexterm> + + <para> + <productname>PostgreSQL</productname>, like any database software, requires that certain tasks + be performed regularly to achieve optimum performance. The tasks + discussed here are <emphasis>required</emphasis>, but they + are repetitive in nature and can easily be automated using standard + tools such as <application>cron</application> scripts or + Windows' <application>Task Scheduler</application>. It is the database + administrator's responsibility to set up appropriate scripts, and to + check that they execute successfully. + </para> + + <para> + One obvious maintenance task is the creation of backup copies of the data on a + regular schedule. Without a recent backup, you have no chance of recovery + after a catastrophe (disk failure, fire, mistakenly dropping a critical + table, etc.). The backup and recovery mechanisms available in + <productname>PostgreSQL</productname> are discussed at length in + <xref linkend="backup"/>. + </para> + + <para> + The other main category of maintenance task is periodic <quote>vacuuming</quote> + of the database. This activity is discussed in + <xref linkend="routine-vacuuming"/>. Closely related to this is updating + the statistics that will be used by the query planner, as discussed in + <xref linkend="vacuum-for-statistics"/>. + </para> + + <para> + Another task that might need periodic attention is log file management. + This is discussed in <xref linkend="logfile-maintenance"/>. + </para> + + <para> + <ulink + url="https://bucardo.org/check_postgres/"><application>check_postgres</application></ulink> + is available for monitoring database health and reporting unusual + conditions. <application>check_postgres</application> integrates with + Nagios and MRTG, but can be run standalone too. + </para> + + <para> + <productname>PostgreSQL</productname> is low-maintenance compared + to some other database management systems. Nonetheless, + appropriate attention to these tasks will go far towards ensuring a + pleasant and productive experience with the system. + </para> + + <sect1 id="routine-vacuuming"> + <title>Routine Vacuuming</title> + + <indexterm zone="routine-vacuuming"> + <primary>vacuum</primary> + </indexterm> + + <para> + <productname>PostgreSQL</productname> databases require periodic + maintenance known as <firstterm>vacuuming</firstterm>. For many installations, it + is sufficient to let vacuuming be performed by the <firstterm>autovacuum + daemon</firstterm>, which is described in <xref linkend="autovacuum"/>. You might + need to adjust the autovacuuming parameters described there to obtain best + results for your situation. Some database administrators will want to + supplement or replace the daemon's activities with manually-managed + <command>VACUUM</command> commands, which typically are executed according to a + schedule by <application>cron</application> or <application>Task + Scheduler</application> scripts. To set up manually-managed vacuuming properly, + it is essential to understand the issues discussed in the next few + subsections. Administrators who rely on autovacuuming may still wish + to skim this material to help them understand and adjust autovacuuming. + </para> + + <sect2 id="vacuum-basics"> + <title>Vacuuming Basics</title> + + <para> + <productname>PostgreSQL</productname>'s + <link linkend="sql-vacuum"><command>VACUUM</command></link> command has to + process each table on a regular basis for several reasons: + + <orderedlist> + <listitem> + <simpara>To recover or reuse disk space occupied by updated or deleted + rows.</simpara> + </listitem> + + <listitem> + <simpara>To update data statistics used by the + <productname>PostgreSQL</productname> query planner.</simpara> + </listitem> + + <listitem> + <simpara>To update the visibility map, which speeds + up <link linkend="indexes-index-only-scans">index-only + scans</link>.</simpara> + </listitem> + + <listitem> + <simpara>To protect against loss of very old data due to + <firstterm>transaction ID wraparound</firstterm> or + <firstterm>multixact ID wraparound</firstterm>.</simpara> + </listitem> + </orderedlist> + + Each of these reasons dictates performing <command>VACUUM</command> operations + of varying frequency and scope, as explained in the following subsections. + </para> + + <para> + There are two variants of <command>VACUUM</command>: standard <command>VACUUM</command> + and <command>VACUUM FULL</command>. <command>VACUUM FULL</command> can reclaim more + disk space but runs much more slowly. Also, + the standard form of <command>VACUUM</command> can run in parallel with production + database operations. (Commands such as <command>SELECT</command>, + <command>INSERT</command>, <command>UPDATE</command>, and + <command>DELETE</command> will continue to function normally, though you + will not be able to modify the definition of a table with commands such as + <command>ALTER TABLE</command> while it is being vacuumed.) + <command>VACUUM FULL</command> requires an + <literal>ACCESS EXCLUSIVE</literal> lock on the table it is + working on, and therefore cannot be done in parallel with other use + of the table. Generally, therefore, + administrators should strive to use standard <command>VACUUM</command> and + avoid <command>VACUUM FULL</command>. + </para> + + <para> + <command>VACUUM</command> creates a substantial amount of I/O + traffic, which can cause poor performance for other active sessions. + There are configuration parameters that can be adjusted to reduce the + performance impact of background vacuuming — see + <xref linkend="runtime-config-resource-vacuum-cost"/>. + </para> + </sect2> + + <sect2 id="vacuum-for-space-recovery"> + <title>Recovering Disk Space</title> + + <indexterm zone="vacuum-for-space-recovery"> + <primary>disk space</primary> + </indexterm> + + <para> + In <productname>PostgreSQL</productname>, an + <command>UPDATE</command> or <command>DELETE</command> of a row does not + immediately remove the old version of the row. + This approach is necessary to gain the benefits of multiversion + concurrency control (<acronym>MVCC</acronym>, see <xref linkend="mvcc"/>): the row version + must not be deleted while it is still potentially visible to other + transactions. But eventually, an outdated or deleted row version is no + longer of interest to any transaction. The space it occupies must then be + reclaimed for reuse by new rows, to avoid unbounded growth of disk + space requirements. This is done by running <command>VACUUM</command>. + </para> + + <para> + The standard form of <command>VACUUM</command> removes dead row + versions in tables and indexes and marks the space available for + future reuse. However, it will not return the space to the operating + system, except in the special case where one or more pages at the + end of a table become entirely free and an exclusive table lock can be + easily obtained. In contrast, <command>VACUUM FULL</command> actively compacts + tables by writing a complete new version of the table file with no dead + space. This minimizes the size of the table, but can take a long time. + It also requires extra disk space for the new copy of the table, until + the operation completes. + </para> + + <para> + The usual goal of routine vacuuming is to do standard <command>VACUUM</command>s + often enough to avoid needing <command>VACUUM FULL</command>. The + autovacuum daemon attempts to work this way, and in fact will + never issue <command>VACUUM FULL</command>. In this approach, the idea + is not to keep tables at their minimum size, but to maintain steady-state + usage of disk space: each table occupies space equivalent to its + minimum size plus however much space gets used up between vacuum runs. + Although <command>VACUUM FULL</command> can be used to shrink a table back + to its minimum size and return the disk space to the operating system, + there is not much point in this if the table will just grow again in the + future. Thus, moderately-frequent standard <command>VACUUM</command> runs are a + better approach than infrequent <command>VACUUM FULL</command> runs for + maintaining heavily-updated tables. + </para> + + <para> + Some administrators prefer to schedule vacuuming themselves, for example + doing all the work at night when load is low. + The difficulty with doing vacuuming according to a fixed schedule + is that if a table has an unexpected spike in update activity, it may + get bloated to the point that <command>VACUUM FULL</command> is really necessary + to reclaim space. Using the autovacuum daemon alleviates this problem, + since the daemon schedules vacuuming dynamically in response to update + activity. It is unwise to disable the daemon completely unless you + have an extremely predictable workload. One possible compromise is + to set the daemon's parameters so that it will only react to unusually + heavy update activity, thus keeping things from getting out of hand, + while scheduled <command>VACUUM</command>s are expected to do the bulk of the + work when the load is typical. + </para> + + <para> + For those not using autovacuum, a typical approach is to schedule a + database-wide <command>VACUUM</command> once a day during a low-usage period, + supplemented by more frequent vacuuming of heavily-updated tables as + necessary. (Some installations with extremely high update rates vacuum + their busiest tables as often as once every few minutes.) If you have + multiple databases in a cluster, don't forget to + <command>VACUUM</command> each one; the program <xref + linkend="app-vacuumdb"/> might be helpful. + </para> + + <tip> + <para> + Plain <command>VACUUM</command> may not be satisfactory when + a table contains large numbers of dead row versions as a result of + massive update or delete activity. If you have such a table and + you need to reclaim the excess disk space it occupies, you will need + to use <command>VACUUM FULL</command>, or alternatively + <link linkend="sql-cluster"><command>CLUSTER</command></link> + or one of the table-rewriting variants of + <link linkend="sql-altertable"><command>ALTER TABLE</command></link>. + These commands rewrite an entire new copy of the table and build + new indexes for it. All these options require an + <literal>ACCESS EXCLUSIVE</literal> lock. Note that + they also temporarily use extra disk space approximately equal to the size + of the table, since the old copies of the table and indexes can't be + released until the new ones are complete. + </para> + </tip> + + <tip> + <para> + If you have a table whose entire contents are deleted on a periodic + basis, consider doing it with + <link linkend="sql-truncate"><command>TRUNCATE</command></link> rather + than using <command>DELETE</command> followed by + <command>VACUUM</command>. <command>TRUNCATE</command> removes the + entire content of the table immediately, without requiring a + subsequent <command>VACUUM</command> or <command>VACUUM + FULL</command> to reclaim the now-unused disk space. + The disadvantage is that strict MVCC semantics are violated. + </para> + </tip> + </sect2> + + <sect2 id="vacuum-for-statistics"> + <title>Updating Planner Statistics</title> + + <indexterm zone="vacuum-for-statistics"> + <primary>statistics</primary> + <secondary>of the planner</secondary> + </indexterm> + + <indexterm zone="vacuum-for-statistics"> + <primary>ANALYZE</primary> + </indexterm> + + <para> + The <productname>PostgreSQL</productname> query planner relies on + statistical information about the contents of tables in order to + generate good plans for queries. These statistics are gathered by + the <link linkend="sql-analyze"><command>ANALYZE</command></link> command, + which can be invoked by itself or + as an optional step in <command>VACUUM</command>. It is important to have + reasonably accurate statistics, otherwise poor choices of plans might + degrade database performance. + </para> + + <para> + The autovacuum daemon, if enabled, will automatically issue + <command>ANALYZE</command> commands whenever the content of a table has + changed sufficiently. However, administrators might prefer to rely + on manually-scheduled <command>ANALYZE</command> operations, particularly + if it is known that update activity on a table will not affect the + statistics of <quote>interesting</quote> columns. The daemon schedules + <command>ANALYZE</command> strictly as a function of the number of rows + inserted or updated; it has no knowledge of whether that will lead + to meaningful statistical changes. + </para> + + <para> + Tuples changed in partitions and inheritance children do not trigger + analyze on the parent table. If the parent table is empty or rarely + changed, it may never be processed by autovacuum, and the statistics for + the inheritance tree as a whole won't be collected. It is necessary to + run <command>ANALYZE</command> on the parent table manually in order to + keep the statistics up to date. + </para> + + <para> + As with vacuuming for space recovery, frequent updates of statistics + are more useful for heavily-updated tables than for seldom-updated + ones. But even for a heavily-updated table, there might be no need for + statistics updates if the statistical distribution of the data is + not changing much. A simple rule of thumb is to think about how much + the minimum and maximum values of the columns in the table change. + For example, a <type>timestamp</type> column that contains the time + of row update will have a constantly-increasing maximum value as + rows are added and updated; such a column will probably need more + frequent statistics updates than, say, a column containing URLs for + pages accessed on a website. The URL column might receive changes just + as often, but the statistical distribution of its values probably + changes relatively slowly. + </para> + + <para> + It is possible to run <command>ANALYZE</command> on specific tables and even + just specific columns of a table, so the flexibility exists to update some + statistics more frequently than others if your application requires it. + In practice, however, it is usually best to just analyze the entire + database, because it is a fast operation. <command>ANALYZE</command> uses a + statistically random sampling of the rows of a table rather than reading + every single row. + </para> + + <tip> + <para> + Although per-column tweaking of <command>ANALYZE</command> frequency might not be + very productive, you might find it worthwhile to do per-column + adjustment of the level of detail of the statistics collected by + <command>ANALYZE</command>. Columns that are heavily used in <literal>WHERE</literal> + clauses and have highly irregular data distributions might require a + finer-grain data histogram than other columns. See <command>ALTER TABLE + SET STATISTICS</command>, or change the database-wide default using the <xref + linkend="guc-default-statistics-target"/> configuration parameter. + </para> + + <para> + Also, by default there is limited information available about + the selectivity of functions. However, if you create a statistics + object or an expression + index that uses a function call, useful statistics will be + gathered about the function, which can greatly improve query + plans that use the expression index. + </para> + </tip> + + <tip> + <para> + The autovacuum daemon does not issue <command>ANALYZE</command> commands for + foreign tables, since it has no means of determining how often that + might be useful. If your queries require statistics on foreign tables + for proper planning, it's a good idea to run manually-managed + <command>ANALYZE</command> commands on those tables on a suitable schedule. + </para> + </tip> + + <tip> + <para> + The autovacuum daemon does not issue <command>ANALYZE</command> commands + for partitioned tables. Inheritance parents will only be analyzed if the + parent itself is changed - changes to child tables do not trigger + autoanalyze on the parent table. If your queries require statistics on + parent tables for proper planning, it is necessary to periodically run + a manual <command>ANALYZE</command> on those tables to keep the statistics + up to date. + </para> + </tip> + + </sect2> + + <sect2 id="vacuum-for-visibility-map"> + <title>Updating the Visibility Map</title> + + <para> + Vacuum maintains a <link linkend="storage-vm">visibility map</link> for each + table to keep track of which pages contain only tuples that are known to be + visible to all active transactions (and all future transactions, until the + page is again modified). This has two purposes. First, vacuum + itself can skip such pages on the next run, since there is nothing to + clean up. + </para> + + <para> + Second, it allows <productname>PostgreSQL</productname> to answer some + queries using only the index, without reference to the underlying table. + Since <productname>PostgreSQL</productname> indexes don't contain tuple + visibility information, a normal index scan fetches the heap tuple for each + matching index entry, to check whether it should be seen by the current + transaction. + An <link linkend="indexes-index-only-scans"><firstterm>index-only + scan</firstterm></link>, on the other hand, checks the visibility map first. + If it's known that all tuples on the page are + visible, the heap fetch can be skipped. This is most useful on + large data sets where the visibility map can prevent disk accesses. + The visibility map is vastly smaller than the heap, so it can easily be + cached even when the heap is very large. + </para> + </sect2> + + <sect2 id="vacuum-for-wraparound"> + <title>Preventing Transaction ID Wraparound Failures</title> + + <indexterm zone="vacuum-for-wraparound"> + <primary>transaction ID</primary> + <secondary>wraparound</secondary> + </indexterm> + + <indexterm> + <primary>wraparound</primary> + <secondary>of transaction IDs</secondary> + </indexterm> + + <para> + <productname>PostgreSQL</productname>'s + <link linkend="mvcc-intro">MVCC</link> transaction semantics + depend on being able to compare transaction ID (<acronym>XID</acronym>) + numbers: a row version with an insertion XID greater than the current + transaction's XID is <quote>in the future</quote> and should not be visible + to the current transaction. But since transaction IDs have limited size + (32 bits) a cluster that runs for a long time (more + than 4 billion transactions) would suffer <firstterm>transaction ID + wraparound</firstterm>: the XID counter wraps around to zero, and all of a sudden + transactions that were in the past appear to be in the future — which + means their output become invisible. In short, catastrophic data loss. + (Actually the data is still there, but that's cold comfort if you cannot + get at it.) To avoid this, it is necessary to vacuum every table + in every database at least once every two billion transactions. + </para> + + <para> + The reason that periodic vacuuming solves the problem is that + <command>VACUUM</command> will mark rows as <emphasis>frozen</emphasis>, indicating that + they were inserted by a transaction that committed sufficiently far in + the past that the effects of the inserting transaction are certain to be + visible to all current and future transactions. + Normal XIDs are + compared using modulo-2<superscript>32</superscript> arithmetic. This means + that for every normal XID, there are two billion XIDs that are + <quote>older</quote> and two billion that are <quote>newer</quote>; another + way to say it is that the normal XID space is circular with no + endpoint. Therefore, once a row version has been created with a particular + normal XID, the row version will appear to be <quote>in the past</quote> for + the next two billion transactions, no matter which normal XID we are + talking about. If the row version still exists after more than two billion + transactions, it will suddenly appear to be in the future. To + prevent this, <productname>PostgreSQL</productname> reserves a special XID, + <literal>FrozenTransactionId</literal>, which does not follow the normal XID + comparison rules and is always considered older + than every normal XID. + Frozen row versions are treated as if the inserting XID were + <literal>FrozenTransactionId</literal>, so that they will appear to be + <quote>in the past</quote> to all normal transactions regardless of wraparound + issues, and so such row versions will be valid until deleted, no matter + how long that is. + </para> + + <note> + <para> + In <productname>PostgreSQL</productname> versions before 9.4, freezing was + implemented by actually replacing a row's insertion XID + with <literal>FrozenTransactionId</literal>, which was visible in the + row's <structname>xmin</structname> system column. Newer versions just set a flag + bit, preserving the row's original <structname>xmin</structname> for possible + forensic use. However, rows with <structname>xmin</structname> equal + to <literal>FrozenTransactionId</literal> (2) may still be found + in databases <application>pg_upgrade</application>'d from pre-9.4 versions. + </para> + <para> + Also, system catalogs may contain rows with <structname>xmin</structname> equal + to <literal>BootstrapTransactionId</literal> (1), indicating that they were + inserted during the first phase of <application>initdb</application>. + Like <literal>FrozenTransactionId</literal>, this special XID is treated as + older than every normal XID. + </para> + </note> + + <para> + <xref linkend="guc-vacuum-freeze-min-age"/> + controls how old an XID value has to be before rows bearing that XID will be + frozen. Increasing this setting may avoid unnecessary work if the + rows that would otherwise be frozen will soon be modified again, + but decreasing this setting increases + the number of transactions that can elapse before the table must be + vacuumed again. + </para> + + <para> + <command>VACUUM</command> uses the <link linkend="storage-vm">visibility map</link> + to determine which pages of a table must be scanned. Normally, it + will skip pages that don't have any dead row versions even if those pages + might still have row versions with old XID values. Therefore, normal + <command>VACUUM</command>s won't always freeze every old row version in the table. + When that happens, <command>VACUUM</command> will eventually need to perform an + <firstterm>aggressive vacuum</firstterm>, which will freeze all eligible unfrozen + XID and MXID values, including those from all-visible but not all-frozen pages. + In practice most tables require periodic aggressive vacuuming. + <xref linkend="guc-vacuum-freeze-table-age"/> + controls when <command>VACUUM</command> does that: all-visible but not all-frozen + pages are scanned if the number of transactions that have passed since the + last such scan is greater than <varname>vacuum_freeze_table_age</varname> minus + <varname>vacuum_freeze_min_age</varname>. Setting + <varname>vacuum_freeze_table_age</varname> to 0 forces <command>VACUUM</command> to + always use its aggressive strategy. + </para> + + <para> + The maximum time that a table can go unvacuumed is two billion + transactions minus the <varname>vacuum_freeze_min_age</varname> value at + the time of the last aggressive vacuum. If it were to go + unvacuumed for longer than + that, data loss could result. To ensure that this does not happen, + autovacuum is invoked on any table that might contain unfrozen rows with + XIDs older than the age specified by the configuration parameter <xref + linkend="guc-autovacuum-freeze-max-age"/>. (This will happen even if + autovacuum is disabled.) + </para> + + <para> + This implies that if a table is not otherwise vacuumed, + autovacuum will be invoked on it approximately once every + <varname>autovacuum_freeze_max_age</varname> minus + <varname>vacuum_freeze_min_age</varname> transactions. + For tables that are regularly vacuumed for space reclamation purposes, + this is of little importance. However, for static tables + (including tables that receive inserts, but no updates or deletes), + there is no need to vacuum for space reclamation, so it can + be useful to try to maximize the interval between forced autovacuums + on very large static tables. Obviously one can do this either by + increasing <varname>autovacuum_freeze_max_age</varname> or decreasing + <varname>vacuum_freeze_min_age</varname>. + </para> + + <para> + The effective maximum for <varname>vacuum_freeze_table_age</varname> is 0.95 * + <varname>autovacuum_freeze_max_age</varname>; a setting higher than that will be + capped to the maximum. A value higher than + <varname>autovacuum_freeze_max_age</varname> wouldn't make sense because an + anti-wraparound autovacuum would be triggered at that point anyway, and + the 0.95 multiplier leaves some breathing room to run a manual + <command>VACUUM</command> before that happens. As a rule of thumb, + <command>vacuum_freeze_table_age</command> should be set to a value somewhat + below <varname>autovacuum_freeze_max_age</varname>, leaving enough gap so that + a regularly scheduled <command>VACUUM</command> or an autovacuum triggered by + normal delete and update activity is run in that window. Setting it too + close could lead to anti-wraparound autovacuums, even though the table + was recently vacuumed to reclaim space, whereas lower values lead to more + frequent aggressive vacuuming. + </para> + + <para> + The sole disadvantage of increasing <varname>autovacuum_freeze_max_age</varname> + (and <varname>vacuum_freeze_table_age</varname> along with it) is that + the <filename>pg_xact</filename> and <filename>pg_commit_ts</filename> + subdirectories of the database cluster will take more space, because it + must store the commit status and (if <varname>track_commit_timestamp</varname> is + enabled) timestamp of all transactions back to + the <varname>autovacuum_freeze_max_age</varname> horizon. The commit status uses + two bits per transaction, so if + <varname>autovacuum_freeze_max_age</varname> is set to its maximum allowed value + of two billion, <filename>pg_xact</filename> can be expected to grow to about half + a gigabyte and <filename>pg_commit_ts</filename> to about 20GB. If this + is trivial compared to your total database size, + setting <varname>autovacuum_freeze_max_age</varname> to its maximum allowed value + is recommended. Otherwise, set it depending on what you are willing to + allow for <filename>pg_xact</filename> and <filename>pg_commit_ts</filename> storage. + (The default, 200 million transactions, translates to about 50MB + of <filename>pg_xact</filename> storage and about 2GB of <filename>pg_commit_ts</filename> + storage.) + </para> + + <para> + One disadvantage of decreasing <varname>vacuum_freeze_min_age</varname> is that + it might cause <command>VACUUM</command> to do useless work: freezing a row + version is a waste of time if the row is modified + soon thereafter (causing it to acquire a new XID). So the setting should + be large enough that rows are not frozen until they are unlikely to change + any more. + </para> + + <para> + To track the age of the oldest unfrozen XIDs in a database, + <command>VACUUM</command> stores XID + statistics in the system tables <structname>pg_class</structname> and + <structname>pg_database</structname>. In particular, + the <structfield>relfrozenxid</structfield> column of a table's + <structname>pg_class</structname> row contains the oldest remaining unfrozen + XID at the end of the most recent <command>VACUUM</command> that successfully + advanced <structfield>relfrozenxid</structfield> (typically the most recent + aggressive VACUUM). Similarly, the + <structfield>datfrozenxid</structfield> column of a database's + <structname>pg_database</structname> row is a lower bound on the unfrozen XIDs + appearing in that database — it is just the minimum of the + per-table <structfield>relfrozenxid</structfield> values within the database. + A convenient way to + examine this information is to execute queries such as: + +<programlisting> +SELECT c.oid::regclass as table_name, + greatest(age(c.relfrozenxid),age(t.relfrozenxid)) as age +FROM pg_class c +LEFT JOIN pg_class t ON c.reltoastrelid = t.oid +WHERE c.relkind IN ('r', 'm'); + +SELECT datname, age(datfrozenxid) FROM pg_database; +</programlisting> + + The <literal>age</literal> column measures the number of transactions from the + cutoff XID to the current transaction's XID. + </para> + + <tip> + <para> + When the <command>VACUUM</command> command's <literal>VERBOSE</literal> + parameter is specified, <command>VACUUM</command> prints various + statistics about the table. This includes information about how + <structfield>relfrozenxid</structfield> and + <structfield>relminmxid</structfield> advanced, and the number of + newly frozen pages. The same details appear in the server log when + autovacuum logging (controlled by <xref + linkend="guc-log-autovacuum-min-duration"/>) reports on a + <command>VACUUM</command> operation executed by autovacuum. + </para> + </tip> + + <para> + <command>VACUUM</command> normally only scans pages that have been modified + since the last vacuum, but <structfield>relfrozenxid</structfield> can only be + advanced when every page of the table + that might contain unfrozen XIDs is scanned. This happens when + <structfield>relfrozenxid</structfield> is more than + <varname>vacuum_freeze_table_age</varname> transactions old, when + <command>VACUUM</command>'s <literal>FREEZE</literal> option is used, or when all + pages that are not already all-frozen happen to + require vacuuming to remove dead row versions. When <command>VACUUM</command> + scans every page in the table that is not already all-frozen, it should + set <literal>age(relfrozenxid)</literal> to a value just a little more than the + <varname>vacuum_freeze_min_age</varname> setting + that was used (more by the number of transactions started since the + <command>VACUUM</command> started). <command>VACUUM</command> + will set <structfield>relfrozenxid</structfield> to the oldest XID + that remains in the table, so it's possible that the final value + will be much more recent than strictly required. + If no <structfield>relfrozenxid</structfield>-advancing + <command>VACUUM</command> is issued on the table until + <varname>autovacuum_freeze_max_age</varname> is reached, an autovacuum will soon + be forced for the table. + </para> + + <para> + If for some reason autovacuum fails to clear old XIDs from a table, the + system will begin to emit warning messages like this when the database's + oldest XIDs reach forty million transactions from the wraparound point: + +<programlisting> +WARNING: database "mydb" must be vacuumed within 39985967 transactions +HINT: To avoid a database shutdown, execute a database-wide VACUUM in that database. +</programlisting> + + (A manual <command>VACUUM</command> should fix the problem, as suggested by the + hint; but note that the <command>VACUUM</command> should be performed by a + superuser, else it will fail to process system catalogs, which prevent it from + being able to advance the database's <structfield>datfrozenxid</structfield>.) + If these warnings are ignored, the system will refuse to assign new XIDs once + there are fewer than three million transactions left until wraparound: + +<programlisting> +ERROR: database is not accepting commands to avoid wraparound data loss in database "mydb" +HINT: Stop the postmaster and vacuum that database in single-user mode. +</programlisting> + + In this condition any transactions already in progress can continue, + but only read-only transactions can be started. Operations that + modify database records or truncate relations will fail. + The <command>VACUUM</command> command can still be run normally. + Contrary to what the hint states, it is not necessary or desirable to stop the + postmaster or enter single user-mode in order to restore normal operation. + Instead, follow these steps: + + <orderedlist> + <listitem> + <simpara>Resolve old prepared transactions. You can find these by checking + <link linkend="view-pg-prepared-xacts">pg_prepared_xacts</link> for rows where + <literal>age(transactionid)</literal> is large. Such transactions should be + committed or rolled back.</simpara> + </listitem> + <listitem> + <simpara>End long-running open transactions. You can find these by checking + <link linkend="monitoring-pg-stat-activity-view">pg_stat_activity</link> for rows where + <literal>age(backend_xid)</literal> or <literal>age(backend_xmin)</literal> is + large. Such transactions should be committed or rolled back, or the session + can be terminated using <literal>pg_terminate_backend</literal>.</simpara> + </listitem> + <listitem> + <simpara>Drop any old replication slots. Use + <link linkend="monitoring-pg-stat-replication-view">pg_stat_replication</link> to + find slots where <literal>age(xmin)</literal> or <literal>age(catalog_xmin)</literal> + is large. In many cases, such slots were created for replication to servers that no + longer exist, or that have been down for a long time. If you drop a slot for a server + that still exists and might still try to connect to that slot, that replica may + need to be rebuilt.</simpara> + </listitem> + <listitem> + <simpara>Execute <command>VACUUM</command> in the target database. A database-wide + <literal>VACUUM</literal> is simplest; to reduce the time required, it as also possible + to issue manual <command>VACUUM</command> commands on the tables where + <structfield>relminxid</structfield> is oldest. Do not use <literal>VACUUM FULL</literal> + in this scenario, because it requires an XID and will therefore fail, except in super-user + mode, where it will instead consume an XID and thus increase the risk of transaction ID + wraparound. Do not use <literal>VACUUM FREEZE</literal> either, because it will do + more than the minimum amount of work required to restore normal operation.</simpara> + </listitem> + <listitem> + <simpara>Once normal operation is restored, ensure that autovacuum is properly configured + in the target database in order to avoid future problems.</simpara> + </listitem> + </orderedlist> + </para> + + <note> + <para> + In earlier versions, it was sometimes necessary to stop the postmaster and + <command>VACUUM</command> the database in a single-user mode. In typical scenarios, this + is no longer necessary, and should be avoided whenever possible, since it involves taking + the system down. It is also riskier, since it disables transaction ID wraparound safeguards + that are designed to prevent data loss. The only reason to use single-user mode in this + scenario is if you wish to <command>TRUNCATE</command> or <command>DROP</command> unneeded + tables to avoid needing to <command>VACUUM</command> them. The three-million-transaction + safety margin exists to let the administrator do this. See the + <xref linkend="app-postgres"/> reference page for details about using single-user mode. + </para> + </note> + + <sect3 id="vacuum-for-multixact-wraparound"> + <title>Multixacts and Wraparound</title> + + <indexterm> + <primary>MultiXactId</primary> + </indexterm> + + <indexterm> + <primary>wraparound</primary> + <secondary>of multixact IDs</secondary> + </indexterm> + + <para> + <firstterm>Multixact IDs</firstterm> are used to support row locking by + multiple transactions. Since there is only limited space in a tuple + header to store lock information, that information is encoded as + a <quote>multiple transaction ID</quote>, or multixact ID for short, + whenever there is more than one transaction concurrently locking a + row. Information about which transaction IDs are included in any + particular multixact ID is stored separately in + the <filename>pg_multixact</filename> subdirectory, and only the multixact ID + appears in the <structfield>xmax</structfield> field in the tuple header. + Like transaction IDs, multixact IDs are implemented as a + 32-bit counter and corresponding storage, all of which requires + careful aging management, storage cleanup, and wraparound handling. + There is a separate storage area which holds the list of members in + each multixact, which also uses a 32-bit counter and which must also + be managed. + </para> + + <para> + Whenever <command>VACUUM</command> scans any part of a table, it will replace + any multixact ID it encounters which is older than + <xref linkend="guc-vacuum-multixact-freeze-min-age"/> + by a different value, which can be the zero value, a single + transaction ID, or a newer multixact ID. For each table, + <structname>pg_class</structname>.<structfield>relminmxid</structfield> stores the oldest + possible multixact ID still appearing in any tuple of that table. + If this value is older than + <xref linkend="guc-vacuum-multixact-freeze-table-age"/>, an aggressive + vacuum is forced. As discussed in the previous section, an aggressive + vacuum means that only those pages which are known to be all-frozen will + be skipped. <function>mxid_age()</function> can be used on + <structname>pg_class</structname>.<structfield>relminmxid</structfield> to find its age. + </para> + + <para> + Aggressive <command>VACUUM</command>s, regardless of what causes + them, are <emphasis>guaranteed</emphasis> to be able to advance + the table's <structfield>relminmxid</structfield>. + Eventually, as all tables in all databases are scanned and their + oldest multixact values are advanced, on-disk storage for older + multixacts can be removed. + </para> + + <para> + As a safety device, an aggressive vacuum scan will + occur for any table whose multixact-age is greater than <xref + linkend="guc-autovacuum-multixact-freeze-max-age"/>. Also, if the + storage occupied by multixacts members exceeds 2GB, aggressive vacuum + scans will occur more often for all tables, starting with those that + have the oldest multixact-age. Both of these kinds of aggressive + scans will occur even if autovacuum is nominally disabled. + </para> + + <para> + Similar to the XID case, if autovacuum fails to clear old MXIDs from a table, the + system will begin to emit warning messages when the database's oldest MXIDs reach forty + million transactions from the wraparound point. And, just as an the XID case, if these + warnings are ignored, the system will refuse to generate new MXIDs once there are fewer + than three million left until wraparound. + </para> + + <para> + Normal operation when MXIDs are exhausted can be restored in much the same way as + when XIDs are exhausted. Follow the same steps in the previous section, but with the + following differences: + + <orderedlist> + <listitem> + <simpara>Running transactions and prepared transactions can be ignored if there + is no chance that they might appear in a multixact.</simpara> + </listitem> + <listitem> + <simpara>MXID information is not directly visible in system views such as + <literal>pg_stat_activity</literal>; however, looking for old XIDs is still a good + way of determining which transactions are causing MXID wraparound problems.</simpara> + </listitem> + <listitem> + <simpara>XID exhaustion will block all write transactions, but MXID exhaustion will + only block a subset of write transactions, specifically those that involve + row locks that require an MXID.</simpara> + </listitem> + </orderedlist> + </para> + + </sect3> + </sect2> + + <sect2 id="autovacuum"> + <title>The Autovacuum Daemon</title> + + <indexterm> + <primary>autovacuum</primary> + <secondary>general information</secondary> + </indexterm> + <para> + <productname>PostgreSQL</productname> has an optional but highly + recommended feature called <firstterm>autovacuum</firstterm>, + whose purpose is to automate the execution of + <command>VACUUM</command> and <command>ANALYZE</command> commands. + When enabled, autovacuum checks for + tables that have had a large number of inserted, updated or deleted + tuples. These checks use the statistics collection facility; + therefore, autovacuum cannot be used unless <xref + linkend="guc-track-counts"/> is set to <literal>true</literal>. + In the default configuration, autovacuuming is enabled and the related + configuration parameters are appropriately set. + </para> + + <para> + The <quote>autovacuum daemon</quote> actually consists of multiple processes. + There is a persistent daemon process, called the + <firstterm>autovacuum launcher</firstterm>, which is in charge of starting + <firstterm>autovacuum worker</firstterm> processes for all databases. The + launcher will distribute the work across time, attempting to start one + worker within each database every <xref linkend="guc-autovacuum-naptime"/> + seconds. (Therefore, if the installation has <replaceable>N</replaceable> databases, + a new worker will be launched every + <varname>autovacuum_naptime</varname>/<replaceable>N</replaceable> seconds.) + A maximum of <xref linkend="guc-autovacuum-max-workers"/> worker processes + are allowed to run at the same time. If there are more than + <varname>autovacuum_max_workers</varname> databases to be processed, + the next database will be processed as soon as the first worker finishes. + Each worker process will check each table within its database and + execute <command>VACUUM</command> and/or <command>ANALYZE</command> as needed. + <xref linkend="guc-log-autovacuum-min-duration"/> can be set to monitor + autovacuum workers' activity. + </para> + + <para> + If several large tables all become eligible for vacuuming in a short + amount of time, all autovacuum workers might become occupied with + vacuuming those tables for a long period. This would result + in other tables and databases not being vacuumed until a worker becomes + available. There is no limit on how many workers might be in a + single database, but workers do try to avoid repeating work that has + already been done by other workers. Note that the number of running + workers does not count towards <xref linkend="guc-max-connections"/> or + <xref linkend="guc-superuser-reserved-connections"/> limits. + </para> + + <para> + Tables whose <structfield>relfrozenxid</structfield> value is more than + <xref linkend="guc-autovacuum-freeze-max-age"/> transactions old are always + vacuumed (this also applies to those tables whose freeze max age has + been modified via storage parameters; see below). Otherwise, if the + number of tuples obsoleted since the last + <command>VACUUM</command> exceeds the <quote>vacuum threshold</quote>, the + table is vacuumed. The vacuum threshold is defined as: +<programlisting> +vacuum threshold = vacuum base threshold + vacuum scale factor * number of tuples +</programlisting> + where the vacuum base threshold is + <xref linkend="guc-autovacuum-vacuum-threshold"/>, + the vacuum scale factor is + <xref linkend="guc-autovacuum-vacuum-scale-factor"/>, + and the number of tuples is + <structname>pg_class</structname>.<structfield>reltuples</structfield>. + </para> + + <para> + The table is also vacuumed if the number of tuples inserted since the last + vacuum has exceeded the defined insert threshold, which is defined as: +<programlisting> +vacuum insert threshold = vacuum base insert threshold + vacuum insert scale factor * number of tuples +</programlisting> + where the vacuum insert base threshold is + <xref linkend="guc-autovacuum-vacuum-insert-threshold"/>, + and vacuum insert scale factor is + <xref linkend="guc-autovacuum-vacuum-insert-scale-factor"/>. + Such vacuums may allow portions of the table to be marked as + <firstterm>all visible</firstterm> and also allow tuples to be frozen, which + can reduce the work required in subsequent vacuums. + For tables which receive <command>INSERT</command> operations but no or + almost no <command>UPDATE</command>/<command>DELETE</command> operations, + it may be beneficial to lower the table's + <xref linkend="reloption-autovacuum-freeze-min-age"/> as this may allow + tuples to be frozen by earlier vacuums. The number of obsolete tuples and + the number of inserted tuples are obtained from the cumulative statistics system; + it is a semi-accurate count updated by each <command>UPDATE</command>, + <command>DELETE</command> and <command>INSERT</command> operation. (It is + only semi-accurate because some information might be lost under heavy + load.) If the <structfield>relfrozenxid</structfield> value of the table + is more than <varname>vacuum_freeze_table_age</varname> transactions old, + an aggressive vacuum is performed to freeze old tuples and advance + <structfield>relfrozenxid</structfield>; otherwise, only pages that have been modified + since the last vacuum are scanned. + </para> + + <para> + For analyze, a similar condition is used: the threshold, defined as: +<programlisting> +analyze threshold = analyze base threshold + analyze scale factor * number of tuples +</programlisting> + is compared to the total number of tuples inserted, updated, or deleted + since the last <command>ANALYZE</command>. + </para> + + <para> + Partitioned tables do not directly store tuples and consequently + are not processed by autovacuum. (Autovacuum does process table + partitions just like other tables.) Unfortunately, this means that + autovacuum does not run <command>ANALYZE</command> on partitioned + tables, and this can cause suboptimal plans for queries that reference + partitioned table statistics. You can work around this problem by + manually running <command>ANALYZE</command> on partitioned tables + when they are first populated, and again whenever the distribution + of data in their partitions changes significantly. + </para> + + <para> + Temporary tables cannot be accessed by autovacuum. Therefore, + appropriate vacuum and analyze operations should be performed via + session SQL commands. + </para> + + <para> + The default thresholds and scale factors are taken from + <filename>postgresql.conf</filename>, but it is possible to override them + (and many other autovacuum control parameters) on a per-table basis; see + <xref linkend="sql-createtable-storage-parameters"/> for more information. + If a setting has been changed via a table's storage parameters, that value + is used when processing that table; otherwise the global settings are + used. See <xref linkend="runtime-config-autovacuum"/> for more details on + the global settings. + </para> + + <para> + When multiple workers are running, the autovacuum cost delay parameters + (see <xref linkend="runtime-config-resource-vacuum-cost"/>) are + <quote>balanced</quote> among all the running workers, so that the + total I/O impact on the system is the same regardless of the number + of workers actually running. However, any workers processing tables whose + per-table <literal>autovacuum_vacuum_cost_delay</literal> or + <literal>autovacuum_vacuum_cost_limit</literal> storage parameters have been set + are not considered in the balancing algorithm. + </para> + + <para> + Autovacuum workers generally don't block other commands. If a process + attempts to acquire a lock that conflicts with the + <literal>SHARE UPDATE EXCLUSIVE</literal> lock held by autovacuum, lock + acquisition will interrupt the autovacuum. For conflicting lock modes, + see <xref linkend="table-lock-compatibility"/>. However, if the autovacuum + is running to prevent transaction ID wraparound (i.e., the autovacuum query + name in the <structname>pg_stat_activity</structname> view ends with + <literal>(to prevent wraparound)</literal>), the autovacuum is not + automatically interrupted. + </para> + + <warning> + <para> + Regularly running commands that acquire locks conflicting with a + <literal>SHARE UPDATE EXCLUSIVE</literal> lock (e.g., ANALYZE) can + effectively prevent autovacuums from ever completing. + </para> + </warning> + </sect2> + </sect1> + + + <sect1 id="routine-reindex"> + <title>Routine Reindexing</title> + + <indexterm zone="routine-reindex"> + <primary>reindex</primary> + </indexterm> + + <para> + In some situations it is worthwhile to rebuild indexes periodically + with the <xref linkend="sql-reindex"/> command or a series of individual + rebuilding steps. + + </para> + + <para> + B-tree index pages that have become completely empty are reclaimed for + re-use. However, there is still a possibility + of inefficient use of space: if all but a few index keys on a page have + been deleted, the page remains allocated. Therefore, a usage + pattern in which most, but not all, keys in each range are eventually + deleted will see poor use of space. For such usage patterns, + periodic reindexing is recommended. + </para> + + <para> + The potential for bloat in non-B-tree indexes has not been well + researched. It is a good idea to periodically monitor the index's physical + size when using any non-B-tree index type. + </para> + + <para> + Also, for B-tree indexes, a freshly-constructed index is slightly faster to + access than one that has been updated many times because logically + adjacent pages are usually also physically adjacent in a newly built index. + (This consideration does not apply to non-B-tree indexes.) It + might be worthwhile to reindex periodically just to improve access speed. + </para> + + <para> + <xref linkend="sql-reindex"/> can be used safely and easily in all cases. + This command requires an <literal>ACCESS EXCLUSIVE</literal> lock by + default, hence it is often preferable to execute it with its + <literal>CONCURRENTLY</literal> option, which requires only a + <literal>SHARE UPDATE EXCLUSIVE</literal> lock. + </para> + </sect1> + + + <sect1 id="logfile-maintenance"> + <title>Log File Maintenance</title> + + <indexterm zone="logfile-maintenance"> + <primary>server log</primary> + <secondary>log file maintenance</secondary> + </indexterm> + + <para> + It is a good idea to save the database server's log output + somewhere, rather than just discarding it via <filename>/dev/null</filename>. + The log output is invaluable when diagnosing + problems. + </para> + + <note> + <para> + The server log can contain sensitive information and needs to be protected, + no matter how or where it is stored, or the destination to which it is routed. + For example, some DDL statements might contain plaintext passwords or other + authentication details. Logged statements at the <literal>ERROR</literal> + level might show the SQL source code for applications + and might also contain some parts of data rows. Recording data, events and + related information is the intended function of this facility, so this is + not a leakage or a bug. Please ensure the server logs are visible only to + appropriately authorized people. + </para> + </note> + + <para> + Log output tends to be voluminous + (especially at higher debug levels) so you won't want to save it + indefinitely. You need to <emphasis>rotate</emphasis> the log files so that + new log files are started and old ones removed after a reasonable + period of time. + </para> + + <para> + If you simply direct the <systemitem>stderr</systemitem> of + <command>postgres</command> into a + file, you will have log output, but + the only way to truncate the log file is to stop and restart + the server. This might be acceptable if you are using + <productname>PostgreSQL</productname> in a development environment, + but few production servers would find this behavior acceptable. + </para> + + <para> + A better approach is to send the server's + <systemitem>stderr</systemitem> output to some type of log rotation program. + There is a built-in log rotation facility, which you can use by + setting the configuration parameter <varname>logging_collector</varname> to + <literal>true</literal> in <filename>postgresql.conf</filename>. The control + parameters for this program are described in <xref + linkend="runtime-config-logging-where"/>. You can also use this approach + to capture the log data in machine readable <acronym>CSV</acronym> + (comma-separated values) format. + </para> + + <para> + Alternatively, you might prefer to use an external log rotation + program if you have one that you are already using with other + server software. For example, the <application>rotatelogs</application> + tool included in the <productname>Apache</productname> distribution + can be used with <productname>PostgreSQL</productname>. One way to + do this is to pipe the server's + <systemitem>stderr</systemitem> output to the desired program. + If you start the server with + <command>pg_ctl</command>, then <systemitem>stderr</systemitem> + is already redirected to <systemitem>stdout</systemitem>, so you just need a + pipe command, for example: + +<programlisting> +pg_ctl start | rotatelogs /var/log/pgsql_log 86400 +</programlisting> + </para> + + <para> + You can combine these approaches by setting up <application>logrotate</application> + to collect log files produced by <productname>PostgreSQL</productname> built-in + logging collector. In this case, the logging collector defines the names and + location of the log files, while <application>logrotate</application> + periodically archives these files. When initiating log rotation, + <application>logrotate</application> must ensure that the application + sends further output to the new file. This is commonly done with a + <literal>postrotate</literal> script that sends a <literal>SIGHUP</literal> + signal to the application, which then reopens the log file. + In <productname>PostgreSQL</productname>, you can run <command>pg_ctl</command> + with the <literal>logrotate</literal> option instead. When the server receives + this command, the server either switches to a new log file or reopens the + existing file, depending on the logging configuration + (see <xref linkend="runtime-config-logging-where"/>). + </para> + + <note> + <para> + When using static log file names, the server might fail to reopen the log + file if the max open file limit is reached or a file table overflow occurs. + In this case, log messages are sent to the old log file until a + successful log rotation. If <application>logrotate</application> is + configured to compress the log file and delete it, the server may lose + the messages logged in this time frame. To avoid this issue, you can + configure the logging collector to dynamically assign log file names + and use a <literal>prerotate</literal> script to ignore open log files. + </para> + </note> + + <para> + Another production-grade approach to managing log output is to + send it to <application>syslog</application> and let + <application>syslog</application> deal with file rotation. To do this, set the + configuration parameter <varname>log_destination</varname> to <literal>syslog</literal> + (to log to <application>syslog</application> only) in + <filename>postgresql.conf</filename>. Then you can send a <literal>SIGHUP</literal> + signal to the <application>syslog</application> daemon whenever you want to force it + to start writing a new log file. If you want to automate log + rotation, the <application>logrotate</application> program can be + configured to work with log files from + <application>syslog</application>. + </para> + + <para> + On many systems, however, <application>syslog</application> is not very reliable, + particularly with large log messages; it might truncate or drop messages + just when you need them the most. Also, on <productname>Linux</productname>, + <application>syslog</application> will flush each message to disk, yielding poor + performance. (You can use a <quote><literal>-</literal></quote> at the start of the file name + in the <application>syslog</application> configuration file to disable syncing.) + </para> + + <para> + Note that all the solutions described above take care of starting new + log files at configurable intervals, but they do not handle deletion + of old, no-longer-useful log files. You will probably want to set + up a batch job to periodically delete old log files. Another possibility + is to configure the rotation program so that old log files are overwritten + cyclically. + </para> + + <para> + <ulink url="https://pgbadger.darold.net/"><productname>pgBadger</productname></ulink> + is an external project that does sophisticated log file analysis. + <ulink + url="https://bucardo.org/check_postgres/"><productname>check_postgres</productname></ulink> + provides Nagios alerts when important messages appear in the log + files, as well as detection of many other extraordinary conditions. + </para> + </sect1> +</chapter> |