Diffstat
-rw-r--r-- doc/src/sgml/amcheck.sgml | 558
1 files changed, 558 insertions, 0 deletions
diff --git a/doc/src/sgml/amcheck.sgml b/doc/src/sgml/amcheck.sgml
new file mode 100644
index 0000000..c570690
--- /dev/null
+++ b/doc/src/sgml/amcheck.sgml
@@ -0,0 +1,558 @@
+<!-- doc/src/sgml/amcheck.sgml -->
+
+<sect1 id="amcheck" xreflabel="amcheck">
+ <title>amcheck</title>
+
+ <indexterm zone="amcheck">
+ <primary>amcheck</primary>
+ </indexterm>
+
+ <para>
+ The <filename>amcheck</filename> module provides functions that allow you to
+ verify the logical consistency of the structure of relations.
+ </para>
+
+ <para>
+ The B-Tree checking functions verify various <emphasis>invariants</emphasis> in the
+ structure of the representation of particular relations. The
+ correctness of the access method functions behind index scans and
+ other important operations relies on these invariants always
+ holding. For example, certain functions verify, among other things,
+ that all B-Tree pages have items in <quote>logical</quote> order (e.g.,
+ for B-Tree indexes on <type>text</type>, index tuples should be in
+ collated lexical order). If that particular invariant somehow fails
+ to hold, we can expect binary searches on the affected page to
+ incorrectly guide index scans, resulting in wrong answers to SQL
+ queries. If the structure appears to be valid, no error is raised.
+ </para>
+ <para>
+ Verification is performed using the same procedures as those used by
+ index scans themselves, which may be user-defined operator class
+ code. For example, B-Tree index verification relies on comparisons
+ made with one or more B-Tree support function 1 routines. See <xref
+ linkend="xindex-support"/> for details of operator class support
+ functions.
+ </para>
+ <para>
+ Unlike the B-Tree checking functions, which report corruption by raising
+ errors, the heap checking function <function>verify_heapam</function> checks
+ a table and attempts to return a set of rows, one row per corruption
+ detected. Despite this, if facilities that
+ <function>verify_heapam</function> relies upon are themselves corrupted, the
+ function may be unable to continue and may instead raise an error.
+ </para>
+ <para>
+ Permission to execute <filename>amcheck</filename> functions may be granted
+ to non-superusers, but before granting such permissions careful consideration
+ should be given to data security and privacy concerns. Although the
+ corruption reports generated by these functions do not focus on the contents
+ of the corrupted data so much as on the structure of that data and the nature
+ of the corruptions found, an attacker who gains permission to execute these
+ functions, particularly if the attacker can also induce corruption, might be
+ able to infer something of the data itself from such messages.
+ </para>
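+ <para>
+ As a minimal sketch of such a grant, assuming a hypothetical role named
+ <literal>monitoring_role</literal> (not something this module provides),
+ execute permission can be limited to a single function:
+<programlisting>
+-- monitoring_role is a hypothetical role name used only for illustration
+GRANT EXECUTE ON FUNCTION bt_index_check(regclass, boolean) TO monitoring_role;
+</programlisting>
+ </para>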
+
+ <sect2>
+ <title>Functions</title>
+
+ <variablelist>
+ <varlistentry>
+ <term>
+ <function>bt_index_check(index regclass, heapallindexed boolean) returns void</function>
+ <indexterm>
+ <primary>bt_index_check</primary>
+ </indexterm>
+ </term>
+
+ <listitem>
+ <para>
+ <function>bt_index_check</function> tests that its target, a
+ B-Tree index, respects a variety of invariants. Example usage:
+<screen>
+test=# SELECT bt_index_check(index =&gt; c.oid, heapallindexed =&gt; i.indisunique),
+ c.relname,
+ c.relpages
+FROM pg_index i
+JOIN pg_opclass op ON i.indclass[0] = op.oid
+JOIN pg_am am ON op.opcmethod = am.oid
+JOIN pg_class c ON i.indexrelid = c.oid
+JOIN pg_namespace n ON c.relnamespace = n.oid
+WHERE am.amname = 'btree' AND n.nspname = 'pg_catalog'
+-- Don't check temp tables, which may be from another session:
+AND c.relpersistence != 't'
+-- Function may throw an error when this is omitted:
+AND c.relkind = 'i' AND i.indisready AND i.indisvalid
+ORDER BY c.relpages DESC LIMIT 10;
+ bt_index_check | relname | relpages
+----------------+---------------------------------+----------
+ | pg_depend_reference_index | 43
+ | pg_depend_depender_index | 40
+ | pg_proc_proname_args_nsp_index | 31
+ | pg_description_o_c_o_index | 21
+ | pg_attribute_relid_attnam_index | 14
+ | pg_proc_oid_index | 10
+ | pg_attribute_relid_attnum_index | 9
+ | pg_amproc_fam_proc_index | 5
+ | pg_amop_opr_fam_index | 5
+ | pg_amop_fam_strat_index | 5
+(10 rows)
+</screen>
+ This example shows a session that performs verification of the
+ 10 largest catalog indexes in the database <quote>test</quote>.
+ Verification of the presence of heap tuples as index tuples is
+ requested for the subset that are unique indexes. Since no
+ error is raised, all indexes tested appear to be logically
+ consistent. Naturally, this query could easily be changed to
+ call <function>bt_index_check</function> for every index in the
+ database where verification is supported.
+ </para>
+ <para>
+ <function>bt_index_check</function> acquires an <literal>AccessShareLock</literal>
+ on the target index and the heap relation it belongs to. This lock mode
+ is the same lock mode acquired on relations by simple
+ <literal>SELECT</literal> statements.
+ <function>bt_index_check</function> does not verify invariants
+ that span child/parent relationships, but will verify the
+ presence of all heap tuples as index tuples within the index
+ when <parameter>heapallindexed</parameter> is
+ <literal>true</literal>. When a routine, lightweight test for
+ corruption is required in a live production environment, using
+ <function>bt_index_check</function> often provides the best
+ trade-off between thoroughness of verification and limiting the
+ impact on application performance and availability.
+ </para>
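+ <para>
+ The database-wide variation mentioned above might look like the
+ following sketch. It is the example query with the
+ <literal>pg_catalog</literal> filter and the row limit removed, and it
+ assumes the session is allowed to execute
+ <function>bt_index_check</function>:
+<programlisting>
+SELECT bt_index_check(index =&gt; c.oid, heapallindexed =&gt; i.indisunique),
+ c.relname,
+ c.relpages
+FROM pg_index i
+JOIN pg_opclass op ON i.indclass[0] = op.oid
+JOIN pg_am am ON op.opcmethod = am.oid
+JOIN pg_class c ON i.indexrelid = c.oid
+WHERE am.amname = 'btree'
+-- Don't check temp tables, which may be from another session:
+AND c.relpersistence != 't'
+-- Function may throw an error when this is omitted:
+AND c.relkind = 'i' AND i.indisready AND i.indisvalid
+ORDER BY c.relpages DESC;
+</programlisting>
+ </para>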
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term>
+ <function>bt_index_parent_check(index regclass, heapallindexed boolean, rootdescend boolean) returns void</function>
+ <indexterm>
+ <primary>bt_index_parent_check</primary>
+ </indexterm>
+ </term>
+
+ <listitem>
+ <para>
+ <function>bt_index_parent_check</function> tests that its
+ target, a B-Tree index, respects a variety of invariants.
+ Optionally, when the <parameter>heapallindexed</parameter>
+ argument is <literal>true</literal>, the function verifies the
+ presence of all heap tuples that should be found within the
+ index. When the optional <parameter>rootdescend</parameter>
+ argument is <literal>true</literal>, verification re-finds
+ tuples on the leaf level by performing a new search from the
+ root page for each tuple. The checks that can be performed by
+ <function>bt_index_parent_check</function> are a superset of the
+ checks that can be performed by <function>bt_index_check</function>.
+ <function>bt_index_parent_check</function> can be thought of as
+ a more thorough variant of <function>bt_index_check</function>:
+ unlike <function>bt_index_check</function>,
+ <function>bt_index_parent_check</function> also checks
+ invariants that span parent/child relationships, including checking
+ that there are no missing downlinks in the index structure.
+ <function>bt_index_parent_check</function> follows the general
+ convention of raising an error if it finds a logical
+ inconsistency or other problem.
+ </para>
+ <para>
+ A <literal>ShareLock</literal> is required on the target index by
+ <function>bt_index_parent_check</function> (a
+ <literal>ShareLock</literal> is also acquired on the heap relation).
+ These locks prevent concurrent data modification from
+ <command>INSERT</command>, <command>UPDATE</command>, and <command>DELETE</command>
+ commands. The locks also prevent the underlying relation from
+ being concurrently processed by <command>VACUUM</command>, as well as
+ all other utility commands. Note that the function holds locks
+ only while running, not for the entire transaction.
+ </para>
+ <para>
+ <function>bt_index_parent_check</function>'s additional
+ verification is more likely to detect various pathological
+ cases. These cases may involve an incorrectly implemented
+ B-Tree operator class used by the index that is checked, or,
+ hypothetically, undiscovered bugs in the underlying B-Tree index
+ access method code. Note that
+ <function>bt_index_parent_check</function> cannot be used when
+ Hot Standby mode is enabled (i.e., on read-only physical
+ replicas), unlike <function>bt_index_check</function>.
+ </para>
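+ <para>
+ A minimal sketch of the most thorough form of the check, assuming a
+ hypothetical index named <literal>my_index</literal>. Because of the
+ locks described above, this may be better suited to a maintenance
+ window than to routine use on a busy primary:
+<programlisting>
+-- my_index is a hypothetical index name used only for illustration
+SELECT bt_index_parent_check(index =&gt; 'my_index'::regclass,
+ heapallindexed =&gt; true,
+ rootdescend =&gt; true);
+</programlisting>
+ </para>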
+ </listitem>
+ </varlistentry>
+ </variablelist>
+ <tip>
+ <para>
+ <function>bt_index_check</function> and
+ <function>bt_index_parent_check</function> both output log
+ messages about the verification process at
+ <literal>DEBUG1</literal> and <literal>DEBUG2</literal> severity
+ levels. These messages provide detailed information about the
+ verification process that may be of interest to
+ <productname>PostgreSQL</productname> developers. Advanced users
+ may also find this information helpful, since it provides
+ additional context should verification actually detect an
+ inconsistency. Running:
+<programlisting>
+SET client_min_messages = DEBUG1;
+</programlisting>
+ in an interactive <application>psql</application> session before
+ running a verification query will display messages about the
+ progress of verification with a manageable level of detail.
+ </para>
+ </tip>
+
+ <variablelist>
+ <varlistentry>
+ <term>
+ <function>
+ verify_heapam(relation regclass,
+ on_error_stop boolean,
+ check_toast boolean,
+ skip text,
+ startblock bigint,
+ endblock bigint,
+ blkno OUT bigint,
+ offnum OUT integer,
+ attnum OUT integer,
+ msg OUT text)
+ returns setof record
+ </function>
+ </term>
+ <listitem>
+ <para>
+ Checks a table for structural corruption, where pages in the relation
+ contain data that is invalidly formatted, and for logical corruption,
+ where pages are structurally valid but inconsistent with the rest of the
+ database cluster.
+ </para>
+ <para>
+ The following optional arguments are recognized:
+ </para>
+ <variablelist>
+ <varlistentry>
+ <term><literal>on_error_stop</literal></term>
+ <listitem>
+ <para>
+ If true, corruption checking stops at the end of the first block in
+ which any corruptions are found.
+ </para>
+ <para>
+ Defaults to false.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term><literal>check_toast</literal></term>
+ <listitem>
+ <para>
+ If true, toasted values are checked against the target relation's
+ TOAST table.
+ </para>
+ <para>
+ This option is known to be slow. Also, if the TOAST table or its
+ index is corrupt, checking it against toast values could conceivably
+ crash the server, although in many cases this would just produce an
+ error.
+ </para>
+ <para>
+ Defaults to false.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term><literal>skip</literal></term>
+ <listitem>
+ <para>
+ If not <literal>none</literal>, corruption checking skips blocks that
+ are marked as all-visible or all-frozen, as specified.
+ Valid options are <literal>all-visible</literal>,
+ <literal>all-frozen</literal> and <literal>none</literal>.
+ </para>
+ <para>
+ Defaults to <literal>none</literal>.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term><literal>startblock</literal></term>
+ <listitem>
+ <para>
+ If specified, corruption checking begins at the specified block,
+ skipping all previous blocks. It is an error to specify a
+ <parameter>startblock</parameter> outside the range of blocks in the
+ target table.
+ </para>
+ <para>
+ By default, checking begins at the first block.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term><literal>endblock</literal></term>
+ <listitem>
+ <para>
+ If specified, corruption checking ends at the specified block,
+ skipping all remaining blocks. It is an error to specify an
+ <parameter>endblock</parameter> outside the range of blocks in the target
+ table.
+ </para>
+ <para>
+ By default, all blocks are checked.
+ </para>
+ </listitem>
+ </varlistentry>
+ </variablelist>
+ <para>
+ For each corruption detected, <function>verify_heapam</function> returns
+ a row with the following columns:
+ </para>
+ <variablelist>
+ <varlistentry>
+ <term><literal>blkno</literal></term>
+ <listitem>
+ <para>
+ The number of the block containing the corrupt page.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term><literal>offnum</literal></term>
+ <listitem>
+ <para>
+ The OffsetNumber of the corrupt tuple.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term><literal>attnum</literal></term>
+ <listitem>
+ <para>
+ The attribute number of the corrupt column in the tuple, if the
+ corruption is specific to a column and not the tuple as a whole.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term><literal>msg</literal></term>
+ <listitem>
+ <para>
+ A message describing the problem detected.
+ </para>
+ </listitem>
+ </varlistentry>
+ </variablelist>
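+ <para>
+ As an illustrative sketch, assuming a hypothetical table named
+ <literal>my_table</literal>, the optional arguments can be passed by
+ name and the result queried like any other set of rows. An empty result
+ means that no corruption was detected:
+<programlisting>
+-- my_table is a hypothetical table name used only for illustration
+SELECT blkno, offnum, attnum, msg
+FROM verify_heapam('my_table'::regclass,
+ check_toast =&gt; true,
+ skip =&gt; 'all-frozen');
+
+-- Restrict checking to blocks 0 through 99; this raises an error
+-- if the table has fewer than 100 blocks
+SELECT blkno, offnum, attnum, msg
+FROM verify_heapam('my_table'::regclass,
+ startblock =&gt; 0,
+ endblock =&gt; 99);
+</programlisting>
+ </para>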
+ </listitem>
+ </varlistentry>
+ </variablelist>
+ </sect2>
+
+ <sect2>
+ <title>Optional <parameter>heapallindexed</parameter> Verification</title>
+ <para>
+ When the <parameter>heapallindexed</parameter> argument to B-Tree
+ verification functions is <literal>true</literal>, an additional
+ phase of verification is performed against the table associated with
+ the target index relation. This consists of a <quote>dummy</quote>
+ <command>CREATE INDEX</command> operation, which checks for the
+ presence of all hypothetical new index tuples against a temporary,
+ in-memory summarizing structure (this is built when needed during
+ the basic first phase of verification). The summarizing structure
+ <quote>fingerprints</quote> every tuple found within the target
+ index. The high-level principle behind
+ <parameter>heapallindexed</parameter> verification is that a new
+ index that is equivalent to the existing, target index must only
+ have entries that can be found in the existing structure.
+ </para>
+ <para>
+ The additional <parameter>heapallindexed</parameter> phase adds
+ significant overhead: verification will typically take several times
+ longer. However, there is no change to the relation-level locks
+ acquired when <parameter>heapallindexed</parameter> verification is
+ performed.
+ </para>
+ <para>
+ The summarizing structure is bounded in size by
+ <varname>maintenance_work_mem</varname>. In order to ensure that
+ there is no more than a 2% probability of failure to detect an
+ inconsistency for each heap tuple that should be represented in the
+ index, approximately 2 bytes of memory are needed per tuple. As
+ less memory is made available per tuple, the probability of missing
+ an inconsistency slowly increases. This approach limits the
+ overhead of verification significantly, while only slightly reducing
+ the probability of detecting a problem, especially for installations
+ where verification is treated as a routine maintenance task. Any
+ single absent or malformed tuple has a new opportunity to be
+ detected with each new verification attempt.
+ </para>
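+ <para>
+ The sizing rule above can be turned into a rough estimate, as in the
+ following sketch. The names <literal>my_table</literal> and
+ <literal>my_index</literal> are hypothetical, and the
+ <literal>256MB</literal> setting is an arbitrary illustrative value,
+ not a recommendation:
+<programlisting>
+-- my_table and my_index are hypothetical names used only for illustration.
+-- Rough memory estimate at about 2 bytes per heap tuple; reltuples is
+-- itself only the planner's estimate of the row count.
+SELECT pg_size_pretty((greatest(reltuples, 0) * 2)::bigint) AS approx_memory
+FROM pg_class
+WHERE oid = 'my_table'::regclass;
+
+-- Raise the limit for this session only, then verify
+SET maintenance_work_mem = '256MB';
+SELECT bt_index_check(index =&gt; 'my_index'::regclass, heapallindexed =&gt; true);
+</programlisting>
+ </para>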
+
+ </sect2>
+
+ <sect2>
+ <title>Using <filename>amcheck</filename> Effectively</title>
+
+ <para>
+ <filename>amcheck</filename> can be effective at detecting various types of
+ failure modes that <link
+ linkend="app-initdb-data-checksums"><application>data
+ checksums</application></link> will fail to catch. These include:
+
+ <itemizedlist>
+ <listitem>
+ <para>
+ Structural inconsistencies caused by incorrect operator class
+ implementations.
+ </para>
+ <para>
+ This includes issues caused by the comparison rules of operating
+ system collations changing. Comparisons of datums of a collatable
+ type like <type>text</type> must be immutable (just as all
+ comparisons used for B-Tree index scans must be immutable), which
+ implies that operating system collation rules must never change.
+ Though rare, updates to operating system collation rules can
+ cause these issues. More commonly, an inconsistency in the
+ collation order between a primary server and a standby server is
+ implicated, possibly because the <emphasis>major</emphasis> operating
+ system version in use is inconsistent. Such inconsistencies will
+ generally only arise on standby servers, and so can generally
+ only be detected on standby servers.
+ </para>
+ <para>
+ If a problem like this arises, it may not affect each individual
+ index that is ordered using an affected collation, simply because
+ <emphasis>indexed</emphasis> values might happen to have the same
+ absolute ordering regardless of the behavioral inconsistency. See
+ <xref linkend="locale"/> and <xref linkend="collation"/> for
+ further details about how <productname>PostgreSQL</productname> uses
+ operating system locales and collations.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ Structural inconsistencies between indexes and the heap relations
+ that are indexed (when <parameter>heapallindexed</parameter>
+ verification is performed).
+ </para>
+ <para>
+ There is no cross-checking of indexes against their heap relation
+ during normal operation. Symptoms of heap corruption can be subtle.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ Corruption caused by hypothetical undiscovered bugs in the
+ underlying <productname>PostgreSQL</productname> access method
+ code, sort code, or transaction management code.
+ </para>
+ <para>
+ Automatic verification of the structural integrity of indexes
+ plays a role in the general testing of new or proposed
+ <productname>PostgreSQL</productname> features that could plausibly allow a
+ logical inconsistency to be introduced. Verification of table
+ structure and associated visibility and transaction status
+ information plays a similar role. One obvious testing strategy
+ is to call <filename>amcheck</filename> functions continuously
+ when running the standard regression tests. See <xref
+ linkend="regress-run"/> for details on running the tests.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ File system or storage subsystem faults where checksums are simply
+ not enabled.
+ </para>
+ <para>
+ Note that when a block is already present in shared buffers,
+ <filename>amcheck</filename> examines the page as represented in that
+ shared memory buffer at the time of verification. Consequently,
+ <filename>amcheck</filename> does not necessarily examine data read from the
+ file system at the time of verification. Note that when checksums are
+ enabled, <filename>amcheck</filename> may raise an error due to a checksum
+ failure when a corrupt block is read into a buffer.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ Corruption caused by faulty RAM, or the broader memory subsystem.
+ </para>
+ <para>
+ <productname>PostgreSQL</productname> does not protect against correctable
+ memory errors and it is assumed you will operate using RAM that
+ uses industry standard Error Correcting Codes (ECC) or better
+ protection. However, ECC memory is typically only immune to
+ single-bit errors, and should not be assumed to provide
+ <emphasis>absolute</emphasis> protection against failures that
+ result in memory corruption.
+ </para>
+ <para>
+ When <parameter>heapallindexed</parameter> verification is
+ performed, there is generally a greatly increased chance of
+ detecting single-bit errors, since strict binary equality is
+ tested, and the indexed attributes within the heap are tested.
+ </para>
+ </listitem>
+ </itemizedlist>
+ </para>
+
+ <para>
+ Structural corruption can happen due to faulty storage hardware, or
+ relation files being overwritten or modified by unrelated software.
+ This kind of corruption can also be detected with
+ <link linkend="checksums"><application>data page
+ checksums</application></link>.
+ </para>
+
+ <para>
+ Relation pages that are correctly formatted, internally consistent, and
+ correct relative to their own internal checksums may still contain
+ logical corruption. As such, this kind of corruption cannot be detected
+ with <application>checksums</application>. Examples include toasted
+ values in the main table that lack a corresponding entry in the TOAST
+ table, and tuples in the main table with a Transaction ID that is older
+ than the oldest valid Transaction ID in the database or cluster.
+ </para>
+
+ <para>
+ Multiple causes of logical corruption have been observed in production
+ systems, including bugs in the <productname>PostgreSQL</productname>
+ server software, faulty and ill-conceived backup and restore tools, and
+ user error.
+ </para>
+
+ <para>
+ Corrupt relations are most concerning in live production environments,
+ precisely the same environments where high risk activities are least
+ welcome. For this reason, <function>verify_heapam</function> has been
+ designed to diagnose corruption without undue risk. It cannot guard
+ against all causes of backend crashes, as even executing the calling
+ query could be unsafe on a badly corrupted system. Access to <link
+ linkend="catalogs-overview">catalog tables</link> is performed and could
+ be problematic if the catalogs themselves are corrupted.
+ </para>
+
+ <para>
+ In general, <filename>amcheck</filename> can only prove the presence of
+ corruption; it cannot prove its absence.
+ </para>
+
+ </sect2>
+ <sect2>
+ <title>Repairing Corruption</title>
+ <para>
+ No error concerning corruption raised by <filename>amcheck</filename> should
+ ever be a false positive. <filename>amcheck</filename> raises
+ errors in the event of conditions that, by definition, should never
+ happen, and so careful analysis of <filename>amcheck</filename>
+ errors is often required.
+ </para>
+ <para>
+ There is no general method of repairing problems that
+ <filename>amcheck</filename> detects. An explanation for the root cause of
+ an invariant violation should be sought. <xref
+ linkend="pageinspect"/> may play a useful role in diagnosing
+ corruption that <filename>amcheck</filename> detects. A <command>REINDEX</command>
+ may not be effective in repairing corruption.
+ </para>
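+ <para>
+ As a hedged starting point for such a diagnosis, the following sketch
+ uses functions from the separate <filename>pageinspect</filename>
+ module. The index name <literal>my_index</literal> is hypothetical, and
+ block <literal>1</literal> stands in for whatever block number an
+ <filename>amcheck</filename> error message reports:
+<programlisting>
+-- my_index is a hypothetical index name, and block 1 stands in for a
+-- block number taken from an amcheck error message
+CREATE EXTENSION IF NOT EXISTS pageinspect;
+SELECT * FROM bt_metap('my_index');
+SELECT * FROM bt_page_stats('my_index', 1);
+SELECT * FROM bt_page_items('my_index', 1);
+</programlisting>
+ </para>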
+
+ </sect2>
+
+</sect1>