Diffstat (limited to 'doc/src/sgml/amcheck.sgml')
-rw-r--r-- | doc/src/sgml/amcheck.sgml | 558 |
1 files changed, 558 insertions, 0 deletions
diff --git a/doc/src/sgml/amcheck.sgml b/doc/src/sgml/amcheck.sgml new file mode 100644 index 0000000..5d61a33 --- /dev/null +++ b/doc/src/sgml/amcheck.sgml @@ -0,0 +1,558 @@ +<!-- doc/src/sgml/amcheck.sgml --> + +<sect1 id="amcheck" xreflabel="amcheck"> + <title>amcheck</title> + + <indexterm zone="amcheck"> + <primary>amcheck</primary> + </indexterm> + + <para> + The <filename>amcheck</filename> module provides functions that allow you to + verify the logical consistency of the structure of relations. + </para> + + <para> + The B-Tree checking functions verify various <emphasis>invariants</emphasis> in the + structure of the representation of particular relations. The + correctness of the access method functions behind index scans and + other important operations relies on these invariants always + holding. For example, certain functions verify, among other things, + that all B-Tree pages have items in <quote>logical</quote> order (e.g., + for B-Tree indexes on <type>text</type>, index tuples should be in + collated lexical order). If that particular invariant somehow fails + to hold, we can expect binary searches on the affected page to + incorrectly guide index scans, resulting in wrong answers to SQL + queries. If the structure appears to be valid, no error is raised. + </para> + <para> + Verification is performed using the same procedures as those used by + index scans themselves, which may be user-defined operator class + code. For example, B-Tree index verification relies on comparisons + made with one or more B-Tree support function 1 routines. See <xref + linkend="xindex-support"/> for details of operator class support + functions. + </para> + <para> + Unlike the B-Tree checking functions which report corruption by raising + errors, the heap checking function <function>verify_heapam</function> checks + a table and attempts to return a set of rows, one row per corruption + detected. 
Despite this, if facilities that + <function>verify_heapam</function> relies upon are themselves corrupted, the + function may be unable to continue and may instead raise an error. + </para> + <para> + Permission to execute <filename>amcheck</filename> functions may be granted + to non-superusers, but before granting such permissions careful consideration + should be given to data security and privacy concerns. Although the + corruption reports generated by these functions do not focus on the contents + of the corrupted data so much as on the structure of that data and the nature + of the corruptions found, an attacker who gains permission to execute these + functions, particularly if the attacker can also induce corruption, might be + able to infer something of the data itself from such messages. + </para> + + <sect2> + <title>Functions</title> + + <variablelist> + <varlistentry> + <term> + <function>bt_index_check(index regclass, heapallindexed boolean) returns void</function> + <indexterm> + <primary>bt_index_check</primary> + </indexterm> + </term> + + <listitem> + <para> + <function>bt_index_check</function> tests that its target, a + B-Tree index, respects a variety of invariants. 
Example usage:
+<screen>
+test=# SELECT bt_index_check(index => c.oid, heapallindexed => i.indisunique),
+              c.relname,
+              c.relpages
+FROM pg_index i
+JOIN pg_opclass op ON i.indclass[0] = op.oid
+JOIN pg_am am ON op.opcmethod = am.oid
+JOIN pg_class c ON i.indexrelid = c.oid
+JOIN pg_namespace n ON c.relnamespace = n.oid
+WHERE am.amname = 'btree' AND n.nspname = 'pg_catalog'
+-- Don't check temp tables, which may be from another session:
+AND c.relpersistence != 't'
+-- Function may throw an error when this is omitted:
+AND c.relkind = 'i' AND i.indisready AND i.indisvalid
+ORDER BY c.relpages DESC LIMIT 10;
+ bt_index_check |             relname             | relpages
+----------------+---------------------------------+----------
+                | pg_depend_reference_index       |       43
+                | pg_depend_depender_index        |       40
+                | pg_proc_proname_args_nsp_index  |       31
+                | pg_description_o_c_o_index      |       21
+                | pg_attribute_relid_attnam_index |       14
+                | pg_proc_oid_index               |       10
+                | pg_attribute_relid_attnum_index |        9
+                | pg_amproc_fam_proc_index        |        5
+                | pg_amop_opr_fam_index           |        5
+                | pg_amop_fam_strat_index         |        5
+(10 rows)
+</screen>
+ This example shows a session that performs verification of the + 10 largest catalog indexes in the database <quote>test</quote>. + Verification of the presence of heap tuples as index tuples is + requested for the subset that are unique indexes. Since no + error is raised, all indexes tested appear to be logically + consistent. Naturally, this query could easily be changed to + call <function>bt_index_check</function> for every index in the + database where verification is supported. + </para> + <para> + <function>bt_index_check</function> acquires an <literal>AccessShareLock</literal> + on the target index and the heap relation it belongs to. This lock mode + is the same lock mode acquired on relations by simple + <literal>SELECT</literal> statements. 
+ <function>bt_index_check</function> does not verify invariants + that span child/parent relationships, but will verify the + presence of all heap tuples as index tuples within the index + when <parameter>heapallindexed</parameter> is + <literal>true</literal>. When a routine, lightweight test for + corruption is required in a live production environment, using + <function>bt_index_check</function> often provides the best + trade-off between thoroughness of verification and limiting the + impact on application performance and availability. + </para> + </listitem> + </varlistentry> + + <varlistentry> + <term> + <function>bt_index_parent_check(index regclass, heapallindexed boolean, rootdescend boolean) returns void</function> + <indexterm> + <primary>bt_index_parent_check</primary> + </indexterm> + </term> + + <listitem> + <para> + <function>bt_index_parent_check</function> tests that its + target, a B-Tree index, respects a variety of invariants. + Optionally, when the <parameter>heapallindexed</parameter> + argument is <literal>true</literal>, the function verifies the + presence of all heap tuples that should be found within the + index. When the optional <parameter>rootdescend</parameter> + argument is <literal>true</literal>, verification re-finds + tuples on the leaf level by performing a new search from the + root page for each tuple. The checks that can be performed by + <function>bt_index_parent_check</function> are a superset of the + checks that can be performed by <function>bt_index_check</function>. + <function>bt_index_parent_check</function> can be thought of as + a more thorough variant of <function>bt_index_check</function>: + unlike <function>bt_index_check</function>, + <function>bt_index_parent_check</function> also checks + invariants that span parent/child relationships, including checking + that there are no missing downlinks in the index structure. 
+ <function>bt_index_parent_check</function> follows the general + convention of raising an error if it finds a logical + inconsistency or other problem. + </para> + <para> + A <literal>ShareLock</literal> is required on the target index by + <function>bt_index_parent_check</function> (a + <literal>ShareLock</literal> is also acquired on the heap relation). + These locks prevent concurrent data modification from + <command>INSERT</command>, <command>UPDATE</command>, and <command>DELETE</command> + commands. The locks also prevent the underlying relation from + being concurrently processed by <command>VACUUM</command>, as well as + all other utility commands. Note that the function holds locks + only while running, not for the entire transaction. + </para> + <para> + <function>bt_index_parent_check</function>'s additional + verification is more likely to detect various pathological + cases. These cases may involve an incorrectly implemented + B-Tree operator class used by the index that is checked, or, + hypothetically, undiscovered bugs in the underlying B-Tree index + access method code. Note that + <function>bt_index_parent_check</function> cannot be used when + hot standby mode is enabled (i.e., on read-only physical + replicas), unlike <function>bt_index_check</function>. + </para> + </listitem> + </varlistentry> + </variablelist> + <tip> + <para> + <function>bt_index_check</function> and + <function>bt_index_parent_check</function> both output log + messages about the verification process at + <literal>DEBUG1</literal> and <literal>DEBUG2</literal> severity + levels. These messages provide detailed information about the + verification process that may be of interest to + <productname>PostgreSQL</productname> developers. Advanced users + may also find this information helpful, since it provides + additional context should verification actually detect an + inconsistency. 
Running: +<programlisting> +SET client_min_messages = DEBUG1; +</programlisting> + in an interactive <application>psql</application> session before + running a verification query will display messages about the + progress of verification with a manageable level of detail. + </para> + </tip> + + <variablelist> + <varlistentry> + <term> + <function> + verify_heapam(relation regclass, + on_error_stop boolean, + check_toast boolean, + skip text, + startblock bigint, + endblock bigint, + blkno OUT bigint, + offnum OUT integer, + attnum OUT integer, + msg OUT text) + returns setof record + </function> + </term> + <listitem> + <para> + Checks a table, sequence, or materialized view for structural corruption, + where pages in the relation contain data that is invalidly formatted, and + for logical corruption, where pages are structurally valid but + inconsistent with the rest of the database cluster. + </para> + <para> + The following optional arguments are recognized: + </para> + <variablelist> + <varlistentry> + <term><literal>on_error_stop</literal></term> + <listitem> + <para> + If true, corruption checking stops at the end of the first block in + which any corruptions are found. + </para> + <para> + Defaults to false. + </para> + </listitem> + </varlistentry> + <varlistentry> + <term><literal>check_toast</literal></term> + <listitem> + <para> + If true, toasted values are checked against the target relation's + TOAST table. + </para> + <para> + This option is known to be slow. Also, if the toast table or its + index is corrupt, checking it against toast values could conceivably + crash the server, although in many cases this would just produce an + error. + </para> + <para> + Defaults to false. + </para> + </listitem> + </varlistentry> + <varlistentry> + <term><literal>skip</literal></term> + <listitem> + <para> + If not <literal>none</literal>, corruption checking skips blocks that + are marked as all-visible or all-frozen, as specified. 
+ Valid options are <literal>all-visible</literal>, + <literal>all-frozen</literal> and <literal>none</literal>. + </para> + <para> + Defaults to <literal>none</literal>. + </para> + </listitem> + </varlistentry> + <varlistentry> + <term><literal>startblock</literal></term> + <listitem> + <para> + If specified, corruption checking begins at the specified block, + skipping all previous blocks. It is an error to specify a + <parameter>startblock</parameter> outside the range of blocks in the + target table. + </para> + <para> + By default, checking begins at the first block. + </para> + </listitem> + </varlistentry> + <varlistentry> + <term><literal>endblock</literal></term> + <listitem> + <para> + If specified, corruption checking ends at the specified block, + skipping all remaining blocks. It is an error to specify an + <parameter>endblock</parameter> outside the range of blocks in the target + table. + </para> + <para> + By default, all blocks are checked. + </para> + </listitem> + </varlistentry> + </variablelist> + <para> + For each corruption detected, <function>verify_heapam</function> returns + a row with the following columns: + </para> + <variablelist> + <varlistentry> + <term><literal>blkno</literal></term> + <listitem> + <para> + The number of the block containing the corrupt page. + </para> + </listitem> + </varlistentry> + <varlistentry> + <term><literal>offnum</literal></term> + <listitem> + <para> + The OffsetNumber of the corrupt tuple. + </para> + </listitem> + </varlistentry> + <varlistentry> + <term><literal>attnum</literal></term> + <listitem> + <para> + The attribute number of the corrupt column in the tuple, if the + corruption is specific to a column and not the tuple as a whole. + </para> + </listitem> + </varlistentry> + <varlistentry> + <term><literal>msg</literal></term> + <listitem> + <para> + A message describing the problem detected. 
+ </para> + </listitem> + </varlistentry> + </variablelist> + </listitem> + </varlistentry> + </variablelist> + </sect2> + + <sect2> + <title>Optional <parameter>heapallindexed</parameter> Verification</title> + <para> + When the <parameter>heapallindexed</parameter> argument to B-Tree + verification functions is <literal>true</literal>, an additional + phase of verification is performed against the table associated with + the target index relation. This consists of a <quote>dummy</quote> + <command>CREATE INDEX</command> operation, which checks for the + presence of all hypothetical new index tuples against a temporary, + in-memory summarizing structure (this is built when needed during + the basic first phase of verification). The summarizing structure + <quote>fingerprints</quote> every tuple found within the target + index. The high-level principle behind + <parameter>heapallindexed</parameter> verification is that a new + index that is equivalent to the existing, target index must only + have entries that can be found in the existing structure. + </para> + <para> + The additional <parameter>heapallindexed</parameter> phase adds + significant overhead: verification will typically take several times + longer. However, there is no change to the relation-level locks + acquired when <parameter>heapallindexed</parameter> verification is + performed. + </para> + <para> + The summarizing structure is bounded in size by + <varname>maintenance_work_mem</varname>. In order to ensure that + there is no more than a 2% probability of failure to detect an + inconsistency for each heap tuple that should be represented in the + index, approximately 2 bytes of memory are needed per tuple. As + less memory is made available per tuple, the probability of missing + an inconsistency slowly increases. 
This approach limits the + overhead of verification significantly, while only slightly reducing + the probability of detecting a problem, especially for installations + where verification is treated as a routine maintenance task. Any + single absent or malformed tuple has a new opportunity to be + detected with each new verification attempt. + </para> + + </sect2> + + <sect2> + <title>Using <filename>amcheck</filename> Effectively</title> + + <para> + <filename>amcheck</filename> can be effective at detecting various types of + failure modes that <link + linkend="app-initdb-data-checksums"><application>data + checksums</application></link> will fail to catch. These include: + + <itemizedlist> + <listitem> + <para> + Structural inconsistencies caused by incorrect operator class + implementations. + </para> + <para> + This includes issues caused by the comparison rules of operating + system collations changing. Comparisons of datums of a collatable + type like <type>text</type> must be immutable (just as all + comparisons used for B-Tree index scans must be immutable), which + implies that operating system collation rules must never change. + Though rare, updates to operating system collation rules can + cause these issues. More commonly, an inconsistency in the + collation order between a primary server and a standby server is + implicated, possibly because the <emphasis>major</emphasis> operating + system version in use is inconsistent. Such inconsistencies will + generally only arise on standby servers, and so can generally + only be detected on standby servers. + </para> + <para> + If a problem like this arises, it may not affect each individual + index that is ordered using an affected collation, simply because + <emphasis>indexed</emphasis> values might happen to have the same + absolute ordering regardless of the behavioral inconsistency. 
See + <xref linkend="locale"/> and <xref linkend="collation"/> for + further details about how <productname>PostgreSQL</productname> uses + operating system locales and collations. + </para> + </listitem> + <listitem> + <para> + Structural inconsistencies between indexes and the heap relations + that are indexed (when <parameter>heapallindexed</parameter> + verification is performed). + </para> + <para> + There is no cross-checking of indexes against their heap relation + during normal operation. Symptoms of heap corruption can be subtle. + </para> + </listitem> + <listitem> + <para> + Corruption caused by hypothetical undiscovered bugs in the + underlying <productname>PostgreSQL</productname> access method + code, sort code, or transaction management code. + </para> + <para> + Automatic verification of the structural integrity of indexes + plays a role in the general testing of new or proposed + <productname>PostgreSQL</productname> features that could plausibly allow a + logical inconsistency to be introduced. Verification of table + structure and associated visibility and transaction status + information plays a similar role. One obvious testing strategy + is to call <filename>amcheck</filename> functions continuously + when running the standard regression tests. See <xref + linkend="regress-run"/> for details on running the tests. + </para> + </listitem> + <listitem> + <para> + File system or storage subsystem faults where checksums happen to + simply not be enabled. + </para> + <para> + Note that <filename>amcheck</filename> examines a page as represented in some + shared memory buffer at the time of verification if there is only a + shared buffer hit when accessing the block. Consequently, + <filename>amcheck</filename> does not necessarily examine data read from the + file system at the time of verification. 
Note that when checksums are + enabled, <filename>amcheck</filename> may raise an error due to a checksum + failure when a corrupt block is read into a buffer. + </para> + </listitem> + <listitem> + <para> + Corruption caused by faulty RAM, or the broader memory subsystem. + </para> + <para> + <productname>PostgreSQL</productname> does not protect against correctable + memory errors and it is assumed you will operate using RAM that + uses industry standard Error Correcting Codes (ECC) or better + protection. However, ECC memory is typically only immune to + single-bit errors, and should not be assumed to provide + <emphasis>absolute</emphasis> protection against failures that + result in memory corruption. + </para> + <para> + When <parameter>heapallindexed</parameter> verification is + performed, there is generally a greatly increased chance of + detecting single-bit errors, since strict binary equality is + tested, and the indexed attributes within the heap are tested. + </para> + </listitem> + </itemizedlist> + </para> + + <para> + Structural corruption can happen due to faulty storage hardware, or + relation files being overwritten or modified by unrelated software. + This kind of corruption can also be detected with + <link linkend="checksums"><application>data page + checksums</application></link>. + </para> + + <para> + Relation pages which are correctly formatted, internally consistent, and + correct relative to their own internal checksums may still contain + logical corruption. As such, this kind of corruption cannot be detected + with <application>checksums</application>. Examples include toasted + values in the main table which lack a corresponding entry in the toast + table, and tuples in the main table with a Transaction ID that is older + than the oldest valid Transaction ID in the database or cluster. 
+ </para> + + <para> + Multiple causes of logical corruption have been observed in production + systems, including bugs in the <productname>PostgreSQL</productname> + server software, faulty and ill-conceived backup and restore tools, and + user error. + </para> + + <para> + Corrupt relations are most concerning in live production environments, + precisely the same environments where high-risk activities are least + welcome. For this reason, <function>verify_heapam</function> has been + designed to diagnose corruption without undue risk. It cannot guard + against all causes of backend crashes, as even executing the calling + query could be unsafe on a badly corrupted system. Access to <link + linkend="catalogs-overview">catalog tables</link> is performed and could + be problematic if the catalogs themselves are corrupted. + </para> + + <para> + In general, <filename>amcheck</filename> can only prove the presence of + corruption; it cannot prove its absence. + </para> + + </sect2> + <sect2> + <title>Repairing Corruption</title> + <para> + No error concerning corruption raised by <filename>amcheck</filename> should + ever be a false positive. <filename>amcheck</filename> raises + errors in the event of conditions that, by definition, should never + happen, and so careful analysis of <filename>amcheck</filename> + errors is often required. + </para> + <para> + There is no general method of repairing problems that + <filename>amcheck</filename> detects. An explanation for the root cause of + an invariant violation should be sought. <xref + linkend="pageinspect"/> may play a useful role in diagnosing + corruption that <filename>amcheck</filename> detects. A <command>REINDEX</command> + may not be effective in repairing corruption. + </para> + + </sect2> + +</sect1>