1 files changed, 209 insertions, 0 deletions
diff --git a/doc/src/sgml/html/indexes-index-only-scans.html b/doc/src/sgml/html/indexes-index-only-scans.html
new file mode 100644
index 0000000..f9a4e11
--- /dev/null
+++ b/doc/src/sgml/html/indexes-index-only-scans.html
@@ -0,0 +1,209 @@
+<?xml version="1.0" encoding="UTF-8" standalone="no"?>
+<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"><html xmlns="http://www.w3.org/1999/xhtml"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8" /><title>11.9. Index-Only Scans and Covering Indexes</title><link rel="stylesheet" type="text/css" href="stylesheet.css" /><link rev="made" href="pgsql-docs@lists.postgresql.org" /><meta name="generator" content="DocBook XSL Stylesheets Vsnapshot" /><link rel="prev" href="indexes-partial.html" title="11.8. Partial Indexes" /><link rel="next" href="indexes-opclass.html" title="11.10. Operator Classes and Operator Families" /></head><body id="docContent" class="container-fluid col-10"><div class="navheader"><table width="100%" summary="Navigation header"><tr><th colspan="5" align="center">11.9. Index-Only Scans and Covering Indexes</th></tr><tr><td width="10%" align="left"><a accesskey="p" href="indexes-partial.html" title="11.8. Partial Indexes">Prev</a> </td><td width="10%" align="left"><a accesskey="u" href="indexes.html" title="Chapter 11. Indexes">Up</a></td><th width="60%" align="center">Chapter 11. Indexes</th><td width="10%" align="right"><a accesskey="h" href="index.html" title="PostgreSQL 15.5 Documentation">Home</a></td><td width="10%" align="right"> <a accesskey="n" href="indexes-opclass.html" title="11.10. Operator Classes and Operator Families">Next</a></td></tr></table><hr /></div><div class="sect1" id="INDEXES-INDEX-ONLY-SCANS"><div class="titlepage"><div><div><h2 class="title" style="clear: both">11.9. Index-Only Scans and Covering Indexes</h2></div></div></div><a id="id-1.5.10.12.2" class="indexterm"></a><a id="id-1.5.10.12.3" class="indexterm"></a><a id="id-1.5.10.12.4" class="indexterm"></a><a id="id-1.5.10.12.5" class="indexterm"></a><p>
+   All indexes in <span class="productname">PostgreSQL</span>
+   are <em class="firstterm">secondary</em> indexes, meaning that each index is
+   stored separately from the table's main data area (which is called the
+   table's <em class="firstterm">heap</em>
+   in <span class="productname">PostgreSQL</span> terminology).  This means that
+   in an ordinary index scan, each row retrieval requires fetching data from
+   both the index and the heap.  Furthermore, while the index entries that
+   match a given indexable <code class="literal">WHERE</code> condition are usually
+   close together in the index, the table rows they reference might be
+   anywhere in the heap.  The heap-access portion of an index scan thus
+   involves a lot of random access into the heap, which can be slow,
+   particularly on traditional rotating media.  (As described in
+   <a class="xref" href="indexes-bitmap-scans.html" title="11.5. Combining Multiple Indexes">Section 11.5</a>, bitmap scans try to alleviate
+   this cost by doing the heap accesses in sorted order, but that only goes
+   so far.)
+  </p><p>
+   To solve this performance problem, <span class="productname">PostgreSQL</span>
+   supports <em class="firstterm">index-only scans</em>, which can answer
+   queries from an index alone without any heap access.  The basic idea is
+   to return values directly out of each index entry instead of consulting
+   the associated heap entry.  There are two fundamental restrictions on
+   when this method can be used:
+
+   </p><div class="orderedlist"><ol class="orderedlist" type="1"><li class="listitem"><p>
+      The index type must support index-only scans.  B-tree indexes always
+      do.  GiST and SP-GiST indexes support index-only scans for some
+      operator classes but not others.  Other index types have no support.
+      The underlying requirement is that the index must physically store, or
+      else be able to reconstruct, the original data value for each index
+      entry.  As a counterexample, GIN indexes cannot support index-only
+      scans because each index entry typically holds only part of the
+      original data value.
+     </p></li><li class="listitem"><p>
+      The query must reference only columns stored in the index.  For
+      example, given an index on columns <code class="literal">x</code>
+      and <code class="literal">y</code> of a table that also has a
+      column <code class="literal">z</code>, these queries could use index-only scans:
+</p><pre class="programlisting">
+SELECT x, y FROM tab WHERE x = 'key';
+SELECT x FROM tab WHERE x = 'key' AND y &lt; 42;
+</pre><p>
+      but these queries could not:
+</p><pre class="programlisting">
+SELECT x, z FROM tab WHERE x = 'key';
+SELECT x FROM tab WHERE x = 'key' AND z &lt; 42;
+</pre><p>
+      (Expression indexes and partial indexes complicate this rule,
+      as discussed below.)
+     </p></li></ol></div><p>
+  </p><p>
+   If these two fundamental requirements are met, then all the data values
+   required by the query are available from the index, so an index-only scan
+   is physically possible.  But there is an additional requirement for any
+   table scan in <span class="productname">PostgreSQL</span>: it must verify that
+   each retrieved row be <span class="quote">“<span class="quote">visible</span>”</span> to the query's MVCC
+   snapshot, as discussed in <a class="xref" href="mvcc.html" title="Chapter 13. Concurrency Control">Chapter 13</a>.  Visibility information
+   is not stored in index entries, only in heap entries; so at first glance
+   it would seem that every row retrieval would require a heap access
+   anyway.  And this is indeed the case, if the table row has been modified
+   recently.  However, for seldom-changing data there is a way around this
+   problem.  <span class="productname">PostgreSQL</span> tracks, for each page in
+   a table's heap, whether all rows stored in that page are old enough to be
+   visible to all current and future transactions.  This information is
+   stored in a bit in the table's <em class="firstterm">visibility map</em>.  An
+   index-only scan, after finding a candidate index entry, checks the
+   visibility map bit for the corresponding heap page.  If it's set, the row
+   is known visible and so the data can be returned with no further work.
+   If it's not set, the heap entry must be visited to find out whether it's
+   visible, so no performance advantage is gained over a standard index
+   scan.  Even in the successful case, this approach trades visibility map
+   accesses for heap accesses; but since the visibility map is four orders
+   of magnitude smaller than the heap it describes, far less physical I/O is
+   needed to access it.  In most situations the visibility map remains
+   cached in memory all the time.
+  </p><p>
+   In short, while an index-only scan is possible given the two fundamental
+   requirements, it will be a win only if a significant fraction of the
+   table's heap pages have their all-visible map bits set.  But tables in
+   which a large fraction of the rows are unchanging are common enough to
+   make this type of scan very useful in practice.
+  </p><p>
+   <a id="id-1.5.10.12.10.1" class="indexterm"></a>
+   To make effective use of the index-only scan feature, you might choose to
+   create a <em class="firstterm">covering index</em>, which is an index
+   specifically designed to include the columns needed by a particular
+   type of query that you run frequently.  Since queries typically need to
+   retrieve more columns than just the ones they search
+   on, <span class="productname">PostgreSQL</span> allows you to create an index
+   in which some columns are just <span class="quote">“<span class="quote">payload</span>”</span> and are not part
+   of the search key.  This is done by adding an <code class="literal">INCLUDE</code>
+   clause listing the extra columns.  For example, if you commonly run
+   queries like
+</p><pre class="programlisting">
+SELECT y FROM tab WHERE x = 'key';
+</pre><p>
+   the traditional approach to speeding up such queries would be to create
+   an index on <code class="literal">x</code> only.  However, an index defined as
+</p><pre class="programlisting">
+CREATE INDEX tab_x_y ON tab(x) INCLUDE (y);
+</pre><p>
+   could handle these queries as index-only scans,
+   because <code class="literal">y</code> can be obtained from the index without
+   visiting the heap.
+  </p><p>
+   Because column <code class="literal">y</code> is not part of the index's search
+   key, it does not have to be of a data type that the index can handle;
+   it's merely stored in the index and is not interpreted by the index
+   machinery.  Also, if the index is a unique index, that is
+</p><pre class="programlisting">
+CREATE UNIQUE INDEX tab_x_y ON tab(x) INCLUDE (y);
+</pre><p>
+   the uniqueness condition applies to just column <code class="literal">x</code>,
+   not to the combination of <code class="literal">x</code> and <code class="literal">y</code>.
+   (An <code class="literal">INCLUDE</code> clause can also be written
+   in <code class="literal">UNIQUE</code> and <code class="literal">PRIMARY KEY</code>
+   constraints, providing alternative syntax for setting up an index like
+   this.)
+  </p><p>
+   It's wise to be conservative about adding non-key payload columns to an
+   index, especially wide columns.  If an index tuple exceeds the
+   maximum size allowed for the index type, data insertion will fail.
+   In any case, non-key columns duplicate data from the index's table
+   and bloat the size of the index, thus potentially slowing searches.
+   And remember that there is little point in including payload columns in an
+   index unless the table changes slowly enough that an index-only scan is
+   likely to not need to access the heap.  If the heap tuple must be visited
+   anyway, it costs nothing more to get the column's value from there.
+   Other restrictions are that expressions are not currently supported as
+   included columns, and that only B-tree, GiST and SP-GiST indexes currently
+   support included columns.
+  </p><p>
+   Before <span class="productname">PostgreSQL</span> had
+   the <code class="literal">INCLUDE</code> feature, people sometimes made covering
+   indexes by writing the payload columns as ordinary index columns,
+   that is writing
+</p><pre class="programlisting">
+CREATE INDEX tab_x_y ON tab(x, y);
+</pre><p>
+   even though they had no intention of ever using <code class="literal">y</code> as
+   part of a <code class="literal">WHERE</code> clause.  This works fine as long as
+   the extra columns are trailing columns; making them be leading columns is
+   unwise for the reasons explained in <a class="xref" href="indexes-multicolumn.html" title="11.3. Multicolumn Indexes">Section 11.3</a>.
+   However, this method doesn't support the case where you want the index to
+   enforce uniqueness on the key column(s).
+  </p><p>
+   <em class="firstterm">Suffix truncation</em> always removes non-key
+   columns from upper B-Tree levels.  As payload columns, they are
+   never used to guide index scans.  The truncation process also
+   removes one or more trailing key column(s) when the remaining
+   prefix of key column(s) happens to be sufficient to describe tuples
+   on the lowest B-Tree level.  In practice, covering indexes without
+   an <code class="literal">INCLUDE</code> clause often avoid storing columns
+   that are effectively payload in the upper levels.  However,
+   explicitly defining payload columns as non-key columns
+   <span class="emphasis"><em>reliably</em></span> keeps the tuples in upper levels
+   small.
+  </p><p>
+   In principle, index-only scans can be used with expression indexes.
+   For example, given an index on <code class="literal">f(x)</code>
+   where <code class="literal">x</code> is a table column, it should be possible to
+   execute
+</p><pre class="programlisting">
+SELECT f(x) FROM tab WHERE f(x) &lt; 1;
+</pre><p>
+   as an index-only scan; and this is very attractive
+   if <code class="literal">f()</code> is an expensive-to-compute function.
+   However, <span class="productname">PostgreSQL</span>'s planner is currently not
+   very smart about such cases.  It considers a query to be potentially
+   executable by index-only scan only when all <span class="emphasis"><em>columns</em></span>
+   needed by the query are available from the index.  In this
+   example, <code class="literal">x</code> is not needed except in the
+   context <code class="literal">f(x)</code>, but the planner does not notice that and
+   concludes that an index-only scan is not possible.  If an index-only scan
+   seems sufficiently worthwhile, this can be worked around by
+   adding <code class="literal">x</code> as an included column, for example
+</p><pre class="programlisting">
+CREATE INDEX tab_f_x ON tab (f(x)) INCLUDE (x);
+</pre><p>
+   An additional caveat, if the goal is to avoid
+   recalculating <code class="literal">f(x)</code>, is that the planner won't
+   necessarily match uses of <code class="literal">f(x)</code> that aren't in
+   indexable <code class="literal">WHERE</code> clauses to the index column.  It will
+   usually get this right in simple queries such as shown above, but not in
+   queries that involve joins.  These deficiencies may be remedied in future
+   versions of <span class="productname">PostgreSQL</span>.
+  </p><p>
+   Partial indexes also have interesting interactions with index-only scans.
+   Consider the partial index shown in <a class="xref" href="indexes-partial.html#INDEXES-PARTIAL-EX3" title="Example 11.3. Setting up a Partial Unique Index">Example 11.3</a>:
+</p><pre class="programlisting">
+CREATE UNIQUE INDEX tests_success_constraint ON tests (subject, target)
+    WHERE success;
+</pre><p>
+   In principle, we could do an index-only scan on this index to satisfy a
+   query like
+</p><pre class="programlisting">
+SELECT target FROM tests WHERE subject = 'some-subject' AND success;
+</pre><p>
+   But there's a problem: the <code class="literal">WHERE</code> clause refers
+   to <code class="literal">success</code> which is not available as a result column
+   of the index.  Nonetheless, an index-only scan is possible because the
+   plan does not need to recheck that part of the <code class="literal">WHERE</code>
+   clause at run time: all entries found in the index necessarily
+   have <code class="literal">success = true</code> so this need not be explicitly
+   checked in the plan.  <span class="productname">PostgreSQL</span> versions 9.6
+   and later will recognize such cases and allow index-only scans to be
+   generated, but older versions will not.
+  </p></div><div class="navfooter"><hr /><table width="100%" summary="Navigation footer"><tr><td width="40%" align="left"><a accesskey="p" href="indexes-partial.html" title="11.8. Partial Indexes">Prev</a> </td><td width="20%" align="center"><a accesskey="u" href="indexes.html" title="Chapter 11. Indexes">Up</a></td><td width="40%" align="right"> <a accesskey="n" href="indexes-opclass.html" title="11.10. Operator Classes and Operator Families">Next</a></td></tr><tr><td width="40%" align="left" valign="top">11.8. Partial Indexes </td><td width="20%" align="center"><a accesskey="h" href="index.html" title="PostgreSQL 15.5 Documentation">Home</a></td><td width="40%" align="right" valign="top"> 11.10. Operator Classes and Operator Families</td></tr></table></div></body></html>
+\ No newline at end of file