summaryrefslogtreecommitdiffstats
path: root/doc/src/sgml/html/indexes-partial.html
diff options
context:
space:
mode:
authorDaniel Baumann <daniel.baumann@progress-linux.org>2024-04-13 13:44:03 +0000
committerDaniel Baumann <daniel.baumann@progress-linux.org>2024-04-13 13:44:03 +0000
commit293913568e6a7a86fd1479e1cff8e2ecb58d6568 (patch)
treefc3b469a3ec5ab71b36ea97cc7aaddb838423a0c /doc/src/sgml/html/indexes-partial.html
parentInitial commit. (diff)
downloadpostgresql-16-293913568e6a7a86fd1479e1cff8e2ecb58d6568.tar.xz
postgresql-16-293913568e6a7a86fd1479e1cff8e2ecb58d6568.zip
Adding upstream version 16.2.upstream/16.2
Signed-off-by: Daniel Baumann <daniel.baumann@progress-linux.org>
Diffstat (limited to '')
-rw-r--r--doc/src/sgml/html/indexes-partial.html212
1 files changed, 212 insertions, 0 deletions
diff --git a/doc/src/sgml/html/indexes-partial.html b/doc/src/sgml/html/indexes-partial.html
new file mode 100644
index 0000000..404c38c
--- /dev/null
+++ b/doc/src/sgml/html/indexes-partial.html
@@ -0,0 +1,212 @@
+<?xml version="1.0" encoding="UTF-8" standalone="no"?>
+<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"><html xmlns="http://www.w3.org/1999/xhtml"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8" /><title>11.8. Partial Indexes</title><link rel="stylesheet" type="text/css" href="stylesheet.css" /><link rev="made" href="pgsql-docs@lists.postgresql.org" /><meta name="generator" content="DocBook XSL Stylesheets Vsnapshot" /><link rel="prev" href="indexes-expressional.html" title="11.7. Indexes on Expressions" /><link rel="next" href="indexes-index-only-scans.html" title="11.9. Index-Only Scans and Covering Indexes" /></head><body id="docContent" class="container-fluid col-10"><div class="navheader"><table width="100%" summary="Navigation header"><tr><th colspan="5" align="center">11.8. Partial Indexes</th></tr><tr><td width="10%" align="left"><a accesskey="p" href="indexes-expressional.html" title="11.7. Indexes on Expressions">Prev</a> </td><td width="10%" align="left"><a accesskey="u" href="indexes.html" title="Chapter 11. Indexes">Up</a></td><th width="60%" align="center">Chapter 11. Indexes</th><td width="10%" align="right"><a accesskey="h" href="index.html" title="PostgreSQL 16.2 Documentation">Home</a></td><td width="10%" align="right"> <a accesskey="n" href="indexes-index-only-scans.html" title="11.9. Index-Only Scans and Covering Indexes">Next</a></td></tr></table><hr /></div><div class="sect1" id="INDEXES-PARTIAL"><div class="titlepage"><div><div><h2 class="title" style="clear: both">11.8. Partial Indexes <a href="#INDEXES-PARTIAL" class="id_link">#</a></h2></div></div></div><a id="id-1.5.10.11.2" class="indexterm"></a><p>
+ A <em class="firstterm">partial index</em> is an index built over a
+ subset of a table; the subset is defined by a conditional
+ expression (called the <em class="firstterm">predicate</em> of the
+ partial index). The index contains entries only for those table
+ rows that satisfy the predicate. Partial indexes are a specialized
+ feature, but there are several situations in which they are useful.
+ </p><p>
+ One major reason for using a partial index is to avoid indexing common
+ values. Since a query searching for a common value (one that
+ accounts for more than a few percent of all the table rows) will not
+ use the index anyway, there is no point in keeping those rows in the
+ index at all. This reduces the size of the index, which will speed
+ up those queries that do use the index. It will also speed up many table
+ update operations because the index does not need to be
+ updated in all cases. <a class="xref" href="indexes-partial.html#INDEXES-PARTIAL-EX1" title="Example 11.1. Setting up a Partial Index to Exclude Common Values">Example 11.1</a> shows a
+ possible application of this idea.
+ </p><div class="example" id="INDEXES-PARTIAL-EX1"><p class="title"><strong>Example 11.1. Setting up a Partial Index to Exclude Common Values</strong></p><div class="example-contents"><p>
+ Suppose you are storing web server access logs in a database.
+ Most accesses originate from the IP address range of your organization but
+ some are from elsewhere (say, employees on dial-up connections).
+ If your searches by IP are primarily for outside accesses,
+ you probably do not need to index the IP range that corresponds to your
+ organization's subnet.
+ </p><p>
+ Assume a table like this:
+</p><pre class="programlisting">
+CREATE TABLE access_log (
+ url varchar,
+ client_ip inet,
+ ...
+);
+</pre><p>
+ </p><p>
+ To create a partial index that suits our example, use a command
+ such as this:
+</p><pre class="programlisting">
+CREATE INDEX access_log_client_ip_ix ON access_log (client_ip)
+WHERE NOT (client_ip &gt; inet '192.168.100.0' AND
+ client_ip &lt; inet '192.168.100.255');
+</pre><p>
+ </p><p>
+ A typical query that can use this index would be:
+</p><pre class="programlisting">
+SELECT *
+FROM access_log
+WHERE url = '/index.html' AND client_ip = inet '212.78.10.32';
+</pre><p>
+ Here the query's IP address is covered by the partial index. The
+ following query cannot use the partial index, as it uses an IP address
+ that is excluded from the index:
+</p><pre class="programlisting">
+SELECT *
+FROM access_log
+WHERE url = '/index.html' AND client_ip = inet '192.168.100.23';
+</pre><p>
+ </p><p>
+ Observe that this kind of partial index requires that the common
+ values be predetermined, so such partial indexes are best used for
+ data distributions that do not change. Such indexes can be recreated
+ occasionally to adjust for new data distributions, but this adds
+ maintenance effort.
+ </p></div></div><br class="example-break" /><p>
+ Another possible use for a partial index is to exclude values from the
+ index that the
+ typical query workload is not interested in; this is shown in <a class="xref" href="indexes-partial.html#INDEXES-PARTIAL-EX2" title="Example 11.2. Setting up a Partial Index to Exclude Uninteresting Values">Example 11.2</a>. This results in the same
+ advantages as listed above, but it prevents the
+ <span class="quote">“<span class="quote">uninteresting</span>”</span> values from being accessed via that
+ index, even if an index scan might be profitable in that
+ case. Obviously, setting up partial indexes for this kind of
+ scenario will require a lot of care and experimentation.
+ </p><div class="example" id="INDEXES-PARTIAL-EX2"><p class="title"><strong>Example 11.2. Setting up a Partial Index to Exclude Uninteresting Values</strong></p><div class="example-contents"><p>
+ If you have a table that contains both billed and unbilled orders,
+ where the unbilled orders take up a small fraction of the total
+ table and yet those are the most-accessed rows, you can improve
+ performance by creating an index on just the unbilled rows. The
+ command to create the index would look like this:
+</p><pre class="programlisting">
+CREATE INDEX orders_unbilled_index ON orders (order_nr)
+ WHERE billed is not true;
+</pre><p>
+ </p><p>
+ A possible query to use this index would be:
+</p><pre class="programlisting">
+SELECT * FROM orders WHERE billed is not true AND order_nr &lt; 10000;
+</pre><p>
+ However, the index can also be used in queries that do not involve
+ <code class="structfield">order_nr</code> at all, e.g.:
+</p><pre class="programlisting">
+SELECT * FROM orders WHERE billed is not true AND amount &gt; 5000.00;
+</pre><p>
+ This is not as efficient as a partial index on the
+ <code class="structfield">amount</code> column would be, since the system has to
+ scan the entire index. Yet, if there are relatively few unbilled
+ orders, using this partial index just to find the unbilled orders
+ could be a win.
+ </p><p>
+ Note that this query cannot use this index:
+</p><pre class="programlisting">
+SELECT * FROM orders WHERE order_nr = 3501;
+</pre><p>
+ The order 3501 might be among the billed or unbilled
+ orders.
+ </p></div></div><br class="example-break" /><p>
+ <a class="xref" href="indexes-partial.html#INDEXES-PARTIAL-EX2" title="Example 11.2. Setting up a Partial Index to Exclude Uninteresting Values">Example 11.2</a> also illustrates that the
+ indexed column and the column used in the predicate do not need to
+ match. <span class="productname">PostgreSQL</span> supports partial
+ indexes with arbitrary predicates, so long as only columns of the
+ table being indexed are involved. However, keep in mind that the
+ predicate must match the conditions used in the queries that
+ are supposed to benefit from the index. To be precise, a partial
+ index can be used in a query only if the system can recognize that
+ the <code class="literal">WHERE</code> condition of the query mathematically implies
+ the predicate of the index.
+ <span class="productname">PostgreSQL</span> does not have a sophisticated
+ theorem prover that can recognize mathematically equivalent
+ expressions that are written in different forms. (Not
+ only is such a general theorem prover extremely difficult to
+ create, it would probably be too slow to be of any real use.)
+ The system can recognize simple inequality implications, for example
+ <span class="quote">“<span class="quote">x &lt; 1</span>”</span> implies <span class="quote">“<span class="quote">x &lt; 2</span>”</span>; otherwise
+ the predicate condition must exactly match part of the query's
+ <code class="literal">WHERE</code> condition
+ or the index will not be recognized as usable. Matching takes
+ place at query planning time, not at run time. As a result,
+ parameterized query clauses do not work with a partial index. For
+ example a prepared query with a parameter might specify
+ <span class="quote">“<span class="quote">x &lt; ?</span>”</span> which will never imply
+ <span class="quote">“<span class="quote">x &lt; 2</span>”</span> for all possible values of the parameter.
+ </p><p>
+ A third possible use for partial indexes does not require the
+ index to be used in queries at all. The idea here is to create
+ a unique index over a subset of a table, as in <a class="xref" href="indexes-partial.html#INDEXES-PARTIAL-EX3" title="Example 11.3. Setting up a Partial Unique Index">Example 11.3</a>. This enforces uniqueness
+ among the rows that satisfy the index predicate, without constraining
+ those that do not.
+ </p><div class="example" id="INDEXES-PARTIAL-EX3"><p class="title"><strong>Example 11.3. Setting up a Partial Unique Index</strong></p><div class="example-contents"><p>
+ Suppose that we have a table describing test outcomes. We wish
+ to ensure that there is only one <span class="quote">“<span class="quote">successful</span>”</span> entry for
+ a given subject and target combination, but there might be any number of
+ <span class="quote">“<span class="quote">unsuccessful</span>”</span> entries. Here is one way to do it:
+</p><pre class="programlisting">
+CREATE TABLE tests (
+ subject text,
+ target text,
+ success boolean,
+ ...
+);
+
+CREATE UNIQUE INDEX tests_success_constraint ON tests (subject, target)
+ WHERE success;
+</pre><p>
+ This is a particularly efficient approach when there are few
+ successful tests and many unsuccessful ones. It is also possible to
+ allow only one null in a column by creating a unique partial index
+ with an <code class="literal">IS NULL</code> restriction.
+ </p></div></div><br class="example-break" /><p>
+ Finally, a partial index can also be used to override the system's
+ query plan choices. Also, data sets with peculiar
+ distributions might cause the system to use an index when it really
+ should not. In that case the index can be set up so that it is not
+ available for the offending query. Normally,
+ <span class="productname">PostgreSQL</span> makes reasonable choices about index
+ usage (e.g., it avoids them when retrieving common values, so the
+ earlier example really only saves index size, it is not required to
+ avoid index usage), and grossly incorrect plan choices are cause
+ for a bug report.
+ </p><p>
+ Keep in mind that setting up a partial index indicates that you
+ know at least as much as the query planner knows, in particular you
+ know when an index might be profitable. Forming this knowledge
+ requires experience and understanding of how indexes in
+ <span class="productname">PostgreSQL</span> work. In most cases, the
+ advantage of a partial index over a regular index will be minimal.
+ There are cases where they are quite counterproductive, as in <a class="xref" href="indexes-partial.html#INDEXES-PARTIAL-EX4" title="Example 11.4. Do Not Use Partial Indexes as a Substitute for Partitioning">Example 11.4</a>.
+ </p><div class="example" id="INDEXES-PARTIAL-EX4"><p class="title"><strong>Example 11.4. Do Not Use Partial Indexes as a Substitute for Partitioning</strong></p><div class="example-contents"><p>
+ You might be tempted to create a large set of non-overlapping partial
+ indexes, for example
+
+</p><pre class="programlisting">
+CREATE INDEX mytable_cat_1 ON mytable (data) WHERE category = 1;
+CREATE INDEX mytable_cat_2 ON mytable (data) WHERE category = 2;
+CREATE INDEX mytable_cat_3 ON mytable (data) WHERE category = 3;
+...
+CREATE INDEX mytable_cat_<em class="replaceable"><code>N</code></em> ON mytable (data) WHERE category = <em class="replaceable"><code>N</code></em>;
+</pre><p>
+
+ This is a bad idea! Almost certainly, you'll be better off with a
+ single non-partial index, declared like
+
+</p><pre class="programlisting">
+CREATE INDEX mytable_cat_data ON mytable (category, data);
+</pre><p>
+
+ (Put the category column first, for the reasons described in
+ <a class="xref" href="indexes-multicolumn.html" title="11.3. Multicolumn Indexes">Section 11.3</a>.) While a search in this larger
+ index might have to descend through a couple more tree levels than a
+ search in a smaller index, that's almost certainly going to be cheaper
+ than the planner effort needed to select the appropriate one of the
+ partial indexes. The core of the problem is that the system does not
+ understand the relationship among the partial indexes, and will
+ laboriously test each one to see if it's applicable to the current
+ query.
+ </p><p>
+ If your table is large enough that a single index really is a bad idea,
+ you should look into using partitioning instead (see
+ <a class="xref" href="ddl-partitioning.html" title="5.11. Table Partitioning">Section 5.11</a>). With that mechanism, the system
+ does understand that the tables and indexes are non-overlapping, so
+ far better performance is possible.
+ </p></div></div><br class="example-break" /><p>
+ More information about partial indexes can be found in <a class="xref" href="biblio.html#STON89B">[ston89b]</a>, <a class="xref" href="biblio.html#OLSON93" title="Partial indexing in POSTGRES: research project">[olson93]</a>, and <a class="xref" href="biblio.html#SESHADRI95">[seshadri95]</a>.
+ </p></div><div class="navfooter"><hr /><table width="100%" summary="Navigation footer"><tr><td width="40%" align="left"><a accesskey="p" href="indexes-expressional.html" title="11.7. Indexes on Expressions">Prev</a> </td><td width="20%" align="center"><a accesskey="u" href="indexes.html" title="Chapter 11. Indexes">Up</a></td><td width="40%" align="right"> <a accesskey="n" href="indexes-index-only-scans.html" title="11.9. Index-Only Scans and Covering Indexes">Next</a></td></tr><tr><td width="40%" align="left" valign="top">11.7. Indexes on Expressions </td><td width="20%" align="center"><a accesskey="h" href="index.html" title="PostgreSQL 16.2 Documentation">Home</a></td><td width="40%" align="right" valign="top"> 11.9. Index-Only Scans and Covering Indexes</td></tr></table></div></body></html> \ No newline at end of file