diff options
Diffstat (limited to 'doc/src/sgml/html/indexes-partial.html')
-rw-r--r-- | doc/src/sgml/html/indexes-partial.html | 212 |
1 files changed, 212 insertions, 0 deletions
diff --git a/doc/src/sgml/html/indexes-partial.html b/doc/src/sgml/html/indexes-partial.html new file mode 100644 index 0000000..1aa6c32 --- /dev/null +++ b/doc/src/sgml/html/indexes-partial.html @@ -0,0 +1,212 @@ +<?xml version="1.0" encoding="UTF-8" standalone="no"?> +<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"><html xmlns="http://www.w3.org/1999/xhtml"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8" /><title>11.8. Partial Indexes</title><link rel="stylesheet" type="text/css" href="stylesheet.css" /><link rev="made" href="pgsql-docs@lists.postgresql.org" /><meta name="generator" content="DocBook XSL Stylesheets Vsnapshot" /><link rel="prev" href="indexes-expressional.html" title="11.7. Indexes on Expressions" /><link rel="next" href="indexes-index-only-scans.html" title="11.9. Index-Only Scans and Covering Indexes" /></head><body id="docContent" class="container-fluid col-10"><div class="navheader"><table width="100%" summary="Navigation header"><tr><th colspan="5" align="center">11.8. Partial Indexes</th></tr><tr><td width="10%" align="left"><a accesskey="p" href="indexes-expressional.html" title="11.7. Indexes on Expressions">Prev</a> </td><td width="10%" align="left"><a accesskey="u" href="indexes.html" title="Chapter 11. Indexes">Up</a></td><th width="60%" align="center">Chapter 11. Indexes</th><td width="10%" align="right"><a accesskey="h" href="index.html" title="PostgreSQL 15.4 Documentation">Home</a></td><td width="10%" align="right"> <a accesskey="n" href="indexes-index-only-scans.html" title="11.9. Index-Only Scans and Covering Indexes">Next</a></td></tr></table><hr /></div><div class="sect1" id="INDEXES-PARTIAL"><div class="titlepage"><div><div><h2 class="title" style="clear: both">11.8. Partial Indexes</h2></div></div></div><a id="id-1.5.10.11.2" class="indexterm"></a><p> + A <em class="firstterm">partial index</em> is an index built over a + subset of a table; the subset is defined by a conditional + expression (called the <em class="firstterm">predicate</em> of the + partial index). The index contains entries only for those table + rows that satisfy the predicate. Partial indexes are a specialized + feature, but there are several situations in which they are useful. + </p><p> + One major reason for using a partial index is to avoid indexing common + values. Since a query searching for a common value (one that + accounts for more than a few percent of all the table rows) will not + use the index anyway, there is no point in keeping those rows in the + index at all. This reduces the size of the index, which will speed + up those queries that do use the index. It will also speed up many table + update operations because the index does not need to be + updated in all cases. <a class="xref" href="indexes-partial.html#INDEXES-PARTIAL-EX1" title="Example 11.1. Setting up a Partial Index to Exclude Common Values">Example 11.1</a> shows a + possible application of this idea. + </p><div class="example" id="INDEXES-PARTIAL-EX1"><p class="title"><strong>Example 11.1. Setting up a Partial Index to Exclude Common Values</strong></p><div class="example-contents"><p> + Suppose you are storing web server access logs in a database. + Most accesses originate from the IP address range of your organization but + some are from elsewhere (say, employees on dial-up connections). + If your searches by IP are primarily for outside accesses, + you probably do not need to index the IP range that corresponds to your + organization's subnet. + </p><p> + Assume a table like this: +</p><pre class="programlisting"> +CREATE TABLE access_log ( + url varchar, + client_ip inet, + ... +); +</pre><p> + </p><p> + To create a partial index that suits our example, use a command + such as this: +</p><pre class="programlisting"> +CREATE INDEX access_log_client_ip_ix ON access_log (client_ip) +WHERE NOT (client_ip > inet '192.168.100.0' AND + client_ip < inet '192.168.100.255'); +</pre><p> + </p><p> + A typical query that can use this index would be: +</p><pre class="programlisting"> +SELECT * +FROM access_log +WHERE url = '/index.html' AND client_ip = inet '212.78.10.32'; +</pre><p> + Here the query's IP address is covered by the partial index. The + following query cannot use the partial index, as it uses an IP address + that is excluded from the index: +</p><pre class="programlisting"> +SELECT * +FROM access_log +WHERE url = '/index.html' AND client_ip = inet '192.168.100.23'; +</pre><p> + </p><p> + Observe that this kind of partial index requires that the common + values be predetermined, so such partial indexes are best used for + data distributions that do not change. Such indexes can be recreated + occasionally to adjust for new data distributions, but this adds + maintenance effort. + </p></div></div><br class="example-break" /><p> + Another possible use for a partial index is to exclude values from the + index that the + typical query workload is not interested in; this is shown in <a class="xref" href="indexes-partial.html#INDEXES-PARTIAL-EX2" title="Example 11.2. Setting up a Partial Index to Exclude Uninteresting Values">Example 11.2</a>. This results in the same + advantages as listed above, but it prevents the + <span class="quote">“<span class="quote">uninteresting</span>”</span> values from being accessed via that + index, even if an index scan might be profitable in that + case. Obviously, setting up partial indexes for this kind of + scenario will require a lot of care and experimentation. + </p><div class="example" id="INDEXES-PARTIAL-EX2"><p class="title"><strong>Example 11.2. Setting up a Partial Index to Exclude Uninteresting Values</strong></p><div class="example-contents"><p> + If you have a table that contains both billed and unbilled orders, + where the unbilled orders take up a small fraction of the total + table and yet those are the most-accessed rows, you can improve + performance by creating an index on just the unbilled rows. The + command to create the index would look like this: +</p><pre class="programlisting"> +CREATE INDEX orders_unbilled_index ON orders (order_nr) + WHERE billed is not true; +</pre><p> + </p><p> + A possible query to use this index would be: +</p><pre class="programlisting"> +SELECT * FROM orders WHERE billed is not true AND order_nr < 10000; +</pre><p> + However, the index can also be used in queries that do not involve + <code class="structfield">order_nr</code> at all, e.g.: +</p><pre class="programlisting"> +SELECT * FROM orders WHERE billed is not true AND amount > 5000.00; +</pre><p> + This is not as efficient as a partial index on the + <code class="structfield">amount</code> column would be, since the system has to + scan the entire index. Yet, if there are relatively few unbilled + orders, using this partial index just to find the unbilled orders + could be a win. + </p><p> + Note that this query cannot use this index: +</p><pre class="programlisting"> +SELECT * FROM orders WHERE order_nr = 3501; +</pre><p> + The order 3501 might be among the billed or unbilled + orders. + </p></div></div><br class="example-break" /><p> + <a class="xref" href="indexes-partial.html#INDEXES-PARTIAL-EX2" title="Example 11.2. Setting up a Partial Index to Exclude Uninteresting Values">Example 11.2</a> also illustrates that the + indexed column and the column used in the predicate do not need to + match. <span class="productname">PostgreSQL</span> supports partial + indexes with arbitrary predicates, so long as only columns of the + table being indexed are involved. However, keep in mind that the + predicate must match the conditions used in the queries that + are supposed to benefit from the index. To be precise, a partial + index can be used in a query only if the system can recognize that + the <code class="literal">WHERE</code> condition of the query mathematically implies + the predicate of the index. + <span class="productname">PostgreSQL</span> does not have a sophisticated + theorem prover that can recognize mathematically equivalent + expressions that are written in different forms. (Not + only is such a general theorem prover extremely difficult to + create, it would probably be too slow to be of any real use.) + The system can recognize simple inequality implications, for example + <span class="quote">“<span class="quote">x < 1</span>”</span> implies <span class="quote">“<span class="quote">x < 2</span>”</span>; otherwise + the predicate condition must exactly match part of the query's + <code class="literal">WHERE</code> condition + or the index will not be recognized as usable. Matching takes + place at query planning time, not at run time. As a result, + parameterized query clauses do not work with a partial index. For + example a prepared query with a parameter might specify + <span class="quote">“<span class="quote">x < ?</span>”</span> which will never imply + <span class="quote">“<span class="quote">x < 2</span>”</span> for all possible values of the parameter. + </p><p> + A third possible use for partial indexes does not require the + index to be used in queries at all. The idea here is to create + a unique index over a subset of a table, as in <a class="xref" href="indexes-partial.html#INDEXES-PARTIAL-EX3" title="Example 11.3. Setting up a Partial Unique Index">Example 11.3</a>. This enforces uniqueness + among the rows that satisfy the index predicate, without constraining + those that do not. + </p><div class="example" id="INDEXES-PARTIAL-EX3"><p class="title"><strong>Example 11.3. Setting up a Partial Unique Index</strong></p><div class="example-contents"><p> + Suppose that we have a table describing test outcomes. We wish + to ensure that there is only one <span class="quote">“<span class="quote">successful</span>”</span> entry for + a given subject and target combination, but there might be any number of + <span class="quote">“<span class="quote">unsuccessful</span>”</span> entries. Here is one way to do it: +</p><pre class="programlisting"> +CREATE TABLE tests ( + subject text, + target text, + success boolean, + ... +); + +CREATE UNIQUE INDEX tests_success_constraint ON tests (subject, target) + WHERE success; +</pre><p> + This is a particularly efficient approach when there are few + successful tests and many unsuccessful ones. It is also possible to + allow only one null in a column by creating a unique partial index + with an <code class="literal">IS NULL</code> restriction. + </p></div></div><br class="example-break" /><p> + Finally, a partial index can also be used to override the system's + query plan choices. Also, data sets with peculiar + distributions might cause the system to use an index when it really + should not. In that case the index can be set up so that it is not + available for the offending query. Normally, + <span class="productname">PostgreSQL</span> makes reasonable choices about index + usage (e.g., it avoids them when retrieving common values, so the + earlier example really only saves index size, it is not required to + avoid index usage), and grossly incorrect plan choices are cause + for a bug report. + </p><p> + Keep in mind that setting up a partial index indicates that you + know at least as much as the query planner knows, in particular you + know when an index might be profitable. Forming this knowledge + requires experience and understanding of how indexes in + <span class="productname">PostgreSQL</span> work. In most cases, the + advantage of a partial index over a regular index will be minimal. + There are cases where they are quite counterproductive, as in <a class="xref" href="indexes-partial.html#INDEXES-PARTIAL-EX4" title="Example 11.4. Do Not Use Partial Indexes as a Substitute for Partitioning">Example 11.4</a>. + </p><div class="example" id="INDEXES-PARTIAL-EX4"><p class="title"><strong>Example 11.4. Do Not Use Partial Indexes as a Substitute for Partitioning</strong></p><div class="example-contents"><p> + You might be tempted to create a large set of non-overlapping partial + indexes, for example + +</p><pre class="programlisting"> +CREATE INDEX mytable_cat_1 ON mytable (data) WHERE category = 1; +CREATE INDEX mytable_cat_2 ON mytable (data) WHERE category = 2; +CREATE INDEX mytable_cat_3 ON mytable (data) WHERE category = 3; +... +CREATE INDEX mytable_cat_<em class="replaceable"><code>N</code></em> ON mytable (data) WHERE category = <em class="replaceable"><code>N</code></em>; +</pre><p> + + This is a bad idea! Almost certainly, you'll be better off with a + single non-partial index, declared like + +</p><pre class="programlisting"> +CREATE INDEX mytable_cat_data ON mytable (category, data); +</pre><p> + + (Put the category column first, for the reasons described in + <a class="xref" href="indexes-multicolumn.html" title="11.3. Multicolumn Indexes">Section 11.3</a>.) While a search in this larger + index might have to descend through a couple more tree levels than a + search in a smaller index, that's almost certainly going to be cheaper + than the planner effort needed to select the appropriate one of the + partial indexes. The core of the problem is that the system does not + understand the relationship among the partial indexes, and will + laboriously test each one to see if it's applicable to the current + query. + </p><p> + If your table is large enough that a single index really is a bad idea, + you should look into using partitioning instead (see + <a class="xref" href="ddl-partitioning.html" title="5.11. Table Partitioning">Section 5.11</a>). With that mechanism, the system + does understand that the tables and indexes are non-overlapping, so + far better performance is possible. + </p></div></div><br class="example-break" /><p> + More information about partial indexes can be found in <a class="xref" href="biblio.html#STON89B">[ston89b]</a>, <a class="xref" href="biblio.html#OLSON93" title="Partial indexing in POSTGRES: research project">[olson93]</a>, and <a class="xref" href="biblio.html#SESHADRI95">[seshadri95]</a>. + </p></div><div class="navfooter"><hr /><table width="100%" summary="Navigation footer"><tr><td width="40%" align="left"><a accesskey="p" href="indexes-expressional.html" title="11.7. Indexes on Expressions">Prev</a> </td><td width="20%" align="center"><a accesskey="u" href="indexes.html" title="Chapter 11. Indexes">Up</a></td><td width="40%" align="right"> <a accesskey="n" href="indexes-index-only-scans.html" title="11.9. Index-Only Scans and Covering Indexes">Next</a></td></tr><tr><td width="40%" align="left" valign="top">11.7. Indexes on Expressions </td><td width="20%" align="center"><a accesskey="h" href="index.html" title="PostgreSQL 15.4 Documentation">Home</a></td><td width="40%" align="right" valign="top"> 11.9. Index-Only Scans and Covering Indexes</td></tr></table></div></body></html>
\ No newline at end of file |