summaryrefslogtreecommitdiffstats
path: root/doc/src/sgml/html/index-cost-estimation.html
diff options
context:
space:
mode:
authorDaniel Baumann <daniel.baumann@progress-linux.org>2024-05-04 12:17:33 +0000
committerDaniel Baumann <daniel.baumann@progress-linux.org>2024-05-04 12:17:33 +0000
commit5e45211a64149b3c659b90ff2de6fa982a5a93ed (patch)
tree739caf8c461053357daa9f162bef34516c7bf452 /doc/src/sgml/html/index-cost-estimation.html
parentInitial commit. (diff)
downloadpostgresql-15-5e45211a64149b3c659b90ff2de6fa982a5a93ed.tar.xz
postgresql-15-5e45211a64149b3c659b90ff2de6fa982a5a93ed.zip
Adding upstream version 15.5.upstream/15.5
Signed-off-by: Daniel Baumann <daniel.baumann@progress-linux.org>
Diffstat (limited to 'doc/src/sgml/html/index-cost-estimation.html')
-rw-r--r--doc/src/sgml/html/index-cost-estimation.html142
1 files changed, 142 insertions, 0 deletions
diff --git a/doc/src/sgml/html/index-cost-estimation.html b/doc/src/sgml/html/index-cost-estimation.html
new file mode 100644
index 0000000..fe706e8
--- /dev/null
+++ b/doc/src/sgml/html/index-cost-estimation.html
@@ -0,0 +1,142 @@
+<?xml version="1.0" encoding="UTF-8" standalone="no"?>
+<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"><html xmlns="http://www.w3.org/1999/xhtml"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8" /><title>64.6. Index Cost Estimation Functions</title><link rel="stylesheet" type="text/css" href="stylesheet.css" /><link rev="made" href="pgsql-docs@lists.postgresql.org" /><meta name="generator" content="DocBook XSL Stylesheets Vsnapshot" /><link rel="prev" href="index-unique-checks.html" title="64.5. Index Uniqueness Checks" /><link rel="next" href="generic-wal.html" title="Chapter 65. Generic WAL Records" /></head><body id="docContent" class="container-fluid col-10"><div class="navheader"><table width="100%" summary="Navigation header"><tr><th colspan="5" align="center">64.6. Index Cost Estimation Functions</th></tr><tr><td width="10%" align="left"><a accesskey="p" href="index-unique-checks.html" title="64.5. Index Uniqueness Checks">Prev</a> </td><td width="10%" align="left"><a accesskey="u" href="indexam.html" title="Chapter 64. Index Access Method Interface Definition">Up</a></td><th width="60%" align="center">Chapter 64. Index Access Method Interface Definition</th><td width="10%" align="right"><a accesskey="h" href="index.html" title="PostgreSQL 15.5 Documentation">Home</a></td><td width="10%" align="right"> <a accesskey="n" href="generic-wal.html" title="Chapter 65. Generic WAL Records">Next</a></td></tr></table><hr /></div><div class="sect1" id="INDEX-COST-ESTIMATION"><div class="titlepage"><div><div><h2 class="title" style="clear: both">64.6. Index Cost Estimation Functions</h2></div></div></div><p>
+ The <code class="function">amcostestimate</code> function is given information describing
+ a possible index scan, including lists of WHERE and ORDER BY clauses that
+ have been determined to be usable with the index. It must return estimates
+ of the cost of accessing the index and the selectivity of the WHERE
+ clauses (that is, the fraction of parent-table rows that will be
+ retrieved during the index scan). For simple cases, nearly all the
+ work of the cost estimator can be done by calling standard routines
+ in the optimizer; the point of having an <code class="function">amcostestimate</code> function is
+ to allow index access methods to provide index-type-specific knowledge,
+ in case it is possible to improve on the standard estimates.
+ </p><p>
+ Each <code class="function">amcostestimate</code> function must have the signature:
+
+</p><pre class="programlisting">
+void
+amcostestimate (PlannerInfo *root,
+ IndexPath *path,
+ double loop_count,
+ Cost *indexStartupCost,
+ Cost *indexTotalCost,
+ Selectivity *indexSelectivity,
+ double *indexCorrelation,
+ double *indexPages);
+</pre><p>
+
+ The first three parameters are inputs:
+
+ </p><div class="variablelist"><dl class="variablelist"><dt><span class="term"><em class="parameter"><code>root</code></em></span></dt><dd><p>
+ The planner's information about the query being processed.
+ </p></dd><dt><span class="term"><em class="parameter"><code>path</code></em></span></dt><dd><p>
+ The index access path being considered. All fields except cost and
+ selectivity values are valid.
+ </p></dd><dt><span class="term"><em class="parameter"><code>loop_count</code></em></span></dt><dd><p>
+ The number of repetitions of the index scan that should be factored
+ into the cost estimates. This will typically be greater than one when
+ considering a parameterized scan for use in the inside of a nestloop
+ join. Note that the cost estimates should still be for just one scan;
+ a larger <em class="parameter"><code>loop_count</code></em> means that it may be appropriate
+ to allow for some caching effects across multiple scans.
+ </p></dd></dl></div><p>
+ </p><p>
+ The last five parameters are pass-by-reference outputs:
+
+ </p><div class="variablelist"><dl class="variablelist"><dt><span class="term"><em class="parameter"><code>*indexStartupCost</code></em></span></dt><dd><p>
+ Set to cost of index start-up processing
+ </p></dd><dt><span class="term"><em class="parameter"><code>*indexTotalCost</code></em></span></dt><dd><p>
+ Set to total cost of index processing
+ </p></dd><dt><span class="term"><em class="parameter"><code>*indexSelectivity</code></em></span></dt><dd><p>
+ Set to index selectivity
+ </p></dd><dt><span class="term"><em class="parameter"><code>*indexCorrelation</code></em></span></dt><dd><p>
+ Set to correlation coefficient between index scan order and
+ underlying table's order
+ </p></dd><dt><span class="term"><em class="parameter"><code>*indexPages</code></em></span></dt><dd><p>
+ Set to number of index leaf pages
+ </p></dd></dl></div><p>
+ </p><p>
+ Note that cost estimate functions must be written in C, not in SQL or
+ any available procedural language, because they must access internal
+ data structures of the planner/optimizer.
+ </p><p>
+ The index access costs should be computed using the parameters used by
+ <code class="filename">src/backend/optimizer/path/costsize.c</code>: a sequential
+ disk block fetch has cost <code class="varname">seq_page_cost</code>, a nonsequential fetch
+ has cost <code class="varname">random_page_cost</code>, and the cost of processing one index
+ row should usually be taken as <code class="varname">cpu_index_tuple_cost</code>. In
+ addition, an appropriate multiple of <code class="varname">cpu_operator_cost</code> should
+ be charged for any comparison operators invoked during index processing
+ (especially evaluation of the indexquals themselves).
+ </p><p>
+ The access costs should include all disk and CPU costs associated with
+ scanning the index itself, but <span class="emphasis"><em>not</em></span> the costs of retrieving or
+ processing the parent-table rows that are identified by the index.
+ </p><p>
+ The <span class="quote">“<span class="quote">start-up cost</span>”</span> is the part of the total scan cost that
+ must be expended before we can begin to fetch the first row. For most
+ indexes this can be taken as zero, but an index type with a high start-up
+ cost might want to set it nonzero.
+ </p><p>
+ The <em class="parameter"><code>indexSelectivity</code></em> should be set to the estimated fraction of the parent
+ table rows that will be retrieved during the index scan. In the case
+ of a lossy query, this will typically be higher than the fraction of
+ rows that actually pass the given qual conditions.
+ </p><p>
+ The <em class="parameter"><code>indexCorrelation</code></em> should be set to the correlation (ranging between
+ -1.0 and 1.0) between the index order and the table order. This is used
+ to adjust the estimate for the cost of fetching rows from the parent
+ table.
+ </p><p>
+ The <em class="parameter"><code>indexPages</code></em> should be set to the number of leaf pages.
+ This is used to estimate the number of workers for parallel index scan.
+ </p><p>
+ When <em class="parameter"><code>loop_count</code></em> is greater than one, the returned numbers
+ should be averages expected for any one scan of the index.
+ </p><div class="procedure" id="id-1.10.15.12.13"><p class="title"><strong>Cost Estimation</strong></p><p>
+ A typical cost estimator will proceed as follows:
+ </p><ol class="procedure" type="1"><li class="step"><p>
+ Estimate and return the fraction of parent-table rows that will be visited
+ based on the given qual conditions. In the absence of any index-type-specific
+ knowledge, use the standard optimizer function <code class="function">clauselist_selectivity()</code>:
+
+</p><pre class="programlisting">
+*indexSelectivity = clauselist_selectivity(root, path-&gt;indexquals,
+ path-&gt;indexinfo-&gt;rel-&gt;relid,
+ JOIN_INNER, NULL);
+</pre><p>
+ </p></li><li class="step"><p>
+ Estimate the number of index rows that will be visited during the
+ scan. For many index types this is the same as <em class="parameter"><code>indexSelectivity</code></em> times
+ the number of rows in the index, but it might be more. (Note that the
+ index's size in pages and rows is available from the
+ <code class="literal">path-&gt;indexinfo</code> struct.)
+ </p></li><li class="step"><p>
+ Estimate the number of index pages that will be retrieved during the scan.
+ This might be just <em class="parameter"><code>indexSelectivity</code></em> times the index's size in pages.
+ </p></li><li class="step"><p>
+ Compute the index access cost. A generic estimator might do this:
+
+</p><pre class="programlisting">
+/*
+ * Our generic assumption is that the index pages will be read
+ * sequentially, so they cost seq_page_cost each, not random_page_cost.
+ * Also, we charge for evaluation of the indexquals at each index row.
+ * All the costs are assumed to be paid incrementally during the scan.
+ */
+cost_qual_eval(&amp;index_qual_cost, path-&gt;indexquals, root);
+*indexStartupCost = index_qual_cost.startup;
+*indexTotalCost = seq_page_cost * numIndexPages +
+ (cpu_index_tuple_cost + index_qual_cost.per_tuple) * numIndexTuples;
+</pre><p>
+
+ However, the above does not account for amortization of index reads
+ across repeated index scans.
+ </p></li><li class="step"><p>
+ Estimate the index correlation. For a simple ordered index on a single
+ field, this can be retrieved from pg_statistic. If the correlation
+ is not known, the conservative estimate is zero (no correlation).
+ </p></li></ol></div><p>
+ Examples of cost estimator functions can be found in
+ <code class="filename">src/backend/utils/adt/selfuncs.c</code>.
+ </p></div><div class="navfooter"><hr /><table width="100%" summary="Navigation footer"><tr><td width="40%" align="left"><a accesskey="p" href="index-unique-checks.html" title="64.5. Index Uniqueness Checks">Prev</a> </td><td width="20%" align="center"><a accesskey="u" href="indexam.html" title="Chapter 64. Index Access Method Interface Definition">Up</a></td><td width="40%" align="right"> <a accesskey="n" href="generic-wal.html" title="Chapter 65. Generic WAL Records">Next</a></td></tr><tr><td width="40%" align="left" valign="top">64.5. Index Uniqueness Checks </td><td width="20%" align="center"><a accesskey="h" href="index.html" title="PostgreSQL 15.5 Documentation">Home</a></td><td width="40%" align="right" valign="top"> Chapter 65. Generic WAL Records</td></tr></table></div></body></html> \ No newline at end of file