From 5e45211a64149b3c659b90ff2de6fa982a5a93ed Mon Sep 17 00:00:00 2001 From: Daniel Baumann Date: Sat, 4 May 2024 14:17:33 +0200 Subject: Adding upstream version 15.5. Signed-off-by: Daniel Baumann --- doc/src/sgml/html/index-cost-estimation.html | 142 +++++++++++++++++++++++++++ 1 file changed, 142 insertions(+) create mode 100644 doc/src/sgml/html/index-cost-estimation.html (limited to 'doc/src/sgml/html/index-cost-estimation.html') diff --git a/doc/src/sgml/html/index-cost-estimation.html b/doc/src/sgml/html/index-cost-estimation.html new file mode 100644 index 0000000..fe706e8 --- /dev/null +++ b/doc/src/sgml/html/index-cost-estimation.html @@ -0,0 +1,142 @@ + +64.6. Index Cost Estimation Functions

64.6. Index Cost Estimation Functions

+ The amcostestimate function is given information describing + a possible index scan, including lists of WHERE and ORDER BY clauses that + have been determined to be usable with the index. It must return estimates + of the cost of accessing the index and the selectivity of the WHERE + clauses (that is, the fraction of parent-table rows that will be + retrieved during the index scan). For simple cases, nearly all the + work of the cost estimator can be done by calling standard routines + in the optimizer; the point of having an amcostestimate function is + to allow index access methods to provide index-type-specific knowledge, + in case it is possible to improve on the standard estimates. +

+ Each amcostestimate function must have the signature: + +

+void
+amcostestimate (PlannerInfo *root,
+                IndexPath *path,
+                double loop_count,
+                Cost *indexStartupCost,
+                Cost *indexTotalCost,
+                Selectivity *indexSelectivity,
+                double *indexCorrelation,
+                double *indexPages);
+

+ + The first three parameters are inputs: + +

root

+ The planner's information about the query being processed. +

path

+ The index access path being considered. All fields except cost and + selectivity values are valid. +

loop_count

+ The number of repetitions of the index scan that should be factored + into the cost estimates. This will typically be greater than one when + considering a parameterized scan for use in the inside of a nestloop + join. Note that the cost estimates should still be for just one scan; + a larger loop_count means that it may be appropriate + to allow for some caching effects across multiple scans. +

+

+ The last five parameters are pass-by-reference outputs: + +

*indexStartupCost

+ Set to cost of index start-up processing +

*indexTotalCost

+ Set to total cost of index processing +

*indexSelectivity

+ Set to index selectivity +

*indexCorrelation

+ Set to correlation coefficient between index scan order and + underlying table's order +

*indexPages

+ Set to number of index leaf pages +

+

+ Note that cost estimate functions must be written in C, not in SQL or + any available procedural language, because they must access internal + data structures of the planner/optimizer. +

+ The index access costs should be computed using the parameters used by + src/backend/optimizer/path/costsize.c: a sequential + disk block fetch has cost seq_page_cost, a nonsequential fetch + has cost random_page_cost, and the cost of processing one index + row should usually be taken as cpu_index_tuple_cost. In + addition, an appropriate multiple of cpu_operator_cost should + be charged for any comparison operators invoked during index processing + (especially evaluation of the indexquals themselves). +

+ The access costs should include all disk and CPU costs associated with + scanning the index itself, but not the costs of retrieving or + processing the parent-table rows that are identified by the index. +

+ The start-up cost is the part of the total scan cost that + must be expended before we can begin to fetch the first row. For most + indexes this can be taken as zero, but an index type with a high start-up + cost might want to set it nonzero. +

+ The indexSelectivity should be set to the estimated fraction of the parent + table rows that will be retrieved during the index scan. In the case + of a lossy query, this will typically be higher than the fraction of + rows that actually pass the given qual conditions. +

+ The indexCorrelation should be set to the correlation (ranging between + -1.0 and 1.0) between the index order and the table order. This is used + to adjust the estimate for the cost of fetching rows from the parent + table. +

+ The indexPages should be set to the number of leaf pages. + This is used to estimate the number of workers for parallel index scan. +

+ When loop_count is greater than one, the returned numbers + should be averages expected for any one scan of the index. +

Cost Estimation

+ A typical cost estimator will proceed as follows: +

  1. + Estimate and return the fraction of parent-table rows that will be visited + based on the given qual conditions. In the absence of any index-type-specific + knowledge, use the standard optimizer function clauselist_selectivity(): + +

    +*indexSelectivity = clauselist_selectivity(root, path->indexquals,
    +                                           path->indexinfo->rel->relid,
    +                                           JOIN_INNER, NULL);
    +

    +

  2. + Estimate the number of index rows that will be visited during the + scan. For many index types this is the same as indexSelectivity times + the number of rows in the index, but it might be more. (Note that the + index's size in pages and rows is available from the + path->indexinfo struct.) +

  3. + Estimate the number of index pages that will be retrieved during the scan. + This might be just indexSelectivity times the index's size in pages. +

  4. + Compute the index access cost. A generic estimator might do this: + +

    +/*
    + * Our generic assumption is that the index pages will be read
    + * sequentially, so they cost seq_page_cost each, not random_page_cost.
    + * Also, we charge for evaluation of the indexquals at each index row.
    + * All the costs are assumed to be paid incrementally during the scan.
    + */
    +cost_qual_eval(&index_qual_cost, path->indexquals, root);
    +*indexStartupCost = index_qual_cost.startup;
    +*indexTotalCost = seq_page_cost * numIndexPages +
    +    (cpu_index_tuple_cost + index_qual_cost.per_tuple) * numIndexTuples;
    +

    + + However, the above does not account for amortization of index reads + across repeated index scans. +

  5. + Estimate the index correlation. For a simple ordered index on a single + field, this can be retrieved from pg_statistic. If the correlation + is not known, the conservative estimate is zero (no correlation). +

+ Examples of cost estimator functions can be found in + src/backend/utils/adt/selfuncs.c. +

\ No newline at end of file -- cgit v1.2.3