11.12. Examining Index Usage

11.12. Examining Index Usage
Prev	Up	Chapter 11. Indexes	Home	Next

+ Although indexes in PostgreSQL do not need + maintenance or tuning, it is still important to check + which indexes are actually used by the real-life query workload. + Examining index usage for an individual query is done with the + EXPLAIN + command; its application for this purpose is + illustrated in Section 14.1. + It is also possible to gather overall statistics about index usage + in a running server, as described in Section 28.2. +

+ It is difficult to formulate a general procedure for determining + which indexes to create. There are a number of typical cases that + have been shown in the examples throughout the previous sections. + A good deal of experimentation is often necessary. + The rest of this section gives some tips for that: +

+ Always run ANALYZE + first. This command + collects statistics about the distribution of the values in the + table. This information is required to estimate the number of rows + returned by a query, which is needed by the planner to assign + realistic costs to each possible query plan. In absence of any + real statistics, some default values are assumed, which are + almost certain to be inaccurate. Examining an application's + index usage without having run ANALYZE is + therefore a lost cause. + See Section 25.1.3 + and Section 25.1.6 for more information. +
+ Use real data for experimentation. Using test data for setting + up indexes will tell you what indexes you need for the test data, + but that is all. +
+ It is especially fatal to use very small test data sets. + While selecting 1000 out of 100000 rows could be a candidate for + an index, selecting 1 out of 100 rows will hardly be, because the + 100 rows probably fit within a single disk page, and there + is no plan that can beat sequentially fetching 1 disk page. +
+ Also be careful when making up test data, which is often + unavoidable when the application is not yet in production. + Values that are very similar, completely random, or inserted in + sorted order will skew the statistics away from the distribution + that real data would have. +
+ When indexes are not used, it can be useful for testing to force + their use. There are run-time parameters that can turn off + various plan types (see Section 20.7.1). + For instance, turning off sequential scans + (enable_seqscan) and nested-loop joins + (enable_nestloop), which are the most basic plans, + will force the system to use a different plan. If the system + still chooses a sequential scan or nested-loop join then there is + probably a more fundamental reason why the index is not being + used; for example, the query condition does not match the index. + (What kind of query can use what kind of index is explained in + the previous sections.) +
+ If forcing index usage does use the index, then there are two + possibilities: Either the system is right and using the index is + indeed not appropriate, or the cost estimates of the query plans + are not reflecting reality. So you should time your query with + and without indexes. The EXPLAIN ANALYZE + command can be useful here. +
+ If it turns out that the cost estimates are wrong, there are, + again, two possibilities. The total cost is computed from the + per-row costs of each plan node times the selectivity estimate of + the plan node. The costs estimated for the plan nodes can be adjusted + via run-time parameters (described in Section 20.7.2). + An inaccurate selectivity estimate is due to + insufficient statistics. It might be possible to improve this by + tuning the statistics-gathering parameters (see + ALTER TABLE). +
+ If you do not succeed in adjusting the costs to be more + appropriate, then you might have to resort to forcing index usage + explicitly. You might also want to contact the + PostgreSQL developers to examine the issue. +

Prev	Up	Next
11.11. Indexes and Collations	Home	Chapter 12. Full Text Search