diff options
Diffstat (limited to 'doc/src/sgml/ref/analyze.sgml')
-rw-r--r-- | doc/src/sgml/ref/analyze.sgml | 329 |
1 files changed, 329 insertions, 0 deletions
diff --git a/doc/src/sgml/ref/analyze.sgml b/doc/src/sgml/ref/analyze.sgml new file mode 100644 index 0000000..b968f74 --- /dev/null +++ b/doc/src/sgml/ref/analyze.sgml @@ -0,0 +1,329 @@ +<!-- +doc/src/sgml/ref/analyze.sgml +PostgreSQL documentation +--> + +<refentry id="sql-analyze"> + <indexterm zone="sql-analyze"> + <primary>ANALYZE</primary> + </indexterm> + + <refmeta> + <refentrytitle>ANALYZE</refentrytitle> + <manvolnum>7</manvolnum> + <refmiscinfo>SQL - Language Statements</refmiscinfo> + </refmeta> + + <refnamediv> + <refname>ANALYZE</refname> + <refpurpose>collect statistics about a database</refpurpose> + </refnamediv> + + <refsynopsisdiv> +<synopsis> +ANALYZE [ ( <replaceable class="parameter">option</replaceable> [, ...] ) ] [ <replaceable class="parameter">table_and_columns</replaceable> [, ...] ] +ANALYZE [ VERBOSE ] [ <replaceable class="parameter">table_and_columns</replaceable> [, ...] ] + +<phrase>where <replaceable class="parameter">option</replaceable> can be one of:</phrase> + + VERBOSE [ <replaceable class="parameter">boolean</replaceable> ] + SKIP_LOCKED [ <replaceable class="parameter">boolean</replaceable> ] + +<phrase>and <replaceable class="parameter">table_and_columns</replaceable> is:</phrase> + + <replaceable class="parameter">table_name</replaceable> [ ( <replaceable class="parameter">column_name</replaceable> [, ...] ) ] +</synopsis> + </refsynopsisdiv> + + <refsect1> + <title>Description</title> + + <para> + <command>ANALYZE</command> collects statistics about the contents + of tables in the database, and stores the results in the <link + linkend="catalog-pg-statistic"><structname>pg_statistic</structname></link> + system catalog. Subsequently, the query planner uses these + statistics to help determine the most efficient execution plans for + queries. + </para> + + <para> + Without a <replaceable class="parameter">table_and_columns</replaceable> + list, <command>ANALYZE</command> processes every table and materialized view + in the current database that the current user has permission to analyze. + With a list, <command>ANALYZE</command> processes only those table(s). + It is further possible to give a list of column names for a table, + in which case only the statistics for those columns are collected. + </para> + + <para> + When the option list is surrounded by parentheses, the options can be + written in any order. The parenthesized syntax was added in + <productname>PostgreSQL</productname> 11; the unparenthesized syntax + is deprecated. + </para> + </refsect1> + + <refsect1> + <title>Parameters</title> + + <variablelist> + <varlistentry> + <term><literal>VERBOSE</literal></term> + <listitem> + <para> + Enables display of progress messages. + </para> + </listitem> + </varlistentry> + + <varlistentry> + <term><literal>SKIP_LOCKED</literal></term> + <listitem> + <para> + Specifies that <command>ANALYZE</command> should not wait for any + conflicting locks to be released when beginning work on a relation: + if a relation cannot be locked immediately without waiting, the relation + is skipped. Note that even with this option, <command>ANALYZE</command> + may still block when opening the relation's indexes or when acquiring + sample rows from partitions, table inheritance children, and some + types of foreign tables. Also, while <command>ANALYZE</command> + ordinarily processes all partitions of specified partitioned tables, + this option will cause <command>ANALYZE</command> to skip all + partitions if there is a conflicting lock on the partitioned table. + </para> + </listitem> + </varlistentry> + + <varlistentry> + <term><replaceable class="parameter">boolean</replaceable></term> + <listitem> + <para> + Specifies whether the selected option should be turned on or off. + You can write <literal>TRUE</literal>, <literal>ON</literal>, or + <literal>1</literal> to enable the option, and <literal>FALSE</literal>, + <literal>OFF</literal>, or <literal>0</literal> to disable it. The + <replaceable class="parameter">boolean</replaceable> value can also + be omitted, in which case <literal>TRUE</literal> is assumed. + </para> + </listitem> + </varlistentry> + + <varlistentry> + <term><replaceable class="parameter">table_name</replaceable></term> + <listitem> + <para> + The name (possibly schema-qualified) of a specific table to + analyze. If omitted, all regular tables, partitioned tables, and + materialized views in the current database are analyzed (but not + foreign tables). If the specified table is a partitioned table, both the + inheritance statistics of the partitioned table as a whole and + statistics of the individual partitions are updated. + </para> + </listitem> + </varlistentry> + + <varlistentry> + <term><replaceable class="parameter">column_name</replaceable></term> + <listitem> + <para> + The name of a specific column to analyze. Defaults to all columns. + </para> + </listitem> + </varlistentry> + </variablelist> + </refsect1> + + <refsect1> + <title>Outputs</title> + + <para> + When <literal>VERBOSE</literal> is specified, <command>ANALYZE</command> emits + progress messages to indicate which table is currently being + processed. Various statistics about the tables are printed as well. + </para> + </refsect1> + + <refsect1> + <title>Notes</title> + + <para> + To analyze a table, one must ordinarily be the table's owner or a + superuser. However, database owners are allowed to + analyze all tables in their databases, except shared catalogs. + (The restriction for shared catalogs means that a true database-wide + <command>ANALYZE</command> can only be performed by a superuser.) + <command>ANALYZE</command> will skip over any tables that the calling user + does not have permission to analyze. + </para> + + <para> + Foreign tables are analyzed only when explicitly selected. Not all + foreign data wrappers support <command>ANALYZE</command>. If the table's + wrapper does not support <command>ANALYZE</command>, the command prints a + warning and does nothing. + </para> + + <para> + In the default <productname>PostgreSQL</productname> configuration, + the autovacuum daemon (see <xref linkend="autovacuum"/>) + takes care of automatic analyzing of tables when they are first loaded + with data, and as they change throughout regular operation. + When autovacuum is disabled, + it is a good idea to run <command>ANALYZE</command> periodically, or + just after making major changes in the contents of a table. Accurate + statistics will help the planner to choose the most appropriate query + plan, and thereby improve the speed of query processing. A common + strategy for read-mostly databases is to run <link linkend="sql-vacuum"><command>VACUUM</command></link> + and <command>ANALYZE</command> once a day during a low-usage time of day. + (This will not be sufficient if there is heavy update activity.) + </para> + + <para> + <command>ANALYZE</command> + requires only a read lock on the target table, so it can run in + parallel with other activity on the table. + </para> + + <para> + The statistics collected by <command>ANALYZE</command> usually + include a list of some of the most common values in each column and + a histogram showing the approximate data distribution in each + column. One or both of these can be omitted if + <command>ANALYZE</command> deems them uninteresting (for example, + in a unique-key column, there are no common values) or if the + column data type does not support the appropriate operators. There + is more information about the statistics in <xref + linkend="maintenance"/>. + </para> + + <para> + For large tables, <command>ANALYZE</command> takes a random sample + of the table contents, rather than examining every row. This + allows even very large tables to be analyzed in a small amount of + time. Note, however, that the statistics are only approximate, and + will change slightly each time <command>ANALYZE</command> is run, + even if the actual table contents did not change. This might result + in small changes in the planner's estimated costs shown by + <link linkend="sql-explain"><command>EXPLAIN</command></link>. + In rare situations, this non-determinism will cause the planner's + choices of query plans to change after <command>ANALYZE</command> is run. + To avoid this, raise the amount of statistics collected by + <command>ANALYZE</command>, as described below. + </para> + + <para> + The extent of analysis can be controlled by adjusting the + <xref linkend="guc-default-statistics-target"/> configuration variable, or + on a column-by-column basis by setting the per-column statistics + target with <link linkend="sql-altertable"><command>ALTER TABLE ... ALTER COLUMN ... SET + STATISTICS</command></link>. + The target value sets the + maximum number of entries in the most-common-value list and the + maximum number of bins in the histogram. The default target value + is 100, but this can be adjusted up or down to trade off accuracy of + planner estimates against the time taken for + <command>ANALYZE</command> and the amount of space occupied in + <literal>pg_statistic</literal>. In particular, setting the + statistics target to zero disables collection of statistics for + that column. It might be useful to do that for columns that are + never used as part of the <literal>WHERE</literal>, <literal>GROUP BY</literal>, + or <literal>ORDER BY</literal> clauses of queries, since the planner will + have no use for statistics on such columns. + </para> + + <para> + The largest statistics target among the columns being analyzed determines + the number of table rows sampled to prepare the statistics. Increasing + the target causes a proportional increase in the time and space needed + to do <command>ANALYZE</command>. + </para> + + <para> + One of the values estimated by <command>ANALYZE</command> is the number of + distinct values that appear in each column. Because only a subset of the + rows are examined, this estimate can sometimes be quite inaccurate, even + with the largest possible statistics target. If this inaccuracy leads to + bad query plans, a more accurate value can be determined manually and then + installed with + <link linkend="sql-altertable"><command>ALTER TABLE ... ALTER COLUMN ... SET (n_distinct = ...)</command></link>. + </para> + + <para> + If the table being analyzed has one or more children, + <command>ANALYZE</command> will gather statistics twice: once on the + rows of the parent table only, and a second time on the rows of the + parent table with all of its children. This second set of statistics + is needed when planning queries that traverse the entire inheritance + tree. The autovacuum daemon, however, will only consider inserts or + updates on the parent table itself when deciding whether to trigger an + automatic analyze for that table. If that table is rarely inserted into + or updated, the inheritance statistics will not be up to date unless you + run <command>ANALYZE</command> manually. + </para> + + <para> + For partitioned tables, <command>ANALYZE</command> gathers statistics by + sampling rows from all partitions; in addition, it will recurse into each + partition and update its statistics. Each leaf partition is analyzed only + once, even with multi-level partitioning. No statistics are collected for + only the parent table (without data from its partitions), because with + partitioning it's guaranteed to be empty. + </para> + + <para> + By contrast, if the table being analyzed has inheritance children, + <command>ANALYZE</command> gathers two sets of statistics: one on the rows + of the parent table only, and a second including rows of both the parent + table and all of its children. This second set of statistics is needed when + planning queries that process the inheritance tree as a whole. The child + tables themselves are not individually analyzed in this case. + </para> + + <para> + The autovacuum daemon does not process partitioned tables, nor does it + process inheritance parents if only the children are ever modified. + It is usually necessary to periodically run a manual + <command>ANALYZE</command> to keep the statistics of the table hierarchy + up to date. + </para> + + <para> + If any child tables or partitions are foreign tables whose foreign + data wrappers do not support <command>ANALYZE</command>, those tables are + ignored while gathering inheritance statistics. + </para> + + <para> + If the table being analyzed is completely empty, <command>ANALYZE</command> + will not record new statistics for that table. Any existing statistics + will be retained. + </para> + + <para> + Each backend running <command>ANALYZE</command> will report its progress + in the <structname>pg_stat_progress_analyze</structname> view. See + <xref linkend="analyze-progress-reporting"/> for details. + </para> + </refsect1> + + <refsect1> + <title>Compatibility</title> + + <para> + There is no <command>ANALYZE</command> statement in the SQL standard. + </para> + </refsect1> + + <refsect1> + <title>See Also</title> + + <simplelist type="inline"> + <member><xref linkend="sql-vacuum"/></member> + <member><xref linkend="app-vacuumdb"/></member> + <member><xref linkend="runtime-config-resource-vacuum-cost"/></member> + <member><xref linkend="autovacuum"/></member> + <member><xref linkend="analyze-progress-reporting"/></member> + </simplelist> + </refsect1> +</refentry> |