summaryrefslogtreecommitdiffstats
path: root/doc/src/sgml/ref/analyze.sgml
diff options
context:
space:
mode:
Diffstat (limited to 'doc/src/sgml/ref/analyze.sgml')
-rw-r--r--doc/src/sgml/ref/analyze.sgml329
1 files changed, 329 insertions, 0 deletions
diff --git a/doc/src/sgml/ref/analyze.sgml b/doc/src/sgml/ref/analyze.sgml
new file mode 100644
index 0000000..b968f74
--- /dev/null
+++ b/doc/src/sgml/ref/analyze.sgml
@@ -0,0 +1,329 @@
+<!--
+doc/src/sgml/ref/analyze.sgml
+PostgreSQL documentation
+-->
+
+<refentry id="sql-analyze">
+ <indexterm zone="sql-analyze">
+ <primary>ANALYZE</primary>
+ </indexterm>
+
+ <refmeta>
+ <refentrytitle>ANALYZE</refentrytitle>
+ <manvolnum>7</manvolnum>
+ <refmiscinfo>SQL - Language Statements</refmiscinfo>
+ </refmeta>
+
+ <refnamediv>
+ <refname>ANALYZE</refname>
+ <refpurpose>collect statistics about a database</refpurpose>
+ </refnamediv>
+
+ <refsynopsisdiv>
+<synopsis>
+ANALYZE [ ( <replaceable class="parameter">option</replaceable> [, ...] ) ] [ <replaceable class="parameter">table_and_columns</replaceable> [, ...] ]
+ANALYZE [ VERBOSE ] [ <replaceable class="parameter">table_and_columns</replaceable> [, ...] ]
+
+<phrase>where <replaceable class="parameter">option</replaceable> can be one of:</phrase>
+
+ VERBOSE [ <replaceable class="parameter">boolean</replaceable> ]
+ SKIP_LOCKED [ <replaceable class="parameter">boolean</replaceable> ]
+
+<phrase>and <replaceable class="parameter">table_and_columns</replaceable> is:</phrase>
+
+ <replaceable class="parameter">table_name</replaceable> [ ( <replaceable class="parameter">column_name</replaceable> [, ...] ) ]
+</synopsis>
+ </refsynopsisdiv>
+
+ <refsect1>
+ <title>Description</title>
+
+ <para>
+ <command>ANALYZE</command> collects statistics about the contents
+ of tables in the database, and stores the results in the <link
+ linkend="catalog-pg-statistic"><structname>pg_statistic</structname></link>
+ system catalog. Subsequently, the query planner uses these
+ statistics to help determine the most efficient execution plans for
+ queries.
+ </para>
+
+ <para>
+ Without a <replaceable class="parameter">table_and_columns</replaceable>
+ list, <command>ANALYZE</command> processes every table and materialized view
+ in the current database that the current user has permission to analyze.
+ With a list, <command>ANALYZE</command> processes only those table(s).
+ It is further possible to give a list of column names for a table,
+ in which case only the statistics for those columns are collected.
+ </para>
+
+ <para>
+ When the option list is surrounded by parentheses, the options can be
+ written in any order. The parenthesized syntax was added in
+ <productname>PostgreSQL</productname> 11; the unparenthesized syntax
+ is deprecated.
+ </para>
+ </refsect1>
+
+ <refsect1>
+ <title>Parameters</title>
+
+ <variablelist>
+ <varlistentry>
+ <term><literal>VERBOSE</literal></term>
+ <listitem>
+ <para>
+ Enables display of progress messages.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><literal>SKIP_LOCKED</literal></term>
+ <listitem>
+ <para>
+ Specifies that <command>ANALYZE</command> should not wait for any
+ conflicting locks to be released when beginning work on a relation:
+ if a relation cannot be locked immediately without waiting, the relation
+ is skipped. Note that even with this option, <command>ANALYZE</command>
+ may still block when opening the relation's indexes or when acquiring
+ sample rows from partitions, table inheritance children, and some
+ types of foreign tables. Also, while <command>ANALYZE</command>
+ ordinarily processes all partitions of specified partitioned tables,
+ this option will cause <command>ANALYZE</command> to skip all
+ partitions if there is a conflicting lock on the partitioned table.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><replaceable class="parameter">boolean</replaceable></term>
+ <listitem>
+ <para>
+ Specifies whether the selected option should be turned on or off.
+ You can write <literal>TRUE</literal>, <literal>ON</literal>, or
+ <literal>1</literal> to enable the option, and <literal>FALSE</literal>,
+ <literal>OFF</literal>, or <literal>0</literal> to disable it. The
+ <replaceable class="parameter">boolean</replaceable> value can also
+ be omitted, in which case <literal>TRUE</literal> is assumed.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><replaceable class="parameter">table_name</replaceable></term>
+ <listitem>
+ <para>
+ The name (possibly schema-qualified) of a specific table to
+ analyze. If omitted, all regular tables, partitioned tables, and
+ materialized views in the current database are analyzed (but not
+ foreign tables). If the specified table is a partitioned table, both the
+ inheritance statistics of the partitioned table as a whole and
+ statistics of the individual partitions are updated.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><replaceable class="parameter">column_name</replaceable></term>
+ <listitem>
+ <para>
+ The name of a specific column to analyze. Defaults to all columns.
+ </para>
+ </listitem>
+ </varlistentry>
+ </variablelist>
+ </refsect1>
+
+ <refsect1>
+ <title>Outputs</title>
+
+ <para>
+ When <literal>VERBOSE</literal> is specified, <command>ANALYZE</command> emits
+ progress messages to indicate which table is currently being
+ processed. Various statistics about the tables are printed as well.
+ </para>
+ </refsect1>
+
+ <refsect1>
+ <title>Notes</title>
+
+ <para>
+ To analyze a table, one must ordinarily be the table's owner or a
+ superuser. However, database owners are allowed to
+ analyze all tables in their databases, except shared catalogs.
+ (The restriction for shared catalogs means that a true database-wide
+ <command>ANALYZE</command> can only be performed by a superuser.)
+ <command>ANALYZE</command> will skip over any tables that the calling user
+ does not have permission to analyze.
+ </para>
+
+ <para>
+ Foreign tables are analyzed only when explicitly selected. Not all
+ foreign data wrappers support <command>ANALYZE</command>. If the table's
+ wrapper does not support <command>ANALYZE</command>, the command prints a
+ warning and does nothing.
+ </para>
+
+ <para>
+ In the default <productname>PostgreSQL</productname> configuration,
+ the autovacuum daemon (see <xref linkend="autovacuum"/>)
+ takes care of automatic analyzing of tables when they are first loaded
+ with data, and as they change throughout regular operation.
+ When autovacuum is disabled,
+ it is a good idea to run <command>ANALYZE</command> periodically, or
+ just after making major changes in the contents of a table. Accurate
+ statistics will help the planner to choose the most appropriate query
+ plan, and thereby improve the speed of query processing. A common
+ strategy for read-mostly databases is to run <link linkend="sql-vacuum"><command>VACUUM</command></link>
+ and <command>ANALYZE</command> once a day during a low-usage time of day.
+ (This will not be sufficient if there is heavy update activity.)
+ </para>
+
+ <para>
+ <command>ANALYZE</command>
+ requires only a read lock on the target table, so it can run in
+ parallel with other activity on the table.
+ </para>
+
+ <para>
+ The statistics collected by <command>ANALYZE</command> usually
+ include a list of some of the most common values in each column and
+ a histogram showing the approximate data distribution in each
+ column. One or both of these can be omitted if
+ <command>ANALYZE</command> deems them uninteresting (for example,
+ in a unique-key column, there are no common values) or if the
+ column data type does not support the appropriate operators. There
+ is more information about the statistics in <xref
+ linkend="maintenance"/>.
+ </para>
+
+ <para>
+ For large tables, <command>ANALYZE</command> takes a random sample
+ of the table contents, rather than examining every row. This
+ allows even very large tables to be analyzed in a small amount of
+ time. Note, however, that the statistics are only approximate, and
+ will change slightly each time <command>ANALYZE</command> is run,
+ even if the actual table contents did not change. This might result
+ in small changes in the planner's estimated costs shown by
+ <link linkend="sql-explain"><command>EXPLAIN</command></link>.
+ In rare situations, this non-determinism will cause the planner's
+ choices of query plans to change after <command>ANALYZE</command> is run.
+ To avoid this, raise the amount of statistics collected by
+ <command>ANALYZE</command>, as described below.
+ </para>
+
+ <para>
+ The extent of analysis can be controlled by adjusting the
+ <xref linkend="guc-default-statistics-target"/> configuration variable, or
+ on a column-by-column basis by setting the per-column statistics
+ target with <link linkend="sql-altertable"><command>ALTER TABLE ... ALTER COLUMN ... SET
+ STATISTICS</command></link>.
+ The target value sets the
+ maximum number of entries in the most-common-value list and the
+ maximum number of bins in the histogram. The default target value
+ is 100, but this can be adjusted up or down to trade off accuracy of
+ planner estimates against the time taken for
+ <command>ANALYZE</command> and the amount of space occupied in
+ <literal>pg_statistic</literal>. In particular, setting the
+ statistics target to zero disables collection of statistics for
+ that column. It might be useful to do that for columns that are
+ never used as part of the <literal>WHERE</literal>, <literal>GROUP BY</literal>,
+ or <literal>ORDER BY</literal> clauses of queries, since the planner will
+ have no use for statistics on such columns.
+ </para>
+
+ <para>
+ The largest statistics target among the columns being analyzed determines
+ the number of table rows sampled to prepare the statistics. Increasing
+ the target causes a proportional increase in the time and space needed
+ to do <command>ANALYZE</command>.
+ </para>
+
+ <para>
+ One of the values estimated by <command>ANALYZE</command> is the number of
+ distinct values that appear in each column. Because only a subset of the
+ rows are examined, this estimate can sometimes be quite inaccurate, even
+ with the largest possible statistics target. If this inaccuracy leads to
+ bad query plans, a more accurate value can be determined manually and then
+ installed with
+ <link linkend="sql-altertable"><command>ALTER TABLE ... ALTER COLUMN ... SET (n_distinct = ...)</command></link>.
+ </para>
+
+ <para>
+ If the table being analyzed has one or more children,
+ <command>ANALYZE</command> will gather statistics twice: once on the
+ rows of the parent table only, and a second time on the rows of the
+ parent table with all of its children. This second set of statistics
+ is needed when planning queries that traverse the entire inheritance
+ tree. The autovacuum daemon, however, will only consider inserts or
+ updates on the parent table itself when deciding whether to trigger an
+ automatic analyze for that table. If that table is rarely inserted into
+ or updated, the inheritance statistics will not be up to date unless you
+ run <command>ANALYZE</command> manually.
+ </para>
+
+ <para>
+ For partitioned tables, <command>ANALYZE</command> gathers statistics by
+ sampling rows from all partitions; in addition, it will recurse into each
+ partition and update its statistics. Each leaf partition is analyzed only
+ once, even with multi-level partitioning. No statistics are collected for
+ only the parent table (without data from its partitions), because with
+ partitioning it's guaranteed to be empty.
+ </para>
+
+ <para>
+ By contrast, if the table being analyzed has inheritance children,
+ <command>ANALYZE</command> gathers two sets of statistics: one on the rows
+ of the parent table only, and a second including rows of both the parent
+ table and all of its children. This second set of statistics is needed when
+ planning queries that process the inheritance tree as a whole. The child
+ tables themselves are not individually analyzed in this case.
+ </para>
+
+ <para>
+ The autovacuum daemon does not process partitioned tables, nor does it
+ process inheritance parents if only the children are ever modified.
+ It is usually necessary to periodically run a manual
+ <command>ANALYZE</command> to keep the statistics of the table hierarchy
+ up to date.
+ </para>
+
+ <para>
+ If any child tables or partitions are foreign tables whose foreign
+ data wrappers do not support <command>ANALYZE</command>, those tables are
+ ignored while gathering inheritance statistics.
+ </para>
+
+ <para>
+ If the table being analyzed is completely empty, <command>ANALYZE</command>
+ will not record new statistics for that table. Any existing statistics
+ will be retained.
+ </para>
+
+ <para>
+ Each backend running <command>ANALYZE</command> will report its progress
+ in the <structname>pg_stat_progress_analyze</structname> view. See
+ <xref linkend="analyze-progress-reporting"/> for details.
+ </para>
+ </refsect1>
+
+ <refsect1>
+ <title>Compatibility</title>
+
+ <para>
+ There is no <command>ANALYZE</command> statement in the SQL standard.
+ </para>
+ </refsect1>
+
+ <refsect1>
+ <title>See Also</title>
+
+ <simplelist type="inline">
+ <member><xref linkend="sql-vacuum"/></member>
+ <member><xref linkend="app-vacuumdb"/></member>
+ <member><xref linkend="runtime-config-resource-vacuum-cost"/></member>
+ <member><xref linkend="autovacuum"/></member>
+ <member><xref linkend="analyze-progress-reporting"/></member>
+ </simplelist>
+ </refsect1>
+</refentry>