summaryrefslogtreecommitdiffstats
path: root/www/lang_analyze.html
diff options
context:
space:
mode:
Diffstat (limited to '')
-rw-r--r--www/lang_analyze.html204
1 files changed, 168 insertions, 36 deletions
diff --git a/www/lang_analyze.html b/www/lang_analyze.html
index 7f65a7f..f841633 100644
--- a/www/lang_analyze.html
+++ b/www/lang_analyze.html
@@ -120,6 +120,34 @@ antiRobotDefense();
<div class="fancy_title">
ANALYZE
</div>
+<div class="fancy_toc">
+<a onclick="toggle_toc()">
+<span class="fancy_toc_mark" id="toc_mk">&#x25ba;</span>
+Table Of Contents
+</a>
+<div id="toc_sub"><div class="fancy-toc1"><a href="#overview">1. Overview</a></div>
+<div class="fancy-toc1"><a href="#recommended_usage_patterns">2. Recommended usage patterns</a></div>
+<div class="fancy-toc2"><a href="#periodically_run_pragma_optimize_">2.1. Periodically run "PRAGMA optimize"</a></div>
+<div class="fancy-toc2"><a href="#fixed_results_of_analyze">2.2. Fixed results of ANALYZE</a></div>
+<div class="fancy-toc1"><a href="#details">3. Details</a></div>
+<div class="fancy-toc1"><a href="#automatically_running_analyze">4. Automatically Running ANALYZE</a></div>
+<div class="fancy-toc1"><a href="#approximate_analyze_for_large_databases">5. Approximate ANALYZE For Large Databases</a></div>
+<div class="fancy-toc2"><a href="#limitations_of_approximate_analyze">5.1. Limitations of approximate ANALYZE</a></div>
+</div>
+</div>
+<script>
+function toggle_toc(){
+var sub = document.getElementById("toc_sub")
+var mk = document.getElementById("toc_mk")
+if( sub.style.display!="block" ){
+sub.style.display = "block";
+mk.innerHTML = "&#x25bc;";
+} else {
+sub.style.display = "none";
+mk.innerHTML = "&#x25ba;";
+}
+}
+</script>
</div>
@@ -128,8 +156,8 @@ ANALYZE
<h1 id="overview"><span>1. </span>Overview</h1>
<p><b><a href="syntax/analyze-stmt.html">analyze-stmt:</a></b>
-<button id='x9d08cb93' onclick='hideorshow("x9d08cb93","x6f8054a5")'>hide</button></p>
- <div id='x6f8054a5' class='imgcontainer'>
+<button id='x3115000d' onclick='hideorshow("x3115000d","x4b619059")'>hide</button></p>
+ <div id='x4b619059' class='imgcontainer'>
<div style="max-width:654px"><svg xmlns='http://www.w3.org/2000/svg' class="pikchr" viewBox="0 0 654.557 140.4">
<circle cx="5" cy="17" r="3.6" style="fill:none;stroke-width:2.16;stroke:rgb(0,0,0);" />
<polygon points="32,17 20,21 20,12" style="fill:rgb(0,0,0)"/>
@@ -181,7 +209,7 @@ ANALYZE
indices and stores the collected information
in <a href="fileformat2.html#intschema">internal tables</a> of the database where the query optimizer can
access the information and use it to help make better query planning choices.
-If no arguments are given, all attached databases are
+If no arguments are given, the main database and all attached databases are
analyzed. If a schema name is given as the argument, then all tables
and indices in that one database are analyzed.
If the argument is a table name, then only that table and the
@@ -190,24 +218,121 @@ is an index name, then only that one index is analyzed.</p>
<a name="req"></a>
-<h2 id="recommended_usage_pattern"><span>1.1. </span>Recommended usage pattern</h2>
+<h1 id="recommended_usage_patterns"><span>2. </span>Recommended usage patterns</h1>
+
+<p> The use of ANALYZE is never required. However, if an application
+makes complex queries that have many possible query plans, the query
+planner will be better able to pick the best plan if ANALYZE has
+been run. This can result it significant performance improvements for
+some queries.
+
+</p><p> Two recommended approaches for when and how to run ANALYZE are
+described in the next subsections, in order of preference.
+
+<a name="pragopt"></a>
+
+</p><h2 id="periodically_run_pragma_optimize_"><span>2.1. </span>Periodically run "PRAGMA optimize"</h2>
+
+<p>The <a href="pragma.html#pragma_optimize">PRAGMA optimize</a> command will automatically run ANALYZE when
+needed. Suggested use:
+
+</p><ol>
+<li><p>
+Applications with short-lived database connections should run
+"PRAGMA optimize;" once, just prior to closing each database connection.
+
+</p></li><li><p>
+Applications that use long-lived database connections should run
+"PRAGMA optimize=0x10002;" when the connection is first opened, and then
+also run "PRAGMA optimize;" periodically, perhaps once per day, or more if
+the database is evolving rapidly.
+
+</p></li><li><p>All applications should run "PRAGMA optimize;" after a schema change,
+especially after one or more <a href="lang_createindex.html">CREATE INDEX</a> statements.
+</p></li></ol>
+
+<p>
+
+</p><p>The <a href="pragma.html#pragma_optimize">PRAGMA optimize</a> command is usually a no-op but it will occasionally
+run one or more ANALYZE subcommands on individual tables of the database
+if doing so will be useful to the query planner.
+Since SQLite version 3.46.0 (2024-05-23), the "PRAGMA optimize" command
+automatically limits the scope of ANALYZE subcommands so that
+the overall "PRAGMA optimize" command completes quickly even on enormous
+databases. There is no need to use <a href="pragma.html#pragma_analysis_limit">PRAGMA analysis_limit</a>. This is the
+recommended way of running ANALYZE moving forward.
+
+</p><p>The <a href="pragma.html#pragma_optimize">PRAGMA optimize</a> command will normally only consider running ANALYZE on
+tables that have been previously queried by the same database connection or
+that do not have entries in the <a href="fileformat2.html#stat1tab">sqlite_stat1</a> table.
+However, if the 0x10000 bit is added to the argument, PRAGMA optimize will
+examine all tables to see if they can benefit from ANALYZE, not just those
+that have been recently queried.
+There is no query history when a database connection first opens, and
+that is why adding the 0x10000 bit is recommended when running PRAGMA optimize
+on a fresh database connection.
+
+</p><p>See the <a href="lang_analyze.html#autoanalyze">Automatically Running ANALYZE</a> and
+<a href="lang_analyze.html#approx">Approximate ANALYZE For Large Databases</a> sections below for additional
+information.
+
+<a name="statanal"></a>
+
+</p><h2 id="fixed_results_of_analyze"><span>2.2. </span>Fixed results of ANALYZE</h2>
+
+<p>Running ANALYZE can cause SQLite to choose different query plans
+for subsequent queries. This is almost always a positive thing, as the
+query plans chosen after ANALYZE will in nearly every case be better than
+the query plans picked before ANALYZE. That is the whole point of ANALYZE.
+But there can be no proof of running ANALYZE will always be beneficial.
+One can construct pathological cases where running
+ANALYZE could make some subsequent queries run slower.
+
+</p><p>Some developers prefer that once the design of an application is frozen,
+SQLite will always pick the same query plans as it did during
+development and testing.
+Then if a millions of copies of the application are shipped to customers,
+the developers are assured that all of those millions of copies are running
+the same query plans regardless of what data the individual customers insert
+into their particular databases. This can help in reproducing complaints
+of performance problems coming back from the field.
+
+</p><p>To achieve this objection, never run a full ANALYZE nor the
+"PRAGMA optimize" command in the application.
+Rather, only run ANALYZE during development, manually using the
+<a href="cli.html">command-line interface</a> or similar, on a test database
+that is similar in size and content to live databases. Then capture
+the result of this one-time ANALYZE using a script like the
+following:
+
+</p><div class="codeblock"><pre>.mode list
+SELECT
+ 'ANALYZE sqlite_schema;' ||
+ 'DELETE FROM sqlite_stat1;' ||
+ 'INSERT INTO sqlite_stat1(tbl,idx,stat)VALUES' ||
+ (SELECT group_concat(format('(%Q,%Q,%Q)',tbl,idx,stat),',')
+ FROM sqlite_stat1) ||
+ ';ANALYZE sqlite_schema;';
+</pre></div>
-<p>Applications with long-lived databases that use complex
-queries should consider running the following commands just prior
-to closing each database connection:
+<p>When creating a new instance of the database in deployed instances of
+the application, or perhaps every time the application is started up in
+the case of long-running applications, run the commands generated by
+script above. This will populate the <a href="fileformat2.html#stat1tab">sqlite_stat1</a> table exactly as
+it was during development and testing and ensure that the query plans
+selected in the field are same has those selected during testing in the
+lab. Maybe copy/paste the string generated by the script above into
+a static string constant named "zStat1Init" and then invoke:
-</p><div class="codeblock"><pre>PRAGMA analysis_limit=400;
-PRAGMA optimize;
+</p><div class="codeblock"><pre>sqlite3_exec(db, zStat1Init, 0, 0, 0);
</pre></div>
-<p>The <a href="pragma.html#pragma_optimize">optimize pragma</a> is usually a no-op but it will occasionally
-run ANALYZE if it seems like doing so will be useful to the query planner.
-The <a href="pragma.html#pragma_analysis_limit">analysis_limit pragma</a> limits the scope of any ANALYZE command that
-the <a href="pragma.html#pragma_optimize">optimize pragma</a> runs so that it does not consume too many CPU cycles.
-The constant "400" can be adjusted as needed. Values between 100 and
-1000 work well for most applications.
+<p>Perhaps also add "BEGIN;" at the start of the string constant and
+"COMMIT;" at the end, depending on the context in which the script is run.
-</p><h1 id="details"><span>2. </span>Details</h1>
+</p><p>See the <a href="queryplanner-ng.html#qpstab">query planner stability guarantee</a> for addition information.
+
+</p><h1 id="details"><span>3. </span>Details</h1>
<p> The default implementation stores all statistics in a single
table named "<a href="fileformat2.html#stat1tab">sqlite_stat1</a>".
@@ -238,7 +363,7 @@ ANALYZE command.
See "<a href="optoverview.html#manctrl">Manual Control Of Query Plans Using SQLITE_STAT Tables</a>" for
further information.</p>
-<p> Statistics gathered by ANALYZE are not automatically updated as
+<p> Statistics gathered by ANALYZE are <u>not</u> updated as
the content of the database changes. If the content of the database
changes significantly, or if the database schema changes, then one should
consider rerunning the ANALYZE command in order to update the statistics.</p>
@@ -252,12 +377,15 @@ can force the query planner to reread the statistics tables by running
<a name="autoanalyze"></a>
-<h1 id="automatically_running_analyze"><span>3. </span>Automatically Running ANALYZE</h1>
+<h1 id="automatically_running_analyze"><span>4. </span>Automatically Running ANALYZE</h1>
<p>The <a href="pragma.html#pragma_optimize">PRAGMA optimize</a> command will automatically run ANALYZE on individual
tables on an as-needed basis. The recommended practice is for applications
to invoke the <a href="pragma.html#pragma_optimize">PRAGMA optimize</a> statement just before closing each database
-connection.</p>
+connection. Or, if the application keeps a single database connection open
+for a long time, then it should run "PRAGMA optimize=0x10002" when the
+connection is first opened and run "PRAGMA optimize;" periodically thereafter,
+perhaps once per day or even once per hour.</p>
<p>Each SQLite <a href="c3ref/sqlite3.html">database connection</a> records cases when the query planner would
benefit from having accurate results of ANALYZE at hand. These records
@@ -274,22 +402,23 @@ is recommended that <a href="pragma.html#pragma_optimize">PRAGMA optimize</a> be
is closing and has thus had an opportunity to accumulate as much usage information
as possible. It is also reasonable to set a timer to run <a href="pragma.html#pragma_optimize">PRAGMA optimize</a>
every few hours, or every few days, for database connections that stay open
-for a long time.</p>
-
-<p>Applications that desire more control can run <a href="pragma.html#pragma_optimize">PRAGMA optimize(0x03)</a> to
-obtain a list of ANALYZE commands that SQLite thinks are appropriate to run,
-but without actually running those commands. If the returned set is
-non-empty, the application can then make a decision about whether or not
-to run the suggested ANALYZE commands, perhaps after prompting the user
-for guidance.</p>
+for a long time. When running <a href="pragma.html#pragma_optimize">PRAGMA optimize</a> immediately after a
+database connection is opened, one can add the 0x10000 bit to the bitmask
+argument (thus making the command read "PRAGMA optimize=0x10002") which
+causes all tables to be examined, even tables that have not been
+queried during the current connection.</p>
<p>The <a href="pragma.html#pragma_optimize">PRAGMA optimize</a> command was first introduced with
SQLite 3.18.0 (2017-03-28) and is a no-op for all prior releases
-of SQLite.</p>
+of SQLite. The <a href="pragma.html#pragma_optimize">PRAGMA optimize</a> command was significantly enhanced
+in SQLite 3.46.0 (2024-05-23) and the advice given in this
+documentation is based on those enhancements. Applications that
+use earlier versions of SQLite should consult the corresponding
+documentation for better advice on the best ways to use PRAGMA optimize.</p>
<a name="approx"></a>
-<h1 id="approximate_analyze_for_large_databases"><span>4. </span>Approximate ANALYZE For Large Databases</h1>
+<h1 id="approximate_analyze_for_large_databases"><span>5. </span>Approximate ANALYZE For Large Databases</h1>
<p>By default, ANALYZE does a full scan of every index. This can be slow for
large databases. So beginning with SQLite version 3.32.0 (2020-05-22), the
@@ -335,16 +464,19 @@ is noticeably inferior to a full-scan ANALYZE, but such cases are rare in
real-world problems.
</p><p>A good rule of thumb seems to be to always set "PRAGMA analysis_limit=N"
-for N between 100 and 1000 prior to running either "ANALYZE" or
-"<a href="pragma.html#pragma_optimize">PRAGMA optimize</a>". The results are not quite as precise, but they
-are precise enough, and the fact that the results are computed so much
-faster means that developers are more likely to compute them. An
-approximate ANALYZE is better than not running ANALYZE at all.
+for N between 100 and 1000 prior to running either "ANALYZE". It used
+to be that this was also recommended prior to running
+"<a href="pragma.html#pragma_optimize">PRAGMA optimize</a>", but since version 3.46.0 (2024-05-23) that
+happens automatically. The results are not quite as precise when using
+PRAGMA analysis_limit, but they are precise enough, and the fact that
+the results are computed so much faster means that developers are more
+likely to compute them. An approximate ANALYZE is better than not
+running ANALYZE at all.
-</p><h2 id="limitations_of_approximate_analyze"><span>4.1. </span>Limitations of approximate ANALYZE</h2>
+</p><h2 id="limitations_of_approximate_analyze"><span>5.1. </span>Limitations of approximate ANALYZE</h2>
<p>The content in the sqlite_stat4 table cannot be computed with
anything less than a full scan. Hence, if a non-zero analysis limit
is specified, the sqlite_stat4 table is not computed.
-</p><p align="center"><small><i>This page last modified on <a href="https://sqlite.org/docsrc/honeypot" id="mtimelink" data-href="https://sqlite.org/docsrc/finfo/pages/lang_analyze.in?m=79afc38f73">2024-02-21 13:43:47</a> UTC </small></i></p>
+</p><p align="center"><small><i>This page last modified on <a href="https://sqlite.org/docsrc/honeypot" id="mtimelink" data-href="https://sqlite.org/docsrc/finfo/pages/lang_analyze.in?m=4607c3507e">2024-05-05 15:23:53</a> UTC </small></i></p>