Adding upstream version 16.2.upstream/16.2

Signed-off-by: Daniel Baumann <daniel.baumann@progress-linux.org>
author: Daniel Baumann <daniel.baumann@progress-linux.org> 2024-04-13 13:44:03 +0000
committer: Daniel Baumann <daniel.baumann@progress-linux.org> 2024-04-13 13:44:03 +0000
commit: 293913568e6a7a86fd1479e1cff8e2ecb58d6568 (patch)
tree: fc3b469a3ec5ab71b36ea97cc7aaddb838423a0c /doc/src/sgml/html/populate.html
parent: Initial commit. (diff)
download: postgresql-16-293913568e6a7a86fd1479e1cff8e2ecb58d6568.tar.xz
postgresql-16-293913568e6a7a86fd1479e1cff8e2ecb58d6568.zip
1 files changed, 206 insertions, 0 deletions
diff --git a/doc/src/sgml/html/populate.html b/doc/src/sgml/html/populate.html
new file mode 100644
index 0000000..b49980b
--- /dev/null
+++ b/doc/src/sgml/html/populate.html
@@ -0,0 +1,206 @@
+<?xml version="1.0" encoding="UTF-8" standalone="no"?>
+<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"><html xmlns="http://www.w3.org/1999/xhtml"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8" /><title>14.4. Populating a Database</title><link rel="stylesheet" type="text/css" href="stylesheet.css" /><link rev="made" href="pgsql-docs@lists.postgresql.org" /><meta name="generator" content="DocBook XSL Stylesheets Vsnapshot" /><link rel="prev" href="explicit-joins.html" title="14.3. Controlling the Planner with Explicit JOIN Clauses" /><link rel="next" href="non-durability.html" title="14.5. Non-Durable Settings" /></head><body id="docContent" class="container-fluid col-10"><div class="navheader"><table width="100%" summary="Navigation header"><tr><th colspan="5" align="center">14.4. Populating a Database</th></tr><tr><td width="10%" align="left"><a accesskey="p" href="explicit-joins.html" title="14.3. Controlling the Planner with Explicit JOIN Clauses">Prev</a> </td><td width="10%" align="left"><a accesskey="u" href="performance-tips.html" title="Chapter 14. Performance Tips">Up</a></td><th width="60%" align="center">Chapter 14. Performance Tips</th><td width="10%" align="right"><a accesskey="h" href="index.html" title="PostgreSQL 16.2 Documentation">Home</a></td><td width="10%" align="right"> <a accesskey="n" href="non-durability.html" title="14.5. Non-Durable Settings">Next</a></td></tr></table><hr /></div><div class="sect1" id="POPULATE"><div class="titlepage"><div><div><h2 class="title" style="clear: both">14.4. Populating a Database <a href="#POPULATE" class="id_link">#</a></h2></div></div></div><div class="toc"><dl class="toc"><dt><span class="sect2"><a href="populate.html#DISABLE-AUTOCOMMIT">14.4.1. Disable Autocommit</a></span></dt><dt><span class="sect2"><a href="populate.html#POPULATE-COPY-FROM">14.4.2. Use <code class="command">COPY</code></a></span></dt><dt><span class="sect2"><a href="populate.html#POPULATE-RM-INDEXES">14.4.3. Remove Indexes</a></span></dt><dt><span class="sect2"><a href="populate.html#POPULATE-RM-FKEYS">14.4.4. Remove Foreign Key Constraints</a></span></dt><dt><span class="sect2"><a href="populate.html#POPULATE-WORK-MEM">14.4.5. Increase <code class="varname">maintenance_work_mem</code></a></span></dt><dt><span class="sect2"><a href="populate.html#POPULATE-MAX-WAL-SIZE">14.4.6. Increase <code class="varname">max_wal_size</code></a></span></dt><dt><span class="sect2"><a href="populate.html#POPULATE-PITR">14.4.7. Disable WAL Archival and Streaming Replication</a></span></dt><dt><span class="sect2"><a href="populate.html#POPULATE-ANALYZE">14.4.8. Run <code class="command">ANALYZE</code> Afterwards</a></span></dt><dt><span class="sect2"><a href="populate.html#POPULATE-PG-DUMP">14.4.9. Some Notes about <span class="application">pg_dump</span></a></span></dt></dl></div><p>
+   One might need to insert a large amount of data when first populating
+   a database. This section contains some suggestions on how to make
+   this process as efficient as possible.
+  </p><div class="sect2" id="DISABLE-AUTOCOMMIT"><div class="titlepage"><div><div><h3 class="title">14.4.1. Disable Autocommit <a href="#DISABLE-AUTOCOMMIT" class="id_link">#</a></h3></div></div></div><a id="id-1.5.13.7.3.2" class="indexterm"></a><p>
+    When using multiple <code class="command">INSERT</code>s, turn off autocommit and just do
+    one commit at the end.  (In plain
+    SQL, this means issuing <code class="command">BEGIN</code> at the start and
+    <code class="command">COMMIT</code> at the end.  Some client libraries might
+    do this behind your back, in which case you need to make sure the
+    library does it when you want it done.)  If you allow each
+    insertion to be committed separately,
+    <span class="productname">PostgreSQL</span> is doing a lot of work for
+    each row that is added.  An additional benefit of doing all
+    insertions in one transaction is that if the insertion of one row
+    were to fail then the insertion of all rows inserted up to that
+    point would be rolled back, so you won't be stuck with partially
+    loaded data.
+   </p></div><div class="sect2" id="POPULATE-COPY-FROM"><div class="titlepage"><div><div><h3 class="title">14.4.2. Use <code class="command">COPY</code> <a href="#POPULATE-COPY-FROM" class="id_link">#</a></h3></div></div></div><p>
+    Use <a class="link" href="sql-copy.html" title="COPY"><code class="command">COPY</code></a> to load
+    all the rows in one command, instead of using a series of
+    <code class="command">INSERT</code> commands.  The <code class="command">COPY</code>
+    command is optimized for loading large numbers of rows; it is less
+    flexible than <code class="command">INSERT</code>, but incurs significantly
+    less overhead for large data loads. Since <code class="command">COPY</code>
+    is a single command, there is no need to disable autocommit if you
+    use this method to populate a table.
+   </p><p>
+    If you cannot use <code class="command">COPY</code>, it might help to use <a class="link" href="sql-prepare.html" title="PREPARE"><code class="command">PREPARE</code></a> to create a
+    prepared <code class="command">INSERT</code> statement, and then use
+    <code class="command">EXECUTE</code> as many times as required.  This avoids
+    some of the overhead of repeatedly parsing and planning
+    <code class="command">INSERT</code>. Different interfaces provide this facility
+    in different ways; look for <span class="quote">“<span class="quote">prepared statements</span>”</span> in the interface
+    documentation.
+   </p><p>
+    Note that loading a large number of rows using
+    <code class="command">COPY</code> is almost always faster than using
+    <code class="command">INSERT</code>, even if <code class="command">PREPARE</code> is used and
+    multiple insertions are batched into a single transaction.
+   </p><p>
+    <code class="command">COPY</code> is fastest when used within the same
+    transaction as an earlier <code class="command">CREATE TABLE</code> or
+    <code class="command">TRUNCATE</code> command. In such cases no WAL
+    needs to be written, because in case of an error, the files
+    containing the newly loaded data will be removed anyway.
+    However, this consideration only applies when
+    <a class="xref" href="runtime-config-wal.html#GUC-WAL-LEVEL">wal_level</a> is <code class="literal">minimal</code>
+    as all commands must write WAL otherwise.
+   </p></div><div class="sect2" id="POPULATE-RM-INDEXES"><div class="titlepage"><div><div><h3 class="title">14.4.3. Remove Indexes <a href="#POPULATE-RM-INDEXES" class="id_link">#</a></h3></div></div></div><p>
+    If you are loading a freshly created table, the fastest method is to
+    create the table, bulk load the table's data using
+    <code class="command">COPY</code>, then create any indexes needed for the
+    table.  Creating an index on pre-existing data is quicker than
+    updating it incrementally as each row is loaded.
+   </p><p>
+    If you are adding large amounts of data to an existing table,
+    it might be a win to drop the indexes,
+    load the table, and then recreate the indexes.  Of course, the
+    database performance for other users might suffer
+    during the time the indexes are missing.  One should also think
+    twice before dropping a unique index, since the error checking
+    afforded by the unique constraint will be lost while the index is
+    missing.
+   </p></div><div class="sect2" id="POPULATE-RM-FKEYS"><div class="titlepage"><div><div><h3 class="title">14.4.4. Remove Foreign Key Constraints <a href="#POPULATE-RM-FKEYS" class="id_link">#</a></h3></div></div></div><p>
+    Just as with indexes, a foreign key constraint can be checked
+    <span class="quote">“<span class="quote">in bulk</span>”</span> more efficiently than row-by-row.  So it might be
+    useful to drop foreign key constraints, load data, and re-create
+    the constraints.  Again, there is a trade-off between data load
+    speed and loss of error checking while the constraint is missing.
+   </p><p>
+    What's more, when you load data into a table with existing foreign key
+    constraints, each new row requires an entry in the server's list of
+    pending trigger events (since it is the firing of a trigger that checks
+    the row's foreign key constraint).  Loading many millions of rows can
+    cause the trigger event queue to overflow available memory, leading to
+    intolerable swapping or even outright failure of the command.  Therefore
+    it may be <span class="emphasis"><em>necessary</em></span>, not just desirable, to drop and re-apply
+    foreign keys when loading large amounts of data.  If temporarily removing
+    the constraint isn't acceptable, the only other recourse may be to split
+    up the load operation into smaller transactions.
+   </p></div><div class="sect2" id="POPULATE-WORK-MEM"><div class="titlepage"><div><div><h3 class="title">14.4.5. Increase <code class="varname">maintenance_work_mem</code> <a href="#POPULATE-WORK-MEM" class="id_link">#</a></h3></div></div></div><p>
+    Temporarily increasing the <a class="xref" href="runtime-config-resource.html#GUC-MAINTENANCE-WORK-MEM">maintenance_work_mem</a>
+    configuration variable when loading large amounts of data can
+    lead to improved performance.  This will help to speed up <code class="command">CREATE
+    INDEX</code> commands and <code class="command">ALTER TABLE ADD FOREIGN KEY</code> commands.
+    It won't do much for <code class="command">COPY</code> itself, so this advice is
+    only useful when you are using one or both of the above techniques.
+   </p></div><div class="sect2" id="POPULATE-MAX-WAL-SIZE"><div class="titlepage"><div><div><h3 class="title">14.4.6. Increase <code class="varname">max_wal_size</code> <a href="#POPULATE-MAX-WAL-SIZE" class="id_link">#</a></h3></div></div></div><p>
+    Temporarily increasing the <a class="xref" href="runtime-config-wal.html#GUC-MAX-WAL-SIZE">max_wal_size</a>
+    configuration variable can also
+    make large data loads faster.  This is because loading a large
+    amount of data into <span class="productname">PostgreSQL</span> will
+    cause checkpoints to occur more often than the normal checkpoint
+    frequency (specified by the <code class="varname">checkpoint_timeout</code>
+    configuration variable). Whenever a checkpoint occurs, all dirty
+    pages must be flushed to disk. By increasing
+    <code class="varname">max_wal_size</code> temporarily during bulk
+    data loads, the number of checkpoints that are required can be
+    reduced.
+   </p></div><div class="sect2" id="POPULATE-PITR"><div class="titlepage"><div><div><h3 class="title">14.4.7. Disable WAL Archival and Streaming Replication <a href="#POPULATE-PITR" class="id_link">#</a></h3></div></div></div><p>
+    When loading large amounts of data into an installation that uses
+    WAL archiving or streaming replication, it might be faster to take a
+    new base backup after the load has completed than to process a large
+    amount of incremental WAL data.  To prevent incremental WAL logging
+    while loading, disable archiving and streaming replication, by setting
+    <a class="xref" href="runtime-config-wal.html#GUC-WAL-LEVEL">wal_level</a> to <code class="literal">minimal</code>,
+    <a class="xref" href="runtime-config-wal.html#GUC-ARCHIVE-MODE">archive_mode</a> to <code class="literal">off</code>, and
+    <a class="xref" href="runtime-config-replication.html#GUC-MAX-WAL-SENDERS">max_wal_senders</a> to zero.
+    But note that changing these settings requires a server restart,
+    and makes any base backups taken before unavailable for archive
+    recovery and standby server, which may lead to data loss.
+   </p><p>
+    Aside from avoiding the time for the archiver or WAL sender to process the
+    WAL data, doing this will actually make certain commands faster, because
+    they do not to write WAL at all if <code class="varname">wal_level</code>
+    is <code class="literal">minimal</code> and the current subtransaction (or top-level
+    transaction) created or truncated the table or index they change.  (They
+    can guarantee crash safety more cheaply by doing
+    an <code class="function">fsync</code> at the end than by writing WAL.)
+   </p></div><div class="sect2" id="POPULATE-ANALYZE"><div class="titlepage"><div><div><h3 class="title">14.4.8. Run <code class="command">ANALYZE</code> Afterwards <a href="#POPULATE-ANALYZE" class="id_link">#</a></h3></div></div></div><p>
+    Whenever you have significantly altered the distribution of data
+    within a table, running <a class="link" href="sql-analyze.html" title="ANALYZE"><code class="command">ANALYZE</code></a> is strongly recommended. This
+    includes bulk loading large amounts of data into the table.  Running
+    <code class="command">ANALYZE</code> (or <code class="command">VACUUM ANALYZE</code>)
+    ensures that the planner has up-to-date statistics about the
+    table.  With no statistics or obsolete statistics, the planner might
+    make poor decisions during query planning, leading to poor
+    performance on any tables with inaccurate or nonexistent
+    statistics.  Note that if the autovacuum daemon is enabled, it might
+    run <code class="command">ANALYZE</code> automatically; see
+    <a class="xref" href="routine-vacuuming.html#VACUUM-FOR-STATISTICS" title="25.1.3. Updating Planner Statistics">Section 25.1.3</a>
+    and <a class="xref" href="routine-vacuuming.html#AUTOVACUUM" title="25.1.6. The Autovacuum Daemon">Section 25.1.6</a> for more information.
+   </p></div><div class="sect2" id="POPULATE-PG-DUMP"><div class="titlepage"><div><div><h3 class="title">14.4.9. Some Notes about <span class="application">pg_dump</span> <a href="#POPULATE-PG-DUMP" class="id_link">#</a></h3></div></div></div><p>
+    Dump scripts generated by <span class="application">pg_dump</span> automatically apply
+    several, but not all, of the above guidelines.  To restore a
+    <span class="application">pg_dump</span> dump as quickly as possible, you need to
+    do a few extra things manually.  (Note that these points apply while
+    <span class="emphasis"><em>restoring</em></span> a dump, not while <span class="emphasis"><em>creating</em></span> it.
+    The same points apply whether loading a text dump with
+    <span class="application">psql</span> or using <span class="application">pg_restore</span> to load
+    from a <span class="application">pg_dump</span> archive file.)
+   </p><p>
+    By default, <span class="application">pg_dump</span> uses <code class="command">COPY</code>, and when
+    it is generating a complete schema-and-data dump, it is careful to
+    load data before creating indexes and foreign keys.  So in this case
+    several guidelines are handled automatically.  What is left
+    for you to do is to:
+    </p><div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: disc; "><li class="listitem"><p>
+       Set appropriate (i.e., larger than normal) values for
+       <code class="varname">maintenance_work_mem</code> and
+       <code class="varname">max_wal_size</code>.
+      </p></li><li class="listitem"><p>
+       If using WAL archiving or streaming replication, consider disabling
+       them during the restore. To do that, set <code class="varname">archive_mode</code>
+       to <code class="literal">off</code>,
+       <code class="varname">wal_level</code> to <code class="literal">minimal</code>, and
+       <code class="varname">max_wal_senders</code> to zero before loading the dump.
+       Afterwards, set them back to the right values and take a fresh
+       base backup.
+      </p></li><li class="listitem"><p>
+       Experiment with the parallel dump and restore modes of both
+       <span class="application">pg_dump</span> and <span class="application">pg_restore</span> and find the
+       optimal number of concurrent jobs to use. Dumping and restoring in
+       parallel by means of the <code class="option">-j</code> option should give you a
+       significantly higher performance over the serial mode.
+      </p></li><li class="listitem"><p>
+       Consider whether the whole dump should be restored as a single
+       transaction.  To do that, pass the <code class="option">-1</code> or
+       <code class="option">--single-transaction</code> command-line option to
+       <span class="application">psql</span> or <span class="application">pg_restore</span>. When using this
+       mode, even the smallest of errors will rollback the entire restore,
+       possibly discarding many hours of processing.  Depending on how
+       interrelated the data is, that might seem preferable to manual cleanup,
+       or not.  <code class="command">COPY</code> commands will run fastest if you use a single
+       transaction and have WAL archiving turned off.
+      </p></li><li class="listitem"><p>
+       If multiple CPUs are available in the database server, consider using
+       <span class="application">pg_restore</span>'s <code class="option">--jobs</code> option.  This
+       allows concurrent data loading and index creation.
+      </p></li><li class="listitem"><p>
+       Run <code class="command">ANALYZE</code> afterwards.
+      </p></li></ul></div><p>
+   </p><p>
+    A data-only dump will still use <code class="command">COPY</code>, but it does not
+    drop or recreate indexes, and it does not normally touch foreign
+    keys.
+
+     <a href="#ftn.id-1.5.13.7.11.4.2" class="footnote"><sup class="footnote" id="id-1.5.13.7.11.4.2">[14]</sup></a>
+
+    So when loading a data-only dump, it is up to you to drop and recreate
+    indexes and foreign keys if you wish to use those techniques.
+    It's still useful to increase <code class="varname">max_wal_size</code>
+    while loading the data, but don't bother increasing
+    <code class="varname">maintenance_work_mem</code>; rather, you'd do that while
+    manually recreating indexes and foreign keys afterwards.
+    And don't forget to <code class="command">ANALYZE</code> when you're done; see
+    <a class="xref" href="routine-vacuuming.html#VACUUM-FOR-STATISTICS" title="25.1.3. Updating Planner Statistics">Section 25.1.3</a>
+    and <a class="xref" href="routine-vacuuming.html#AUTOVACUUM" title="25.1.6. The Autovacuum Daemon">Section 25.1.6</a> for more information.
+   </p></div><div class="footnotes"><br /><hr style="width:100; text-align:left;margin-left: 0" /><div id="ftn.id-1.5.13.7.11.4.2" class="footnote"><p><a href="#id-1.5.13.7.11.4.2" class="para"><sup class="para">[14] </sup></a>
+       You can get the effect of disabling foreign keys by using
+       the <code class="option">--disable-triggers</code> option — but realize that
+       that eliminates, rather than just postpones, foreign key
+       validation, and so it is possible to insert bad data if you use it.
+      </p></div></div></div><div class="navfooter"><hr /><table width="100%" summary="Navigation footer"><tr><td width="40%" align="left"><a accesskey="p" href="explicit-joins.html" title="14.3. Controlling the Planner with Explicit JOIN Clauses">Prev</a> </td><td width="20%" align="center"><a accesskey="u" href="performance-tips.html" title="Chapter 14. Performance Tips">Up</a></td><td width="40%" align="right"> <a accesskey="n" href="non-durability.html" title="14.5. Non-Durable Settings">Next</a></td></tr><tr><td width="40%" align="left" valign="top">14.3. Controlling the Planner with Explicit <code class="literal">JOIN</code> Clauses </td><td width="20%" align="center"><a accesskey="h" href="index.html" title="PostgreSQL 16.2 Documentation">Home</a></td><td width="40%" align="right" valign="top"> 14.5. Non-Durable Settings</td></tr></table></div></body></html>
+\ No newline at end of file
author	Daniel Baumann <daniel.baumann@progress-linux.org>	2024-04-13 13:44:03 +0000
committer	Daniel Baumann <daniel.baumann@progress-linux.org>	2024-04-13 13:44:03 +0000
commit	293913568e6a7a86fd1479e1cff8e2ecb58d6568 (patch)
tree	fc3b469a3ec5ab71b36ea97cc7aaddb838423a0c /doc/src/sgml/html/populate.html
parent	Initial commit. (diff)
download	postgresql-16-293913568e6a7a86fd1479e1cff8e2ecb58d6568.tar.xz postgresql-16-293913568e6a7a86fd1479e1cff8e2ecb58d6568.zip