1 files changed, 295 insertions, 0 deletions
diff --git a/doc/src/sgml/html/wal-configuration.html b/doc/src/sgml/html/wal-configuration.html
new file mode 100644
index 0000000..45d4f2f
--- /dev/null
+++ b/doc/src/sgml/html/wal-configuration.html
@@ -0,0 +1,295 @@
+<?xml version="1.0" encoding="UTF-8" standalone="no"?>
+<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"><html xmlns="http://www.w3.org/1999/xhtml"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8" /><title>30.5. WAL Configuration</title><link rel="stylesheet" type="text/css" href="stylesheet.css" /><link rev="made" href="pgsql-docs@lists.postgresql.org" /><meta name="generator" content="DocBook XSL Stylesheets Vsnapshot" /><link rel="prev" href="wal-async-commit.html" title="30.4. Asynchronous Commit" /><link rel="next" href="wal-internals.html" title="30.6. WAL Internals" /></head><body id="docContent" class="container-fluid col-10"><div class="navheader"><table width="100%" summary="Navigation header"><tr><th colspan="5" align="center">30.5. <acronym class="acronym">WAL</acronym> Configuration</th></tr><tr><td width="10%" align="left"><a accesskey="p" href="wal-async-commit.html" title="30.4. Asynchronous Commit">Prev</a> </td><td width="10%" align="left"><a accesskey="u" href="wal.html" title="Chapter 30. Reliability and the Write-Ahead Log">Up</a></td><th width="60%" align="center">Chapter 30. Reliability and the Write-Ahead Log</th><td width="10%" align="right"><a accesskey="h" href="index.html" title="PostgreSQL 15.5 Documentation">Home</a></td><td width="10%" align="right"> <a accesskey="n" href="wal-internals.html" title="30.6. WAL Internals">Next</a></td></tr></table><hr /></div><div class="sect1" id="WAL-CONFIGURATION"><div class="titlepage"><div><div><h2 class="title" style="clear: both">30.5. <acronym class="acronym">WAL</acronym> Configuration</h2></div></div></div><p>
+   There are several <acronym class="acronym">WAL</acronym>-related configuration parameters that
+   affect database performance. This section explains their use.
+   Consult <a class="xref" href="runtime-config.html" title="Chapter 20. Server Configuration">Chapter 20</a> for general information about
+   setting server configuration parameters.
+  </p><p>
+   <em class="firstterm">Checkpoints</em><a id="id-1.6.17.7.3.2" class="indexterm"></a>
+   are points in the sequence of transactions at which it is guaranteed
+   that the heap and index data files have been updated with all
+   information written before that checkpoint.  At checkpoint time, all
+   dirty data pages are flushed to disk and a special checkpoint record is
+   written to the log file.  (The change records were previously flushed
+   to the <acronym class="acronym">WAL</acronym> files.)
+   In the event of a crash, the crash recovery procedure looks at the latest
+   checkpoint record to determine the point in the log (known as the redo
+   record) from which it should start the REDO operation.  Any changes made to
+   data files before that point are guaranteed to be already on disk.
+   Hence, after a checkpoint, log segments preceding the one containing
+   the redo record are no longer needed and can be recycled or removed. (When
+   <acronym class="acronym">WAL</acronym> archiving is being done, the log segments must be
+   archived before being recycled or removed.)
+  </p><p>
+   The checkpoint requirement of flushing all dirty data pages to disk
+   can cause a significant I/O load.  For this reason, checkpoint
+   activity is throttled so that I/O begins at checkpoint start and completes
+   before the next checkpoint is due to start; this minimizes performance
+   degradation during checkpoints.
+  </p><p>
+   The server's checkpointer process automatically performs
+   a checkpoint every so often.  A checkpoint is begun every <a class="xref" href="runtime-config-wal.html#GUC-CHECKPOINT-TIMEOUT">checkpoint_timeout</a> seconds, or if
+   <a class="xref" href="runtime-config-wal.html#GUC-MAX-WAL-SIZE">max_wal_size</a> is about to be exceeded,
+   whichever comes first.
+   The default settings are 5 minutes and 1 GB, respectively.
+   If no WAL has been written since the previous checkpoint, new checkpoints
+   will be skipped even if <code class="varname">checkpoint_timeout</code> has passed.
+   (If WAL archiving is being used and you want to put a lower limit on how
+   often files are archived in order to bound potential data loss, you should
+   adjust the <a class="xref" href="runtime-config-wal.html#GUC-ARCHIVE-TIMEOUT">archive_timeout</a> parameter rather than the
+   checkpoint parameters.)
+   It is also possible to force a checkpoint by using the SQL
+   command <code class="command">CHECKPOINT</code>.
+  </p><p>
+   Reducing <code class="varname">checkpoint_timeout</code> and/or
+   <code class="varname">max_wal_size</code> causes checkpoints to occur
+   more often. This allows faster after-crash recovery, since less work
+   will need to be redone. However, one must balance this against the
+   increased cost of flushing dirty data pages more often. If
+   <a class="xref" href="runtime-config-wal.html#GUC-FULL-PAGE-WRITES">full_page_writes</a> is set (as is the default), there is
+   another factor to consider. To ensure data page consistency,
+   the first modification of a data page after each checkpoint results in
+   logging the entire page content. In that case,
+   a smaller checkpoint interval increases the volume of output to the WAL log,
+   partially negating the goal of using a smaller interval,
+   and in any case causing more disk I/O.
+  </p><p>
+   Checkpoints are fairly expensive, first because they require writing
+   out all currently dirty buffers, and second because they result in
+   extra subsequent WAL traffic as discussed above.  It is therefore
+   wise to set the checkpointing parameters high enough so that checkpoints
+   don't happen too often.  As a simple sanity check on your checkpointing
+   parameters, you can set the <a class="xref" href="runtime-config-wal.html#GUC-CHECKPOINT-WARNING">checkpoint_warning</a>
+   parameter.  If checkpoints happen closer together than
+   <code class="varname">checkpoint_warning</code> seconds,
+   a message will be output to the server log recommending increasing
+   <code class="varname">max_wal_size</code>.  Occasional appearance of such
+   a message is not cause for alarm, but if it appears often then the
+   checkpoint control parameters should be increased. Bulk operations such
+   as large <code class="command">COPY</code> transfers might cause a number of such warnings
+   to appear if you have not set <code class="varname">max_wal_size</code> high
+   enough.
+  </p><p>
+   To avoid flooding the I/O system with a burst of page writes,
+   writing dirty buffers during a checkpoint is spread over a period of time.
+   That period is controlled by
+   <a class="xref" href="runtime-config-wal.html#GUC-CHECKPOINT-COMPLETION-TARGET">checkpoint_completion_target</a>, which is
+   given as a fraction of the checkpoint interval (configured by using
+   <code class="varname">checkpoint_timeout</code>).
+   The I/O rate is adjusted so that the checkpoint finishes when the
+   given fraction of
+   <code class="varname">checkpoint_timeout</code> seconds have elapsed, or before
+   <code class="varname">max_wal_size</code> is exceeded, whichever is sooner.
+   With the default value of 0.9,
+   <span class="productname">PostgreSQL</span> can be expected to complete each checkpoint
+   a bit before the next scheduled checkpoint (at around 90% of the last checkpoint's
+   duration).  This spreads out the I/O as much as possible so that the checkpoint
+   I/O load is consistent throughout the checkpoint interval.  The disadvantage of
+   this is that prolonging checkpoints affects recovery time, because more WAL
+   segments will need to be kept around for possible use in recovery.  A user
+   concerned about the amount of time required to recover might wish to reduce
+   <code class="varname">checkpoint_timeout</code> so that checkpoints occur more frequently
+   but still spread the I/O across the checkpoint interval.  Alternatively,
+   <code class="varname">checkpoint_completion_target</code> could be reduced, but this would
+   result in times of more intense I/O (during the checkpoint) and times of less I/O
+   (after the checkpoint completed but before the next scheduled checkpoint) and
+   therefore is not recommended.
+   Although <code class="varname">checkpoint_completion_target</code> could be set as high as
+   1.0, it is typically recommended to set it to no higher than 0.9 (the default)
+   since checkpoints include some other activities besides writing dirty buffers.
+   A setting of 1.0 is quite likely to result in checkpoints not being
+   completed on time, which would result in performance loss due to
+   unexpected variation in the number of WAL segments needed.
+  </p><p>
+   On Linux and POSIX platforms <a class="xref" href="runtime-config-wal.html#GUC-CHECKPOINT-FLUSH-AFTER">checkpoint_flush_after</a>
+   allows to force the OS that pages written by the checkpoint should be
+   flushed to disk after a configurable number of bytes.  Otherwise, these
+   pages may be kept in the OS's page cache, inducing a stall when
+   <code class="literal">fsync</code> is issued at the end of a checkpoint.  This setting will
+   often help to reduce transaction latency, but it also can have an adverse
+   effect on performance; particularly for workloads that are bigger than
+   <a class="xref" href="runtime-config-resource.html#GUC-SHARED-BUFFERS">shared_buffers</a>, but smaller than the OS's page cache.
+  </p><p>
+   The number of WAL segment files in <code class="filename">pg_wal</code> directory depends on
+   <code class="varname">min_wal_size</code>, <code class="varname">max_wal_size</code> and
+   the amount of WAL generated in previous checkpoint cycles. When old log
+   segment files are no longer needed, they are removed or recycled (that is,
+   renamed to become future segments in the numbered sequence). If, due to a
+   short-term peak of log output rate, <code class="varname">max_wal_size</code> is
+   exceeded, the unneeded segment files will be removed until the system
+   gets back under this limit. Below that limit, the system recycles enough
+   WAL files to cover the estimated need until the next checkpoint, and
+   removes the rest. The estimate is based on a moving average of the number
+   of WAL files used in previous checkpoint cycles. The moving average
+   is increased immediately if the actual usage exceeds the estimate, so it
+   accommodates peak usage rather than average usage to some extent.
+   <code class="varname">min_wal_size</code> puts a minimum on the amount of WAL files
+   recycled for future usage; that much WAL is always recycled for future use,
+   even if the system is idle and the WAL usage estimate suggests that little
+   WAL is needed.
+  </p><p>
+   Independently of <code class="varname">max_wal_size</code>,
+   the most recent <a class="xref" href="runtime-config-replication.html#GUC-WAL-KEEP-SIZE">wal_keep_size</a> megabytes of
+   WAL files plus one additional WAL file are
+   kept at all times. Also, if WAL archiving is used, old segments cannot be
+   removed or recycled until they are archived. If WAL archiving cannot keep up
+   with the pace that WAL is generated, or if <code class="varname">archive_command</code>
+   or <code class="varname">archive_library</code>
+   fails repeatedly, old WAL files will accumulate in <code class="filename">pg_wal</code>
+   until the situation is resolved. A slow or failed standby server that
+   uses a replication slot will have the same effect (see
+   <a class="xref" href="warm-standby.html#STREAMING-REPLICATION-SLOTS" title="27.2.6. Replication Slots">Section 27.2.6</a>).
+  </p><p>
+   In archive recovery or standby mode, the server periodically performs
+   <em class="firstterm">restartpoints</em>,<a id="id-1.6.17.7.12.2" class="indexterm"></a>
+   which are similar to checkpoints in normal operation: the server forces
+   all its state to disk, updates the <code class="filename">pg_control</code> file to
+   indicate that the already-processed WAL data need not be scanned again,
+   and then recycles any old log segment files in the <code class="filename">pg_wal</code>
+   directory.
+   Restartpoints can't be performed more frequently than checkpoints on the
+   primary because restartpoints can only be performed at checkpoint records.
+   A restartpoint is triggered when a checkpoint record is reached if at
+   least <code class="varname">checkpoint_timeout</code> seconds have passed since the last
+   restartpoint, or if WAL size is about to exceed
+   <code class="varname">max_wal_size</code>. However, because of limitations on when a
+   restartpoint can be performed, <code class="varname">max_wal_size</code> is often exceeded
+   during recovery, by up to one checkpoint cycle's worth of WAL.
+   (<code class="varname">max_wal_size</code> is never a hard limit anyway, so you should
+   always leave plenty of headroom to avoid running out of disk space.)
+  </p><p>
+   There are two commonly used internal <acronym class="acronym">WAL</acronym> functions:
+   <code class="function">XLogInsertRecord</code> and <code class="function">XLogFlush</code>.
+   <code class="function">XLogInsertRecord</code> is used to place a new record into
+   the <acronym class="acronym">WAL</acronym> buffers in shared memory. If there is no
+   space for the new record, <code class="function">XLogInsertRecord</code> will have
+   to write (move to kernel cache) a few filled <acronym class="acronym">WAL</acronym>
+   buffers. This is undesirable because <code class="function">XLogInsertRecord</code>
+   is used on every database low level modification (for example, row
+   insertion) at a time when an exclusive lock is held on affected
+   data pages, so the operation needs to be as fast as possible.  What
+   is worse, writing <acronym class="acronym">WAL</acronym> buffers might also force the
+   creation of a new log segment, which takes even more
+   time. Normally, <acronym class="acronym">WAL</acronym> buffers should be written
+   and flushed by an <code class="function">XLogFlush</code> request, which is
+   made, for the most part, at transaction commit time to ensure that
+   transaction records are flushed to permanent storage. On systems
+   with high log output, <code class="function">XLogFlush</code> requests might
+   not occur often enough to prevent <code class="function">XLogInsertRecord</code>
+   from having to do writes.  On such systems
+   one should increase the number of <acronym class="acronym">WAL</acronym> buffers by
+   modifying the <a class="xref" href="runtime-config-wal.html#GUC-WAL-BUFFERS">wal_buffers</a> parameter.  When
+   <a class="xref" href="runtime-config-wal.html#GUC-FULL-PAGE-WRITES">full_page_writes</a> is set and the system is very busy,
+   setting <code class="varname">wal_buffers</code> higher will help smooth response times
+   during the period immediately following each checkpoint.
+  </p><p>
+   The <a class="xref" href="runtime-config-wal.html#GUC-COMMIT-DELAY">commit_delay</a> parameter defines for how many
+   microseconds a group commit leader process will sleep after acquiring a
+   lock within <code class="function">XLogFlush</code>, while group commit
+   followers queue up behind the leader.  This delay allows other server
+   processes to add their commit records to the WAL buffers so that all of
+   them will be flushed by the leader's eventual sync operation.  No sleep
+   will occur if <a class="xref" href="runtime-config-wal.html#GUC-FSYNC">fsync</a> is not enabled, or if fewer
+   than <a class="xref" href="runtime-config-wal.html#GUC-COMMIT-SIBLINGS">commit_siblings</a> other sessions are currently
+   in active transactions; this avoids sleeping when it's unlikely that
+   any other session will commit soon.  Note that on some platforms, the
+   resolution of a sleep request is ten milliseconds, so that any nonzero
+   <code class="varname">commit_delay</code> setting between 1 and 10000
+   microseconds would have the same effect.  Note also that on some
+   platforms, sleep operations may take slightly longer than requested by
+   the parameter.
+  </p><p>
+   Since the purpose of <code class="varname">commit_delay</code> is to allow the
+   cost of each flush operation to be amortized across concurrently
+   committing transactions (potentially at the expense of transaction
+   latency), it is necessary to quantify that cost before the setting can
+   be chosen intelligently.  The higher that cost is, the more effective
+   <code class="varname">commit_delay</code> is expected to be in increasing
+   transaction throughput, up to a point.  The <a class="xref" href="pgtestfsync.html" title="pg_test_fsync"><span class="refentrytitle"><span class="application">pg_test_fsync</span></span></a> program can be used to measure the average time
+   in microseconds that a single WAL flush operation takes.  A value of
+   half of the average time the program reports it takes to flush after a
+   single 8kB write operation is often the most effective setting for
+   <code class="varname">commit_delay</code>, so this value is recommended as the
+   starting point to use when optimizing for a particular workload.  While
+   tuning <code class="varname">commit_delay</code> is particularly useful when the
+   WAL log is stored on high-latency rotating disks, benefits can be
+   significant even on storage media with very fast sync times, such as
+   solid-state drives or RAID arrays with a battery-backed write cache;
+   but this should definitely be tested against a representative workload.
+   Higher values of <code class="varname">commit_siblings</code> should be used in
+   such cases, whereas smaller <code class="varname">commit_siblings</code> values
+   are often helpful on higher latency media.  Note that it is quite
+   possible that a setting of <code class="varname">commit_delay</code> that is too
+   high can increase transaction latency by so much that total transaction
+   throughput suffers.
+  </p><p>
+   When <code class="varname">commit_delay</code> is set to zero (the default), it
+   is still possible for a form of group commit to occur, but each group
+   will consist only of sessions that reach the point where they need to
+   flush their commit records during the window in which the previous
+   flush operation (if any) is occurring.  At higher client counts a
+   <span class="quote">“<span class="quote">gangway effect</span>”</span> tends to occur, so that the effects of group
+   commit become significant even when <code class="varname">commit_delay</code> is
+   zero, and thus explicitly setting <code class="varname">commit_delay</code> tends
+   to help less.  Setting <code class="varname">commit_delay</code> can only help
+   when (1) there are some concurrently committing transactions, and (2)
+   throughput is limited to some degree by commit rate; but with high
+   rotational latency this setting can be effective in increasing
+   transaction throughput with as few as two clients (that is, a single
+   committing client with one sibling transaction).
+  </p><p>
+   The <a class="xref" href="runtime-config-wal.html#GUC-WAL-SYNC-METHOD">wal_sync_method</a> parameter determines how
+   <span class="productname">PostgreSQL</span> will ask the kernel to force
+   <acronym class="acronym">WAL</acronym> updates out to disk.
+   All the options should be the same in terms of reliability, with
+   the exception of <code class="literal">fsync_writethrough</code>, which can sometimes
+   force a flush of the disk cache even when other options do not do so.
+   However, it's quite platform-specific which one will be the fastest.
+   You can test the speeds of different options using the <a class="xref" href="pgtestfsync.html" title="pg_test_fsync"><span class="refentrytitle"><span class="application">pg_test_fsync</span></span></a> program.
+   Note that this parameter is irrelevant if <code class="varname">fsync</code>
+   has been turned off.
+  </p><p>
+   Enabling the <a class="xref" href="runtime-config-developer.html#GUC-WAL-DEBUG">wal_debug</a> configuration parameter
+   (provided that <span class="productname">PostgreSQL</span> has been
+   compiled with support for it) will result in each
+   <code class="function">XLogInsertRecord</code> and <code class="function">XLogFlush</code>
+   <acronym class="acronym">WAL</acronym> call being logged to the server log. This
+   option might be replaced by a more general mechanism in the future.
+  </p><p>
+   There are two internal functions to write WAL data to disk:
+   <code class="function">XLogWrite</code> and <code class="function">issue_xlog_fsync</code>.
+   When <a class="xref" href="runtime-config-statistics.html#GUC-TRACK-WAL-IO-TIMING">track_wal_io_timing</a> is enabled, the total
+   amounts of time <code class="function">XLogWrite</code> writes and
+   <code class="function">issue_xlog_fsync</code> syncs WAL data to disk are counted as
+   <code class="literal">wal_write_time</code> and <code class="literal">wal_sync_time</code> in
+   <a class="xref" href="monitoring-stats.html#PG-STAT-WAL-VIEW" title="Table 28.24. pg_stat_wal View">pg_stat_wal</a>, respectively.
+   <code class="function">XLogWrite</code> is normally called by
+   <code class="function">XLogInsertRecord</code> (when there is no space for the new
+   record in WAL buffers), <code class="function">XLogFlush</code> and the WAL writer,
+   to write WAL buffers to disk and call <code class="function">issue_xlog_fsync</code>.
+   <code class="function">issue_xlog_fsync</code> is normally called by
+   <code class="function">XLogWrite</code> to sync WAL files to disk.
+   If <code class="varname">wal_sync_method</code> is either
+   <code class="literal">open_datasync</code> or <code class="literal">open_sync</code>,
+   a write operation in <code class="function">XLogWrite</code> guarantees to sync written
+   WAL data to disk and <code class="function">issue_xlog_fsync</code> does nothing.
+   If <code class="varname">wal_sync_method</code> is either <code class="literal">fdatasync</code>,
+   <code class="literal">fsync</code>, or <code class="literal">fsync_writethrough</code>,
+   the write operation moves WAL buffers to kernel cache and
+   <code class="function">issue_xlog_fsync</code> syncs them to disk. Regardless
+   of the setting of <code class="varname">track_wal_io_timing</code>, the number
+   of times <code class="function">XLogWrite</code> writes and
+   <code class="function">issue_xlog_fsync</code> syncs WAL data to disk are also
+   counted as <code class="literal">wal_write</code> and <code class="literal">wal_sync</code>
+   in <code class="structname">pg_stat_wal</code>, respectively.
+  </p><p>
+   The <a class="xref" href="runtime-config-wal.html#GUC-RECOVERY-PREFETCH">recovery_prefetch</a> parameter can be used to reduce
+   I/O wait times during recovery by instructing the kernel to initiate reads
+   of disk blocks that will soon be needed but are not currently in
+   <span class="productname">PostgreSQL</span>'s buffer pool.
+   The <a class="xref" href="runtime-config-resource.html#GUC-MAINTENANCE-IO-CONCURRENCY">maintenance_io_concurrency</a> and
+   <a class="xref" href="runtime-config-wal.html#GUC-WAL-DECODE-BUFFER-SIZE">wal_decode_buffer_size</a> settings limit prefetching
+   concurrency and distance, respectively.  By default, it is set to
+   <code class="literal">try</code>, which enables the feature on systems where
+   <code class="function">posix_fadvise</code> is available.
+  </p></div><div class="navfooter"><hr /><table width="100%" summary="Navigation footer"><tr><td width="40%" align="left"><a accesskey="p" href="wal-async-commit.html" title="30.4. Asynchronous Commit">Prev</a> </td><td width="20%" align="center"><a accesskey="u" href="wal.html" title="Chapter 30. Reliability and the Write-Ahead Log">Up</a></td><td width="40%" align="right"> <a accesskey="n" href="wal-internals.html" title="30.6. WAL Internals">Next</a></td></tr><tr><td width="40%" align="left" valign="top">30.4. Asynchronous Commit </td><td width="20%" align="center"><a accesskey="h" href="index.html" title="PostgreSQL 15.5 Documentation">Home</a></td><td width="40%" align="right" valign="top"> 30.6. WAL Internals</td></tr></table></div></body></html>
+\ No newline at end of file