<?xml version="1.0" encoding="UTF-8" standalone="no"?> <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"><html xmlns="http://www.w3.org/1999/xhtml"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8" /><title>26.3. Continuous Archiving and Point-in-Time Recovery (PITR)</title><link rel="stylesheet" type="text/css" href="stylesheet.css" /><link rev="made" href="pgsql-docs@lists.postgresql.org" /><meta name="generator" content="DocBook XSL Stylesheets Vsnapshot" /><link rel="prev" href="backup-file.html" title="26.2. File System Level Backup" /><link rel="next" href="high-availability.html" title="Chapter 27. High Availability, Load Balancing, and Replication" /></head><body id="docContent" class="container-fluid col-10"><div class="navheader"><table width="100%" summary="Navigation header"><tr><th colspan="5" align="center">26.3. Continuous Archiving and Point-in-Time Recovery (PITR)</th></tr><tr><td width="10%" align="left"><a accesskey="p" href="backup-file.html" title="26.2. File System Level Backup">Prev</a> </td><td width="10%" align="left"><a accesskey="u" href="backup.html" title="Chapter 26. Backup and Restore">Up</a></td><th width="60%" align="center">Chapter 26. Backup and Restore</th><td width="10%" align="right"><a accesskey="h" href="index.html" title="PostgreSQL 16.2 Documentation">Home</a></td><td width="10%" align="right"> <a accesskey="n" href="high-availability.html" title="Chapter 27. High Availability, Load Balancing, and Replication">Next</a></td></tr></table><hr /></div><div class="sect1" id="CONTINUOUS-ARCHIVING"><div class="titlepage"><div><div><h2 class="title" style="clear: both">26.3.
Continuous Archiving and Point-in-Time Recovery (PITR) <a href="#CONTINUOUS-ARCHIVING" class="id_link">#</a></h2></div></div></div><div class="toc"><dl class="toc"><dt><span class="sect2"><a href="continuous-archiving.html#BACKUP-ARCHIVING-WAL">26.3.1. Setting Up WAL Archiving</a></span></dt><dt><span class="sect2"><a href="continuous-archiving.html#BACKUP-BASE-BACKUP">26.3.2. Making a Base Backup</a></span></dt><dt><span class="sect2"><a href="continuous-archiving.html#BACKUP-LOWLEVEL-BASE-BACKUP">26.3.3. Making a Base Backup Using the Low Level API</a></span></dt><dt><span class="sect2"><a href="continuous-archiving.html#BACKUP-PITR-RECOVERY">26.3.4. Recovering Using a Continuous Archive Backup</a></span></dt><dt><span class="sect2"><a href="continuous-archiving.html#BACKUP-TIMELINES">26.3.5. Timelines</a></span></dt><dt><span class="sect2"><a href="continuous-archiving.html#BACKUP-TIPS">26.3.6. Tips and Examples</a></span></dt><dt><span class="sect2"><a href="continuous-archiving.html#CONTINUOUS-ARCHIVING-CAVEATS">26.3.7. Caveats</a></span></dt></dl></div><a id="id-1.6.13.7.2" class="indexterm"></a><a id="id-1.6.13.7.3" class="indexterm"></a><a id="id-1.6.13.7.4" class="indexterm"></a><p> At all times, <span class="productname">PostgreSQL</span> maintains a <em class="firstterm">write ahead log</em> (WAL) in the <code class="filename">pg_wal/</code> subdirectory of the cluster's data directory. The log records every change made to the database's data files. This log exists primarily for crash-safety purposes: if the system crashes, the database can be restored to consistency by <span class="quote">“<span class="quote">replaying</span>”</span> the log entries made since the last checkpoint. However, the existence of the log makes it possible to use a third strategy for backing up databases: we can combine a file-system-level backup with backup of the WAL files.
If recovery is needed, we restore the file system backup and then replay from the backed-up WAL files to bring the system to a current state. This approach is more complex to administer than either of the previous approaches, but it has some significant benefits: </p><div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: disc; "><li class="listitem"><p> We do not need a perfectly consistent file system backup as the starting point. Any internal inconsistency in the backup will be corrected by log replay (this is not significantly different from what happens during crash recovery). So we do not need a file system snapshot capability, just <span class="application">tar</span> or a similar archiving tool. </p></li><li class="listitem"><p> Since we can combine an indefinitely long sequence of WAL files for replay, continuous backup can be achieved simply by continuing to archive the WAL files. This is particularly valuable for large databases, where it might not be convenient to take a full backup frequently. </p></li><li class="listitem"><p> It is not necessary to replay the WAL entries all the way to the end. We could stop the replay at any point and have a consistent snapshot of the database as it was at that time. Thus, this technique supports <em class="firstterm">point-in-time recovery</em>: it is possible to restore the database to its state at any time since your base backup was taken. </p></li><li class="listitem"><p> If we continuously feed the series of WAL files to another machine that has been loaded with the same base backup file, we have a <em class="firstterm">warm standby</em> system: at any point we can bring up the second machine and it will have a nearly-current copy of the database.
</p></li></ul></div><p> </p><div class="note"><h3 class="title">Note</h3><p> <span class="application">pg_dump</span> and <span class="application">pg_dumpall</span> do not produce file-system-level backups and cannot be used as part of a continuous-archiving solution. Such dumps are <span class="emphasis"><em>logical</em></span> and do not contain enough information to be used by WAL replay. </p></div><p> As with the plain file-system-backup technique, this method can only support restoration of an entire database cluster, not a subset. Also, it requires a lot of archival storage: the base backup might be bulky, and a busy system will generate many megabytes of WAL traffic that have to be archived. Still, it is the preferred backup technique in many situations where high reliability is needed. </p><p> To recover successfully using continuous archiving (also called <span class="quote">“<span class="quote">online backup</span>”</span> by many database vendors), you need a continuous sequence of archived WAL files that extends back at least as far as the start time of your backup. So to get started, you should set up and test your procedure for archiving WAL files <span class="emphasis"><em>before</em></span> you take your first base backup. Accordingly, we first discuss the mechanics of archiving WAL files. </p><div class="sect2" id="BACKUP-ARCHIVING-WAL"><div class="titlepage"><div><div><h3 class="title">26.3.1. Setting Up WAL Archiving <a href="#BACKUP-ARCHIVING-WAL" class="id_link">#</a></h3></div></div></div><p> In an abstract sense, a running <span class="productname">PostgreSQL</span> system produces an indefinitely long sequence of WAL records. The system physically divides this sequence into WAL <em class="firstterm">segment files</em>, which are normally 16MB apiece (although the segment size can be altered during <span class="application">initdb</span>).
The segment files are given numeric names that reflect their position in the abstract WAL sequence. When not using WAL archiving, the system normally creates just a few segment files and then <span class="quote">“<span class="quote">recycles</span>”</span> them by renaming no-longer-needed segment files to higher segment numbers. It's assumed that segment files whose contents precede the last checkpoint are no longer of interest and can be recycled. </p><p> When archiving WAL data, we need to capture the contents of each segment file once it is filled, and save that data somewhere before the segment file is recycled for reuse. Depending on the application and the available hardware, there could be many different ways of <span class="quote">“<span class="quote">saving the data somewhere</span>”</span>: we could copy the segment files to an NFS-mounted directory on another machine, write them onto a tape drive (ensuring that you have a way of identifying the original name of each file), or batch them together and burn them onto CDs, or something else entirely. To provide the database administrator with flexibility, <span class="productname">PostgreSQL</span> tries not to make any assumptions about how the archiving will be done. Instead, <span class="productname">PostgreSQL</span> lets the administrator specify a shell command or an archive library to be executed to copy a completed segment file to wherever it needs to go. This could be as simple as a shell command that uses <code class="literal">cp</code>, or it could invoke a complex C function — it's all up to you.
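As an illustration of the shell-command approach, the copy step can be wrapped in a small script of your own. This is a hedged sketch, not a recommendation from the manual: the script path and the archive directory are hypothetical (the directory can be overridden via ARCHIVE_DIR), and a real deployment would add durability and monitoring concerns discussed below.

```shell
# Minimal sketch of an archive_command wrapper script. The script path and
# archive directory are hypothetical examples; ARCHIVE_DIR can be overridden.
# Configured in postgresql.conf as:
#   archive_command = '/usr/local/bin/archive_wal.sh %p %f'
: "${ARCHIVE_DIR:=/mnt/server/archivedir}"

archive_wal() {
    wal_path=$1        # %p: segment path, relative to the data directory
    wal_name=$2        # %f: bare segment file name

    # Refuse to overwrite a pre-existing archive file, and return a nonzero
    # status on any failure so the server will retry the segment later.
    test ! -f "$ARCHIVE_DIR/$wal_name" || return 1
    cp "$wal_path" "$ARCHIVE_DIR/$wal_name"
}
```

The function returns zero only when the copy actually succeeded, which is exactly the contract the server relies on when deciding whether a segment may be recycled.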
</p><p> To enable WAL archiving, set the <a class="xref" href="runtime-config-wal.html#GUC-WAL-LEVEL">wal_level</a> configuration parameter to <code class="literal">replica</code> or higher, <a class="xref" href="runtime-config-wal.html#GUC-ARCHIVE-MODE">archive_mode</a> to <code class="literal">on</code>, and specify the shell command to use in the <a class="xref" href="runtime-config-wal.html#GUC-ARCHIVE-COMMAND">archive_command</a> configuration parameter or the library to use in the <a class="xref" href="runtime-config-wal.html#GUC-ARCHIVE-LIBRARY">archive_library</a> configuration parameter. In practice these settings will always be placed in the <code class="filename">postgresql.conf</code> file. </p><p> In <code class="varname">archive_command</code>, <code class="literal">%p</code> is replaced by the path name of the file to archive, while <code class="literal">%f</code> is replaced by only the file name. (The path name is relative to the current working directory, i.e., the cluster's data directory.) Use <code class="literal">%%</code> if you need to embed an actual <code class="literal">%</code> character in the command. The simplest useful command is something like: </p><pre class="programlisting">
archive_command = 'test ! -f /mnt/server/archivedir/%f && cp %p /mnt/server/archivedir/%f'  # Unix
archive_command = 'copy "%p" "C:\\server\\archivedir\\%f"'  # Windows
</pre><p> which will copy archivable WAL segments to the directory <code class="filename">/mnt/server/archivedir</code>. (This is an example, not a recommendation, and might not work on all platforms.) After the <code class="literal">%p</code> and <code class="literal">%f</code> parameters have been replaced, the actual command executed might look like this: </p><pre class="programlisting">
test !
-f /mnt/server/archivedir/00000001000000A900000065 && cp pg_wal/00000001000000A900000065 /mnt/server/archivedir/00000001000000A900000065
</pre><p> A similar command will be generated for each new file to be archived. </p><p> The archive command will be executed under the ownership of the same user that the <span class="productname">PostgreSQL</span> server is running as. Since the series of WAL files being archived contains effectively everything in your database, you will want to be sure that the archived data is protected from prying eyes; for example, archive into a directory that does not have group or world read access. </p><p> It is important that the archive command return zero exit status if and only if it succeeds. Upon getting a zero result, <span class="productname">PostgreSQL</span> will assume that the file has been successfully archived, and will remove or recycle it. However, a nonzero status tells <span class="productname">PostgreSQL</span> that the file was not archived; it will try again periodically until it succeeds. </p><p> Another way to archive is to use a custom archive module as the <code class="varname">archive_library</code>. Since such modules are written in <code class="literal">C</code>, creating your own may require considerably more effort than writing a shell command. However, archive modules can be more performant than archiving via shell, and they will have access to many useful server resources. For more information about archive modules, see <a class="xref" href="archive-modules.html" title="Chapter 51. Archive Modules">Chapter 51</a>.
</p><p> When the archive command is terminated by a signal (other than <span class="systemitem">SIGTERM</span> that is used as part of a server shutdown) or an error by the shell with an exit status greater than 125 (such as command not found), or if the archive function emits an <code class="literal">ERROR</code> or <code class="literal">FATAL</code>, the archiver process aborts and gets restarted by the postmaster. In such cases, the failure is not reported in <a class="xref" href="monitoring-stats.html#PG-STAT-ARCHIVER-VIEW" title="Table 28.22. pg_stat_archiver View">pg_stat_archiver</a>. </p><p> Archive commands and libraries should generally be designed to refuse to overwrite any pre-existing archive file. This is an important safety feature to preserve the integrity of your archive in case of administrator error (such as sending the output of two different servers to the same archive directory). It is advisable to test your proposed archive library to ensure that it does not overwrite an existing file. </p><p> In rare cases, <span class="productname">PostgreSQL</span> may attempt to re-archive a WAL file that was previously archived. For example, if the system crashes before the server makes a durable record of archival success, the server will attempt to archive the file again after restarting (provided archiving is still enabled). When an archive command or library encounters a pre-existing file, it should return a zero status or <code class="literal">true</code>, respectively, if the WAL file has identical contents to the pre-existing archive and the pre-existing archive is fully persisted to storage. If a pre-existing file contains different contents than the WAL file being archived, the archive command or library <span class="emphasis"><em>must</em></span> return a nonzero status or <code class="literal">false</code>, respectively.
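One way to satisfy these rules from a shell command is to treat an identical pre-existing file as success and anything else as failure. The following is a hedged sketch only: the script name and archive directory are hypothetical, and the final sync step assumes GNU coreutils (whose sync accepts a file argument) to flush the copy to stable storage before reporting success.

```shell
# Sketch of archive_command logic that tolerates re-archiving attempts.
# Assumptions (not from the manual): the script name and the archive
# directory are hypothetical; ARCHIVE_DIR can be overridden for testing.
# Would be invoked as:
#   archive_command = '/usr/local/bin/archive_wal_idem.sh %p %f'
: "${ARCHIVE_DIR:=/mnt/server/archivedir}"

archive_wal_idempotent() {
    wal_path=$1        # %p: segment path, relative to the data directory
    wal_name=$2        # %f: bare segment file name
    target="$ARCHIVE_DIR/$wal_name"

    if [ -f "$target" ]; then
        # Pre-existing file: report success only if its contents are
        # identical to the segment being archived; otherwise fail so the
        # server knows the archive already holds something different.
        cmp -s "$wal_path" "$target"
        return $?
    fi

    # Copy, then flush the new file to stable storage before reporting
    # success (GNU coreutils sync accepts a file argument).
    cp "$wal_path" "$target" && sync "$target"
}
```

A second invocation for the same segment thus succeeds harmlessly, while an invocation that would clobber a different file fails with a nonzero status, as the rules above require.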
</p><p> The example command above for Unix avoids overwriting a pre-existing archive by including a separate <code class="command">test</code> step. On some Unix platforms, <code class="command">cp</code> has switches such as <code class="option">-i</code> that can be used to do the same thing less verbosely, but you should not rely on these without verifying that the right exit status is returned. (In particular, GNU <code class="command">cp</code> will return status zero when <code class="option">-i</code> is used and the target file already exists, which is <span class="emphasis"><em>not</em></span> the desired behavior.) </p><p> While designing your archiving setup, consider what will happen if the archive command or library fails repeatedly because some aspect requires operator intervention or the archive runs out of space. For example, this could occur if you write to tape without an autochanger; when the tape fills, nothing further can be archived until the tape is swapped. You should ensure that any error condition or request to a human operator is reported appropriately so that the situation can be resolved reasonably quickly. The <code class="filename">pg_wal/</code> directory will continue to fill with WAL segment files until the situation is resolved. (If the file system containing <code class="filename">pg_wal/</code> fills up, <span class="productname">PostgreSQL</span> will do a PANIC shutdown. No committed transactions will be lost, but the database will remain offline until you free some space.) </p><p> The speed of the archive command or library is unimportant as long as it can keep up with the average rate at which your server generates WAL data. Normal operation continues even if the archiving process falls a little behind. If archiving falls significantly behind, this will increase the amount of data that would be lost in the event of a disaster.
It will also mean that the <code class="filename">pg_wal/</code> directory will contain large numbers of not-yet-archived segment files, which could eventually exceed available disk space. You are advised to monitor the archiving process to ensure that it is working as you intend. </p><p> In writing your archive command or library, you should assume that the file names to be archived can be up to 64 characters long and can contain any combination of ASCII letters, digits, and dots. It is not necessary to preserve the original relative path (<code class="literal">%p</code>) but it is necessary to preserve the file name (<code class="literal">%f</code>). </p><p> Note that although WAL archiving will allow you to restore any modifications made to the data in your <span class="productname">PostgreSQL</span> database, it will not restore changes made to configuration files (that is, <code class="filename">postgresql.conf</code>, <code class="filename">pg_hba.conf</code> and <code class="filename">pg_ident.conf</code>), since those are edited manually rather than through SQL operations. You might wish to keep the configuration files in a location that will be backed up by your regular file system backup procedures. See <a class="xref" href="runtime-config-file-locations.html" title="20.2. File Locations">Section 20.2</a> for how to relocate the configuration files. </p><p> The archive command or function is only invoked on completed WAL segments. Hence, if your server generates little WAL traffic (or has slack periods where it does so), there could be a long delay between the completion of a transaction and its safe recording in archive storage. To put a limit on how old unarchived data can be, you can set <a class="xref" href="runtime-config-wal.html#GUC-ARCHIVE-TIMEOUT">archive_timeout</a> to force the server to switch to a new WAL segment file at least that often.
Note that archived files that are closed early due to a forced switch are still the same length as completely full files. It is therefore unwise to set a very short <code class="varname">archive_timeout</code> — it will bloat your archive storage. <code class="varname">archive_timeout</code> settings of a minute or so are usually reasonable. </p><p> Also, you can force a segment switch manually with <code class="function">pg_switch_wal</code> if you want to ensure that a just-finished transaction is archived as soon as possible. Other utility functions related to WAL management are listed in <a class="xref" href="functions-admin.html#FUNCTIONS-ADMIN-BACKUP-TABLE" title="Table 9.91. Backup Control Functions">Table 9.91</a>. </p><p> When <code class="varname">wal_level</code> is <code class="literal">minimal</code> some SQL commands are optimized to avoid WAL logging, as described in <a class="xref" href="populate.html#POPULATE-PITR" title="14.4.7. Disable WAL Archival and Streaming Replication">Section 14.4.7</a>. If archiving or streaming replication were turned on during execution of one of these statements, WAL would not contain enough information for archive recovery. (Crash recovery is unaffected.) For this reason, <code class="varname">wal_level</code> can only be changed at server start. However, <code class="varname">archive_command</code> and <code class="varname">archive_library</code> can be changed with a configuration file reload. If you are archiving via shell and wish to temporarily stop archiving, one way to do it is to set <code class="varname">archive_command</code> to the empty string (<code class="literal">''</code>). This will cause WAL files to accumulate in <code class="filename">pg_wal/</code> until a working <code class="varname">archive_command</code> is re-established. </p></div><div class="sect2" id="BACKUP-BASE-BACKUP"><div class="titlepage"><div><div><h3 class="title">26.3.2.
Making a Base Backup <a href="#BACKUP-BASE-BACKUP" class="id_link">#</a></h3></div></div></div><p> The easiest way to perform a base backup is to use the <a class="xref" href="app-pgbasebackup.html" title="pg_basebackup"><span class="refentrytitle"><span class="application">pg_basebackup</span></span></a> tool. It can create a base backup either as regular files or as a tar archive. If more flexibility than <a class="xref" href="app-pgbasebackup.html" title="pg_basebackup"><span class="refentrytitle"><span class="application">pg_basebackup</span></span></a> can provide is required, you can also make a base backup using the low level API (see <a class="xref" href="continuous-archiving.html#BACKUP-LOWLEVEL-BASE-BACKUP" title="26.3.3. Making a Base Backup Using the Low Level API">Section 26.3.3</a>). </p><p> It is not necessary to be concerned about the amount of time it takes to make a base backup. However, if you normally run the server with <code class="varname">full_page_writes</code> disabled, you might notice a drop in performance while the backup runs since <code class="varname">full_page_writes</code> is effectively forced on during backup mode. </p><p> To make use of the backup, you will need to keep all the WAL segment files generated during and after the file system backup. To aid you in doing this, the base backup process creates a <em class="firstterm">backup history file</em> that is immediately stored into the WAL archive area. This file is named after the first WAL segment file that you need for the file system backup. For example, if the starting WAL file is <code class="literal">0000000100001234000055CD</code> the backup history file will be named something like <code class="literal">0000000100001234000055CD.007C9330.backup</code>. (The second part of the file name stands for an exact position within the WAL file, and can ordinarily be ignored.)
Once you have safely archived the file system backup and the WAL segment files used during the backup (as specified in the backup history file), all archived WAL segments with names numerically less are no longer needed to recover the file system backup and can be deleted. However, you should consider keeping several backup sets to be absolutely certain that you can recover your data. </p><p> The backup history file is just a small text file. It contains the label string you gave to <a class="xref" href="app-pgbasebackup.html" title="pg_basebackup"><span class="refentrytitle"><span class="application">pg_basebackup</span></span></a>, as well as the starting and ending times and WAL segments of the backup. If you used the label to identify the associated dump file, then the archived history file is enough to tell you which dump file to restore. </p><p> Since you have to keep around all the archived WAL files back to your last base backup, the interval between base backups should usually be chosen based on how much storage you want to expend on archived WAL files. You should also consider how long you are prepared to spend recovering, if recovery should be necessary — the system will have to replay all those WAL segments, and that could take a while if it has been a long time since the last base backup. </p></div><div class="sect2" id="BACKUP-LOWLEVEL-BASE-BACKUP"><div class="titlepage"><div><div><h3 class="title">26.3.3. Making a Base Backup Using the Low Level API <a href="#BACKUP-LOWLEVEL-BASE-BACKUP" class="id_link">#</a></h3></div></div></div><p> The procedure for making a base backup using the low level APIs contains a few more steps than the <a class="xref" href="app-pgbasebackup.html" title="pg_basebackup"><span class="refentrytitle"><span class="application">pg_basebackup</span></span></a> method, but is relatively simple.
It is very important that these steps are executed in sequence, and that the success of a step is verified before proceeding to the next step. </p><p> Multiple backups can be run concurrently (both those started using this backup API and those started using <a class="xref" href="app-pgbasebackup.html" title="pg_basebackup"><span class="refentrytitle"><span class="application">pg_basebackup</span></span></a>). </p><p> </p><div class="orderedlist"><ol class="orderedlist" type="1"><li class="listitem"><p> Ensure that WAL archiving is enabled and working. </p></li><li class="listitem"><p> Connect to the server (it does not matter which database) as a user with rights to run <code class="function">pg_backup_start</code> (superuser, or a user who has been granted <code class="literal">EXECUTE</code> on the function) and issue the command: </p><pre class="programlisting">
SELECT pg_backup_start(label => 'label', fast => false);
</pre><p> where <code class="literal">label</code> is any string you want to use to uniquely identify this backup operation. The connection calling <code class="function">pg_backup_start</code> must be maintained until the end of the backup, or the backup will be automatically aborted. </p><p> Online backups are always started at the beginning of a checkpoint. By default, <code class="function">pg_backup_start</code> will wait for the next regularly scheduled checkpoint to complete, which may take a long time (see the configuration parameters <a class="xref" href="runtime-config-wal.html#GUC-CHECKPOINT-TIMEOUT">checkpoint_timeout</a> and <a class="xref" href="runtime-config-wal.html#GUC-CHECKPOINT-COMPLETION-TARGET">checkpoint_completion_target</a>). This is usually preferable as it minimizes the impact on the running system.
If you want to start the backup as soon as possible, pass <code class="literal">true</code> as the second parameter to <code class="function">pg_backup_start</code> and it will request an immediate checkpoint, which will finish as fast as possible using as much I/O as possible. </p></li><li class="listitem"><p> Perform the backup, using any convenient file-system-backup tool such as <span class="application">tar</span> or <span class="application">cpio</span> (not <span class="application">pg_dump</span> or <span class="application">pg_dumpall</span>). It is neither necessary nor desirable to stop normal operation of the database while you do this. See <a class="xref" href="continuous-archiving.html#BACKUP-LOWLEVEL-BASE-BACKUP-DATA" title="26.3.3.1. Backing Up the Data Directory">Section 26.3.3.1</a> for things to consider during this backup. </p></li><li class="listitem"><p> In the same connection as before, issue the command: </p><pre class="programlisting">
SELECT * FROM pg_backup_stop(wait_for_archive => true);
</pre><p> This terminates backup mode. On a primary, it also performs an automatic switch to the next WAL segment. On a standby, it is not possible to automatically switch WAL segments, so you may wish to run <code class="function">pg_switch_wal</code> on the primary to perform a manual switch. The reason for the switch is to arrange for the last WAL segment file written during the backup interval to be ready to archive. </p><p> <code class="function">pg_backup_stop</code> will return one row with three values. The second of these fields should be written to a file named <code class="filename">backup_label</code> in the root directory of the backup. The third field should be written to a file named <code class="filename">tablespace_map</code> unless the field is empty.
These files are vital to the backup working and must be written byte for byte without modification, which may require opening the file in binary mode. </p></li><li class="listitem"><p> Once the WAL segment files active during the backup are archived, you are done. The file identified by <code class="function">pg_backup_stop</code>'s first return value is the last segment that is required to form a complete set of backup files. On a primary, if <code class="varname">archive_mode</code> is enabled and the <code class="literal">wait_for_archive</code> parameter is <code class="literal">true</code>, <code class="function">pg_backup_stop</code> does not return until the last segment has been archived. On a standby, <code class="varname">archive_mode</code> must be <code class="literal">always</code> in order for <code class="function">pg_backup_stop</code> to wait. Archiving of these files happens automatically since you have already configured <code class="varname">archive_command</code> or <code class="varname">archive_library</code>. In most cases this happens quickly, but you are advised to monitor your archive system to ensure there are no delays. If the archive process has fallen behind because of failures of the archive command or library, it will keep retrying until the archive succeeds and the backup is complete. If you wish to place a time limit on the execution of <code class="function">pg_backup_stop</code>, set an appropriate <code class="varname">statement_timeout</code> value, but note that if <code class="function">pg_backup_stop</code> terminates because of this your backup may not be valid.
</p><p> If the backup process monitors and ensures that all WAL segment files required for the backup are successfully archived then the <code class="literal">wait_for_archive</code> parameter (which defaults to true) can be set to false to have <code class="function">pg_backup_stop</code> return as soon as the stop backup record is written to the WAL. By default, <code class="function">pg_backup_stop</code> will wait until all WAL has been archived, which can take some time. This option must be used with caution: if WAL archiving is not monitored correctly then the backup might not include all of the WAL files and will therefore be incomplete and not able to be restored. </p></li></ol></div><p> </p><div class="sect3" id="BACKUP-LOWLEVEL-BASE-BACKUP-DATA"><div class="titlepage"><div><div><h4 class="title">26.3.3.1. Backing Up the Data Directory <a href="#BACKUP-LOWLEVEL-BASE-BACKUP-DATA" class="id_link">#</a></h4></div></div></div><p> Some file system backup tools emit warnings or errors if the files they are trying to copy change while the copy proceeds. When taking a base backup of an active database, this situation is normal and not an error. However, you need to ensure that you can distinguish complaints of this sort from real errors. For example, some versions of <span class="application">rsync</span> return a separate exit code for <span class="quote">“<span class="quote">vanished source files</span>”</span>, and you can write a driver script to accept this exit code as a non-error case. Also, some versions of GNU <span class="application">tar</span> return an error code indistinguishable from a fatal error if a file was truncated while <span class="application">tar</span> was copying it. Fortunately, GNU <span class="application">tar</span> versions 1.16 and later exit with 1 if a file was changed during the backup, and 2 for other errors.
With GNU <span class="application">tar</span> version 1.23 and later, you can use the warning options <code class="literal">--warning=no-file-changed --warning=no-file-removed</code> to hide the related warning messages. </p><p> Be certain that your backup includes all of the files under the database cluster directory (e.g., <code class="filename">/usr/local/pgsql/data</code>). If you are using tablespaces that do not reside underneath this directory, be careful to include them as well (and be sure that your backup archives symbolic links as links, otherwise the restore will corrupt your tablespaces). </p><p> You should, however, omit from the backup the files within the cluster's <code class="filename">pg_wal/</code> subdirectory. This slight adjustment is worthwhile because it reduces the risk of mistakes when restoring. This is easy to arrange if <code class="filename">pg_wal/</code> is a symbolic link pointing to someplace outside the cluster directory, which is a common setup anyway for performance reasons. You might also want to exclude <code class="filename">postmaster.pid</code> and <code class="filename">postmaster.opts</code>, which record information about the running <span class="application">postmaster</span>, not about the <span class="application">postmaster</span> which will eventually use this backup. (These files can confuse <span class="application">pg_ctl</span>.) </p><p> It is often a good idea to also omit from the backup the files within the cluster's <code class="filename">pg_replslot/</code> directory, so that replication slots that exist on the primary do not become part of the backup.
Otherwise, the subsequent use of the backup to create a standby + may result in indefinite retention of WAL files on the standby, and + possibly bloat on the primary if hot standby feedback is enabled, because + the clients that are using those replication slots will still be connecting + to and updating the slots on the primary, not the standby. Even if the + backup is only intended for use in creating a new primary, copying the + replication slots isn't expected to be particularly useful, since the + contents of those slots will likely be badly out of date by the time + the new primary comes on line. + </p><p> + The contents of the directories <code class="filename">pg_dynshmem/</code>, + <code class="filename">pg_notify/</code>, <code class="filename">pg_serial/</code>, + <code class="filename">pg_snapshots/</code>, <code class="filename">pg_stat_tmp/</code>, + and <code class="filename">pg_subtrans/</code> (but not the directories themselves) can be + omitted from the backup as they will be initialized on postmaster startup. + </p><p> + Any file or directory beginning with <code class="filename">pgsql_tmp</code> can be + omitted from the backup. These files are removed on postmaster start and + the directories will be recreated as needed. + </p><p> + <code class="filename">pg_internal.init</code> files can be omitted from the + backup whenever a file of that name is found. These files contain + relation cache data that is always rebuilt when recovering. + </p><p> + The backup label + file includes the label string you gave to <code class="function">pg_backup_start</code>, + as well as the time at which <code class="function">pg_backup_start</code> was run, and + the name of the starting WAL file. In case of confusion it is therefore + possible to look inside a backup file and determine exactly which + backup session the dump file came from. 
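</p><p>
+   A backup label file looks something like the following (the exact fields
+   can vary between server versions, and the values shown here are
+   illustrative only):
+ </p><pre class="programlisting">
+ START WAL LOCATION: 0/9000028 (file 000000010000000000000009)
+ CHECKPOINT LOCATION: 0/9000060
+ BACKUP METHOD: streamed
+ BACKUP FROM: primary
+ START TIME: 2024-05-01 12:00:00 UTC
+ LABEL: weekly base backup
+ START TIMELINE: 1
+ </pre><p>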
The tablespace map file includes + the symbolic link names as they exist in the directory + <code class="filename">pg_tblspc/</code> and the full path of each symbolic link. + These files are not merely for your information; their presence and + contents are critical to the proper operation of the system's recovery + process. + </p><p> + It is also possible to make a backup while the server is + stopped. In this case, you obviously cannot use + <code class="function">pg_backup_start</code> or <code class="function">pg_backup_stop</code>, and + you will therefore be left to your own devices to keep track of which + backup is which and how far back the associated WAL files go. + It is generally better to follow the continuous archiving procedure above. + </p></div></div><div class="sect2" id="BACKUP-PITR-RECOVERY"><div class="titlepage"><div><div><h3 class="title">26.3.4. Recovering Using a Continuous Archive Backup <a href="#BACKUP-PITR-RECOVERY" class="id_link">#</a></h3></div></div></div><p> + Okay, the worst has happened and you need to recover from your backup. + Here is the procedure: + </p><div class="orderedlist"><ol class="orderedlist" type="1"><li class="listitem"><p> + Stop the server, if it's running. + </p></li><li class="listitem"><p> + If you have the space to do so, + copy the whole cluster data directory and any tablespaces to a temporary + location in case you need them later. Note that this precaution will + require that you have enough free space on your system to hold two + copies of your existing database. If you do not have enough space, + you should at least save the contents of the cluster's <code class="filename">pg_wal</code> + subdirectory, as it might contain WAL files which + were not archived before the system went down. + </p></li><li class="listitem"><p> + Remove all existing files and subdirectories under the cluster data + directory and under the root directories of any tablespaces you are using. 
+ </p></li><li class="listitem"><p> + Restore the database files from your file system backup. Be sure that they + are restored with the right ownership (the database system user, not + <code class="literal">root</code>!) and with the right permissions. If you are using + tablespaces, + you should verify that the symbolic links in <code class="filename">pg_tblspc/</code> + were correctly restored. + </p></li><li class="listitem"><p> + Remove any files present in <code class="filename">pg_wal/</code>; these came from the + file system backup and are therefore probably obsolete rather than current. + If you didn't archive <code class="filename">pg_wal/</code> at all, then recreate + it with proper permissions, + being careful to ensure that you re-establish it as a symbolic link + if you had it set up that way before. + </p></li><li class="listitem"><p> + If you have unarchived WAL segment files that you saved in step 2, + copy them into <code class="filename">pg_wal/</code>. (It is best to copy them, + not move them, so you still have the unmodified files if a + problem occurs and you have to start over.) + </p></li><li class="listitem"><p> + Set recovery configuration settings in + <code class="filename">postgresql.conf</code> (see <a class="xref" href="runtime-config-wal.html#RUNTIME-CONFIG-WAL-ARCHIVE-RECOVERY" title="20.5.5. Archive Recovery">Section 20.5.5</a>) and create a file + <code class="filename">recovery.signal</code> in the cluster + data directory. You might + also want to temporarily modify <code class="filename">pg_hba.conf</code> to prevent + ordinary users from connecting until you are sure the recovery was successful. + </p></li><li class="listitem"><p> + Start the server. The server will go into recovery mode and + proceed to read through the archived WAL files it needs. Should the + recovery be terminated because of an external error, the server can + simply be restarted and it will continue recovery. 
Upon completion + of the recovery process, the server will remove + <code class="filename">recovery.signal</code> (to prevent + accidentally re-entering recovery mode later) and then + commence normal database operations. + </p></li><li class="listitem"><p> + Inspect the contents of the database to ensure you have recovered to + the desired state. If not, return to step 1. If all is well, + allow your users to connect by restoring <code class="filename">pg_hba.conf</code> to normal. + </p></li></ol></div><p> + </p><p> + The key part of all this is to set up a recovery configuration that + describes how you want to recover and how far the recovery should + run. The one thing that you absolutely must specify is the <code class="varname">restore_command</code>, + which tells <span class="productname">PostgreSQL</span> how to retrieve archived + WAL file segments. Like the <code class="varname">archive_command</code>, this is + a shell command string. It can contain <code class="literal">%f</code>, which is + replaced by the name of the desired WAL file, and <code class="literal">%p</code>, + which is replaced by the path name to copy the WAL file to. + (The path name is relative to the current working directory, + i.e., the cluster's data directory.) + Write <code class="literal">%%</code> if you need to embed an actual <code class="literal">%</code> + character in the command. The simplest useful command is + something like: +</p><pre class="programlisting"> +restore_command = 'cp /mnt/server/archivedir/%f %p' +</pre><p> + which will copy previously archived WAL segments from the directory + <code class="filename">/mnt/server/archivedir</code>. Of course, you can use something + much more complicated, perhaps even a shell script that requests the + operator to mount an appropriate tape. + </p><p> + It is important that the command return nonzero exit status on failure. 
+ The command <span class="emphasis"><em>will</em></span> be called requesting files that are not + present in the archive; it must return nonzero when so asked. This is not + an error condition. An exception is that if the command was terminated by + a signal (other than <span class="systemitem">SIGTERM</span>, which is used as + part of a database server shutdown) or an error by the shell (such as + command not found), then recovery will abort and the server will not start + up. + </p><p> + Not all of the requested files will be WAL segment + files; you should also expect requests for files with a suffix of + <code class="literal">.history</code>. Also be aware that + the base name of the <code class="literal">%p</code> path will be different from + <code class="literal">%f</code>; do not expect them to be interchangeable. + </p><p> + WAL segments that cannot be found in the archive will be sought in + <code class="filename">pg_wal/</code>; this allows use of recent un-archived segments. + However, segments that are available from the archive will be used in + preference to files in <code class="filename">pg_wal/</code>. + </p><p> + Normally, recovery will proceed through all available WAL segments, + thereby restoring the database to the current point in time (or as + close as possible given the available WAL segments). Therefore, a normal + recovery will end with a <span class="quote">“<span class="quote">file not found</span>”</span> message, the exact text + of the error message depending upon your choice of + <code class="varname">restore_command</code>. You may also see an error message + at the start of recovery for a file named something like + <code class="filename">00000001.history</code>. This is also normal and does not + indicate a problem in simple recovery situations; see + <a class="xref" href="continuous-archiving.html#BACKUP-TIMELINES" title="26.3.5. Timelines">Section 26.3.5</a> for discussion. 
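</p><p>
+   Putting the steps together, a minimal archive recovery setup (paths
+   illustrative) might look like this:
+ </p><pre class="programlisting">
+ # in postgresql.conf:
+ restore_command = 'cp /mnt/server/archivedir/%f %p'
+
+ # then, from the shell, in the cluster data directory:
+ touch recovery.signal
+ </pre><p>
+   With only these settings, recovery replays all available WAL and then
+   ends, as described above.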
+ </p><p> + If you want to recover to some previous point in time (say, right before + the junior DBA dropped your main transaction table), just specify the + required <a class="link" href="runtime-config-wal.html#RUNTIME-CONFIG-WAL-RECOVERY-TARGET" title="20.5.6. Recovery Target">stopping point</a>. You can specify + the stop point, known as the <span class="quote">“<span class="quote">recovery target</span>”</span>, either by + date/time, named restore point or by completion of a specific transaction + ID. As of this writing only the date/time and named restore point options + are very usable, since there are no tools to help you identify with any + accuracy which transaction ID to use. + </p><div class="note"><h3 class="title">Note</h3><p> + The stop point must be after the ending time of the base backup, i.e., + the end time of <code class="function">pg_backup_stop</code>. You cannot use a base backup + to recover to a time when that backup was in progress. (To + recover to such a time, you must go back to your previous base backup + and roll forward from there.) + </p></div><p> + If recovery finds corrupted WAL data, recovery will + halt at that point and the server will not start. In such a case the + recovery process could be re-run from the beginning, specifying a + <span class="quote">“<span class="quote">recovery target</span>”</span> before the point of corruption so that recovery + can complete normally. + If recovery fails for an external reason, such as a system crash or + if the WAL archive has become inaccessible, then the recovery can simply + be restarted and it will restart almost from where it failed. + Recovery restart works much like checkpointing in normal operation: + the server periodically forces all its state to disk, and then updates + the <code class="filename">pg_control</code> file to indicate that the already-processed + WAL data need not be scanned again. 
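</p><p>
+   As an illustrative sketch, recovering to a point just before a mistaken
+   change could use settings such as these (the timestamp is hypothetical):
+ </p><pre class="programlisting">
+ restore_command = 'cp /mnt/server/archivedir/%f %p'
+ recovery_target_time = '2024-05-01 17:14:00'
+ recovery_target_action = 'promote'
+ </pre><p>
+   <code class="varname">recovery_target_action</code> controls what the server
+   does once the target is reached; the default is to pause so you can
+   inspect the result before promoting.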
+ </p></div><div class="sect2" id="BACKUP-TIMELINES"><div class="titlepage"><div><div><h3 class="title">26.3.5. Timelines <a href="#BACKUP-TIMELINES" class="id_link">#</a></h3></div></div></div><a id="id-1.6.13.7.13.2" class="indexterm"></a><p> + The ability to restore the database to a previous point in time creates + some complexities that are akin to science-fiction stories about time + travel and parallel universes. For example, in the original history of the database, + suppose you dropped a critical table at 5:15PM on Tuesday evening, but + didn't realize your mistake until Wednesday noon. + Unfazed, you get out your backup, restore to the point-in-time 5:14PM + Tuesday evening, and are up and running. In <span class="emphasis"><em>this</em></span> history of + the database universe, you never dropped the table. But suppose + you later realize this wasn't such a great idea, and would like + to return to sometime Wednesday morning in the original history. + You won't be able + to if, while your database was up-and-running, it overwrote some of the + WAL segment files that led up to the time you now wish you + could get back to. Thus, to avoid this, you need to distinguish the series of + WAL records generated after you've done a point-in-time recovery from + those that were generated in the original database history. + </p><p> + To deal with this problem, <span class="productname">PostgreSQL</span> has a notion + of <em class="firstterm">timelines</em>. Whenever an archive recovery completes, + a new timeline is created to identify the series of WAL records + generated after that recovery. The timeline + ID number is part of WAL segment file names so a new timeline does + not overwrite the WAL data generated by previous timelines. + For example, in the WAL file name + <code class="filename">0000000100001234000055CD</code>, the leading + <code class="literal">00000001</code> is the timeline ID in hexadecimal. 
(Note that + in other contexts, such as server log messages, timeline IDs are + usually printed in decimal.) + </p><p> + It is + in fact possible to archive many different timelines. While that might + seem like a useless feature, it's often a lifesaver. Consider the + situation where you aren't quite sure what point-in-time to recover to, + and so have to do several point-in-time recoveries by trial and error + until you find the best place to branch off from the old history. Without + timelines this process would soon generate an unmanageable mess. With + timelines, you can recover to <span class="emphasis"><em>any</em></span> prior state, including + states in timeline branches that you abandoned earlier. + </p><p> + Every time a new timeline is created, <span class="productname">PostgreSQL</span> creates + a <span class="quote">“<span class="quote">timeline history</span>”</span> file that shows which timeline it branched + off from and when. These history files are necessary to allow the system + to pick the right WAL segment files when recovering from an archive that + contains multiple timelines. Therefore, they are archived into the WAL + archive area just like WAL segment files. The history files are just + small text files, so it's cheap and appropriate to keep them around + indefinitely (unlike the segment files which are large). You can, if + you like, add comments to a history file to record your own notes about + how and why this particular timeline was created. Such comments will be + especially valuable when you have a thicket of different timelines as + a result of experimentation. + </p><p> + The default behavior of recovery is to recover to the latest timeline found + in the archive. 
+ If you wish to recover to the timeline that was current
+ when the base backup was taken or into a specific child timeline (that
+ is, you want to return to some state that was itself generated after a
+ recovery attempt), you need to specify <code class="literal">current</code> or the
+ target timeline ID in <a class="xref" href="runtime-config-wal.html#GUC-RECOVERY-TARGET-TIMELINE">recovery_target_timeline</a>. You
+ cannot recover into timelines that branched off earlier than the base backup.
+ </p></div><div class="sect2" id="BACKUP-TIPS"><div class="titlepage"><div><div><h3 class="title">26.3.6. Tips and Examples <a href="#BACKUP-TIPS" class="id_link">#</a></h3></div></div></div><p>
+ Some tips for configuring continuous archiving are given here.
+ </p><div class="sect3" id="BACKUP-STANDALONE"><div class="titlepage"><div><div><h4 class="title">26.3.6.1. Standalone Hot Backups <a href="#BACKUP-STANDALONE" class="id_link">#</a></h4></div></div></div><p>
+ It is possible to use <span class="productname">PostgreSQL</span>'s backup facilities to
+ produce standalone hot backups. These are backups that cannot be used
+ for point-in-time recovery, yet are typically much faster to back up and
+ restore than <span class="application">pg_dump</span> dumps. (They are also much larger
+ than <span class="application">pg_dump</span> dumps, so in some cases the speed advantage
+ might be negated.)
+ </p><p>
+ As with base backups, the easiest way to produce a standalone
+ hot backup is to use the <a class="xref" href="app-pgbasebackup.html" title="pg_basebackup"><span class="refentrytitle"><span class="application">pg_basebackup</span></span></a>
+ tool. If you include the <code class="literal">-X</code> parameter when calling
+ it, all the write-ahead log required to use the backup will be
+ included in the backup automatically, and no special action is
+ required to restore the backup. 
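</p><p>
+   For example (the destination path is hypothetical):
+ </p><pre class="programlisting">
+ pg_basebackup -D /backups/standalone -Ft -z -X fetch
+ </pre><p>
+   Here <code class="literal">-Ft -z</code> produces compressed tar output, and
+   <code class="literal">-X</code> ensures the WAL needed to use the backup is
+   included in it.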
+ </p></div><div class="sect3" id="COMPRESSED-ARCHIVE-LOGS"><div class="titlepage"><div><div><h4 class="title">26.3.6.2. Compressed Archive Logs <a href="#COMPRESSED-ARCHIVE-LOGS" class="id_link">#</a></h4></div></div></div><p> + If archive storage size is a concern, you can use + <span class="application">gzip</span> to compress the archive files: +</p><pre class="programlisting"> +archive_command = 'gzip < %p > /mnt/server/archivedir/%f.gz' +</pre><p> + You will then need to use <span class="application">gunzip</span> during recovery: +</p><pre class="programlisting"> +restore_command = 'gunzip < /mnt/server/archivedir/%f.gz > %p' +</pre><p> + </p></div><div class="sect3" id="BACKUP-SCRIPTS"><div class="titlepage"><div><div><h4 class="title">26.3.6.3. <code class="varname">archive_command</code> Scripts <a href="#BACKUP-SCRIPTS" class="id_link">#</a></h4></div></div></div><p> + Many people choose to use scripts to define their + <code class="varname">archive_command</code>, so that their + <code class="filename">postgresql.conf</code> entry looks very simple: +</p><pre class="programlisting"> +archive_command = 'local_backup_script.sh "%p" "%f"' +</pre><p> + Using a separate script file is advisable any time you want to use + more than a single command in the archiving process. + This allows all complexity to be managed within the script, which + can be written in a popular scripting language such as + <span class="application">bash</span> or <span class="application">perl</span>. 
+ </p><p> + Examples of requirements that might be solved within a script include: + </p><div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: disc; "><li class="listitem"><p> + Copying data to secure off-site data storage + </p></li><li class="listitem"><p> + Batching WAL files so that they are transferred every three hours, + rather than one at a time + </p></li><li class="listitem"><p> + Interfacing with other backup and recovery software + </p></li><li class="listitem"><p> + Interfacing with monitoring software to report errors + </p></li></ul></div><p> + </p><div class="tip"><h3 class="title">Tip</h3><p> + When using an <code class="varname">archive_command</code> script, it's desirable + to enable <a class="xref" href="runtime-config-logging.html#GUC-LOGGING-COLLECTOR">logging_collector</a>. + Any messages written to <span class="systemitem">stderr</span> from the script will then + appear in the database server log, allowing complex configurations to + be diagnosed easily if they fail. + </p></div></div></div><div class="sect2" id="CONTINUOUS-ARCHIVING-CAVEATS"><div class="titlepage"><div><div><h3 class="title">26.3.7. Caveats <a href="#CONTINUOUS-ARCHIVING-CAVEATS" class="id_link">#</a></h3></div></div></div><p> + At this writing, there are several limitations of the continuous archiving + technique. These will probably be fixed in future releases: + + </p><div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: disc; "><li class="listitem"><p> + If a <a class="link" href="sql-createdatabase.html" title="CREATE DATABASE"><code class="command">CREATE DATABASE</code></a> + command is executed while a base backup is being taken, and then + the template database that the <code class="command">CREATE DATABASE</code> copied + is modified while the base backup is still in progress, it is + possible that recovery will cause those modifications to be + propagated into the created database as well. 
This is of course + undesirable. To avoid this risk, it is best not to modify any + template databases while taking a base backup. + </p></li><li class="listitem"><p> + <a class="link" href="sql-createtablespace.html" title="CREATE TABLESPACE"><code class="command">CREATE TABLESPACE</code></a> + commands are WAL-logged with the literal absolute path, and will + therefore be replayed as tablespace creations with the same + absolute path. This might be undesirable if the WAL is being + replayed on a different machine. It can be dangerous even if the + WAL is being replayed on the same machine, but into a new data + directory: the replay will still overwrite the contents of the + original tablespace. To avoid potential gotchas of this sort, + the best practice is to take a new base backup after creating or + dropping tablespaces. + </p></li></ul></div><p> + </p><p> + It should also be noted that the default <acronym class="acronym">WAL</acronym> + format is fairly bulky since it includes many disk page snapshots. + These page snapshots are designed to support crash recovery, since + we might need to fix partially-written disk pages. Depending on + your system hardware and software, the risk of partial writes might + be small enough to ignore, in which case you can significantly + reduce the total volume of archived WAL files by turning off page + snapshots using the <a class="xref" href="runtime-config-wal.html#GUC-FULL-PAGE-WRITES">full_page_writes</a> + parameter. (Read the notes and warnings in <a class="xref" href="wal.html" title="Chapter 30. Reliability and the Write-Ahead Log">Chapter 30</a> + before you do so.) Turning off page snapshots does not prevent + use of the WAL for PITR operations. An area for future + development is to compress archived WAL data by removing + unnecessary page copies even when <code class="varname">full_page_writes</code> is + on. 
In the meantime, administrators might wish to reduce the number + of page snapshots included in WAL by increasing the checkpoint + interval parameters as much as feasible. + </p></div></div><div class="navfooter"><hr /><table width="100%" summary="Navigation footer"><tr><td width="40%" align="left"><a accesskey="p" href="backup-file.html" title="26.2. File System Level Backup">Prev</a> </td><td width="20%" align="center"><a accesskey="u" href="backup.html" title="Chapter 26. Backup and Restore">Up</a></td><td width="40%" align="right"> <a accesskey="n" href="high-availability.html" title="Chapter 27. High Availability, Load Balancing, and Replication">Next</a></td></tr><tr><td width="40%" align="left" valign="top">26.2. File System Level Backup </td><td width="20%" align="center"><a accesskey="h" href="index.html" title="PostgreSQL 16.2 Documentation">Home</a></td><td width="40%" align="right" valign="top"> Chapter 27. High Availability, Load Balancing, and Replication</td></tr></table></div></body></html>
\ No newline at end of file