1 files changed, 161 insertions, 0 deletions
diff --git a/doc/src/sgml/html/wal-reliability.html b/doc/src/sgml/html/wal-reliability.html
new file mode 100644
index 0000000..84505e3
--- /dev/null
+++ b/doc/src/sgml/html/wal-reliability.html
@@ -0,0 +1,161 @@
+<?xml version="1.0" encoding="UTF-8" standalone="no"?>
+<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"><html xmlns="http://www.w3.org/1999/xhtml"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8" /><title>30.1. Reliability</title><link rel="stylesheet" type="text/css" href="stylesheet.css" /><link rev="made" href="pgsql-docs@lists.postgresql.org" /><meta name="generator" content="DocBook XSL Stylesheets Vsnapshot" /><link rel="prev" href="wal.html" title="Chapter 30. Reliability and the Write-Ahead Log" /><link rel="next" href="checksums.html" title="30.2. Data Checksums" /></head><body id="docContent" class="container-fluid col-10"><div class="navheader"><table width="100%" summary="Navigation header"><tr><th colspan="5" align="center">30.1. Reliability</th></tr><tr><td width="10%" align="left"><a accesskey="p" href="wal.html" title="Chapter 30. Reliability and the Write-Ahead Log">Prev</a> </td><td width="10%" align="left"><a accesskey="u" href="wal.html" title="Chapter 30. Reliability and the Write-Ahead Log">Up</a></td><th width="60%" align="center">Chapter 30. Reliability and the Write-Ahead Log</th><td width="10%" align="right"><a accesskey="h" href="index.html" title="PostgreSQL 15.5 Documentation">Home</a></td><td width="10%" align="right"> <a accesskey="n" href="checksums.html" title="30.2. Data Checksums">Next</a></td></tr></table><hr /></div><div class="sect1" id="WAL-RELIABILITY"><div class="titlepage"><div><div><h2 class="title" style="clear: both">30.1. Reliability</h2></div></div></div><p>
+   Reliability is an important property of any serious database
+   system, and <span class="productname">PostgreSQL</span> does everything possible to
+   guarantee reliable operation. One aspect of reliable operation is
+   that all data recorded by a committed transaction should be stored
+   in a nonvolatile area that is safe from power loss, operating
+   system failure, and hardware failure (except failure of the
+   nonvolatile area itself, of course).  Successfully writing the data
+   to the computer's permanent storage (disk drive or equivalent)
+   ordinarily meets this requirement.  In fact, even if a computer is
+   fatally damaged, if the disk drives survive they can be moved to
+   another computer with similar hardware and all committed
+   transactions will remain intact.
+  </p><p>
+   While forcing data to the disk platters periodically might seem like
+   a simple operation, it is not. Because disk drives are dramatically
+   slower than main memory and CPUs, several layers of caching exist
+   between the computer's main memory and the disk platters.
+   First, there is the operating system's buffer cache, which caches
+   frequently requested disk blocks and combines disk writes. Fortunately,
+   all operating systems give applications a way to force writes from
+   the buffer cache to disk, and <span class="productname">PostgreSQL</span> uses those
+   features.  (See the <a class="xref" href="runtime-config-wal.html#GUC-WAL-SYNC-METHOD">wal_sync_method</a> parameter
+   to adjust how this is done.)
+  </p><p>
+   Next, there might be a cache in the disk drive controller; this is
+   particularly common on <acronym class="acronym">RAID</acronym> controller cards. Some of
+   these caches are <em class="firstterm">write-through</em>, meaning writes are sent
+   to the drive as soon as they arrive. Others are
+   <em class="firstterm">write-back</em>, meaning data is sent to the drive at
+   some later time. Such caches can be a reliability hazard because the
+   memory in the disk controller cache is volatile, and will lose its
+   contents in a power failure.  Better controller cards have
+   <em class="firstterm">battery-backup units</em> (<acronym class="acronym">BBU</acronym>s), meaning
+   the card has a battery that
+   maintains power to the cache in case of system power loss.  After power
+   is restored the data will be written to the disk drives.
+  </p><p>
+   And finally, most disk drives have caches. Some are write-through
+   while some are write-back, and the same concerns about data loss
+   exist for write-back drive caches as for disk controller
+   caches.  Consumer-grade IDE and SATA drives are particularly likely
+   to have write-back caches that will not survive a power failure.  Many
+   solid-state drives (SSD) also have volatile write-back caches.
+  </p><p>
+   These caches can typically be disabled; however, the method for doing
+   this varies by operating system and drive type:
+  </p><div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: disc; "><li class="listitem"><p>
+        On <span class="productname">Linux</span>, IDE and SATA drives can be queried using
+        <code class="command">hdparm -I</code>; write caching is enabled if there is
+        a <code class="literal">*</code> next to <code class="literal">Write cache</code>.  <code class="command">hdparm -W 0</code>
+        can be used to turn off write caching.  SCSI drives can be queried
+        using <a class="ulink" href="http://sg.danny.cz/sg/sdparm.html" target="_top"><span class="application">sdparm</span></a>.
+        Use <code class="command">sdparm --get=WCE</code> to check
+        whether the write cache is enabled and <code class="command">sdparm --clear=WCE</code>
+        to disable it.
+      </p></li><li class="listitem"><p>
+        On <span class="productname">FreeBSD</span>, IDE drives can be queried using
+        <code class="command">atacontrol</code> and write caching turned off using
+        <code class="literal">hw.ata.wc=0</code> in <code class="filename">/boot/loader.conf</code>;
+        SCSI drives can be queried using <code class="command">camcontrol identify</code>,
+        and the write cache both queried and changed using
+        <code class="command">sdparm</code> when available.
+      </p></li><li class="listitem"><p>
+        On <span class="productname">Solaris</span>, the disk write cache is controlled by
+        <code class="command">format -e</code>.
+        (The Solaris <acronym class="acronym">ZFS</acronym> file system is safe with disk write-cache
+        enabled because it issues its own disk cache flush commands.)
+      </p></li><li class="listitem"><p>
+        On <span class="productname">Windows</span>, if <code class="varname">wal_sync_method</code> is
+        <code class="literal">open_datasync</code> (the default), write caching can be disabled
+        by unchecking <code class="literal">My Computer\Open\<em class="replaceable"><code>disk drive</code></em>\Properties\Hardware\Properties\Policies\Enable write caching on the disk</code>.
+        Alternatively, set <code class="varname">wal_sync_method</code> to
+        <code class="literal">fsync</code> or <code class="literal">fsync_writethrough</code>, which prevent
+        write caching.
+      </p></li><li class="listitem"><p>
+        On <span class="productname">macOS</span>, write caching can be prevented by
+        setting <code class="varname">wal_sync_method</code> to <code class="literal">fsync_writethrough</code>.
+      </p></li></ul></div><p>
+   Recent SATA drives (those following <acronym class="acronym">ATAPI-6</acronym> or later)
+   offer a drive cache flush command (<code class="command">FLUSH CACHE EXT</code>),
+   while SCSI drives have long supported a similar command
+   <code class="command">SYNCHRONIZE CACHE</code>.  These commands are not directly
+   accessible to <span class="productname">PostgreSQL</span>, but some file systems
+   (e.g., <acronym class="acronym">ZFS</acronym>, <acronym class="acronym">ext4</acronym>) can use them to flush
+   data to the platters on write-back-enabled drives.  Unfortunately, such
+   file systems behave suboptimally when combined with battery-backup unit
+   (<acronym class="acronym">BBU</acronym>) disk controllers.  In such setups, the synchronize
+   command forces all data from the controller cache to the disks,
+   eliminating much of the benefit of the BBU.  You can run the
+   <a class="xref" href="pgtestfsync.html" title="pg_test_fsync"><span class="refentrytitle"><span class="application">pg_test_fsync</span></span></a> program to see
+   if you are affected.  If you are affected, the performance benefits
+   of the BBU can be regained by turning off write barriers in
+   the file system or reconfiguring the disk controller, if that is
+   an option.  If write barriers are turned off, make sure the battery
+   remains functional; a faulty battery can potentially lead to data loss.
+   Hopefully file system and disk controller designers will eventually
+   address this suboptimal behavior.
+  </p><p>
+   When the operating system sends a write request to the storage hardware,
+   there is little it can do to make sure the data has arrived at a truly
+   non-volatile storage area. Rather, it is the
+   administrator's responsibility to make certain that all storage components
+   ensure integrity for both data and file-system metadata.
+   Avoid disk controllers that have non-battery-backed write caches.
+   At the drive level, disable write-back caching if the
+   drive cannot guarantee the data will be written before shutdown.
+   If you use SSDs, be aware that many of these do not honor cache flush
+   commands by default.
+   You can test for reliable I/O subsystem behavior using <a class="ulink" href="https://brad.livejournal.com/2116715.html" target="_top"><code class="filename">diskchecker.pl</code></a>.
+  </p><p>
+   Another risk of data loss is posed by the disk platter write
+   operations themselves. Disk platters are divided into sectors,
+   commonly 512 bytes each.  Every physical read or write operation
+   processes a whole sector.
+   When a write request arrives at the drive, it might be for some multiple
+   of 512 bytes (<span class="productname">PostgreSQL</span> typically writes 8192 bytes, or
+   16 sectors, at a time), and the process of writing could fail due
+   to power loss at any time, meaning some of the 512-byte sectors were
+   written while others were not.  To guard against such failures,
+   <span class="productname">PostgreSQL</span> periodically writes full page images to
+   permanent WAL storage <span class="emphasis"><em>before</em></span> modifying the actual page on
+   disk. By doing this, during crash recovery <span class="productname">PostgreSQL</span> can
+   restore partially-written pages from WAL.  If you have file-system software
+   that prevents partial page writes (e.g., ZFS),  you can turn off
+   this page imaging by turning off the <a class="xref" href="runtime-config-wal.html#GUC-FULL-PAGE-WRITES">full_page_writes</a> parameter. Battery-Backed Unit
+   (BBU) disk controllers do not prevent partial page writes unless
+   they guarantee that data is written to the BBU as full (8kB) pages.
+  </p><p>
+   <span class="productname">PostgreSQL</span> also protects against some kinds of data corruption
+   on storage devices that may occur because of hardware errors or media failure over time,
+   such as reading/writing garbage data.
+   </p><div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: disc; "><li class="listitem"><p>
+      Each individual record in a WAL file is protected by a CRC-32 (32-bit) check
+      that allows us to tell if record contents are correct. The CRC value
+      is set when we write each WAL record and checked during crash recovery,
+      archive recovery and replication.
+     </p></li><li class="listitem"><p>
+      Data pages are not currently checksummed by default, though full page images
+      recorded in WAL records will be protected; see <a class="link" href="app-initdb.html#APP-INITDB-DATA-CHECKSUMS"><span class="application">initdb</span></a>
+      for details about enabling data checksums.
+     </p></li><li class="listitem"><p>
+      Internal data structures such as <code class="filename">pg_xact</code>, <code class="filename">pg_subtrans</code>, <code class="filename">pg_multixact</code>,
+      <code class="filename">pg_serial</code>, <code class="filename">pg_notify</code>, <code class="filename">pg_stat</code>, <code class="filename">pg_snapshots</code> are not directly
+      checksummed, nor are pages protected by full page writes. However, where
+      such data structures are persistent, WAL records are written that allow
+      recent changes to be accurately rebuilt at crash recovery and those
+      WAL records are protected as discussed above.
+     </p></li><li class="listitem"><p>
+      Individual state files in <code class="filename">pg_twophase</code> are protected by CRC-32.
+     </p></li><li class="listitem"><p>
+      Temporary data files used in larger SQL queries for sorts,
+      materializations and intermediate results are not currently checksummed,
+      nor will WAL records be written for changes to those files.
+     </p></li></ul></div><p>
+  </p><p>
+   <span class="productname">PostgreSQL</span> does not protect against correctable memory errors
+   and it is assumed you will operate using RAM that uses industry standard
+   Error Correcting Codes (ECC) or better protection.
+  </p></div><div class="navfooter"><hr /><table width="100%" summary="Navigation footer"><tr><td width="40%" align="left"><a accesskey="p" href="wal.html" title="Chapter 30. Reliability and the Write-Ahead Log">Prev</a> </td><td width="20%" align="center"><a accesskey="u" href="wal.html" title="Chapter 30. Reliability and the Write-Ahead Log">Up</a></td><td width="40%" align="right"> <a accesskey="n" href="checksums.html" title="30.2. Data Checksums">Next</a></td></tr><tr><td width="40%" align="left" valign="top">Chapter 30. Reliability and the Write-Ahead Log </td><td width="20%" align="center"><a accesskey="h" href="index.html" title="PostgreSQL 15.5 Documentation">Home</a></td><td width="40%" align="right" valign="top"> 30.2. Data Checksums</td></tr></table></div></body></html>
+\ No newline at end of file