From 18657a960e125336f704ea058e25c27bd3900dcb Mon Sep 17 00:00:00 2001 From: Daniel Baumann Date: Sun, 5 May 2024 19:28:19 +0200 Subject: Adding upstream version 3.40.1. Signed-off-by: Daniel Baumann --- www/psow.html | 286 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 286 insertions(+) create mode 100644 www/psow.html (limited to 'www/psow.html') diff --git a/www/psow.html b/www/psow.html new file mode 100644 index 0000000..75c7942 --- /dev/null +++ b/www/psow.html @@ -0,0 +1,286 @@ + + + + + +Powersafe Overwrite + + + +
+ + + +
+
+Small. Fast. Reliable.
Choose any three. +
+ + +
+
+ + + +
+
+
+ + + + +

Powersafe Overwrite

+ +

"Powersafe overwrite" is a term used by the SQLite team to describe +a behavior of some filesystems and disk-controllers related to +data preservation during a power loss. Powersafe overwrite +is a boolean property: either the storage system has it or it does not. + +

We say that a system has the powersafe overwrite property if the following +statement is true: + +

+ When an application writes a range of bytes in a file, no + bytes outside of that range will change, even if the write occurs + just before a crash or power failure. +
+ +

The powersafe overwrite property says nothing about the state of the +bytes that were written. Those bytes might contain their old values, +their new values, random values, or some combination of these. The powersafe +overwrite property merely states that writes cannot change bytes outside +of the range of bytes written. + +

In other words, powersafe overwrite means that there is no "collateral +damage" when a power loss occurs while writing. Only those bytes actually +being written might be damaged. + +

In practical terms, what the powersafe write property means is that when +the disk controller detects an impending power loss, it finishes writing +whatever sector it is working on prior to parking the heads. It means that +individual sector writes will complete once started, even if +there is a power loss. + +

Consider what would happen if disk sector writes are interrupted +by a power loss. If an application writes two or three bytes in the middle +of some file, the operating system will implement this by first reading +the entire sector containing those bytes, making the change to the +sector in memory, then writing the entire sector back to the disk. If a power +loss occurs during the writeback and the sector was not completely written, +then on the next read after reboot, error correcting codes +in the sector will probably detect irreparable damage and the disk +controller will read out the sector as all zeros or all ones. Thus +values will have changed outside of the range of the two or three bytes +that were written at the application level - a violation of the powersafe +overwrite property. + +

SQLite Assumptions About Powersafe Overwrite

+ +

All versions of SQLite up to and including version 3.7.9 +(2011-11-01) assume that +the filesystem does not provide powersafe overwrite. SQLite +has traditionally assumed that when any one byte of a file changes, all +other bytes within the same sector of that byte have the potential of +being corrupted on a power loss. When writing, SQLite has made sure +to journal all bytes in the same sector of any modifications +and it pads journal files out to the next sector boundary so that +subsequent appends to that journal cannot damage prior records. +SQLite understands the sector size to be the value returned by the +xSectorSize method in the VFS. The SQLite team has often referred +to the value returned by xSectorSize as the "blast radius" of a write, +since it expresses the range of bytes that might be damaged if a power +loss occurs during the write. +The default VFSes for unix and windows have always returned 512 as +the sector size (or blast radius) for all versions of SQLite up to +and including version 3.7.9. + +

Newer disk drives have begun using 4096 byte sectors however. Beginning +with SQLite version 3.7.10 (2012-01-16), +the SQLite development team experimented with +changes xSectorSize to report 4096 bytes as the blast radius. +This had the effect of increasing write overhead on +many databases. For a database with a PRAGMA page_size of 1024 +(a very common choice) making a change to a single page in the database +now requires SQLite to backup three other adjacent pages to the rollback +journal, whereas formerly it only had to backup the one page that was +changing. In WAL mode, each transaction had to be padded out to the +next 4096-byte boundary in the WAL file, rather than the next 512-byte +boundary, resulting in thousands of extra bytes being written +per transaction. + +

The extra write overhead prompted a reexamination of assumptions about +powersafe overwrite. With modern disk drives, the capacity has become +so large and the data density so great that a single sector is very +small and writing a single sector takes very little time. We know that +disk drives can detect an impending power loss and continue +to operate for some small amount of time on residual energy because those +drives are able to park their heads before spinning down. And +so if an impending power loss is detectable by the disk controller, it +seems reasonable that the controller will finish writing +whatever sector it is current working on when the imminent power loss +is first detected, prior to parking the heads, as long as doing so +does not take too long, which it should not with +small and dense sectors. Hence it seems reasonable +to assume powersafe overwrite for modern disks. Indeed, BerkeleyDB has +made this assumption for decades, we are told. Caution is advised +though. As Roger Binns noted on the SQLite developers mailing list: +"'poorly written' should be the main assumption about drive firmware." + + + +

Torn Pages

+ +

A torn page occurs when a database page is larger than a disk sector, +the database page is written to disk, but a power loss occurs prior to +all sectors of the database page being written. Then, upon recovery, part of +the database page will have the old content while some other parts of the +page will have the new content. Some database engines assume that +page writes are atomic and hence a torn page is an unrecoverable error. +

+ +

SQLite never assumes that database page writes are atomic, +regardless of the PSOW setting.(1) +And hence SQLite is always able to automatically recover from torn pages +induced by a crash. Enabling PSOW does not decrease SQLite's ability +to recover from a torn page.

+ +

Changes In SQLite Version 3.7.10

+ +

The VFS for SQLite version 3.7.10 (2012-01-16) +adds a new device characteristic +named SQLITE_IOCAP_POWERSAFE_OVERWRITE. Database files that report this +characteristic are assumed to reside on storage systems that have the +powersafe overwrite property. +The default unix and windows VFSes now report +SQLITE_IOCAP_POWERSAFE_OVERWRITE if SQLite is compiled with +-DSQLITE_POWERSAFE_OVERWRITE=1 or they +make the legacy assumption that storage does not have the powersafe +overwrite property if compiled with +-DSQLITE_POWERSAFE_OVERWRITE=0. +For now, the default is for powersafe overwrite to be turned on, though +we may revisit this in the future and default it off. + +

The powersafe overwrite property for individual databases can be +specified as the database is opened using the "psow" query parameter +with a URI filename. For example, to always assume powersafe +overwrite for a file (perhaps to ensure maximum write performance), +open it as + +

+ file:somefile.db?psow=1 +
+ +

Or to be extra safe with a database and to force SQLite to assume the +database lacks powersafe overwrite, open it using + +

+ file:somefile.db?psow=0 +
+ +

There is also a new SQLITE_FCNTL_POWERSAFE_OVERWRITE opcode for +the sqlite3_file_control() that allows +an application to query the powersafe overwrite property for a database +file. + +


+

Notes:

+
  1. +SQLite never assumes atomic page writes in its default configurations. +But a custom VFS can set one of the +SQLITE_IOCAP_ATOMIC bits in the result of the xDeviceCharacteristic() +method and then SQLite will assume that page writes are atomic. The +application must supply a custom VFS to accomplish this, however, since +none of the standard VFSes will ever set any of the atomic bits in the +xDeviceCharacteristics() vector. +

+

This page last modified on 2016-09-14 18:40:10 UTC

+ -- cgit v1.2.3