Diffstat (limited to 'doc/src/sgml/html/storage-toast.html')
-rw-r--r-- doc/src/sgml/html/storage-toast.html | 221
1 files changed, 221 insertions, 0 deletions
diff --git a/doc/src/sgml/html/storage-toast.html b/doc/src/sgml/html/storage-toast.html
new file mode 100644
index 0000000..f0724d1
--- /dev/null
+++ b/doc/src/sgml/html/storage-toast.html
@@ -0,0 +1,221 @@
+<?xml version="1.0" encoding="UTF-8" standalone="no"?>
+<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"><html xmlns="http://www.w3.org/1999/xhtml"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8" /><title>69.2. TOAST</title><link rel="stylesheet" type="text/css" href="stylesheet.css" /><link rev="made" href="pgsql-docs@lists.postgresql.org" /><meta name="generator" content="DocBook XSL Stylesheets V1.79.1" /><link rel="prev" href="storage-file-layout.html" title="69.1. Database File Layout" /><link rel="next" href="storage-fsm.html" title="69.3. Free Space Map" /></head><body id="docContent" class="container-fluid col-10"><div xmlns="http://www.w3.org/TR/xhtml1/transitional" class="navheader"><table width="100%" summary="Navigation header"><tr><th colspan="5" align="center">69.2. TOAST</th></tr><tr><td width="10%" align="left"><a accesskey="p" href="storage-file-layout.html" title="69.1. Database File Layout">Prev</a> </td><td width="10%" align="left"><a accesskey="u" href="storage.html" title="Chapter 69. Database Physical Storage">Up</a></td><th width="60%" align="center">Chapter 69. Database Physical Storage</th><td width="10%" align="right"><a accesskey="h" href="index.html" title="PostgreSQL 13.4 Documentation">Home</a></td><td width="10%" align="right"> <a accesskey="n" href="storage-fsm.html" title="69.3. Free Space Map">Next</a></td></tr></table><hr></hr></div><div class="sect1" id="STORAGE-TOAST"><div class="titlepage"><div><div><h2 class="title" style="clear: both">69.2. TOAST</h2></div></div></div><div class="toc"><dl class="toc"><dt><span class="sect2"><a href="storage-toast.html#STORAGE-TOAST-ONDISK">69.2.1. Out-of-Line, On-Disk TOAST Storage</a></span></dt><dt><span class="sect2"><a href="storage-toast.html#STORAGE-TOAST-INMEMORY">69.2.2. Out-of-Line, In-Memory TOAST Storage</a></span></dt></dl></div><a id="id-1.10.22.4.2" class="indexterm"></a><a id="id-1.10.22.4.3" class="indexterm"></a><p>
+This section provides an overview of <acronym class="acronym">TOAST</acronym> (The
+Oversized-Attribute Storage Technique).
+</p><p>
+<span class="productname">PostgreSQL</span> uses a fixed page size (commonly
+8 kB), and does not allow tuples to span multiple pages. Therefore, it is
+not possible to store very large field values directly. To overcome
+this limitation, large field values are compressed and/or broken up into
+multiple physical rows. This happens transparently to the user, with only
+small impact on most of the backend code. The technique is affectionately
+known as <acronym class="acronym">TOAST</acronym> (or <span class="quote">“<span class="quote">the best thing since sliced bread</span>”</span>).
+The <acronym class="acronym">TOAST</acronym> infrastructure is also used to improve handling of
+large data values in-memory.
+</p><p>
+Only certain data types support <acronym class="acronym">TOAST</acronym> — there is no need to
+impose the overhead on data types that cannot produce large field values.
+To support <acronym class="acronym">TOAST</acronym>, a data type must have a variable-length
+(<em class="firstterm">varlena</em>) representation, in which, ordinarily, the first
+four-byte word of any stored value contains the total length of the value in
+bytes (including itself). <acronym class="acronym">TOAST</acronym> does not constrain the rest
+of the data type's representation. The special representations collectively
+called <em class="firstterm"><acronym class="acronym">TOAST</acronym>ed values</em> work by modifying or
+reinterpreting this initial length word. Therefore, the C-level functions
+supporting a <acronym class="acronym">TOAST</acronym>-able data type must be careful about how they
+handle potentially <acronym class="acronym">TOAST</acronym>ed input values: an input might not
+actually consist of a four-byte length word and contents until after it's
+been <em class="firstterm">detoasted</em>. (This is normally done by invoking
+<code class="function">PG_DETOAST_DATUM</code> before doing anything with an input value,
+but in some cases more efficient approaches are possible.
+See <a class="xref" href="xtypes.html#XTYPES-TOAST" title="37.13.1. TOAST Considerations">Section 37.13.1</a> for more detail.)
+</p><p>
+<acronym class="acronym">TOAST</acronym> usurps two bits of the varlena length word (the high-order
+bits on big-endian machines, the low-order bits on little-endian machines),
+thereby limiting the logical size of any value of a <acronym class="acronym">TOAST</acronym>-able
+data type to 1 GB (2<sup>30</sup> - 1 bytes). When both bits are zero,
+the value is an ordinary un-<acronym class="acronym">TOAST</acronym>ed value of the data type, and
+the remaining bits of the length word give the total datum size (including
+length word) in bytes. When the highest-order bit (big-endian) or lowest-order bit (little-endian) is set,
+the value has only a single-byte header instead of the normal four-byte
+header, and the remaining bits of that byte give the total datum size
+(including length byte) in bytes. This alternative supports space-efficient
+storage of values shorter than 127 bytes, while still allowing the data type
+to grow to 1 GB at need. Values with single-byte headers aren't aligned on
+any particular boundary, whereas values with four-byte headers are aligned on
+at least a four-byte boundary; omitting the alignment padding saves
+additional space, which is significant relative to the size of short values.
+As a special case, if the remaining bits of a single-byte header are all
+zero (which would be impossible for a self-inclusive length), the value is
+a pointer to out-of-line data, with several possible alternatives as
+described below. The type and size of such a <em class="firstterm">TOAST pointer</em>
+are determined by a code stored in the second byte of the datum.
+Lastly, when the highest-order (big-endian) or lowest-order (little-endian) bit is clear but the adjacent
+bit is set, the content of the datum has been compressed and must be
+decompressed before use. In this case the remaining bits of the four-byte
+length word give the total size of the compressed datum, not the
+original data. Note that compression is also possible for out-of-line data,
+but the varlena header does not indicate whether it has occurred;
+the content of the <acronym class="acronym">TOAST</acronym> pointer indicates that instead.
+</p><p>
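+The header cases described above can be sketched as follows. This is only an
+illustration in Python, assuming the little-endian layout in which the flag
+bits occupy the low-order positions of the first byte; the function name and
+the returned labels are invented for this sketch, and the server itself
+performs these checks in C.
+</p><pre class="programlisting"><![CDATA[
```python
def classify_varlena(datum: bytes):
    """Classify a datum by its varlena header (little-endian layout assumed).

    Returns a (kind, number) pair. For a TOAST pointer the number is the code
    stored in the second byte; otherwise it is the total datum size in bytes.
    """
    first = datum[0]
    if first & 0x01:
        # Lowest-order bit set: single-byte header.
        if first >> 1 == 0:
            # Remaining seven bits are all zero: pointer to out-of-line data.
            return ("toast_pointer", datum[1])
        return ("short", first >> 1)
    # Four-byte header; the two flag bits are the low-order bits of the word.
    word = int.from_bytes(datum[:4], "little")
    kind = "compressed" if first & 0x02 else "plain"
    return (kind, word >> 2)
```
+]]></pre><p>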
+As mentioned, there are multiple types of <acronym class="acronym">TOAST</acronym> pointer datums.
+The oldest and most common type is a pointer to out-of-line data stored in
+a <em class="firstterm"><acronym class="acronym">TOAST</acronym> table</em> that is separate from, but
+associated with, the table containing the <acronym class="acronym">TOAST</acronym> pointer datum
+itself. These <em class="firstterm">on-disk</em> pointer datums are created by the
+<acronym class="acronym">TOAST</acronym> management code (in <code class="filename">access/common/toast_internals.c</code>)
+when a tuple to be stored on disk is too large to be stored as-is.
+Further details appear in <a class="xref" href="storage-toast.html#STORAGE-TOAST-ONDISK" title="69.2.1. Out-of-Line, On-Disk TOAST Storage">Section 69.2.1</a>.
+Alternatively, a <acronym class="acronym">TOAST</acronym> pointer datum can contain a pointer to
+out-of-line data that appears elsewhere in memory. Such datums are
+necessarily short-lived, and will never appear on-disk, but they are very
+useful for avoiding copying and redundant processing of large data values.
+Further details appear in <a class="xref" href="storage-toast.html#STORAGE-TOAST-INMEMORY" title="69.2.2. Out-of-Line, In-Memory TOAST Storage">Section 69.2.2</a>.
+</p><p>
+The compression technique used for either in-line or out-of-line compressed
+data is a fairly simple and very fast member
+of the LZ family of compression techniques. See
+<code class="filename">src/common/pg_lzcompress.c</code> for the details.
+</p><div class="sect2" id="STORAGE-TOAST-ONDISK"><div class="titlepage"><div><div><h3 class="title">69.2.1. Out-of-Line, On-Disk TOAST Storage</h3></div></div></div><p>
+If any of the columns of a table are <acronym class="acronym">TOAST</acronym>-able, the table will
+have an associated <acronym class="acronym">TOAST</acronym> table, whose OID is stored in the table's
+<code class="structname">pg_class</code>.<code class="structfield">reltoastrelid</code> entry. On-disk
+<acronym class="acronym">TOAST</acronym>ed values are kept in the <acronym class="acronym">TOAST</acronym> table, as
+described in more detail below.
+</p><p>
+Out-of-line values are divided (after compression if used) into chunks of at
+most <code class="symbol">TOAST_MAX_CHUNK_SIZE</code> bytes (by default this value is chosen
+so that four chunk rows will fit on a page, making it about 2000 bytes).
+Each chunk is stored as a separate row in the <acronym class="acronym">TOAST</acronym> table
+belonging to the owning table. Every
+<acronym class="acronym">TOAST</acronym> table has the columns <code class="structfield">chunk_id</code> (an OID
+identifying the particular <acronym class="acronym">TOAST</acronym>ed value),
+<code class="structfield">chunk_seq</code> (a sequence number for the chunk within its value),
+and <code class="structfield">chunk_data</code> (the actual data of the chunk). A unique index
+on <code class="structfield">chunk_id</code> and <code class="structfield">chunk_seq</code> provides fast
+retrieval of the values. A pointer datum representing an out-of-line on-disk
+<acronym class="acronym">TOAST</acronym>ed value therefore needs to store the OID of the
+<acronym class="acronym">TOAST</acronym> table in which to look and the OID of the specific value
+(its <code class="structfield">chunk_id</code>). For convenience, pointer datums also store the
+logical datum size (original uncompressed data length) and physical stored size
+(different if compression was applied). Allowing for the varlena header bytes,
+the total size of an on-disk <acronym class="acronym">TOAST</acronym> pointer datum is therefore 18
+bytes regardless of the actual size of the represented value.
+</p><p>
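+A rough Python sketch of the chunking and pointer-size arithmetic described
+above follows. The helper names are hypothetical, the chunk size of 2000 bytes
+is only the approximate figure quoted above, and the four pointer fields are
+assumed to be four bytes wide each, which is consistent with the 18-byte total.
+</p><pre class="programlisting"><![CDATA[
```python
import struct


def split_into_chunks(chunk_id: int, stored: bytes, max_chunk_size: int = 2000):
    """Return (chunk_id, chunk_seq, chunk_data) rows for one out-of-line value."""
    return [
        (chunk_id, seq, stored[start:start + max_chunk_size])
        for seq, start in enumerate(range(0, len(stored), max_chunk_size))
    ]


def reassemble(rows):
    """Rebuild the stored value by concatenating chunks in chunk_seq order."""
    return b"".join(data for _, _, data in sorted(rows, key=lambda row: row[1]))


# Size of an on-disk pointer datum: two header bytes plus four fields
# (logical size, stored size, value OID, TOAST table OID), each assumed to be
# four bytes wide. "=" requests standard sizes with no alignment padding.
POINTER_SIZE = struct.calcsize("=BBiiII")
```
+]]></pre><p>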
+The <acronym class="acronym">TOAST</acronym> management code is triggered only
+when a row value to be stored in a table is wider than
+<code class="symbol">TOAST_TUPLE_THRESHOLD</code> bytes (normally 2 kB).
+The <acronym class="acronym">TOAST</acronym> code will compress and/or move
+field values out-of-line until the row value is shorter than
+<code class="symbol">TOAST_TUPLE_TARGET</code> bytes (also normally 2 kB, adjustable)
+or no more gains can be had. During an UPDATE
+operation, values of unchanged fields are normally preserved as-is; so an
+UPDATE of a row with out-of-line values incurs no <acronym class="acronym">TOAST</acronym> costs if
+none of the out-of-line values change.
+</p><p>
+The <acronym class="acronym">TOAST</acronym> management code recognizes four different strategies
+for storing <acronym class="acronym">TOAST</acronym>-able columns on disk:
+
+ </p><div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: disc; "><li class="listitem"><p>
+ <code class="literal">PLAIN</code> prevents both compression and
+ out-of-line storage; furthermore, it disables use of single-byte headers
+ for varlena types.
+ This is the only possible strategy for
+ columns of non-<acronym class="acronym">TOAST</acronym>-able data types.
+ </p></li><li class="listitem"><p>
+ <code class="literal">EXTENDED</code> allows both compression and out-of-line
+ storage. This is the default for most <acronym class="acronym">TOAST</acronym>-able data types.
+ Compression will be attempted first, then out-of-line storage if
+ the row is still too big.
+ </p></li><li class="listitem"><p>
+ <code class="literal">EXTERNAL</code> allows out-of-line storage but not
+ compression. Use of <code class="literal">EXTERNAL</code> will
+ make substring operations on wide <code class="type">text</code> and
+ <code class="type">bytea</code> columns faster (at the penalty of increased storage
+ space) because these operations are optimized to fetch only the
+ required parts of the out-of-line value when it is not compressed.
+ </p></li><li class="listitem"><p>
+ <code class="literal">MAIN</code> allows compression but not out-of-line
+ storage. (Actually, out-of-line storage will still be performed
+ for such columns, but only as a last resort when there is no other
+ way to make the row small enough to fit on a page.)
+ </p></li></ul></div><p>
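+The four strategies listed above can be summarized as a small lookup table.
+The Python sketch below is only a reading aid: the table, the labels, and the
+function name are invented for illustration and do not correspond to any
+server data structure.
+</p><pre class="programlisting"><![CDATA[
```python
# strategy name: (compression allowed, out-of-line storage)
STORAGE_STRATEGIES = {
    "PLAIN": (False, "never"),
    "EXTENDED": (True, "allowed"),
    "EXTERNAL": (False, "allowed"),
    # MAIN columns are moved out of line only when nothing else makes the
    # row small enough to fit on a page.
    "MAIN": (True, "last resort"),
}


def allowed_actions(strategy: str):
    """List what the TOAST code may do to a column with the given strategy."""
    compress, out_of_line = STORAGE_STRATEGIES[strategy.upper()]
    actions = []
    if compress:
        actions.append("compress")
    if out_of_line != "never":
        actions.append(f"move out of line ({out_of_line})")
    return actions
```
+]]></pre><p>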
+
+Each <acronym class="acronym">TOAST</acronym>-able data type specifies a default strategy for columns
+of that data type, but the strategy for a given table column can be altered
+with <a class="link" href="sql-altertable.html" title="ALTER TABLE"><code class="command">ALTER TABLE ... SET STORAGE</code></a>.
+</p><p>
+<code class="symbol">TOAST_TUPLE_TARGET</code> can be adjusted for each table using
+<a class="link" href="sql-altertable.html" title="ALTER TABLE"><code class="command">ALTER TABLE ... SET (toast_tuple_target = N)</code></a>.
+</p><p>
+This scheme has a number of advantages compared to a more straightforward
+approach such as allowing row values to span pages. Assuming that queries are
+usually qualified by comparisons against relatively small key values, most of
+the work of the executor will be done using the main row entry. The big values
+of <acronym class="acronym">TOAST</acronym>ed attributes will only be pulled out (if selected at all)
+at the time the result set is sent to the client. Thus, the main table is much
+smaller and more of its rows fit in the shared buffer cache than would be the
+case without any out-of-line storage. Sort sets shrink also, and sorts will
+more often be done entirely in memory. An informal test showed that a table
+containing typical HTML pages and their URLs was stored in about half of the
+raw data size including the <acronym class="acronym">TOAST</acronym> table, and that the main table
+contained only about 10% of the entire data (the URLs and some small HTML
+pages). There was no run time difference compared to an un-<acronym class="acronym">TOAST</acronym>ed
+comparison table, in which all the HTML pages were cut down to 7 kB to fit.
+</p></div><div class="sect2" id="STORAGE-TOAST-INMEMORY"><div class="titlepage"><div><div><h3 class="title">69.2.2. Out-of-Line, In-Memory TOAST Storage</h3></div></div></div><p>
+<acronym class="acronym">TOAST</acronym> pointers can point to data that is not on disk, but is
+elsewhere in the memory of the current server process. Such pointers
+obviously cannot be long-lived, but they are nonetheless useful. There
+are currently two sub-cases:
+pointers to <em class="firstterm">indirect</em> data and
+pointers to <em class="firstterm">expanded</em> data.
+</p><p>
+Indirect <acronym class="acronym">TOAST</acronym> pointers simply point at a non-indirect varlena
+value stored somewhere in memory. This case was originally created merely
+as a proof of concept, but it is currently used during logical decoding to
+avoid possibly having to create physical tuples exceeding 1 GB (as pulling
+all out-of-line field values into the tuple might do). The case is of
+limited use, since the creator of the pointer datum is entirely responsible
+for ensuring that the referenced data survives for as long as the pointer could exist,
+and there is no infrastructure to help with this.
+</p><p>
+Expanded <acronym class="acronym">TOAST</acronym> pointers are useful for complex data types
+whose on-disk representation is not especially suited for computational
+purposes. As an example, the standard varlena representation of a
+<span class="productname">PostgreSQL</span> array includes dimensionality information, a
+nulls bitmap if there are any null elements, then the values of all the
+elements in order. When the element type itself is variable-length, the
+only way to find the <em class="replaceable"><code>N</code></em>'th element is to scan through all the
+preceding elements. This representation is appropriate for on-disk storage
+because of its compactness, but for computations with the array it's much
+nicer to have an <span class="quote">“<span class="quote">expanded</span>”</span> or <span class="quote">“<span class="quote">deconstructed</span>”</span>
+representation in which all the element starting locations have been
+identified. The <acronym class="acronym">TOAST</acronym> pointer mechanism supports this need by
+allowing a pass-by-reference Datum to point to either a standard varlena
+value (the on-disk representation) or a <acronym class="acronym">TOAST</acronym> pointer that
+points to an expanded representation somewhere in memory. The details of
+this expanded representation are up to the data type, though it must have
+a standard header and meet the other API requirements given
+in <code class="filename">src/include/utils/expandeddatum.h</code>. C-level functions
+working with the data type can choose to handle either representation.
+Functions that do not know about the expanded representation, but simply
+apply <code class="function">PG_DETOAST_DATUM</code> to their inputs, will automatically
+receive the traditional varlena representation; so support for an expanded
+representation can be introduced incrementally, one function at a time.
+</p><p>
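+The difference between the flat and expanded representations can be
+illustrated with a toy example. The Python sketch below uses a made-up
+encoding (each element is a one-byte length followed by its payload), not the
+actual array format, and all function names are hypothetical. It shows why
+finding an element in the flat form requires a scan, while the expanded form
+can index directly once the element starting locations are known.
+</p><pre class="programlisting"><![CDATA[
```python
def nth_flat(buf: bytes, n: int) -> bytes:
    """Find element n in the flat form by skipping all preceding elements."""
    off = 0
    for _ in range(n):
        off += 1 + buf[off]  # length byte plus payload
    return buf[off + 1:off + 1 + buf[off]]


def expand(buf: bytes):
    """Scan once and record where every element starts."""
    offsets = []
    off = 0
    while off < len(buf):
        offsets.append(off)
        off += 1 + buf[off]
    return offsets


def nth_expanded(buf: bytes, offsets, n: int) -> bytes:
    """Find element n directly, using the precomputed starting locations."""
    off = offsets[n]
    return buf[off + 1:off + 1 + buf[off]]
```
+]]></pre><p>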
+<acronym class="acronym">TOAST</acronym> pointers to expanded values are further broken down
+into <em class="firstterm">read-write</em> and <em class="firstterm">read-only</em> pointers.
+The pointed-to representation is the same either way, but a function that
+receives a read-write pointer is allowed to modify the referenced value
+in-place, whereas one that receives a read-only pointer must not; it must
+first create a copy if it wants to make a modified version of the value.
+This distinction and some associated conventions make it possible to avoid
+unnecessary copying of expanded values during query execution.
+</p><p>
+For all types of in-memory <acronym class="acronym">TOAST</acronym> pointer, the <acronym class="acronym">TOAST</acronym>
+management code ensures that no such pointer datum can accidentally get
+stored on disk. In-memory <acronym class="acronym">TOAST</acronym> pointers are automatically
+expanded to normal in-line varlena values before storage — and then
+possibly converted to on-disk <acronym class="acronym">TOAST</acronym> pointers, if the containing
+tuple would otherwise be too big.
+</p></div></div><div xmlns="http://www.w3.org/TR/xhtml1/transitional" class="navfooter"><hr></hr><table width="100%" summary="Navigation footer"><tr><td width="40%" align="left"><a accesskey="p" href="storage-file-layout.html" title="69.1. Database File Layout">Prev</a> </td><td width="20%" align="center"><a accesskey="u" href="storage.html" title="Chapter 69. Database Physical Storage">Up</a></td><td width="40%" align="right"> <a accesskey="n" href="storage-fsm.html" title="69.3. Free Space Map">Next</a></td></tr><tr><td width="40%" align="left" valign="top">69.1. Database File Layout </td><td width="20%" align="center"><a accesskey="h" href="index.html" title="PostgreSQL 13.4 Documentation">Home</a></td><td width="40%" align="right" valign="top"> 69.3. Free Space Map</td></tr></table></div></body></html> \ No newline at end of file