diff options
Diffstat (limited to 'doc/src/sgml/html/storage-page-layout.html')
-rw-r--r-- | doc/src/sgml/html/storage-page-layout.html | 157 |
1 files changed, 157 insertions, 0 deletions
diff --git a/doc/src/sgml/html/storage-page-layout.html b/doc/src/sgml/html/storage-page-layout.html new file mode 100644 index 0000000..e6fb04c --- /dev/null +++ b/doc/src/sgml/html/storage-page-layout.html @@ -0,0 +1,157 @@ +<?xml version="1.0" encoding="UTF-8" standalone="no"?> +<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"><html xmlns="http://www.w3.org/1999/xhtml"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8" /><title>73.6. Database Page Layout</title><link rel="stylesheet" type="text/css" href="stylesheet.css" /><link rev="made" href="pgsql-docs@lists.postgresql.org" /><meta name="generator" content="DocBook XSL Stylesheets Vsnapshot" /><link rel="prev" href="storage-init.html" title="73.5. The Initialization Fork" /><link rel="next" href="storage-hot.html" title="73.7. Heap-Only Tuples (HOT)" /></head><body id="docContent" class="container-fluid col-10"><div class="navheader"><table width="100%" summary="Navigation header"><tr><th colspan="5" align="center">73.6. Database Page Layout</th></tr><tr><td width="10%" align="left"><a accesskey="p" href="storage-init.html" title="73.5. The Initialization Fork">Prev</a> </td><td width="10%" align="left"><a accesskey="u" href="storage.html" title="Chapter 73. Database Physical Storage">Up</a></td><th width="60%" align="center">Chapter 73. Database Physical Storage</th><td width="10%" align="right"><a accesskey="h" href="index.html" title="PostgreSQL 15.5 Documentation">Home</a></td><td width="10%" align="right"> <a accesskey="n" href="storage-hot.html" title="73.7. Heap-Only Tuples (HOT)">Next</a></td></tr></table><hr /></div><div class="sect1" id="STORAGE-PAGE-LAYOUT"><div class="titlepage"><div><div><h2 class="title" style="clear: both">73.6. Database Page Layout</h2></div></div></div><div class="toc"><dl class="toc"><dt><span class="sect2"><a href="storage-page-layout.html#STORAGE-TUPLE-LAYOUT">73.6.1. Table Row Layout</a></span></dt></dl></div><p> +This section provides an overview of the page format used within +<span class="productname">PostgreSQL</span> tables and indexes.<a href="#ftn.id-1.10.24.8.2.2" class="footnote"><sup class="footnote" id="id-1.10.24.8.2.2">[17]</sup></a> +Sequences and <acronym class="acronym">TOAST</acronym> tables are formatted just like a regular table. +</p><p> +In the following explanation, a +<em class="firstterm">byte</em> +is assumed to contain 8 bits. In addition, the term +<em class="firstterm">item</em> +refers to an individual data value that is stored on a page. In a table, +an item is a row; in an index, an item is an index entry. +</p><p> +Every table and index is stored as an array of <em class="firstterm">pages</em> of a +fixed size (usually 8 kB, although a different page size can be selected +when compiling the server). In a table, all the pages are logically +equivalent, so a particular item (row) can be stored in any page. In +indexes, the first page is generally reserved as a <em class="firstterm">metapage</em> +holding control information, and there can be different types of pages +within the index, depending on the index access method. +</p><p> +<a class="xref" href="storage-page-layout.html#PAGE-TABLE" title="Table 73.2. Overall Page Layout">Table 73.2</a> shows the overall layout of a page. +There are five parts to each page. +</p><div class="table" id="PAGE-TABLE"><p class="title"><strong>Table 73.2. Overall Page Layout</strong></p><div class="table-contents"><table class="table" summary="Overall Page Layout" border="1"><colgroup><col /><col /></colgroup><thead><tr><th> +Item +</th><th>Description</th></tr></thead><tbody><tr><td>PageHeaderData</td><td>24 bytes long. Contains general information about the page, including +free space pointers.</td></tr><tr><td>ItemIdData</td><td>Array of item identifiers pointing to the actual items. Each +entry is an (offset,length) pair. 4 bytes per item.</td></tr><tr><td>Free space</td><td>The unallocated space. New item identifiers are allocated from +the start of this area, new items from the end.</td></tr><tr><td>Items</td><td>The actual items themselves.</td></tr><tr><td>Special space</td><td>Index access method specific data. Different methods store different +data. Empty in ordinary tables.</td></tr></tbody></table></div></div><br class="table-break" /><p> + + The first 24 bytes of each page consists of a page header + (<code class="structname">PageHeaderData</code>). Its format is detailed in <a class="xref" href="storage-page-layout.html#PAGEHEADERDATA-TABLE" title="Table 73.3. PageHeaderData Layout">Table 73.3</a>. The first field tracks the most + recent WAL entry related to this page. The second field contains + the page checksum if <a class="xref" href="app-initdb.html#APP-INITDB-DATA-CHECKSUMS">data checksums</a> are + enabled. Next is a 2-byte field containing flag bits. This is followed + by three 2-byte integer fields (<code class="structfield">pd_lower</code>, + <code class="structfield">pd_upper</code>, and + <code class="structfield">pd_special</code>). These contain byte offsets + from the page start to the start of unallocated space, to the end of + unallocated space, and to the start of the special space. The next 2 + bytes of the page header, <code class="structfield">pd_pagesize_version</code>, + store both the page size and a version indicator. Beginning with + <span class="productname">PostgreSQL</span> 8.3 the version number is 4; + <span class="productname">PostgreSQL</span> 8.1 and 8.2 used version number 3; + <span class="productname">PostgreSQL</span> 8.0 used version number 2; + <span class="productname">PostgreSQL</span> 7.3 and 7.4 used version number 1; + prior releases used version number 0. + (The basic page layout and header format has not changed in most of these + versions, but the layout of heap row headers has.) The page size + is basically only present as a cross-check; there is no support for having + more than one page size in an installation. + The last field is a hint that shows whether pruning the page is likely + to be profitable: it tracks the oldest un-pruned XMAX on the page. + + </p><div class="table" id="PAGEHEADERDATA-TABLE"><p class="title"><strong>Table 73.3. PageHeaderData Layout</strong></p><div class="table-contents"><table class="table" summary="PageHeaderData Layout" border="1"><colgroup><col /><col /><col /><col /></colgroup><thead><tr><th>Field</th><th>Type</th><th>Length</th><th>Description</th></tr></thead><tbody><tr><td>pd_lsn</td><td>PageXLogRecPtr</td><td>8 bytes</td><td>LSN: next byte after last byte of WAL record for last change + to this page</td></tr><tr><td>pd_checksum</td><td>uint16</td><td>2 bytes</td><td>Page checksum</td></tr><tr><td>pd_flags</td><td>uint16</td><td>2 bytes</td><td>Flag bits</td></tr><tr><td>pd_lower</td><td>LocationIndex</td><td>2 bytes</td><td>Offset to start of free space</td></tr><tr><td>pd_upper</td><td>LocationIndex</td><td>2 bytes</td><td>Offset to end of free space</td></tr><tr><td>pd_special</td><td>LocationIndex</td><td>2 bytes</td><td>Offset to start of special space</td></tr><tr><td>pd_pagesize_version</td><td>uint16</td><td>2 bytes</td><td>Page size and layout version number information</td></tr><tr><td>pd_prune_xid</td><td>TransactionId</td><td>4 bytes</td><td>Oldest unpruned XMAX on page, or zero if none</td></tr></tbody></table></div></div><br class="table-break" /><p> + All the details can be found in + <code class="filename">src/include/storage/bufpage.h</code>. + </p><p> + Following the page header are item identifiers + (<code class="type">ItemIdData</code>), each requiring four bytes. + An item identifier contains a byte-offset to + the start of an item, its length in bytes, and a few attribute bits + which affect its interpretation. + New item identifiers are allocated + as needed from the beginning of the unallocated space. + The number of item identifiers present can be determined by looking at + <code class="structfield">pd_lower</code>, which is increased to allocate a new identifier. + Because an item + identifier is never moved until it is freed, its index can be used on a + long-term basis to reference an item, even when the item itself is moved + around on the page to compact free space. In fact, every pointer to an + item (<code class="type">ItemPointer</code>, also known as + <code class="type">CTID</code>) created by + <span class="productname">PostgreSQL</span> consists of a page number and the + index of an item identifier. + + </p><p> + + The items themselves are stored in space allocated backwards from the end + of unallocated space. The exact structure varies depending on what the + table is to contain. Tables and sequences both use a structure named + <code class="type">HeapTupleHeaderData</code>, described below. + + </p><p> + + The final section is the <span class="quote">“<span class="quote">special section</span>”</span> which can + contain anything the access method wishes to store. For example, + b-tree indexes store links to the page's left and right siblings, + as well as some other data relevant to the index structure. + Ordinary tables do not use a special section at all (indicated by setting + <code class="structfield">pd_special</code> to equal the page size). + + </p><p> + <a class="xref" href="storage-page-layout.html#STORAGE-PAGE-LAYOUT-FIGURE" title="Figure 73.1. Page Layout">Figure 73.1</a> illustrates how these parts are + laid out in a page. + </p><div class="figure" id="STORAGE-PAGE-LAYOUT-FIGURE"><p class="title"><strong>Figure 73.1. Page Layout</strong></p><div class="figure-contents"><div class="mediaobject"><object type="image/svg+xml" data="pagelayout.svg" width="100%"></object></div></div></div><br class="figure-break" /><div class="sect2" id="STORAGE-TUPLE-LAYOUT"><div class="titlepage"><div><div><h3 class="title">73.6.1. Table Row Layout</h3></div></div></div><p> + + All table rows are structured in the same way. There is a fixed-size + header (occupying 23 bytes on most machines), followed by an optional null + bitmap, an optional object ID field, and the user data. The header is + detailed + in <a class="xref" href="storage-page-layout.html#HEAPTUPLEHEADERDATA-TABLE" title="Table 73.4. HeapTupleHeaderData Layout">Table 73.4</a>. The actual user data + (columns of the row) begins at the offset indicated by + <code class="structfield">t_hoff</code>, which must always be a multiple of the MAXALIGN + distance for the platform. + The null bitmap is + only present if the <em class="firstterm">HEAP_HASNULL</em> bit is set in + <code class="structfield">t_infomask</code>. If it is present it begins just after + the fixed header and occupies enough bytes to have one bit per data column + (that is, the number of bits that equals the attribute count in + <code class="structfield">t_infomask2</code>). In this list of bits, a + 1 bit indicates not-null, a 0 bit is a null. When the bitmap is not + present, all columns are assumed not-null. + The object ID is only present if the <em class="firstterm">HEAP_HASOID_OLD</em> bit + is set in <code class="structfield">t_infomask</code>. If present, it appears just + before the <code class="structfield">t_hoff</code> boundary. Any padding needed to make + <code class="structfield">t_hoff</code> a MAXALIGN multiple will appear between the null + bitmap and the object ID. (This in turn ensures that the object ID is + suitably aligned.) + + </p><div class="table" id="HEAPTUPLEHEADERDATA-TABLE"><p class="title"><strong>Table 73.4. HeapTupleHeaderData Layout</strong></p><div class="table-contents"><table class="table" summary="HeapTupleHeaderData Layout" border="1"><colgroup><col /><col /><col /><col /></colgroup><thead><tr><th>Field</th><th>Type</th><th>Length</th><th>Description</th></tr></thead><tbody><tr><td>t_xmin</td><td>TransactionId</td><td>4 bytes</td><td>insert XID stamp</td></tr><tr><td>t_xmax</td><td>TransactionId</td><td>4 bytes</td><td>delete XID stamp</td></tr><tr><td>t_cid</td><td>CommandId</td><td>4 bytes</td><td>insert and/or delete CID stamp (overlays with t_xvac)</td></tr><tr><td>t_xvac</td><td>TransactionId</td><td>4 bytes</td><td>XID for VACUUM operation moving a row version</td></tr><tr><td>t_ctid</td><td>ItemPointerData</td><td>6 bytes</td><td>current TID of this or newer row version</td></tr><tr><td>t_infomask2</td><td>uint16</td><td>2 bytes</td><td>number of attributes, plus various flag bits</td></tr><tr><td>t_infomask</td><td>uint16</td><td>2 bytes</td><td>various flag bits</td></tr><tr><td>t_hoff</td><td>uint8</td><td>1 byte</td><td>offset to user data</td></tr></tbody></table></div></div><br class="table-break" /><p> + All the details can be found in + <code class="filename">src/include/access/htup_details.h</code>. + </p><p> + + Interpreting the actual data can only be done with information obtained + from other tables, mostly <code class="structname">pg_attribute</code>. The + key values needed to identify field locations are + <code class="structfield">attlen</code> and <code class="structfield">attalign</code>. + There is no way to directly get a + particular attribute, except when there are only fixed width fields and no + null values. All this trickery is wrapped up in the functions + <em class="firstterm">heap_getattr</em>, <em class="firstterm">fastgetattr</em> + and <em class="firstterm">heap_getsysattr</em>. + + </p><p> + + To read the data you need to examine each attribute in turn. First check + whether the field is NULL according to the null bitmap. If it is, go to + the next. Then make sure you have the right alignment. If the field is a + fixed width field, then all the bytes are simply placed. If it's a + variable length field (attlen = -1) then it's a bit more complicated. + All variable-length data types share the common header structure + <code class="type">struct varlena</code>, which includes the total length of the stored + value and some flag bits. Depending on the flags, the data can be either + inline or in a <acronym class="acronym">TOAST</acronym> table; + it might be compressed, too (see <a class="xref" href="storage-toast.html" title="73.2. TOAST">Section 73.2</a>). + + </p></div><div class="footnotes"><br /><hr style="width:100; text-align:left;margin-left: 0" /><div id="ftn.id-1.10.24.8.2.2" class="footnote"><p><a href="#id-1.10.24.8.2.2" class="para"><sup class="para">[17] </sup></a> + Actually, use of this page format is not required for either table or + index access methods. The <code class="literal">heap</code> table access method + always uses this format. All the existing index methods also use the + basic format, but the data kept on index metapages usually doesn't follow + the item layout rules. + </p></div></div></div><div class="navfooter"><hr /><table width="100%" summary="Navigation footer"><tr><td width="40%" align="left"><a accesskey="p" href="storage-init.html" title="73.5. The Initialization Fork">Prev</a> </td><td width="20%" align="center"><a accesskey="u" href="storage.html" title="Chapter 73. Database Physical Storage">Up</a></td><td width="40%" align="right"> <a accesskey="n" href="storage-hot.html" title="73.7. Heap-Only Tuples (HOT)">Next</a></td></tr><tr><td width="40%" align="left" valign="top">73.5. The Initialization Fork </td><td width="20%" align="center"><a accesskey="h" href="index.html" title="PostgreSQL 15.5 Documentation">Home</a></td><td width="40%" align="right" valign="top"> 73.7. Heap-Only Tuples (<acronym class="acronym">HOT</acronym>)</td></tr></table></div></body></html>
\ No newline at end of file |