summaryrefslogtreecommitdiffstats
path: root/doc/src/sgml/html/hash-intro.html
diff options
context:
space:
mode:
Diffstat (limited to 'doc/src/sgml/html/hash-intro.html')
-rw-r--r--doc/src/sgml/html/hash-intro.html77
1 files changed, 77 insertions, 0 deletions
diff --git a/doc/src/sgml/html/hash-intro.html b/doc/src/sgml/html/hash-intro.html
new file mode 100644
index 0000000..ce9232b
--- /dev/null
+++ b/doc/src/sgml/html/hash-intro.html
@@ -0,0 +1,77 @@
+<?xml version="1.0" encoding="UTF-8" standalone="no"?>
+<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"><html xmlns="http://www.w3.org/1999/xhtml"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8" /><title>72.1. Overview</title><link rel="stylesheet" type="text/css" href="stylesheet.css" /><link rev="made" href="pgsql-docs@lists.postgresql.org" /><meta name="generator" content="DocBook XSL Stylesheets Vsnapshot" /><link rel="prev" href="hash-index.html" title="Chapter 72. Hash Indexes" /><link rel="next" href="hash-implementation.html" title="72.2. Implementation" /></head><body id="docContent" class="container-fluid col-10"><div class="navheader"><table width="100%" summary="Navigation header"><tr><th colspan="5" align="center">72.1. Overview</th></tr><tr><td width="10%" align="left"><a accesskey="p" href="hash-index.html" title="Chapter 72. Hash Indexes">Prev</a> </td><td width="10%" align="left"><a accesskey="u" href="hash-index.html" title="Chapter 72. Hash Indexes">Up</a></td><th width="60%" align="center">Chapter 72. Hash Indexes</th><td width="10%" align="right"><a accesskey="h" href="index.html" title="PostgreSQL 15.5 Documentation">Home</a></td><td width="10%" align="right"> <a accesskey="n" href="hash-implementation.html" title="72.2. Implementation">Next</a></td></tr></table><hr /></div><div class="sect1" id="HASH-INTRO"><div class="titlepage"><div><div><h2 class="title" style="clear: both">72.1. Overview</h2></div></div></div><p>
+ <span class="productname">PostgreSQL</span>
+ includes an implementation of persistent on-disk hash indexes,
+ which are fully crash recoverable. Any data type can be indexed by a
+ hash index, including data types that do not have a well-defined linear
+ ordering. Hash indexes store only the hash value of the data being
+ indexed, thus there are no restrictions on the size of the data column
+ being indexed.
+ </p><p>
+ Hash indexes support only single-column indexes and do not allow
+ uniqueness checking.
+ </p><p>
+ Hash indexes support only the <code class="literal">=</code> operator,
+ so WHERE clauses that specify range operations will not be able to take
+ advantage of hash indexes.
+ </p><p>
+ Each hash index tuple stores just the 4-byte hash value, not the actual
+ column value. As a result, hash indexes may be much smaller than B-trees
+ when indexing longer data items such as UUIDs, URLs, etc. The absence of
+ the column value also makes all hash index scans lossy. Hash indexes may
+ take part in bitmap index scans and backward scans.
+ </p><p>
+ Hash indexes are best optimized for SELECT and UPDATE-heavy workloads
+ that use equality scans on larger tables. In a B-tree index, searches must
+ descend through the tree until the leaf page is found. In tables with
+ millions of rows, this descent can increase access time to data. The
+ equivalent of a leaf page in a hash index is referred to as a bucket page. In
+ contrast, a hash index allows accessing the bucket pages directly,
+ thereby potentially reducing index access time in larger tables. This
+ reduction in "logical I/O" becomes even more pronounced on indexes/data
+ larger than shared_buffers/RAM.
+ </p><p>
+ Hash indexes have been designed to cope with uneven distributions of
+ hash values. Direct access to the bucket pages works well if the hash
+ values are evenly distributed. When inserts mean that the bucket page
+ becomes full, additional overflow pages are chained to that specific
+ bucket page, locally expanding the storage for index tuples that match
+ that hash value. When scanning a hash bucket during queries, we need to
+ scan through all of the overflow pages. Thus an unbalanced hash index
+ might actually be worse than a B-tree in terms of number of block
+ accesses required, for some data.
+ </p><p>
+ As a result of the overflow cases, we can say that hash indexes are
+ most suitable for unique, nearly unique data or data with a low number
+ of rows per hash bucket.
+ One possible way to avoid problems is to exclude highly non-unique
+ values from the index using a partial index condition, but this may
+ not be suitable in many cases.
+ </p><p>
+ Like B-Trees, hash indexes perform simple index tuple deletion. This
+ is a deferred maintenance operation that deletes index tuples that are
+ known to be safe to delete (those whose item identifier's LP_DEAD bit
+ is already set). If an insert finds no space is available on a page we
+ try to avoid creating a new overflow page by attempting to remove dead
+ index tuples. Removal cannot occur if the page is pinned at that time.
+ Deletion of dead index pointers also occurs during VACUUM.
+ </p><p>
+ If it can, VACUUM will also try to squeeze the index tuples onto as
+ few overflow pages as possible, minimizing the overflow chain. If an
+ overflow page becomes empty, overflow pages can be recycled for reuse
+ in other buckets, though we never return them to the operating system.
+ There is currently no provision to shrink a hash index, other than by
+ rebuilding it with REINDEX.
+ There is no provision for reducing the number of buckets, either.
+ </p><p>
+ Hash indexes may expand the number of bucket pages as the number of
+ rows indexed grows. The hash key-to-bucket-number mapping is chosen so that
+ the index can be incrementally expanded. When a new bucket is to be added to
+ the index, exactly one existing bucket will need to be "split", with some of
+ its tuples being transferred to the new bucket according to the updated
+ key-to-bucket-number mapping.
+ </p><p>
+ The expansion occurs in the foreground, which could increase execution
+ time for user inserts. Thus, hash indexes may not be suitable for tables
+ with rapidly increasing number of rows.
+ </p></div><div class="navfooter"><hr /><table width="100%" summary="Navigation footer"><tr><td width="40%" align="left"><a accesskey="p" href="hash-index.html" title="Chapter 72. Hash Indexes">Prev</a> </td><td width="20%" align="center"><a accesskey="u" href="hash-index.html" title="Chapter 72. Hash Indexes">Up</a></td><td width="40%" align="right"> <a accesskey="n" href="hash-implementation.html" title="72.2. Implementation">Next</a></td></tr><tr><td width="40%" align="left" valign="top">Chapter 72. Hash Indexes </td><td width="20%" align="center"><a accesskey="h" href="index.html" title="PostgreSQL 15.5 Documentation">Home</a></td><td width="40%" align="right" valign="top"> 72.2. Implementation</td></tr></table></div></body></html> \ No newline at end of file