1 files changed, 77 insertions, 0 deletions
diff --git a/doc/src/sgml/html/hash-intro.html b/doc/src/sgml/html/hash-intro.html
new file mode 100644
index 0000000..cb29c74
--- /dev/null
+++ b/doc/src/sgml/html/hash-intro.html
@@ -0,0 +1,77 @@
+<?xml version="1.0" encoding="UTF-8" standalone="no"?>
+<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"><html xmlns="http://www.w3.org/1999/xhtml"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8" /><title>69.1. Overview</title><link rel="stylesheet" type="text/css" href="stylesheet.css" /><link rev="made" href="pgsql-docs@lists.postgresql.org" /><meta name="generator" content="DocBook XSL Stylesheets Vsnapshot" /><link rel="prev" href="hash-index.html" title="Chapter 69. Hash Indexes" /><link rel="next" href="hash-implementation.html" title="69.2. Implementation" /></head><body id="docContent" class="container-fluid col-10"><div xmlns="http://www.w3.org/TR/xhtml1/transitional" class="navheader"><table width="100%" summary="Navigation header"><tr><th colspan="5" align="center">69.1. Overview</th></tr><tr><td width="10%" align="left"><a accesskey="p" href="hash-index.html" title="Chapter 69. Hash Indexes">Prev</a> </td><td width="10%" align="left"><a accesskey="u" href="hash-index.html" title="Chapter 69. Hash Indexes">Up</a></td><th width="60%" align="center">Chapter 69. Hash Indexes</th><td width="10%" align="right"><a accesskey="h" href="index.html" title="PostgreSQL 14.5 Documentation">Home</a></td><td width="10%" align="right"> <a accesskey="n" href="hash-implementation.html" title="69.2. Implementation">Next</a></td></tr></table><hr></hr></div><div class="sect1" id="HASH-INTRO"><div class="titlepage"><div><div><h2 class="title" style="clear: both">69.1. Overview</h2></div></div></div><p>
+  <span class="productname">PostgreSQL</span>
+  includes an implementation of persistent on-disk hash indexes,
+  which are fully crash recoverable. Any data type can be indexed by a
+  hash index, including data types that do not have a well-defined linear
+  ordering. Hash indexes store only the hash value of the data being
+  indexed, thus there are no restrictions on the size of the data column
+  being indexed.
+ </p><p>
+  Hash indexes support only single-column indexes and do not allow
+  uniqueness checking.
+ </p><p>
+  Hash indexes support only the <code class="literal">=</code> operator,
+  so WHERE clauses that specify range operations will not be able to take
+  advantage of hash indexes.
+ </p><p>
+  Each hash index tuple stores just the 4-byte hash value, not the actual
+  column value. As a result, hash indexes may be much smaller than B-trees
+  when indexing longer data items such as UUIDs, URLs, etc. The absence of
+  the column value also makes all hash index scans lossy. Hash indexes may
+  take part in bitmap index scans and backward scans.
+ </p><p>
+  Hash indexes are best optimized for SELECT and UPDATE-heavy workloads
+  that use equality scans on larger tables. In a B-tree index, searches must
+  descend through the tree until the leaf page is found. In tables with
+  millions of rows, this descent can increase access time to data. The
+  equivalent of a leaf page in a hash index is referred to as a bucket page. In
+  contrast, a hash index allows accessing the bucket pages directly,
+  thereby potentially reducing index access time in larger tables. This
+  reduction in "logical I/O" becomes even more pronounced on indexes/data
+  larger than shared_buffers/RAM. 
+ </p><p>
+  Hash indexes have been designed to cope with uneven distributions of
+  hash values. Direct access to the bucket pages works well if the hash
+  values are evenly distributed. When inserts mean that the bucket page
+  becomes full, additional overflow pages are chained to that specific
+  bucket page, locally expanding the storage for index tuples that match
+  that hash value. When scanning a hash bucket during queries, we need to
+  scan through all of the overflow pages. Thus an unbalanced hash index
+  might actually be worse than a B-tree in terms of number of block
+  accesses required, for some data.
+ </p><p>
+  As a result of the overflow cases, we can say that hash indexes are
+  most suitable for unique, nearly unique data or data with a low number
+  of rows per hash bucket.
+  One possible way to avoid problems is to exclude highly non-unique
+  values from the index using a partial index condition, but this may
+  not be suitable in many cases.
+ </p><p>
+  Like B-Trees, hash indexes perform simple index tuple deletion. This
+  is a deferred maintenance operation that deletes index tuples that are
+  known to be safe to delete (those whose item identifier's LP_DEAD bit
+  is already set). If an insert finds no space is available on a page we
+  try to avoid creating a new overflow page by attempting to remove dead
+  index tuples. Removal cannot occur if the page is pinned at that time.
+  Deletion of dead index pointers also occurs during VACUUM.
+ </p><p>
+  If it can, VACUUM will also try to squeeze the index tuples onto as
+  few overflow pages as possible, minimizing the overflow chain. If an
+  overflow page becomes empty, overflow pages can be recycled for reuse
+  in other buckets, though we never return them to the operating system.
+  There is currently no provision to shrink a hash index, other than by
+  rebuilding it with REINDEX.  
+  There is no provision for reducing the number of buckets, either.
+ </p><p>
+  Hash indexes may expand the number of bucket pages as the number of
+  rows indexed grows. The hash key-to-bucket-number mapping is chosen so that
+  the index can be incrementally expanded. When a new bucket is to be added to
+  the index, exactly one existing bucket will need to be "split", with some of
+  its tuples being transferred to the new bucket according to the updated
+  key-to-bucket-number mapping.
+ </p><p>
+  The expansion occurs in the foreground, which could increase execution
+  time for user inserts. Thus, hash indexes may not be suitable for tables
+  with rapidly increasing number of rows.
+ </p></div><div xmlns="http://www.w3.org/TR/xhtml1/transitional" class="navfooter"><hr></hr><table width="100%" summary="Navigation footer"><tr><td width="40%" align="left"><a accesskey="p" href="hash-index.html" title="Chapter 69. Hash Indexes">Prev</a> </td><td width="20%" align="center"><a accesskey="u" href="hash-index.html" title="Chapter 69. Hash Indexes">Up</a></td><td width="40%" align="right"> <a accesskey="n" href="hash-implementation.html" title="69.2. Implementation">Next</a></td></tr><tr><td width="40%" align="left" valign="top">Chapter 69. Hash Indexes </td><td width="20%" align="center"><a accesskey="h" href="index.html" title="PostgreSQL 14.5 Documentation">Home</a></td><td width="40%" align="right" valign="top"> 69.2. Implementation</td></tr></table></div></body></html>
+\ No newline at end of file