summaryrefslogtreecommitdiffstats
path: root/doc/src/sgml/html/hash-intro.html
blob: 2415791b0ff7f36755a5d6ca98cc0c966930894d (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"><html xmlns="http://www.w3.org/1999/xhtml"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8" /><title>72.1. Overview</title><link rel="stylesheet" type="text/css" href="stylesheet.css" /><link rev="made" href="pgsql-docs@lists.postgresql.org" /><meta name="generator" content="DocBook XSL Stylesheets Vsnapshot" /><link rel="prev" href="hash-index.html" title="Chapter 72. Hash Indexes" /><link rel="next" href="hash-implementation.html" title="72.2. Implementation" /></head><body id="docContent" class="container-fluid col-10"><div class="navheader"><table width="100%" summary="Navigation header"><tr><th colspan="5" align="center">72.1. Overview</th></tr><tr><td width="10%" align="left"><a accesskey="p" href="hash-index.html" title="Chapter 72. Hash Indexes">Prev</a> </td><td width="10%" align="left"><a accesskey="u" href="hash-index.html" title="Chapter 72. Hash Indexes">Up</a></td><th width="60%" align="center">Chapter 72. Hash Indexes</th><td width="10%" align="right"><a accesskey="h" href="index.html" title="PostgreSQL 15.7 Documentation">Home</a></td><td width="10%" align="right"> <a accesskey="n" href="hash-implementation.html" title="72.2. Implementation">Next</a></td></tr></table><hr /></div><div class="sect1" id="HASH-INTRO"><div class="titlepage"><div><div><h2 class="title" style="clear: both">72.1. Overview</h2></div></div></div><p>
  <span class="productname">PostgreSQL</span>
  includes an implementation of persistent on-disk hash indexes,
  which are fully crash recoverable. Any data type can be indexed by a
  hash index, including data types that do not have a well-defined linear
  ordering. Hash indexes store only the hash value of the data being
  indexed, thus there are no restrictions on the size of the data column
  being indexed.
 </p><p>
  Hash indexes support only single-column indexes and do not allow
  uniqueness checking.
 </p><p>
  Hash indexes support only the <code class="literal">=</code> operator,
  so WHERE clauses that specify range operations will not be able to take
  advantage of hash indexes.
 </p><p>
  Each hash index tuple stores just the 4-byte hash value, not the actual
  column value. As a result, hash indexes may be much smaller than B-trees
  when indexing longer data items such as UUIDs, URLs, etc. The absence of
  the column value also makes all hash index scans lossy. Hash indexes may
  take part in bitmap index scans and backward scans.
 </p><p>
  Hash indexes are best optimized for SELECT and UPDATE-heavy workloads
  that use equality scans on larger tables. In a B-tree index, searches must
  descend through the tree until the leaf page is found. In tables with
  millions of rows, this descent can increase access time to data. The
  equivalent of a leaf page in a hash index is referred to as a bucket page. In
  contrast, a hash index allows accessing the bucket pages directly,
  thereby potentially reducing index access time in larger tables. This
  reduction in "logical I/O" becomes even more pronounced on indexes/data
  larger than shared_buffers/RAM.
 </p><p>
  Hash indexes have been designed to cope with uneven distributions of
  hash values. Direct access to the bucket pages works well if the hash
  values are evenly distributed. When inserts mean that the bucket page
  becomes full, additional overflow pages are chained to that specific
  bucket page, locally expanding the storage for index tuples that match
  that hash value. When scanning a hash bucket during queries, we need to
  scan through all of the overflow pages. Thus an unbalanced hash index
  might actually be worse than a B-tree in terms of number of block
  accesses required, for some data.
 </p><p>
  As a result of the overflow cases, we can say that hash indexes are
  most suitable for unique, nearly unique data or data with a low number
  of rows per hash bucket.
  One possible way to avoid problems is to exclude highly non-unique
  values from the index using a partial index condition, but this may
  not be suitable in many cases.
 </p><p>
  Like B-Trees, hash indexes perform simple index tuple deletion. This
  is a deferred maintenance operation that deletes index tuples that are
  known to be safe to delete (those whose item identifier's LP_DEAD bit
  is already set). If an insert finds no space is available on a page we
  try to avoid creating a new overflow page by attempting to remove dead
  index tuples. Removal cannot occur if the page is pinned at that time.
  Deletion of dead index pointers also occurs during VACUUM.
 </p><p>
  If it can, VACUUM will also try to squeeze the index tuples onto as
  few overflow pages as possible, minimizing the overflow chain. If an
  overflow page becomes empty, overflow pages can be recycled for reuse
  in other buckets, though we never return them to the operating system.
  There is currently no provision to shrink a hash index, other than by
  rebuilding it with REINDEX.
  There is no provision for reducing the number of buckets, either.
 </p><p>
  Hash indexes may expand the number of bucket pages as the number of
  rows indexed grows. The hash key-to-bucket-number mapping is chosen so that
  the index can be incrementally expanded. When a new bucket is to be added to
  the index, exactly one existing bucket will need to be "split", with some of
  its tuples being transferred to the new bucket according to the updated
  key-to-bucket-number mapping.
 </p><p>
  The expansion occurs in the foreground, which could increase execution
  time for user inserts. Thus, hash indexes may not be suitable for tables
  with rapidly increasing number of rows.
 </p></div><div class="navfooter"><hr /><table width="100%" summary="Navigation footer"><tr><td width="40%" align="left"><a accesskey="p" href="hash-index.html" title="Chapter 72. Hash Indexes">Prev</a> </td><td width="20%" align="center"><a accesskey="u" href="hash-index.html" title="Chapter 72. Hash Indexes">Up</a></td><td width="40%" align="right"> <a accesskey="n" href="hash-implementation.html" title="72.2. Implementation">Next</a></td></tr><tr><td width="40%" align="left" valign="top">Chapter 72. Hash Indexes </td><td width="20%" align="center"><a accesskey="h" href="index.html" title="PostgreSQL 15.7 Documentation">Home</a></td><td width="40%" align="right" valign="top"> 72.2. Implementation</td></tr></table></div></body></html>