diff options
author | Daniel Baumann <daniel.baumann@progress-linux.org> | 2024-04-13 13:44:03 +0000 |
---|---|---|
committer | Daniel Baumann <daniel.baumann@progress-linux.org> | 2024-04-13 13:44:03 +0000 |
commit | 293913568e6a7a86fd1479e1cff8e2ecb58d6568 (patch) | |
tree | fc3b469a3ec5ab71b36ea97cc7aaddb838423a0c /doc/src/sgml/html/hash-intro.html | |
parent | Initial commit. (diff) | |
download | postgresql-16-293913568e6a7a86fd1479e1cff8e2ecb58d6568.tar.xz postgresql-16-293913568e6a7a86fd1479e1cff8e2ecb58d6568.zip |
Adding upstream version 16.2.upstream/16.2
Signed-off-by: Daniel Baumann <daniel.baumann@progress-linux.org>
Diffstat (limited to 'doc/src/sgml/html/hash-intro.html')
-rw-r--r-- | doc/src/sgml/html/hash-intro.html | 77 |
1 files changed, 77 insertions, 0 deletions
diff --git a/doc/src/sgml/html/hash-intro.html b/doc/src/sgml/html/hash-intro.html new file mode 100644 index 0000000..f3a6968 --- /dev/null +++ b/doc/src/sgml/html/hash-intro.html @@ -0,0 +1,77 @@ +<?xml version="1.0" encoding="UTF-8" standalone="no"?> +<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"><html xmlns="http://www.w3.org/1999/xhtml"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8" /><title>72.1. Overview</title><link rel="stylesheet" type="text/css" href="stylesheet.css" /><link rev="made" href="pgsql-docs@lists.postgresql.org" /><meta name="generator" content="DocBook XSL Stylesheets Vsnapshot" /><link rel="prev" href="hash-index.html" title="Chapter 72. Hash Indexes" /><link rel="next" href="hash-implementation.html" title="72.2. Implementation" /></head><body id="docContent" class="container-fluid col-10"><div class="navheader"><table width="100%" summary="Navigation header"><tr><th colspan="5" align="center">72.1. Overview</th></tr><tr><td width="10%" align="left"><a accesskey="p" href="hash-index.html" title="Chapter 72. Hash Indexes">Prev</a> </td><td width="10%" align="left"><a accesskey="u" href="hash-index.html" title="Chapter 72. Hash Indexes">Up</a></td><th width="60%" align="center">Chapter 72. Hash Indexes</th><td width="10%" align="right"><a accesskey="h" href="index.html" title="PostgreSQL 16.2 Documentation">Home</a></td><td width="10%" align="right"> <a accesskey="n" href="hash-implementation.html" title="72.2. Implementation">Next</a></td></tr></table><hr /></div><div class="sect1" id="HASH-INTRO"><div class="titlepage"><div><div><h2 class="title" style="clear: both">72.1. Overview <a href="#HASH-INTRO" class="id_link">#</a></h2></div></div></div><p> + <span class="productname">PostgreSQL</span> + includes an implementation of persistent on-disk hash indexes, + which are fully crash recoverable. Any data type can be indexed by a + hash index, including data types that do not have a well-defined linear + ordering. Hash indexes store only the hash value of the data being + indexed, thus there are no restrictions on the size of the data column + being indexed. + </p><p> + Hash indexes support only single-column indexes and do not allow + uniqueness checking. + </p><p> + Hash indexes support only the <code class="literal">=</code> operator, + so WHERE clauses that specify range operations will not be able to take + advantage of hash indexes. + </p><p> + Each hash index tuple stores just the 4-byte hash value, not the actual + column value. As a result, hash indexes may be much smaller than B-trees + when indexing longer data items such as UUIDs, URLs, etc. The absence of + the column value also makes all hash index scans lossy. Hash indexes may + take part in bitmap index scans and backward scans. + </p><p> + Hash indexes are best optimized for SELECT and UPDATE-heavy workloads + that use equality scans on larger tables. In a B-tree index, searches must + descend through the tree until the leaf page is found. In tables with + millions of rows, this descent can increase access time to data. The + equivalent of a leaf page in a hash index is referred to as a bucket page. In + contrast, a hash index allows accessing the bucket pages directly, + thereby potentially reducing index access time in larger tables. This + reduction in "logical I/O" becomes even more pronounced on indexes/data + larger than shared_buffers/RAM. + </p><p> + Hash indexes have been designed to cope with uneven distributions of + hash values. Direct access to the bucket pages works well if the hash + values are evenly distributed. When inserts mean that the bucket page + becomes full, additional overflow pages are chained to that specific + bucket page, locally expanding the storage for index tuples that match + that hash value. When scanning a hash bucket during queries, we need to + scan through all of the overflow pages. Thus an unbalanced hash index + might actually be worse than a B-tree in terms of number of block + accesses required, for some data. + </p><p> + As a result of the overflow cases, we can say that hash indexes are + most suitable for unique, nearly unique data or data with a low number + of rows per hash bucket. + One possible way to avoid problems is to exclude highly non-unique + values from the index using a partial index condition, but this may + not be suitable in many cases. + </p><p> + Like B-Trees, hash indexes perform simple index tuple deletion. This + is a deferred maintenance operation that deletes index tuples that are + known to be safe to delete (those whose item identifier's LP_DEAD bit + is already set). If an insert finds no space is available on a page we + try to avoid creating a new overflow page by attempting to remove dead + index tuples. Removal cannot occur if the page is pinned at that time. + Deletion of dead index pointers also occurs during VACUUM. + </p><p> + If it can, VACUUM will also try to squeeze the index tuples onto as + few overflow pages as possible, minimizing the overflow chain. If an + overflow page becomes empty, overflow pages can be recycled for reuse + in other buckets, though we never return them to the operating system. + There is currently no provision to shrink a hash index, other than by + rebuilding it with REINDEX. + There is no provision for reducing the number of buckets, either. + </p><p> + Hash indexes may expand the number of bucket pages as the number of + rows indexed grows. The hash key-to-bucket-number mapping is chosen so that + the index can be incrementally expanded. When a new bucket is to be added to + the index, exactly one existing bucket will need to be "split", with some of + its tuples being transferred to the new bucket according to the updated + key-to-bucket-number mapping. + </p><p> + The expansion occurs in the foreground, which could increase execution + time for user inserts. Thus, hash indexes may not be suitable for tables + with rapidly increasing number of rows. + </p></div><div class="navfooter"><hr /><table width="100%" summary="Navigation footer"><tr><td width="40%" align="left"><a accesskey="p" href="hash-index.html" title="Chapter 72. Hash Indexes">Prev</a> </td><td width="20%" align="center"><a accesskey="u" href="hash-index.html" title="Chapter 72. Hash Indexes">Up</a></td><td width="40%" align="right"> <a accesskey="n" href="hash-implementation.html" title="72.2. Implementation">Next</a></td></tr><tr><td width="40%" align="left" valign="top">Chapter 72. Hash Indexes </td><td width="20%" align="center"><a accesskey="h" href="index.html" title="PostgreSQL 16.2 Documentation">Home</a></td><td width="40%" align="right" valign="top"> 72.2. Implementation</td></tr></table></div></body></html>
\ No newline at end of file |