1 files changed, 190 insertions, 0 deletions
diff --git a/doc/src/sgml/html/bloom.html b/doc/src/sgml/html/bloom.html
new file mode 100644
index 0000000..18f6e95
--- /dev/null
+++ b/doc/src/sgml/html/bloom.html
@@ -0,0 +1,190 @@
+<?xml version="1.0" encoding="UTF-8" standalone="no"?>
+<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"><html xmlns="http://www.w3.org/1999/xhtml"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8" /><title>F.5. bloom</title><link rel="stylesheet" type="text/css" href="stylesheet.css" /><link rev="made" href="pgsql-docs@lists.postgresql.org" /><meta name="generator" content="DocBook XSL Stylesheets Vsnapshot" /><link rel="prev" href="auto-explain.html" title="F.4. auto_explain" /><link rel="next" href="btree-gin.html" title="F.6. btree_gin" /></head><body id="docContent" class="container-fluid col-10"><div xmlns="http://www.w3.org/TR/xhtml1/transitional" class="navheader"><table width="100%" summary="Navigation header"><tr><th colspan="5" align="center">F.5. bloom</th></tr><tr><td width="10%" align="left"><a accesskey="p" href="auto-explain.html" title="F.4. auto_explain">Prev</a> </td><td width="10%" align="left"><a accesskey="u" href="contrib.html" title="Appendix F. Additional Supplied Modules">Up</a></td><th width="60%" align="center">Appendix F. Additional Supplied Modules</th><td width="10%" align="right"><a accesskey="h" href="index.html" title="PostgreSQL 14.5 Documentation">Home</a></td><td width="10%" align="right"> <a accesskey="n" href="btree-gin.html" title="F.6. btree_gin">Next</a></td></tr></table><hr></hr></div><div class="sect1" id="BLOOM"><div class="titlepage"><div><div><h2 class="title" style="clear: both">F.5. bloom</h2></div></div></div><div class="toc"><dl class="toc"><dt><span class="sect2"><a href="bloom.html#id-1.11.7.14.7">F.5.1. Parameters</a></span></dt><dt><span class="sect2"><a href="bloom.html#id-1.11.7.14.8">F.5.2. Examples</a></span></dt><dt><span class="sect2"><a href="bloom.html#id-1.11.7.14.9">F.5.3. Operator Class Interface</a></span></dt><dt><span class="sect2"><a href="bloom.html#id-1.11.7.14.10">F.5.4. Limitations</a></span></dt><dt><span class="sect2"><a href="bloom.html#id-1.11.7.14.11">F.5.5. Authors</a></span></dt></dl></div><a id="id-1.11.7.14.2" class="indexterm"></a><p>
+  <code class="literal">bloom</code> provides an index access method based on
+  <a class="ulink" href="https://en.wikipedia.org/wiki/Bloom_filter" target="_top">Bloom filters</a>.
+ </p><p>
+  A Bloom filter is a space-efficient data structure that is used to test
+  whether an element is a member of a set.  In the case of an index access
+  method, it allows fast exclusion of non-matching tuples via signatures
+  whose size is determined at index creation.
+ </p><p>
+  A signature is a lossy representation of the indexed attribute(s), and as
+  such is prone to reporting false positives; that is, it may be reported
+  that an element is in the set, when it is not.  So index search results
+  must always be rechecked using the actual attribute values from the heap
+  entry.  Larger signatures reduce the odds of a false positive and thus
+  reduce the number of useless heap visits, but of course also make the index
+  larger and hence slower to scan.
+ </p><p>
+  This type of index is most useful when a table has many attributes and
+  queries test arbitrary combinations of them.  A traditional btree index is
+  faster than a bloom index, but it can require many btree indexes to support
+  all possible queries where one needs only a single bloom index.  Note
+  however that bloom indexes only support equality queries, whereas btree
+  indexes can also perform inequality and range searches.
+ </p><div class="sect2" id="id-1.11.7.14.7"><div class="titlepage"><div><div><h3 class="title">F.5.1. Parameters</h3></div></div></div><p>
+   A <code class="literal">bloom</code> index accepts the following parameters in its
+   <code class="literal">WITH</code> clause:
+  </p><div class="variablelist"><dl class="variablelist"><dt><span class="term"><code class="literal">length</code></span></dt><dd><p>
+      Length of each signature (index entry) in bits. It is rounded up to the
+      nearest multiple of <code class="literal">16</code>. The default is
+      <code class="literal">80</code> bits and the maximum is <code class="literal">4096</code>.
+     </p></dd></dl></div><div class="variablelist"><dl class="variablelist"><dt><span class="term"><code class="literal">col1 — col32</code></span></dt><dd><p>
+      Number of bits generated for each index column. Each parameter's name
+      refers to the number of the index column that it controls.  The default
+      is <code class="literal">2</code> bits and the maximum is <code class="literal">4095</code>.
+      Parameters for index columns not actually used are ignored.
+     </p></dd></dl></div></div><div class="sect2" id="id-1.11.7.14.8"><div class="titlepage"><div><div><h3 class="title">F.5.2. Examples</h3></div></div></div><p>
+   This is an example of creating a bloom index:
+  </p><pre class="programlisting">
+CREATE INDEX bloomidx ON tbloom USING bloom (i1,i2,i3)
+       WITH (length=80, col1=2, col2=2, col3=4);
+</pre><p>
+   The index is created with a signature length of 80 bits, with attributes
+   i1 and i2 mapped to 2 bits, and attribute i3 mapped to 4 bits.  We could
+   have omitted the <code class="literal">length</code>, <code class="literal">col1</code>,
+   and <code class="literal">col2</code> specifications since those have the default values.
+  </p><p>
+   Here is a more complete example of bloom index definition and usage, as
+   well as a comparison with equivalent btree indexes.  The bloom index is
+   considerably smaller than the btree index, and can perform better.
+  </p><pre class="programlisting">
+=# CREATE TABLE tbloom AS
+   SELECT
+     (random() * 1000000)::int as i1,
+     (random() * 1000000)::int as i2,
+     (random() * 1000000)::int as i3,
+     (random() * 1000000)::int as i4,
+     (random() * 1000000)::int as i5,
+     (random() * 1000000)::int as i6
+   FROM
+  generate_series(1,10000000);
+SELECT 10000000
+</pre><p>
+   A sequential scan over this large table takes a long time:
+</p><pre class="programlisting">
+=# EXPLAIN ANALYZE SELECT * FROM tbloom WHERE i2 = 898732 AND i5 = 123451;
+                                              QUERY PLAN                                              
+------------------------------------------------------------------------------------------------------
+ Seq Scan on tbloom  (cost=0.00..2137.14 rows=3 width=24) (actual time=16.971..16.971 rows=0 loops=1)
+   Filter: ((i2 = 898732) AND (i5 = 123451))
+   Rows Removed by Filter: 100000
+ Planning Time: 0.346 ms
+ Execution Time: 16.988 ms
+(5 rows)
+</pre><p>
+  </p><p>
+   Even with the btree index defined the result will still be a
+   sequential scan:
+</p><pre class="programlisting">
+=# CREATE INDEX btreeidx ON tbloom (i1, i2, i3, i4, i5, i6);
+CREATE INDEX
+=# SELECT pg_size_pretty(pg_relation_size('btreeidx'));
+ pg_size_pretty
+----------------
+ 3976 kB
+(1 row)
+=# EXPLAIN ANALYZE SELECT * FROM tbloom WHERE i2 = 898732 AND i5 = 123451;
+                                              QUERY PLAN                                              
+------------------------------------------------------------------------------------------------------
+ Seq Scan on tbloom  (cost=0.00..2137.00 rows=2 width=24) (actual time=12.805..12.805 rows=0 loops=1)
+   Filter: ((i2 = 898732) AND (i5 = 123451))
+   Rows Removed by Filter: 100000
+ Planning Time: 0.138 ms
+ Execution Time: 12.817 ms
+(5 rows)
+</pre><p>
+  </p><p>
+   Having the bloom index defined on the table is better than btree in
+   handling this type of search:
+</p><pre class="programlisting">
+=# CREATE INDEX bloomidx ON tbloom USING bloom (i1, i2, i3, i4, i5, i6);
+CREATE INDEX
+=# SELECT pg_size_pretty(pg_relation_size('bloomidx'));
+ pg_size_pretty
+----------------
+ 1584 kB
+(1 row)
+=# EXPLAIN ANALYZE SELECT * FROM tbloom WHERE i2 = 898732 AND i5 = 123451;
+                                                     QUERY PLAN                                                      
+---------------------------------------------------------------------------------------------------------------------
+ Bitmap Heap Scan on tbloom  (cost=1792.00..1799.69 rows=2 width=24) (actual time=0.388..0.388 rows=0 loops=1)
+   Recheck Cond: ((i2 = 898732) AND (i5 = 123451))
+   Rows Removed by Index Recheck: 29
+   Heap Blocks: exact=28
+   -&gt;  Bitmap Index Scan on bloomidx  (cost=0.00..1792.00 rows=2 width=0) (actual time=0.356..0.356 rows=29 loops=1)
+         Index Cond: ((i2 = 898732) AND (i5 = 123451))
+ Planning Time: 0.099 ms
+ Execution Time: 0.408 ms
+(8 rows)
+</pre><p>
+  </p><p>
+   Now, the main problem with the btree search is that btree is inefficient
+   when the search conditions do not constrain the leading index column(s).
+   A better strategy for btree is to create a separate index on each column.
+   Then the planner will choose something like this:
+</p><pre class="programlisting">
+=# CREATE INDEX btreeidx1 ON tbloom (i1);
+CREATE INDEX
+=# CREATE INDEX btreeidx2 ON tbloom (i2);
+CREATE INDEX
+=# CREATE INDEX btreeidx3 ON tbloom (i3);
+CREATE INDEX
+=# CREATE INDEX btreeidx4 ON tbloom (i4);
+CREATE INDEX
+=# CREATE INDEX btreeidx5 ON tbloom (i5);
+CREATE INDEX
+=# CREATE INDEX btreeidx6 ON tbloom (i6);
+CREATE INDEX
+=# EXPLAIN ANALYZE SELECT * FROM tbloom WHERE i2 = 898732 AND i5 = 123451;
+                                                        QUERY PLAN                                                         
+---------------------------------------------------------------------------------------------------------------------------
+ Bitmap Heap Scan on tbloom  (cost=24.34..32.03 rows=2 width=24) (actual time=0.028..0.029 rows=0 loops=1)
+   Recheck Cond: ((i5 = 123451) AND (i2 = 898732))
+   -&gt;  BitmapAnd  (cost=24.34..24.34 rows=2 width=0) (actual time=0.027..0.027 rows=0 loops=1)
+         -&gt;  Bitmap Index Scan on btreeidx5  (cost=0.00..12.04 rows=500 width=0) (actual time=0.026..0.026 rows=0 loops=1)
+               Index Cond: (i5 = 123451)
+         -&gt;  Bitmap Index Scan on btreeidx2  (cost=0.00..12.04 rows=500 width=0) (never executed)
+               Index Cond: (i2 = 898732)
+ Planning Time: 0.491 ms
+ Execution Time: 0.055 ms
+(9 rows)
+</pre><p>
+   Although this query runs much faster than with either of the single
+   indexes, we pay a penalty in index size.  Each of the single-column
+   btree indexes occupies 2 MB, so the total space needed is 12 MB,
+   eight times the space used by the bloom index.
+  </p></div><div class="sect2" id="id-1.11.7.14.9"><div class="titlepage"><div><div><h3 class="title">F.5.3. Operator Class Interface</h3></div></div></div><p>
+   An operator class for bloom indexes requires only a hash function for the
+   indexed data type and an equality operator for searching. This example
+   shows the operator class definition for the <code class="type">text</code> data type:
+  </p><pre class="programlisting">
+CREATE OPERATOR CLASS text_ops
+DEFAULT FOR TYPE text USING bloom AS
+    OPERATOR    1   =(text, text),
+    FUNCTION    1   hashtext(text);
+</pre></div><div class="sect2" id="id-1.11.7.14.10"><div class="titlepage"><div><div><h3 class="title">F.5.4. Limitations</h3></div></div></div><p>
+   </p><div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: disc; "><li class="listitem"><p>
+      Only operator classes for <code class="type">int4</code> and <code class="type">text</code> are
+      included with the module.
+     </p></li><li class="listitem"><p>
+      Only the <code class="literal">=</code> operator is supported for search.  But
+      it is possible to add support for arrays with union and intersection
+      operations in the future.
+     </p></li><li class="listitem"><p>
+       <code class="literal">bloom</code> access method doesn't support
+       <code class="literal">UNIQUE</code> indexes.
+     </p></li><li class="listitem"><p>
+       <code class="literal">bloom</code> access method doesn't support searching for
+       <code class="literal">NULL</code> values.
+     </p></li></ul></div><p>
+  </p></div><div class="sect2" id="id-1.11.7.14.11"><div class="titlepage"><div><div><h3 class="title">F.5.5. Authors</h3></div></div></div><p>
+   Teodor Sigaev <code class="email">&lt;<a class="email" href="mailto:teodor@postgrespro.ru">teodor@postgrespro.ru</a>&gt;</code>,
+   Postgres Professional, Moscow, Russia
+  </p><p>
+   Alexander Korotkov <code class="email">&lt;<a class="email" href="mailto:a.korotkov@postgrespro.ru">a.korotkov@postgrespro.ru</a>&gt;</code>,
+   Postgres Professional, Moscow, Russia
+  </p><p>
+   Oleg Bartunov <code class="email">&lt;<a class="email" href="mailto:obartunov@postgrespro.ru">obartunov@postgrespro.ru</a>&gt;</code>,
+   Postgres Professional, Moscow, Russia
+  </p></div></div><div xmlns="http://www.w3.org/TR/xhtml1/transitional" class="navfooter"><hr></hr><table width="100%" summary="Navigation footer"><tr><td width="40%" align="left"><a accesskey="p" href="auto-explain.html" title="F.4. auto_explain">Prev</a> </td><td width="20%" align="center"><a accesskey="u" href="contrib.html" title="Appendix F. Additional Supplied Modules">Up</a></td><td width="40%" align="right"> <a accesskey="n" href="btree-gin.html" title="F.6. btree_gin">Next</a></td></tr><tr><td width="40%" align="left" valign="top">F.4. auto_explain </td><td width="20%" align="center"><a accesskey="h" href="index.html" title="PostgreSQL 14.5 Documentation">Home</a></td><td width="40%" align="right" valign="top"> F.6. btree_gin</td></tr></table></div></body></html>
+\ No newline at end of file