Diffstat (limited to 'doc/src/sgml/intagg.sgml')
-rw-r--r-- doc/src/sgml/intagg.sgml | 131
1 files changed, 131 insertions, 0 deletions
diff --git a/doc/src/sgml/intagg.sgml b/doc/src/sgml/intagg.sgml
new file mode 100644
index 0000000..c410f64
--- /dev/null
+++ b/doc/src/sgml/intagg.sgml
@@ -0,0 +1,131 @@
+<!-- doc/src/sgml/intagg.sgml -->
+
+<sect1 id="intagg" xreflabel="intagg">
+ <title>intagg</title>
+
+ <indexterm zone="intagg">
+ <primary>intagg</primary>
+ </indexterm>
+
+ <para>
+ The <filename>intagg</filename> module provides an integer aggregator and an
+ enumerator. <filename>intagg</filename> is now obsolete, because there
+ are built-in functions that provide a superset of its capabilities.
+ However, the module is still provided as a compatibility wrapper around
+ the built-in functions.
+ </para>
+
+ <sect2>
+ <title>Functions</title>
+
+ <indexterm>
+ <primary>int_array_aggregate</primary>
+ </indexterm>
+
+ <indexterm>
+ <primary>array_agg</primary>
+ </indexterm>
+
+ <para>
+ The aggregator is an aggregate function
+ <function>int_array_aggregate(integer)</function>
+ that produces an integer array
+ containing exactly the integers it is fed.
+ This is a wrapper around <function>array_agg</function>,
+ which does the same thing for any element type.
+ </para>
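+
+ <para>
+ As a minimal sketch, assuming the module is installed and using
+ <function>generate_series</function> only to supply sample input, the
+ wrapper and the built-in aggregate can be called side by side:
+
+<programlisting>
+-- Both columns are aggregated from the same three input rows.
+SELECT int_array_aggregate(x) AS via_intagg,
+ array_agg(x) AS via_builtin
+ FROM generate_series(1, 3) AS x;
+</programlisting>
+ </para>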
+
+ <indexterm>
+ <primary>int_array_enum</primary>
+ </indexterm>
+
+ <para>
+ The enumerator is a function
+ <function>int_array_enum(integer[])</function>
+ that returns <type>setof integer</type>. It is essentially the reverse
+ operation of the aggregator: given an array of integers, expand it
+ into a set of rows. This is a wrapper around <function>unnest</function>,
+ which does the same thing for any array type.
+ </para>
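+
+ <para>
+ A comparable sketch for the enumerator, again assuming the module is
+ installed and using an array literal as sample input. Both queries expand
+ the array into one row per element:
+
+<programlisting>
+SELECT int_array_enum(ARRAY[10, 20, 30]);
+SELECT unnest(ARRAY[10, 20, 30]);
+</programlisting>
+ </para>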
+
+ </sect2>
+
+ <sect2>
+ <title>Sample Uses</title>
+
+ <para>
+ Many database systems have the notion of a one-to-many table. Such a table
+ usually sits between two indexed tables, for example:
+
+<programlisting>
+-- left and right are reserved words, so they must be double-quoted as identifiers
+CREATE TABLE "left" (id INT PRIMARY KEY, ...);
+CREATE TABLE "right" (id INT PRIMARY KEY, ...);
+CREATE TABLE one_to_many("left" INT REFERENCES "left", "right" INT REFERENCES "right");
+</programlisting>
+
+ It is typically used like this:
+
+<programlisting>
+SELECT "right".* FROM "right" JOIN one_to_many ON ("right".id = one_to_many."right")
+ WHERE one_to_many."left" = <replaceable>item</replaceable>;
+</programlisting>
+
+ This will return all the items in the right-hand table for an entry
+ in the left-hand table. This is a very common construct in SQL.
+ </para>
+
+ <para>
+ This approach can be inefficient when the <structname>one_to_many</structname>
+ table has a very large number of entries. Often, a join like this results
+ in an index scan and a separate fetch for each right-hand entry that
+ matches a particular left-hand entry. If the data changes frequently,
+ there is not much you can do. However, if some of the data is fairly
+ static, you can create a summary table with the aggregator.
+
+<programlisting>
+CREATE TABLE summary AS
+ SELECT "left", int_array_aggregate("right") AS "right"
+ FROM one_to_many
+ GROUP BY "left";
+</programlisting>
+
+ This will create a table with one row per left item, each holding an array
+ of its right items. The array is of little use without a way to expand it
+ again, which is what the array enumerator is for. You can do
+
+<programlisting>
+SELECT "left", int_array_enum("right") FROM summary WHERE "left" = <replaceable>item</replaceable>;
+</programlisting>
+
+ The above query using <function>int_array_enum</function> produces the same results
+ as
+
+<programlisting>
+SELECT "left", "right" FROM one_to_many WHERE "left" = <replaceable>item</replaceable>;
+</programlisting>
+
+ The difference is that the query against the summary table has to fetch
+ only one row from the table, whereas the direct query against
+ <structname>one_to_many</structname> must perform an index scan and fetch a row for each entry.
+ </para>
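+
+ <para>
+ Because the module's functions are wrappers, the same summary table can
+ presumably be built and queried with the built-in functions alone. The
+ following is a sketch rather than a tested migration. The table name
+ <structname>summary_builtin</structname> is hypothetical, and
+ <literal>left</literal> and <literal>right</literal> are double-quoted
+ because they are reserved words:
+
+<programlisting>
+-- Build the summary with the built-in aggregate.
+CREATE TABLE summary_builtin AS
+ SELECT "left", array_agg("right") AS "right"
+ FROM one_to_many
+ GROUP BY "left";
+
+-- Expand the stored array back into one row per right item.
+SELECT "left", unnest("right") FROM summary_builtin WHERE "left" = <replaceable>item</replaceable>;
+</programlisting>
+ </para>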
+
+ <para>
+ On one system, <command>EXPLAIN</command> showed the estimated cost of a query
+ dropping from 8488 to 329. The original query was a join involving the
+ <structname>one_to_many</structname> table, which was replaced by:
+
+<programlisting>
+SELECT "right", count("right") FROM
+ ( SELECT summary."left", int_array_enum("right") AS "right"
+ FROM summary JOIN (SELECT "left" FROM left_table WHERE "left" = <replaceable>item</replaceable>) AS lefts
+ ON (summary."left" = lefts."left")
+ ) AS list
+ GROUP BY "right"
+ ORDER BY count DESC;
+</programlisting>
+ </para>
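+
+ <para>
+ To make a similar comparison on your own data, one option is to run
+ <command>EXPLAIN</command> on both forms of a lookup and compare the
+ estimated costs. This is only a sketch: it assumes the
+ <structname>one_to_many</structname> and <structname>summary</structname>
+ tables described above, and the figures reported will differ from system
+ to system.
+
+<programlisting>
+-- Direct lookup in the one-to-many table
+-- (left and right are reserved words, hence the double quotes).
+EXPLAIN SELECT "left", "right" FROM one_to_many WHERE "left" = <replaceable>item</replaceable>;
+
+-- The same lookup through the summary table.
+EXPLAIN SELECT "left", int_array_enum("right") FROM summary WHERE "left" = <replaceable>item</replaceable>;
+</programlisting>
+ </para>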
+
+ </sect2>
+
+</sect1>