summaryrefslogtreecommitdiffstats
path: root/doc/src/sgml/intagg.sgml
blob: c410f64f3ef51b45131acf6b654edc50ff4d49a5 (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
<!-- doc/src/sgml/intagg.sgml -->

<sect1 id="intagg" xreflabel="intagg">
 <title>intagg</title>

 <indexterm zone="intagg">
  <primary>intagg</primary>
 </indexterm>

 <para>
  The <filename>intagg</filename> module provides an integer aggregator and an
  enumerator.  <filename>intagg</filename> is now obsolete, because there
  are built-in functions that provide a superset of its capabilities.
  However, the module is still provided as a compatibility wrapper around
  the built-in functions.
 </para>

 <sect2>
  <title>Functions</title>

 <indexterm>
  <primary>int_array_aggregate</primary>
 </indexterm>

 <indexterm>
  <primary>array_agg</primary>
 </indexterm>

 <para>
  The aggregator is an aggregate function
  <function>int_array_aggregate(integer)</function>
  that produces an integer array
  containing exactly the integers it is fed.
  This is a wrapper around <function>array_agg</function>,
  which does the same thing for any array type.
 </para>

 <indexterm>
  <primary>int_array_enum</primary>
 </indexterm>

 <para>
  The enumerator is a function
  <function>int_array_enum(integer[])</function>
  that returns <type>setof integer</type>.  It is essentially the reverse
  operation of the aggregator: given an array of integers, expand it
  into a set of rows.  This is a wrapper around <function>unnest</function>,
  which does the same thing for any array type.
 </para>

 </sect2>

 <sect2>
  <title>Sample Uses</title>

  <para>
   Many database systems have the notion of a one to many table. Such a table
   usually sits between two indexed tables, for example:

<programlisting>
CREATE TABLE left (id INT PRIMARY KEY, ...);
CREATE TABLE right (id INT PRIMARY KEY, ...);
CREATE TABLE one_to_many(left INT REFERENCES left, right INT REFERENCES right);
</programlisting>

  It is typically used like this:

<programlisting>
SELECT right.* from right JOIN one_to_many ON (right.id = one_to_many.right)
  WHERE one_to_many.left = <replaceable>item</replaceable>;
</programlisting>

  This will return all the items in the right hand table for an entry
  in the left hand table. This is a very common construct in SQL.
 </para>

 <para>
  Now, this methodology can be cumbersome with a very large number of
  entries in the <structname>one_to_many</structname> table.  Often,
  a join like this would result in an index scan
  and a fetch for each right hand entry in the table for a particular
  left hand entry. If you have a very dynamic system, there is not much you
  can do. However, if you have some data which is fairly static, you can
  create a summary table with the aggregator.

<programlisting>
CREATE TABLE summary AS
  SELECT left, int_array_aggregate(right) AS right
  FROM one_to_many
  GROUP BY left;
</programlisting>

  This will create a table with one row per left item, and an array
  of right items. Now this is pretty useless without some way of using
  the array; that's why there is an array enumerator.  You can do

<programlisting>
SELECT left, int_array_enum(right) FROM summary WHERE left = <replaceable>item</replaceable>;
</programlisting>

  The above query using <function>int_array_enum</function> produces the same results
  as

<programlisting>
SELECT left, right FROM one_to_many WHERE left = <replaceable>item</replaceable>;
</programlisting>

  The difference is that the query against the summary table has to get
  only one row from the table, whereas the direct query against
  <structname>one_to_many</structname> must index scan and fetch a row for each entry.
 </para>

 <para>
  On one system, an <command>EXPLAIN</command> showed a query with a cost of 8488 was
  reduced to a cost of 329.  The original query was a join involving the
  <structname>one_to_many</structname> table, which was replaced by:

<programlisting>
SELECT right, count(right) FROM
  ( SELECT left, int_array_enum(right) AS right
    FROM summary JOIN (SELECT left FROM left_table WHERE left = <replaceable>item</replaceable>) AS lefts
         ON (summary.left = lefts.left)
  ) AS list
  GROUP BY right
  ORDER BY count DESC;
</programlisting>
 </para>

 </sect2>

</sect1>