summaryrefslogtreecommitdiffstats
path: root/doc/src/sgml/html/xaggr.html
diff options
context:
space:
mode:
authorDaniel Baumann <daniel.baumann@progress-linux.org>2024-05-04 12:15:05 +0000
committerDaniel Baumann <daniel.baumann@progress-linux.org>2024-05-04 12:15:05 +0000
commit46651ce6fe013220ed397add242004d764fc0153 (patch)
tree6e5299f990f88e60174a1d3ae6e48eedd2688b2b /doc/src/sgml/html/xaggr.html
parentInitial commit. (diff)
downloadpostgresql-14-upstream.tar.xz
postgresql-14-upstream.zip
Adding upstream version 14.5.upstream/14.5upstream
Signed-off-by: Daniel Baumann <daniel.baumann@progress-linux.org>
Diffstat (limited to 'doc/src/sgml/html/xaggr.html')
-rw-r--r--doc/src/sgml/html/xaggr.html528
1 files changed, 528 insertions, 0 deletions
diff --git a/doc/src/sgml/html/xaggr.html b/doc/src/sgml/html/xaggr.html
new file mode 100644
index 0000000..f418f51
--- /dev/null
+++ b/doc/src/sgml/html/xaggr.html
@@ -0,0 +1,528 @@
+<?xml version="1.0" encoding="UTF-8" standalone="no"?>
+<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"><html xmlns="http://www.w3.org/1999/xhtml"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8" /><title>38.12. User-Defined Aggregates</title><link rel="stylesheet" type="text/css" href="stylesheet.css" /><link rev="made" href="pgsql-docs@lists.postgresql.org" /><meta name="generator" content="DocBook XSL Stylesheets Vsnapshot" /><link rel="prev" href="xfunc-optimization.html" title="38.11. Function Optimization Information" /><link rel="next" href="xtypes.html" title="38.13. User-Defined Types" /></head><body id="docContent" class="container-fluid col-10"><div xmlns="http://www.w3.org/TR/xhtml1/transitional" class="navheader"><table width="100%" summary="Navigation header"><tr><th colspan="5" align="center">38.12. User-Defined Aggregates</th></tr><tr><td width="10%" align="left"><a accesskey="p" href="xfunc-optimization.html" title="38.11. Function Optimization Information">Prev</a> </td><td width="10%" align="left"><a accesskey="u" href="extend.html" title="Chapter 38. Extending SQL">Up</a></td><th width="60%" align="center">Chapter 38. Extending <acronym xmlns="http://www.w3.org/1999/xhtml" class="acronym">SQL</acronym></th><td width="10%" align="right"><a accesskey="h" href="index.html" title="PostgreSQL 14.5 Documentation">Home</a></td><td width="10%" align="right"> <a accesskey="n" href="xtypes.html" title="38.13. User-Defined Types">Next</a></td></tr></table><hr></hr></div><div class="sect1" id="XAGGR"><div class="titlepage"><div><div><h2 class="title" style="clear: both">38.12. User-Defined Aggregates</h2></div></div></div><div class="toc"><dl class="toc"><dt><span class="sect2"><a href="xaggr.html#XAGGR-MOVING-AGGREGATES">38.12.1. Moving-Aggregate Mode</a></span></dt><dt><span class="sect2"><a href="xaggr.html#XAGGR-POLYMORPHIC-AGGREGATES">38.12.2. Polymorphic and Variadic Aggregates</a></span></dt><dt><span class="sect2"><a href="xaggr.html#XAGGR-ORDERED-SET-AGGREGATES">38.12.3. Ordered-Set Aggregates</a></span></dt><dt><span class="sect2"><a href="xaggr.html#XAGGR-PARTIAL-AGGREGATES">38.12.4. Partial Aggregation</a></span></dt><dt><span class="sect2"><a href="xaggr.html#XAGGR-SUPPORT-FUNCTIONS">38.12.5. Support Functions for Aggregates</a></span></dt></dl></div><a id="id-1.8.3.15.2" class="indexterm"></a><p>
+ Aggregate functions in <span class="productname">PostgreSQL</span>
+ are defined in terms of <em class="firstterm">state values</em>
+ and <em class="firstterm">state transition functions</em>.
+ That is, an aggregate operates using a state value that is updated
+ as each successive input row is processed.
+ To define a new aggregate
+ function, one selects a data type for the state value,
+ an initial value for the state, and a state transition
+ function. The state transition function takes the previous state
+ value and the aggregate's input value(s) for the current row, and
+ returns a new state value.
+ A <em class="firstterm">final function</em>
+ can also be specified, in case the desired result of the aggregate
+ is different from the data that needs to be kept in the running
+ state value. The final function takes the ending state value
+ and returns whatever is wanted as the aggregate result.
+ In principle, the transition and final functions are just ordinary
+ functions that could also be used outside the context of the
+ aggregate. (In practice, it's often helpful for performance reasons
+ to create specialized transition functions that can only work when
+ called as part of an aggregate.)
+ </p><p>
+ Thus, in addition to the argument and result data types seen by a user
+ of the aggregate, there is an internal state-value data type that
+ might be different from both the argument and result types.
+ </p><p>
+ If we define an aggregate that does not use a final function,
+ we have an aggregate that computes a running function of
+ the column values from each row. <code class="function">sum</code> is an
+ example of this kind of aggregate. <code class="function">sum</code> starts at
+ zero and always adds the current row's value to
+ its running total. For example, if we want to make a <code class="function">sum</code>
+ aggregate to work on a data type for complex numbers,
+ we only need the addition function for that data type.
+ The aggregate definition would be:
+
+</p><pre class="programlisting">
+CREATE AGGREGATE sum (complex)
+(
+ sfunc = complex_add,
+ stype = complex,
+ initcond = '(0,0)'
+);
+</pre><p>
+
+ which we might use like this:
+
+</p><pre class="programlisting">
+SELECT sum(a) FROM test_complex;
+
+ sum
+-----------
+ (34,53.9)
+</pre><p>
+
+ (Notice that we are relying on function overloading: there is more than
+ one aggregate named <code class="function">sum</code>, but
+ <span class="productname">PostgreSQL</span> can figure out which kind
+ of sum applies to a column of type <code class="type">complex</code>.)
+ </p><p>
+ The above definition of <code class="function">sum</code> will return zero
+ (the initial state value) if there are no nonnull input values.
+ Perhaps we want to return null in that case instead — the SQL standard
+ expects <code class="function">sum</code> to behave that way. We can do this simply by
+ omitting the <code class="literal">initcond</code> phrase, so that the initial state
+ value is null. Ordinarily this would mean that the <code class="literal">sfunc</code>
+ would need to check for a null state-value input. But for
+ <code class="function">sum</code> and some other simple aggregates like
+ <code class="function">max</code> and <code class="function">min</code>,
+ it is sufficient to insert the first nonnull input value into
+ the state variable and then start applying the transition function
+ at the second nonnull input value. <span class="productname">PostgreSQL</span>
+ will do that automatically if the initial state value is null and
+ the transition function is marked <span class="quote">“<span class="quote">strict</span>”</span> (i.e., not to be called
+ for null inputs).
+ </p><p>
+ Another bit of default behavior for a <span class="quote">“<span class="quote">strict</span>”</span> transition function
+ is that the previous state value is retained unchanged whenever a
+ null input value is encountered. Thus, null values are ignored. If you
+ need some other behavior for null inputs, do not declare your
+ transition function as strict; instead code it to test for null inputs and
+ do whatever is needed.
+ </p><p>
+ <code class="function">avg</code> (average) is a more complex example of an aggregate.
+ It requires
+ two pieces of running state: the sum of the inputs and the count
+ of the number of inputs. The final result is obtained by dividing
+ these quantities. Average is typically implemented by using an
+ array as the state value. For example,
+ the built-in implementation of <code class="function">avg(float8)</code>
+ looks like:
+
+</p><pre class="programlisting">
+CREATE AGGREGATE avg (float8)
+(
+ sfunc = float8_accum,
+ stype = float8[],
+ finalfunc = float8_avg,
+ initcond = '{0,0,0}'
+);
+</pre><p>
+ </p><div class="note"><h3 class="title">Note</h3><p>
+ <code class="function">float8_accum</code> requires a three-element array, not just
+ two elements, because it accumulates the sum of squares as well as
+ the sum and count of the inputs. This is so that it can be used for
+ some other aggregates as well as <code class="function">avg</code>.
+ </p></div><p>
+ Aggregate function calls in SQL allow <code class="literal">DISTINCT</code>
+ and <code class="literal">ORDER BY</code> options that control which rows are fed
+ to the aggregate's transition function and in what order. These
+ options are implemented behind the scenes and are not the concern
+ of the aggregate's support functions.
+ </p><p>
+ For further details see the
+ <a class="xref" href="sql-createaggregate.html" title="CREATE AGGREGATE"><span class="refentrytitle">CREATE AGGREGATE</span></a>
+ command.
+ </p><div class="sect2" id="XAGGR-MOVING-AGGREGATES"><div class="titlepage"><div><div><h3 class="title">38.12.1. Moving-Aggregate Mode</h3></div></div></div><a id="id-1.8.3.15.12.2" class="indexterm"></a><a id="id-1.8.3.15.12.3" class="indexterm"></a><p>
+ Aggregate functions can optionally support <em class="firstterm">moving-aggregate
+ mode</em>, which allows substantially faster execution of aggregate
+ functions within windows with moving frame starting points.
+ (See <a class="xref" href="tutorial-window.html" title="3.5. Window Functions">Section 3.5</a>
+ and <a class="xref" href="sql-expressions.html#SYNTAX-WINDOW-FUNCTIONS" title="4.2.8. Window Function Calls">Section 4.2.8</a> for information about use of
+ aggregate functions as window functions.)
+ The basic idea is that in addition to a normal <span class="quote">“<span class="quote">forward</span>”</span>
+ transition function, the aggregate provides an <em class="firstterm">inverse
+ transition function</em>, which allows rows to be removed from the
+ aggregate's running state value when they exit the window frame.
+ For example a <code class="function">sum</code> aggregate, which uses addition as the
+ forward transition function, would use subtraction as the inverse
+ transition function. Without an inverse transition function, the window
+ function mechanism must recalculate the aggregate from scratch each time
+ the frame starting point moves, resulting in run time proportional to the
+ number of input rows times the average frame length. With an inverse
+ transition function, the run time is only proportional to the number of
+ input rows.
+ </p><p>
+ The inverse transition function is passed the current state value and the
+ aggregate input value(s) for the earliest row included in the current
+ state. It must reconstruct what the state value would have been if the
+ given input row had never been aggregated, but only the rows following
+ it. This sometimes requires that the forward transition function keep
+ more state than is needed for plain aggregation mode. Therefore, the
+ moving-aggregate mode uses a completely separate implementation from the
+ plain mode: it has its own state data type, its own forward transition
+ function, and its own final function if needed. These can be the same as
+ the plain mode's data type and functions, if there is no need for extra
+ state.
+ </p><p>
+ As an example, we could extend the <code class="function">sum</code> aggregate given above
+ to support moving-aggregate mode like this:
+
+</p><pre class="programlisting">
+CREATE AGGREGATE sum (complex)
+(
+ sfunc = complex_add,
+ stype = complex,
+ initcond = '(0,0)',
+ msfunc = complex_add,
+ minvfunc = complex_sub,
+ mstype = complex,
+ minitcond = '(0,0)'
+);
+</pre><p>
+
+ The parameters whose names begin with <code class="literal">m</code> define the
+ moving-aggregate implementation. Except for the inverse transition
+ function <code class="literal">minvfunc</code>, they correspond to the plain-aggregate
+ parameters without <code class="literal">m</code>.
+ </p><p>
+ The forward transition function for moving-aggregate mode is not allowed
+ to return null as the new state value. If the inverse transition
+ function returns null, this is taken as an indication that the inverse
+ function cannot reverse the state calculation for this particular input,
+ and so the aggregate calculation will be redone from scratch for the
+ current frame starting position. This convention allows moving-aggregate
+ mode to be used in situations where there are some infrequent cases that
+ are impractical to reverse out of the running state value. The inverse
+ transition function can <span class="quote">“<span class="quote">punt</span>”</span> on these cases, and yet still come
+ out ahead so long as it can work for most cases. As an example, an
+ aggregate working with floating-point numbers might choose to punt when
+ a <code class="literal">NaN</code> (not a number) input has to be removed from the running
+ state value.
+ </p><p>
+ When writing moving-aggregate support functions, it is important to be
+ sure that the inverse transition function can reconstruct the correct
+ state value exactly. Otherwise there might be user-visible differences
+ in results depending on whether the moving-aggregate mode is used.
+ An example of an aggregate for which adding an inverse transition
+ function seems easy at first, yet where this requirement cannot be met
+ is <code class="function">sum</code> over <code class="type">float4</code> or <code class="type">float8</code> inputs. A
+ naive declaration of <code class="function">sum(<code class="type">float8</code>)</code> could be
+
+</p><pre class="programlisting">
+CREATE AGGREGATE unsafe_sum (float8)
+(
+ stype = float8,
+ sfunc = float8pl,
+ mstype = float8,
+ msfunc = float8pl,
+ minvfunc = float8mi
+);
+</pre><p>
+
+ This aggregate, however, can give wildly different results than it would
+ have without the inverse transition function. For example, consider
+
+</p><pre class="programlisting">
+SELECT
+ unsafe_sum(x) OVER (ORDER BY n ROWS BETWEEN CURRENT ROW AND 1 FOLLOWING)
+FROM (VALUES (1, 1.0e20::float8),
+ (2, 1.0::float8)) AS v (n,x);
+</pre><p>
+
+ This query returns <code class="literal">0</code> as its second result, rather than the
+ expected answer of <code class="literal">1</code>. The cause is the limited precision of
+ floating-point values: adding <code class="literal">1</code> to <code class="literal">1e20</code> results
+ in <code class="literal">1e20</code> again, and so subtracting <code class="literal">1e20</code> from that
+ yields <code class="literal">0</code>, not <code class="literal">1</code>. Note that this is a limitation
+ of floating-point arithmetic in general, not a limitation
+ of <span class="productname">PostgreSQL</span>.
+ </p></div><div class="sect2" id="XAGGR-POLYMORPHIC-AGGREGATES"><div class="titlepage"><div><div><h3 class="title">38.12.2. Polymorphic and Variadic Aggregates</h3></div></div></div><a id="id-1.8.3.15.13.2" class="indexterm"></a><a id="id-1.8.3.15.13.3" class="indexterm"></a><p>
+ Aggregate functions can use polymorphic
+ state transition functions or final functions, so that the same functions
+ can be used to implement multiple aggregates.
+ See <a class="xref" href="extend-type-system.html#EXTEND-TYPES-POLYMORPHIC" title="38.2.5. Polymorphic Types">Section 38.2.5</a>
+ for an explanation of polymorphic functions.
+ Going a step further, the aggregate function itself can be specified
+ with polymorphic input type(s) and state type, allowing a single
+ aggregate definition to serve for multiple input data types.
+ Here is an example of a polymorphic aggregate:
+
+</p><pre class="programlisting">
+CREATE AGGREGATE array_accum (anycompatible)
+(
+ sfunc = array_append,
+ stype = anycompatiblearray,
+ initcond = '{}'
+);
+</pre><p>
+
+ Here, the actual state type for any given aggregate call is the array type
+ having the actual input type as elements. The behavior of the aggregate
+ is to concatenate all the inputs into an array of that type.
+ (Note: the built-in aggregate <code class="function">array_agg</code> provides similar
+ functionality, with better performance than this definition would have.)
+ </p><p>
+ Here's the output using two different actual data types as arguments:
+
+</p><pre class="programlisting">
+SELECT attrelid::regclass, array_accum(attname)
+ FROM pg_attribute
+ WHERE attnum &gt; 0 AND attrelid = 'pg_tablespace'::regclass
+ GROUP BY attrelid;
+
+ attrelid | array_accum
+---------------+---------------------------------------
+ pg_tablespace | {spcname,spcowner,spcacl,spcoptions}
+(1 row)
+
+SELECT attrelid::regclass, array_accum(atttypid::regtype)
+ FROM pg_attribute
+ WHERE attnum &gt; 0 AND attrelid = 'pg_tablespace'::regclass
+ GROUP BY attrelid;
+
+ attrelid | array_accum
+---------------+---------------------------
+ pg_tablespace | {name,oid,aclitem[],text[]}
+(1 row)
+</pre><p>
+ </p><p>
+ Ordinarily, an aggregate function with a polymorphic result type has a
+ polymorphic state type, as in the above example. This is necessary
+ because otherwise the final function cannot be declared sensibly: it
+ would need to have a polymorphic result type but no polymorphic argument
+ type, which <code class="command">CREATE FUNCTION</code> will reject on the grounds that
+ the result type cannot be deduced from a call. But sometimes it is
+ inconvenient to use a polymorphic state type. The most common case is
+ where the aggregate support functions are to be written in C and the
+ state type should be declared as <code class="type">internal</code> because there is
+ no SQL-level equivalent for it. To address this case, it is possible to
+ declare the final function as taking extra <span class="quote">“<span class="quote">dummy</span>”</span> arguments
+ that match the input arguments of the aggregate. Such dummy arguments
+ are always passed as null values since no specific value is available when the
+ final function is called. Their only use is to allow a polymorphic
+ final function's result type to be connected to the aggregate's input
+ type(s). For example, the definition of the built-in
+ aggregate <code class="function">array_agg</code> is equivalent to
+
+</p><pre class="programlisting">
+CREATE FUNCTION array_agg_transfn(internal, anynonarray)
+ RETURNS internal ...;
+CREATE FUNCTION array_agg_finalfn(internal, anynonarray)
+ RETURNS anyarray ...;
+
+CREATE AGGREGATE array_agg (anynonarray)
+(
+ sfunc = array_agg_transfn,
+ stype = internal,
+ finalfunc = array_agg_finalfn,
+ finalfunc_extra
+);
+</pre><p>
+
+ Here, the <code class="literal">finalfunc_extra</code> option specifies that the final
+ function receives, in addition to the state value, extra dummy
+ argument(s) corresponding to the aggregate's input argument(s).
+ The extra <code class="type">anynonarray</code> argument allows the declaration
+ of <code class="function">array_agg_finalfn</code> to be valid.
+ </p><p>
+ An aggregate function can be made to accept a varying number of arguments
+ by declaring its last argument as a <code class="literal">VARIADIC</code> array, in much
+ the same fashion as for regular functions; see
+ <a class="xref" href="xfunc-sql.html#XFUNC-SQL-VARIADIC-FUNCTIONS" title="38.5.6. SQL Functions with Variable Numbers of Arguments">Section 38.5.6</a>. The aggregate's transition
+ function(s) must have the same array type as their last argument. The
+ transition function(s) typically would also be marked <code class="literal">VARIADIC</code>,
+ but this is not strictly required.
+ </p><div class="note"><h3 class="title">Note</h3><p>
+ Variadic aggregates are easily misused in connection with
+ the <code class="literal">ORDER BY</code> option (see <a class="xref" href="sql-expressions.html#SYNTAX-AGGREGATES" title="4.2.7. Aggregate Expressions">Section 4.2.7</a>),
+ since the parser cannot tell whether the wrong number of actual arguments
+ have been given in such a combination. Keep in mind that everything to
+ the right of <code class="literal">ORDER BY</code> is a sort key, not an argument to the
+ aggregate. For example, in
+</p><pre class="programlisting">
+SELECT myaggregate(a ORDER BY a, b, c) FROM ...
+</pre><p>
+ the parser will see this as a single aggregate function argument and
+ three sort keys. However, the user might have intended
+</p><pre class="programlisting">
+SELECT myaggregate(a, b, c ORDER BY a) FROM ...
+</pre><p>
+ If <code class="literal">myaggregate</code> is variadic, both these calls could be
+ perfectly valid.
+ </p><p>
+ For the same reason, it's wise to think twice before creating aggregate
+ functions with the same names and different numbers of regular arguments.
+ </p></div></div><div class="sect2" id="XAGGR-ORDERED-SET-AGGREGATES"><div class="titlepage"><div><div><h3 class="title">38.12.3. Ordered-Set Aggregates</h3></div></div></div><a id="id-1.8.3.15.14.2" class="indexterm"></a><p>
+ The aggregates we have been describing so far are <span class="quote">“<span class="quote">normal</span>”</span>
+ aggregates. <span class="productname">PostgreSQL</span> also
+ supports <em class="firstterm">ordered-set aggregates</em>, which differ from
+ normal aggregates in two key ways. First, in addition to ordinary
+ aggregated arguments that are evaluated once per input row, an
+ ordered-set aggregate can have <span class="quote">“<span class="quote">direct</span>”</span> arguments that are
+ evaluated only once per aggregation operation. Second, the syntax
+ for the ordinary aggregated arguments specifies a sort ordering
+ for them explicitly. An ordered-set aggregate is usually
+ used to implement a computation that depends on a specific row
+ ordering, for instance rank or percentile, so that the sort ordering
+ is a required aspect of any call. For example, the built-in
+ definition of <code class="function">percentile_disc</code> is equivalent to:
+
+</p><pre class="programlisting">
+CREATE FUNCTION ordered_set_transition(internal, anyelement)
+ RETURNS internal ...;
+CREATE FUNCTION percentile_disc_final(internal, float8, anyelement)
+ RETURNS anyelement ...;
+
+CREATE AGGREGATE percentile_disc (float8 ORDER BY anyelement)
+(
+ sfunc = ordered_set_transition,
+ stype = internal,
+ finalfunc = percentile_disc_final,
+ finalfunc_extra
+);
+</pre><p>
+
+ This aggregate takes a <code class="type">float8</code> direct argument (the percentile
+ fraction) and an aggregated input that can be of any sortable data type.
+ It could be used to obtain a median household income like this:
+
+</p><pre class="programlisting">
+SELECT percentile_disc(0.5) WITHIN GROUP (ORDER BY income) FROM households;
+ percentile_disc
+-----------------
+ 50489
+</pre><p>
+
+ Here, <code class="literal">0.5</code> is a direct argument; it would make no sense
+ for the percentile fraction to be a value varying across rows.
+ </p><p>
+ Unlike the case for normal aggregates, the sorting of input rows for
+ an ordered-set aggregate is <span class="emphasis"><em>not</em></span> done behind the scenes,
+ but is the responsibility of the aggregate's support functions.
+ The typical implementation approach is to keep a reference to
+ a <span class="quote">“<span class="quote">tuplesort</span>”</span> object in the aggregate's state value, feed the
+ incoming rows into that object, and then complete the sorting and
+ read out the data in the final function. This design allows the
+ final function to perform special operations such as injecting
+ additional <span class="quote">“<span class="quote">hypothetical</span>”</span> rows into the data to be sorted.
+ While normal aggregates can often be implemented with support
+ functions written in <span class="application">PL/pgSQL</span> or another
+ PL language, ordered-set aggregates generally have to be written in
+ C, since their state values aren't definable as any SQL data type.
+ (In the above example, notice that the state value is declared as
+ type <code class="type">internal</code> — this is typical.)
+ Also, because the final function performs the sort, it is not possible
+ to continue adding input rows by executing the transition function again
+ later. This means the final function is not <code class="literal">READ_ONLY</code>;
+ it must be declared in <a class="link" href="sql-createaggregate.html" title="CREATE AGGREGATE"><code class="command">CREATE AGGREGATE</code></a>
+ as <code class="literal">READ_WRITE</code>, or as <code class="literal">SHAREABLE</code> if
+ it's possible for additional final-function calls to make use of the
+ already-sorted state.
+ </p><p>
+ The state transition function for an ordered-set aggregate receives
+ the current state value plus the aggregated input values for
+ each row, and returns the updated state value. This is the
+ same definition as for normal aggregates, but note that the direct
+ arguments (if any) are not provided. The final function receives
+ the last state value, the values of the direct arguments if any,
+ and (if <code class="literal">finalfunc_extra</code> is specified) null values
+ corresponding to the aggregated input(s). As with normal
+ aggregates, <code class="literal">finalfunc_extra</code> is only really useful if the
+ aggregate is polymorphic; then the extra dummy argument(s) are needed
+ to connect the final function's result type to the aggregate's input
+ type(s).
+ </p><p>
+ Currently, ordered-set aggregates cannot be used as window functions,
+ and therefore there is no need for them to support moving-aggregate mode.
+ </p></div><div class="sect2" id="XAGGR-PARTIAL-AGGREGATES"><div class="titlepage"><div><div><h3 class="title">38.12.4. Partial Aggregation</h3></div></div></div><a id="id-1.8.3.15.15.2" class="indexterm"></a><p>
+ Optionally, an aggregate function can support <em class="firstterm">partial
+ aggregation</em>. The idea of partial aggregation is to run the aggregate's
+ state transition function over different subsets of the input data
+ independently, and then to combine the state values resulting from those
+ subsets to produce the same state value that would have resulted from
+ scanning all the input in a single operation. This mode can be used for
+ parallel aggregation by having different worker processes scan different
+ portions of a table. Each worker produces a partial state value, and at
+ the end those state values are combined to produce a final state value.
+ (In the future this mode might also be used for purposes such as combining
+ aggregations over local and remote tables; but that is not implemented
+ yet.)
+ </p><p>
+ To support partial aggregation, the aggregate definition must provide
+ a <em class="firstterm">combine function</em>, which takes two values of the
+ aggregate's state type (representing the results of aggregating over two
+ subsets of the input rows) and produces a new value of the state type,
+ representing what the state would have been after aggregating over the
+ combination of those sets of rows. It is unspecified what the relative
+ order of the input rows from the two sets would have been. This means
+ that it's usually impossible to define a useful combine function for
+ aggregates that are sensitive to input row order.
+ </p><p>
+ As simple examples, <code class="literal">MAX</code> and <code class="literal">MIN</code> aggregates can be
+ made to support partial aggregation by specifying the combine function as
+ the same greater-of-two or lesser-of-two comparison function that is used
+ as their transition function. <code class="literal">SUM</code> aggregates just need an
+ addition function as combine function. (Again, this is the same as their
+ transition function, unless the state value is wider than the input data
+ type.)
+ </p><p>
+ The combine function is treated much like a transition function that
+ happens to take a value of the state type, not of the underlying input
+ type, as its second argument. In particular, the rules for dealing
+ with null values and strict functions are similar. Also, if the aggregate
+ definition specifies a non-null <code class="literal">initcond</code>, keep in mind that
+ that will be used not only as the initial state for each partial
+ aggregation run, but also as the initial state for the combine function,
+ which will be called to combine each partial result into that state.
+ </p><p>
+ If the aggregate's state type is declared as <code class="type">internal</code>, it is
+ the combine function's responsibility that its result is allocated in
+ the correct memory context for aggregate state values. This means in
+ particular that when the first input is <code class="literal">NULL</code> it's invalid
+ to simply return the second input, as that value will be in the wrong
+ context and will not have sufficient lifespan.
+ </p><p>
+ When the aggregate's state type is declared as <code class="type">internal</code>, it is
+ usually also appropriate for the aggregate definition to provide a
+ <em class="firstterm">serialization function</em> and a <em class="firstterm">deserialization
+ function</em>, which allow such a state value to be copied from one process
+ to another. Without these functions, parallel aggregation cannot be
+ performed, and future applications such as local/remote aggregation will
+ probably not work either.
+ </p><p>
+ A serialization function must take a single argument of
+ type <code class="type">internal</code> and return a result of type <code class="type">bytea</code>, which
+ represents the state value packaged up into a flat blob of bytes.
+ Conversely, a deserialization function reverses that conversion. It must
+ take two arguments of types <code class="type">bytea</code> and <code class="type">internal</code>, and
+ return a result of type <code class="type">internal</code>. (The second argument is unused
+ and is always zero, but it is required for type-safety reasons.) The
+ result of the deserialization function should simply be allocated in the
+ current memory context, as unlike the combine function's result, it is not
+ long-lived.
+ </p><p>
+ Worth noting also is that for an aggregate to be executed in parallel,
+ the aggregate itself must be marked <code class="literal">PARALLEL SAFE</code>. The
+ parallel-safety markings on its support functions are not consulted.
+ </p></div><div class="sect2" id="XAGGR-SUPPORT-FUNCTIONS"><div class="titlepage"><div><div><h3 class="title">38.12.5. Support Functions for Aggregates</h3></div></div></div><a id="id-1.8.3.15.16.2" class="indexterm"></a><p>
+ A function written in C can detect that it is being called as an
+ aggregate support function by calling
+ <code class="function">AggCheckCallContext</code>, for example:
+</p><pre class="programlisting">
+if (AggCheckCallContext(fcinfo, NULL))
+</pre><p>
+ One reason for checking this is that when it is true, the first input
+ must be a temporary state value and can therefore safely be modified
+ in-place rather than allocating a new copy.
+ See <code class="function">int8inc()</code> for an example.
+ (While aggregate transition functions are always allowed to modify
+ the transition value in-place, aggregate final functions are generally
+ discouraged from doing so; if they do so, the behavior must be declared
+ when creating the aggregate. See <a class="xref" href="sql-createaggregate.html" title="CREATE AGGREGATE"><span class="refentrytitle">CREATE AGGREGATE</span></a>
+ for more detail.)
+ </p><p>
+ The second argument of <code class="function">AggCheckCallContext</code> can be used to
+ retrieve the memory context in which aggregate state values are being kept.
+ This is useful for transition functions that wish to use <span class="quote">“<span class="quote">expanded</span>”</span>
+ objects (see <a class="xref" href="xtypes.html#XTYPES-TOAST" title="38.13.1. TOAST Considerations">Section 38.13.1</a>) as their state values.
+ On first call, the transition function should return an expanded object
+ whose memory context is a child of the aggregate state context, and then
+ keep returning the same expanded object on subsequent calls. See
+ <code class="function">array_append()</code> for an example. (<code class="function">array_append()</code>
+ is not the transition function of any built-in aggregate, but it is written
+ to behave efficiently when used as transition function of a custom
+ aggregate.)
+ </p><p>
+ Another support routine available to aggregate functions written in C
+ is <code class="function">AggGetAggref</code>, which returns the <code class="literal">Aggref</code>
+ parse node that defines the aggregate call. This is mainly useful
+ for ordered-set aggregates, which can inspect the substructure of
+ the <code class="literal">Aggref</code> node to find out what sort ordering they are
+ supposed to implement. Examples can be found
+ in <code class="filename">orderedsetaggs.c</code> in the <span class="productname">PostgreSQL</span>
+ source code.
+ </p></div></div><div xmlns="http://www.w3.org/TR/xhtml1/transitional" class="navfooter"><hr></hr><table width="100%" summary="Navigation footer"><tr><td width="40%" align="left"><a accesskey="p" href="xfunc-optimization.html" title="38.11. Function Optimization Information">Prev</a> </td><td width="20%" align="center"><a accesskey="u" href="extend.html" title="Chapter 38. Extending SQL">Up</a></td><td width="40%" align="right"> <a accesskey="n" href="xtypes.html" title="38.13. User-Defined Types">Next</a></td></tr><tr><td width="40%" align="left" valign="top">38.11. Function Optimization Information </td><td width="20%" align="center"><a accesskey="h" href="index.html" title="PostgreSQL 14.5 Documentation">Home</a></td><td width="40%" align="right" valign="top"> 38.13. User-Defined Types</td></tr></table></div></body></html> \ No newline at end of file