1 files changed, 486 insertions, 0 deletions
diff --git a/doc/src/sgml/xoper.sgml b/doc/src/sgml/xoper.sgml
new file mode 100644
index 0000000..98f4c5c
--- /dev/null
+++ b/doc/src/sgml/xoper.sgml
@@ -0,0 +1,486 @@
+<!-- doc/src/sgml/xoper.sgml -->
+
+ <sect1 id="xoper">
+  <title>User-Defined Operators</title>
+
+  <indexterm zone="xoper">
+   <primary>operator</primary>
+   <secondary>user-defined</secondary>
+  </indexterm>
+
+  <para>
+   Every operator is <quote>syntactic sugar</quote> for a call to an
+   underlying function that does the real work; so you must
+   first create the underlying function before you can create
+   the operator.  However, an operator is <emphasis>not merely</emphasis>
+   syntactic sugar, because it carries additional information
+   that helps the query planner optimize queries that use the
+   operator.  The next section will be devoted to explaining
+   that additional information.
+  </para>
+
+  <para>
+   <productname>PostgreSQL</productname> supports prefix
+   and infix operators.  Operators can be
+   overloaded;<indexterm><primary>overloading</primary><secondary>operators</secondary></indexterm>
+   that is, the same operator name can be used for different operators
+   that have different numbers and types of operands.  When a query is
+   executed, the system determines the operator to call from the
+   number and types of the provided operands.
+  </para>
+
+  <para>
+   Here is an example of creating an operator for adding two complex
+   numbers.  We assume we've already created the definition of type
+   <type>complex</type> (see <xref linkend="xtypes"/>).  First we need a
+   function that does the work, then we can define the operator:
+
+<programlisting>
+CREATE FUNCTION complex_add(complex, complex)
+    RETURNS complex
+    AS '<replaceable>filename</replaceable>', 'complex_add'
+    LANGUAGE C IMMUTABLE STRICT;
+
+CREATE OPERATOR + (
+    leftarg = complex,
+    rightarg = complex,
+    function = complex_add,
+    commutator = +
+);
+</programlisting>
+  </para>
+
+  <para>
+   Now we could execute a query like this:
+
+<screen>
+SELECT (a + b) AS c FROM test_complex;
+
+        c
+-----------------
+ (5.2,6.05)
+ (133.42,144.95)
+</screen>
+  </para>
+
+  <para>
+   We've shown how to create a binary operator here.  To create a prefix
+   operator, just omit the <literal>leftarg</literal>.
+   The <literal>function</literal>
+   clause and the argument clauses are the only required items in
+   <command>CREATE OPERATOR</command>.  The <literal>commutator</literal>
+   clause shown in the example is an optional hint to the query
+   optimizer.  Further details about <literal>commutator</literal> and other
+   optimizer hints appear in the next section.
+  </para>
+ </sect1>
+
+  <sect1 id="xoper-optimization">
+   <title>Operator Optimization Information</title>
+
+  <indexterm zone="xoper-optimization">
+   <primary>optimization information</primary>
+   <secondary>for operators</secondary>
+  </indexterm>
+
+   <para>
+    A <productname>PostgreSQL</productname> operator definition can include
+    several optional clauses that tell the system useful things about how
+    the operator behaves.  These clauses should be provided whenever
+    appropriate, because they can make for considerable speedups in execution
+    of queries that use the operator.  But if you provide them, you must be
+    sure that they are right!  Incorrect use of an optimization clause can
+    result in slow queries, subtly wrong output, or other Bad Things.
+    You can always leave out an optimization clause if you are not sure
+    about it; the only consequence is that queries might run slower than
+    they need to.
+   </para>
+
+   <para>
+    Additional optimization clauses might be added in future versions of
+    <productname>PostgreSQL</productname>.  The ones described here are all
+    the ones that release &version; understands.
+   </para>
+
+   <para>
+    It is also possible to attach a planner support function to the function
+    that underlies an operator, providing another way of telling the system
+    about the behavior of the operator.
+    See <xref linkend="xfunc-optimization"/> for more information.
+   </para>
+
+   <sect2>
+    <title><literal>COMMUTATOR</literal></title>
+
+    <para>
+     The <literal>COMMUTATOR</literal> clause, if provided, names an operator that is the
+     commutator of the operator being defined.  We say that operator A is the
+     commutator of operator B if (x A y) equals (y B x) for all possible input
+     values x, y.  Notice that B is also the commutator of A.  For example,
+     operators <literal>&lt;</literal> and <literal>&gt;</literal> for a particular data type are usually each others'
+     commutators, and operator <literal>+</literal> is usually commutative with itself.
+     But operator <literal>-</literal> is usually not commutative with anything.
+    </para>
+
+    <para>
+     The left operand type of a commutable operator is the same as the
+     right operand type of its commutator, and vice versa.  So the name of
+     the commutator operator is all that <productname>PostgreSQL</productname>
+     needs to be given to look up the commutator, and that's all that needs to
+     be provided in the <literal>COMMUTATOR</literal> clause.
+    </para>
+
+    <para>
+     It's critical to provide commutator information for operators that
+     will be used in indexes and join clauses, because this allows the
+     query optimizer to <quote>flip around</quote> such a clause to the forms
+     needed for different plan types.  For example, consider a query with
+     a WHERE clause like <literal>tab1.x = tab2.y</literal>, where <literal>tab1.x</literal>
+     and <literal>tab2.y</literal> are of a user-defined type, and suppose that
+     <literal>tab2.y</literal> is indexed.  The optimizer cannot generate an
+     index scan unless it can determine how to flip the clause around to
+     <literal>tab2.y = tab1.x</literal>, because the index-scan machinery expects
+     to see the indexed column on the left of the operator it is given.
+     <productname>PostgreSQL</productname> will <emphasis>not</emphasis> simply
+     assume that this is a valid transformation &mdash; the creator of the
+     <literal>=</literal> operator must specify that it is valid, by marking the
+     operator with commutator information.
+    </para>
+
+    <para>
+     When you are defining a self-commutative operator, you just do it.
+     When you are defining a pair of commutative operators, things are
+     a little trickier: how can the first one to be defined refer to the
+     other one, which you haven't defined yet?  There are two solutions
+     to this problem:
+
+     <itemizedlist>
+      <listitem>
+       <para>
+        One way is to omit the <literal>COMMUTATOR</literal> clause in the first operator that
+        you define, and then provide one in the second operator's definition.
+        Since <productname>PostgreSQL</productname> knows that commutative
+        operators come in pairs, when it sees the second definition it will
+        automatically go back and fill in the missing <literal>COMMUTATOR</literal> clause in
+        the first definition.
+       </para>
+      </listitem>
+
+      <listitem>
+       <para>
+        The other, more straightforward way is just to include <literal>COMMUTATOR</literal> clauses
+        in both definitions.  When <productname>PostgreSQL</productname> processes
+        the first definition and realizes that <literal>COMMUTATOR</literal> refers to a nonexistent
+        operator, the system will make a dummy entry for that operator in the
+        system catalog.  This dummy entry will have valid data only
+        for the operator name, left and right operand types, and result type,
+        since that's all that <productname>PostgreSQL</productname> can deduce
+        at this point.  The first operator's catalog entry will link to this
+        dummy entry.  Later, when you define the second operator, the system
+        updates the dummy entry with the additional information from the second
+        definition.  If you try to use the dummy operator before it's been filled
+        in, you'll just get an error message.
+       </para>
+      </listitem>
+     </itemizedlist>
+    </para>
+   </sect2>
+
+   <sect2>
+    <title><literal>NEGATOR</literal></title>
+
+    <para>
+     The <literal>NEGATOR</literal> clause, if provided, names an operator that is the
+     negator of the operator being defined.  We say that operator A
+     is the negator of operator B if both return Boolean results and
+     (x A y) equals NOT (x B y) for all possible inputs x, y.
+     Notice that B is also the negator of A.
+     For example, <literal>&lt;</literal> and <literal>&gt;=</literal> are a negator pair for most data types.
+     An operator can never validly be its own negator.
+    </para>
+
+   <para>
+    Unlike commutators, a pair of unary operators could validly be marked
+    as each other's negators; that would mean (A x) equals NOT (B x)
+    for all x.
+   </para>
+
+   <para>
+    An operator's negator must have the same left and/or right operand types
+    as the operator to be defined, so just as with <literal>COMMUTATOR</literal>, only the operator
+    name need be given in the <literal>NEGATOR</literal> clause.
+   </para>
+
+   <para>
+    Providing a negator is very helpful to the query optimizer since
+    it allows expressions like <literal>NOT (x = y)</literal> to be simplified into
+    <literal>x &lt;&gt; y</literal>.  This comes up more often than you might think, because
+    <literal>NOT</literal> operations can be inserted as a consequence of other rearrangements.
+   </para>
+
+   <para>
+    Pairs of negator operators can be defined using the same methods
+    explained above for commutator pairs.
+   </para>
+
+  </sect2>
+
+  <sect2>
+   <title><literal>RESTRICT</literal></title>
+
+   <para>
+    The <literal>RESTRICT</literal> clause, if provided, names a restriction selectivity
+    estimation function for the operator.  (Note that this is a function
+    name, not an operator name.)  <literal>RESTRICT</literal> clauses only make sense for
+    binary operators that return <type>boolean</type>.  The idea behind a restriction
+    selectivity estimator is to guess what fraction of the rows in a
+    table will satisfy a <literal>WHERE</literal>-clause condition of the form:
+<programlisting>
+column OP constant
+</programlisting>
+    for the current operator and a particular constant value.
+    This assists the optimizer by
+    giving it some idea of how many rows will be eliminated by <literal>WHERE</literal>
+    clauses that have this form.  (What happens if the constant is on
+    the left, you might be wondering?  Well, that's one of the things that
+    <literal>COMMUTATOR</literal> is for...)
+   </para>
+
+   <para>
+    Writing new restriction selectivity estimation functions is far beyond
+    the scope of this chapter, but fortunately you can usually just use
+    one of the system's standard estimators for many of your own operators.
+    These are the standard restriction estimators:
+    <simplelist>
+     <member><function>eqsel</function> for <literal>=</literal></member>
+     <member><function>neqsel</function> for <literal>&lt;&gt;</literal></member>
+     <member><function>scalarltsel</function> for <literal>&lt;</literal></member>
+     <member><function>scalarlesel</function> for <literal>&lt;=</literal></member>
+     <member><function>scalargtsel</function> for <literal>&gt;</literal></member>
+     <member><function>scalargesel</function> for <literal>&gt;=</literal></member>
+    </simplelist>
+   </para>
+
+   <para>
+    You can frequently get away with using either <function>eqsel</function> or <function>neqsel</function> for
+    operators that have very high or very low selectivity, even if they
+    aren't really equality or inequality.  For example, the
+    approximate-equality geometric operators use <function>eqsel</function> on the assumption that
+    they'll usually only match a small fraction of the entries in a table.
+   </para>
+
+   <para>
+    You can use <function>scalarltsel</function>, <function>scalarlesel</function>,
+    <function>scalargtsel</function> and <function>scalargesel</function> for comparisons on
+    data types that have some sensible means of being converted into numeric
+    scalars for range comparisons.  If possible, add the data type to those
+    understood by the function <function>convert_to_scalar()</function> in
+    <filename>src/backend/utils/adt/selfuncs.c</filename>.
+    (Eventually, this function should be replaced by per-data-type functions
+    identified through a column of the <classname>pg_type</classname> system catalog; but that hasn't happened
+    yet.)  If you do not do this, things will still work, but the optimizer's
+    estimates won't be as good as they could be.
+   </para>
+
+   <para>
+    Another useful built-in selectivity estimation function
+    is <function>matchingsel</function>, which will work for almost any
+    binary operator, if standard MCV and/or histogram statistics are
+    collected for the input data type(s).  Its default estimate is set to
+    twice the default estimate used in <function>eqsel</function>, making
+    it most suitable for comparison operators that are somewhat less
+    strict than equality.  (Or you could call the
+    underlying <function>generic_restriction_selectivity</function>
+    function, providing a different default estimate.)
+   </para>
+
+   <para>
+    There are additional selectivity estimation functions designed for geometric
+    operators in <filename>src/backend/utils/adt/geo_selfuncs.c</filename>: <function>areasel</function>, <function>positionsel</function>,
+    and <function>contsel</function>.  At this writing these are just stubs, but you might want
+    to use them (or even better, improve them) anyway.
+   </para>
+   </sect2>
+
+   <sect2>
+    <title><literal>JOIN</literal></title>
+
+    <para>
+     The <literal>JOIN</literal> clause, if provided, names a join selectivity
+     estimation function for the operator.  (Note that this is a function
+     name, not an operator name.)  <literal>JOIN</literal> clauses only make sense for
+     binary operators that return <type>boolean</type>.  The idea behind a join
+     selectivity estimator is to guess what fraction of the rows in a
+     pair of tables will satisfy a <literal>WHERE</literal>-clause condition of the form:
+<programlisting>
+table1.column1 OP table2.column2
+</programlisting>
+     for the current operator.  As with the <literal>RESTRICT</literal> clause, this helps
+     the optimizer very substantially by letting it figure out which
+     of several possible join sequences is likely to take the least work.
+    </para>
+
+    <para>
+     As before, this chapter will make no attempt to explain how to write
+     a join selectivity estimator function, but will just suggest that
+     you use one of the standard estimators if one is applicable:
+     <simplelist>
+      <member><function>eqjoinsel</function> for <literal>=</literal></member>
+      <member><function>neqjoinsel</function> for <literal>&lt;&gt;</literal></member>
+      <member><function>scalarltjoinsel</function> for <literal>&lt;</literal></member>
+      <member><function>scalarlejoinsel</function> for <literal>&lt;=</literal></member>
+      <member><function>scalargtjoinsel</function> for <literal>&gt;</literal></member>
+      <member><function>scalargejoinsel</function> for <literal>&gt;=</literal></member>
+      <member><function>matchingjoinsel</function> for generic matching operators</member>
+      <member><function>areajoinsel</function> for 2D area-based comparisons</member>
+      <member><function>positionjoinsel</function> for 2D position-based comparisons</member>
+      <member><function>contjoinsel</function> for 2D containment-based comparisons</member>
+     </simplelist>
+    </para>
+   </sect2>
+
+   <sect2>
+    <title><literal>HASHES</literal></title>
+
+    <para>
+     The <literal>HASHES</literal> clause, if present, tells the system that
+     it is permissible to use the hash join method for a join based on this
+     operator.  <literal>HASHES</literal> only makes sense for a binary operator that
+     returns <literal>boolean</literal>, and in practice the operator must represent
+     equality for some data type or pair of data types.
+    </para>
+
+    <para>
+     The assumption underlying hash join is that the join operator can
+     only return true for pairs of left and right values that hash to the
+     same hash code.  If two values get put in different hash buckets, the
+     join will never compare them at all, implicitly assuming that the
+     result of the join operator must be false.  So it never makes sense
+     to specify <literal>HASHES</literal> for operators that do not represent
+     some form of equality.  In most cases it is only practical to support
+     hashing for operators that take the same data type on both sides.
+     However, sometimes it is possible to design compatible hash functions
+     for two or more data types; that is, functions that will generate the
+     same hash codes for <quote>equal</quote> values, even though the values
+     have different representations.  For example, it's fairly simple
+     to arrange this property when hashing integers of different widths.
+    </para>
+
+    <para>
+     To be marked <literal>HASHES</literal>, the join operator must appear
+     in a hash index operator family.  This is not enforced when you create
+     the operator, since of course the referencing operator family couldn't
+     exist yet.  But attempts to use the operator in hash joins will fail
+     at run time if no such operator family exists.  The system needs the
+     operator family to find the data-type-specific hash function(s) for the
+     operator's input data type(s).  Of course, you must also create suitable
+     hash functions before you can create the operator family.
+    </para>
+
+    <para>
+     Care should be exercised when preparing a hash function, because there
+     are machine-dependent ways in which it might fail to do the right thing.
+     For example, if your data type is a structure in which there might be
+     uninteresting pad bits, you cannot simply pass the whole structure to
+     <function>hash_any</function>.  (Unless you write your other operators and
+     functions to ensure that the unused bits are always zero, which is the
+     recommended strategy.)
+     Another example is that on machines that meet the <acronym>IEEE</acronym>
+     floating-point standard, negative zero and positive zero are different
+     values (different bit patterns) but they are defined to compare equal.
+     If a float value might contain negative zero then extra steps are needed
+     to ensure it generates the same hash value as positive zero.
+    </para>
+
+    <para>
+     A hash-joinable operator must have a commutator (itself if the two
+     operand data types are the same, or a related equality operator
+     if they are different) that appears in the same operator family.
+     If this is not the case, planner errors might occur when the operator
+     is used.  Also, it is a good idea (but not strictly required) for
+     a hash operator family that supports multiple data types to provide
+     equality operators for every combination of the data types; this
+     allows better optimization.
+    </para>
+
+    <note>
+    <para>
+     The function underlying a hash-joinable operator must be marked
+     immutable or stable.  If it is volatile, the system will never
+     attempt to use the operator for a hash join.
+    </para>
+    </note>
+
+    <note>
+    <para>
+     If a hash-joinable operator has an underlying function that is marked
+     strict, the
+     function must also be complete: that is, it should return true or
+     false, never null, for any two nonnull inputs.  If this rule is
+     not followed, hash-optimization of <literal>IN</literal> operations might
+     generate wrong results.  (Specifically, <literal>IN</literal> might return
+     false where the correct answer according to the standard would be null;
+     or it might yield an error complaining that it wasn't prepared for a
+     null result.)
+    </para>
+    </note>
+
+   </sect2>
+
+   <sect2>
+    <title><literal>MERGES</literal></title>
+
+    <para>
+     The <literal>MERGES</literal> clause, if present, tells the system that
+     it is permissible to use the merge-join method for a join based on this
+     operator.  <literal>MERGES</literal> only makes sense for a binary operator that
+     returns <literal>boolean</literal>, and in practice the operator must represent
+     equality for some data type or pair of data types.
+    </para>
+
+    <para>
+     Merge join is based on the idea of sorting the left- and right-hand tables
+     into order and then scanning them in parallel.  So, both data types must
+     be capable of being fully ordered, and the join operator must be one
+     that can only succeed for pairs of values that fall at the
+     <quote>same place</quote>
+     in the sort order.  In practice this means that the join operator must
+     behave like equality.  But it is possible to merge-join two
+     distinct data types so long as they are logically compatible.  For
+     example, the <type>smallint</type>-versus-<type>integer</type>
+     equality operator is merge-joinable.
+     We only need sorting operators that will bring both data types into a
+     logically compatible sequence.
+    </para>
+
+    <para>
+     To be marked <literal>MERGES</literal>, the join operator must appear
+     as an equality member of a <literal>btree</literal> index operator family.
+     This is not enforced when you create
+     the operator, since of course the referencing operator family couldn't
+     exist yet.  But the operator will not actually be used for merge joins
+     unless a matching operator family can be found.  The
+     <literal>MERGES</literal> flag thus acts as a hint to the planner that
+     it's worth looking for a matching operator family.
+    </para>
+
+    <para>
+     A merge-joinable operator must have a commutator (itself if the two
+     operand data types are the same, or a related equality operator
+     if they are different) that appears in the same operator family.
+     If this is not the case, planner errors might occur when the operator
+     is used.  Also, it is a good idea (but not strictly required) for
+     a <literal>btree</literal> operator family that supports multiple data types to provide
+     equality operators for every combination of the data types; this
+     allows better optimization.
+    </para>
+
+    <note>
+    <para>
+     The function underlying a merge-joinable operator must be marked
+     immutable or stable.  If it is volatile, the system will never
+     attempt to use the operator for a merge join.
+    </para>
+    </note>
+   </sect2>
+  </sect1>