1 files changed, 415 insertions, 0 deletions
diff --git a/doc/src/sgml/seg.sgml b/doc/src/sgml/seg.sgml
new file mode 100644
index 0000000..dc66e24
--- /dev/null
+++ b/doc/src/sgml/seg.sgml
@@ -0,0 +1,415 @@
+<!-- doc/src/sgml/seg.sgml -->
+
+<sect1 id="seg" xreflabel="seg">
+ <title>seg &mdash; a datatype for line segments or floating point intervals</title>
+
+ <indexterm zone="seg">
+  <primary>seg</primary>
+ </indexterm>
+
+ <para>
+  This module implements a data type <type>seg</type> for
+  representing line segments, or floating point intervals.
+  <type>seg</type> can represent uncertainty in the interval endpoints,
+  making it especially useful for representing laboratory measurements.
+ </para>
+
+ <para>
+  This module is considered <quote>trusted</quote>, that is, it can be
+  installed by non-superusers who have <literal>CREATE</literal> privilege
+  on the current database.
+ </para>
+
+ <sect2 id="seg-rationale">
+  <title>Rationale</title>
+
+  <para>
+   The geometry of measurements is usually more complex than that of a
+   point in a numeric continuum. A measurement is usually a segment of
+   that continuum with somewhat fuzzy limits. The measurements come out
+   as intervals because of uncertainty and randomness, as well as because
+   the value being measured may naturally be an interval indicating some
+   condition, such as the temperature range of stability of a protein.
+  </para>
+
+  <para>
+   Using just common sense, it appears more convenient to store such data
+   as intervals, rather than pairs of numbers. In practice, it even turns
+   out more efficient in most applications.
+  </para>
+
+  <para>
+   Further along the line of common sense, the fuzziness of the limits
+   suggests that the use of traditional numeric data types leads to a
+   certain loss of information. Consider this: your instrument reads
+   6.50, and you input this reading into the database. What do you get
+   when you fetch it? Watch:
+
+<screen>
+test=&gt; select 6.50 :: float8 as "pH";
+ pH
+---
+6.5
+(1 row)
+</screen>
+
+   In the world of measurements, 6.50 is not the same as 6.5. It may
+   sometimes be critically different. The experimenters usually write
+   down (and publish) the digits they trust. 6.50 is actually a fuzzy
+   interval contained within a bigger and even fuzzier interval, 6.5,
+   with their center points being (probably) the only common feature they
+   share. We definitely do not want such different data items to appear the
+   same.
+  </para>
+
+  <para>
+   Conclusion? It is nice to have a special data type that can record the
+   limits of an interval with arbitrarily variable precision. Variable in
+   the sense that each data element records its own precision.
+  </para>
+
+  <para>
+   Check this out:
+
+<screen>
+test=&gt; select '6.25 .. 6.50'::seg as "pH";
+          pH
+------------
+6.25 .. 6.50
+(1 row)
+</screen>
+  </para>
+ </sect2>
+
+ <sect2 id="seg-syntax">
+  <title>Syntax</title>
+
+  <para>
+   The external representation of an interval is formed using one or two
+   floating-point numbers joined by the range operator (<literal>..</literal>
+   or <literal>...</literal>).  Alternatively, it can be specified as a
+   center point plus or minus a deviation.
+   Optional certainty indicators (<literal>&lt;</literal>,
+   <literal>&gt;</literal> or <literal>~</literal>) can be stored as well.
+   (Certainty indicators are ignored by all the built-in operators, however.)
+   <xref linkend="seg-repr-table"/> gives an overview of allowed
+   representations; <xref linkend="seg-input-examples"/> shows some
+   examples.
+  </para>
+
+  <para>
+   In <xref linkend="seg-repr-table"/>, <replaceable>x</replaceable>, <replaceable>y</replaceable>, and
+   <replaceable>delta</replaceable> denote
+   floating-point numbers.  <replaceable>x</replaceable> and <replaceable>y</replaceable>, but
+   not <replaceable>delta</replaceable>, can be preceded by a certainty indicator.
+  </para>
+
+  <table id="seg-repr-table">
+   <title><type>seg</type> External Representations</title>
+   <tgroup cols="2">
+    <tbody>
+     <row>
+      <entry><literal><replaceable>x</replaceable></literal></entry>
+      <entry>Single value (zero-length interval)
+      </entry>
+     </row>
+     <row>
+      <entry><literal><replaceable>x</replaceable> .. <replaceable>y</replaceable></literal></entry>
+      <entry>Interval from <replaceable>x</replaceable> to <replaceable>y</replaceable>
+      </entry>
+     </row>
+     <row>
+      <entry><literal><replaceable>x</replaceable> (+-) <replaceable>delta</replaceable></literal></entry>
+      <entry>Interval from <replaceable>x</replaceable> - <replaceable>delta</replaceable> to
+      <replaceable>x</replaceable> + <replaceable>delta</replaceable>
+      </entry>
+     </row>
+     <row>
+      <entry><literal><replaceable>x</replaceable> ..</literal></entry>
+      <entry>Open interval with lower bound <replaceable>x</replaceable>
+      </entry>
+     </row>
+     <row>
+      <entry><literal>.. <replaceable>x</replaceable></literal></entry>
+      <entry>Open interval with upper bound <replaceable>x</replaceable>
+      </entry>
+     </row>
+    </tbody>
+   </tgroup>
+  </table>
+
+  <table id="seg-input-examples">
+   <title>Examples of Valid <type>seg</type> Input</title>
+   <tgroup cols="2">
+    <colspec colname="col1" colwidth="1*"/>
+    <colspec colname="col2" colwidth="2*"/>
+    <tbody>
+     <row>
+      <entry><literal>5.0</literal></entry>
+      <entry>
+       Creates a zero-length segment (a point, if you will)
+      </entry>
+     </row>
+     <row>
+      <entry><literal>~5.0</literal></entry>
+      <entry>
+       Creates a zero-length segment and records
+       <literal>~</literal> in the data.  <literal>~</literal> is ignored
+       by <type>seg</type> operations, but
+       is preserved as a comment.
+      </entry>
+     </row>
+     <row>
+      <entry><literal>&lt;5.0</literal></entry>
+      <entry>
+       Creates a point at 5.0.  <literal>&lt;</literal> is ignored but
+       is preserved as a comment.
+      </entry>
+     </row>
+     <row>
+      <entry><literal>&gt;5.0</literal></entry>
+      <entry>
+       Creates a point at 5.0.  <literal>&gt;</literal> is ignored but
+       is preserved as a comment.
+      </entry>
+     </row>
+     <row>
+      <entry><literal>5(+-)0.3</literal></entry>
+      <entry>
+        Creates an interval <literal>4.7 .. 5.3</literal>.
+        Note that the <literal>(+-)</literal> notation isn't preserved.
+      </entry>
+     </row>
+     <row>
+      <entry><literal>50 .. </literal></entry>
+      <entry>Everything that is greater than or equal to 50</entry>
+     </row>
+     <row>
+      <entry><literal>.. 0</literal></entry>
+      <entry>Everything that is less than or equal to 0</entry>
+     </row>
+     <row>
+      <entry><literal>1.5e-2 .. 2E-2 </literal></entry>
+      <entry>Creates an interval <literal>0.015 .. 0.02</literal></entry>
+     </row>
+     <row>
+      <entry><literal>1 ... 2</literal></entry>
+      <entry>
+       The same as <literal>1...2</literal>, or <literal>1 .. 2</literal>,
+       or <literal>1..2</literal>
+       (spaces around the range operator are ignored)
+      </entry>
+     </row>
+    </tbody>
+   </tgroup>
+  </table>
+
+  <para>
+   Because the <literal>...</literal> operator is widely used in data sources, it is allowed
+   as an alternative spelling of the <literal>..</literal> operator.  Unfortunately, this
+   creates a parsing ambiguity: it is not clear whether the upper bound
+   in <literal>0...23</literal> is meant to be <literal>23</literal> or <literal>0.23</literal>.
+   This is resolved by requiring at least one digit before the decimal
+   point in all numbers in <type>seg</type> input.
+  </para>
+
+  <para>
+   As a sanity check, <type>seg</type> rejects intervals with the lower bound
+   greater than the upper, for example <literal>5 .. 2</literal>.
+  </para>
+
+ </sect2>
+
+ <sect2 id="seg-precision">
+  <title>Precision</title>
+
+  <para>
+   <type>seg</type> values are stored internally as pairs of 32-bit floating point
+   numbers. This means that numbers with more than 7 significant digits
+   will be truncated.
+  </para>
+
+  <para>
+   Numbers with 7 or fewer significant digits retain their
+   original precision. That is, if your query returns 0.00, you will be
+   sure that the trailing zeroes are not the artifacts of formatting: they
+   reflect the precision of the original data. The number of leading
+   zeroes does not affect precision: the value 0.0067 is considered to
+   have just 2 significant digits.
+  </para>
+ </sect2>
+
+ <sect2 id="seg-usage">
+  <title>Usage</title>
+
+  <para>
+   The <filename>seg</filename> module includes a GiST index operator class for
+   <type>seg</type> values.
+   The operators supported by the GiST operator class are shown in <xref linkend="seg-gist-operators"/>.
+  </para>
+
+  <table id="seg-gist-operators">
+   <title>Seg GiST Operators</title>
+    <tgroup cols="1">
+     <thead>
+      <row>
+       <entry role="func_table_entry"><para role="func_signature">
+        Operator
+       </para>
+       <para>
+        Description
+       </para></entry>
+      </row>
+     </thead>
+
+     <tbody>
+      <row>
+       <entry role="func_table_entry"><para role="func_signature">
+        <type>seg</type> <literal>&lt;&lt;</literal> <type>seg</type>
+        <returnvalue>boolean</returnvalue>
+       </para>
+       <para>
+        Is the first <type>seg</type> entirely to the left of the second?
+        [a, b] &lt;&lt; [c, d] is true if b &lt; c.
+       </para></entry>
+      </row>
+
+      <row>
+       <entry role="func_table_entry"><para role="func_signature">
+        <type>seg</type> <literal>&gt;&gt;</literal> <type>seg</type>
+        <returnvalue>boolean</returnvalue>
+       </para>
+       <para>
+        Is the first <type>seg</type> entirely to the right of the second?
+        [a, b] &gt;&gt; [c, d] is true if a &gt; d.
+       </para></entry>
+      </row>
+
+      <row>
+       <entry role="func_table_entry"><para role="func_signature">
+        <type>seg</type> <literal>&amp;&lt;</literal> <type>seg</type>
+        <returnvalue>boolean</returnvalue>
+       </para>
+       <para>
+        Does the first <type>seg</type> not extend to the right of the
+        second?
+        [a, b] &amp;&lt; [c, d] is true if b &lt;= d.
+       </para></entry>
+      </row>
+
+      <row>
+       <entry role="func_table_entry"><para role="func_signature">
+        <type>seg</type> <literal>&amp;&gt;</literal> <type>seg</type>
+        <returnvalue>boolean</returnvalue>
+       </para>
+       <para>
+        Does the first <type>seg</type> not extend to the left of the
+        second?
+        [a, b] &amp;&gt; [c, d] is true if a &gt;= c.
+       </para></entry>
+      </row>
+
+      <row>
+       <entry role="func_table_entry"><para role="func_signature">
+        <type>seg</type> <literal>=</literal> <type>seg</type>
+        <returnvalue>boolean</returnvalue>
+       </para>
+       <para>
+        Are the two <type>seg</type>s equal?
+       </para></entry>
+      </row>
+
+      <row>
+       <entry role="func_table_entry"><para role="func_signature">
+        <type>seg</type> <literal>&amp;&amp;</literal> <type>seg</type>
+        <returnvalue>boolean</returnvalue>
+       </para>
+       <para>
+        Do the two <type>seg</type>s overlap?
+       </para></entry>
+      </row>
+
+      <row>
+       <entry role="func_table_entry"><para role="func_signature">
+        <type>seg</type> <literal>@&gt;</literal> <type>seg</type>
+        <returnvalue>boolean</returnvalue>
+       </para>
+       <para>
+        Does the first <type>seg</type> contain the second?
+       </para></entry>
+      </row>
+
+      <row>
+       <entry role="func_table_entry"><para role="func_signature">
+        <type>seg</type> <literal>&lt;@</literal> <type>seg</type>
+        <returnvalue>boolean</returnvalue>
+       </para>
+       <para>
+        Is the first <type>seg</type> contained in the second?
+       </para></entry>
+      </row>
+     </tbody>
+    </tgroup>
+  </table>
+
+  <para>
+   In addition to the above operators, the usual comparison
+   operators shown in <xref linkend="functions-comparison-op-table"/> are
+   available for type <type>seg</type>.  These operators
+   first compare (a) to (c),
+   and if these are equal, compare (b) to (d). That results in
+   reasonably good sorting in most cases, which is useful if
+   you want to use ORDER BY with this type.
+  </para>
+ </sect2>
+
+ <sect2 id="seg-notes">
+  <title>Notes</title>
+
+  <para>
+   For examples of usage, see the regression test <filename>sql/seg.sql</filename>.
+  </para>
+
+  <para>
+   The mechanism that converts <literal>(+-)</literal> to regular ranges
+   isn't completely accurate in determining the number of significant digits
+   for the boundaries.  For example, it adds an extra digit to the lower
+   boundary if the resulting interval includes a power of ten:
+
+<screen>
+postgres=&gt; select '10(+-)1'::seg as seg;
+      seg
+---------
+9.0 .. 11             -- should be: 9 .. 11
+</screen>
+  </para>
+
+  <para>
+   The performance of an R-tree index can largely depend on the initial
+   order of input values. It may be very helpful to sort the input table
+   on the <type>seg</type> column; see the script <filename>sort-segments.pl</filename>
+   for an example.
+  </para>
+ </sect2>
+
+ <sect2 id="seg-credits">
+  <title>Credits</title>
+
+  <para>
+   Original author: Gene Selkov, Jr. <email>selkovjr@mcs.anl.gov</email>,
+   Mathematics and Computer Science Division, Argonne National Laboratory.
+  </para>
+
+  <para>
+   My thanks are primarily to Prof. Joe Hellerstein
+   (<ulink url="https://dsf.berkeley.edu/jmh/"></ulink>) for elucidating the
+   gist of the GiST (<ulink url="http://gist.cs.berkeley.edu/"></ulink>). I am
+   also grateful to all Postgres developers, present and past, for enabling
+   myself to create my own world and live undisturbed in it. And I would like
+   to acknowledge my gratitude to Argonne Lab and to the U.S. Department of
+   Energy for the years of faithful support of my database research.
+  </para>
+
+ </sect2>
+
+</sect1>