summaryrefslogtreecommitdiffstats
path: root/doc/src/sgml/seg.sgml
diff options
context:
space:
mode:
Diffstat (limited to 'doc/src/sgml/seg.sgml')
-rw-r--r--doc/src/sgml/seg.sgml415
1 files changed, 415 insertions, 0 deletions
diff --git a/doc/src/sgml/seg.sgml b/doc/src/sgml/seg.sgml
new file mode 100644
index 0000000..dc66e24
--- /dev/null
+++ b/doc/src/sgml/seg.sgml
@@ -0,0 +1,415 @@
+<!-- doc/src/sgml/seg.sgml -->
+
+<sect1 id="seg" xreflabel="seg">
+ <title>seg &mdash; a datatype for line segments or floating point intervals</title>
+
+ <indexterm zone="seg">
+ <primary>seg</primary>
+ </indexterm>
+
+ <para>
+ This module implements a data type <type>seg</type> for
+ representing line segments, or floating point intervals.
+ <type>seg</type> can represent uncertainty in the interval endpoints,
+ making it especially useful for representing laboratory measurements.
+ </para>
+
+ <para>
+ This module is considered <quote>trusted</quote>, that is, it can be
+ installed by non-superusers who have <literal>CREATE</literal> privilege
+ on the current database.
+ </para>
+
+ <sect2 id="seg-rationale">
+ <title>Rationale</title>
+
+ <para>
+ The geometry of measurements is usually more complex than that of a
+ point in a numeric continuum. A measurement is usually a segment of
+ that continuum with somewhat fuzzy limits. The measurements come out
+ as intervals because of uncertainty and randomness, as well as because
+ the value being measured may naturally be an interval indicating some
+ condition, such as the temperature range of stability of a protein.
+ </para>
+
+ <para>
+ Using just common sense, it appears more convenient to store such data
+ as intervals, rather than pairs of numbers. In practice, it even turns
+ out more efficient in most applications.
+ </para>
+
+ <para>
+ Further along the line of common sense, the fuzziness of the limits
+ suggests that the use of traditional numeric data types leads to a
+ certain loss of information. Consider this: your instrument reads
+ 6.50, and you input this reading into the database. What do you get
+ when you fetch it? Watch:
+
+<screen>
+test=&gt; select 6.50 :: float8 as "pH";
+ pH
+---
+6.5
+(1 row)
+</screen>
+
+ In the world of measurements, 6.50 is not the same as 6.5. It may
+ sometimes be critically different. The experimenters usually write
+ down (and publish) the digits they trust. 6.50 is actually a fuzzy
+ interval contained within a bigger and even fuzzier interval, 6.5,
+ with their center points being (probably) the only common feature they
+ share. We definitely do not want such different data items to appear the
+ same.
+ </para>
+
+ <para>
+ Conclusion? It is nice to have a special data type that can record the
+ limits of an interval with arbitrarily variable precision. Variable in
+ the sense that each data element records its own precision.
+ </para>
+
+ <para>
+ Check this out:
+
+<screen>
+test=&gt; select '6.25 .. 6.50'::seg as "pH";
+ pH
+------------
+6.25 .. 6.50
+(1 row)
+</screen>
+ </para>
+ </sect2>
+
+ <sect2 id="seg-syntax">
+ <title>Syntax</title>
+
+ <para>
+ The external representation of an interval is formed using one or two
+ floating-point numbers joined by the range operator (<literal>..</literal>
+ or <literal>...</literal>). Alternatively, it can be specified as a
+ center point plus or minus a deviation.
+ Optional certainty indicators (<literal>&lt;</literal>,
+ <literal>&gt;</literal> or <literal>~</literal>) can be stored as well.
+ (Certainty indicators are ignored by all the built-in operators, however.)
+ <xref linkend="seg-repr-table"/> gives an overview of allowed
+ representations; <xref linkend="seg-input-examples"/> shows some
+ examples.
+ </para>
+
+ <para>
+ In <xref linkend="seg-repr-table"/>, <replaceable>x</replaceable>, <replaceable>y</replaceable>, and
+ <replaceable>delta</replaceable> denote
+ floating-point numbers. <replaceable>x</replaceable> and <replaceable>y</replaceable>, but
+ not <replaceable>delta</replaceable>, can be preceded by a certainty indicator.
+ </para>
+
+ <table id="seg-repr-table">
+ <title><type>seg</type> External Representations</title>
+ <tgroup cols="2">
+ <tbody>
+ <row>
+ <entry><literal><replaceable>x</replaceable></literal></entry>
+ <entry>Single value (zero-length interval)
+ </entry>
+ </row>
+ <row>
+ <entry><literal><replaceable>x</replaceable> .. <replaceable>y</replaceable></literal></entry>
+ <entry>Interval from <replaceable>x</replaceable> to <replaceable>y</replaceable>
+ </entry>
+ </row>
+ <row>
+ <entry><literal><replaceable>x</replaceable> (+-) <replaceable>delta</replaceable></literal></entry>
+ <entry>Interval from <replaceable>x</replaceable> - <replaceable>delta</replaceable> to
+ <replaceable>x</replaceable> + <replaceable>delta</replaceable>
+ </entry>
+ </row>
+ <row>
+ <entry><literal><replaceable>x</replaceable> ..</literal></entry>
+ <entry>Open interval with lower bound <replaceable>x</replaceable>
+ </entry>
+ </row>
+ <row>
+ <entry><literal>.. <replaceable>x</replaceable></literal></entry>
+ <entry>Open interval with upper bound <replaceable>x</replaceable>
+ </entry>
+ </row>
+ </tbody>
+ </tgroup>
+ </table>
+
+ <table id="seg-input-examples">
+ <title>Examples of Valid <type>seg</type> Input</title>
+ <tgroup cols="2">
+ <colspec colname="col1" colwidth="1*"/>
+ <colspec colname="col2" colwidth="2*"/>
+ <tbody>
+ <row>
+ <entry><literal>5.0</literal></entry>
+ <entry>
+ Creates a zero-length segment (a point, if you will)
+ </entry>
+ </row>
+ <row>
+ <entry><literal>~5.0</literal></entry>
+ <entry>
+ Creates a zero-length segment and records
+ <literal>~</literal> in the data. <literal>~</literal> is ignored
+ by <type>seg</type> operations, but
+ is preserved as a comment.
+ </entry>
+ </row>
+ <row>
+ <entry><literal>&lt;5.0</literal></entry>
+ <entry>
+ Creates a point at 5.0. <literal>&lt;</literal> is ignored but
+ is preserved as a comment.
+ </entry>
+ </row>
+ <row>
+ <entry><literal>&gt;5.0</literal></entry>
+ <entry>
+ Creates a point at 5.0. <literal>&gt;</literal> is ignored but
+ is preserved as a comment.
+ </entry>
+ </row>
+ <row>
+ <entry><literal>5(+-)0.3</literal></entry>
+ <entry>
+ Creates an interval <literal>4.7 .. 5.3</literal>.
+ Note that the <literal>(+-)</literal> notation isn't preserved.
+ </entry>
+ </row>
+ <row>
+ <entry><literal>50 .. </literal></entry>
+ <entry>Everything that is greater than or equal to 50</entry>
+ </row>
+ <row>
+ <entry><literal>.. 0</literal></entry>
+ <entry>Everything that is less than or equal to 0</entry>
+ </row>
+ <row>
+ <entry><literal>1.5e-2 .. 2E-2 </literal></entry>
+ <entry>Creates an interval <literal>0.015 .. 0.02</literal></entry>
+ </row>
+ <row>
+ <entry><literal>1 ... 2</literal></entry>
+ <entry>
+ The same as <literal>1...2</literal>, or <literal>1 .. 2</literal>,
+ or <literal>1..2</literal>
+ (spaces around the range operator are ignored)
+ </entry>
+ </row>
+ </tbody>
+ </tgroup>
+ </table>
+
+ <para>
+ Because the <literal>...</literal> operator is widely used in data sources, it is allowed
+ as an alternative spelling of the <literal>..</literal> operator. Unfortunately, this
+ creates a parsing ambiguity: it is not clear whether the upper bound
+ in <literal>0...23</literal> is meant to be <literal>23</literal> or <literal>0.23</literal>.
+ This is resolved by requiring at least one digit before the decimal
+ point in all numbers in <type>seg</type> input.
+ </para>
+
+ <para>
+ As a sanity check, <type>seg</type> rejects intervals with the lower bound
+ greater than the upper, for example <literal>5 .. 2</literal>.
+ </para>
+
+ </sect2>
+
+ <sect2 id="seg-precision">
+ <title>Precision</title>
+
+ <para>
+ <type>seg</type> values are stored internally as pairs of 32-bit floating point
+ numbers. This means that numbers with more than 7 significant digits
+ will be truncated.
+ </para>
+
+ <para>
+ Numbers with 7 or fewer significant digits retain their
+ original precision. That is, if your query returns 0.00, you will be
+ sure that the trailing zeroes are not the artifacts of formatting: they
+ reflect the precision of the original data. The number of leading
+ zeroes does not affect precision: the value 0.0067 is considered to
+ have just 2 significant digits.
+ </para>
+ </sect2>
+
+ <sect2 id="seg-usage">
+ <title>Usage</title>
+
+ <para>
+ The <filename>seg</filename> module includes a GiST index operator class for
+ <type>seg</type> values.
+ The operators supported by the GiST operator class are shown in <xref linkend="seg-gist-operators"/>.
+ </para>
+
+ <table id="seg-gist-operators">
+ <title>Seg GiST Operators</title>
+ <tgroup cols="1">
+ <thead>
+ <row>
+ <entry role="func_table_entry"><para role="func_signature">
+ Operator
+ </para>
+ <para>
+ Description
+ </para></entry>
+ </row>
+ </thead>
+
+ <tbody>
+ <row>
+ <entry role="func_table_entry"><para role="func_signature">
+ <type>seg</type> <literal>&lt;&lt;</literal> <type>seg</type>
+ <returnvalue>boolean</returnvalue>
+ </para>
+ <para>
+ Is the first <type>seg</type> entirely to the left of the second?
+ [a, b] &lt;&lt; [c, d] is true if b &lt; c.
+ </para></entry>
+ </row>
+
+ <row>
+ <entry role="func_table_entry"><para role="func_signature">
+ <type>seg</type> <literal>&gt;&gt;</literal> <type>seg</type>
+ <returnvalue>boolean</returnvalue>
+ </para>
+ <para>
+ Is the first <type>seg</type> entirely to the right of the second?
+ [a, b] &gt;&gt; [c, d] is true if a &gt; d.
+ </para></entry>
+ </row>
+
+ <row>
+ <entry role="func_table_entry"><para role="func_signature">
+ <type>seg</type> <literal>&amp;&lt;</literal> <type>seg</type>
+ <returnvalue>boolean</returnvalue>
+ </para>
+ <para>
+ Does the first <type>seg</type> not extend to the right of the
+ second?
+ [a, b] &amp;&lt; [c, d] is true if b &lt;= d.
+ </para></entry>
+ </row>
+
+ <row>
+ <entry role="func_table_entry"><para role="func_signature">
+ <type>seg</type> <literal>&amp;&gt;</literal> <type>seg</type>
+ <returnvalue>boolean</returnvalue>
+ </para>
+ <para>
+ Does the first <type>seg</type> not extend to the left of the
+ second?
+ [a, b] &amp;&gt; [c, d] is true if a &gt;= c.
+ </para></entry>
+ </row>
+
+ <row>
+ <entry role="func_table_entry"><para role="func_signature">
+ <type>seg</type> <literal>=</literal> <type>seg</type>
+ <returnvalue>boolean</returnvalue>
+ </para>
+ <para>
+ Are the two <type>seg</type>s equal?
+ </para></entry>
+ </row>
+
+ <row>
+ <entry role="func_table_entry"><para role="func_signature">
+ <type>seg</type> <literal>&amp;&amp;</literal> <type>seg</type>
+ <returnvalue>boolean</returnvalue>
+ </para>
+ <para>
+ Do the two <type>seg</type>s overlap?
+ </para></entry>
+ </row>
+
+ <row>
+ <entry role="func_table_entry"><para role="func_signature">
+ <type>seg</type> <literal>@&gt;</literal> <type>seg</type>
+ <returnvalue>boolean</returnvalue>
+ </para>
+ <para>
+ Does the first <type>seg</type> contain the second?
+ </para></entry>
+ </row>
+
+ <row>
+ <entry role="func_table_entry"><para role="func_signature">
+ <type>seg</type> <literal>&lt;@</literal> <type>seg</type>
+ <returnvalue>boolean</returnvalue>
+ </para>
+ <para>
+ Is the first <type>seg</type> contained in the second?
+ </para></entry>
+ </row>
+ </tbody>
+ </tgroup>
+ </table>
+
+ <para>
+ In addition to the above operators, the usual comparison
+ operators shown in <xref linkend="functions-comparison-op-table"/> are
+ available for type <type>seg</type>. These operators
+ first compare (a) to (c),
+ and if these are equal, compare (b) to (d). That results in
+ reasonably good sorting in most cases, which is useful if
+ you want to use ORDER BY with this type.
+ </para>
+ </sect2>
+
+ <sect2 id="seg-notes">
+ <title>Notes</title>
+
+ <para>
+ For examples of usage, see the regression test <filename>sql/seg.sql</filename>.
+ </para>
+
+ <para>
+ The mechanism that converts <literal>(+-)</literal> to regular ranges
+ isn't completely accurate in determining the number of significant digits
+ for the boundaries. For example, it adds an extra digit to the lower
+ boundary if the resulting interval includes a power of ten:
+
+<screen>
+postgres=&gt; select '10(+-)1'::seg as seg;
+ seg
+---------
+9.0 .. 11 -- should be: 9 .. 11
+</screen>
+ </para>
+
+ <para>
+ The performance of an R-tree index can largely depend on the initial
+ order of input values. It may be very helpful to sort the input table
+ on the <type>seg</type> column; see the script <filename>sort-segments.pl</filename>
+ for an example.
+ </para>
+ </sect2>
+
+ <sect2 id="seg-credits">
+ <title>Credits</title>
+
+ <para>
+ Original author: Gene Selkov, Jr. <email>selkovjr@mcs.anl.gov</email>,
+ Mathematics and Computer Science Division, Argonne National Laboratory.
+ </para>
+
+ <para>
+ My thanks are primarily to Prof. Joe Hellerstein
+ (<ulink url="https://dsf.berkeley.edu/jmh/"></ulink>) for elucidating the
+ gist of the GiST (<ulink url="http://gist.cs.berkeley.edu/"></ulink>). I am
+ also grateful to all Postgres developers, present and past, for enabling
+ myself to create my own world and live undisturbed in it. And I would like
+ to acknowledge my gratitude to Argonne Lab and to the U.S. Department of
+ Energy for the years of faithful support of my database research.
+ </para>
+
+ </sect2>
+
+</sect1>