diff options
Diffstat (limited to 'doc/src/sgml/seg.sgml')
-rw-r--r-- | doc/src/sgml/seg.sgml | 415 |
1 files changed, 415 insertions, 0 deletions
diff --git a/doc/src/sgml/seg.sgml b/doc/src/sgml/seg.sgml new file mode 100644 index 0000000..9be69e3 --- /dev/null +++ b/doc/src/sgml/seg.sgml @@ -0,0 +1,415 @@ +<!-- doc/src/sgml/seg.sgml --> + +<sect1 id="seg" xreflabel="seg"> + <title>seg</title> + + <indexterm zone="seg"> + <primary>seg</primary> + </indexterm> + + <para> + This module implements a data type <type>seg</type> for + representing line segments, or floating point intervals. + <type>seg</type> can represent uncertainty in the interval endpoints, + making it especially useful for representing laboratory measurements. + </para> + + <para> + This module is considered <quote>trusted</quote>, that is, it can be + installed by non-superusers who have <literal>CREATE</literal> privilege + on the current database. + </para> + + <sect2> + <title>Rationale</title> + + <para> + The geometry of measurements is usually more complex than that of a + point in a numeric continuum. A measurement is usually a segment of + that continuum with somewhat fuzzy limits. The measurements come out + as intervals because of uncertainty and randomness, as well as because + the value being measured may naturally be an interval indicating some + condition, such as the temperature range of stability of a protein. + </para> + + <para> + Using just common sense, it appears more convenient to store such data + as intervals, rather than pairs of numbers. In practice, it even turns + out more efficient in most applications. + </para> + + <para> + Further along the line of common sense, the fuzziness of the limits + suggests that the use of traditional numeric data types leads to a + certain loss of information. Consider this: your instrument reads + 6.50, and you input this reading into the database. What do you get + when you fetch it? Watch: + +<screen> +test=> select 6.50 :: float8 as "pH"; + pH +--- +6.5 +(1 row) +</screen> + + In the world of measurements, 6.50 is not the same as 6.5. It may + sometimes be critically different. The experimenters usually write + down (and publish) the digits they trust. 6.50 is actually a fuzzy + interval contained within a bigger and even fuzzier interval, 6.5, + with their center points being (probably) the only common feature they + share. We definitely do not want such different data items to appear the + same. + </para> + + <para> + Conclusion? It is nice to have a special data type that can record the + limits of an interval with arbitrarily variable precision. Variable in + the sense that each data element records its own precision. + </para> + + <para> + Check this out: + +<screen> +test=> select '6.25 .. 6.50'::seg as "pH"; + pH +------------ +6.25 .. 6.50 +(1 row) +</screen> + </para> + </sect2> + + <sect2> + <title>Syntax</title> + + <para> + The external representation of an interval is formed using one or two + floating-point numbers joined by the range operator (<literal>..</literal> + or <literal>...</literal>). Alternatively, it can be specified as a + center point plus or minus a deviation. + Optional certainty indicators (<literal><</literal>, + <literal>></literal> or <literal>~</literal>) can be stored as well. + (Certainty indicators are ignored by all the built-in operators, however.) + <xref linkend="seg-repr-table"/> gives an overview of allowed + representations; <xref linkend="seg-input-examples"/> shows some + examples. + </para> + + <para> + In <xref linkend="seg-repr-table"/>, <replaceable>x</replaceable>, <replaceable>y</replaceable>, and + <replaceable>delta</replaceable> denote + floating-point numbers. <replaceable>x</replaceable> and <replaceable>y</replaceable>, but + not <replaceable>delta</replaceable>, can be preceded by a certainty indicator. + </para> + + <table id="seg-repr-table"> + <title><type>seg</type> External Representations</title> + <tgroup cols="2"> + <tbody> + <row> + <entry><literal><replaceable>x</replaceable></literal></entry> + <entry>Single value (zero-length interval) + </entry> + </row> + <row> + <entry><literal><replaceable>x</replaceable> .. <replaceable>y</replaceable></literal></entry> + <entry>Interval from <replaceable>x</replaceable> to <replaceable>y</replaceable> + </entry> + </row> + <row> + <entry><literal><replaceable>x</replaceable> (+-) <replaceable>delta</replaceable></literal></entry> + <entry>Interval from <replaceable>x</replaceable> - <replaceable>delta</replaceable> to + <replaceable>x</replaceable> + <replaceable>delta</replaceable> + </entry> + </row> + <row> + <entry><literal><replaceable>x</replaceable> ..</literal></entry> + <entry>Open interval with lower bound <replaceable>x</replaceable> + </entry> + </row> + <row> + <entry><literal>.. <replaceable>x</replaceable></literal></entry> + <entry>Open interval with upper bound <replaceable>x</replaceable> + </entry> + </row> + </tbody> + </tgroup> + </table> + + <table id="seg-input-examples"> + <title>Examples of Valid <type>seg</type> Input</title> + <tgroup cols="2"> + <colspec colname="col1" colwidth="1*"/> + <colspec colname="col2" colwidth="2*"/> + <tbody> + <row> + <entry><literal>5.0</literal></entry> + <entry> + Creates a zero-length segment (a point, if you will) + </entry> + </row> + <row> + <entry><literal>~5.0</literal></entry> + <entry> + Creates a zero-length segment and records + <literal>~</literal> in the data. <literal>~</literal> is ignored + by <type>seg</type> operations, but + is preserved as a comment. + </entry> + </row> + <row> + <entry><literal><5.0</literal></entry> + <entry> + Creates a point at 5.0. <literal><</literal> is ignored but + is preserved as a comment. + </entry> + </row> + <row> + <entry><literal>>5.0</literal></entry> + <entry> + Creates a point at 5.0. <literal>></literal> is ignored but + is preserved as a comment. + </entry> + </row> + <row> + <entry><literal>5(+-)0.3</literal></entry> + <entry> + Creates an interval <literal>4.7 .. 5.3</literal>. + Note that the <literal>(+-)</literal> notation isn't preserved. + </entry> + </row> + <row> + <entry><literal>50 .. </literal></entry> + <entry>Everything that is greater than or equal to 50</entry> + </row> + <row> + <entry><literal>.. 0</literal></entry> + <entry>Everything that is less than or equal to 0</entry> + </row> + <row> + <entry><literal>1.5e-2 .. 2E-2 </literal></entry> + <entry>Creates an interval <literal>0.015 .. 0.02</literal></entry> + </row> + <row> + <entry><literal>1 ... 2</literal></entry> + <entry> + The same as <literal>1...2</literal>, or <literal>1 .. 2</literal>, + or <literal>1..2</literal> + (spaces around the range operator are ignored) + </entry> + </row> + </tbody> + </tgroup> + </table> + + <para> + Because the <literal>...</literal> operator is widely used in data sources, it is allowed + as an alternative spelling of the <literal>..</literal> operator. Unfortunately, this + creates a parsing ambiguity: it is not clear whether the upper bound + in <literal>0...23</literal> is meant to be <literal>23</literal> or <literal>0.23</literal>. + This is resolved by requiring at least one digit before the decimal + point in all numbers in <type>seg</type> input. + </para> + + <para> + As a sanity check, <type>seg</type> rejects intervals with the lower bound + greater than the upper, for example <literal>5 .. 2</literal>. + </para> + + </sect2> + + <sect2> + <title>Precision</title> + + <para> + <type>seg</type> values are stored internally as pairs of 32-bit floating point + numbers. This means that numbers with more than 7 significant digits + will be truncated. + </para> + + <para> + Numbers with 7 or fewer significant digits retain their + original precision. That is, if your query returns 0.00, you will be + sure that the trailing zeroes are not the artifacts of formatting: they + reflect the precision of the original data. The number of leading + zeroes does not affect precision: the value 0.0067 is considered to + have just 2 significant digits. + </para> + </sect2> + + <sect2> + <title>Usage</title> + + <para> + The <filename>seg</filename> module includes a GiST index operator class for + <type>seg</type> values. + The operators supported by the GiST operator class are shown in <xref linkend="seg-gist-operators"/>. + </para> + + <table id="seg-gist-operators"> + <title>Seg GiST Operators</title> + <tgroup cols="1"> + <thead> + <row> + <entry role="func_table_entry"><para role="func_signature"> + Operator + </para> + <para> + Description + </para></entry> + </row> + </thead> + + <tbody> + <row> + <entry role="func_table_entry"><para role="func_signature"> + <type>seg</type> <literal><<</literal> <type>seg</type> + <returnvalue>boolean</returnvalue> + </para> + <para> + Is the first <type>seg</type> entirely to the left of the second? + [a, b] << [c, d] is true if b < c. + </para></entry> + </row> + + <row> + <entry role="func_table_entry"><para role="func_signature"> + <type>seg</type> <literal>>></literal> <type>seg</type> + <returnvalue>boolean</returnvalue> + </para> + <para> + Is the first <type>seg</type> entirely to the right of the second? + [a, b] >> [c, d] is true if a > d. + </para></entry> + </row> + + <row> + <entry role="func_table_entry"><para role="func_signature"> + <type>seg</type> <literal>&<</literal> <type>seg</type> + <returnvalue>boolean</returnvalue> + </para> + <para> + Does the first <type>seg</type> not extend to the right of the + second? + [a, b] &< [c, d] is true if b <= d. + </para></entry> + </row> + + <row> + <entry role="func_table_entry"><para role="func_signature"> + <type>seg</type> <literal>&></literal> <type>seg</type> + <returnvalue>boolean</returnvalue> + </para> + <para> + Does the first <type>seg</type> not extend to the left of the + second? + [a, b] &> [c, d] is true if a >= c. + </para></entry> + </row> + + <row> + <entry role="func_table_entry"><para role="func_signature"> + <type>seg</type> <literal>=</literal> <type>seg</type> + <returnvalue>boolean</returnvalue> + </para> + <para> + Are the two <type>seg</type>s equal? + </para></entry> + </row> + + <row> + <entry role="func_table_entry"><para role="func_signature"> + <type>seg</type> <literal>&&</literal> <type>seg</type> + <returnvalue>boolean</returnvalue> + </para> + <para> + Do the two <type>seg</type>s overlap? + </para></entry> + </row> + + <row> + <entry role="func_table_entry"><para role="func_signature"> + <type>seg</type> <literal>@></literal> <type>seg</type> + <returnvalue>boolean</returnvalue> + </para> + <para> + Does the first <type>seg</type> contain the second? + </para></entry> + </row> + + <row> + <entry role="func_table_entry"><para role="func_signature"> + <type>seg</type> <literal><@</literal> <type>seg</type> + <returnvalue>boolean</returnvalue> + </para> + <para> + Is the first <type>seg</type> contained in the second? + </para></entry> + </row> + </tbody> + </tgroup> + </table> + + <para> + In addition to the above operators, the usual comparison + operators shown in <xref linkend="functions-comparison-op-table"/> are + available for type <type>seg</type>. These operators + first compare (a) to (c), + and if these are equal, compare (b) to (d). That results in + reasonably good sorting in most cases, which is useful if + you want to use ORDER BY with this type. + </para> + </sect2> + + <sect2> + <title>Notes</title> + + <para> + For examples of usage, see the regression test <filename>sql/seg.sql</filename>. + </para> + + <para> + The mechanism that converts <literal>(+-)</literal> to regular ranges + isn't completely accurate in determining the number of significant digits + for the boundaries. For example, it adds an extra digit to the lower + boundary if the resulting interval includes a power of ten: + +<screen> +postgres=> select '10(+-)1'::seg as seg; + seg +--------- +9.0 .. 11 -- should be: 9 .. 11 +</screen> + </para> + + <para> + The performance of an R-tree index can largely depend on the initial + order of input values. It may be very helpful to sort the input table + on the <type>seg</type> column; see the script <filename>sort-segments.pl</filename> + for an example. + </para> + </sect2> + + <sect2> + <title>Credits</title> + + <para> + Original author: Gene Selkov, Jr. <email>selkovjr@mcs.anl.gov</email>, + Mathematics and Computer Science Division, Argonne National Laboratory. + </para> + + <para> + My thanks are primarily to Prof. Joe Hellerstein + (<ulink url="https://dsf.berkeley.edu/jmh/"></ulink>) for elucidating the + gist of the GiST (<ulink url="http://gist.cs.berkeley.edu/"></ulink>). I am + also grateful to all Postgres developers, present and past, for enabling + myself to create my own world and live undisturbed in it. And I would like + to acknowledge my gratitude to Argonne Lab and to the U.S. Department of + Energy for the years of faithful support of my database research. + </para> + + </sect2> + +</sect1> |