diff options
Diffstat (limited to 'doc/src/sgml/xml2.sgml')
-rw-r--r-- | doc/src/sgml/xml2.sgml | 443 |
1 files changed, 443 insertions, 0 deletions
diff --git a/doc/src/sgml/xml2.sgml b/doc/src/sgml/xml2.sgml new file mode 100644 index 0000000..9fd613f --- /dev/null +++ b/doc/src/sgml/xml2.sgml @@ -0,0 +1,443 @@ +<!-- doc/src/sgml/xml2.sgml --> + +<sect1 id="xml2" xreflabel="xml2"> + <title>xml2 — XPath querying and XSLT functionality</title> + + <indexterm zone="xml2"> + <primary>xml2</primary> + </indexterm> + + <para> + The <filename>xml2</filename> module provides XPath querying and + XSLT functionality. + </para> + + <sect2 id="xml2-deprecation"> + <title>Deprecation Notice</title> + + <para> + From <productname>PostgreSQL</productname> 8.3 on, there is XML-related + functionality based on the SQL/XML standard in the core server. + That functionality covers XML syntax checking and XPath queries, + which is what this module does, and more, but the API is + not at all compatible. It is planned that this module will be + removed in a future version of PostgreSQL in favor of the newer standard API, so + you are encouraged to try converting your applications. If you + find that some of the functionality of this module is not + available in an adequate form with the newer API, please explain + your issue to <email>pgsql-hackers@lists.postgresql.org</email> so that the deficiency + can be addressed. + </para> + </sect2> + + <sect2 id="xml2-functions"> + <title>Description of Functions</title> + + <para> + <xref linkend="xml2-functions-table"/> shows the functions provided by this module. + These functions provide straightforward XML parsing and XPath queries. + </para> + + <table id="xml2-functions-table"> + <title><filename>xml2</filename> Functions</title> + <tgroup cols="1"> + <thead> + <row> + <entry role="func_table_entry"><para role="func_signature"> + Function + </para> + <para> + Description + </para></entry> + </row> + </thead> + + <tbody> + <row> + <entry role="func_table_entry"><para role="func_signature"> + <function>xml_valid</function> ( <parameter>document</parameter> <type>text</type> ) + <returnvalue>boolean</returnvalue> + </para> + <para> + Parses the given document and returns true if the + document is well-formed XML. (Note: this is an alias for the standard + PostgreSQL function <function>xml_is_well_formed()</function>. The + name <function>xml_valid()</function> is technically incorrect since validity + and well-formedness have different meanings in XML.) + </para></entry> + </row> + + <row> + <entry role="func_table_entry"><para role="func_signature"> + <function>xpath_string</function> ( <parameter>document</parameter> <type>text</type>, <parameter>query</parameter> <type>text</type> ) + <returnvalue>text</returnvalue> + </para> + <para> + Evaluates the XPath query on the supplied document, and + casts the result to <type>text</type>. + </para></entry> + </row> + + <row> + <entry role="func_table_entry"><para role="func_signature"> + <function>xpath_number</function> ( <parameter>document</parameter> <type>text</type>, <parameter>query</parameter> <type>text</type> ) + <returnvalue>real</returnvalue> + </para> + <para> + Evaluates the XPath query on the supplied document, and + casts the result to <type>real</type>. + </para></entry> + </row> + + <row> + <entry role="func_table_entry"><para role="func_signature"> + <function>xpath_bool</function> ( <parameter>document</parameter> <type>text</type>, <parameter>query</parameter> <type>text</type> ) + <returnvalue>boolean</returnvalue> + </para> + <para> + Evaluates the XPath query on the supplied document, and + casts the result to <type>boolean</type>. + </para></entry> + </row> + + <row> + <entry role="func_table_entry"><para role="func_signature"> + <function>xpath_nodeset</function> ( <parameter>document</parameter> <type>text</type>, <parameter>query</parameter> <type>text</type>, <parameter>toptag</parameter> <type>text</type>, <parameter>itemtag</parameter> <type>text</type> ) + <returnvalue>text</returnvalue> + </para> + <para> + Evaluates the query on the document and wraps the result in XML + tags. If the result is multivalued, the output will look like: +<synopsis> +<toptag> +<itemtag>Value 1 which could be an XML fragment</itemtag> +<itemtag>Value 2....</itemtag> +</toptag> +</synopsis> + If either <parameter>toptag</parameter> + or <parameter>itemtag</parameter> is an empty string, the relevant tag + is omitted. + </para></entry> + </row> + + <row> + <entry role="func_table_entry"><para role="func_signature"> + <function>xpath_nodeset</function> ( <parameter>document</parameter> <type>text</type>, <parameter>query</parameter> <type>text</type>, <parameter>itemtag</parameter> <type>text</type> ) + <returnvalue>text</returnvalue> + </para> + <para> + Like <function>xpath_nodeset(document, query, toptag, itemtag)</function> but result omits <parameter>toptag</parameter>. + </para></entry> + </row> + + <row> + <entry role="func_table_entry"><para role="func_signature"> + <function>xpath_nodeset</function> ( <parameter>document</parameter> <type>text</type>, <parameter>query</parameter> <type>text</type> ) + <returnvalue>text</returnvalue> + </para> + <para> + Like <function>xpath_nodeset(document, query, toptag, itemtag)</function> but result omits both tags. + </para></entry> + </row> + + <row> + <entry role="func_table_entry"><para role="func_signature"> + <function>xpath_list</function> ( <parameter>document</parameter> <type>text</type>, <parameter>query</parameter> <type>text</type>, <parameter>separator</parameter> <type>text</type> ) + <returnvalue>text</returnvalue> + </para> + <para> + Evaluates the query on the document and returns multiple values + separated by the specified separator, for example <literal>Value + 1,Value 2,Value 3</literal> if <parameter>separator</parameter> + is <literal>,</literal>. + </para></entry> + </row> + + <row> + <entry role="func_table_entry"><para role="func_signature"> + <function>xpath_list</function> ( <parameter>document</parameter> <type>text</type>, <parameter>query</parameter> <type>text</type> ) + <returnvalue>text</returnvalue> + </para> + <para> + This is a wrapper for the above function that uses <literal>,</literal> + as the separator. + </para></entry> + </row> + </tbody> + </tgroup> + </table> + </sect2> + + <sect2 id="xml2-xpath-table"> + <title><literal>xpath_table</literal></title> + + <indexterm> + <primary>xpath_table</primary> + </indexterm> + +<synopsis> +xpath_table(text key, text document, text relation, text xpaths, text criteria) returns setof record +</synopsis> + + <para> + <function>xpath_table</function> is a table function that evaluates a set of XPath + queries on each of a set of documents and returns the results as a + table. The primary key field from the original document table is returned + as the first column of the result so that the result set + can readily be used in joins. The parameters are described in + <xref linkend="xml2-xpath-table-parameters"/>. + </para> + + <table id="xml2-xpath-table-parameters"> + <title><function>xpath_table</function> Parameters</title> + <tgroup cols="2"> + <colspec colname="col1" colwidth="1*"/> + <colspec colname="col2" colwidth="2*"/> + <thead> + <row> + <entry>Parameter</entry> + <entry>Description</entry> + </row> + </thead> + <tbody> + <row> + <entry><parameter>key</parameter></entry> + <entry> + <para> + the name of the <quote>key</quote> field — this is just a field to be used as + the first column of the output table, i.e., it identifies the record from + which each output row came (see note below about multiple values) + </para> + </entry> + </row> + <row> + <entry><parameter>document</parameter></entry> + <entry> + <para> + the name of the field containing the XML document + </para> + </entry> + </row> + <row> + <entry><parameter>relation</parameter></entry> + <entry> + <para> + the name of the table or view containing the documents + </para> + </entry> + </row> + <row> + <entry><parameter>xpaths</parameter></entry> + <entry> + <para> + one or more XPath expressions, separated by <literal>|</literal> + </para> + </entry> + </row> + <row> + <entry><parameter>criteria</parameter></entry> + <entry> + <para> + the contents of the WHERE clause. This cannot be omitted, so use + <literal>true</literal> or <literal>1=1</literal> if you want to + process all the rows in the relation + </para> + </entry> + </row> + </tbody> + </tgroup> + </table> + + <para> + These parameters (except the XPath strings) are just substituted + into a plain SQL SELECT statement, so you have some flexibility — the + statement is + </para> + + <para> + <literal> + SELECT <key>, <document> FROM <relation> WHERE <criteria> + </literal> + </para> + + <para> + so those parameters can be <emphasis>anything</emphasis> valid in those particular + locations. The result from this SELECT needs to return exactly two + columns (which it will unless you try to list multiple fields for key + or document). Beware that this simplistic approach requires that you + validate any user-supplied values to avoid SQL injection attacks. + </para> + + <para> + The function has to be used in a <literal>FROM</literal> expression, with an + <literal>AS</literal> clause to specify the output columns; for example +<programlisting> +SELECT * FROM +xpath_table('article_id', + 'article_xml', + 'articles', + '/article/author|/article/pages|/article/title', + 'date_entered > ''2003-01-01'' ') +AS t(article_id integer, author text, page_count integer, title text); +</programlisting> + The <literal>AS</literal> clause defines the names and types of the columns in the + output table. The first is the <quote>key</quote> field and the rest correspond + to the XPath queries. + If there are more XPath queries than result columns, + the extra queries will be ignored. If there are more result columns + than XPath queries, the extra columns will be NULL. + </para> + + <para> + Notice that this example defines the <structname>page_count</structname> result + column as an integer. The function deals internally with string + representations, so when you say you want an integer in the output, it will + take the string representation of the XPath result and use PostgreSQL input + functions to transform it into an integer (or whatever type the <type>AS</type> + clause requests). An error will result if it can't do this — for + example if the result is empty — so you may wish to just stick to + <type>text</type> as the column type if you think your data has any problems. + </para> + + <para> + The calling <command>SELECT</command> statement doesn't necessarily have to be + just <literal>SELECT *</literal> — it can reference the output + columns by name or join them to other tables. The function produces a + virtual table with which you can perform any operation you wish (e.g., + aggregation, joining, sorting etc.). So we could also have: +<programlisting> +SELECT t.title, p.fullname, p.email +FROM xpath_table('article_id', 'article_xml', 'articles', + '/article/title|/article/author/@id', + 'xpath_string(article_xml,''/article/@date'') > ''2003-03-20'' ') + AS t(article_id integer, title text, author_id integer), + tblPeopleInfo AS p +WHERE t.author_id = p.person_id; +</programlisting> + as a more complicated example. Of course, you could wrap all + of this in a view for convenience. + </para> + + <sect3 id="xml2-xpath-table-multivalued-results"> + <title>Multivalued Results</title> + + <para> + The <function>xpath_table</function> function assumes that the results of each XPath query + might be multivalued, so the number of rows returned by the function + may not be the same as the number of input documents. The first row + returned contains the first result from each query, the second row the + second result from each query. If one of the queries has fewer values + than the others, null values will be returned instead. + </para> + + <para> + In some cases, a user will know that a given XPath query will return + only a single result (perhaps a unique document identifier) — if used + alongside an XPath query returning multiple results, the single-valued + result will appear only on the first row of the result. The solution + to this is to use the key field as part of a join against a simpler + XPath query. As an example: + +<programlisting> +CREATE TABLE test ( + id int PRIMARY KEY, + xml text +); + +INSERT INTO test VALUES (1, '<doc num="C1"> +<line num="L1"><a>1</a><b>2</b><c>3</c></line> +<line num="L2"><a>11</a><b>22</b><c>33</c></line> +</doc>'); + +INSERT INTO test VALUES (2, '<doc num="C2"> +<line num="L1"><a>111</a><b>222</b><c>333</c></line> +<line num="L2"><a>111</a><b>222</b><c>333</c></line> +</doc>'); + +SELECT * FROM + xpath_table('id','xml','test', + '/doc/@num|/doc/line/@num|/doc/line/a|/doc/line/b|/doc/line/c', + 'true') + AS t(id int, doc_num varchar(10), line_num varchar(10), val1 int, val2 int, val3 int) +WHERE id = 1 ORDER BY doc_num, line_num + + id | doc_num | line_num | val1 | val2 | val3 +----+---------+----------+------+------+------ + 1 | C1 | L1 | 1 | 2 | 3 + 1 | | L2 | 11 | 22 | 33 +</programlisting> + </para> + + <para> + To get <literal>doc_num</literal> on every line, the solution is to use two invocations + of <function>xpath_table</function> and join the results: + +<programlisting> +SELECT t.*,i.doc_num FROM + xpath_table('id', 'xml', 'test', + '/doc/line/@num|/doc/line/a|/doc/line/b|/doc/line/c', + 'true') + AS t(id int, line_num varchar(10), val1 int, val2 int, val3 int), + xpath_table('id', 'xml', 'test', '/doc/@num', 'true') + AS i(id int, doc_num varchar(10)) +WHERE i.id=t.id AND i.id=1 +ORDER BY doc_num, line_num; + + id | line_num | val1 | val2 | val3 | doc_num +----+----------+------+------+------+--------- + 1 | L1 | 1 | 2 | 3 | C1 + 1 | L2 | 11 | 22 | 33 | C1 +(2 rows) +</programlisting> + </para> + </sect3> + </sect2> + + <sect2 id="xml2-xslt"> + <title>XSLT Functions</title> + + <para> + The following functions are available if libxslt is installed: + </para> + + <sect3 id="xml2-xslt-xslt-process"> + <title><literal>xslt_process</literal></title> + + <indexterm> + <primary>xslt_process</primary> + </indexterm> + +<synopsis> +xslt_process(text document, text stylesheet, text paramlist) returns text +</synopsis> + + <para> + This function applies the XSL stylesheet to the document and returns + the transformed result. The <literal>paramlist</literal> is a list of parameter + assignments to be used in the transformation, specified in the form + <literal>a=1,b=2</literal>. Note that the + parameter parsing is very simple-minded: parameter values cannot + contain commas! + </para> + + <para> + There is also a two-parameter version of <function>xslt_process</function> which + does not pass any parameters to the transformation. + </para> + </sect3> + </sect2> + + <sect2 id="xml2-author"> + <title>Author</title> + + <para> + John Gray <email>jgray@azuli.co.uk</email> + </para> + + <para> + Development of this module was sponsored by Torchbox Ltd. (www.torchbox.com). + It has the same BSD license as PostgreSQL. + </para> + </sect2> + +</sect1> |