diff options
Diffstat (limited to '')
-rw-r--r-- | doc/src/sgml/json.sgml | 1016 |
1 files changed, 1016 insertions, 0 deletions
diff --git a/doc/src/sgml/json.sgml b/doc/src/sgml/json.sgml new file mode 100644 index 0000000..c421d4b --- /dev/null +++ b/doc/src/sgml/json.sgml @@ -0,0 +1,1016 @@ +<!-- doc/src/sgml/json.sgml --> + +<sect1 id="datatype-json"> + <title><acronym>JSON</acronym> Types</title> + + <indexterm zone="datatype-json"> + <primary>JSON</primary> + </indexterm> + + <indexterm zone="datatype-json"> + <primary>JSONB</primary> + </indexterm> + + <para> + JSON data types are for storing JSON (JavaScript Object Notation) + data, as specified in <ulink url="https://tools.ietf.org/html/rfc7159">RFC + 7159</ulink>. Such data can also be stored as <type>text</type>, but + the JSON data types have the advantage of enforcing that each + stored value is valid according to the JSON rules. There are also + assorted JSON-specific functions and operators available for data stored + in these data types; see <xref linkend="functions-json"/>. + </para> + + <para> + <productname>PostgreSQL</productname> offers two types for storing JSON + data: <type>json</type> and <type>jsonb</type>. To implement efficient query + mechanisms for these data types, <productname>PostgreSQL</productname> + also provides the <type>jsonpath</type> data type described in + <xref linkend="datatype-jsonpath"/>. + </para> + + <para> + The <type>json</type> and <type>jsonb</type> data types + accept <emphasis>almost</emphasis> identical sets of values as + input. The major practical difference is one of efficiency. The + <type>json</type> data type stores an exact copy of the input text, + which processing functions must reparse on each execution; while + <type>jsonb</type> data is stored in a decomposed binary format that + makes it slightly slower to input due to added conversion + overhead, but significantly faster to process, since no reparsing + is needed. <type>jsonb</type> also supports indexing, which can be a + significant advantage. + </para> + + <para> + Because the <type>json</type> type stores an exact copy of the input text, it + will preserve semantically-insignificant white space between tokens, as + well as the order of keys within JSON objects. Also, if a JSON object + within the value contains the same key more than once, all the key/value + pairs are kept. (The processing functions consider the last value as the + operative one.) By contrast, <type>jsonb</type> does not preserve white + space, does not preserve the order of object keys, and does not keep + duplicate object keys. If duplicate keys are specified in the input, + only the last value is kept. + </para> + + <para> + In general, most applications should prefer to store JSON data as + <type>jsonb</type>, unless there are quite specialized needs, such as + legacy assumptions about ordering of object keys. + </para> + + <para> + <acronym>RFC</acronym> 7159 specifies that JSON strings should be encoded in UTF8. + It is therefore not possible for the JSON + types to conform rigidly to the JSON specification unless the database + encoding is UTF8. Attempts to directly include characters that + cannot be represented in the database encoding will fail; conversely, + characters that can be represented in the database encoding but not + in UTF8 will be allowed. + </para> + + <para> + <acronym>RFC</acronym> 7159 permits JSON strings to contain Unicode escape sequences + denoted by <literal>\u<replaceable>XXXX</replaceable></literal>. In the input + function for the <type>json</type> type, Unicode escapes are allowed + regardless of the database encoding, and are checked only for syntactic + correctness (that is, that four hex digits follow <literal>\u</literal>). + However, the input function for <type>jsonb</type> is stricter: it disallows + Unicode escapes for characters that cannot be represented in the database + encoding. The <type>jsonb</type> type also + rejects <literal>\u0000</literal> (because that cannot be represented in + <productname>PostgreSQL</productname>'s <type>text</type> type), and it insists + that any use of Unicode surrogate pairs to designate characters outside + the Unicode Basic Multilingual Plane be correct. Valid Unicode escapes + are converted to the equivalent single character for storage; + this includes folding surrogate pairs into a single character. + </para> + + <note> + <para> + Many of the JSON processing functions described + in <xref linkend="functions-json"/> will convert Unicode escapes to + regular characters, and will therefore throw the same types of errors + just described even if their input is of type <type>json</type> + not <type>jsonb</type>. The fact that the <type>json</type> input function does + not make these checks may be considered a historical artifact, although + it does allow for simple storage (without processing) of JSON Unicode + escapes in a database encoding that does not support the represented + characters. + </para> + </note> + + <para> + When converting textual JSON input into <type>jsonb</type>, the primitive + types described by <acronym>RFC</acronym> 7159 are effectively mapped onto + native <productname>PostgreSQL</productname> types, as shown + in <xref linkend="json-type-mapping-table"/>. + Therefore, there are some minor additional constraints on what + constitutes valid <type>jsonb</type> data that do not apply to + the <type>json</type> type, nor to JSON in the abstract, corresponding + to limits on what can be represented by the underlying data type. + Notably, <type>jsonb</type> will reject numbers that are outside the + range of the <productname>PostgreSQL</productname> <type>numeric</type> data + type, while <type>json</type> will not. Such implementation-defined + restrictions are permitted by <acronym>RFC</acronym> 7159. However, in + practice such problems are far more likely to occur in other + implementations, as it is common to represent JSON's <type>number</type> + primitive type as IEEE 754 double precision floating point + (which <acronym>RFC</acronym> 7159 explicitly anticipates and allows for). + When using JSON as an interchange format with such systems, the danger + of losing numeric precision compared to data originally stored + by <productname>PostgreSQL</productname> should be considered. + </para> + + <para> + Conversely, as noted in the table there are some minor restrictions on + the input format of JSON primitive types that do not apply to + the corresponding <productname>PostgreSQL</productname> types. + </para> + + <table id="json-type-mapping-table"> + <title>JSON Primitive Types and Corresponding <productname>PostgreSQL</productname> Types</title> + <tgroup cols="3"> + <colspec colname="col1" colwidth="1*"/> + <colspec colname="col2" colwidth="1*"/> + <colspec colname="col3" colwidth="2*"/> + <thead> + <row> + <entry>JSON primitive type</entry> + <entry><productname>PostgreSQL</productname> type</entry> + <entry>Notes</entry> + </row> + </thead> + <tbody> + <row> + <entry><type>string</type></entry> + <entry><type>text</type></entry> + <entry><literal>\u0000</literal> is disallowed, as are Unicode escapes + representing characters not available in the database encoding</entry> + </row> + <row> + <entry><type>number</type></entry> + <entry><type>numeric</type></entry> + <entry><literal>NaN</literal> and <literal>infinity</literal> values are disallowed</entry> + </row> + <row> + <entry><type>boolean</type></entry> + <entry><type>boolean</type></entry> + <entry>Only lowercase <literal>true</literal> and <literal>false</literal> spellings are accepted</entry> + </row> + <row> + <entry><type>null</type></entry> + <entry>(none)</entry> + <entry>SQL <literal>NULL</literal> is a different concept</entry> + </row> + </tbody> + </tgroup> + </table> + + <sect2 id="json-keys-elements"> + <title>JSON Input and Output Syntax</title> + <para> + The input/output syntax for the JSON data types is as specified in + <acronym>RFC</acronym> 7159. + </para> + <para> + The following are all valid <type>json</type> (or <type>jsonb</type>) expressions: +<programlisting> +-- Simple scalar/primitive value +-- Primitive values can be numbers, quoted strings, true, false, or null +SELECT '5'::json; + +-- Array of zero or more elements (elements need not be of same type) +SELECT '[1, 2, "foo", null]'::json; + +-- Object containing pairs of keys and values +-- Note that object keys must always be quoted strings +SELECT '{"bar": "baz", "balance": 7.77, "active": false}'::json; + +-- Arrays and objects can be nested arbitrarily +SELECT '{"foo": [true, "bar"], "tags": {"a": 1, "b": null}}'::json; +</programlisting> + </para> + + <para> + As previously stated, when a JSON value is input and then printed without + any additional processing, <type>json</type> outputs the same text that was + input, while <type>jsonb</type> does not preserve semantically-insignificant + details such as whitespace. For example, note the differences here: +<programlisting> +SELECT '{"bar": "baz", "balance": 7.77, "active":false}'::json; + json +------------------------------------------------- + {"bar": "baz", "balance": 7.77, "active":false} +(1 row) + +SELECT '{"bar": "baz", "balance": 7.77, "active":false}'::jsonb; + jsonb +-------------------------------------------------- + {"bar": "baz", "active": false, "balance": 7.77} +(1 row) +</programlisting> + One semantically-insignificant detail worth noting is that + in <type>jsonb</type>, numbers will be printed according to the behavior of the + underlying <type>numeric</type> type. In practice this means that numbers + entered with <literal>E</literal> notation will be printed without it, for + example: +<programlisting> +SELECT '{"reading": 1.230e-5}'::json, '{"reading": 1.230e-5}'::jsonb; + json | jsonb +-----------------------+------------------------- + {"reading": 1.230e-5} | {"reading": 0.00001230} +(1 row) +</programlisting> + However, <type>jsonb</type> will preserve trailing fractional zeroes, as seen + in this example, even though those are semantically insignificant for + purposes such as equality checks. + </para> + + <para> + For the list of built-in functions and operators available for + constructing and processing JSON values, see <xref linkend="functions-json"/>. + </para> + </sect2> + + <sect2 id="json-doc-design"> + <title>Designing JSON Documents</title> + <para> + Representing data as JSON can be considerably more flexible than + the traditional relational data model, which is compelling in + environments where requirements are fluid. It is quite possible + for both approaches to co-exist and complement each other within + the same application. However, even for applications where maximal + flexibility is desired, it is still recommended that JSON documents + have a somewhat fixed structure. The structure is typically + unenforced (though enforcing some business rules declaratively is + possible), but having a predictable structure makes it easier to write + queries that usefully summarize a set of <quote>documents</quote> (datums) + in a table. + </para> + <para> + JSON data is subject to the same concurrency-control + considerations as any other data type when stored in a table. + Although storing large documents is practicable, keep in mind that + any update acquires a row-level lock on the whole row. + Consider limiting JSON documents to a + manageable size in order to decrease lock contention among updating + transactions. Ideally, JSON documents should each + represent an atomic datum that business rules dictate cannot + reasonably be further subdivided into smaller datums that + could be modified independently. + </para> + </sect2> + + <sect2 id="json-containment"> + <title><type>jsonb</type> Containment and Existence</title> + <indexterm> + <primary>jsonb</primary> + <secondary>containment</secondary> + </indexterm> + <indexterm> + <primary>jsonb</primary> + <secondary>existence</secondary> + </indexterm> + <para> + Testing <firstterm>containment</firstterm> is an important capability of + <type>jsonb</type>. There is no parallel set of facilities for the + <type>json</type> type. Containment tests whether + one <type>jsonb</type> document has contained within it another one. + These examples return true except as noted: + </para> +<programlisting> +-- Simple scalar/primitive values contain only the identical value: +SELECT '"foo"'::jsonb @> '"foo"'::jsonb; + +-- The array on the right side is contained within the one on the left: +SELECT '[1, 2, 3]'::jsonb @> '[1, 3]'::jsonb; + +-- Order of array elements is not significant, so this is also true: +SELECT '[1, 2, 3]'::jsonb @> '[3, 1]'::jsonb; + +-- Duplicate array elements don't matter either: +SELECT '[1, 2, 3]'::jsonb @> '[1, 2, 2]'::jsonb; + +-- The object with a single pair on the right side is contained +-- within the object on the left side: +SELECT '{"product": "PostgreSQL", "version": 9.4, "jsonb": true}'::jsonb @> '{"version": 9.4}'::jsonb; + +-- The array on the right side is <emphasis>not</emphasis> considered contained within the +-- array on the left, even though a similar array is nested within it: +SELECT '[1, 2, [1, 3]]'::jsonb @> '[1, 3]'::jsonb; -- yields false + +-- But with a layer of nesting, it is contained: +SELECT '[1, 2, [1, 3]]'::jsonb @> '[[1, 3]]'::jsonb; + +-- Similarly, containment is not reported here: +SELECT '{"foo": {"bar": "baz"}}'::jsonb @> '{"bar": "baz"}'::jsonb; -- yields false + +-- A top-level key and an empty object is contained: +SELECT '{"foo": {"bar": "baz"}}'::jsonb @> '{"foo": {}}'::jsonb; +</programlisting> + + <para> + The general principle is that the contained object must match the + containing object as to structure and data contents, possibly after + discarding some non-matching array elements or object key/value pairs + from the containing object. + But remember that the order of array elements is not significant when + doing a containment match, and duplicate array elements are effectively + considered only once. + </para> + + <para> + As a special exception to the general principle that the structures + must match, an array may contain a primitive value: + </para> +<programlisting> +-- This array contains the primitive string value: +SELECT '["foo", "bar"]'::jsonb @> '"bar"'::jsonb; + +-- This exception is not reciprocal -- non-containment is reported here: +SELECT '"bar"'::jsonb @> '["bar"]'::jsonb; -- yields false +</programlisting> + + <para> + <type>jsonb</type> also has an <firstterm>existence</firstterm> operator, which is + a variation on the theme of containment: it tests whether a string + (given as a <type>text</type> value) appears as an object key or array + element at the top level of the <type>jsonb</type> value. + These examples return true except as noted: + </para> +<programlisting> +-- String exists as array element: +SELECT '["foo", "bar", "baz"]'::jsonb ? 'bar'; + +-- String exists as object key: +SELECT '{"foo": "bar"}'::jsonb ? 'foo'; + +-- Object values are not considered: +SELECT '{"foo": "bar"}'::jsonb ? 'bar'; -- yields false + +-- As with containment, existence must match at the top level: +SELECT '{"foo": {"bar": "baz"}}'::jsonb ? 'bar'; -- yields false + +-- A string is considered to exist if it matches a primitive JSON string: +SELECT '"foo"'::jsonb ? 'foo'; +</programlisting> + + <para> + JSON objects are better suited than arrays for testing containment or + existence when there are many keys or elements involved, because + unlike arrays they are internally optimized for searching, and do not + need to be searched linearly. + </para> + + <tip> + <para> + Because JSON containment is nested, an appropriate query can skip + explicit selection of sub-objects. As an example, suppose that we have + a <structfield>doc</structfield> column containing objects at the top level, with + most objects containing <literal>tags</literal> fields that contain arrays of + sub-objects. This query finds entries in which sub-objects containing + both <literal>"term":"paris"</literal> and <literal>"term":"food"</literal> appear, + while ignoring any such keys outside the <literal>tags</literal> array: +<programlisting> +SELECT doc->'site_name' FROM websites + WHERE doc @> '{"tags":[{"term":"paris"}, {"term":"food"}]}'; +</programlisting> + One could accomplish the same thing with, say, +<programlisting> +SELECT doc->'site_name' FROM websites + WHERE doc->'tags' @> '[{"term":"paris"}, {"term":"food"}]'; +</programlisting> + but that approach is less flexible, and often less efficient as well. + </para> + + <para> + On the other hand, the JSON existence operator is not nested: it will + only look for the specified key or array element at top level of the + JSON value. + </para> + </tip> + + <para> + The various containment and existence operators, along with all other + JSON operators and functions are documented + in <xref linkend="functions-json"/>. + </para> + </sect2> + + <sect2 id="json-indexing"> + <title><type>jsonb</type> Indexing</title> + <indexterm> + <primary>jsonb</primary> + <secondary>indexes on</secondary> + </indexterm> + + <para> + GIN indexes can be used to efficiently search for + keys or key/value pairs occurring within a large number of + <type>jsonb</type> documents (datums). + Two GIN <quote>operator classes</quote> are provided, offering different + performance and flexibility trade-offs. + </para> + <para> + The default GIN operator class for <type>jsonb</type> supports queries with + the key-exists operators <literal>?</literal>, <literal>?|</literal> + and <literal>?&</literal>, the containment operator + <literal>@></literal>, and the <type>jsonpath</type> match + operators <literal>@?</literal> and <literal>@@</literal>. + (For details of the semantics that these operators + implement, see <xref linkend="functions-jsonb-op-table"/>.) + An example of creating an index with this operator class is: +<programlisting> +CREATE INDEX idxgin ON api USING GIN (jdoc); +</programlisting> + The non-default GIN operator class <literal>jsonb_path_ops</literal> + does not support the key-exists operators, but it does support + <literal>@></literal>, <literal>@?</literal> and <literal>@@</literal>. + An example of creating an index with this operator class is: +<programlisting> +CREATE INDEX idxginp ON api USING GIN (jdoc jsonb_path_ops); +</programlisting> + </para> + + <para> + Consider the example of a table that stores JSON documents + retrieved from a third-party web service, with a documented schema + definition. A typical document is: +<programlisting> +{ + "guid": "9c36adc1-7fb5-4d5b-83b4-90356a46061a", + "name": "Angela Barton", + "is_active": true, + "company": "Magnafone", + "address": "178 Howard Place, Gulf, Washington, 702", + "registered": "2009-11-07T08:53:22 +08:00", + "latitude": 19.793713, + "longitude": 86.513373, + "tags": [ + "enim", + "aliquip", + "qui" + ] +} +</programlisting> + We store these documents in a table named <structname>api</structname>, + in a <type>jsonb</type> column named <structfield>jdoc</structfield>. + If a GIN index is created on this column, + queries like the following can make use of the index: +<programlisting> +-- Find documents in which the key "company" has value "Magnafone" +SELECT jdoc->'guid', jdoc->'name' FROM api WHERE jdoc @> '{"company": "Magnafone"}'; +</programlisting> + However, the index could not be used for queries like the + following, because though the operator <literal>?</literal> is indexable, + it is not applied directly to the indexed column <structfield>jdoc</structfield>: +<programlisting> +-- Find documents in which the key "tags" contains key or array element "qui" +SELECT jdoc->'guid', jdoc->'name' FROM api WHERE jdoc -> 'tags' ? 'qui'; +</programlisting> + Still, with appropriate use of expression indexes, the above + query can use an index. If querying for particular items within + the <literal>"tags"</literal> key is common, defining an index like this + may be worthwhile: +<programlisting> +CREATE INDEX idxgintags ON api USING GIN ((jdoc -> 'tags')); +</programlisting> + Now, the <literal>WHERE</literal> clause <literal>jdoc -> 'tags' ? 'qui'</literal> + will be recognized as an application of the indexable + operator <literal>?</literal> to the indexed + expression <literal>jdoc -> 'tags'</literal>. + (More information on expression indexes can be found in <xref + linkend="indexes-expressional"/>.) + </para> + + <para> + Another approach to querying is to exploit containment, for example: +<programlisting> +-- Find documents in which the key "tags" contains array element "qui" +SELECT jdoc->'guid', jdoc->'name' FROM api WHERE jdoc @> '{"tags": ["qui"]}'; +</programlisting> + A simple GIN index on the <structfield>jdoc</structfield> column can support this + query. But note that such an index will store copies of every key and + value in the <structfield>jdoc</structfield> column, whereas the expression index + of the previous example stores only data found under + the <literal>tags</literal> key. While the simple-index approach is far more + flexible (since it supports queries about any key), targeted expression + indexes are likely to be smaller and faster to search than a simple + index. + </para> + + <para> + GIN indexes also support the <literal>@?</literal> + and <literal>@@</literal> operators, which + perform <type>jsonpath</type> matching. Examples are +<programlisting> +SELECT jdoc->'guid', jdoc->'name' FROM api WHERE jdoc @? '$.tags[*] ? (@ == "qui")'; +</programlisting> +<programlisting> +SELECT jdoc->'guid', jdoc->'name' FROM api WHERE jdoc @@ '$.tags[*] == "qui"'; +</programlisting> + For these operators, a GIN index extracts clauses of the form + <literal><replaceable>accessors_chain</replaceable> + = <replaceable>constant</replaceable></literal> out of + the <type>jsonpath</type> pattern, and does the index search based on + the keys and values mentioned in these clauses. The accessors chain + may include <literal>.<replaceable>key</replaceable></literal>, + <literal>[*]</literal>, + and <literal>[<replaceable>index</replaceable>]</literal> accessors. + The <literal>jsonb_ops</literal> operator class also + supports <literal>.*</literal> and <literal>.**</literal> accessors, + but the <literal>jsonb_path_ops</literal> operator class does not. + </para> + + <para> + Although the <literal>jsonb_path_ops</literal> operator class supports + only queries with the <literal>@></literal>, <literal>@?</literal> + and <literal>@@</literal> operators, it has notable + performance advantages over the default operator + class <literal>jsonb_ops</literal>. A <literal>jsonb_path_ops</literal> + index is usually much smaller than a <literal>jsonb_ops</literal> + index over the same data, and the specificity of searches is better, + particularly when queries contain keys that appear frequently in the + data. Therefore search operations typically perform better + than with the default operator class. + </para> + + <para> + The technical difference between a <literal>jsonb_ops</literal> + and a <literal>jsonb_path_ops</literal> GIN index is that the former + creates independent index items for each key and value in the data, + while the latter creates index items only for each value in the + data. + <footnote> + <para> + For this purpose, the term <quote>value</quote> includes array elements, + though JSON terminology sometimes considers array elements distinct + from values within objects. + </para> + </footnote> + Basically, each <literal>jsonb_path_ops</literal> index item is + a hash of the value and the key(s) leading to it; for example to index + <literal>{"foo": {"bar": "baz"}}</literal>, a single index item would + be created incorporating all three of <literal>foo</literal>, <literal>bar</literal>, + and <literal>baz</literal> into the hash value. Thus a containment query + looking for this structure would result in an extremely specific index + search; but there is no way at all to find out whether <literal>foo</literal> + appears as a key. On the other hand, a <literal>jsonb_ops</literal> + index would create three index items representing <literal>foo</literal>, + <literal>bar</literal>, and <literal>baz</literal> separately; then to do the + containment query, it would look for rows containing all three of + these items. While GIN indexes can perform such an AND search fairly + efficiently, it will still be less specific and slower than the + equivalent <literal>jsonb_path_ops</literal> search, especially if + there are a very large number of rows containing any single one of the + three index items. + </para> + + <para> + A disadvantage of the <literal>jsonb_path_ops</literal> approach is + that it produces no index entries for JSON structures not containing + any values, such as <literal>{"a": {}}</literal>. If a search for + documents containing such a structure is requested, it will require a + full-index scan, which is quite slow. <literal>jsonb_path_ops</literal> is + therefore ill-suited for applications that often perform such searches. + </para> + + <para> + <type>jsonb</type> also supports <literal>btree</literal> and <literal>hash</literal> + indexes. These are usually useful only if it's important to check + equality of complete JSON documents. + The <literal>btree</literal> ordering for <type>jsonb</type> datums is seldom + of great interest, but for completeness it is: +<synopsis> +<replaceable>Object</replaceable> > <replaceable>Array</replaceable> > <replaceable>Boolean</replaceable> > <replaceable>Number</replaceable> > <replaceable>String</replaceable> > <replaceable>Null</replaceable> + +<replaceable>Object with n pairs</replaceable> > <replaceable>object with n - 1 pairs</replaceable> + +<replaceable>Array with n elements</replaceable> > <replaceable>array with n - 1 elements</replaceable> +</synopsis> + Objects with equal numbers of pairs are compared in the order: +<synopsis> +<replaceable>key-1</replaceable>, <replaceable>value-1</replaceable>, <replaceable>key-2</replaceable> ... +</synopsis> + Note that object keys are compared in their storage order; + in particular, since shorter keys are stored before longer keys, this + can lead to results that might be unintuitive, such as: +<programlisting> +{ "aa": 1, "c": 1} > {"b": 1, "d": 1} +</programlisting> + Similarly, arrays with equal numbers of elements are compared in the + order: +<synopsis> +<replaceable>element-1</replaceable>, <replaceable>element-2</replaceable> ... +</synopsis> + Primitive JSON values are compared using the same + comparison rules as for the underlying + <productname>PostgreSQL</productname> data type. Strings are + compared using the default database collation. + </para> + </sect2> + + <sect2 id="jsonb-subscripting"> + <title><type>jsonb</type> Subscripting</title> + <para> + The <type>jsonb</type> data type supports array-style subscripting expressions + to extract and modify elements. Nested values can be indicated by chaining + subscripting expressions, following the same rules as the <literal>path</literal> + argument in the <literal>jsonb_set</literal> function. If a <type>jsonb</type> + value is an array, numeric subscripts start at zero, and negative integers count + backwards from the last element of the array. Slice expressions are not supported. + The result of a subscripting expression is always of the jsonb data type. + </para> + + <para> + <command>UPDATE</command> statements may use subscripting in the + <literal>SET</literal> clause to modify <type>jsonb</type> values. Subscript + paths must be traversable for all affected values insofar as they exist. For + instance, the path <literal>val['a']['b']['c']</literal> can be traversed all + the way to <literal>c</literal> if every <literal>val</literal>, + <literal>val['a']</literal>, and <literal>val['a']['b']</literal> is an + object. If any <literal>val['a']</literal> or <literal>val['a']['b']</literal> + is not defined, it will be created as an empty object and filled as + necessary. However, if any <literal>val</literal> itself or one of the + intermediary values is defined as a non-object such as a string, number, or + <literal>jsonb</literal> <literal>null</literal>, traversal cannot proceed so + an error is raised and the transaction aborted. + </para> + + <para> + An example of subscripting syntax: + +<programlisting> + +-- Extract object value by key +SELECT ('{"a": 1}'::jsonb)['a']; + +-- Extract nested object value by key path +SELECT ('{"a": {"b": {"c": 1}}}'::jsonb)['a']['b']['c']; + +-- Extract array element by index +SELECT ('[1, "2", null]'::jsonb)[1]; + +-- Update object value by key. Note the quotes around '1': the assigned +-- value must be of the jsonb type as well +UPDATE table_name SET jsonb_field['key'] = '1'; + +-- This will raise an error if any record's jsonb_field['a']['b'] is something +-- other than an object. For example, the value {"a": 1} has a numeric value +-- of the key 'a'. +UPDATE table_name SET jsonb_field['a']['b']['c'] = '1'; + +-- Filter records using a WHERE clause with subscripting. Since the result of +-- subscripting is jsonb, the value we compare it against must also be jsonb. +-- The double quotes make "value" also a valid jsonb string. +SELECT * FROM table_name WHERE jsonb_field['key'] = '"value"'; +</programlisting> + + <type>jsonb</type> assignment via subscripting handles a few edge cases + differently from <literal>jsonb_set</literal>. When a source <type>jsonb</type> + value is <literal>NULL</literal>, assignment via subscripting will proceed + as if it was an empty JSON value of the type (object or array) implied by the + subscript key: + +<programlisting> +-- Where jsonb_field was NULL, it is now {"a": 1} +UPDATE table_name SET jsonb_field['a'] = '1'; + +-- Where jsonb_field was NULL, it is now [1] +UPDATE table_name SET jsonb_field[0] = '1'; +</programlisting> + + If an index is specified for an array containing too few elements, + <literal>NULL</literal> elements will be appended until the index is reachable + and the value can be set. + +<programlisting> +-- Where jsonb_field was [], it is now [null, null, 2]; +-- where jsonb_field was [0], it is now [0, null, 2] +UPDATE table_name SET jsonb_field[2] = '2'; +</programlisting> + + A <type>jsonb</type> value will accept assignments to nonexistent subscript + paths as long as the last existing element to be traversed is an object or + array, as implied by the corresponding subscript (the element indicated by + the last subscript in the path is not traversed and may be anything). Nested + array and object structures will be created, and in the former case + <literal>null</literal>-padded, as specified by the subscript path until the + assigned value can be placed. + +<programlisting> +-- Where jsonb_field was {}, it is now {"a": [{"b": 1}]} +UPDATE table_name SET jsonb_field['a'][0]['b'] = '1'; + +-- Where jsonb_field was [], it is now [null, {"a": 1}] +UPDATE table_name SET jsonb_field[1]['a'] = '1'; +</programlisting> + + </para> + </sect2> + + <sect2> + <title>Transforms</title> + + <para> + Additional extensions are available that implement transforms for the + <type>jsonb</type> type for different procedural languages. + </para> + + <para> + The extensions for PL/Perl are called <literal>jsonb_plperl</literal> and + <literal>jsonb_plperlu</literal>. If you use them, <type>jsonb</type> + values are mapped to Perl arrays, hashes, and scalars, as appropriate. + </para> + + <para> + The extensions for PL/Python are called <literal>jsonb_plpythonu</literal>, + <literal>jsonb_plpython2u</literal>, and + <literal>jsonb_plpython3u</literal> (see <xref + linkend="plpython-python23"/> for the PL/Python naming convention). If you + use them, <type>jsonb</type> values are mapped to Python dictionaries, + lists, and scalars, as appropriate. + </para> + + <para> + Of these extensions, <literal>jsonb_plperl</literal> is + considered <quote>trusted</quote>, that is, it can be installed by + non-superusers who have <literal>CREATE</literal> privilege on the + current database. The rest require superuser privilege to install. + </para> + </sect2> + + <sect2 id="datatype-jsonpath"> + <title>jsonpath Type</title> + + <indexterm zone="datatype-jsonpath"> + <primary>jsonpath</primary> + </indexterm> + + <para> + The <type>jsonpath</type> type implements support for the SQL/JSON path language + in <productname>PostgreSQL</productname> to efficiently query JSON data. + It provides a binary representation of the parsed SQL/JSON path + expression that specifies the items to be retrieved by the path + engine from the JSON data for further processing with the + SQL/JSON query functions. + </para> + + <para> + The semantics of SQL/JSON path predicates and operators generally follow SQL. + At the same time, to provide a natural way of working with JSON data, + SQL/JSON path syntax uses some JavaScript conventions: + </para> + + <itemizedlist> + <listitem> + <para> + Dot (<literal>.</literal>) is used for member access. + </para> + </listitem> + <listitem> + <para> + Square brackets (<literal>[]</literal>) are used for array access. + </para> + </listitem> + <listitem> + <para> + SQL/JSON arrays are 0-relative, unlike regular SQL arrays that start from 1. + </para> + </listitem> + </itemizedlist> + + <para> + An SQL/JSON path expression is typically written in an SQL query as an + SQL character string literal, so it must be enclosed in single quotes, + and any single quotes desired within the value must be doubled + (see <xref linkend="sql-syntax-strings"/>). + Some forms of path expressions require string literals within them. + These embedded string literals follow JavaScript/ECMAScript conventions: + they must be surrounded by double quotes, and backslash escapes may be + used within them to represent otherwise-hard-to-type characters. + In particular, the way to write a double quote within an embedded string + literal is <literal>\"</literal>, and to write a backslash itself, you + must write <literal>\\</literal>. Other special backslash sequences + include those recognized in JSON strings: + <literal>\b</literal>, + <literal>\f</literal>, + <literal>\n</literal>, + <literal>\r</literal>, + <literal>\t</literal>, + <literal>\v</literal> + for various ASCII control characters, and + <literal>\u<replaceable>NNNN</replaceable></literal> for a Unicode + character identified by its 4-hex-digit code point. The backslash + syntax also includes two cases not allowed by JSON: + <literal>\x<replaceable>NN</replaceable></literal> for a character code + written with only two hex digits, and + <literal>\u{<replaceable>N...</replaceable>}</literal> for a character + code written with 1 to 6 hex digits. + </para> + + <para> + A path expression consists of a sequence of path elements, + which can be any of the following: + <itemizedlist> + <listitem> + <para> + Path literals of JSON primitive types: + Unicode text, numeric, true, false, or null. + </para> + </listitem> + <listitem> + <para> + Path variables listed in <xref linkend="type-jsonpath-variables"/>. + </para> + </listitem> + <listitem> + <para> + Accessor operators listed in <xref linkend="type-jsonpath-accessors"/>. + </para> + </listitem> + <listitem> + <para> + <type>jsonpath</type> operators and methods listed + in <xref linkend="functions-sqljson-path-operators"/>. + </para> + </listitem> + <listitem> + <para> + Parentheses, which can be used to provide filter expressions + or define the order of path evaluation. + </para> + </listitem> + </itemizedlist> + </para> + + <para> + For details on using <type>jsonpath</type> expressions with SQL/JSON + query functions, see <xref linkend="functions-sqljson-path"/>. + </para> + + <table id="type-jsonpath-variables"> + <title><type>jsonpath</type> Variables</title> + <tgroup cols="2"> + <colspec colname="col1" colwidth="1*"/> + <colspec colname="col2" colwidth="2*"/> + <thead> + <row> + <entry>Variable</entry> + <entry>Description</entry> + </row> + </thead> + <tbody> + <row> + <entry><literal>$</literal></entry> + <entry>A variable representing the JSON value being queried + (the <firstterm>context item</firstterm>). + </entry> + </row> + <row> + <entry><literal>$varname</literal></entry> + <entry> + A named variable. Its value can be set by the parameter + <parameter>vars</parameter> of several JSON processing functions; + see <xref linkend="functions-json-processing-table"/> for details. + <!-- TODO: describe PASSING clause once implemented !--> + </entry> + </row> + <row> + <entry><literal>@</literal></entry> + <entry>A variable representing the result of path evaluation + in filter expressions. + </entry> + </row> + </tbody> + </tgroup> + </table> + + <table id="type-jsonpath-accessors"> + <title><type>jsonpath</type> Accessors</title> + <tgroup cols="2"> + <colspec colname="col1" colwidth="1*"/> + <colspec colname="col2" colwidth="2*"/> + <thead> + <row> + <entry>Accessor Operator</entry> + <entry>Description</entry> + </row> + </thead> + <tbody> + <row> + <entry> + <para> + <literal>.<replaceable>key</replaceable></literal> + </para> + <para> + <literal>."$<replaceable>varname</replaceable>"</literal> + </para> + </entry> + <entry> + <para> + Member accessor that returns an object member with + the specified key. If the key name matches some named variable + starting with <literal>$</literal> or does not meet the + JavaScript rules for an identifier, it must be enclosed in + double quotes to make it a string literal. + </para> + </entry> + </row> + <row> + <entry> + <para> + <literal>.*</literal> + </para> + </entry> + <entry> + <para> + Wildcard member accessor that returns the values of all + members located at the top level of the current object. + </para> + </entry> + </row> + <row> + <entry> + <para> + <literal>.**</literal> + </para> + </entry> + <entry> + <para> + Recursive wildcard member accessor that processes all levels + of the JSON hierarchy of the current object and returns all + the member values, regardless of their nesting level. This + is a <productname>PostgreSQL</productname> extension of + the SQL/JSON standard. + </para> + </entry> + </row> + <row> + <entry> + <para> + <literal>.**{<replaceable>level</replaceable>}</literal> + </para> + <para> + <literal>.**{<replaceable>start_level</replaceable> to + <replaceable>end_level</replaceable>}</literal> + </para> + </entry> + <entry> + <para> + Like <literal>.**</literal>, but selects only the specified + levels of the JSON hierarchy. Nesting levels are specified as integers. + Level zero corresponds to the current object. To access the lowest + nesting level, you can use the <literal>last</literal> keyword. + This is a <productname>PostgreSQL</productname> extension of + the SQL/JSON standard. + </para> + </entry> + </row> + <row> + <entry> + <para> + <literal>[<replaceable>subscript</replaceable>, ...]</literal> + </para> + </entry> + <entry> + <para> + Array element accessor. + <literal><replaceable>subscript</replaceable></literal> can be + given in two forms: <literal><replaceable>index</replaceable></literal> + or <literal><replaceable>start_index</replaceable> to <replaceable>end_index</replaceable></literal>. + The first form returns a single array element by its index. The second + form returns an array slice by the range of indexes, including the + elements that correspond to the provided + <replaceable>start_index</replaceable> and <replaceable>end_index</replaceable>. + </para> + <para> + The specified <replaceable>index</replaceable> can be an integer, as + well as an expression returning a single numeric value, which is + automatically cast to integer. Index zero corresponds to the first + array element. You can also use the <literal>last</literal> keyword + to denote the last array element, which is useful for handling arrays + of unknown length. + </para> + </entry> + </row> + <row> + <entry> + <para> + <literal>[*]</literal> + </para> + </entry> + <entry> + <para> + Wildcard array element accessor that returns all array elements. + </para> + </entry> + </row> + </tbody> + </tgroup> + </table> + + </sect2> +</sect1> |