diff options
Diffstat (limited to '')
-rw-r--r-- | docs/basic/adapt.rst | 522 |
1 files changed, 522 insertions, 0 deletions
diff --git a/docs/basic/adapt.rst b/docs/basic/adapt.rst new file mode 100644 index 0000000..1538327 --- /dev/null +++ b/docs/basic/adapt.rst @@ -0,0 +1,522 @@ +.. currentmodule:: psycopg + +.. index:: + single: Adaptation + pair: Objects; Adaptation + single: Data types; Adaptation + +.. _types-adaptation: + +Adapting basic Python types +=========================== + +Many standard Python types are adapted into SQL and returned as Python +objects when a query is executed. + +Converting the following data types between Python and PostgreSQL works +out-of-the-box and doesn't require any configuration. In case you need to +customise the conversion you should take a look at :ref:`adaptation`. + + +.. index:: + pair: Boolean; Adaptation + +.. _adapt-bool: + +Booleans adaptation +------------------- + +Python `bool` values `!True` and `!False` are converted to the equivalent +`PostgreSQL boolean type`__:: + + >>> cur.execute("SELECT %s, %s", (True, False)) + # equivalent to "SELECT true, false" + +.. __: https://www.postgresql.org/docs/current/datatype-boolean.html + + +.. index:: + single: Adaptation; numbers + single: Integer; Adaptation + single: Float; Adaptation + single: Decimal; Adaptation + +.. _adapt-numbers: + +Numbers adaptation +------------------ + +.. seealso:: + + - `PostgreSQL numeric types + <https://www.postgresql.org/docs/current/static/datatype-numeric.html>`__ + +- Python `int` values can be converted to PostgreSQL :sql:`smallint`, + :sql:`integer`, :sql:`bigint`, or :sql:`numeric`, according to their numeric + value. Psycopg will choose the smallest data type available, because + PostgreSQL can automatically cast a type up (e.g. passing a `smallint` where + PostgreSQL expect an `integer` is gladly accepted) but will not cast down + automatically (e.g. if a function has an :sql:`integer` argument, passing it + a :sql:`bigint` value will fail, even if the value is 1). + +- Python `float` values are converted to PostgreSQL :sql:`float8`. + +- Python `~decimal.Decimal` values are converted to PostgreSQL :sql:`numeric`. + +On the way back, smaller types (:sql:`int2`, :sql:`int4`, :sql:`float4`) are +promoted to the larger Python counterpart. + +.. note:: + + Sometimes you may prefer to receive :sql:`numeric` data as `!float` + instead, for performance reason or ease of manipulation: you can configure + an adapter to :ref:`cast PostgreSQL numeric to Python float + <adapt-example-float>`. This of course may imply a loss of precision. + + +.. index:: + pair: Strings; Adaptation + single: Unicode; Adaptation + pair: Encoding; SQL_ASCII + +.. _adapt-string: + +Strings adaptation +------------------ + +.. seealso:: + + - `PostgreSQL character types + <https://www.postgresql.org/docs/current/datatype-character.html>`__ + +Python `str` are converted to PostgreSQL string syntax, and PostgreSQL types +such as :sql:`text` and :sql:`varchar` are converted back to Python `!str`: + +.. code:: python + + conn = psycopg.connect() + conn.execute( + "INSERT INTO menu (id, entry) VALUES (%s, %s)", + (1, "Crème Brûlée at 4.99€")) + conn.execute("SELECT entry FROM menu WHERE id = 1").fetchone()[0] + 'Crème Brûlée at 4.99€' + +PostgreSQL databases `have an encoding`__, and `the session has an encoding`__ +too, exposed in the `!Connection.info.`\ `~ConnectionInfo.encoding` +attribute. If your database and connection are in UTF-8 encoding you will +likely have no problem, otherwise you will have to make sure that your +application only deals with the non-ASCII chars that the database can handle; +failing to do so may result in encoding/decoding errors: + +.. __: https://www.postgresql.org/docs/current/sql-createdatabase.html +.. __: https://www.postgresql.org/docs/current/multibyte.html + +.. code:: python + + # The encoding is set at connection time according to the db configuration + conn.info.encoding + 'utf-8' + + # The Latin-9 encoding can manage some European accented letters + # and the Euro symbol + conn.execute("SET client_encoding TO LATIN9") + conn.execute("SELECT entry FROM menu WHERE id = 1").fetchone()[0] + 'Crème Brûlée at 4.99€' + + # The Latin-1 encoding doesn't have a representation for the Euro symbol + conn.execute("SET client_encoding TO LATIN1") + conn.execute("SELECT entry FROM menu WHERE id = 1").fetchone()[0] + # Traceback (most recent call last) + # ... + # UntranslatableCharacter: character with byte sequence 0xe2 0x82 0xac + # in encoding "UTF8" has no equivalent in encoding "LATIN1" + +In rare cases you may have strings with unexpected encodings in the database. +Using the ``SQL_ASCII`` client encoding will disable decoding of the data +coming from the database, which will be returned as `bytes`: + +.. code:: python + + conn.execute("SET client_encoding TO SQL_ASCII") + conn.execute("SELECT entry FROM menu WHERE id = 1").fetchone()[0] + b'Cr\xc3\xa8me Br\xc3\xbbl\xc3\xa9e at 4.99\xe2\x82\xac' + +Alternatively you can cast the unknown encoding data to :sql:`bytea` to +retrieve it as bytes, leaving other strings unaltered: see :ref:`adapt-binary` + +Note that PostgreSQL text cannot contain the ``0x00`` byte. If you need to +store Python strings that may contain binary zeros you should use a +:sql:`bytea` field. + + +.. index:: + single: bytea; Adaptation + single: bytes; Adaptation + single: bytearray; Adaptation + single: memoryview; Adaptation + single: Binary string + +.. _adapt-binary: + +Binary adaptation +----------------- + +Python types representing binary objects (`bytes`, `bytearray`, `memoryview`) +are converted by default to :sql:`bytea` fields. By default data received is +returned as `!bytes`. + +If you are storing large binary data in bytea fields (such as binary documents +or images) you should probably use the binary format to pass and return +values, otherwise binary data will undergo `ASCII escaping`__, taking some CPU +time and more bandwidth. See :ref:`binary-data` for details. + +.. __: https://www.postgresql.org/docs/current/datatype-binary.html + + +.. _adapt-date: + +Date/time types adaptation +-------------------------- + +.. seealso:: + + - `PostgreSQL date/time types + <https://www.postgresql.org/docs/current/datatype-datetime.html>`__ + +- Python `~datetime.date` objects are converted to PostgreSQL :sql:`date`. +- Python `~datetime.datetime` objects are converted to PostgreSQL + :sql:`timestamp` (if they don't have a `!tzinfo` set) or :sql:`timestamptz` + (if they do). +- Python `~datetime.time` objects are converted to PostgreSQL :sql:`time` + (if they don't have a `!tzinfo` set) or :sql:`timetz` (if they do). +- Python `~datetime.timedelta` objects are converted to PostgreSQL + :sql:`interval`. + +PostgreSQL :sql:`timestamptz` values are returned with a timezone set to the +`connection TimeZone setting`__, which is available as a Python +`~zoneinfo.ZoneInfo` object in the `!Connection.info`.\ `~ConnectionInfo.timezone` +attribute:: + + >>> conn.info.timezone + zoneinfo.ZoneInfo(key='Europe/London') + + >>> conn.execute("select '2048-07-08 12:00'::timestamptz").fetchone()[0] + datetime.datetime(2048, 7, 8, 12, 0, tzinfo=zoneinfo.ZoneInfo(key='Europe/London')) + +.. note:: + PostgreSQL :sql:`timestamptz` doesn't store "a timestamp with a timezone + attached": it stores a timestamp always in UTC, which is converted, on + output, to the connection TimeZone setting:: + + >>> conn.execute("SET TIMEZONE to 'Europe/Rome'") # UTC+2 in summer + + >>> conn.execute("SELECT '2042-07-01 12:00Z'::timestamptz").fetchone()[0] # UTC input + datetime.datetime(2042, 7, 1, 14, 0, tzinfo=zoneinfo.ZoneInfo(key='Europe/Rome')) + + Check out the `PostgreSQL documentation about timezones`__ for all the + details. + + .. __: https://www.postgresql.org/docs/current/datatype-datetime.html + #DATATYPE-TIMEZONES + +.. __: https://www.postgresql.org/docs/current/runtime-config-client.html#GUC-TIMEZONE + + +.. _adapt-json: + +JSON adaptation +--------------- + +Psycopg can map between Python objects and PostgreSQL `json/jsonb +types`__, allowing to customise the load and dump function used. + +.. __: https://www.postgresql.org/docs/current/datatype-json.html + +Because several Python objects could be considered JSON (dicts, lists, +scalars, even date/time if using a dumps function customised to use them), +Psycopg requires you to wrap the object to dump as JSON into a wrapper: +either `psycopg.types.json.Json` or `~psycopg.types.json.Jsonb`. + +.. code:: python + + from psycopg.types.json import Jsonb + + thing = {"foo": ["bar", 42]} + conn.execute("INSERT INTO mytable VALUES (%s)", [Jsonb(thing)]) + +By default Psycopg uses the standard library `json.dumps` and `json.loads` +functions to serialize and de-serialize Python objects to JSON. If you want to +customise how serialization happens, for instance changing serialization +parameters or using a different JSON library, you can specify your own +functions using the `psycopg.types.json.set_json_dumps()` and +`~psycopg.types.json.set_json_loads()` functions, to apply either globally or +to a specific context (connection or cursor). + +.. code:: python + + from functools import partial + from psycopg.types.json import Jsonb, set_json_dumps, set_json_loads + import ujson + + # Use a faster dump function + set_json_dumps(ujson.dumps) + + # Return floating point values as Decimal, just in one connection + set_json_loads(partial(json.loads, parse_float=Decimal), conn) + + conn.execute("SELECT %s", [Jsonb({"value": 123.45})]).fetchone()[0] + # {'value': Decimal('123.45')} + +If you need an even more specific dump customisation only for certain objects +(including different configurations in the same query) you can specify a +`!dumps` parameter in the +`~psycopg.types.json.Json`/`~psycopg.types.json.Jsonb` wrapper, which will +take precedence over what is specified by `!set_json_dumps()`. + +.. code:: python + + from uuid import UUID, uuid4 + + class UUIDEncoder(json.JSONEncoder): + """A JSON encoder which can dump UUID.""" + def default(self, obj): + if isinstance(obj, UUID): + return str(obj) + return json.JSONEncoder.default(self, obj) + + uuid_dumps = partial(json.dumps, cls=UUIDEncoder) + obj = {"uuid": uuid4()} + cnn.execute("INSERT INTO objs VALUES %s", [Json(obj, dumps=uuid_dumps)]) + # will insert: {'uuid': '0a40799d-3980-4c65-8315-2956b18ab0e1'} + + +.. _adapt-list: + +Lists adaptation +---------------- + +Python `list` objects are adapted to `PostgreSQL arrays`__ and back. Only +lists containing objects of the same type can be dumped to PostgreSQL (but the +list may contain `!None` elements). + +.. __: https://www.postgresql.org/docs/current/arrays.html + +.. note:: + + If you have a list of values which you want to use with the :sql:`IN` + operator... don't. It won't work (neither with a list nor with a tuple):: + + >>> conn.execute("SELECT * FROM mytable WHERE id IN %s", [[10,20,30]]) + Traceback (most recent call last): + File "<stdin>", line 1, in <module> + psycopg.errors.SyntaxError: syntax error at or near "$1" + LINE 1: SELECT * FROM mytable WHERE id IN $1 + ^ + + What you want to do instead is to use the `'= ANY()' expression`__ and pass + the values as a list (not a tuple). + + >>> conn.execute("SELECT * FROM mytable WHERE id = ANY(%s)", [[10,20,30]]) + + This has also the advantage of working with an empty list, whereas ``IN + ()`` is not valid SQL. + + .. __: https://www.postgresql.org/docs/current/functions-comparisons.html + #id-1.5.8.30.16 + + +.. _adapt-uuid: + +UUID adaptation +--------------- + +Python `uuid.UUID` objects are adapted to PostgreSQL `UUID type`__ and back:: + + >>> conn.execute("select gen_random_uuid()").fetchone()[0] + UUID('97f0dd62-3bd2-459e-89b8-a5e36ea3c16c') + + >>> from uuid import uuid4 + >>> conn.execute("select gen_random_uuid() = %s", [uuid4()]).fetchone()[0] + False # long shot + +.. __: https://www.postgresql.org/docs/current/datatype-uuid.html + + +.. _adapt-network: + +Network data types adaptation +----------------------------- + +Objects from the `ipaddress` module are converted to PostgreSQL `network +address types`__: + +- `~ipaddress.IPv4Address`, `~ipaddress.IPv4Interface` objects are converted + to the PostgreSQL :sql:`inet` type. On the way back, :sql:`inet` values + indicating a single address are converted to `!IPv4Address`, otherwise they + are converted to `!IPv4Interface` + +- `~ipaddress.IPv4Network` objects are converted to the :sql:`cidr` type and + back. + +- `~ipaddress.IPv6Address`, `~ipaddress.IPv6Interface`, + `~ipaddress.IPv6Network` objects follow the same rules, with IPv6 + :sql:`inet` and :sql:`cidr` values. + +.. __: https://www.postgresql.org/docs/current/datatype-net-types.html#DATATYPE-CIDR + +.. code:: python + + >>> conn.execute("select '192.168.0.1'::inet, '192.168.0.1/24'::inet").fetchone() + (IPv4Address('192.168.0.1'), IPv4Interface('192.168.0.1/24')) + + >>> conn.execute("select '::ffff:1.2.3.0/120'::cidr").fetchone()[0] + IPv6Network('::ffff:102:300/120') + + +.. _adapt-enum: + +Enum adaptation +--------------- + +.. versionadded:: 3.1 + +Psycopg can adapt Python `~enum.Enum` subclasses into PostgreSQL enum types +(created with the |CREATE TYPE AS ENUM|_ command). + +.. |CREATE TYPE AS ENUM| replace:: :sql:`CREATE TYPE ... AS ENUM (...)` +.. _CREATE TYPE AS ENUM: https://www.postgresql.org/docs/current/static/datatype-enum.html + +In order to set up a bidirectional enum mapping, you should get information +about the PostgreSQL enum using the `~types.enum.EnumInfo` class and +register it using `~types.enum.register_enum()`. The behaviour of unregistered +and registered enums is different. + +- If the enum is not registered with `register_enum()`: + + - Pure `!Enum` classes are dumped as normal strings, using their member + names as value. The unknown oid is used, so PostgreSQL should be able to + use this string in most contexts (such as an enum or a text field). + + .. versionchanged:: 3.1 + In previous version dumping pure enums is not supported and raise a + "cannot adapt" error. + + - Mix-in enums are dumped according to their mix-in type (because a `class + MyIntEnum(int, Enum)` is more specifically an `!int` than an `!Enum`, so + it's dumped by default according to `!int` rules). + + - PostgreSQL enums are loaded as Python strings. If you want to load arrays + of such enums you will have to find their OIDs using `types.TypeInfo.fetch()` + and register them using `~types.TypeInfo.register()`. + +- If the enum is registered (using `~types.enum.EnumInfo`\ `!.fetch()` and + `~types.enum.register_enum()`): + + - Enums classes, both pure and mixed-in, are dumped by name. + + - The registered PostgreSQL enum is loaded back as the registered Python + enum members. + +.. autoclass:: psycopg.types.enum.EnumInfo + + `!EnumInfo` is a subclass of `~psycopg.types.TypeInfo`: refer to the + latter's documentation for generic usage, especially the + `~psycopg.types.TypeInfo.fetch()` method. + + .. attribute:: labels + + After `~psycopg.types.TypeInfo.fetch()`, it contains the labels defined + in the PostgreSQL enum type. + + .. attribute:: enum + + After `register_enum()` is called, it will contain the Python type + mapping to the registered enum. + +.. autofunction:: psycopg.types.enum.register_enum + + After registering, fetching data of the registered enum will cast + PostgreSQL enum labels into corresponding Python enum members. + + If no `!enum` is specified, a new `Enum` is created based on + PostgreSQL enum labels. + +Example:: + + >>> from enum import Enum, auto + >>> from psycopg.types.enum import EnumInfo, register_enum + + >>> class UserRole(Enum): + ... ADMIN = auto() + ... EDITOR = auto() + ... GUEST = auto() + + >>> conn.execute("CREATE TYPE user_role AS ENUM ('ADMIN', 'EDITOR', 'GUEST')") + + >>> info = EnumInfo.fetch(conn, "user_role") + >>> register_enum(info, conn, UserRole) + + >>> some_editor = info.enum.EDITOR + >>> some_editor + <UserRole.EDITOR: 2> + + >>> conn.execute( + ... "SELECT pg_typeof(%(editor)s), %(editor)s", + ... {"editor": some_editor} + ... ).fetchone() + ('user_role', <UserRole.EDITOR: 2>) + + >>> conn.execute( + ... "SELECT ARRAY[%s, %s]", + ... [UserRole.ADMIN, UserRole.GUEST] + ... ).fetchone() + [<UserRole.ADMIN: 1>, <UserRole.GUEST: 3>] + +If the Python and the PostgreSQL enum don't match 1:1 (for instance if members +have a different name, or if more than one Python enum should map to the same +PostgreSQL enum, or vice versa), you can specify the exceptions using the +`!mapping` parameter. + +`!mapping` should be a dictionary with Python enum members as keys and the +matching PostgreSQL enum labels as values, or a list of `(member, label)` +pairs with the same meaning (useful when some members are repeated). Order +matters: if an element on either side is specified more than once, the last +pair in the sequence will take precedence:: + + # Legacy roles, defined in medieval times. + >>> conn.execute( + ... "CREATE TYPE abbey_role AS ENUM ('ABBOT', 'SCRIBE', 'MONK', 'GUEST')") + + >>> info = EnumInfo.fetch(conn, "abbey_role") + >>> register_enum(info, conn, UserRole, mapping=[ + ... (UserRole.ADMIN, "ABBOT"), + ... (UserRole.EDITOR, "SCRIBE"), + ... (UserRole.EDITOR, "MONK")]) + + >>> conn.execute("SELECT '{ABBOT,SCRIBE,MONK,GUEST}'::abbey_role[]").fetchone()[0] + [<UserRole.ADMIN: 1>, + <UserRole.EDITOR: 2>, + <UserRole.EDITOR: 2>, + <UserRole.GUEST: 3>] + + >>> conn.execute("SELECT %s::text[]", [list(UserRole)]).fetchone()[0] + ['ABBOT', 'MONK', 'GUEST'] + +A particularly useful case is when the PostgreSQL labels match the *values* of +a `!str`\-based Enum. In this case it is possible to use something like ``{m: +m.value for m in enum}`` as mapping:: + + >>> class LowercaseRole(str, Enum): + ... ADMIN = "admin" + ... EDITOR = "editor" + ... GUEST = "guest" + + >>> conn.execute( + ... "CREATE TYPE lowercase_role AS ENUM ('admin', 'editor', 'guest')") + + >>> info = EnumInfo.fetch(conn, "lowercase_role") + >>> register_enum( + ... info, conn, LowercaseRole, mapping={m: m.value for m in LowercaseRole}) + + >>> conn.execute("SELECT 'editor'::lowercase_role").fetchone()[0] + <LowercaseRole.EDITOR: 'editor'> |