diff options
Diffstat (limited to 'docs/basic')
-rw-r--r-- | docs/basic/adapt.rst | 522 | ||||
-rw-r--r-- | docs/basic/copy.rst | 212 | ||||
-rw-r--r-- | docs/basic/from_pg2.rst | 359 | ||||
-rw-r--r-- | docs/basic/index.rst | 26 | ||||
-rw-r--r-- | docs/basic/install.rst | 172 | ||||
-rw-r--r-- | docs/basic/params.rst | 242 | ||||
-rw-r--r-- | docs/basic/pgtypes.rst | 389 | ||||
-rw-r--r-- | docs/basic/transactions.rst | 388 | ||||
-rw-r--r-- | docs/basic/usage.rst | 232 |
9 files changed, 2542 insertions, 0 deletions
diff --git a/docs/basic/adapt.rst b/docs/basic/adapt.rst new file mode 100644 index 0000000..1538327 --- /dev/null +++ b/docs/basic/adapt.rst @@ -0,0 +1,522 @@ +.. currentmodule:: psycopg + +.. index:: + single: Adaptation + pair: Objects; Adaptation + single: Data types; Adaptation + +.. _types-adaptation: + +Adapting basic Python types +=========================== + +Many standard Python types are adapted into SQL and returned as Python +objects when a query is executed. + +Converting the following data types between Python and PostgreSQL works +out-of-the-box and doesn't require any configuration. In case you need to +customise the conversion you should take a look at :ref:`adaptation`. + + +.. index:: + pair: Boolean; Adaptation + +.. _adapt-bool: + +Booleans adaptation +------------------- + +Python `bool` values `!True` and `!False` are converted to the equivalent +`PostgreSQL boolean type`__:: + + >>> cur.execute("SELECT %s, %s", (True, False)) + # equivalent to "SELECT true, false" + +.. __: https://www.postgresql.org/docs/current/datatype-boolean.html + + +.. index:: + single: Adaptation; numbers + single: Integer; Adaptation + single: Float; Adaptation + single: Decimal; Adaptation + +.. _adapt-numbers: + +Numbers adaptation +------------------ + +.. seealso:: + + - `PostgreSQL numeric types + <https://www.postgresql.org/docs/current/static/datatype-numeric.html>`__ + +- Python `int` values can be converted to PostgreSQL :sql:`smallint`, + :sql:`integer`, :sql:`bigint`, or :sql:`numeric`, according to their numeric + value. Psycopg will choose the smallest data type available, because + PostgreSQL can automatically cast a type up (e.g. passing a `smallint` where + PostgreSQL expect an `integer` is gladly accepted) but will not cast down + automatically (e.g. if a function has an :sql:`integer` argument, passing it + a :sql:`bigint` value will fail, even if the value is 1). + +- Python `float` values are converted to PostgreSQL :sql:`float8`. + +- Python `~decimal.Decimal` values are converted to PostgreSQL :sql:`numeric`. + +On the way back, smaller types (:sql:`int2`, :sql:`int4`, :sql:`float4`) are +promoted to the larger Python counterpart. + +.. note:: + + Sometimes you may prefer to receive :sql:`numeric` data as `!float` + instead, for performance reason or ease of manipulation: you can configure + an adapter to :ref:`cast PostgreSQL numeric to Python float + <adapt-example-float>`. This of course may imply a loss of precision. + + +.. index:: + pair: Strings; Adaptation + single: Unicode; Adaptation + pair: Encoding; SQL_ASCII + +.. _adapt-string: + +Strings adaptation +------------------ + +.. seealso:: + + - `PostgreSQL character types + <https://www.postgresql.org/docs/current/datatype-character.html>`__ + +Python `str` are converted to PostgreSQL string syntax, and PostgreSQL types +such as :sql:`text` and :sql:`varchar` are converted back to Python `!str`: + +.. code:: python + + conn = psycopg.connect() + conn.execute( + "INSERT INTO menu (id, entry) VALUES (%s, %s)", + (1, "Crème Brûlée at 4.99€")) + conn.execute("SELECT entry FROM menu WHERE id = 1").fetchone()[0] + 'Crème Brûlée at 4.99€' + +PostgreSQL databases `have an encoding`__, and `the session has an encoding`__ +too, exposed in the `!Connection.info.`\ `~ConnectionInfo.encoding` +attribute. If your database and connection are in UTF-8 encoding you will +likely have no problem, otherwise you will have to make sure that your +application only deals with the non-ASCII chars that the database can handle; +failing to do so may result in encoding/decoding errors: + +.. __: https://www.postgresql.org/docs/current/sql-createdatabase.html +.. __: https://www.postgresql.org/docs/current/multibyte.html + +.. code:: python + + # The encoding is set at connection time according to the db configuration + conn.info.encoding + 'utf-8' + + # The Latin-9 encoding can manage some European accented letters + # and the Euro symbol + conn.execute("SET client_encoding TO LATIN9") + conn.execute("SELECT entry FROM menu WHERE id = 1").fetchone()[0] + 'Crème Brûlée at 4.99€' + + # The Latin-1 encoding doesn't have a representation for the Euro symbol + conn.execute("SET client_encoding TO LATIN1") + conn.execute("SELECT entry FROM menu WHERE id = 1").fetchone()[0] + # Traceback (most recent call last) + # ... + # UntranslatableCharacter: character with byte sequence 0xe2 0x82 0xac + # in encoding "UTF8" has no equivalent in encoding "LATIN1" + +In rare cases you may have strings with unexpected encodings in the database. +Using the ``SQL_ASCII`` client encoding will disable decoding of the data +coming from the database, which will be returned as `bytes`: + +.. code:: python + + conn.execute("SET client_encoding TO SQL_ASCII") + conn.execute("SELECT entry FROM menu WHERE id = 1").fetchone()[0] + b'Cr\xc3\xa8me Br\xc3\xbbl\xc3\xa9e at 4.99\xe2\x82\xac' + +Alternatively you can cast the unknown encoding data to :sql:`bytea` to +retrieve it as bytes, leaving other strings unaltered: see :ref:`adapt-binary` + +Note that PostgreSQL text cannot contain the ``0x00`` byte. If you need to +store Python strings that may contain binary zeros you should use a +:sql:`bytea` field. + + +.. index:: + single: bytea; Adaptation + single: bytes; Adaptation + single: bytearray; Adaptation + single: memoryview; Adaptation + single: Binary string + +.. _adapt-binary: + +Binary adaptation +----------------- + +Python types representing binary objects (`bytes`, `bytearray`, `memoryview`) +are converted by default to :sql:`bytea` fields. By default data received is +returned as `!bytes`. + +If you are storing large binary data in bytea fields (such as binary documents +or images) you should probably use the binary format to pass and return +values, otherwise binary data will undergo `ASCII escaping`__, taking some CPU +time and more bandwidth. See :ref:`binary-data` for details. + +.. __: https://www.postgresql.org/docs/current/datatype-binary.html + + +.. _adapt-date: + +Date/time types adaptation +-------------------------- + +.. seealso:: + + - `PostgreSQL date/time types + <https://www.postgresql.org/docs/current/datatype-datetime.html>`__ + +- Python `~datetime.date` objects are converted to PostgreSQL :sql:`date`. +- Python `~datetime.datetime` objects are converted to PostgreSQL + :sql:`timestamp` (if they don't have a `!tzinfo` set) or :sql:`timestamptz` + (if they do). +- Python `~datetime.time` objects are converted to PostgreSQL :sql:`time` + (if they don't have a `!tzinfo` set) or :sql:`timetz` (if they do). +- Python `~datetime.timedelta` objects are converted to PostgreSQL + :sql:`interval`. + +PostgreSQL :sql:`timestamptz` values are returned with a timezone set to the +`connection TimeZone setting`__, which is available as a Python +`~zoneinfo.ZoneInfo` object in the `!Connection.info`.\ `~ConnectionInfo.timezone` +attribute:: + + >>> conn.info.timezone + zoneinfo.ZoneInfo(key='Europe/London') + + >>> conn.execute("select '2048-07-08 12:00'::timestamptz").fetchone()[0] + datetime.datetime(2048, 7, 8, 12, 0, tzinfo=zoneinfo.ZoneInfo(key='Europe/London')) + +.. note:: + PostgreSQL :sql:`timestamptz` doesn't store "a timestamp with a timezone + attached": it stores a timestamp always in UTC, which is converted, on + output, to the connection TimeZone setting:: + + >>> conn.execute("SET TIMEZONE to 'Europe/Rome'") # UTC+2 in summer + + >>> conn.execute("SELECT '2042-07-01 12:00Z'::timestamptz").fetchone()[0] # UTC input + datetime.datetime(2042, 7, 1, 14, 0, tzinfo=zoneinfo.ZoneInfo(key='Europe/Rome')) + + Check out the `PostgreSQL documentation about timezones`__ for all the + details. + + .. __: https://www.postgresql.org/docs/current/datatype-datetime.html + #DATATYPE-TIMEZONES + +.. __: https://www.postgresql.org/docs/current/runtime-config-client.html#GUC-TIMEZONE + + +.. _adapt-json: + +JSON adaptation +--------------- + +Psycopg can map between Python objects and PostgreSQL `json/jsonb +types`__, allowing to customise the load and dump function used. + +.. __: https://www.postgresql.org/docs/current/datatype-json.html + +Because several Python objects could be considered JSON (dicts, lists, +scalars, even date/time if using a dumps function customised to use them), +Psycopg requires you to wrap the object to dump as JSON into a wrapper: +either `psycopg.types.json.Json` or `~psycopg.types.json.Jsonb`. + +.. code:: python + + from psycopg.types.json import Jsonb + + thing = {"foo": ["bar", 42]} + conn.execute("INSERT INTO mytable VALUES (%s)", [Jsonb(thing)]) + +By default Psycopg uses the standard library `json.dumps` and `json.loads` +functions to serialize and de-serialize Python objects to JSON. If you want to +customise how serialization happens, for instance changing serialization +parameters or using a different JSON library, you can specify your own +functions using the `psycopg.types.json.set_json_dumps()` and +`~psycopg.types.json.set_json_loads()` functions, to apply either globally or +to a specific context (connection or cursor). + +.. code:: python + + from functools import partial + from psycopg.types.json import Jsonb, set_json_dumps, set_json_loads + import ujson + + # Use a faster dump function + set_json_dumps(ujson.dumps) + + # Return floating point values as Decimal, just in one connection + set_json_loads(partial(json.loads, parse_float=Decimal), conn) + + conn.execute("SELECT %s", [Jsonb({"value": 123.45})]).fetchone()[0] + # {'value': Decimal('123.45')} + +If you need an even more specific dump customisation only for certain objects +(including different configurations in the same query) you can specify a +`!dumps` parameter in the +`~psycopg.types.json.Json`/`~psycopg.types.json.Jsonb` wrapper, which will +take precedence over what is specified by `!set_json_dumps()`. + +.. code:: python + + from uuid import UUID, uuid4 + + class UUIDEncoder(json.JSONEncoder): + """A JSON encoder which can dump UUID.""" + def default(self, obj): + if isinstance(obj, UUID): + return str(obj) + return json.JSONEncoder.default(self, obj) + + uuid_dumps = partial(json.dumps, cls=UUIDEncoder) + obj = {"uuid": uuid4()} + cnn.execute("INSERT INTO objs VALUES %s", [Json(obj, dumps=uuid_dumps)]) + # will insert: {'uuid': '0a40799d-3980-4c65-8315-2956b18ab0e1'} + + +.. _adapt-list: + +Lists adaptation +---------------- + +Python `list` objects are adapted to `PostgreSQL arrays`__ and back. Only +lists containing objects of the same type can be dumped to PostgreSQL (but the +list may contain `!None` elements). + +.. __: https://www.postgresql.org/docs/current/arrays.html + +.. note:: + + If you have a list of values which you want to use with the :sql:`IN` + operator... don't. It won't work (neither with a list nor with a tuple):: + + >>> conn.execute("SELECT * FROM mytable WHERE id IN %s", [[10,20,30]]) + Traceback (most recent call last): + File "<stdin>", line 1, in <module> + psycopg.errors.SyntaxError: syntax error at or near "$1" + LINE 1: SELECT * FROM mytable WHERE id IN $1 + ^ + + What you want to do instead is to use the `'= ANY()' expression`__ and pass + the values as a list (not a tuple). + + >>> conn.execute("SELECT * FROM mytable WHERE id = ANY(%s)", [[10,20,30]]) + + This has also the advantage of working with an empty list, whereas ``IN + ()`` is not valid SQL. + + .. __: https://www.postgresql.org/docs/current/functions-comparisons.html + #id-1.5.8.30.16 + + +.. _adapt-uuid: + +UUID adaptation +--------------- + +Python `uuid.UUID` objects are adapted to PostgreSQL `UUID type`__ and back:: + + >>> conn.execute("select gen_random_uuid()").fetchone()[0] + UUID('97f0dd62-3bd2-459e-89b8-a5e36ea3c16c') + + >>> from uuid import uuid4 + >>> conn.execute("select gen_random_uuid() = %s", [uuid4()]).fetchone()[0] + False # long shot + +.. __: https://www.postgresql.org/docs/current/datatype-uuid.html + + +.. _adapt-network: + +Network data types adaptation +----------------------------- + +Objects from the `ipaddress` module are converted to PostgreSQL `network +address types`__: + +- `~ipaddress.IPv4Address`, `~ipaddress.IPv4Interface` objects are converted + to the PostgreSQL :sql:`inet` type. On the way back, :sql:`inet` values + indicating a single address are converted to `!IPv4Address`, otherwise they + are converted to `!IPv4Interface` + +- `~ipaddress.IPv4Network` objects are converted to the :sql:`cidr` type and + back. + +- `~ipaddress.IPv6Address`, `~ipaddress.IPv6Interface`, + `~ipaddress.IPv6Network` objects follow the same rules, with IPv6 + :sql:`inet` and :sql:`cidr` values. + +.. __: https://www.postgresql.org/docs/current/datatype-net-types.html#DATATYPE-CIDR + +.. code:: python + + >>> conn.execute("select '192.168.0.1'::inet, '192.168.0.1/24'::inet").fetchone() + (IPv4Address('192.168.0.1'), IPv4Interface('192.168.0.1/24')) + + >>> conn.execute("select '::ffff:1.2.3.0/120'::cidr").fetchone()[0] + IPv6Network('::ffff:102:300/120') + + +.. _adapt-enum: + +Enum adaptation +--------------- + +.. versionadded:: 3.1 + +Psycopg can adapt Python `~enum.Enum` subclasses into PostgreSQL enum types +(created with the |CREATE TYPE AS ENUM|_ command). + +.. |CREATE TYPE AS ENUM| replace:: :sql:`CREATE TYPE ... AS ENUM (...)` +.. _CREATE TYPE AS ENUM: https://www.postgresql.org/docs/current/static/datatype-enum.html + +In order to set up a bidirectional enum mapping, you should get information +about the PostgreSQL enum using the `~types.enum.EnumInfo` class and +register it using `~types.enum.register_enum()`. The behaviour of unregistered +and registered enums is different. + +- If the enum is not registered with `register_enum()`: + + - Pure `!Enum` classes are dumped as normal strings, using their member + names as value. The unknown oid is used, so PostgreSQL should be able to + use this string in most contexts (such as an enum or a text field). + + .. versionchanged:: 3.1 + In previous version dumping pure enums is not supported and raise a + "cannot adapt" error. + + - Mix-in enums are dumped according to their mix-in type (because a `class + MyIntEnum(int, Enum)` is more specifically an `!int` than an `!Enum`, so + it's dumped by default according to `!int` rules). + + - PostgreSQL enums are loaded as Python strings. If you want to load arrays + of such enums you will have to find their OIDs using `types.TypeInfo.fetch()` + and register them using `~types.TypeInfo.register()`. + +- If the enum is registered (using `~types.enum.EnumInfo`\ `!.fetch()` and + `~types.enum.register_enum()`): + + - Enums classes, both pure and mixed-in, are dumped by name. + + - The registered PostgreSQL enum is loaded back as the registered Python + enum members. + +.. autoclass:: psycopg.types.enum.EnumInfo + + `!EnumInfo` is a subclass of `~psycopg.types.TypeInfo`: refer to the + latter's documentation for generic usage, especially the + `~psycopg.types.TypeInfo.fetch()` method. + + .. attribute:: labels + + After `~psycopg.types.TypeInfo.fetch()`, it contains the labels defined + in the PostgreSQL enum type. + + .. attribute:: enum + + After `register_enum()` is called, it will contain the Python type + mapping to the registered enum. + +.. autofunction:: psycopg.types.enum.register_enum + + After registering, fetching data of the registered enum will cast + PostgreSQL enum labels into corresponding Python enum members. + + If no `!enum` is specified, a new `Enum` is created based on + PostgreSQL enum labels. + +Example:: + + >>> from enum import Enum, auto + >>> from psycopg.types.enum import EnumInfo, register_enum + + >>> class UserRole(Enum): + ... ADMIN = auto() + ... EDITOR = auto() + ... GUEST = auto() + + >>> conn.execute("CREATE TYPE user_role AS ENUM ('ADMIN', 'EDITOR', 'GUEST')") + + >>> info = EnumInfo.fetch(conn, "user_role") + >>> register_enum(info, conn, UserRole) + + >>> some_editor = info.enum.EDITOR + >>> some_editor + <UserRole.EDITOR: 2> + + >>> conn.execute( + ... "SELECT pg_typeof(%(editor)s), %(editor)s", + ... {"editor": some_editor} + ... ).fetchone() + ('user_role', <UserRole.EDITOR: 2>) + + >>> conn.execute( + ... "SELECT ARRAY[%s, %s]", + ... [UserRole.ADMIN, UserRole.GUEST] + ... ).fetchone() + [<UserRole.ADMIN: 1>, <UserRole.GUEST: 3>] + +If the Python and the PostgreSQL enum don't match 1:1 (for instance if members +have a different name, or if more than one Python enum should map to the same +PostgreSQL enum, or vice versa), you can specify the exceptions using the +`!mapping` parameter. + +`!mapping` should be a dictionary with Python enum members as keys and the +matching PostgreSQL enum labels as values, or a list of `(member, label)` +pairs with the same meaning (useful when some members are repeated). Order +matters: if an element on either side is specified more than once, the last +pair in the sequence will take precedence:: + + # Legacy roles, defined in medieval times. + >>> conn.execute( + ... "CREATE TYPE abbey_role AS ENUM ('ABBOT', 'SCRIBE', 'MONK', 'GUEST')") + + >>> info = EnumInfo.fetch(conn, "abbey_role") + >>> register_enum(info, conn, UserRole, mapping=[ + ... (UserRole.ADMIN, "ABBOT"), + ... (UserRole.EDITOR, "SCRIBE"), + ... (UserRole.EDITOR, "MONK")]) + + >>> conn.execute("SELECT '{ABBOT,SCRIBE,MONK,GUEST}'::abbey_role[]").fetchone()[0] + [<UserRole.ADMIN: 1>, + <UserRole.EDITOR: 2>, + <UserRole.EDITOR: 2>, + <UserRole.GUEST: 3>] + + >>> conn.execute("SELECT %s::text[]", [list(UserRole)]).fetchone()[0] + ['ABBOT', 'MONK', 'GUEST'] + +A particularly useful case is when the PostgreSQL labels match the *values* of +a `!str`\-based Enum. In this case it is possible to use something like ``{m: +m.value for m in enum}`` as mapping:: + + >>> class LowercaseRole(str, Enum): + ... ADMIN = "admin" + ... EDITOR = "editor" + ... GUEST = "guest" + + >>> conn.execute( + ... "CREATE TYPE lowercase_role AS ENUM ('admin', 'editor', 'guest')") + + >>> info = EnumInfo.fetch(conn, "lowercase_role") + >>> register_enum( + ... info, conn, LowercaseRole, mapping={m: m.value for m in LowercaseRole}) + + >>> conn.execute("SELECT 'editor'::lowercase_role").fetchone()[0] + <LowercaseRole.EDITOR: 'editor'> diff --git a/docs/basic/copy.rst b/docs/basic/copy.rst new file mode 100644 index 0000000..2bb4498 --- /dev/null +++ b/docs/basic/copy.rst @@ -0,0 +1,212 @@ +.. currentmodule:: psycopg + +.. index:: + pair: COPY; SQL command + +.. _copy: + +Using COPY TO and COPY FROM +=========================== + +Psycopg allows to operate with `PostgreSQL COPY protocol`__. :sql:`COPY` is +one of the most efficient ways to load data into the database (and to modify +it, with some SQL creativity). + +.. __: https://www.postgresql.org/docs/current/sql-copy.html + +Copy is supported using the `Cursor.copy()` method, passing it a query of the +form :sql:`COPY ... FROM STDIN` or :sql:`COPY ... TO STDOUT`, and managing the +resulting `Copy` object in a `!with` block: + +.. code:: python + + with cursor.copy("COPY table_name (col1, col2) FROM STDIN") as copy: + # pass data to the 'copy' object using write()/write_row() + +You can compose a COPY statement dynamically by using objects from the +`psycopg.sql` module: + +.. code:: python + + with cursor.copy( + sql.SQL("COPY {} TO STDOUT").format(sql.Identifier("table_name")) + ) as copy: + # read data from the 'copy' object using read()/read_row() + +.. versionchanged:: 3.1 + + You can also pass parameters to `!copy()`, like in `~Cursor.execute()`: + + .. code:: python + + with cur.copy("COPY (SELECT * FROM table_name LIMIT %s) TO STDOUT", (3,)) as copy: + # expect no more than three records + +The connection is subject to the usual transaction behaviour, so, unless the +connection is in autocommit, at the end of the COPY operation you will still +have to commit the pending changes and you can still roll them back. See +:ref:`transactions` for details. + + +.. _copy-in-row: + +Writing data row-by-row +----------------------- + +Using a copy operation you can load data into the database from any Python +iterable (a list of tuples, or any iterable of sequences): the Python values +are adapted as they would be in normal querying. To perform such operation use +a :sql:`COPY ... FROM STDIN` with `Cursor.copy()` and use `~Copy.write_row()` +on the resulting object in a `!with` block. On exiting the block the +operation will be concluded: + +.. code:: python + + records = [(10, 20, "hello"), (40, None, "world")] + + with cursor.copy("COPY sample (col1, col2, col3) FROM STDIN") as copy: + for record in records: + copy.write_row(record) + +If an exception is raised inside the block, the operation is interrupted and +the records inserted so far are discarded. + +In order to read or write from `!Copy` row-by-row you must not specify +:sql:`COPY` options such as :sql:`FORMAT CSV`, :sql:`DELIMITER`, :sql:`NULL`: +please leave these details alone, thank you :) + + +.. _copy-out-row: + +Reading data row-by-row +----------------------- + +You can also do the opposite, reading rows out of a :sql:`COPY ... TO STDOUT` +operation, by iterating on `~Copy.rows()`. However this is not something you +may want to do normally: usually the normal query process will be easier to +use. + +PostgreSQL, currently, doesn't give complete type information on :sql:`COPY +TO`, so the rows returned will have unparsed data, as strings or bytes, +according to the format. + +.. code:: python + + with cur.copy("COPY (VALUES (10::int, current_date)) TO STDOUT") as copy: + for row in copy.rows(): + print(row) # return unparsed data: ('10', '2046-12-24') + +You can improve the results by using `~Copy.set_types()` before reading, but +you have to specify them yourself. + +.. code:: python + + with cur.copy("COPY (VALUES (10::int, current_date)) TO STDOUT") as copy: + copy.set_types(["int4", "date"]) + for row in copy.rows(): + print(row) # (10, datetime.date(2046, 12, 24)) + + +.. _copy-block: + +Copying block-by-block +---------------------- + +If data is already formatted in a way suitable for copy (for instance because +it is coming from a file resulting from a previous `COPY TO` operation) it can +be loaded into the database using `Copy.write()` instead. + +.. code:: python + + with open("data", "r") as f: + with cursor.copy("COPY data FROM STDIN") as copy: + while data := f.read(BLOCK_SIZE): + copy.write(data) + +In this case you can use any :sql:`COPY` option and format, as long as the +input data is compatible with what the operation in `!copy()` expects. Data +can be passed as `!str`, if the copy is in :sql:`FORMAT TEXT`, or as `!bytes`, +which works with both :sql:`FORMAT TEXT` and :sql:`FORMAT BINARY`. + +In order to produce data in :sql:`COPY` format you can use a :sql:`COPY ... TO +STDOUT` statement and iterate over the resulting `Copy` object, which will +produce a stream of `!bytes` objects: + +.. code:: python + + with open("data.out", "wb") as f: + with cursor.copy("COPY table_name TO STDOUT") as copy: + for data in copy: + f.write(data) + + +.. _copy-binary: + +Binary copy +----------- + +Binary copy is supported by specifying :sql:`FORMAT BINARY` in the :sql:`COPY` +statement. In order to import binary data using `~Copy.write_row()`, all the +types passed to the database must have a binary dumper registered; this is not +necessary if the data is copied :ref:`block-by-block <copy-block>` using +`~Copy.write()`. + +.. warning:: + + PostgreSQL is particularly finicky when loading data in binary mode and + will apply **no cast rules**. This means, for example, that passing the + value 100 to an `integer` column **will fail**, because Psycopg will pass + it as a `smallint` value, and the server will reject it because its size + doesn't match what expected. + + You can work around the problem using the `~Copy.set_types()` method of + the `!Copy` object and specifying carefully the types to load. + +.. seealso:: See :ref:`binary-data` for further info about binary querying. + + +.. _copy-async: + +Asynchronous copy support +------------------------- + +Asynchronous operations are supported using the same patterns as above, using +the objects obtained by an `AsyncConnection`. For instance, if `!f` is an +object supporting an asynchronous `!read()` method returning :sql:`COPY` data, +a fully-async copy operation could be: + +.. code:: python + + async with cursor.copy("COPY data FROM STDIN") as copy: + while data := await f.read(): + await copy.write(data) + +The `AsyncCopy` object documentation describes the signature of the +asynchronous methods and the differences from its sync `Copy` counterpart. + +.. seealso:: See :ref:`async` for further info about using async objects. + + +Example: copying a table across servers +--------------------------------------- + +In order to copy a table, or a portion of a table, across servers, you can use +two COPY operations on two different connections, reading from the first and +writing to the second. + +.. code:: python + + with psycopg.connect(dsn_src) as conn1, psycopg.connect(dsn_tgt) as conn2: + with conn1.cursor().copy("COPY src TO STDOUT (FORMAT BINARY)") as copy1: + with conn2.cursor().copy("COPY tgt FROM STDIN (FORMAT BINARY)") as copy2: + for data in copy1: + copy2.write(data) + +Using :sql:`FORMAT BINARY` usually gives a performance boost, but it only +works if the source and target schema are *perfectly identical*. If the tables +are only *compatible* (for example, if you are copying an :sql:`integer` field +into a :sql:`bigint` destination field) you should omit the `BINARY` option and +perform a text-based copy. See :ref:`copy-binary` for details. + +The same pattern can be adapted to use :ref:`async objects <async>` in order +to perform an :ref:`async copy <copy-async>`. diff --git a/docs/basic/from_pg2.rst b/docs/basic/from_pg2.rst new file mode 100644 index 0000000..0692049 --- /dev/null +++ b/docs/basic/from_pg2.rst @@ -0,0 +1,359 @@ +.. index:: + pair: psycopg2; Differences + +.. currentmodule:: psycopg + +.. _from-psycopg2: + + +Differences from `!psycopg2` +============================ + +Psycopg 3 uses the common DBAPI structure of many other database adapters and +tries to behave as close as possible to `!psycopg2`. There are however a few +differences to be aware of. + +.. tip:: + Most of the times, the workarounds suggested here will work with both + Psycopg 2 and 3, which could be useful if you are porting a program or + writing a program that should work with both Psycopg 2 and 3. + + +.. _server-side-binding: + +Server-side binding +------------------- + +Psycopg 3 sends the query and the parameters to the server separately, instead +of merging them on the client side. Server-side binding works for normal +:sql:`SELECT` and data manipulation statements (:sql:`INSERT`, :sql:`UPDATE`, +:sql:`DELETE`), but it doesn't work with many other statements. For instance, +it doesn't work with :sql:`SET` or with :sql:`NOTIFY`:: + + >>> conn.execute("SET TimeZone TO %s", ["UTC"]) + Traceback (most recent call last): + ... + psycopg.errors.SyntaxError: syntax error at or near "$1" + LINE 1: SET TimeZone TO $1 + ^ + + >>> conn.execute("NOTIFY %s, %s", ["chan", 42]) + Traceback (most recent call last): + ... + psycopg.errors.SyntaxError: syntax error at or near "$1" + LINE 1: NOTIFY $1, $2 + ^ + +and with any data definition statement:: + + >>> conn.execute("CREATE TABLE foo (id int DEFAULT %s)", [42]) + Traceback (most recent call last): + ... + psycopg.errors.UndefinedParameter: there is no parameter $1 + LINE 1: CREATE TABLE foo (id int DEFAULT $1) + ^ + +Sometimes, PostgreSQL offers an alternative: for instance the `set_config()`__ +function can be used instead of the :sql:`SET` statement, the `pg_notify()`__ +function can be used instead of :sql:`NOTIFY`:: + + >>> conn.execute("SELECT set_config('TimeZone', %s, false)", ["UTC"]) + + >>> conn.execute("SELECT pg_notify(%s, %s)", ["chan", "42"]) + +.. __: https://www.postgresql.org/docs/current/functions-admin.html + #FUNCTIONS-ADMIN-SET + +.. __: https://www.postgresql.org/docs/current/sql-notify.html + #id-1.9.3.157.7.5 + +If this is not possible, you must merge the query and the parameter on the +client side. You can do so using the `psycopg.sql` objects:: + + >>> from psycopg import sql + + >>> cur.execute(sql.SQL("CREATE TABLE foo (id int DEFAULT {})").format(42)) + +or creating a :ref:`client-side binding cursor <client-side-binding-cursors>` +such as `ClientCursor`:: + + >>> cur = ClientCursor(conn) + >>> cur.execute("CREATE TABLE foo (id int DEFAULT %s)", [42]) + +If you need `!ClientCursor` often, you can set the `Connection.cursor_factory` +to have them created by default by `Connection.cursor()`. This way, Psycopg 3 +will behave largely the same way of Psycopg 2. + +Note that, both server-side and client-side, you can only specify **values** +as parameters (i.e. *the strings that go in single quotes*). If you need to +parametrize different parts of a statement (such as a table name), you must +use the `psycopg.sql` module:: + + >>> from psycopg import sql + + # This will quote the user and the password using the right quotes + # e.g.: ALTER USER "foo" SET PASSWORD 'bar' + >>> conn.execute( + ... sql.SQL("ALTER USER {} SET PASSWORD {}") + ... .format(sql.Identifier(username), password)) + + +.. _multi-statements: + +Multiple statements in the same query +------------------------------------- + +As a consequence of using :ref:`server-side bindings <server-side-binding>`, +when parameters are used, it is not possible to execute several statements in +the same `!execute()` call, separating them by semicolon:: + + >>> conn.execute( + ... "INSERT INTO foo VALUES (%s); INSERT INTO foo VALUES (%s)", + ... (10, 20)) + Traceback (most recent call last): + ... + psycopg.errors.SyntaxError: cannot insert multiple commands into a prepared statement + +One obvious way to work around the problem is to use several `!execute()` +calls. + +**There is no such limitation if no parameters are used**. As a consequence, you +can compose a multiple query on the client side and run them all in the same +`!execute()` call, using the `psycopg.sql` objects:: + + >>> from psycopg import sql + >>> conn.execute( + ... sql.SQL("INSERT INTO foo VALUES ({}); INSERT INTO foo values ({})" + ... .format(10, 20)) + +or a :ref:`client-side binding cursor <client-side-binding-cursors>`:: + + >>> cur = psycopg.ClientCursor(conn) + >>> cur.execute( + ... "INSERT INTO foo VALUES (%s); INSERT INTO foo VALUES (%s)", + ... (10, 20)) + +.. warning:: + + If a statements must be executed outside a transaction (such as + :sql:`CREATE DATABASE`), it cannot be executed in batch with other + statements, even if the connection is in autocommit mode:: + + >>> conn.autocommit = True + >>> conn.execute("CREATE DATABASE foo; SELECT 1") + Traceback (most recent call last): + ... + psycopg.errors.ActiveSqlTransaction: CREATE DATABASE cannot run inside a transaction block + + This happens because PostgreSQL itself will wrap multiple statements in a + transaction. Note that your will experience a different behaviour in + :program:`psql` (:program:`psql` will split the queries on semicolons and + send them to the server separately). + + This is not new in Psycopg 3: the same limitation is present in + `!psycopg2` too. + + +.. _multi-results: + +Multiple results returned from multiple statements +-------------------------------------------------- + +If more than one statement returning results is executed in psycopg2, only the +result of the last statement is returned:: + + >>> cur_pg2.execute("SELECT 1; SELECT 2") + >>> cur_pg2.fetchone() + (2,) + +In Psycopg 3 instead, all the results are available. After running the query, +the first result will be readily available in the cursor and can be consumed +using the usual `!fetch*()` methods. In order to access the following +results, you can use the `Cursor.nextset()` method:: + + >>> cur_pg3.execute("SELECT 1; SELECT 2") + >>> cur_pg3.fetchone() + (1,) + + >>> cur_pg3.nextset() + True + >>> cur_pg3.fetchone() + (2,) + + >>> cur_pg3.nextset() + None # no more results + +Remember though that you cannot use server-side bindings to :ref:`execute more +than one statement in the same query <multi-statements>`. + + +.. _difference-cast-rules: + +Different cast rules +-------------------- + +In rare cases, especially around variadic functions, PostgreSQL might fail to +find a function candidate for the given data types:: + + >>> conn.execute("SELECT json_build_array(%s, %s)", ["foo", "bar"]) + Traceback (most recent call last): + ... + psycopg.errors.IndeterminateDatatype: could not determine data type of parameter $1 + +This can be worked around specifying the argument types explicitly via a cast:: + + >>> conn.execute("SELECT json_build_array(%s::text, %s::text)", ["foo", "bar"]) + + +.. _in-and-tuple: + +You cannot use ``IN %s`` with a tuple +------------------------------------- + +``IN`` cannot be used with a tuple as single parameter, as was possible with +``psycopg2``:: + + >>> conn.execute("SELECT * FROM foo WHERE id IN %s", [(10,20,30)]) + Traceback (most recent call last): + ... + psycopg.errors.SyntaxError: syntax error at or near "$1" + LINE 1: SELECT * FROM foo WHERE id IN $1 + ^ + +What you can do is to use the `= ANY()`__ construct and pass the candidate +values as a list instead of a tuple, which will be adapted to a PostgreSQL +array:: + + >>> conn.execute("SELECT * FROM foo WHERE id = ANY(%s)", [[10,20,30]]) + +Note that `ANY()` can be used with `!psycopg2` too, and has the advantage of +accepting an empty list of values too as argument, which is not supported by +the :sql:`IN` operator instead. + +.. __: https://www.postgresql.org/docs/current/functions-comparisons.html + #id-1.5.8.30.16 + + +.. _diff-adapt: + +Different adaptation system +--------------------------- + +The adaptation system has been completely rewritten, in order to address +server-side parameters adaptation, but also to consider performance, +flexibility, ease of customization. + +The default behaviour with builtin data should be :ref:`what you would expect +<types-adaptation>`. If you have customised the way to adapt data, or if you +are managing your own extension types, you should look at the :ref:`new +adaptation system <adaptation>`. + +.. seealso:: + + - :ref:`types-adaptation` for the basic behaviour. + - :ref:`adaptation` for more advanced use. + + +.. _diff-copy: + +Copy is no longer file-based +---------------------------- + +`!psycopg2` exposes :ref:`a few copy methods <pg2:copy>` to interact with +PostgreSQL :sql:`COPY`. Their file-based interface doesn't make it easy to load +dynamically-generated data into a database. + +There is now a single `~Cursor.copy()` method, which is similar to +`!psycopg2` `!copy_expert()` in accepting a free-form :sql:`COPY` command and +returns an object to read/write data, block-wise or record-wise. The different +usage pattern also enables :sql:`COPY` to be used in async interactions. + +.. seealso:: See :ref:`copy` for the details. + + +.. _diff-with: + +`!with` connection +------------------ + +In `!psycopg2`, using the syntax :ref:`with connection <pg2:with>`, +only the transaction is closed, not the connection. This behaviour is +surprising for people used to several other Python classes wrapping resources, +such as files. + +In Psycopg 3, using :ref:`with connection <with-connection>` will close the +connection at the end of the `!with` block, making handling the connection +resources more familiar. + +In order to manage transactions as blocks you can use the +`Connection.transaction()` method, which allows for finer control, for +instance to use nested transactions. + +.. seealso:: See :ref:`transaction-context` for details. + + +.. _diff-callproc: + +`!callproc()` is gone +--------------------- + +`cursor.callproc()` is not implemented. The method has a simplistic semantic +which doesn't account for PostgreSQL positional parameters, procedures, +set-returning functions... Use a normal `~Cursor.execute()` with :sql:`SELECT +function_name(...)` or :sql:`CALL procedure_name(...)` instead. + + +.. _diff-client-encoding: + +`!client_encoding` is gone +-------------------------- + +Psycopg automatically uses the database client encoding to decode data to +Unicode strings. Use `ConnectionInfo.encoding` if you need to read the +encoding. You can select an encoding at connection time using the +`!client_encoding` connection parameter and you can change the encoding of a +connection by running a :sql:`SET client_encoding` statement... But why would +you? + + +.. _infinity-datetime: + +No default infinity dates handling +---------------------------------- + +PostgreSQL can represent a much wider range of dates and timestamps than +Python. While Python dates are limited to the years between 1 and 9999 +(represented by constants such as `datetime.date.min` and +`~datetime.date.max`), PostgreSQL dates extend to BC dates and past the year +10K. Furthermore PostgreSQL can also represent symbolic dates "infinity", in +both directions. + +In psycopg2, by default, `infinity dates and timestamps map to 'date.max'`__ +and similar constants. This has the problem of creating a non-bijective +mapping (two Postgres dates, infinity and 9999-12-31, both map to the same +Python date). There is also the perversity that valid Postgres dates, greater +than Python `!date.max` but arguably lesser than infinity, will still +overflow. + +In Psycopg 3, every date greater than year 9999 will overflow, including +infinity. If you would like to customize this mapping (for instance flattening +every date past Y10K on `!date.max`) you can subclass and adapt the +appropriate loaders: take a look at :ref:`this example +<adapt-example-inf-date>` to see how. + +.. __: https://www.psycopg.org/docs/usage.html#infinite-dates-handling + + +.. _whats-new: + +What's new in Psycopg 3 +----------------------- + +- :ref:`Asynchronous support <async>` +- :ref:`Server-side parameters binding <server-side-binding>` +- :ref:`Prepared statements <prepared-statements>` +- :ref:`Binary communication <binary-data>` +- :ref:`Python-based COPY support <copy>` +- :ref:`Support for static typing <static-typing>` +- :ref:`A redesigned connection pool <connection-pools>` +- :ref:`Direct access to the libpq functionalities <psycopg.pq>` diff --git a/docs/basic/index.rst b/docs/basic/index.rst new file mode 100644 index 0000000..bf9e27d --- /dev/null +++ b/docs/basic/index.rst @@ -0,0 +1,26 @@ +.. _basic: + +Getting started with Psycopg 3 +============================== + +This section of the documentation will explain :ref:`how to install Psycopg +<installation>` and how to perform normal activities such as :ref:`querying +the database <usage>` or :ref:`loading data using COPY <copy>`. + +.. important:: + + If you are familiar with psycopg2 please take a look at + :ref:`from-psycopg2` to see what is changed. + +.. toctree:: + :maxdepth: 2 + :caption: Contents: + + install + usage + params + adapt + pgtypes + transactions + copy + from_pg2 diff --git a/docs/basic/install.rst b/docs/basic/install.rst new file mode 100644 index 0000000..8e1dc6d --- /dev/null +++ b/docs/basic/install.rst @@ -0,0 +1,172 @@ +.. _installation: + +Installation +============ + +In short, if you use a :ref:`supported system<supported-systems>`:: + + pip install --upgrade pip # upgrade pip to at least 20.3 + pip install "psycopg[binary]" + +and you should be :ref:`ready to start <module-usage>`. Read further for +alternative ways to install. + + +.. _supported-systems: + +Supported systems +----------------- + +The Psycopg version documented here has *official and tested* support for: + +- Python: from version 3.7 to 3.11 + + - Python 3.6 supported before Psycopg 3.1 + +- PostgreSQL: from version 10 to 15 +- OS: Linux, macOS, Windows + +The tests to verify the supported systems run in `Github workflows`__: +anything that is not tested there is not officially supported. This includes: + +.. __: https://github.com/psycopg/psycopg/actions + +- Unofficial Python distributions such as Conda; +- Alternative PostgreSQL implementation; +- macOS hardware and releases not available on Github workflows. + +If you use an unsupported system, things might work (because, for instance, the +database may use the same wire protocol as PostgreSQL) but we cannot guarantee +the correct working or a smooth ride. + + +.. _binary-install: + +Binary installation +------------------- + +The quickest way to start developing with Psycopg 3 is to install the binary +packages by running:: + + pip install "psycopg[binary]" + +This will install a self-contained package with all the libraries needed. +**You will need pip 20.3 at least**: please run ``pip install --upgrade pip`` +to update it beforehand. + +The above package should work in most situations. It **will not work** in +some cases though. + +If your platform is not supported you should proceed to a :ref:`local +installation <local-installation>` or a :ref:`pure Python installation +<pure-python-installation>`. + +.. seealso:: + + Did Psycopg 3 install ok? Great! You can now move on to the :ref:`basic + module usage <module-usage>` to learn how it works. + + Keep on reading if the above method didn't work and you need a different + way to install Psycopg 3. + + For further information about the differences between the packages see + :ref:`pq-impl`. + + +.. _local-installation: + +Local installation +------------------ + +A "Local installation" results in a performing and maintainable library. The +library will include the speed-up C module and will be linked to the system +libraries (``libpq``, ``libssl``...) so that system upgrade of libraries will +upgrade the libraries used by Psycopg 3 too. This is the preferred way to +install Psycopg for a production site. + +In order to perform a local installation you need some prerequisites: + +- a C compiler, +- Python development headers (e.g. the ``python3-dev`` package). +- PostgreSQL client development headers (e.g. the ``libpq-dev`` package). +- The :program:`pg_config` program available in the :envvar:`PATH`. + +You **must be able** to troubleshoot an extension build, for instance you must +be able to read your compiler's error message. If you are not, please don't +try this and follow the `binary installation`_ instead. + +If your build prerequisites are in place you can run:: + + pip install "psycopg[c]" + + +.. _pure-python-installation: + +Pure Python installation +------------------------ + +If you simply install:: + + pip install psycopg + +without ``[c]`` or ``[binary]`` extras you will obtain a pure Python +implementation. This is particularly handy to debug and hack, but it still +requires the system libpq to operate (which will be imported dynamically via +`ctypes`). + +In order to use the pure Python installation you will need the ``libpq`` +installed in the system: for instance on Debian system you will probably +need:: + + sudo apt install libpq5 + +.. note:: + + The ``libpq`` is the client library used by :program:`psql`, the + PostgreSQL command line client, to connect to the database. On most + systems, installing :program:`psql` will install the ``libpq`` too as a + dependency. + +If you are not able to fulfill this requirement please follow the `binary +installation`_. + + +.. _pool-installation: + +Installing the connection pool +------------------------------ + +The :ref:`Psycopg connection pools <connection-pools>` are distributed in a +separate package from the `!psycopg` package itself, in order to allow a +different release cycle. + +In order to use the pool you must install the ``pool`` extra, using ``pip +install "psycopg[pool]"``, or install the `psycopg_pool` package separately, +which would allow to specify the release to install more precisely. + + +Handling dependencies +--------------------- + +If you need to specify your project dependencies (for instance in a +``requirements.txt`` file, ``setup.py``, ``pyproject.toml`` dependencies...) +you should probably specify one of the following: + +- If your project is a library, add a dependency on ``psycopg``. This will + make sure that your library will have the ``psycopg`` package with the right + interface and leaves the possibility of choosing a specific implementation + to the end user of your library. + +- If your project is a final application (e.g. a service running on a server) + you can require a specific implementation, for instance ``psycopg[c]``, + after you have made sure that the prerequisites are met (e.g. the depending + libraries and tools are installed in the host machine). + +In both cases you can specify which version of Psycopg to use using +`requirement specifiers`__. + +.. __: https://pip.pypa.io/en/stable/cli/pip_install/#requirement-specifiers + +If you want to make sure that a specific implementation is used you can +specify the :envvar:`PSYCOPG_IMPL` environment variable: importing the library +will fail if the implementation specified is not available. See :ref:`pq-impl`. diff --git a/docs/basic/params.rst b/docs/basic/params.rst new file mode 100644 index 0000000..a733f07 --- /dev/null +++ b/docs/basic/params.rst @@ -0,0 +1,242 @@ +.. currentmodule:: psycopg + +.. index:: + pair: Query; Parameters + +.. _query-parameters: + +Passing parameters to SQL queries +================================= + +Most of the times, writing a program you will have to mix bits of SQL +statements with values provided by the rest of the program: + +.. code:: + + SELECT some, fields FROM some_table WHERE id = ... + +:sql:`id` equals what? Probably you will have a Python value you are looking +for. + + +`!execute()` arguments +---------------------- + +Passing parameters to a SQL statement happens in functions such as +`Cursor.execute()` by using ``%s`` placeholders in the SQL statement, and +passing a sequence of values as the second argument of the function. For +example the Python function call: + +.. code:: python + + cur.execute(""" + INSERT INTO some_table (id, created_at, last_name) + VALUES (%s, %s, %s); + """, + (10, datetime.date(2020, 11, 18), "O'Reilly")) + +is *roughly* equivalent to the SQL command: + +.. code-block:: sql + + INSERT INTO some_table (id, created_at, last_name) + VALUES (10, '2020-11-18', 'O''Reilly'); + +Note that the parameters will not be really merged to the query: query and the +parameters are sent to the server separately: see :ref:`server-side-binding` +for details. + +Named arguments are supported too using :samp:`%({name})s` placeholders in the +query and specifying the values into a mapping. Using named arguments allows +to specify the values in any order and to repeat the same value in several +places in the query:: + + cur.execute(""" + INSERT INTO some_table (id, created_at, updated_at, last_name) + VALUES (%(id)s, %(created)s, %(created)s, %(name)s); + """, + {'id': 10, 'name': "O'Reilly", 'created': datetime.date(2020, 11, 18)}) + +Using characters ``%``, ``(``, ``)`` in the argument names is not supported. + +When parameters are used, in order to include a literal ``%`` in the query you +can use the ``%%`` string:: + + cur.execute("SELECT (%s % 2) = 0 AS even", (10,)) # WRONG + cur.execute("SELECT (%s %% 2) = 0 AS even", (10,)) # correct + +While the mechanism resembles regular Python strings manipulation, there are a +few subtle differences you should care about when passing parameters to a +query. + +- The Python string operator ``%`` *must not be used*: the `~cursor.execute()` + method accepts a tuple or dictionary of values as second parameter. + |sql-warn|__: + + .. |sql-warn| replace:: **Never** use ``%`` or ``+`` to merge values + into queries + + .. code:: python + + cur.execute("INSERT INTO numbers VALUES (%s, %s)" % (10, 20)) # WRONG + cur.execute("INSERT INTO numbers VALUES (%s, %s)", (10, 20)) # correct + + .. __: sql-injection_ + +- For positional variables binding, *the second argument must always be a + sequence*, even if it contains a single variable (remember that Python + requires a comma to create a single element tuple):: + + cur.execute("INSERT INTO foo VALUES (%s)", "bar") # WRONG + cur.execute("INSERT INTO foo VALUES (%s)", ("bar")) # WRONG + cur.execute("INSERT INTO foo VALUES (%s)", ("bar",)) # correct + cur.execute("INSERT INTO foo VALUES (%s)", ["bar"]) # correct + +- The placeholder *must not be quoted*:: + + cur.execute("INSERT INTO numbers VALUES ('%s')", ("Hello",)) # WRONG + cur.execute("INSERT INTO numbers VALUES (%s)", ("Hello",)) # correct + +- The variables placeholder *must always be a* ``%s``, even if a different + placeholder (such as a ``%d`` for integers or ``%f`` for floats) may look + more appropriate for the type. You may find other placeholders used in + Psycopg queries (``%b`` and ``%t``) but they are not related to the + type of the argument: see :ref:`binary-data` if you want to read more:: + + cur.execute("INSERT INTO numbers VALUES (%d)", (10,)) # WRONG + cur.execute("INSERT INTO numbers VALUES (%s)", (10,)) # correct + +- Only query values should be bound via this method: it shouldn't be used to + merge table or field names to the query. If you need to generate SQL queries + dynamically (for instance choosing a table name at runtime) you can use the + functionalities provided in the `psycopg.sql` module:: + + cur.execute("INSERT INTO %s VALUES (%s)", ('numbers', 10)) # WRONG + cur.execute( # correct + SQL("INSERT INTO {} VALUES (%s)").format(Identifier('numbers')), + (10,)) + + +.. index:: Security, SQL injection + +.. _sql-injection: + +Danger: SQL injection +--------------------- + +The SQL representation of many data types is often different from their Python +string representation. The typical example is with single quotes in strings: +in SQL single quotes are used as string literal delimiters, so the ones +appearing inside the string itself must be escaped, whereas in Python single +quotes can be left unescaped if the string is delimited by double quotes. + +Because of the difference, sometimes subtle, between the data types +representations, a naïve approach to query strings composition, such as using +Python strings concatenation, is a recipe for *terrible* problems:: + + SQL = "INSERT INTO authors (name) VALUES ('%s')" # NEVER DO THIS + data = ("O'Reilly", ) + cur.execute(SQL % data) # THIS WILL FAIL MISERABLY + # SyntaxError: syntax error at or near "Reilly" + +If the variables containing the data to send to the database come from an +untrusted source (such as data coming from a form on a web site) an attacker +could easily craft a malformed string, either gaining access to unauthorized +data or performing destructive operations on the database. This form of attack +is called `SQL injection`_ and is known to be one of the most widespread forms +of attack on database systems. Before continuing, please print `this page`__ +as a memo and hang it onto your desk. + +.. _SQL injection: https://en.wikipedia.org/wiki/SQL_injection +.. __: https://xkcd.com/327/ + +Psycopg can :ref:`automatically convert Python objects to SQL +values<types-adaptation>`: using this feature your code will be more robust +and reliable. We must stress this point: + +.. warning:: + + - Don't manually merge values to a query: hackers from a foreign country + will break into your computer and steal not only your disks, but also + your cds, leaving you only with the three most embarrassing records you + ever bought. On cassette tapes. + + - If you use the ``%`` operator to merge values to a query, con artists + will seduce your cat, who will run away taking your credit card + and your sunglasses with them. + + - If you use ``+`` to merge a textual value to a string, bad guys in + balaclava will find their way to your fridge, drink all your beer, and + leave your toilet seat up and your toilet paper in the wrong orientation. + + - You don't want to manually merge values to a query: :ref:`use the + provided methods <query-parameters>` instead. + +The correct way to pass variables in a SQL command is using the second +argument of the `Cursor.execute()` method:: + + SQL = "INSERT INTO authors (name) VALUES (%s)" # Note: no quotes + data = ("O'Reilly", ) + cur.execute(SQL, data) # Note: no % operator + +.. note:: + + Python static code checkers are not quite there yet, but, in the future, + it will be possible to check your code for improper use of string + expressions in queries. See :ref:`literal-string` for details. + +.. seealso:: + + Now that you know how to pass parameters to queries, you can take a look + at :ref:`how Psycopg converts data types <types-adaptation>`. + + +.. index:: + pair: Binary; Parameters + +.. _binary-data: + +Binary parameters and results +----------------------------- + +PostgreSQL has two different ways to transmit data between client and server: +`~psycopg.pq.Format.TEXT`, always available, and `~psycopg.pq.Format.BINARY`, +available most of the times but not always. Usually the binary format is more +efficient to use. + +Psycopg can support both formats for each data type. Whenever a value +is passed to a query using the normal ``%s`` placeholder, the best format +available is chosen (often, but not always, the binary format is picked as the +best choice). + +If you have a reason to select explicitly the binary format or the text format +for a value you can use respectively a ``%b`` placeholder or a ``%t`` +placeholder instead of the normal ``%s``. `~Cursor.execute()` will fail if a +`~psycopg.adapt.Dumper` for the right data type and format is not available. + +The same two formats, text or binary, are used by PostgreSQL to return data +from a query to the client. Unlike with parameters, where you can choose the +format value-by-value, all the columns returned by a query will have the same +format. Every type returned by the query should have a `~psycopg.adapt.Loader` +configured, otherwise the data will be returned as unparsed `!str` (for text +results) or buffer (for binary results). + +.. note:: + The `pg_type`_ table defines which format is supported for each PostgreSQL + data type. Text input/output is managed by the functions declared in the + ``typinput`` and ``typoutput`` fields (always present), binary + input/output is managed by the ``typsend`` and ``typreceive`` (which are + optional). + + .. _pg_type: https://www.postgresql.org/docs/current/catalog-pg-type.html + +Because not every PostgreSQL type supports binary output, by default, the data +will be returned in text format. In order to return data in binary format you +can create the cursor using `Connection.cursor`\ `!(binary=True)` or execute +the query using `Cursor.execute`\ `!(binary=True)`. A case in which +requesting binary results is a clear winner is when you have large binary data +in the database, such as images:: + + cur.execute( + "SELECT image_data FROM images WHERE id = %s", [image_id], binary=True) + data = cur.fetchone()[0] diff --git a/docs/basic/pgtypes.rst b/docs/basic/pgtypes.rst new file mode 100644 index 0000000..14ee5be --- /dev/null +++ b/docs/basic/pgtypes.rst @@ -0,0 +1,389 @@ +.. currentmodule:: psycopg + +.. index:: + single: Adaptation + pair: Objects; Adaptation + single: Data types; Adaptation + +.. _extra-adaptation: + +Adapting other PostgreSQL types +=============================== + +PostgreSQL offers other data types which don't map to native Python types. +Psycopg offers wrappers and conversion functions to allow their use. + + +.. index:: + pair: Composite types; Data types + pair: tuple; Adaptation + pair: namedtuple; Adaptation + +.. _adapt-composite: + +Composite types casting +----------------------- + +Psycopg can adapt PostgreSQL composite types (either created with the |CREATE +TYPE|_ command or implicitly defined after a table row type) to and from +Python tuples, `~collections.namedtuple`, or any other suitable object +configured. + +.. |CREATE TYPE| replace:: :sql:`CREATE TYPE` +.. _CREATE TYPE: https://www.postgresql.org/docs/current/static/sql-createtype.html + +Before using a composite type it is necessary to get information about it +using the `~psycopg.types.composite.CompositeInfo` class and to register it +using `~psycopg.types.composite.register_composite()`. + +.. autoclass:: psycopg.types.composite.CompositeInfo + + `!CompositeInfo` is a `~psycopg.types.TypeInfo` subclass: check its + documentation for the generic usage, especially the + `~psycopg.types.TypeInfo.fetch()` method. + + .. attribute:: python_type + + After `register_composite()` is called, it will contain the python type + mapping to the registered composite. + +.. autofunction:: psycopg.types.composite.register_composite + + After registering, fetching data of the registered composite will invoke + `!factory` to create corresponding Python objects. + + If no factory is specified, a `~collection.namedtuple` is created and used + to return data. + + If the `!factory` is a type (and not a generic callable), then dumpers for + that type are created and registered too, so that passing objects of that + type to a query will adapt them to the registered type. + +Example:: + + >>> from psycopg.types.composite import CompositeInfo, register_composite + + >>> conn.execute("CREATE TYPE card AS (value int, suit text)") + + >>> info = CompositeInfo.fetch(conn, "card") + >>> register_composite(info, conn) + + >>> my_card = info.python_type(8, "hearts") + >>> my_card + card(value=8, suit='hearts') + + >>> conn.execute( + ... "SELECT pg_typeof(%(card)s), (%(card)s).suit", {"card": my_card} + ... ).fetchone() + ('card', 'hearts') + + >>> conn.execute("SELECT (%s, %s)::card", [1, "spades"]).fetchone()[0] + card(value=1, suit='spades') + + +Nested composite types are handled as expected, provided that the type of the +composite components are registered as well:: + + >>> conn.execute("CREATE TYPE card_back AS (face card, back text)") + + >>> info2 = CompositeInfo.fetch(conn, "card_back") + >>> register_composite(info2, conn) + + >>> conn.execute("SELECT ((8, 'hearts'), 'blue')::card_back").fetchone()[0] + card_back(face=card(value=8, suit='hearts'), back='blue') + + +.. index:: + pair: range; Data types + +.. _adapt-range: + +Range adaptation +---------------- + +PostgreSQL `range types`__ are a family of data types representing a range of +values between two elements. The type of the element is called the range +*subtype*. PostgreSQL offers a few built-in range types and allows the +definition of custom ones. + +.. __: https://www.postgresql.org/docs/current/rangetypes.html + +All the PostgreSQL range types are loaded as the `~psycopg.types.range.Range` +Python type, which is a `~typing.Generic` type and can hold bounds of +different types. + +.. autoclass:: psycopg.types.range.Range + + This Python type is only used to pass and retrieve range values to and + from PostgreSQL and doesn't attempt to replicate the PostgreSQL range + features: it doesn't perform normalization and doesn't implement all the + operators__ supported by the database. + + PostgreSQL will perform normalisation on `!Range` objects used as query + parameters, so, when they are fetched back, they will be found in the + normal form (for instance ranges on integers will have `[)` bounds). + + .. __: https://www.postgresql.org/docs/current/static/functions-range.html#RANGE-OPERATORS-TABLE + + `!Range` objects are immutable, hashable, and support the `!in` operator + (checking if an element is within the range). They can be tested for + equivalence. Empty ranges evaluate to `!False` in a boolean context, + nonempty ones evaluate to `!True`. + + `!Range` objects have the following attributes: + + .. autoattribute:: isempty + .. autoattribute:: lower + .. autoattribute:: upper + .. autoattribute:: lower_inc + .. autoattribute:: upper_inc + .. autoattribute:: lower_inf + .. autoattribute:: upper_inf + +The built-in range objects are adapted automatically: if a `!Range` objects +contains `~datetime.date` bounds, it is dumped using the :sql:`daterange` OID, +and of course :sql:`daterange` values are loaded back as `!Range[date]`. + +If you create your own range type you can use `~psycopg.types.range.RangeInfo` +and `~psycopg.types.range.register_range()` to associate the range type with +its subtype and make it work like the builtin ones. + +.. autoclass:: psycopg.types.range.RangeInfo + + `!RangeInfo` is a `~psycopg.types.TypeInfo` subclass: check its + documentation for generic details, especially the + `~psycopg.types.TypeInfo.fetch()` method. + +.. autofunction:: psycopg.types.range.register_range + +Example:: + + >>> from psycopg.types.range import Range, RangeInfo, register_range + + >>> conn.execute("CREATE TYPE strrange AS RANGE (SUBTYPE = text)") + >>> info = RangeInfo.fetch(conn, "strrange") + >>> register_range(info, conn) + + >>> conn.execute("SELECT pg_typeof(%s)", [Range("a", "z")]).fetchone()[0] + 'strrange' + + >>> conn.execute("SELECT '[a,z]'::strrange").fetchone()[0] + Range('a', 'z', '[]') + + +.. index:: + pair: range; Data types + +.. _adapt-multirange: + +Multirange adaptation +--------------------- + +Since PostgreSQL 14, every range type is associated with a multirange__, a +type representing a disjoint set of ranges. A multirange is +automatically available for every range, built-in and user-defined. + +.. __: https://www.postgresql.org/docs/current/rangetypes.html + +All the PostgreSQL range types are loaded as the +`~psycopg.types.multirange.Multirange` Python type, which is a mutable +sequence of `~psycopg.types.range.Range` elements. + +.. autoclass:: psycopg.types.multirange.Multirange + + This Python type is only used to pass and retrieve multirange values to + and from PostgreSQL and doesn't attempt to replicate the PostgreSQL + multirange features: overlapping items are not merged, empty ranges are + not discarded, the items are not ordered, the behaviour of `multirange + operators`__ is not replicated in Python. + + PostgreSQL will perform normalisation on `!Multirange` objects used as + query parameters, so, when they are fetched back, they will be found + ordered, with overlapping ranges merged, etc. + + .. __: https://www.postgresql.org/docs/current/static/functions-range.html#MULTIRANGE-OPERATORS-TABLE + + `!Multirange` objects are a `~collections.abc.MutableSequence` and are + totally ordered: they behave pretty much like a list of `!Range`. Like + Range, they are `~typing.Generic` on the subtype of their range, so you + can declare a variable to be `!Multirange[date]` and mypy will complain if + you try to add it a `Range[Decimal]`. + +Like for `~psycopg.types.range.Range`, built-in multirange objects are adapted +automatically: if a `!Multirange` object contains `!Range` with +`~datetime.date` bounds, it is dumped using the :sql:`datemultirange` OID, and +:sql:`datemultirange` values are loaded back as `!Multirange[date]`. + +If you have created your own range type you can use +`~psycopg.types.multirange.MultirangeInfo` and +`~psycopg.types.multirange.register_multirange()` to associate the resulting +multirange type with its subtype and make it work like the builtin ones. + +.. autoclass:: psycopg.types.multirange.MultirangeInfo + + `!MultirangeInfo` is a `~psycopg.types.TypeInfo` subclass: check its + documentation for generic details, especially the + `~psycopg.types.TypeInfo.fetch()` method. + +.. autofunction:: psycopg.types.multirange.register_multirange + +Example:: + + >>> from psycopg.types.multirange import \ + ... Multirange, MultirangeInfo, register_multirange + >>> from psycopg.types.range import Range + + >>> conn.execute("CREATE TYPE strrange AS RANGE (SUBTYPE = text)") + >>> info = MultirangeInfo.fetch(conn, "strmultirange") + >>> register_multirange(info, conn) + + >>> rec = conn.execute( + ... "SELECT pg_typeof(%(mr)s), %(mr)s", + ... {"mr": Multirange([Range("a", "q"), Range("l", "z")])}).fetchone() + + >>> rec[0] + 'strmultirange' + >>> rec[1] + Multirange([Range('a', 'z', '[)')]) + + +.. index:: + pair: hstore; Data types + pair: dict; Adaptation + +.. _adapt-hstore: + +Hstore adaptation +----------------- + +The |hstore|_ data type is a key-value store embedded in PostgreSQL. It +supports GiST or GIN indexes allowing search by keys or key/value pairs as +well as regular BTree indexes for equality, uniqueness etc. + +.. |hstore| replace:: :sql:`hstore` +.. _hstore: https://www.postgresql.org/docs/current/static/hstore.html + +Psycopg can convert Python `!dict` objects to and from |hstore| structures. +Only dictionaries with string keys and values are supported. `!None` is also +allowed as value but not as a key. + +In order to use the |hstore| data type it is necessary to load it in a +database using: + +.. code:: none + + =# CREATE EXTENSION hstore; + +Because |hstore| is distributed as a contrib module, its oid is not well +known, so it is necessary to use `!TypeInfo`\.\ +`~psycopg.types.TypeInfo.fetch()` to query the database and get its oid. The +resulting object can be passed to +`~psycopg.types.hstore.register_hstore()` to configure dumping `!dict` to +|hstore| and parsing |hstore| back to `!dict`, in the context where the +adapter is registered. + +.. autofunction:: psycopg.types.hstore.register_hstore + +Example:: + + >>> from psycopg.types import TypeInfo + >>> from psycopg.types.hstore import register_hstore + + >>> info = TypeInfo.fetch(conn, "hstore") + >>> register_hstore(info, conn) + + >>> conn.execute("SELECT pg_typeof(%s)", [{"a": "b"}]).fetchone()[0] + 'hstore' + + >>> conn.execute("SELECT 'foo => bar'::hstore").fetchone()[0] + {'foo': 'bar'} + + +.. index:: + pair: geometry; Data types + single: PostGIS; Data types + +.. _adapt-shapely: + +Geometry adaptation using Shapely +--------------------------------- + +When using the PostGIS_ extension, it can be useful to retrieve geometry_ +values and have them automatically converted to Shapely_ instances. Likewise, +you may want to store such instances in the database and have the conversion +happen automatically. + +.. warning:: + Psycopg doesn't have a dependency on the ``shapely`` package: you should + install the library as an additional dependency of your project. + +.. warning:: + This module is experimental and might be changed in the future according + to users' feedback. + +.. _PostGIS: https://postgis.net/ +.. _geometry: https://postgis.net/docs/geometry.html +.. _Shapely: https://github.com/Toblerity/Shapely +.. _shape: https://shapely.readthedocs.io/en/stable/manual.html#shapely.geometry.shape + +Since PostgGIS is an extension, the :sql:`geometry` type oid is not well +known, so it is necessary to use `!TypeInfo`\.\ +`~psycopg.types.TypeInfo.fetch()` to query the database and find it. The +resulting object can be passed to `~psycopg.types.shapely.register_shapely()` +to configure dumping `shape`_ instances to :sql:`geometry` columns and parsing +:sql:`geometry` data back to `!shape` instances, in the context where the +adapters are registered. + +.. function:: psycopg.types.shapely.register_shapely + + Register Shapely dumper and loaders. + + After invoking this function on an adapter, the queries retrieving + PostGIS geometry objects will return Shapely's shape object instances + both in text and binary mode. + + Similarly, shape objects can be sent to the database. + + This requires the Shapely library to be installed. + + :param info: The object with the information about the geometry type. + :param context: The context where to register the adapters. If `!None`, + register it globally. + + .. note:: + + Registering the adapters doesn't affect objects already created, even + if they are children of the registered context. For instance, + registering the adapter globally doesn't affect already existing + connections. + +Example:: + + >>> from psycopg.types import TypeInfo + >>> from psycopg.types.shapely import register_shapely + >>> from shapely.geometry import Point + + >>> info = TypeInfo.fetch(conn, "geometry") + >>> register_shapely(info, conn) + + >>> conn.execute("SELECT pg_typeof(%s)", [Point(1.2, 3.4)]).fetchone()[0] + 'geometry' + + >>> conn.execute(""" + ... SELECT ST_GeomFromGeoJSON('{ + ... "type":"Point", + ... "coordinates":[-48.23456,20.12345]}') + ... """).fetchone()[0] + <shapely.geometry.multipolygon.MultiPolygon object at 0x7fb131f3cd90> + +Notice that, if the geometry adapters are registered on a specific object (a +connection or cursor), other connections and cursors will be unaffected:: + + >>> conn2 = psycopg.connect(CONN_STR) + >>> conn2.execute(""" + ... SELECT ST_GeomFromGeoJSON('{ + ... "type":"Point", + ... "coordinates":[-48.23456,20.12345]}') + ... """).fetchone()[0] + '0101000020E61000009279E40F061E48C0F2B0506B9A1F3440' + diff --git a/docs/basic/transactions.rst b/docs/basic/transactions.rst new file mode 100644 index 0000000..b976046 --- /dev/null +++ b/docs/basic/transactions.rst @@ -0,0 +1,388 @@ +.. currentmodule:: psycopg + +.. index:: Transactions management +.. index:: InFailedSqlTransaction +.. index:: idle in transaction + +.. _transactions: + +Transactions management +======================= + +Psycopg has a behaviour that may seem surprising compared to +:program:`psql`: by default, any database operation will start a new +transaction. As a consequence, changes made by any cursor of the connection +will not be visible until `Connection.commit()` is called, and will be +discarded by `Connection.rollback()`. The following operation on the same +connection will start a new transaction. + +If a database operation fails, the server will refuse further commands, until +a `~rollback()` is called. + +If the cursor is closed with a transaction open, no COMMIT command is sent to +the server, which will then discard the connection. Certain middleware (such +as PgBouncer) will also discard a connection left in transaction state, so, if +possible you will want to commit or rollback a connection before finishing +working with it. + +An example of what will happen, the first time you will use Psycopg (and to be +disappointed by it), is likely: + +.. code:: python + + conn = psycopg.connect() + + # Creating a cursor doesn't start a transaction or affect the connection + # in any way. + cur = conn.cursor() + + cur.execute("SELECT count(*) FROM my_table") + # This function call executes: + # - BEGIN + # - SELECT count(*) FROM my_table + # So now a transaction has started. + + # If your program spends a long time in this state, the server will keep + # a connection "idle in transaction", which is likely something undesired + + cur.execute("INSERT INTO data VALUES (%s)", ("Hello",)) + # This statement is executed inside the transaction + + conn.close() + # No COMMIT was sent: the INSERT was discarded. + +There are a few things going wrong here, let's see how they can be improved. + +One obvious problem after the run above is that, firing up :program:`psql`, +you will see no new record in the table ``data``. One way to fix the problem +is to call `!conn.commit()` before closing the connection. Thankfully, if you +use the :ref:`connection context <with-connection>`, Psycopg will commit the +connection at the end of the block (or roll it back if the block is exited +with an exception): + +The code modified using a connection context will result in the following +sequence of database statements: + +.. code-block:: python + :emphasize-lines: 1 + + with psycopg.connect() as conn: + + cur = conn.cursor() + + cur.execute("SELECT count(*) FROM my_table") + # This function call executes: + # - BEGIN + # - SELECT count(*) FROM my_table + # So now a transaction has started. + + cur.execute("INSERT INTO data VALUES (%s)", ("Hello",)) + # This statement is executed inside the transaction + + # No exception at the end of the block: + # COMMIT is executed. + +This way we don't have to remember to call neither `!close()` nor `!commit()` +and the database operations actually have a persistent effect. The code might +still do something you don't expect: keep a transaction from the first +operation to the connection closure. You can have a finer control over the +transactions using an :ref:`autocommit transaction <autocommit>` and/or +:ref:`transaction contexts <transaction-context>`. + +.. warning:: + + By default even a simple :sql:`SELECT` will start a transaction: in + long-running programs, if no further action is taken, the session will + remain *idle in transaction*, an undesirable condition for several + reasons (locks are held by the session, tables bloat...). For long lived + scripts, either make sure to terminate a transaction as soon as possible or + use an `~Connection.autocommit` connection. + +.. hint:: + + If a database operation fails with an error message such as + *InFailedSqlTransaction: current transaction is aborted, commands ignored + until end of transaction block*, it means that **a previous operation + failed** and the database session is in a state of error. You need to call + `~Connection.rollback()` if you want to keep on using the same connection. + + +.. _autocommit: + +Autocommit transactions +----------------------- + +The manual commit requirement can be suspended using `~Connection.autocommit`, +either as connection attribute or as `~psycopg.Connection.connect()` +parameter. This may be required to run operations that cannot be executed +inside a transaction, such as :sql:`CREATE DATABASE`, :sql:`VACUUM`, +:sql:`CALL` on `stored procedures`__ using transaction control. + +.. __: https://www.postgresql.org/docs/current/xproc.html + +With an autocommit transaction, the above sequence of operation results in: + +.. code-block:: python + :emphasize-lines: 1 + + with psycopg.connect(autocommit=True) as conn: + + cur = conn.cursor() + + cur.execute("SELECT count(*) FROM my_table") + # This function call now only executes: + # - SELECT count(*) FROM my_table + # and no transaction starts. + + cur.execute("INSERT INTO data VALUES (%s)", ("Hello",)) + # The result of this statement is persisted immediately by the database + + # The connection is closed at the end of the block but, because it is not + # in a transaction state, no COMMIT is executed. + +An autocommit transaction behaves more as someone coming from :program:`psql` +would expect. This has a beneficial performance effect, because less queries +are sent and less operations are performed by the database. The statements, +however, are not executed in an atomic transaction; if you need to execute +certain operations inside a transaction, you can achieve that with an +autocommit connection too, using an explicit :ref:`transaction block +<transaction-context>`. + + +.. _transaction-context: + +Transaction contexts +-------------------- + +A more transparent way to make sure that transactions are finalised at the +right time is to use `!with` `Connection.transaction()` to create a +transaction context. When the context is entered, a transaction is started; +when leaving the context the transaction is committed, or it is rolled back if +an exception is raised inside the block. + +Continuing the example above, if you want to use an autocommit connection but +still wrap selected groups of commands inside an atomic transaction, you can +use a `!transaction()` context: + +.. code-block:: python + :emphasize-lines: 8 + + with psycopg.connect(autocommit=True) as conn: + + cur = conn.cursor() + + cur.execute("SELECT count(*) FROM my_table") + # The connection is autocommit, so no BEGIN executed. + + with conn.transaction(): + # BEGIN is executed, a transaction started + + cur.execute("INSERT INTO data VALUES (%s)", ("Hello",)) + cur.execute("INSERT INTO times VALUES (now())") + # These two operation run atomically in the same transaction + + # COMMIT is executed at the end of the block. + # The connection is in idle state again. + + # The connection is closed at the end of the block. + + +Note that connection blocks can also be used with non-autocommit connections: +in this case you still need to pay attention to eventual transactions started +automatically. If an operation starts an implicit transaction, a +`!transaction()` block will only manage :ref:`a savepoint sub-transaction +<nested-transactions>`, leaving the caller to deal with the main transaction, +as explained in :ref:`transactions`: + +.. code:: python + + conn = psycopg.connect() + + cur = conn.cursor() + + cur.execute("SELECT count(*) FROM my_table") + # This function call executes: + # - BEGIN + # - SELECT count(*) FROM my_table + # So now a transaction has started. + + with conn.transaction(): + # The block starts with a transaction already open, so it will execute + # - SAVEPOINT + + cur.execute("INSERT INTO data VALUES (%s)", ("Hello",)) + + # The block was executing a sub-transaction so on exit it will only run: + # - RELEASE SAVEPOINT + # The transaction is still on. + + conn.close() + # No COMMIT was sent: the INSERT was discarded. + +If a `!transaction()` block starts when no transaction is active then it will +manage a proper transaction. In essence, a transaction context tries to leave +a connection in the state it found it, and leaves you to deal with the wider +context. + +.. hint:: + The interaction between non-autocommit transactions and transaction + contexts is probably surprising. Although the non-autocommit default is + what's demanded by the DBAPI, the personal preference of several experienced + developers is to: + + - use a connection block: ``with psycopg.connect(...) as conn``; + - use an autocommit connection, either passing `!autocommit=True` as + `!connect()` parameter or setting the attribute ``conn.autocommit = + True``; + - use `!with conn.transaction()` blocks to manage transactions only where + needed. + + +.. _nested-transactions: + +Nested transactions +^^^^^^^^^^^^^^^^^^^ + +Transaction blocks can be also nested (internal transaction blocks are +implemented using SAVEPOINT__): an exception raised inside an inner block +has a chance of being handled and not completely fail outer operations. The +following is an example where a series of operations interact with the +database: operations are allowed to fail; at the end we also want to store the +number of operations successfully processed. + +.. __: https://www.postgresql.org/docs/current/sql-savepoint.html + +.. code:: python + + with conn.transaction() as tx1: + num_ok = 0 + for operation in operations: + try: + with conn.transaction() as tx2: + unreliable_operation(conn, operation) + except Exception: + logger.exception(f"{operation} failed") + else: + num_ok += 1 + + save_number_of_successes(conn, num_ok) + +If `!unreliable_operation()` causes an error, including an operation causing a +database error, all its changes will be reverted. The exception bubbles up +outside the block: in the example it is intercepted by the `!try` so that the +loop can complete. The outermost block is unaffected (unless other errors +happen there). + +You can also write code to explicitly roll back any currently active +transaction block, by raising the `Rollback` exception. The exception "jumps" +to the end of a transaction block, rolling back its transaction but allowing +the program execution to continue from there. By default the exception rolls +back the innermost transaction block, but any current block can be specified +as the target. In the following example, a hypothetical `!CancelCommand` +may stop the processing and cancel any operation previously performed, +but not entirely committed yet. + +.. code:: python + + from psycopg import Rollback + + with conn.transaction() as outer_tx: + for command in commands(): + with conn.transaction() as inner_tx: + if isinstance(command, CancelCommand): + raise Rollback(outer_tx) + process_command(command) + + # If `Rollback` is raised, it would propagate only up to this block, + # and the program would continue from here with no exception. + + +.. _transaction-characteristics: + +Transaction characteristics +--------------------------- + +You can set `transaction parameters`__ for the transactions that Psycopg +handles. They affect the transactions started implicitly by non-autocommit +transactions and the ones started explicitly by `Connection.transaction()` for +both autocommit and non-autocommit transactions. Leaving these parameters as +`!None` will use the server's default behaviour (which is controlled +by server settings such as default_transaction_isolation__). + +.. __: https://www.postgresql.org/docs/current/sql-set-transaction.html +.. __: https://www.postgresql.org/docs/current/runtime-config-client.html + #GUC-DEFAULT-TRANSACTION-ISOLATION + +In order to set these parameters you can use the connection attributes +`~Connection.isolation_level`, `~Connection.read_only`, +`~Connection.deferrable`. For async connections you must use the equivalent +`~AsyncConnection.set_isolation_level()` method and similar. The parameters +can only be changed if there isn't a transaction already active on the +connection. + +.. warning:: + + Applications running at `~IsolationLevel.REPEATABLE_READ` or + `~IsolationLevel.SERIALIZABLE` isolation level are exposed to serialization + failures. `In certain concurrent update cases`__, PostgreSQL will raise an + exception looking like:: + + psycopg2.errors.SerializationFailure: could not serialize access + due to concurrent update + + In this case the application must be prepared to repeat the operation that + caused the exception. + + .. __: https://www.postgresql.org/docs/current/transaction-iso.html + #XACT-REPEATABLE-READ + + +.. index:: + pair: Two-phase commit; Transaction + +.. _two-phase-commit: + +Two-Phase Commit protocol support +--------------------------------- + +.. versionadded:: 3.1 + +Psycopg exposes the two-phase commit features available in PostgreSQL +implementing the `two-phase commit extensions`__ proposed by the DBAPI. + +The DBAPI model of two-phase commit is inspired by the `XA specification`__, +according to which transaction IDs are formed from three components: + +- a format ID (non-negative 32 bit integer) +- a global transaction ID (string not longer than 64 bytes) +- a branch qualifier (string not longer than 64 bytes) + +For a particular global transaction, the first two components will be the same +for all the resources. Every resource will be assigned a different branch +qualifier. + +According to the DBAPI specification, a transaction ID is created using the +`Connection.xid()` method. Once you have a transaction id, a distributed +transaction can be started with `Connection.tpc_begin()`, prepared using +`~Connection.tpc_prepare()` and completed using `~Connection.tpc_commit()` or +`~Connection.tpc_rollback()`. Transaction IDs can also be retrieved from the +database using `~Connection.tpc_recover()` and completed using the above +`!tpc_commit()` and `!tpc_rollback()`. + +PostgreSQL doesn't follow the XA standard though, and the ID for a PostgreSQL +prepared transaction can be any string up to 200 characters long. Psycopg's +`Xid` objects can represent both XA-style transactions IDs (such as the ones +created by the `!xid()` method) and PostgreSQL transaction IDs identified by +an unparsed string. + +The format in which the Xids are converted into strings passed to the +database is the same employed by the `PostgreSQL JDBC driver`__: this should +allow interoperation between tools written in Python and in Java. For example +a recovery tool written in Python would be able to recognize the components of +transactions produced by a Java program. + +For further details see the documentation for the :ref:`tpc-methods`. + +.. __: https://www.python.org/dev/peps/pep-0249/#optional-two-phase-commit-extensions +.. __: https://publications.opengroup.org/c193 +.. __: https://jdbc.postgresql.org/ diff --git a/docs/basic/usage.rst b/docs/basic/usage.rst new file mode 100644 index 0000000..6c69fe8 --- /dev/null +++ b/docs/basic/usage.rst @@ -0,0 +1,232 @@ +.. currentmodule:: psycopg + +.. _module-usage: + +Basic module usage +================== + +The basic Psycopg usage is common to all the database adapters implementing +the `DB-API`__ protocol. Other database adapters, such as the builtin +`sqlite3` or `psycopg2`, have roughly the same pattern of interaction. + +.. __: https://www.python.org/dev/peps/pep-0249/ + + +.. index:: + pair: Example; Usage + +.. _usage: + +Main objects in Psycopg 3 +------------------------- + +Here is an interactive session showing some of the basic commands: + +.. code:: python + + # Note: the module name is psycopg, not psycopg3 + import psycopg + + # Connect to an existing database + with psycopg.connect("dbname=test user=postgres") as conn: + + # Open a cursor to perform database operations + with conn.cursor() as cur: + + # Execute a command: this creates a new table + cur.execute(""" + CREATE TABLE test ( + id serial PRIMARY KEY, + num integer, + data text) + """) + + # Pass data to fill a query placeholders and let Psycopg perform + # the correct conversion (no SQL injections!) + cur.execute( + "INSERT INTO test (num, data) VALUES (%s, %s)", + (100, "abc'def")) + + # Query the database and obtain data as Python objects. + cur.execute("SELECT * FROM test") + cur.fetchone() + # will return (1, 100, "abc'def") + + # You can use `cur.fetchmany()`, `cur.fetchall()` to return a list + # of several records, or even iterate on the cursor + for record in cur: + print(record) + + # Make the changes to the database persistent + conn.commit() + + +In the example you can see some of the main objects and methods and how they +relate to each other: + +- The function `~Connection.connect()` creates a new database session and + returns a new `Connection` instance. `AsyncConnection.connect()` + creates an `asyncio` connection instead. + +- The `~Connection` class encapsulates a database session. It allows to: + + - create new `~Cursor` instances using the `~Connection.cursor()` method to + execute database commands and queries, + + - terminate transactions using the methods `~Connection.commit()` or + `~Connection.rollback()`. + +- The class `~Cursor` allows interaction with the database: + + - send commands to the database using methods such as `~Cursor.execute()` + and `~Cursor.executemany()`, + + - retrieve data from the database, iterating on the cursor or using methods + such as `~Cursor.fetchone()`, `~Cursor.fetchmany()`, `~Cursor.fetchall()`. + +- Using these objects as context managers (i.e. using `!with`) will make sure + to close them and free their resources at the end of the block (notice that + :ref:`this is different from psycopg2 <diff-with>`). + + +.. seealso:: + + A few important topics you will have to deal with are: + + - :ref:`query-parameters`. + - :ref:`types-adaptation`. + - :ref:`transactions`. + + +Shortcuts +--------- + +The pattern above is familiar to `!psycopg2` users. However, Psycopg 3 also +exposes a few simple extensions which make the above pattern leaner: + +- the `Connection` objects exposes an `~Connection.execute()` method, + equivalent to creating a cursor, calling its `~Cursor.execute()` method, and + returning it. + + .. code:: + + # In Psycopg 2 + cur = conn.cursor() + cur.execute(...) + + # In Psycopg 3 + cur = conn.execute(...) + +- The `Cursor.execute()` method returns `!self`. This means that you can chain + a fetch operation, such as `~Cursor.fetchone()`, to the `!execute()` call: + + .. code:: + + # In Psycopg 2 + cur.execute(...) + record = cur.fetchone() + + cur.execute(...) + for record in cur: + ... + + # In Psycopg 3 + record = cur.execute(...).fetchone() + + for record in cur.execute(...): + ... + +Using them together, in simple cases, you can go from creating a connection to +using a result in a single expression: + +.. code:: + + print(psycopg.connect(DSN).execute("SELECT now()").fetchone()[0]) + # 2042-07-12 18:15:10.706497+01:00 + + +.. index:: + pair: Connection; `!with` + +.. _with-connection: + +Connection context +------------------ + +Psycopg 3 `Connection` can be used as a context manager: + +.. code:: python + + with psycopg.connect() as conn: + ... # use the connection + + # the connection is now closed + +When the block is exited, if there is a transaction open, it will be +committed. If an exception is raised within the block the transaction is +rolled back. In both cases the connection is closed. It is roughly the +equivalent of: + +.. code:: python + + conn = psycopg.connect() + try: + ... # use the connection + except BaseException: + conn.rollback() + else: + conn.commit() + finally: + conn.close() + +.. note:: + This behaviour is not what `!psycopg2` does: in `!psycopg2` :ref:`there is + no final close() <pg2:with>` and the connection can be used in several + `!with` statements to manage different transactions. This behaviour has + been considered non-standard and surprising so it has been replaced by the + more explicit `~Connection.transaction()` block. + +Note that, while the above pattern is what most people would use, `connect()` +doesn't enter a block itself, but returns an "un-entered" connection, so that +it is still possible to use a connection regardless of the code scope and the +developer is free to use (and responsible for calling) `~Connection.commit()`, +`~Connection.rollback()`, `~Connection.close()` as and where needed. + +.. warning:: + If a connection is just left to go out of scope, the way it will behave + with or without the use of a `!with` block is different: + + - if the connection is used without a `!with` block, the server will find + a connection closed INTRANS and roll back the current transaction; + + - if the connection is used with a `!with` block, there will be an + explicit COMMIT and the operations will be finalised. + + You should use a `!with` block when your intention is just to execute a + set of operations and then committing the result, which is the most usual + thing to do with a connection. If your connection life cycle and + transaction pattern is different, and want more control on it, the use + without `!with` might be more convenient. + + See :ref:`transactions` for more information. + +`AsyncConnection` can be also used as context manager, using ``async with``, +but be careful about its quirkiness: see :ref:`async-with` for details. + + +Adapting pyscopg to your program +-------------------------------- + +The above :ref:`pattern of use <usage>` only shows the default behaviour of +the adapter. Psycopg can be customised in several ways, to allow the smoothest +integration between your Python program and your PostgreSQL database: + +- If your program is concurrent and based on `asyncio` instead of on + threads/processes, you can use :ref:`async connections and cursors <async>`. + +- If you want to customise the objects that the cursor returns, instead of + receiving tuples, you can specify your :ref:`row factories <row-factories>`. + +- If you want to customise how Python values and PostgreSQL types are mapped + into each other, beside the :ref:`basic type mapping <types-adaptation>`, + you can :ref:`configure your types <adaptation>`. |