Adding upstream version 3.1.7.upstream/3.1.7 upstream

Signed-off-by: Daniel Baumann <daniel.baumann@progress-linux.org>
author: Daniel Baumann <daniel.baumann@progress-linux.org> 2024-05-04 17:41:08 +0000
committer: Daniel Baumann <daniel.baumann@progress-linux.org> 2024-05-04 17:41:08 +0000
commit: 506ed8899b3a97e512be3fd6d44d5b11463bf9bf (patch)
tree: 808913770c5e6935d3714058c2a066c57b4632ec /docs/advanced/pipeline.rst
parent: Initial commit. (diff)
download: psycopg3-506ed8899b3a97e512be3fd6d44d5b11463bf9bf.tar.xz
psycopg3-506ed8899b3a97e512be3fd6d44d5b11463bf9bf.zip
1 files changed, 324 insertions, 0 deletions
diff --git a/docs/advanced/pipeline.rst b/docs/advanced/pipeline.rst
new file mode 100644
index 0000000..980fea7
--- /dev/null
+++ b/docs/advanced/pipeline.rst
@@ -0,0 +1,324 @@
+.. currentmodule:: psycopg
+
+.. _pipeline-mode:
+
+Pipeline mode support
+=====================
+
+.. versionadded:: 3.1
+
+The *pipeline mode* allows PostgreSQL client applications to send a query
+without having to read the result of the previously sent query. Taking
+advantage of the pipeline mode, a client will wait less for the server, since
+multiple queries/results can be sent/received in a single network roundtrip.
+Pipeline mode can provide a significant performance boost to the application.
+
+Pipeline mode is most useful when the server is distant, i.e., network latency
+(“ping time”) is high, and also when many small operations are being performed
+in rapid succession. There is usually less benefit in using pipelined commands
+when each query takes many multiples of the client/server round-trip time to
+execute. A 100-statement operation run on a server 300 ms round-trip-time away
+would take 30 seconds in network latency alone without pipelining; with
+pipelining it may spend as little as 0.3 s waiting for results from the
+server.
+
+The server executes statements, and returns results, in the order the client
+sends them. The server will begin executing the commands in the pipeline
+immediately, not waiting for the end of the pipeline. Note that results are
+buffered on the server side; the server flushes that buffer when a
+:ref:`synchronization point <pipeline-sync>` is established.
+
+.. seealso::
+
+    The PostgreSQL documentation about:
+
+    - `pipeline mode`__
+    - `extended query message flow`__
+
+    contains many details around when it is most useful to use the pipeline
+    mode and about errors management and interaction with transactions.
+
+    .. __: https://www.postgresql.org/docs/current/libpq-pipeline-mode.html
+    .. __: https://www.postgresql.org/docs/current/protocol-flow.html#PROTOCOL-FLOW-EXT-QUERY
+
+
+Client-server messages flow
+---------------------------
+
+In order to understand better how the pipeline mode works, we should take a
+closer look at the `PostgreSQL client-server message flow`__.
+
+During normal querying, each statement is transmitted by the client to the
+server as a stream of request messages, terminating with a **Sync** message to
+tell it that it should process the messages sent so far. The server will
+execute the statement and describe the results back as a stream of messages,
+terminating with a **ReadyForQuery**, telling the client that it may now send a
+new query.
+
+For example, the statement (returning no result):
+
+.. code:: python
+
+    conn.execute("INSERT INTO mytable (data) VALUES (%s)", ["hello"])
+
+results in the following two groups of messages:
+
+.. table::
+    :align: left
+
+    +---------------+-----------------------------------------------------------+
+    | Direction     | Message                                                   |
+    +===============+===========================================================+
+    | Python        | - Parse ``INSERT INTO ... (VALUE $1)`` (skipped if        |
+    |               |   :ref:`the statement is prepared <prepared-statements>`) |
+    | |>|           | - Bind ``'hello'``                                        |
+    |               | - Describe                                                |
+    | PostgreSQL    | - Execute                                                 |
+    |               | - Sync                                                    |
+    +---------------+-----------------------------------------------------------+
+    | PostgreSQL    | - ParseComplete                                           |
+    |               | - BindComplete                                            |
+    | |<|           | - NoData                                                  |
+    |               | - CommandComplete ``INSERT 0 1``                          |
+    | Python        | - ReadyForQuery                                           |
+    +---------------+-----------------------------------------------------------+
+
+and the query:
+
+.. code:: python
+
+    conn.execute("SELECT data FROM mytable WHERE id = %s", [1])
+
+results in the two groups of messages:
+
+.. table::
+    :align: left
+
+    +---------------+-----------------------------------------------------------+
+    | Direction     | Message                                                   |
+    +===============+===========================================================+
+    | Python        | - Parse ``SELECT data FROM mytable WHERE id = $1``        |
+    |               | - Bind ``1``                                              |
+    | |>|           | - Describe                                                |
+    |               | - Execute                                                 |
+    | PostgreSQL    | - Sync                                                    |
+    +---------------+-----------------------------------------------------------+
+    | PostgreSQL    | - ParseComplete                                           |
+    |               | - BindComplete                                            |
+    | |<|           | - RowDescription    ``data``                              |
+    |               | - DataRow           ``hello``                             |
+    | Python        | - CommandComplete ``SELECT 1``                            |
+    |               | - ReadyForQuery                                           |
+    +---------------+-----------------------------------------------------------+
+
+The two statements, sent consecutively, pay the communication overhead four
+times, once per leg.
+
+The pipeline mode allows the client to combine several operations in longer
+streams of messages to the server, then to receive more than one response in a
+single batch. If we execute the two operations above in a pipeline:
+
+.. code:: python
+
+    with conn.pipeline():
+        conn.execute("INSERT INTO mytable (data) VALUES (%s)", ["hello"])
+        conn.execute("SELECT data FROM mytable WHERE id = %s", [1])
+
+they will result in a single roundtrip between the client and the server:
+
+.. table::
+    :align: left
+
+    +---------------+-----------------------------------------------------------+
+    | Direction     | Message                                                   |
+    +===============+===========================================================+
+    | Python        | - Parse ``INSERT INTO ... (VALUE $1)``                    |
+    |               | - Bind ``'hello'``                                        |
+    | |>|           | - Describe                                                |
+    |               | - Execute                                                 |
+    | PostgreSQL    | - Parse ``SELECT data FROM mytable WHERE id = $1``        |
+    |               | - Bind ``1``                                              |
+    |               | - Describe                                                |
+    |               | - Execute                                                 |
+    |               | - Sync (sent only once)                                   |
+    +---------------+-----------------------------------------------------------+
+    | PostgreSQL    | - ParseComplete                                           |
+    |               | - BindComplete                                            |
+    | |<|           | - NoData                                                  |
+    |               | - CommandComplete ``INSERT 0 1``                          |
+    | Python        | - ParseComplete                                           |
+    |               | - BindComplete                                            |
+    |               | - RowDescription    ``data``                              |
+    |               | - DataRow           ``hello``                             |
+    |               | - CommandComplete ``SELECT 1``                            |
+    |               | - ReadyForQuery (sent only once)                          |
+    +---------------+-----------------------------------------------------------+
+
+.. |<| unicode:: U+25C0
+.. |>| unicode:: U+25B6
+
+.. __: https://www.postgresql.org/docs/current/protocol-flow.html
+
+
+.. _pipeline-usage:
+
+Pipeline mode usage
+-------------------
+
+Psycopg supports the pipeline mode via the `Connection.pipeline()` method. The
+method is a context manager: entering the ``with`` block yields a `Pipeline`
+object. At the end of block, the connection resumes the normal operation mode.
+
+Within the pipeline block, you can use normally one or more cursors to execute
+several operations, using `Connection.execute()`, `Cursor.execute()` and
+`~Cursor.executemany()`.
+
+.. code:: python
+
+    >>> with conn.pipeline():
+    ...     conn.execute("INSERT INTO mytable VALUES (%s)", ["hello"])
+    ...     with conn.cursor() as cur:
+    ...         cur.execute("INSERT INTO othertable VALUES (%s)", ["world"])
+    ...         cur.executemany(
+    ...             "INSERT INTO elsewhere VALUES (%s)",
+    ...             [("one",), ("two",), ("four",)])
+
+Unlike in normal mode, Psycopg will not wait for the server to receive the
+result of each query; the client will receive results in batches when the
+server flushes it output buffer.
+
+When a flush (or a sync) is performed, all pending results are sent back to
+the cursors which executed them. If a cursor had run more than one query, it
+will receive more than one result; results after the first will be available,
+in their execution order, using `~Cursor.nextset()`:
+
+.. code:: python
+
+    >>> with conn.pipeline():
+    ...     with conn.cursor() as cur:
+    ...        cur.execute("INSERT INTO mytable (data) VALUES (%s) RETURNING *", ["hello"])
+    ...        cur.execute("INSERT INTO mytable (data) VALUES (%s) RETURNING *", ["world"])
+    ...        while True:
+    ...            print(cur.fetchall())
+    ...            if not cur.nextset():
+    ...                break
+
+    [(1, 'hello')]
+    [(2, 'world')]
+
+If any statement encounters an error, the server aborts the current
+transaction and will not execute any subsequent command in the queue until the
+next :ref:`synchronization point <pipeline-sync>`; a `~errors.PipelineAborted`
+exception is raised for each such command. Query processing resumes after the
+synchronization point.
+
+.. warning::
+
+    Certain features are not available in pipeline mode, including:
+
+    - COPY is not supported in pipeline mode by PostgreSQL.
+    - `Cursor.stream()` doesn't make sense in pipeline mode (its job is the
+      opposite of batching!)
+    - `ServerCursor` are currently not implemented in pipeline mode.
+
+.. note::
+
+    Starting from Psycopg 3.1, `~Cursor.executemany()` makes use internally of
+    the pipeline mode; as a consequence there is no need to handle a pipeline
+    block just to call `!executemany()` once.
+
+
+.. _pipeline-sync:
+
+Synchronization points
+----------------------
+
+Flushing query results to the client can happen either when a synchronization
+point is established by Psycopg:
+
+- using the `Pipeline.sync()` method;
+- on `Connection.commit()` or `~Connection.rollback()`;
+- at the end of a `!Pipeline` block;
+- possibly when opening a nested `!Pipeline` block;
+- using a fetch method such as `Cursor.fetchone()` (which only flushes the
+  query but doesn't issue a Sync and doesn't reset a pipeline state error).
+
+The server might perform a flush on its own initiative, for instance when the
+output buffer is full.
+
+Note that, even in :ref:`autocommit <autocommit>`, the server wraps the
+statements sent in pipeline mode in an implicit transaction, which will be
+only committed when the Sync is received. As such, a failure in a group of
+statements will probably invalidate the effect of statements executed after
+the previous Sync, and will propagate to the following Sync.
+
+For example, in the following block:
+
+.. code:: python
+
+    >>> with psycopg.connect(autocommit=True) as conn:
+    ...     with conn.pipeline() as p, conn.cursor() as cur:
+    ...         try:
+    ...             cur.execute("INSERT INTO mytable (data) VALUES (%s)", ["one"])
+    ...             cur.execute("INSERT INTO no_such_table (data) VALUES (%s)", ["two"])
+    ...             conn.execute("INSERT INTO mytable (data) VALUES (%s)", ["three"])
+    ...             p.sync()
+    ...         except psycopg.errors.UndefinedTable:
+    ...             pass
+    ...         cur.execute("INSERT INTO mytable (data) VALUES (%s)", ["four"])
+
+there will be an error in the block, ``relation "no_such_table" does not
+exist`` caused by the insert ``two``, but probably raised by the `!sync()`
+call. At at the end of the block, the table will contain:
+
+.. code:: text
+
+    =# SELECT * FROM mytable;
+    +----+------+
+    | id | data |
+    +----+------+
+    |  2 | four |
+    +----+------+
+    (1 row)
+
+because:
+
+- the value 1 of the sequence is consumed by the statement ``one``, but
+  the record discarded because of the error in the same implicit transaction;
+- the statement ``three`` is not executed because the pipeline is aborted (so
+  it doesn't consume a sequence item);
+- the statement ``four`` is executed with
+  success after the Sync has terminated the failed transaction.
+
+.. warning::
+
+    The exact Python statement where an exception caused by a server error is
+    raised is somewhat arbitrary: it depends on when the server flushes its
+    buffered result.
+
+    If you want to make sure that a group of statements is applied atomically
+    by the server, do make use of transaction methods such as
+    `~Connection.commit()` or `~Connection.transaction()`: these methods will
+    also sync the pipeline and raise an exception if there was any error in
+    the commands executed so far.
+
+
+The fine prints
+---------------
+
+.. warning::
+
+    The Pipeline mode is an experimental feature.
+
+    Its behaviour, especially around error conditions and concurrency, hasn't
+    been explored as much as the normal request-response messages pattern, and
+    its async nature makes it inherently more complex.
+
+    As we gain more experience and feedback (which is welcome), we might find
+    bugs and shortcomings forcing us to change the current interface or
+    behaviour.
+
+The pipeline mode is available on any currently supported PostgreSQL version,
+but, in order to make use of it, the client must use a libpq from PostgreSQL
+14 or higher. You can use `Pipeline.is_supported()` to make sure your client
+has the right library.
author	Daniel Baumann <daniel.baumann@progress-linux.org>	2024-05-04 17:41:08 +0000
committer	Daniel Baumann <daniel.baumann@progress-linux.org>	2024-05-04 17:41:08 +0000
commit	506ed8899b3a97e512be3fd6d44d5b11463bf9bf (patch)
tree	808913770c5e6935d3714058c2a066c57b4632ec /docs/advanced/pipeline.rst
parent	Initial commit. (diff)
download	psycopg3-506ed8899b3a97e512be3fd6d44d5b11463bf9bf.tar.xz psycopg3-506ed8899b3a97e512be3fd6d44d5b11463bf9bf.zip