diff options
Diffstat (limited to 'docs/basic/copy.rst')
-rw-r--r-- | docs/basic/copy.rst | 212 |
1 files changed, 212 insertions, 0 deletions
diff --git a/docs/basic/copy.rst b/docs/basic/copy.rst new file mode 100644 index 0000000..2bb4498 --- /dev/null +++ b/docs/basic/copy.rst @@ -0,0 +1,212 @@ +.. currentmodule:: psycopg + +.. index:: + pair: COPY; SQL command + +.. _copy: + +Using COPY TO and COPY FROM +=========================== + +Psycopg allows to operate with `PostgreSQL COPY protocol`__. :sql:`COPY` is +one of the most efficient ways to load data into the database (and to modify +it, with some SQL creativity). + +.. __: https://www.postgresql.org/docs/current/sql-copy.html + +Copy is supported using the `Cursor.copy()` method, passing it a query of the +form :sql:`COPY ... FROM STDIN` or :sql:`COPY ... TO STDOUT`, and managing the +resulting `Copy` object in a `!with` block: + +.. code:: python + + with cursor.copy("COPY table_name (col1, col2) FROM STDIN") as copy: + # pass data to the 'copy' object using write()/write_row() + +You can compose a COPY statement dynamically by using objects from the +`psycopg.sql` module: + +.. code:: python + + with cursor.copy( + sql.SQL("COPY {} TO STDOUT").format(sql.Identifier("table_name")) + ) as copy: + # read data from the 'copy' object using read()/read_row() + +.. versionchanged:: 3.1 + + You can also pass parameters to `!copy()`, like in `~Cursor.execute()`: + + .. code:: python + + with cur.copy("COPY (SELECT * FROM table_name LIMIT %s) TO STDOUT", (3,)) as copy: + # expect no more than three records + +The connection is subject to the usual transaction behaviour, so, unless the +connection is in autocommit, at the end of the COPY operation you will still +have to commit the pending changes and you can still roll them back. See +:ref:`transactions` for details. + + +.. _copy-in-row: + +Writing data row-by-row +----------------------- + +Using a copy operation you can load data into the database from any Python +iterable (a list of tuples, or any iterable of sequences): the Python values +are adapted as they would be in normal querying. To perform such operation use +a :sql:`COPY ... FROM STDIN` with `Cursor.copy()` and use `~Copy.write_row()` +on the resulting object in a `!with` block. On exiting the block the +operation will be concluded: + +.. code:: python + + records = [(10, 20, "hello"), (40, None, "world")] + + with cursor.copy("COPY sample (col1, col2, col3) FROM STDIN") as copy: + for record in records: + copy.write_row(record) + +If an exception is raised inside the block, the operation is interrupted and +the records inserted so far are discarded. + +In order to read or write from `!Copy` row-by-row you must not specify +:sql:`COPY` options such as :sql:`FORMAT CSV`, :sql:`DELIMITER`, :sql:`NULL`: +please leave these details alone, thank you :) + + +.. _copy-out-row: + +Reading data row-by-row +----------------------- + +You can also do the opposite, reading rows out of a :sql:`COPY ... TO STDOUT` +operation, by iterating on `~Copy.rows()`. However this is not something you +may want to do normally: usually the normal query process will be easier to +use. + +PostgreSQL, currently, doesn't give complete type information on :sql:`COPY +TO`, so the rows returned will have unparsed data, as strings or bytes, +according to the format. + +.. code:: python + + with cur.copy("COPY (VALUES (10::int, current_date)) TO STDOUT") as copy: + for row in copy.rows(): + print(row) # return unparsed data: ('10', '2046-12-24') + +You can improve the results by using `~Copy.set_types()` before reading, but +you have to specify them yourself. + +.. code:: python + + with cur.copy("COPY (VALUES (10::int, current_date)) TO STDOUT") as copy: + copy.set_types(["int4", "date"]) + for row in copy.rows(): + print(row) # (10, datetime.date(2046, 12, 24)) + + +.. _copy-block: + +Copying block-by-block +---------------------- + +If data is already formatted in a way suitable for copy (for instance because +it is coming from a file resulting from a previous `COPY TO` operation) it can +be loaded into the database using `Copy.write()` instead. + +.. code:: python + + with open("data", "r") as f: + with cursor.copy("COPY data FROM STDIN") as copy: + while data := f.read(BLOCK_SIZE): + copy.write(data) + +In this case you can use any :sql:`COPY` option and format, as long as the +input data is compatible with what the operation in `!copy()` expects. Data +can be passed as `!str`, if the copy is in :sql:`FORMAT TEXT`, or as `!bytes`, +which works with both :sql:`FORMAT TEXT` and :sql:`FORMAT BINARY`. + +In order to produce data in :sql:`COPY` format you can use a :sql:`COPY ... TO +STDOUT` statement and iterate over the resulting `Copy` object, which will +produce a stream of `!bytes` objects: + +.. code:: python + + with open("data.out", "wb") as f: + with cursor.copy("COPY table_name TO STDOUT") as copy: + for data in copy: + f.write(data) + + +.. _copy-binary: + +Binary copy +----------- + +Binary copy is supported by specifying :sql:`FORMAT BINARY` in the :sql:`COPY` +statement. In order to import binary data using `~Copy.write_row()`, all the +types passed to the database must have a binary dumper registered; this is not +necessary if the data is copied :ref:`block-by-block <copy-block>` using +`~Copy.write()`. + +.. warning:: + + PostgreSQL is particularly finicky when loading data in binary mode and + will apply **no cast rules**. This means, for example, that passing the + value 100 to an `integer` column **will fail**, because Psycopg will pass + it as a `smallint` value, and the server will reject it because its size + doesn't match what expected. + + You can work around the problem using the `~Copy.set_types()` method of + the `!Copy` object and specifying carefully the types to load. + +.. seealso:: See :ref:`binary-data` for further info about binary querying. + + +.. _copy-async: + +Asynchronous copy support +------------------------- + +Asynchronous operations are supported using the same patterns as above, using +the objects obtained by an `AsyncConnection`. For instance, if `!f` is an +object supporting an asynchronous `!read()` method returning :sql:`COPY` data, +a fully-async copy operation could be: + +.. code:: python + + async with cursor.copy("COPY data FROM STDIN") as copy: + while data := await f.read(): + await copy.write(data) + +The `AsyncCopy` object documentation describes the signature of the +asynchronous methods and the differences from its sync `Copy` counterpart. + +.. seealso:: See :ref:`async` for further info about using async objects. + + +Example: copying a table across servers +--------------------------------------- + +In order to copy a table, or a portion of a table, across servers, you can use +two COPY operations on two different connections, reading from the first and +writing to the second. + +.. code:: python + + with psycopg.connect(dsn_src) as conn1, psycopg.connect(dsn_tgt) as conn2: + with conn1.cursor().copy("COPY src TO STDOUT (FORMAT BINARY)") as copy1: + with conn2.cursor().copy("COPY tgt FROM STDIN (FORMAT BINARY)") as copy2: + for data in copy1: + copy2.write(data) + +Using :sql:`FORMAT BINARY` usually gives a performance boost, but it only +works if the source and target schema are *perfectly identical*. If the tables +are only *compatible* (for example, if you are copying an :sql:`integer` field +into a :sql:`bigint` destination field) you should omit the `BINARY` option and +perform a text-based copy. See :ref:`copy-binary` for details. + +The same pattern can be adapted to use :ref:`async objects <async>` in order +to perform an :ref:`async copy <copy-async>`. |