Using COPY TO and COPY FROM#

Psycopg allows to operate with PostgreSQL COPY protocol. COPY is one of the most efficient ways to load data into the database (and to modify it, with some SQL creativity).

Copy is supported using the Cursor.copy() method, passing it a query of the form COPY ... FROM STDIN or COPY ... TO STDOUT, and managing the resulting Copy object in a with block:

with cursor.copy("COPY table_name (col1, col2) FROM STDIN") as copy:
    # pass data to the 'copy' object using write()/write_row()

You can compose a COPY statement dynamically by using objects from the psycopg.sql module:

with cursor.copy(
    sql.SQL("COPY {} TO STDOUT").format(sql.Identifier("table_name"))
) as copy:
    # read data from the 'copy' object using read()/read_row()

Changed in version 3.1: You can also pass parameters to copy(), like in execute():

with cur.copy("COPY (SELECT * FROM table_name LIMIT %s) TO STDOUT", (3,)) as copy:
    # expect no more than three records

The connection is subject to the usual transaction behaviour, so, unless the connection is in autocommit, at the end of the COPY operation you will still have to commit the pending changes and you can still roll them back. See Transactions management for details.

Writing data row-by-row#

Using a copy operation you can load data into the database from any Python iterable (a list of tuples, or any iterable of sequences): the Python values are adapted as they would be in normal querying. To perform such operation use a COPY ... FROM STDIN with Cursor.copy() and use write_row() on the resulting object in a with block. On exiting the block the operation will be concluded:

records = [(10, 20, "hello"), (40, None, "world")]

with cursor.copy("COPY sample (col1, col2, col3) FROM STDIN") as copy:
    for record in records:
        copy.write_row(record)

If an exception is raised inside the block, the operation is interrupted and the records inserted so far are discarded.

In order to read or write from Copy row-by-row you must not specify COPY options such as FORMAT CSV, DELIMITER, NULL: please leave these details alone, thank you :)

Reading data row-by-row#

You can also do the opposite, reading rows out of a COPY ... TO STDOUT operation, by iterating on rows(). However this is not something you may want to do normally: usually the normal query process will be easier to use.

PostgreSQL, currently, doesn’t give complete type information on COPY TO, so the rows returned will have unparsed data, as strings or bytes, according to the format.

with cur.copy("COPY (VALUES (10::int, current_date)) TO STDOUT") as copy:
    for row in copy.rows():
        print(row)  # return unparsed data: ('10', '2046-12-24')

You can improve the results by using set_types() before reading, but you have to specify them yourself.

with cur.copy("COPY (VALUES (10::int, current_date)) TO STDOUT") as copy:
    copy.set_types(["int4", "date"])
    for row in copy.rows():
        print(row)  # (10, datetime.date(2046, 12, 24))

Copying block-by-block#

If data is already formatted in a way suitable for copy (for instance because it is coming from a file resulting from a previous COPY TO operation) it can be loaded into the database using Copy.write() instead.

with open("data", "r") as f:
    with cursor.copy("COPY data FROM STDIN") as copy:
        while data := f.read(BLOCK_SIZE):
            copy.write(data)

In this case you can use any COPY option and format, as long as the input data is compatible with what the operation in copy() expects. Data can be passed as str, if the copy is in FORMAT TEXT, or as bytes, which works with both FORMAT TEXT and FORMAT BINARY.

In order to produce data in COPY format you can use a COPY ... TO STDOUT statement and iterate over the resulting Copy object, which will produce a stream of bytes objects:

with open("data.out", "wb") as f:
    with cursor.copy("COPY table_name TO STDOUT") as copy:
        for data in copy:
            f.write(data)

Binary copy#

Binary copy is supported by specifying FORMAT BINARY in the COPY statement. In order to import binary data using write_row(), all the types passed to the database must have a binary dumper registered; this is not necessary if the data is copied block-by-block using write().

Warning

PostgreSQL is particularly finicky when loading data in binary mode and will apply no cast rules. This means, for example, that passing the value 100 to an integer column will fail, because Psycopg will pass it as a smallint value, and the server will reject it because its size doesn’t match what expected.

You can work around the problem using the set_types() method of the Copy object and specifying carefully the types to load.

Asynchronous copy support#

Asynchronous operations are supported using the same patterns as above, using the objects obtained by an AsyncConnection. For instance, if f is an object supporting an asynchronous read() method returning COPY data, a fully-async copy operation could be:

async with cursor.copy("COPY data FROM STDIN") as copy:
    while data := await f.read():
        await copy.write(data)

The AsyncCopy object documentation describes the signature of the asynchronous methods and the differences from its sync Copy counterpart.

Example: copying a table across servers#

In order to copy a table, or a portion of a table, across servers, you can use two COPY operations on two different connections, reading from the first and writing to the second.

with psycopg.connect(dsn_src) as conn1, psycopg.connect(dsn_tgt) as conn2:
    with conn1.cursor().copy("COPY src TO STDOUT (FORMAT BINARY)") as copy1:
        with conn2.cursor().copy("COPY tgt FROM STDIN (FORMAT BINARY)") as copy2:
            for data in copy1:
                copy2.write(data)

Using FORMAT BINARY usually gives a performance boost, but it only works if the source and target schema are perfectly identical. If the tables are only compatible (for example, if you are copying an integer field into a bigint destination field) you should omit the BINARY option and perform a text-based copy. See Binary copy for details.

The same pattern can be adapted to use async objects in order to perform an async copy.