Re: Best strategy for bulk inserts where some violate unique constraint?

From: Adrian Klaver
Subject: Re: Best strategy for bulk inserts where some violate unique constraint?
Date:
Msg-id: 52798192.7040000@gmail.com
In reply to: Re: Best strategy for bulk inserts where some violate unique constraint?  (Daniele Varrazzo <daniele.varrazzo@gmail.com>)
Responses: Re: Best strategy for bulk inserts where some violate unique constraint?  (Denis Papathanasiou <denis.papathanasiou@gmail.com>)
List: psycopg
On 11/05/2013 03:31 PM, Daniele Varrazzo wrote:

>
> As the last example do you mean the executemany() example? That is not
> going to be much faster than repeated execute().
>
> The easiest thing you can do is to switch to autocommit=True and do
> repeated execute() with an INSERT. If one fails you can just ignore the
> IntegrityError and go on.
>
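A minimal sketch of that approach, written driver-agnostically so the loop logic is clear; the table and column names (items, id, payload) are illustrative assumptions, not from this thread:

```python
# Sketch: autocommit inserts, skipping rows that violate the unique
# constraint. Table/column names are illustrative assumptions.
def insert_ignoring_duplicates(cur, rows, integrity_error):
    """Insert each row; skip any row that raises `integrity_error`.

    `cur` is a cursor on an autocommit connection; `integrity_error`
    is the driver's exception class (psycopg2.IntegrityError for
    psycopg2, so each failure doesn't abort a transaction).
    """
    inserted = 0
    for row in rows:
        try:
            cur.execute(
                "INSERT INTO items (id, payload) VALUES (%s, %s)", row)
            inserted += 1
        except integrity_error:
            pass  # duplicate key: ignore and continue
    return inserted

# With psycopg2 this would be driven roughly like:
#   conn = psycopg2.connect(...)
#   conn.autocommit = True
#   insert_ignoring_duplicates(conn.cursor(), rows,
#                              psycopg2.IntegrityError)
```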
> Just about as easily, you can prepare a statement and execute it repeatedly
> using PREPARE/EXECUTE: see
> <http://www.postgresql.org/docs/9.2/static/sql-prepare.html> There is
> no builtin support for that in psycopg but you can just execute()
> these statements. It may save you something. You can also take a look
> at this example of a PREPAREing cursor:
> <https://gist.github.com/dvarrazzo/3797445>.
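A sketch of driving PREPARE/EXECUTE through plain execute() calls; the statement name (bulk_insert) and target table are my assumptions:

```python
# Sketch: server-side prepared statement used via plain execute().
# Statement name and table/column names are illustrative assumptions.
PREPARE_SQL = (
    "PREPARE bulk_insert (int, text) AS "
    "INSERT INTO items (id, payload) VALUES ($1, $2)"
)

def insert_with_prepared(cur, rows):
    cur.execute(PREPARE_SQL)
    for row in rows:
        # Parameters still go through the driver, not string formatting.
        cur.execute("EXECUTE bulk_insert (%s, %s)", row)
    cur.execute("DEALLOCATE bulk_insert")
```

Combined with autocommit and an except-IntegrityError around each EXECUTE, this saves the server re-planning the INSERT on every row.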
>
> However, the fastest way to insert data into Postgres is COPY, see
> <http://initd.org/psycopg/docs/cursor.html#cursor.copy_from>. You will
> have to present your data as a file. I can't remember what happens
> when a record fails the integrity test: I think the others would still
> be inserted, but you will have to check.

It will fail.

http://www.postgresql.org/docs/9.3/interactive/sql-copy.html

COPY stops operation at the first error. This should not lead to
problems in the event of a COPY TO, but the target table will already
have received earlier rows in a COPY FROM. These rows will not be
visible or accessible, but they still occupy disk space. This might
amount to a considerable amount of wasted disk space if the failure
happened well into a large copy operation. You might wish to invoke
VACUUM to recover the wasted space.
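For reference, "presenting your data as a file" for copy_from() can be as simple as building an in-memory buffer in the COPY text format; the escaping here is the minimal tab/newline/backslash handling, and the table name in the usage comment is an assumption:

```python
import io

def rows_to_copy_buffer(rows):
    """Serialize rows into the tab-separated text that copy_from() reads.

    Backslash, tab and newline characters inside values are escaped
    per the COPY text format; NULL handling is omitted in this sketch.
    """
    buf = io.StringIO()
    for row in rows:
        fields = []
        for v in row:
            s = str(v)
            s = (s.replace("\\", "\\\\")
                  .replace("\t", "\\t")
                  .replace("\n", "\\n"))
            fields.append(s)
        buf.write("\t".join(fields) + "\n")
    buf.seek(0)
    return buf

# Usage with psycopg2 (table/column names are illustrative):
#   cur.copy_from(rows_to_copy_buffer(rows), "items",
#                 columns=("id", "payload"))
```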

I would suggest checking out pg_loader; it is designed to deal with this
scenario.


> A much more robust strategy is
> to create a temporary table with the right schema but without
> constraints, load the data there using COPY and then move the data to
> the final table using INSERT INTO ... SELECT * FROM temp_table WHERE
> ... and specify a condition to avoid the records that would fail the
> constraint.
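That staging-table strategy might look like this; the staging table, target table and the unique column (id) are my assumptions, and duplicates *within* the batch itself would still need separate handling:

```python
# Sketch: COPY into an unconstrained temp table, then move only the
# rows that won't violate the unique constraint on items.id.
# All table/column names are illustrative assumptions.
CREATE_STAGING = (
    "CREATE TEMP TABLE items_staging "
    "(LIKE items INCLUDING DEFAULTS) ON COMMIT DROP"
)
MOVE_ROWS = (
    "INSERT INTO items (id, payload) "
    "SELECT s.id, s.payload FROM items_staging s "
    "WHERE NOT EXISTS (SELECT 1 FROM items i WHERE i.id = s.id)"
)

def bulk_insert_via_staging(cur, data_file):
    # LIKE does not copy unique constraints by default, so the COPY
    # into the staging table cannot fail on duplicates.
    cur.execute(CREATE_STAGING)
    cur.copy_from(data_file, "items_staging", columns=("id", "payload"))
    cur.execute(MOVE_ROWS)
```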
>
> I would go for COPY, it's by far faster than execute[many], even
> including prepare.
>
> -- Daniele
>
>


--
Adrian Klaver
adrian.klaver@gmail.com

