Re: Best strategy for bulk inserts where some violate unique constraint?

Поиск
Список
Период
Сортировка
От Daniele Varrazzo
Тема Re: Best strategy for bulk inserts where some violate unique constraint?
Дата
Msg-id CA+mi_8bqAV_zK3kZr=yYpCwjkgJ8ie5ABXZtbwQv07G7L02paw@mail.gmail.com
обсуждение исходный текст
Ответ на Best strategy for bulk inserts where some violate unique constraint?  (Denis Papathanasiou <denis.papathanasiou@gmail.com>)
Ответы Re: Best strategy for bulk inserts where some violate unique constraint?  (Adrian Klaver <adrian.klaver@gmail.com>)
Re: Best strategy for bulk inserts where some violate unique constraint?  (Denis Papathanasiou <denis.papathanasiou@gmail.com>)
Список psycopg
On Tue, Nov 5, 2013 at 8:14 PM, Denis Papathanasiou
<denis.papathanasiou@gmail.com> wrote:
> I'm using Pyscopg2 to insert multiple rows at once, using a dictionary as
> shown at the bottom of this page:
> https://wiki.postgresql.org/wiki/Psycopg2_Tutorial
>
> I notice that even if just one entry violates a table unique constraint (an
> IntegrityError in psycopg2 parlance), the entire contents of the dictionary
> do not get inserted into the table.
>
> Is there any way to tell postgres: it's ok to reject individual entries that
> may violate table unique constraints, but at the same time allow those which
> do not?
>
> It's too expensive for me to do the inserts one at a time where I could just
> catch the psycopg2.IntegrityError.

As the last example do you mean the executemany() example? That is not
going to be much faster than repeated execute().

The easiest thing you can do is to switch to autocommit=True and do
repeated execute with insert. If one fail you can just ignore the
IntegrityError and go on.

About as easy you can prepare a statement and execute it repeatedly
using PREPARE/EXECUTE: see
<http://www.postgresql.org/docs/9.2/static/sql-prepare.html> There is
no builtin support for that in psycopg but you can just execute()
these statements. It may save you something. You can also take a look
at this example of a PREPAREing cursor:
<https://gist.github.com/dvarrazzo/3797445>.

However, the fastest way to insert data into Postgres is COPY, see
<http://initd.org/psycopg/docs/cursor.html#cursor.copy_from>. You will
have to present your data as a file. I can't remember what happens
when a record fails the integrity test: I think the other would still
be inserted but you will have to check. A much more robust strategy is
to create a temporary table with the right schema but without
constraints, load the data there using COPY and then move the data to
the final table using INSERT INTO ... SELECT * FROM temp_table WHERE
... and specify a condition to avoid the records that would fail the
constraint.

I would go for COPY, it's by far faster than execute[many], even
including prepare.

-- Daniele


В списке psycopg по дате отправления:

Предыдущее
От: Denis Papathanasiou
Дата:
Сообщение: Best strategy for bulk inserts where some violate unique constraint?
Следующее
От: Adrian Klaver
Дата:
Сообщение: Re: Best strategy for bulk inserts where some violate unique constraint?