Re: Best strategy for bulk inserts where some violate unique constraint?

From: Denis Papathanasiou
Subject: Re: Best strategy for bulk inserts where some violate unique constraint?
Date:
Msg-id: CAEpnaGwf4DazeGy_tGTPZg7w9wixJvufDM0HJNkBLZ-zB2PHmw@mail.gmail.com
In response to: Re: Best strategy for bulk inserts where some violate unique constraint?  (Daniele Varrazzo <daniele.varrazzo@gmail.com>)
Responses: Re: Best strategy for bulk inserts where some violate unique constraint?
List: psycopg
On Tue, Nov 5, 2013 at 6:31 PM, Daniele Varrazzo <daniele.varrazzo@gmail.com> wrote:
> As the last example do you mean the executemany() example?

Yes, I was referring to this code example:

> If you had the following:
> namedict = ({"first_name":"Joshua", "last_name":"Drake"},
>             {"first_name":"Steven", "last_name":"Foo"},
>             {"first_name":"David", "last_name":"Bar"})
> You could easily insert all three rows within the dictionary by using:
> cur = conn.cursor()
> cur.executemany("""INSERT INTO bar(first_name,last_name) VALUES (%(first_name)s, %(last_name)s)""", namedict)
> The cur.executemany statement will automatically iterate through the dictionary and execute the INSERT query for each row.

What I wound up doing in my own code is this:

    # process each {columns=values} dict
    try:
        cur.executemany("INSERT INTO ...", insert_dict)
    except psycopg2.IntegrityError:
        conn.commit()

I was hoping that by catching the unique constraint error and then committing, it would at least preserve the inserts from the dicts that do not violate the table constraints, but that is not how Postgres works: once one statement in the transaction fails, the whole transaction is aborted, so the commit preserves nothing.
 
> That is not going to be much faster than repeated execute().

Really?
 
> The easiest thing you can do is to switch to autocommit=True and do
> repeated execute with insert. If one fails you can just ignore the
> IntegrityError and go on.

Ok, I'll try that, thanks.  

> About as easily, you can prepare a statement and execute it repeatedly
> using PREPARE/EXECUTE: see
> <http://www.postgresql.org/docs/9.2/static/sql-prepare.html>. There is
> no builtin support for that in psycopg but you can just execute()
> these statements. It may save you something. You can also take a look
> at this example of a PREPAREing cursor:
> <https://gist.github.com/dvarrazzo/3797445>.

Thanks for the example.
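As an illustration, prepared statements issued through plain execute() calls might look like the sketch below (the statement name bulk_insert and the table are made up for the example):

```python
def insert_with_prepare(conn, rows):
    """PREPARE the INSERT once on the server, then EXECUTE it per row.
    psycopg2 has no built-in prepared-statement API, so PREPARE,
    EXECUTE and DEALLOCATE are sent as ordinary SQL statements."""
    cur = conn.cursor()
    cur.execute(
        "PREPARE bulk_insert (text, text) AS "
        "INSERT INTO bar (first_name, last_name) VALUES ($1, $2)"
    )
    for row in rows:
        # %s placeholders are filled by psycopg2; $1/$2 by the server
        cur.execute("EXECUTE bulk_insert (%s, %s)",
                    (row["first_name"], row["last_name"]))
    cur.execute("DEALLOCATE bulk_insert")
    conn.commit()
```

This saves the per-statement parse/plan cost, though as noted below COPY is still much faster.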
 
> However, the fastest way to insert data into Postgres is COPY, see
> <http://initd.org/psycopg/docs/cursor.html#cursor.copy_from>. You will
> have to present your data as a file. I can't remember what happens
> when a record fails the integrity test: I think the others would still
> be inserted, but you will have to check. A much more robust strategy is
> to create a temporary table with the right schema but without
> constraints, load the data there using COPY, and then move the data to
> the final table using INSERT INTO ... SELECT * FROM temp_table WHERE
> ... and specify a condition to avoid the records that would fail the
> constraint.
>
> I would go for COPY, it's by far faster than execute[many], even
> including prepare.

I saw Adrian's reply about how this still won't do what I need in terms of ignoring the constraint violators and keeping the valid entries; will switching autocommit to True first have the desired effect? 
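For reference, the staging-table route Daniele describes might be sketched like this; the table names and the NOT EXISTS condition are illustrative, assuming (first_name, last_name) is the unique key on the target table:

```python
import io

def copy_via_staging(conn, rows):
    """COPY the rows into an unconstrained temp table, then move only
    the non-conflicting ones into the real table in one statement."""
    cur = conn.cursor()
    cur.execute(
        "CREATE TEMP TABLE bar_staging "
        "(first_name text, last_name text) ON COMMIT DROP"
    )
    # copy_from wants a file-like object: build tab-separated lines
    buf = io.StringIO()
    for row in rows:
        buf.write("%s\t%s\n" % (row["first_name"], row["last_name"]))
    buf.seek(0)
    cur.copy_from(buf, "bar_staging", columns=("first_name", "last_name"))
    # move only the rows that would not violate the unique constraint
    cur.execute(
        "INSERT INTO bar (first_name, last_name) "
        "SELECT s.first_name, s.last_name FROM bar_staging s "
        "WHERE NOT EXISTS (SELECT 1 FROM bar b "
        "WHERE b.first_name = s.first_name "
        "AND b.last_name = s.last_name)"
    )
    conn.commit()
```

Unlike the autocommit loop, this keeps everything in one transaction and one round trip per batch, which is where the COPY speedup comes from.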
