Discussion: cursor.executemany generates multiple INSERTs

cursor.executemany generates multiple INSERTs

From: Idan Kamara
Date:
Hi,

When using executemany to do a bulk insert, psycopg2 generates an
INSERT for each item in the given sequence of items.

This is a lot slower than a single INSERT using Postgres multirow
VALUES syntax, e.g.

INSERT INTO films (code, title, did, date_prod, kind) VALUES
    ('B6717', 'Tampopo', 110, '1985-02-10', 'Comedy'),
    ('HG120', 'The Dinner Game', 140, DEFAULT, 'Comedy');

Why doesn't psycopg2 use this syntax?
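
For illustration, a minimal sketch of the pattern in question (the
connection string and the films data are placeholders; None is used
instead of DEFAULT, since DEFAULT cannot be passed as a bound
parameter):

import psycopg2

conn = psycopg2.connect("dbname=test")  # placeholder connection
cur = conn.cursor()

films = [
    ('B6717', 'Tampopo', 110, '1985-02-10', 'Comedy'),
    ('HG120', 'The Dinner Game', 140, None, 'Comedy'),
]

# executemany runs one INSERT per tuple rather than a single
# multirow VALUES statement.
cur.executemany(
    "INSERT INTO films (code, title, did, date_prod, kind)"
    " VALUES (%s, %s, %s, %s, %s)",
    films,
)
conn.commit()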


Re: cursor.executemany generates multiple INSERTs

From: Daniele Varrazzo
Date:
On Mon, Nov 26, 2012 at 10:53 AM, Idan Kamara <idankk86@gmail.com> wrote:
> Hi,
>
> When using executemany to do a bulk insert, psycopg2 generates an
> INSERT for each item in the given sequence of items.
>
> This is a lot slower than a single INSERT using Postgres multirow
> VALUES syntax, e.g.
>
> INSERT INTO films (code, title, did, date_prod, kind) VALUES
>     ('B6717', 'Tampopo', 110, '1985-02-10', 'Comedy'),
>     ('HG120', 'The Dinner Game', 140, DEFAULT, 'Comedy');
>
> Why doesn't psycopg2 use this syntax?

Because psycopg's role is not to generate SQL, but only to talk to
the database and to convert Python types into SQL syntax and back.

You can easily generate a string such as "INSERT INTO films (...)
VALUES %s, %s, %s ..." with n placeholders and pass n tuples as
arguments: psycopg will convert each tuple into a syntax understood
by Postgres.

In [14]: from datetime import date

In [15]: data = [
    ('B6717', 'Tampopo', 110, date(1985, 2, 10), 'Comedy'),
    ('HG120', 'The Dinner Game', 140, None, 'Comedy')]

In [18]: cur.mogrify("insert into foo values " + ','.join(["%s"] *
len(data)), data)
Out[18]: "insert into foo values ('B6717', 'Tampopo', 110,
'1985-02-10'::date, 'Comedy'),('HG120', 'The Dinner Game', 140, NULL,
'Comedy')"
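
The same string can then be passed to execute instead of mogrify
(a sketch, assuming the cur and data above):

# One %s per tuple; psycopg adapts each tuple into a parenthesized
# group of values.
query = "insert into foo values " + ','.join(["%s"] * len(data))
cur.execute(query, data)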

But the "connective tissue" of the query must be generated by the
application, or by some higher level library you may decide to use.

Also note that the fastest way to insert values into the db is to use
COPY. In that area I'd say psycopg could do more, e.g. it would be
great to be able to use a sequence of Python tuples in place of a
file-like object as the copy source, but that would require an
entirely different adaptation infrastructure to be in place.
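
As a rough sketch of what is already possible today, the tuples can be
serialized into an in-memory file and fed to copy_from (assuming
Python 3's io.StringIO, the cur and data above, and a table foo with
matching columns; the naive str() formatting skips proper escaping,
so this is only illustrative):

import io

# Tab-separated text with \N for NULL, matching copy_from's defaults.
buf = io.StringIO()
for row in data:
    buf.write('\t'.join(r'\N' if v is None else str(v) for v in row) + '\n')
buf.seek(0)
cur.copy_from(buf, 'foo')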

-- Daniele


Re: cursor.executemany generates multiple INSERTs

From: Idan Kamara
Date:
On Mon, Nov 26, 2012 at 1:51 PM, Daniele Varrazzo
<daniele.varrazzo@gmail.com> wrote:
> On Mon, Nov 26, 2012 at 10:53 AM, Idan Kamara <idankk86@gmail.com> wrote:
>> Hi,
>>
>> When using executemany to do a bulk insert, psycopg2 generates an
>> INSERT for each item in the given sequence of items.
>>
>> This is a lot slower than a single INSERT using Postgres multirow
>> VALUES syntax, e.g.
>>
>> INSERT INTO films (code, title, did, date_prod, kind) VALUES
>>     ('B6717', 'Tampopo', 110, '1985-02-10', 'Comedy'),
>>     ('HG120', 'The Dinner Game', 140, DEFAULT, 'Comedy');
>>
>> Why doesn't psycopg2 use this syntax?
>
> Because psycopg's role is not to generate SQL, but only to talk to
> the database and to convert Python types into SQL syntax and back.

Fair enough; it should perhaps be noted in the executemany
documentation that this is the intended behavior.

>
> You can easily generate a string such as "INSERT INTO films (...)
> VALUES %s, %s, %s ..." with n placeholders and pass n tuples as
> arguments: psycopg will convert each tuple into a syntax understood
> by Postgres.
>
> In [15]: data = [
>     ('B6717', 'Tampopo', 110, date(1985, 2, 10), 'Comedy'),
>     ('HG120', 'The Dinner Game', 140, None, 'Comedy')]
>
> In [18]: cur.mogrify("insert into foo values " + ','.join(["%s"] *
> len(data)), data)
> Out[18]: "insert into foo values ('B6717', 'Tampopo', 110,
> '1985-02-10'::date, 'Comedy'),('HG120', 'The Dinner Game', 140, NULL,
> 'Comedy')"
>
> But the "connective tissue" of the query must be generated by the
> application, or by some higher level library you may decide to use.

Yes, I'll take this up with SQLAlchemy and deal with it there.

>
> Also note that the fastest way to insert values into the db is to use
> COPY. In that area I'd say psycopg could do more, e.g. it would be
> great to be able to use a sequence of Python tuples in place of a
> file-like object as the copy source, but that would require an
> entirely different adaptation infrastructure to be in place.
>
> -- Daniele