Re: [psycopg] speed concerns with executemany()

Поиск
Список
Период
Сортировка
От Daniele Varrazzo
Тема Re: [psycopg] speed concerns with executemany()
Дата
Msg-id CA+mi_8a9f0QDBskxae1WB8uLWwov8B1ZjDMwos3vDY0-YqnjQA@mail.gmail.com
обсуждение исходный текст
Ответ на Re: [psycopg] speed concerns with executemany()  (Aryeh Leib Taurog <python@aryehleib.com>)
Ответы Re: [psycopg] speed concerns with executemany()
Re: [psycopg] speed concerns with executemany()
Список psycopg
On Thu, Jan 19, 2017 at 12:23 PM, Aryeh Leib Taurog
<python@aryehleib.com> wrote:
> On Mon, Jan 2, 2017 at 3:35 PM, Adrian Klaver <adrian.klaver@aklaver.com> wrote:

>> aklaver(at)tito:~> python psycopg_executemany.py -p 100
>> classic: 427.618795156 sec
>> joined: 7.55754685402 sec
>
> This is really interesting.  I have long been using a utility I put
> together to insert using BINARY COPY.  In fact I just brushed it up a
> bit and put it on PyPi: <https://pypi.python.org/pypi/pgcopy>
>
> I'm curious to run a benchmark against the improved executemany.  I'd
> hoped that pgcopy would be generally useful, but it may no longer be
> necessary.  A fast executemany() certainly suits more use cases.

(Sorry, mant to write this message earlier but forgot it in my drafts.)

There's always the case that a sequence of:

1) psycopg executemany as is now: separate statements
- insert into table values (...);
- insert into table values (...);
- insert into table values (...);

is slower than

2) psycopg executemany as proposed: a single statement containing
- insert into table values (...);
  insert into table values (...);
  insert into table values (...);

which is slower than

3) a single insert with many params. Plays well with PQexecParams
   but would need some form of generation by the client
- insert into table values (...), (...), (...);

which is slower than

4) copy.

While the proposed executemany is a nice low hanging fruit it will
break on PQexecParams and it's far from being optimal anyway. Wonder
if there is a way to help users at least to have 3 without bothering
with mogrify (due to break too with the PQexecParams switch).

Brainstorming from here: expect no consistency.

Another good property of 2 is that it supports any statements: update,
delete, select (which may call a stored procedure repeatedly to
perform some data manipulation) whereas a manipulation into form 3 is
specific to inserts (which can be a starting point for a fast
update/select anyway: fast-insert into a temp table, then select or
update with a join).

fast-forward to last message:

On Sun, Jan 29, 2017 at 5:44 PM, Aryeh Leib Taurog <python@aryehleib.com> wrote:

> Over network, with NRECS=10000 and page size=100:
> classic: 716.759769917 sec
> joined: 15.141461134 sec
> pgcopy: 3.70594096184 sec

Aryeh thank you for this benchmark. Could you please add a test like
(code untested):

    def insertmany(self, sql, argslist, page_size=100):
        tmpl = None
        for page in paginate(argslist, page_size=page_size):
            if tmpl is None and page:
                tmpl = '(%s)' % ','.join([%s] * len(page[0]))
            self.execute(sql % ",".join(self.mogrify(tmpl, args) for
args in page))

where sql should be an insert with a single %s placeholder, which
would be replaced by one or more arguments records, and see where we
end up? I expect somewhere between "joined" and "pgcopy" but wonder
closer to which.

-- Daniele


В списке psycopg по дате отправления:

Предыдущее
От: Aryeh Leib Taurog
Дата:
Сообщение: Re: [psycopg] speed concerns with executemany()
Следующее
От: Aryeh Leib Taurog
Дата:
Сообщение: Re: [psycopg] speed concerns with executemany()