Re: [psycopg] speed concerns with executemany()

Поиск
Список
Период
Сортировка
От Aryeh Leib Taurog
Тема Re: [psycopg] speed concerns with executemany()
Дата
Msg-id 20170201133219.GA2766@deb76.aryehleib.com
обсуждение исходный текст
Ответ на Re: [psycopg] speed concerns with executemany()  (Daniele Varrazzo <daniele.varrazzo@gmail.com>)
Список psycopg
On Wed, Feb 01, 2017 at 02:21:37AM +0000, Daniele Varrazzo wrote:
> On Mon, Jan 30, 2017 at 9:51 PM, Aryeh Leib Taurog <python@aryehleib.com> wrote:
> > I haven't dug into PQexecParams, but it seems to me that these
> > alternate strategies could be made available outside of psycopg2.
>
> I got to the same conclusion, so I've implemented functions to
> implement the functions execute_batch() and execute_values() (resp.
> what you call "joined" and "folded" in your benchmarks) in the extras
> module, and leave the semantics of executemany() untouched (no extra
> parameters or rowcount breaking only if going batched... just too
> confusing).
>
> Implementation, docs, tests in `this commit`__.
>
> .. __: https://github.com/psycopg/psycopg2/commit/a95fd3df1abc0282f1c47fa2170191f037c3c8de
>
> I also thought about implementing an execute_prepared() function,
> which would have run PREPARE, then EXECUTE in a loop (possibly batched
> as in execute_batch), finally DEALLOCATE. Implementation doesn't seem
> trivial because:
>
> - parameters %s and %(name)s should be replaced with $1, $2, ..., with
> matching unescaping of literal % and escaping of literal $;
> - in case of error, calling DEALLOCATE is tricky: if the connection is
> not autocommit the transaction is failed, the function could use a
> savepoint but then it would clear the error state too; not calling
> DEALLOCATE would leave the prepared statement there and according to
> the name chosen for the statement (e.g. 'psycopg_%s' % id(cursor))
> would make another execute_prepared() fail... it should be possible to
> wrap the PREPARE in a savepoint to deal with this problem;
> - if the connection is autocommit all the above is not needed.
>
> ISTM that it's easier to leave the users to call
> PREPARE/execute_batch('EXECUTE')/DEALLOCATE: knowing the context in
> which the sequence is called wouldn't require parameters mangling and
> the error management would be simpler for them.
>
> Thoughts? Shall we merge this stuff? Any feedback is welcome.

I think it looks great.  I agree that execute_prepared() would be
tricky, and it doesn't seem to provide much performance benefit over
the methods you've already implemented.

I have run the following benchmarks for UPDATE across the Atlantic:

1. classic executemany

2. "joined" - statements executed in batches

3. "prepared/joined" - as you suggested above

4. pgcopy - copy into a temp table, then UPDATE FROM temp table.

<https://gist.github.com/altaurog/84668e034646fb354b5de81bb86a580d#file-updatemany-py-L69>

With NRECS = 5000
classic: 794.658465862 sec
joined: 11.6829760075 sec
prepared: 10.1489500999 sec
pgcopy: 2.68695497513 sec


В списке psycopg по дате отправления:

Предыдущее
От: Daniele Varrazzo
Дата:
Сообщение: Re: [psycopg] speed concerns with executemany()
Следующее
От: Jim Nasby
Дата:
Сообщение: Re: [psycopg] speed concerns with executemany()