Re: [psycopg] speed concerns with executemany()
От | Daniele Varrazzo |
---|---|
Тема | Re: [psycopg] speed concerns with executemany() |
Дата | |
Msg-id | CA+mi_8a9f0QDBskxae1WB8uLWwov8B1ZjDMwos3vDY0-YqnjQA@mail.gmail.com обсуждение исходный текст |
Ответ на | Re: [psycopg] speed concerns with executemany() (Aryeh Leib Taurog <python@aryehleib.com>) |
Ответы |
Re: [psycopg] speed concerns with executemany()
Re: [psycopg] speed concerns with executemany() |
Список | psycopg |
On Thu, Jan 19, 2017 at 12:23 PM, Aryeh Leib Taurog <python@aryehleib.com> wrote: > On Mon, Jan 2, 2017 at 3:35 PM, Adrian Klaver <adrian.klaver@aklaver.com> wrote: >> aklaver(at)tito:~> python psycopg_executemany.py -p 100 >> classic: 427.618795156 sec >> joined: 7.55754685402 sec > > This is really interesting. I have long been using a utility I put > together to insert using BINARY COPY. In fact I just brushed it up a > bit and put it on PyPi: <https://pypi.python.org/pypi/pgcopy> > > I'm curious to run a benchmark against the improved executemany. I'd > hoped that pgcopy would be generally useful, but it may no longer be > necessary. A fast executemany() certainly suits more use cases. (Sorry, mant to write this message earlier but forgot it in my drafts.) There's always the case that a sequence of: 1) psycopg executemany as is now: separate statements - insert into table values (...); - insert into table values (...); - insert into table values (...); is slower than 2) psycopg executemany as proposed: a single statement containing - insert into table values (...); insert into table values (...); insert into table values (...); which is slower than 3) a single insert with many params. Plays well with PQexecParams but would need some form of generation by the client - insert into table values (...), (...), (...); which is slower than 4) copy. While the proposed executemany is a nice low hanging fruit it will break on PQexecParams and it's far from being optimal anyway. Wonder if there is a way to help users at least to have 3 without bothering with mogrify (due to break too with the PQexecParams switch). Brainstorming from here: expect no consistency. Another good property of 2 is that it supports any statements: update, delete, select (which may call a stored procedure repeatedly to perform some data manipulation) whereas a manipulation into form 3 is specific to inserts (which can be a starting point for a fast update/select anyway: fast-insert into a temp table, then select or update with a join). fast-forward to last message: On Sun, Jan 29, 2017 at 5:44 PM, Aryeh Leib Taurog <python@aryehleib.com> wrote: > Over network, with NRECS=10000 and page size=100: > classic: 716.759769917 sec > joined: 15.141461134 sec > pgcopy: 3.70594096184 sec Aryeh thank you for this benchmark. Could you please add a test like (code untested): def insertmany(self, sql, argslist, page_size=100): tmpl = None for page in paginate(argslist, page_size=page_size): if tmpl is None and page: tmpl = '(%s)' % ','.join([%s] * len(page[0])) self.execute(sql % ",".join(self.mogrify(tmpl, args) for args in page)) where sql should be an insert with a single %s placeholder, which would be replaced by one or more arguments records, and see where we end up? I expect somewhere between "joined" and "pgcopy" but wonder closer to which. -- Daniele
В списке psycopg по дате отправления: