Sending data in big batches, with autocommit (fast transactions) and few network calls, is performance 101, I thought. The documentation should spell out what "executemany" actually does (it looked suspicious when I saw it a long time ago, which is why I never used it).
On Dec 23, 2016, at 18:55, Adrian Klaver <adrian.klaver@aklaver.com> wrote:

> Alright, that I get. Still, the practical outcome is that each INSERT is being done in a transaction (an implicit one), so the transaction overhead comes into play. Or am I missing something?
Nope, not missing a thing. The theory (and it is only that) is that when they do the .executemany(), each of those INSERTs pays the transaction overhead, while if they do one big INSERT, just that one statement does.
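To make the two approaches concrete, here is a minimal sketch (not from the original mail): the connection string and the CREATE TABLE call are my assumptions, and the multi-row statement is built with the usual cursor.mogrify() recipe.

    import psycopg2

    conn = psycopg2.connect("dbname=test")  # connection string is an assumption
    cur = conn.cursor()

    # Assumed setup; matches the \d output in the test below.
    cur.execute(
        "CREATE TABLE IF NOT EXISTS psycopg_table (a integer, b integer, c integer)"
    )

    sql = "INSERT INTO psycopg_table VALUES(%s, %s, %s)"
    l = [(i, i + 1, i + 2) for i in range(200)]  # 200 tuples of 3 integers

    # .executemany(): one INSERT statement per tuple, so with autocommit
    # each of the 200 statements pays its own transaction overhead.
    cur.executemany(sql, l)

    # One big INSERT: mogrify each tuple client-side and send a single
    # multi-row VALUES statement; only that one statement pays the overhead.
    values = ",".join(cur.mogrify("(%s, %s, %s)", t).decode("utf-8") for t in l)
    cur.execute("INSERT INTO psycopg_table VALUES " + values)

    conn.commit()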
Just ran a quick and dirty test using IPython %timeit.
With a list of 200 tuples, each containing 3 integers, INSERTing into:

test=> \d psycopg_table
   Table "public.psycopg_table"
 Column |  Type   | Modifiers
--------+---------+-----------
 a      | integer |
 b      | integer |
 c      | integer |
The results were:
sql = "INSERT INTO psycopg_table VALUES(%s, %s, %s)"
Without autocommit:
In [65]: timeit -n 10 cur.executemany(sql, l)
10 loops, best of 3: 12.5 ms per loop
With autocommit:
In [72]: timeit -n 10 cur.executemany(sql, l)
10 loops, best of 3: 1.71 s per loop
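For completeness, the only thing that differs between the two runs is the connection's autocommit flag. A minimal sketch of the harness (connection string assumed; the %timeit lines are run from the IPython prompt):

    import psycopg2

    conn = psycopg2.connect("dbname=test")  # connection string is an assumption
    cur = conn.cursor()

    sql = "INSERT INTO psycopg_table VALUES(%s, %s, %s)"
    l = [(i, i + 1, i + 2) for i in range(200)]  # 200 tuples of 3 integers

    # Run 1: autocommit off (psycopg2's default). All 200 INSERTs issued by
    # executemany() share one implicit transaction -> 12.5 ms per loop above.
    conn.autocommit = False
    # In [65]: %timeit -n 10 cur.executemany(sql, l)

    # Run 2: autocommit on. Every INSERT commits individually, so each of
    # the 200 statements pays the full transaction overhead -> 1.71 s per loop.
    conn.autocommit = True
    # In [72]: %timeit -n 10 cur.executemany(sql, l)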