Batches of single-insert statements vs batches of multi-insert statements

Поиск
Список
Период
Сортировка
От Christopher Deckers
Тема Batches of single-insert statements vs batches of multi-insert statements
Дата
Msg-id CADFnS4TP2tqF5qXDpvAJykdREoZtAQ9nJZNrN3Xh4MQxTfAv7g@mail.gmail.com
обсуждение исходный текст
Ответы Re: Batches of single-insert statements vs batches ofmulti-insert statements  (Vladimir Sitnikov <sitnikov.vladimir@gmail.com>)
Список pgsql-jdbc
Hi all,

I tried all sorts of query rewrite to boost a huge sequence of insert statements and here are my findings. All is run on a local Postgres 9.5.3, on a 64-bit Windows box with 32GB RAM, i7@3.6GHz.

1. Batches of single insert statements (original case):
Batch of: INSERT INTO SomeTable (Col1, Col2) VALUES (?, ?)
-> 52 seconds

2. Batches of single insert statements to an unlogged table, then copy:
CREATE UNLOGGED TABLE SomeUnloggedTable AS SELECT * FROM SomeTable WHERE 1 = 0
Batch of: INSERT INTO SomeUnloggedTable (Col1, Col2) VALUES (?, ?)
INSERT INTO SomeTable SELECT * FROM SomeUnloggedTable
DROP TABLE SomeUnloggedTable
-> 43 seconds (15 seconds in the copy from unlogged to target table)

3. Batches of 100 multi-insert (followed by batches of single inserts for remainder):
Batch of: INSERT INTO SomeTable (Col1, Col2) VALUES (?, ?), (?, ?), (?, ?)...
-> 30 seconds

So, rewriting the batch of single insert statements to batches of 100 multi-insert statements almost doubled the speed.

Could someone educate me as to what happens internally to the JDBC single-insert batch that makes it so different from a multi-insert batch?

Rewriting our code to use multi-value inserts is more cumbersome and not applicable everywhere. It also makes the code database-specific as all the engines we support do not all handle this syntax.
These considerations actually lead me to this old thread:
https://www.postgresql.org/message-id/55130DC8.2070508%40redhat.com

Anyway, these tests definitely show that the performance of the standard JDBC batch API could potentially be improved in Postgres internally with a bit of magic :)
Isn't it possible (either in the driver or in the engine) to aggregate the single-insert statements of a batch to turn them into a multi-insert equivalent? It would not have to handle all cases: simply handling basic commands like "INSERT INTO X (a, b, c, ...) VALUES (?, ?, ?, ...)" would help a lot.
If this should not be done in the JDBC driver, then in which area could such handling be done?
Or have I missed something?

If needed, I am ready to offer my assistance with testing (I can tweak parameters, install dev drivers, modify my code to run alternate statements, experiment, etc.).

Cheers,
-Christopher

В списке pgsql-jdbc по дате отправления:

Предыдущее
От: Thomas Kellerer
Дата:
Сообщение: Re: Batches of single-insert statements vs batches of multi-insert statements
Следующее
От: Dave Cramer
Дата:
Сообщение: Re: Re: Batches of single-insert statements vs batches of multi-insert statements