Обсуждение: Batches of single-insert statements vs batches of multi-insert statements
Batches of single-insert statements vs batches of multi-insert statements
От
 
		    	Christopher Deckers
		    Дата:
		        Hi all,
I tried all sorts of query rewrite to boost a huge sequence of insert statements and here are my findings. All is run on a local Postgres 9.5.3, on a 64-bit Windows box with 32GB RAM, i7@3.6GHz.
1. Batches of single insert statements (original case):
Batch of: INSERT INTO SomeTable (Col1, Col2) VALUES (?, ?)
-> 52 seconds
2. Batches of single insert statements to an unlogged table, then copy:
CREATE UNLOGGED TABLE SomeUnloggedTable AS SELECT * FROM SomeTable WHERE 1 = 0
Batch of: INSERT INTO SomeUnloggedTable (Col1, Col2) VALUES (?, ?)
INSERT INTO SomeTable SELECT * FROM SomeUnloggedTable
DROP TABLE SomeUnloggedTable
-> 43 seconds (15 seconds in the copy from unlogged to target table)
3. Batches of 100 multi-insert (followed by batches of single inserts for remainder):
Batch of: INSERT INTO SomeTable (Col1, Col2) VALUES (?, ?), (?, ?), (?, ?)...
-> 30 seconds
So, rewriting the batch of single insert statements to batches of 100 multi-insert statements almost doubled the speed.
Could someone educate me as to what happens internally to the JDBC single-insert batch that makes it so different from a multi-insert batch?
Rewriting our code to use multi-value inserts is more cumbersome and not applicable everywhere. It also makes the code database-specific as all the engines we support do not all handle this syntax.
These considerations actually lead me to this old thread:
https://www.postgresql.org/message-id/55130DC8.2070508%40redhat.com
Anyway, these tests definitely show that the performance of the standard JDBC batch API could potentially be improved in Postgres internally with a bit of magic :)
Isn't it possible (either in the driver or in the engine) to aggregate the single-insert statements of a batch to turn them into a multi-insert equivalent? It would not have to handle all cases: simply handling basic commands like "INSERT INTO X (a, b, c, ...) VALUES (?, ?, ?, ...)" would help a lot.
If this should not be done in the JDBC driver, then in which area could such handling be done?
Or have I missed something?
If needed, I am ready to offer my assistance with testing (I can tweak parameters, install dev drivers, modify my code to run alternate statements, experiment, etc.).
I tried all sorts of query rewrite to boost a huge sequence of insert statements and here are my findings. All is run on a local Postgres 9.5.3, on a 64-bit Windows box with 32GB RAM, i7@3.6GHz.
1. Batches of single insert statements (original case):
Batch of: INSERT INTO SomeTable (Col1, Col2) VALUES (?, ?)
-> 52 seconds
2. Batches of single insert statements to an unlogged table, then copy:
CREATE UNLOGGED TABLE SomeUnloggedTable AS SELECT * FROM SomeTable WHERE 1 = 0
Batch of: INSERT INTO SomeUnloggedTable (Col1, Col2) VALUES (?, ?)
INSERT INTO SomeTable SELECT * FROM SomeUnloggedTable
DROP TABLE SomeUnloggedTable
-> 43 seconds (15 seconds in the copy from unlogged to target table)
3. Batches of 100 multi-insert (followed by batches of single inserts for remainder):
Batch of: INSERT INTO SomeTable (Col1, Col2) VALUES (?, ?), (?, ?), (?, ?)...
-> 30 seconds
So, rewriting the batch of single insert statements to batches of 100 multi-insert statements almost doubled the speed.
Could someone educate me as to what happens internally to the JDBC single-insert batch that makes it so different from a multi-insert batch?
Rewriting our code to use multi-value inserts is more cumbersome and not applicable everywhere. It also makes the code database-specific as all the engines we support do not all handle this syntax.
These considerations actually lead me to this old thread:
https://www.postgresql.org/message-id/55130DC8.2070508%40redhat.com
Anyway, these tests definitely show that the performance of the standard JDBC batch API could potentially be improved in Postgres internally with a bit of magic :)
Isn't it possible (either in the driver or in the engine) to aggregate the single-insert statements of a batch to turn them into a multi-insert equivalent? It would not have to handle all cases: simply handling basic commands like "INSERT INTO X (a, b, c, ...) VALUES (?, ?, ?, ...)" would help a lot.
If this should not be done in the JDBC driver, then in which area could such handling be done?
Or have I missed something?
If needed, I am ready to offer my assistance with testing (I can tweak parameters, install dev drivers, modify my code to run alternate statements, experiment, etc.).
Cheers,
-Christopher