reWriteBatchedInserts causes Inserts to freeze/lock
| От | Semen Yefimenko | 
|---|---|
| Тема | reWriteBatchedInserts causes Inserts to freeze/lock | 
| Дата | |
| Msg-id | CAKnyMoJp658dgjgRVYc5Fm_D6-U=qnFtu3rRtPju68rTzc06Sg@mail.gmail.com обсуждение исходный текст | 
| Ответы | Re: reWriteBatchedInserts causes Inserts to freeze/lock | 
| Список | pgsql-jdbc | 
Hi all,
		
	Recently I have come across an issue while inserting data.
I have a rather small table with 23 columns and barely any size in columns. The biggest column has a size of 128 characters but the average is about 50~ char.
I was looking into optimizing performance for our table generation and thus found out about 'reWriteBatchedInserts'. So I set this parameter to 'true' and after that, we encounter a 'freeze/lock' on the table I'm trying to insert the data into.
I have a rather small table with 23 columns and barely any size in columns. The biggest column has a size of 128 characters but the average is about 50~ char.
I was looking into optimizing performance for our table generation and thus found out about 'reWriteBatchedInserts'. So I set this parameter to 'true' and after that, we encounter a 'freeze/lock' on the table I'm trying to insert the data into.
CREATE TABLE schema.tableplaceholder
(
col1 character varying(24) COLLATE pg_catalog."default" NOT NULL,
col2 timestamp without time zone,
col3 character varying(24) COLLATE pg_catalog."default",
col4 character varying(24) COLLATE pg_catalog."default",
col5 character varying(24) COLLATE pg_catalog."default",
col6 character varying(15) COLLATE pg_catalog."default",
col7 boolean,
col8 timestamp without time zone,
col9 timestamp without time zone,
col10 character varying(12) COLLATE pg_catalog."default",
col11 character varying(53) COLLATE pg_catalog."default",
col12 character varying(24) COLLATE pg_catalog."default",
col13 character varying(30) COLLATE pg_catalog."default",
col14 character varying(45) COLLATE pg_catalog."default",
col15 character varying(4) COLLATE pg_catalog."default",
col16 character varying(5) COLLATE pg_catalog."default",
col17 boolean,
col18 character varying(128) COLLATE pg_catalog."default",
col19 numeric,
col20 numeric,
col21 character varying(4) COLLATE pg_catalog."default",
col22 character varying(24) COLLATE pg_catalog."default",
col23 character varying[] COLLATE pg_catalog."default",
CONSTRAINT pkeyconst PRIMARY KEY (col1)
USING INDEX TABLESPACE tablespace
)
SELECT a.datname, l.relation::regclass, l.transactionid, l.mode, l.GRANTED, a.usename, a.query, a.query_start, age(now(), a.query_start) AS "age", a.pidFROM pg_stat_activity aJOIN pg_locks l ON l.pid = a.pidORDER BY a.query_start;
We have a Postgres Version 12.5 (PostgreSQL 12.5 on x86_64-suse-linux-gnu, compiled by gcc (SUSE Linux) 4.8.5, 64-bit) and are running batches of 25.000 while committing every 100.000. The jdbc version we're running is: 42.2.20Result:
"databaseName" "schema.sequenceName" "RowExclusiveLock" true "schema" "INSERT INTO tableplaceholder ( col1 ... col22 )
VALUES ( rpad('...')||lpad(to_char(nextval('...'), '...'),16,'0'), $1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11, $12, $13, $14, $15, $16, $17, $18, $19, $20, $21, $22 ),(... ),(... " "2021-05-19 09:26:54.106769+00" "00:19:30.578581" 117624
" databaseName " "107270" "ExclusiveLock" true "schema" " INSERT INTO tableplaceholder ( col1 ... col22 )
VALUES ( rpad('...')||lpad(to_char(nextval('...'), '...'),16,'0'), $1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11, $12, $13, $14, $15, $16, $17, $18, $19, $20, $21, $22 ),(... ),(... " "2021-05-19 09:26:54.106769+00" "00:19:30.578581" 117624
" databaseName " "ExclusiveLock" true "schema" " INSERT INTO tableplaceholder ( col1 ... col22 )
VALUES ( rpad('...')||lpad(to_char(nextval('...'), '...'),16,'0'), $1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11, $12, $13, $14, $15, $16, $17, $18, $19, $20, $21, $22 ),(... ),(... " "2021-05-19 09:26:54.106769+00" "00:19:30.578581" 117624
" databaseName " "schema. tableplaceholder " "RowExclusiveLock" true "schema" " INSERT INTO tableplaceholder ( col1 ... col22 )
VALUES ( rpad('...')||lpad(to_char(nextval('...'), '...'),16,'0'), $1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11, $12, $13, $14, $15, $16, $17, $18, $19, $20, $21, $22 ),(... ),(... " "2021-05-19 09:26:54.106769+00" "00:19:30.578581" 117624
When I was optimizing the performance in another project, I found out about the jdbc BUG https://github.com/pgjdbc/pgjdbc/issues/194 which had the exact same behaviour.
But there I was inserting much more data, a table with 55 Columns and 15 columns having 2000 Characters and most of them having 100+ Characters in column size.
I found out that manually setting the bufferSize of the pgStream in the connection, it was possible to avoid this issue. Now with much less data AND with this workaround incorporated, we still get this issue.
In the trace log of the server, I can see the statements being rewritten in batches of 128.
So it looks somewhat like this on database:
This runs a total of 42 times meaning there should be around 5376~ entries done.
In pgAdmin I can see the session with the insert being stuck on Client:ClientWrite all the time. When I kill the session, I see 5436 in the tuples IN graph but no actual data is inserted.
Any ideas on what else I could do?
As I said, this issue is gone once I deactivate reWriteBatchedInserts. But this results in a more than 2x times runtime of inserts.
But there I was inserting much more data, a table with 55 Columns and 15 columns having 2000 Characters and most of them having 100+ Characters in column size.
I found out that manually setting the bufferSize of the pgStream in the connection, it was possible to avoid this issue. Now with much less data AND with this workaround incorporated, we still get this issue.
In the trace log of the server, I can see the statements being rewritten in batches of 128.
So it looks somewhat like this on database:
INSERT INTO tableplaceholder (col1 .. col22), (val1..val22), .. (val2795 .. 2816)x42
INSERT INTO tableplaceholder (col1 .. col22), (val1..val22), .. (val2795 .. 2816)
...
This runs a total of 42 times meaning there should be around 5376~ entries done.
In pgAdmin I can see the session with the insert being stuck on Client:ClientWrite all the time. When I kill the session, I see 5436 in the tuples IN graph but no actual data is inserted.
Any ideas on what else I could do?
As I said, this issue is gone once I deactivate reWriteBatchedInserts. But this results in a more than 2x times runtime of inserts.
В списке pgsql-jdbc по дате отправления: