Issues with ON CONFLICT UPDATE and REINDEX CONCURRENTLY
От | Michail Nikolaev |
---|---|
Тема | Issues with ON CONFLICT UPDATE and REINDEX CONCURRENTLY |
Дата | |
Msg-id | CANtu0ojXmqjmEzp-=aJSxjsdE76iAsRgHBoK0QtYHimb_mEfsg@mail.gmail.com обсуждение исходный текст |
Ответы |
Re: Issues with ON CONFLICT UPDATE and REINDEX CONCURRENTLY
|
Список | pgsql-hackers |
Hello, hackers.
While testing my work on (1) I was struggling with addressing a strange issue with ON CONFLICT UPDATE and REINDEX CONCURRENTLY.
After some time, I have realized the same issue persists on the master branch as well :)
I have prepared two TAP tests to reproduce the issues (2), also in attachment.
First one, does the next thing:
CREATE UNLOGGED TABLE tbl(i int primary key, updated_at timestamp);
CREATE INDEX idx ON tbl(i, updated_at); -- it is not required to reproduce but make it to happen faster
Then it runs next scripts with pgbench concurrently:
1) INSERT INTO tbl VALUES(13,now()) on conflict(i) do update set updated_at = now();
2) INSERT INTO tbl VALUES(42,now()) on conflict(i) do update set updated_at = now();
3) INSERT INTO tbl VALUES(69,now()) on conflict(i) do update set updated_at = now();
Also, during pgbench the next command is run in the loop:
REINDEX INDEX CONCURRENTLY tbl_pkey;
For some time, everything looks more-less fine (except live locks, but this is the issue for the next test).
But after some time, about a minute or so (on ~3000th REINDEX) it just fails like this:
make -C src/test/modules/test_misc/ check PROVE_TESTS='t/006_*'
# waiting for an about 3000, now is 2174, seconds passed : 84
# waiting for an about 3000, now is 2175, seconds passed : 84
# waiting for an about 3000, now is 2176, seconds passed : 84
# waiting for an about 3000, now is 2177, seconds passed : 84
# waiting for an about 3000, now is 2178, seconds passed : 84
# waiting for an about 3000, now is 2179, seconds passed : 84
# waiting for an about 3000, now is 2180, seconds passed : 84
# waiting for an about 3000, now is 2181, seconds passed : 84
# waiting for an about 3000, now is 2182, seconds passed : 84
# waiting for an about 3000, now is 2183, seconds passed : 84
# waiting for an about 3000, now is 2184, seconds passed : 84
# Failed test 'concurrent INSERTs, UPDATES and RC status (got 2 vs expected 0)'
# at t/006_concurrently_unique_fail.pl line 69.
# Failed test 'concurrent INSERTs, UPDATES and RC stderr /(?^:^$)/'
# at t/006_concurrently_unique_fail.pl line 69.
# 'pgbench: error: pgbench: error: client 4 script 0 aborted in command 1 query 0: ERROR: duplicate key value violates unique constraint "tbl_pkey_ccnew"
# DETAIL: Key (i)=(13) already exists.
# client 15 script 0 aborted in command 1 query 0: ERROR: duplicate key value violates unique constraint "tbl_pkey_ccnew"
# DETAIL: Key (i)=(13) already exists.
# pgbench: error: client 9 script 0 aborted in command 1 query 0: ERROR: duplicate key value violates unique constraint "tbl_pkey_ccnew"
# DETAIL: Key (i)=(13) already exists.
# pgbench: error: client 11 script 0 aborted in command 1 query 0: ERROR: duplicate key value violates unique constraint "tbl_pkey_ccnew"
# DETAIL: Key (i)=(13) already exists.
# pgbench: error: client 8 script 0 aborted in command 1 query 0: ERROR: duplicate key value violates unique constraint "tbl_pkey_ccnew"
# DETAIL: Key (i)=(13) already exists.
# pgbench: error: client 3 script 2 aborted in command 1 query 0: ERROR: duplicate key value violates unique constraint "tbl_pkey_ccnew"
# DETAIL: Key (i)=(69) already exists.
# pgbench: error: client 2 script 2 aborted in command 1 query 0: ERROR: duplicate key value violates unique constraint "tbl_pkey_ccnew"
# DETAIL: Key (i)=(69) already exists.
# pgbench: error: client 12 script 0 aborted in command 1 query 0: ERROR: duplicate key value violates unique constraint "tbl_pkey_ccold"
# DETAIL: Key (i)=(13) already exists.
# pgbench: error: client 10 script 0 aborted in command 1 query 0: ERROR: duplicate key value violates unique constraint "tbl_pkey_ccold"
# DETAIL: Key (i)=(13) already exists.
# pgbench: error: client 18 script 2 aborted in command 1 query 0: ERROR: duplicate key value violates unique constraint "tbl_pkey_ccnew"
# DETAIL: Key (i)=(69) already exists.
# pgbench: error: pgbench:client 14 script 0 aborted in command 1 query 0: ERROR: duplicate key value violates unique constraint "tbl_pkey"
# DETAIL: Key (i)=(13) already exists.
# error: client 1 script 0 aborted in command 1 query 0: ERROR: duplicate key value violates unique constraint "tbl_pkey"
# DETAIL: Key (i)=(13) already exists.
# pgbench: error: client 0 script 2 aborted in command 1 query 0: ERROR: duplicate key value violates unique constraint "tbl_pkey"
# DETAIL: Key (i)=(69) already exists.
# pgbench: error: client 13 script 1 aborted in command 1 query 0: ERROR: duplicate key value violates unique constraint "tbl_pkey_ccnew"
# DETAIL: Key (i)=(42) already exists.
# pgbench: error: client 16 script 1 aborted in command 1 query 0: ERROR: duplicate key value violates unique constraint "tbl_pkey_ccnew"
# DETAIL: Key (i)=(42) already exists.
# pgbench: error: client 5 script 1 aborted in command 1 query 0: ERROR: duplicate key value violates unique constraint "tbl_pkey_ccnew"
# DETAIL: Key (i)=(42) already exists.
# pgbench: error: Run was aborted; the above results are incomplete.
# '
Probably something wrong with arbiter index selection for different backends. I am afraid it could be a symptom of a more serious issue.
-------------------------------------
The second test shows an interesting live lock state in the similar situation.
CREATE UNLOGGED TABLE tbl(i int primary key, n int);
CREATE INDEX idx ON tbl(i, n);
INSERT INTO tbl VALUES(13,1);
pgbench concurrently runs single command
INSERT INTO tbl VALUES(13,1) on conflict(i) do update set n = tbl.n + EXCLUDED.n;
And also reindexing in the loop
REINDEX INDEX CONCURRENTLY tbl_pkey;
After the start, a little bit strange issue happens
make -C src/test/modules/test_misc/ check PROVE_TESTS='t/007_*'
# going to start reindex, num tuples in table is 1
# reindex 0 done in 0.00704598426818848 seconds, num inserted during reindex tuples is 0 speed is 0 per second
# going to start reindex, num tuples in table is 7
# reindex 1 done in 0.453176021575928 seconds, num inserted during reindex tuples is 632 speed is 1394.60158947115 per second
# going to start reindex, num tuples in table is 647
# current n is 808, 808 per one second
# current n is 808, 0 per one second
# current n is 808, 0 per one second
# current n is 808, 0 per one second
# current n is 808, 0 per one second
# current n is 811, 3 per one second
# current n is 917, 106 per one second
# current n is 1024, 107 per one second
# reindex 2 done in 8.4104950428009 seconds, num inserted during reindex tuples is 467 speed is 55.5258635340064 per second
# going to start reindex, num tuples in table is 1136
# current n is 1257, 233 per one second
# current n is 1257, 0 per one second
# current n is 1257, 0 per one second
# current n is 1257, 0 per one second
# current n is 1257, 0 per one second
# current n is 1490, 233 per one second
# reindex 3 done in 5.21368479728699 seconds, num inserted during reindex tuples is 411 speed is 78.8310026363446 per second
# going to start reindex, num tuples in table is 1566
In some moments, all CPUs all hot but 30 connections are unable to do any upsert. I think it may be somehow caused by two arbiter indexes (old and new reindexed one).
Best regards,
Mikhail.
[1]: https://www.postgresql.org/message-id/flat/CANtu0ogBOtd9ravu1CUbuZWgq6qvn1rny38PGKDPk9zzQPH8_A%40mail.gmail.com#d4be02ff70f3002522f9fadbd165d631
[2]: https://github.com/michail-nikolaev/postgres/commit/9446f944b415306d9e5d5ab98f69938d8f5ee87f
While testing my work on (1) I was struggling with addressing a strange issue with ON CONFLICT UPDATE and REINDEX CONCURRENTLY.
After some time, I have realized the same issue persists on the master branch as well :)
I have prepared two TAP tests to reproduce the issues (2), also in attachment.
First one, does the next thing:
CREATE UNLOGGED TABLE tbl(i int primary key, updated_at timestamp);
CREATE INDEX idx ON tbl(i, updated_at); -- it is not required to reproduce but make it to happen faster
Then it runs next scripts with pgbench concurrently:
1) INSERT INTO tbl VALUES(13,now()) on conflict(i) do update set updated_at = now();
2) INSERT INTO tbl VALUES(42,now()) on conflict(i) do update set updated_at = now();
3) INSERT INTO tbl VALUES(69,now()) on conflict(i) do update set updated_at = now();
Also, during pgbench the next command is run in the loop:
REINDEX INDEX CONCURRENTLY tbl_pkey;
For some time, everything looks more-less fine (except live locks, but this is the issue for the next test).
But after some time, about a minute or so (on ~3000th REINDEX) it just fails like this:
make -C src/test/modules/test_misc/ check PROVE_TESTS='t/006_*'
# waiting for an about 3000, now is 2174, seconds passed : 84
# waiting for an about 3000, now is 2175, seconds passed : 84
# waiting for an about 3000, now is 2176, seconds passed : 84
# waiting for an about 3000, now is 2177, seconds passed : 84
# waiting for an about 3000, now is 2178, seconds passed : 84
# waiting for an about 3000, now is 2179, seconds passed : 84
# waiting for an about 3000, now is 2180, seconds passed : 84
# waiting for an about 3000, now is 2181, seconds passed : 84
# waiting for an about 3000, now is 2182, seconds passed : 84
# waiting for an about 3000, now is 2183, seconds passed : 84
# waiting for an about 3000, now is 2184, seconds passed : 84
# Failed test 'concurrent INSERTs, UPDATES and RC status (got 2 vs expected 0)'
# at t/006_concurrently_unique_fail.pl line 69.
# Failed test 'concurrent INSERTs, UPDATES and RC stderr /(?^:^$)/'
# at t/006_concurrently_unique_fail.pl line 69.
# 'pgbench: error: pgbench: error: client 4 script 0 aborted in command 1 query 0: ERROR: duplicate key value violates unique constraint "tbl_pkey_ccnew"
# DETAIL: Key (i)=(13) already exists.
# client 15 script 0 aborted in command 1 query 0: ERROR: duplicate key value violates unique constraint "tbl_pkey_ccnew"
# DETAIL: Key (i)=(13) already exists.
# pgbench: error: client 9 script 0 aborted in command 1 query 0: ERROR: duplicate key value violates unique constraint "tbl_pkey_ccnew"
# DETAIL: Key (i)=(13) already exists.
# pgbench: error: client 11 script 0 aborted in command 1 query 0: ERROR: duplicate key value violates unique constraint "tbl_pkey_ccnew"
# DETAIL: Key (i)=(13) already exists.
# pgbench: error: client 8 script 0 aborted in command 1 query 0: ERROR: duplicate key value violates unique constraint "tbl_pkey_ccnew"
# DETAIL: Key (i)=(13) already exists.
# pgbench: error: client 3 script 2 aborted in command 1 query 0: ERROR: duplicate key value violates unique constraint "tbl_pkey_ccnew"
# DETAIL: Key (i)=(69) already exists.
# pgbench: error: client 2 script 2 aborted in command 1 query 0: ERROR: duplicate key value violates unique constraint "tbl_pkey_ccnew"
# DETAIL: Key (i)=(69) already exists.
# pgbench: error: client 12 script 0 aborted in command 1 query 0: ERROR: duplicate key value violates unique constraint "tbl_pkey_ccold"
# DETAIL: Key (i)=(13) already exists.
# pgbench: error: client 10 script 0 aborted in command 1 query 0: ERROR: duplicate key value violates unique constraint "tbl_pkey_ccold"
# DETAIL: Key (i)=(13) already exists.
# pgbench: error: client 18 script 2 aborted in command 1 query 0: ERROR: duplicate key value violates unique constraint "tbl_pkey_ccnew"
# DETAIL: Key (i)=(69) already exists.
# pgbench: error: pgbench:client 14 script 0 aborted in command 1 query 0: ERROR: duplicate key value violates unique constraint "tbl_pkey"
# DETAIL: Key (i)=(13) already exists.
# error: client 1 script 0 aborted in command 1 query 0: ERROR: duplicate key value violates unique constraint "tbl_pkey"
# DETAIL: Key (i)=(13) already exists.
# pgbench: error: client 0 script 2 aborted in command 1 query 0: ERROR: duplicate key value violates unique constraint "tbl_pkey"
# DETAIL: Key (i)=(69) already exists.
# pgbench: error: client 13 script 1 aborted in command 1 query 0: ERROR: duplicate key value violates unique constraint "tbl_pkey_ccnew"
# DETAIL: Key (i)=(42) already exists.
# pgbench: error: client 16 script 1 aborted in command 1 query 0: ERROR: duplicate key value violates unique constraint "tbl_pkey_ccnew"
# DETAIL: Key (i)=(42) already exists.
# pgbench: error: client 5 script 1 aborted in command 1 query 0: ERROR: duplicate key value violates unique constraint "tbl_pkey_ccnew"
# DETAIL: Key (i)=(42) already exists.
# pgbench: error: Run was aborted; the above results are incomplete.
# '
Probably something wrong with arbiter index selection for different backends. I am afraid it could be a symptom of a more serious issue.
-------------------------------------
The second test shows an interesting live lock state in the similar situation.
CREATE UNLOGGED TABLE tbl(i int primary key, n int);
CREATE INDEX idx ON tbl(i, n);
INSERT INTO tbl VALUES(13,1);
pgbench concurrently runs single command
INSERT INTO tbl VALUES(13,1) on conflict(i) do update set n = tbl.n + EXCLUDED.n;
And also reindexing in the loop
REINDEX INDEX CONCURRENTLY tbl_pkey;
After the start, a little bit strange issue happens
make -C src/test/modules/test_misc/ check PROVE_TESTS='t/007_*'
# going to start reindex, num tuples in table is 1
# reindex 0 done in 0.00704598426818848 seconds, num inserted during reindex tuples is 0 speed is 0 per second
# going to start reindex, num tuples in table is 7
# reindex 1 done in 0.453176021575928 seconds, num inserted during reindex tuples is 632 speed is 1394.60158947115 per second
# going to start reindex, num tuples in table is 647
# current n is 808, 808 per one second
# current n is 808, 0 per one second
# current n is 808, 0 per one second
# current n is 808, 0 per one second
# current n is 808, 0 per one second
# current n is 811, 3 per one second
# current n is 917, 106 per one second
# current n is 1024, 107 per one second
# reindex 2 done in 8.4104950428009 seconds, num inserted during reindex tuples is 467 speed is 55.5258635340064 per second
# going to start reindex, num tuples in table is 1136
# current n is 1257, 233 per one second
# current n is 1257, 0 per one second
# current n is 1257, 0 per one second
# current n is 1257, 0 per one second
# current n is 1257, 0 per one second
# current n is 1490, 233 per one second
# reindex 3 done in 5.21368479728699 seconds, num inserted during reindex tuples is 411 speed is 78.8310026363446 per second
# going to start reindex, num tuples in table is 1566
In some moments, all CPUs all hot but 30 connections are unable to do any upsert. I think it may be somehow caused by two arbiter indexes (old and new reindexed one).
Best regards,
Mikhail.
[1]: https://www.postgresql.org/message-id/flat/CANtu0ogBOtd9ravu1CUbuZWgq6qvn1rny38PGKDPk9zzQPH8_A%40mail.gmail.com#d4be02ff70f3002522f9fadbd165d631
[2]: https://github.com/michail-nikolaev/postgres/commit/9446f944b415306d9e5d5ab98f69938d8f5ee87f
Вложения
В списке pgsql-hackers по дате отправления:
Предыдущее
От: "Fujii.Yuki@df.MitsubishiElectric.co.jp"Дата:
Сообщение: Re: Should consider materializing the cheapest inner path in consider_parallel_nestloop()