Thread: Issues with ON CONFLICT UPDATE and REINDEX CONCURRENTLY
Hello, hackers.
While testing my work on (1), I was struggling with a strange issue involving ON CONFLICT UPDATE and REINDEX CONCURRENTLY.
After some time, I realized that the same issue exists on the master branch as well :)
I have prepared two TAP tests to reproduce the issues (2), also attached.
The first one does the following:
CREATE UNLOGGED TABLE tbl(i int primary key, updated_at timestamp);
CREATE INDEX idx ON tbl(i, updated_at); -- not required to reproduce the issue, but it makes it happen faster
Then it runs the following scripts concurrently with pgbench:
1) INSERT INTO tbl VALUES(13,now()) on conflict(i) do update set updated_at = now();
2) INSERT INTO tbl VALUES(42,now()) on conflict(i) do update set updated_at = now();
3) INSERT INTO tbl VALUES(69,now()) on conflict(i) do update set updated_at = now();
Also, while pgbench is running, the following command is executed in a loop:
REINDEX INDEX CONCURRENTLY tbl_pkey;
For some time, everything looks more or less fine (except live locks, but that is the subject of the next test).
But after some time, about a minute or so (around the ~3000th REINDEX), it just fails like this:
make -C src/test/modules/test_misc/ check PROVE_TESTS='t/006_*'
# waiting for an about 3000, now is 2174, seconds passed : 84
# waiting for an about 3000, now is 2175, seconds passed : 84
# waiting for an about 3000, now is 2176, seconds passed : 84
# waiting for an about 3000, now is 2177, seconds passed : 84
# waiting for an about 3000, now is 2178, seconds passed : 84
# waiting for an about 3000, now is 2179, seconds passed : 84
# waiting for an about 3000, now is 2180, seconds passed : 84
# waiting for an about 3000, now is 2181, seconds passed : 84
# waiting for an about 3000, now is 2182, seconds passed : 84
# waiting for an about 3000, now is 2183, seconds passed : 84
# waiting for an about 3000, now is 2184, seconds passed : 84
# Failed test 'concurrent INSERTs, UPDATES and RC status (got 2 vs expected 0)'
# at t/006_concurrently_unique_fail.pl line 69.
# Failed test 'concurrent INSERTs, UPDATES and RC stderr /(?^:^$)/'
# at t/006_concurrently_unique_fail.pl line 69.
# 'pgbench: error: pgbench: error: client 4 script 0 aborted in command 1 query 0: ERROR: duplicate key value violates unique constraint "tbl_pkey_ccnew"
# DETAIL: Key (i)=(13) already exists.
# client 15 script 0 aborted in command 1 query 0: ERROR: duplicate key value violates unique constraint "tbl_pkey_ccnew"
# DETAIL: Key (i)=(13) already exists.
# pgbench: error: client 9 script 0 aborted in command 1 query 0: ERROR: duplicate key value violates unique constraint "tbl_pkey_ccnew"
# DETAIL: Key (i)=(13) already exists.
# pgbench: error: client 11 script 0 aborted in command 1 query 0: ERROR: duplicate key value violates unique constraint "tbl_pkey_ccnew"
# DETAIL: Key (i)=(13) already exists.
# pgbench: error: client 8 script 0 aborted in command 1 query 0: ERROR: duplicate key value violates unique constraint "tbl_pkey_ccnew"
# DETAIL: Key (i)=(13) already exists.
# pgbench: error: client 3 script 2 aborted in command 1 query 0: ERROR: duplicate key value violates unique constraint "tbl_pkey_ccnew"
# DETAIL: Key (i)=(69) already exists.
# pgbench: error: client 2 script 2 aborted in command 1 query 0: ERROR: duplicate key value violates unique constraint "tbl_pkey_ccnew"
# DETAIL: Key (i)=(69) already exists.
# pgbench: error: client 12 script 0 aborted in command 1 query 0: ERROR: duplicate key value violates unique constraint "tbl_pkey_ccold"
# DETAIL: Key (i)=(13) already exists.
# pgbench: error: client 10 script 0 aborted in command 1 query 0: ERROR: duplicate key value violates unique constraint "tbl_pkey_ccold"
# DETAIL: Key (i)=(13) already exists.
# pgbench: error: client 18 script 2 aborted in command 1 query 0: ERROR: duplicate key value violates unique constraint "tbl_pkey_ccnew"
# DETAIL: Key (i)=(69) already exists.
# pgbench: error: pgbench:client 14 script 0 aborted in command 1 query 0: ERROR: duplicate key value violates unique constraint "tbl_pkey"
# DETAIL: Key (i)=(13) already exists.
# error: client 1 script 0 aborted in command 1 query 0: ERROR: duplicate key value violates unique constraint "tbl_pkey"
# DETAIL: Key (i)=(13) already exists.
# pgbench: error: client 0 script 2 aborted in command 1 query 0: ERROR: duplicate key value violates unique constraint "tbl_pkey"
# DETAIL: Key (i)=(69) already exists.
# pgbench: error: client 13 script 1 aborted in command 1 query 0: ERROR: duplicate key value violates unique constraint "tbl_pkey_ccnew"
# DETAIL: Key (i)=(42) already exists.
# pgbench: error: client 16 script 1 aborted in command 1 query 0: ERROR: duplicate key value violates unique constraint "tbl_pkey_ccnew"
# DETAIL: Key (i)=(42) already exists.
# pgbench: error: client 5 script 1 aborted in command 1 query 0: ERROR: duplicate key value violates unique constraint "tbl_pkey_ccnew"
# DETAIL: Key (i)=(42) already exists.
# pgbench: error: Run was aborted; the above results are incomplete.
# '
Probably something is wrong with arbiter index selection in different backends. I am afraid it could be a symptom of a more serious issue.
-------------------------------------
The second test shows an interesting live-lock state in a similar situation.
CREATE UNLOGGED TABLE tbl(i int primary key, n int);
CREATE INDEX idx ON tbl(i, n);
INSERT INTO tbl VALUES(13,1);
pgbench concurrently runs a single command:
INSERT INTO tbl VALUES(13,1) on conflict(i) do update set n = tbl.n + EXCLUDED.n;
And reindexing also runs in a loop:
REINDEX INDEX CONCURRENTLY tbl_pkey;
After the start, a somewhat strange issue appears:
make -C src/test/modules/test_misc/ check PROVE_TESTS='t/007_*'
# going to start reindex, num tuples in table is 1
# reindex 0 done in 0.00704598426818848 seconds, num inserted during reindex tuples is 0 speed is 0 per second
# going to start reindex, num tuples in table is 7
# reindex 1 done in 0.453176021575928 seconds, num inserted during reindex tuples is 632 speed is 1394.60158947115 per second
# going to start reindex, num tuples in table is 647
# current n is 808, 808 per one second
# current n is 808, 0 per one second
# current n is 808, 0 per one second
# current n is 808, 0 per one second
# current n is 808, 0 per one second
# current n is 811, 3 per one second
# current n is 917, 106 per one second
# current n is 1024, 107 per one second
# reindex 2 done in 8.4104950428009 seconds, num inserted during reindex tuples is 467 speed is 55.5258635340064 per second
# going to start reindex, num tuples in table is 1136
# current n is 1257, 233 per one second
# current n is 1257, 0 per one second
# current n is 1257, 0 per one second
# current n is 1257, 0 per one second
# current n is 1257, 0 per one second
# current n is 1490, 233 per one second
# reindex 3 done in 5.21368479728699 seconds, num inserted during reindex tuples is 411 speed is 78.8310026363446 per second
# going to start reindex, num tuples in table is 1566
At some moments, all CPUs are hot, but 30 connections are unable to complete any upsert. I think it may somehow be caused by the two arbiter indexes (the old one and the new reindexed one).
Best regards,
Mikhail.
[1]: https://www.postgresql.org/message-id/flat/CANtu0ogBOtd9ravu1CUbuZWgq6qvn1rny38PGKDPk9zzQPH8_A%40mail.gmail.com#d4be02ff70f3002522f9fadbd165d631
[2]: https://github.com/michail-nikolaev/postgres/commit/9446f944b415306d9e5d5ab98f69938d8f5ee87f
On Tue, Jun 11, 2024 at 01:00:00PM +0200, Michail Nikolaev wrote:
> Probably something wrong with arbiter index selection for different
> backends. I am afraid it could be a symptom of a more serious issue.
ON CONFLICT selects an index that may be rebuilt in parallel of the REINDEX happening, and its contents may be incomplete. Isn't the issue that we may select as arbiter indexes stuff that's !indisvalid? Using the ccnew or ccold indexes would not be correct for the conflict resolutions.
--
Michael
Hello, Michael.
> Isn't the issue that we may select as arbiter indexes stuff that's !indisvalid?
As far as I can see (1), !indisvalid indexes are filtered out.
But... it looks like this choice is not locked in any way (2), so index_concurrently_swap or index_concurrently_set_dead can change such an index after the decision is made, even despite WaitForLockersMultiple (3). In some cases, it may cause data loss...
However, I was unable to reproduce that using some random usleep() calls - maybe it is a wrong assumption.
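For reference, the filtering mentioned above happens in the candidate loop of infer_arbiter_indexes() (src/backend/optimizer/util/plancat.c); roughly like the following, paraphrased from memory, so details may differ between versions:

    idxRel = index_open(indexoid, rte->rellockmode);
    idxForm = idxRel->rd_index;

    /* indexes that are not valid are skipped as arbiter candidates */
    if (!idxForm->indisvalid)
        goto next;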
Hello.
> But I was unable to reproduce that using some random usleep(), however - maybe it is a wrong assumption.
It seems like the assumption is correct - we may use an invalid index as an arbiter due to a race condition.
The attached patch adds a check for that case, and now the test fails like this:
# pgbench: error: client 16 script 1 aborted in command 1 query 0: ERROR: duplicate key value violates unique constraint "tbl_pkey_ccold"
# DETAIL: Key (i)=(42) already exists.
# pgbench: error: client 9 script 1 aborted in command 1 query 0: ERROR: ON CONFLICT does not support invalid indexes as arbiters
# pgbench: error: client 0 script 2 aborted in command 1 query 0: ERROR: duplicate key value violates unique constraint "tbl_pkey"
# DETAIL: Key (i)=(69) already exists.
# pgbench: error: client 7 script 0 aborted in command 1 query 0: ERROR: ON CONFLICT does not support invalid indexes as arbiters
# pgbench: error: client 10 script 0 aborted in command 1 query 0: ERROR: ON CONFLICT does not support invalid indexes as arbiters
# pgbench: error: client 11 script 0 aborted in command 1 query 0: ERROR: ON CONFLICT does not support invalid indexes as arbiters
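(The attached patch itself is not shown in this message; hypothetically, a check of this kind next to the existing indimmediate test in ExecCheckIndexConstraints() would produce the new error text seen above - my own sketch, not necessarily the actual patch:)

    if (!indexRelation->rd_index->indisvalid)
        ereport(ERROR,
                (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
                 errmsg("ON CONFLICT does not support invalid indexes as arbiters")));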
I think it is even possible to see a !alive index in the same situation (which is a worse case), but I was unable to reproduce it so far.
Best regards,
Mikhail.
Hello, everyone.
> I think It is even possible to see !alive index in the same situation (it is worse case), but I was unable to reproduce it so far.
Fortunately, it is not possible.
So, it seems like I have found the source of the problem:
1) infer_arbiter_indexes calls RelationGetIndexList to get the list of candidates.
It does not lock the selected indexes in any additional way that would prevent index_concurrently_swap from changing them (setting and clearing validity).
RelationGetIndexList relcache.c:4857
infer_arbiter_indexes plancat.c:780
make_modifytable createplan.c:7097 ---------- node->arbiterIndexes = infer_arbiter_indexes(root);
create_modifytable_plan createplan.c:2826
create_plan_recurse createplan.c:532
create_plan createplan.c:349
standard_planner planner.c:421
planner planner.c:282
pg_plan_query postgres.c:904
pg_plan_queries postgres.c:996
exec_simple_query postgres.c:1193
2) Another backend marks some index as invalid and commits
index_concurrently_swap index.c:1600
ReindexRelationConcurrently indexcmds.c:4115
ReindexIndex indexcmds.c:2814
ExecReindex indexcmds.c:2743
ProcessUtilitySlow utility.c:1567
standard_ProcessUtility utility.c:1067
ProcessUtility utility.c:523
PortalRunUtility pquery.c:1158
PortalRunMulti pquery.c:1315
PortalRun pquery.c:791
exec_simple_query postgres.c:1274
3) The first backend invalidates its catalog snapshot because of the transactional snapshot
InvalidateCatalogSnapshot snapmgr.c:426
GetTransactionSnapshot snapmgr.c:278
PortalRunMulti pquery.c:1244
PortalRun pquery.c:791
exec_simple_query postgres.c:1274
4) The first backend copies the indexes selected using the previous catalog snapshot
ExecInitModifyTable nodeModifyTable.c:4499 -------- resultRelInfo->ri_onConflictArbiterIndexes = node->arbiterIndexes;
ExecInitNode execProcnode.c:177
InitPlan execMain.c:966
standard_ExecutorStart execMain.c:261
ExecutorStart execMain.c:137
ProcessQuery pquery.c:155
PortalRunMulti pquery.c:1277
PortalRun pquery.c:791
exec_simple_query postgres.c:1274
5) Then it reads the indexes using a new, fresh catalog snapshot
RelationGetIndexList relcache.c:4816
ExecOpenIndices execIndexing.c:175
ExecInsert nodeModifyTable.c:792 ------------- ExecOpenIndices(resultRelInfo, onconflict != ONCONFLICT_NONE);
ExecModifyTable nodeModifyTable.c:4059
ExecProcNodeFirst execProcnode.c:464
ExecProcNode executor.h:274
ExecutePlan execMain.c:1646
standard_ExecutorRun execMain.c:363
ExecutorRun execMain.c:304
ProcessQuery pquery.c:160
PortalRunMulti pquery.c:1277
PortalRun pquery.c:791
exec_simple_query postgres.c:1274
6) And it uses the arbiter selected with the stale snapshot together with the new view of the indexes (marked as invalid)
ExecInsert nodeModifyTable.c:1016 -------------- arbiterIndexes = resultRelInfo->ri_onConflictArbiterIndexes;
............
ExecInsert nodeModifyTable.c:1048 ---------------if (!ExecCheckIndexConstraints(resultRelInfo, slot, estate, conflictTid, arbiterIndexes))
ExecModifyTable nodeModifyTable.c:4059
ExecProcNodeFirst execProcnode.c:464
ExecProcNode executor.h:274
ExecutePlan execMain.c:1646
standard_ExecutorRun execMain.c:363
ExecutorRun execMain.c:304
ProcessQuery pquery.c:160
PortalRunMulti pquery.c:1277
PortalRun pquery.c:791
exec_simple_query postgres.c:1274
I have attached an updated test for the issue (it quickly fails on an assert and uses only 2 backends).
The same issue may happen in case of CREATE/DROP INDEX CONCURRENTLY as well.
The simplest possible fix is to use ShareLock instead of ShareUpdateExclusiveLock in index_concurrently_swap:
oldClassRel = relation_open(oldIndexId, ShareLock);
newClassRel = relation_open(newIndexId, ShareLock);
But this is not a "concurrent" way. But such update should be fast enough as far as I understand.
Best regards,
Mikhail.
On Mon, Jun 17, 2024 at 07:00:51PM +0200, Michail Nikolaev wrote:
> The simplest possible fix is to use ShareLock
> instead ShareUpdateExclusiveLock in the index_concurrently_swap
>
> oldClassRel = relation_open(oldIndexId, ShareLock);
> newClassRel = relation_open(newIndexId, ShareLock);
>
> But this is not a "concurrent" way. But such update should be fast enough
> as far as I understand.
Nope, that won't fly far. We should not use a ShareLock in this step
or we are going to conflict with row exclusive locks, impacting all
workloads when doing a REINDEX CONCURRENTLY.
That may be a long shot, but the issue is that we do the swap of all
the indexes in a single transaction, but do not wait for them to
complete when committing the swap's transaction in phase 4. Your
report is telling us that we really have a good reason to wait for all
the transactions that may use these indexes to finish. One thing
coming on top of my mind to keep things concurrent-safe while allowing
a clean use of the arbiter indexes would be to stick a
WaitForLockersMultiple() on AccessExclusiveLock just *before* the
transaction commit of phase 4, say, lacking the progress report part:
--- a/src/backend/commands/indexcmds.c
+++ b/src/backend/commands/indexcmds.c
@@ -4131,6 +4131,8 @@ ReindexRelationConcurrently(const ReindexStmt *stmt, Oid relationOid, const Rein
CommandCounterIncrement();
}
+ WaitForLockersMultiple(lockTags, AccessExclusiveLock, true);
+
/* Commit this transaction and make index swaps visible */
CommitTransactionCommand();
StartTransactionCommand();
This is a non-fresh Friday-afternoon idea, but it would make sure that
we don't have any transactions using the indexes switched to _ccold
with indisvalid that are waiting for a drop in phase 5. Your tests
seem to pass with that, and that keeps the operation intact
concurrent-wise (I'm really wishing for isolation tests with injection
points just now, because I could use them here).
> + Assert(indexRelation->rd_index->indislive);
> + Assert(indexRelation->rd_index->indisvalid);
> +
> if (!indexRelation->rd_index->indimmediate)
> ereport(ERROR,
> (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
This kind of validation check may be a good idea in the long term.
That seems incredibly useful to me if we were to add more code paths
that do concurrent index rebuilds, to make sure that we don't rely on
an index we should not use at all. That's a HEAD-only thing IMO,
though.
--
Michael
Hello, Michael!
> This is a non-fresh Friday-afternoon idea, but it would make sure that
> we don't have any transactions using the indexes switched to _ccold
> with indisvalid that are waiting for a drop in phase 5. Your tests
> seem to pass with that, and that keeps the operation intact
> concurrent-wise (I'm really wishing for isolation tests with injection
> points just now, because I could use them here).
Yes, I have also tried that approach, but unfortunately it doesn't work.
You can make the test fail by increasing the number of connections:
'--no-vacuum --client=10 -j 2 --transactions=1000',
The source of the issue is not the swap of the indexes (and it is not related to REINDEX CONCURRENTLY only), but the fact that indexes are fetched once during planning (to find the arbiter) and then later reread with a new catalog snapshot for the actual execution.
So, other possible fixes I see are:
* fall back to replanning in case we see that something has changed during execution
* select the arbiter indexes during the actual execution
> That's a HEAD-only thing IMO,
> though.
Do you mean that it needs to be moved to a separate patch?
Best regards,
Mikhail.
On Mon, Jun 17, 2024 at 07:00:51PM +0200, Michail Nikolaev wrote:
> The same issue may happen in case of CREATE/DROP INDEX CONCURRENTLY as well.
While looking at all that, I've been also curious about this specific point, and it is indeed possible to finish in a state where a duplicate key would be found in one of the indexes selected by the executor during an INSERT ON CONFLICT while a concurrent set of CICs and DICs are run, so you don't really need a REINDEX. See for example the attached test.
--
Michael
On Fri, Jun 21, 2024 at 11:31:21AM +0200, Michail Nikolaev wrote:
> Yes, I also have tried that approach, but it doesn't work, unfortunately.
> You may fail test increasing number of connections:
>
> '--no-vacuum --client=10 -j 2 --transactions=1000',
>
> The source of the issue is not the swap of the indexes (and not related to
> REINDEX CONCURRENTLY only), but the fact that indexes are fetched once
> during planning (to find the arbiter), but then later reread with a new
> catalog snapshot for the the actual execution.
When I first saw this report, my main worry was that I had somehow managed to break the state of the indexes, leading to data corruption because of an incorrect step in the concurrent operations. However, as far as I can see this is not the case, as an effect of two properties we rely on for concurrent index operations, which hold in the executor and the planner. Simply put:
- The planner ignores indexes with !indisvalid.
- The executor ignores indexes with !indislive.
The critical point is that we wait in DROP INDEX CONC and REINDEX CONC for any transactions still using an index that's waiting to be marked as !indislive, because such indexes *must* not be used in the executor.
> So, other possible fixes I see:
> * fallback to replanning in case we see something changed during the
> execution
> * select arbiter indexes during actual execution
These two properties make ON CONFLICT react the way it should depending on the state of the indexes selected by the planner based on the query clauses, with changes reflected when executing, with two patterns involved:
- An index may be created in a concurrent operation after the planner has selected the arbiter indexes (the index may be defined, still not valid yet, or just created after), and then the query execution would need to handle the extra index available at execution, with a failure on a ccnew index.
- An index may be selected at the planning phase, then a different index could be used by a constraint once both indexes swap, with a failure on a ccold index.
As far as I can see, it depends on what kind of query semantics and the amount of transparency you are looking for here in your application. An error in the query itself can also be defined as useful, so that your application is aware of what happens as an effect of the concurrent index build (reindex or CIC/DIC), and it is not really clear to me why silently falling back to a re-selection of the arbiter indexes would always be better. Replanning could actually be dangerous if a workload is heavily concurrently REINDEX'd, as we could fall into a trap where a query can never decide which index to use. I'm not saying that it cannot be improved, but it's not completely clear to me what query semantics are the best for all users because the behavior of HEAD and your suggestions have merits and demerits. Anything we could come up with would be an improvement anyway, AFAIU.
>> That's a HEAD-only thing IMO,
>> though.
>
> Do you mean that it needs to be moved to a separate patch?
It should, but I'm wondering if that's necessary for two reasons.
First, a check on indisvalid would be incorrect, because indexes marked as !indisvalid && indislive mean that there is a concurrent operation happening, and that this concurrent operation is waiting for all transactions working with a lock on this index to finish before flipping the live flag and making this index invalid for decisions taken in the executor, like HOT updates, etc.
A check on indislive may be an idea, still I'm slightly biased regarding its additional value, because any indexes opened for a relation are fetched from the relcache with RelationGetIndexList(), which explains why !indislive indexes cannot be fetched, and we rely on that in the executor for the indexes opened by a relation.
--
Michael
Hello, Michael!
> As far as I can see, it depends on what kind of query semantics and
> the amount of transparency you are looking for here in your
> application. An error in the query itself can also be defined as
> useful so as your application is aware of what happens as an effect of
> the concurrent index build (reindex or CIC/DIC), and it is not really
> clear to me why silently falling back to a re-selection of the arbiter
> indexes would be always better.
From my point of view, INSERT ON CONFLICT UPDATE should never fail with "ERROR: duplicate key value violates unique constraint" because the main idea of upsert is to avoid such situations.
So, this is what the majority expects, and it is probably even documented.
On the other hand, REINDEX CONCURRENTLY should not cause any queries to fail accidentally without any clear reason.
Also, as you can see from the initial message in this thread, we could see errors like these:
* ERROR: duplicate key value violates unique constraint "tbl_pkey"
* ERROR: duplicate key value violates unique constraint "tbl_pkey_ccnew"
* ERROR: duplicate key value violates unique constraint "tbl_pkey_ccold"
So, the first error message does not provide any clue for the developer to understand what happened.
> - The planner ignores indexes with !indisvalid.
> - The executor ignores indexes with !indislive.
Yes, and it feels like we need one more flag here to distinguish !indisvalid indexes which are going to become valid from those which are going to become !indislive.
For example, let's name it indiscorrect (meaning the index contains all the data). In that case, we may use the following logic:
1) !indisvalid && !indiscorrect - the index is probably in the validation phase; do not use it as an arbiter because it does not contain all the data yet
2) !indisvalid && indiscorrect - the index will most likely be dropped. Do not plan new queries with it, but it may still be used by other queries (including upserts). So, we still need to include it in the arbiters (see the sketch below).
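A minimal sketch of how the arbiter filter in infer_arbiter_indexes() could then look (indiscorrect is purely hypothetical here, it does not exist in pg_index today):

    /* hypothetical: skip only indexes that are invalid AND may be missing data */
    if (!idxForm->indisvalid && !idxForm->indiscorrect)
        goto next;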
And, during the reindex concurrently:
1) begin; mark new index as indisvalid and indiscorrect; mark old one as !indisvalid but still indiscorrect. invalidate relcache; commit;
At this point, some queries are still using only the old one as an arbiter, and some queries use both.
2) WaitForLockersMultiple
Now all queries use both indexes as arbiter.
3) begin; mark old index as !indiscorrect, additionally to !indisvalid; invalidate cache; commit;
Now, some queries use only the new index, but some still use both.
4) WaitForLockersMultiple;
Now, all queries use only the new index - we are safe to mark the old one as !indislive.
> It should, but I'm wondering if that's necessary for two reasons.
In that case, it becomes:
Assert(indexRelation->rd_index->indiscorrect);
Assert(indexRelation->rd_index->indislive);
and it is always a valid check.
Best regards,
Mikhail.
Hello, Noah!
> On your other thread, it would be useful to see stack traces from the high-CPU
> processes once the live lock has ended all query completion.
I was wrong, it is not a livelock, it is a deadlock, actually. I missed it because pgbench retries deadlocks automatically.
It looks like this:
2024-06-25 17:16:17.447 CEST [711743] 007_concurrently_unique_stuck.pl ERROR: deadlock detected
2024-06-25 17:16:17.447 CEST [711743] 007_concurrently_unique_stuck.pl DETAIL: Process 711743 waits for ShareLock on transaction 3633; blocked by process 711749.
Process 711749 waits for ShareLock on speculative token 2 of transaction 3622; blocked by process 711743.
Process 711743: INSERT INTO tbl VALUES(13,89318) on conflict(i) do update set n = tbl.n + 1 RETURNING n
Process 711749: INSERT INTO tbl VALUES(13,41011) on conflict(i) do update set n = tbl.n + 1 RETURNING n
2024-06-25 17:16:17.447 CEST [711743] 007_concurrently_unique_stuck.pl HINT: See server log for query details.
2024-06-25 17:16:17.447 CEST [711743] 007_concurrently_unique_stuck.pl CONTEXT: while inserting index tuple (15,145) in relation "tbl_pkey_ccnew"
2024-06-25 17:16:17.447 CEST [711743] 007_concurrently_unique_stuck.pl STATEMENT: INSERT INTO tbl VALUES(13,89318) on conflict(i) do update set n = tbl.n + 1 RETURNING n
Stacktraces:
-------------------------
INSERT INTO tbl VALUES(13,41011) on conflict(i) do update set n = tbl.n + 1 RETURNING n
#0 in epoll_wait (epfd=5, events=0x1203328, maxevents=1, timeout=-1) at ../sysdeps/unix/sysv/linux/epoll_wait.c:30
#1 in WaitEventSetWaitBlock (set=0x12032c0, cur_timeout=-1, occurred_events=0x7ffcc4e38e30, nevents=1) at latch.c:1570
#2 in WaitEventSetWait (set=0x12032c0, timeout=-1, occurred_events=0x7ffcc4e38e30, nevents=1, wait_event_info=50331655) at latch.c:1516
#3 in WaitLatch (latch=0x7acb2a2f5f14, wakeEvents=33, timeout=0, wait_event_info=50331655) at latch.c:538
#4 in ProcSleep (locallock=0x122f778, lockMethodTable=0x1037340 <default_lockmethod>, dontWait=false) at proc.c:1355
#5 in WaitOnLock (locallock=0x122f778, owner=0x1247408, dontWait=false) at lock.c:1833
#6 in LockAcquireExtended (locktag=0x7ffcc4e39220, lockmode=5, sessionLock=false, dontWait=false, reportMemoryError=true, locallockp=0x0) at lock.c:1046
#7 in LockAcquire (locktag=0x7ffcc4e39220, lockmode=5, sessionLock=false, dontWait=false) at lock.c:739
#8 in SpeculativeInsertionWait (xid=3622, token=2) at lmgr.c:833
#9 in _bt_doinsert (rel=0x7acb2dbb12e8, itup=0x12f1308, checkUnique=UNIQUE_CHECK_YES, indexUnchanged=true, heapRel=0x7acb2dbb0f08) at nbtinsert.c:225
#10 in btinsert (rel=0x7acb2dbb12e8, values=0x7ffcc4e39440, isnull=0x7ffcc4e39420, ht_ctid=0x12ebe20, heapRel=0x7acb2dbb0f08, checkUnique=UNIQUE_CHECK_YES, indexUnchanged=true, indexInfo=0x12f08a8) at nbtree.c:195
#11 in index_insert (indexRelation=0x7acb2dbb12e8, values=0x7ffcc4e39440, isnull=0x7ffcc4e39420, heap_t_ctid=0x12ebe20, heapRelation=0x7acb2dbb0f08, checkUnique=UNIQUE_CHECK_YES, indexUnchanged=true, indexInfo=0x12f08a8) at indexam.c:230
#12 in ExecInsertIndexTuples (resultRelInfo=0x12eaa00, slot=0x12ebdf0, estate=0x12ea560, update=true, noDupErr=false, specConflict=0x0, arbiterIndexes=0x0, onlySummarizing=false) at execIndexing.c:438
#13 in ExecUpdateEpilogue (context=0x7ffcc4e39870, updateCxt=0x7ffcc4e3962c, resultRelInfo=0x12eaa00, tupleid=0x7ffcc4e39732, oldtuple=0x0, slot=0x12ebdf0) at nodeModifyTable.c:2130
#14 in ExecUpdate (context=0x7ffcc4e39870, resultRelInfo=0x12eaa00, tupleid=0x7ffcc4e39732, oldtuple=0x0, slot=0x12ebdf0, canSetTag=true) at nodeModifyTable.c:2478
#15 in ExecOnConflictUpdate (context=0x7ffcc4e39870, resultRelInfo=0x12eaa00, conflictTid=0x7ffcc4e39732, excludedSlot=0x12f05b8, canSetTag=true, returning=0x7ffcc4e39738) at nodeModifyTable.c:2694
#16 in ExecInsert (context=0x7ffcc4e39870, resultRelInfo=0x12eaa00, slot=0x12f05b8, canSetTag=true, inserted_tuple=0x0, insert_destrel=0x0) at nodeModifyTable.c:1048
#17 in ExecModifyTable (pstate=0x12ea7f0) at nodeModifyTable.c:4059
#18 in ExecProcNodeFirst (node=0x12ea7f0) at execProcnode.c:464
#19 in ExecProcNode (node=0x12ea7f0) at ../../../src/include/executor/executor.h:274
#20 in ExecutePlan (estate=0x12ea560, planstate=0x12ea7f0, use_parallel_mode=false, operation=CMD_INSERT, sendTuples=true, numberTuples=0, direction=ForwardScanDirection, dest=0x12daac8, execute_once=true) at execMain.c:1646
#21 in standard_ExecutorRun (queryDesc=0x12dab58, direction=ForwardScanDirection, count=0, execute_once=true) at execMain.c:363
#22 in ExecutorRun (queryDesc=0x12dab58, direction=ForwardScanDirection, count=0, execute_once=true) at execMain.c:304
#23 in ProcessQuery (plan=0x12e1360, sourceText=0x12083b0 "INSERT INTO tbl VALUES(13,41011) on conflict(i) do update set n = tbl.n + 1 RETURNING n ", params=0x0, queryEnv=0x0, dest=0x12daac8, qc=0x7ffcc4e39ae0) at pquery.c:160
#24 in PortalRunMulti (portal=0x1289c90, isTopLevel=true, setHoldSnapshot=true, dest=0x12daac8, altdest=0x10382a0 <donothingDR>, qc=0x7ffcc4e39ae0) at pquery.c:1277
#25 in FillPortalStore (portal=0x1289c90, isTopLevel=true) at pquery.c:1026
#26 in PortalRun (portal=0x1289c90, count=9223372036854775807, isTopLevel=true, run_once=true, dest=0x12e14c0, altdest=0x12e14c0, qc=0x7ffcc4e39d30) at pquery.c:763
#27 in exec_simple_query (query_string=0x12083b0 "INSERT INTO tbl VALUES(13,41011) on conflict(i) do update set n = tbl.n + 1 RETURNING n ") at postgres.c:1274
-------------------------
INSERT INTO tbl VALUES(13,89318) on conflict(i) do update set n = tbl.n + 1 RETURNING n
#0 in epoll_wait (epfd=5, events=0x1203328, maxevents=1, timeout=-1) at ../sysdeps/unix/sysv/linux/epoll_wait.c:30
#1 in WaitEventSetWaitBlock (set=0x12032c0, cur_timeout=-1, occurred_events=0x7ffcc4e38f60, nevents=1) at latch.c:1570
#2 in WaitEventSetWait (set=0x12032c0, timeout=-1, occurred_events=0x7ffcc4e38f60, nevents=1, wait_event_info=50331653) at latch.c:1516
#3 in WaitLatch (latch=0x7acb2a2f4dbc, wakeEvents=33, timeout=0, wait_event_info=50331653) at latch.c:538
#4 in ProcSleep (locallock=0x122f670, lockMethodTable=0x1037340 <default_lockmethod>, dontWait=false) at proc.c:1355
#5 in WaitOnLock (locallock=0x122f670, owner=0x1247408, dontWait=false) at lock.c:1833
#6 in LockAcquireExtended (locktag=0x7ffcc4e39370, lockmode=5, sessionLock=false, dontWait=false, reportMemoryError=true, locallockp=0x0) at lock.c:1046
#7 in LockAcquire (locktag=0x7ffcc4e39370, lockmode=5, sessionLock=false, dontWait=false) at lock.c:739
#8 in XactLockTableWait (xid=3633, rel=0x7acb2dba66d8, ctid=0x1240a68, oper=XLTW_InsertIndex) at lmgr.c:701
#9 in _bt_doinsert (rel=0x7acb2dba66d8, itup=0x1240a68, checkUnique=UNIQUE_CHECK_YES, indexUnchanged=false, heapRel=0x7acb2dbb0f08) at nbtinsert.c:227
#10 in btinsert (rel=0x7acb2dba66d8, values=0x7ffcc4e395c0, isnull=0x7ffcc4e395a0, ht_ctid=0x12400e8, heapRel=0x7acb2dbb0f08, checkUnique=UNIQUE_CHECK_YES, indexUnchanged=false, indexInfo=0x1240500) at nbtree.c:195
#11 in index_insert (indexRelation=0x7acb2dba66d8, values=0x7ffcc4e395c0, isnull=0x7ffcc4e395a0, heap_t_ctid=0x12400e8, heapRelation=0x7acb2dbb0f08, checkUnique=UNIQUE_CHECK_YES, indexUnchanged=false, indexInfo=0x1240500) at indexam.c:230
#12 in ExecInsertIndexTuples (resultRelInfo=0x12eaa00, slot=0x12400b8, estate=0x12ea560, update=false, noDupErr=true, specConflict=0x7ffcc4e39722, arbiterIndexes=0x12e0998, onlySummarizing=false) at execIndexing.c:438
#13 in ExecInsert (context=0x7ffcc4e39870, resultRelInfo=0x12eaa00, slot=0x12400b8, canSetTag=true, inserted_tuple=0x0, insert_destrel=0x0) at nodeModifyTable.c:1095
#14 in ExecModifyTable (pstate=0x12ea7f0) at nodeModifyTable.c:4059
#15 in ExecProcNodeFirst (node=0x12ea7f0) at execProcnode.c:464
#16 in ExecProcNode (node=0x12ea7f0) at ../../../src/include/executor/executor.h:274
#17 in ExecutePlan (estate=0x12ea560, planstate=0x12ea7f0, use_parallel_mode=false, operation=CMD_INSERT, sendTuples=true, numberTuples=0, direction=ForwardScanDirection, dest=0x12daac8, execute_once=true) at execMain.c:1646
#18 in standard_ExecutorRun (queryDesc=0x12dab58, direction=ForwardScanDirection, count=0, execute_once=true) at execMain.c:363
#19 in ExecutorRun (queryDesc=0x12dab58, direction=ForwardScanDirection, count=0, execute_once=true) at execMain.c:304
#20 in ProcessQuery (plan=0x12e1360, sourceText=0x12083b0 "INSERT INTO tbl VALUES(13,89318) on conflict(i) do update set n = tbl.n + 1 RETURNING n ", params=0x0, queryEnv=0x0, dest=0x12daac8, qc=0x7ffcc4e39ae0) at pquery.c:160
#21 in PortalRunMulti (portal=0x1289c90, isTopLevel=true, setHoldSnapshot=true, dest=0x12daac8, altdest=0x10382a0 <donothingDR>, qc=0x7ffcc4e39ae0) at pquery.c:1277
#22 in FillPortalStore (portal=0x1289c90, isTopLevel=true) at pquery.c:1026
#23 in PortalRun (portal=0x1289c90, count=9223372036854775807, isTopLevel=true, run_once=true, dest=0x12e14c0, altdest=0x12e14c0, qc=0x7ffcc4e39d30) at pquery.c:763
#24 in exec_simple_query (query_string=0x12083b0 "INSERT INTO tbl VALUES(13,89318) on conflict(i) do update set n = tbl.n + 1 RETURNING n ") at postgres.c:1274
-------------------------
Also, at that time (but not reported in the deadlock), a reindex is happening. Without the reindex, I am unable to reproduce the deadlock.
#0 in epoll_wait (epfd=5, events=0x1203328, maxevents=1, timeout=-1) at ../sysdeps/unix/sysv/linux/epoll_wait.c:30
#1 in WaitEventSetWaitBlock (set=0x12032c0, cur_timeout=-1, occurred_events=0x7ffcc4e38cd0, nevents=1) at latch.c:1570
#2 in WaitEventSetWait (set=0x12032c0, timeout=-1, occurred_events=0x7ffcc4e38cd0, nevents=1, wait_event_info=50331654) at latch.c:1516
#3 in WaitLatch (latch=0x7acb2a2ff0c4, wakeEvents=33, timeout=0, wait_event_info=50331654) at latch.c:538
#4 in ProcSleep (locallock=0x122f358, lockMethodTable=0x1037340 <default_lockmethod>, dontWait=false) at proc.c:1355
#5 in WaitOnLock (locallock=0x122f358, owner=0x12459f0, dontWait=false) at lock.c:1833
#6 in LockAcquireExtended (locktag=0x7ffcc4e390e0, lockmode=5, sessionLock=false, dontWait=false, reportMemoryError=true, locallockp=0x0) at lock.c:1046
#7 in LockAcquire (locktag=0x7ffcc4e390e0, lockmode=5, sessionLock=false, dontWait=false) at lock.c:739
#8 in VirtualXactLock (vxid=..., wait=true) at lock.c:4627
#9 in WaitForLockersMultiple (locktags=0x12327a8, lockmode=8, progress=true) at lmgr.c:955
#10 in ReindexRelationConcurrently (stmt=0x1208e08, relationOid=16401, params=0x7ffcc4e39528) at indexcmds.c:4154
#11 in ReindexIndex (stmt=0x1208e08, params=0x7ffcc4e39528, isTopLevel=true) at indexcmds.c:2814
#12 in ExecReindex (pstate=0x12329f0, stmt=0x1208e08, isTopLevel=true) at indexcmds.c:2743
#13 in ProcessUtilitySlow (pstate=0x12329f0, pstmt=0x1208f58, queryString=0x12083b0 "REINDEX INDEX CONCURRENTLY tbl_pkey;", context=PROCESS_UTILITY_TOPLEVEL, params=0x0, queryEnv=0x0, dest=0x1209318, qc=0x7ffcc4e39d30) at utility.c:1567
#14 in standard_ProcessUtility (pstmt=0x1208f58, queryString=0x12083b0 "REINDEX INDEX CONCURRENTLY tbl_pkey;", readOnlyTree=false, context=PROCESS_UTILITY_TOPLEVEL, params=0x0, queryEnv=0x0, dest=0x1209318, qc=0x7ffcc4e39d30) at utility.c:1067
#15 in ProcessUtility (pstmt=0x1208f58, queryString=0x12083b0 "REINDEX INDEX CONCURRENTLY tbl_pkey;", readOnlyTree=false, context=PROCESS_UTILITY_TOPLEVEL, params=0x0, queryEnv=0x0, dest=0x1209318, qc=0x7ffcc4e39d30) at utility.c:523
#16 in PortalRunUtility (portal=0x1289c90, pstmt=0x1208f58, isTopLevel=true, setHoldSnapshot=false, dest=0x1209318, qc=0x7ffcc4e39d30) at pquery.c:1158
#17 in PortalRunMulti (portal=0x1289c90, isTopLevel=true, setHoldSnapshot=false, dest=0x1209318, altdest=0x1209318, qc=0x7ffcc4e39d30) at pquery.c:1315
#18 in PortalRun (portal=0x1289c90, count=9223372036854775807, isTopLevel=true, run_once=true, dest=0x1209318, altdest=0x1209318, qc=0x7ffcc4e39d30) at pquery.c:791
#19 in exec_simple_query (query_string=0x12083b0 "REINDEX INDEX CONCURRENTLY tbl_pkey;") at postgres.c:1274
INSERT INTO tbl VALUES(13,41011) on conflict(i) do update set n = tbl.n + 1 RETURNING n
#0 in epoll_wait (epfd=5, events=0x1203328, maxevents=1, timeout=-1) at ../sysdeps/unix/sysv/linux/epoll_wait.c:30
#1 in WaitEventSetWaitBlock (set=0x12032c0, cur_timeout=-1, occurred_events=0x7ffcc4e38e30, nevents=1) at latch.c:1570
#2 in WaitEventSetWait (set=0x12032c0, timeout=-1, occurred_events=0x7ffcc4e38e30, nevents=1, wait_event_info=50331655) at latch.c:1516
#3 in WaitLatch (latch=0x7acb2a2f5f14, wakeEvents=33, timeout=0, wait_event_info=50331655) at latch.c:538
#4 in ProcSleep (locallock=0x122f778, lockMethodTable=0x1037340 <default_lockmethod>, dontWait=false) at proc.c:1355
#5 in WaitOnLock (locallock=0x122f778, owner=0x1247408, dontWait=false) at lock.c:1833
#6 in LockAcquireExtended (locktag=0x7ffcc4e39220, lockmode=5, sessionLock=false, dontWait=false, reportMemoryError=true, locallockp=0x0) at lock.c:1046
#7 in LockAcquire (locktag=0x7ffcc4e39220, lockmode=5, sessionLock=false, dontWait=false) at lock.c:739
#8 in SpeculativeInsertionWait (xid=3622, token=2) at lmgr.c:833
#9 in _bt_doinsert (rel=0x7acb2dbb12e8, itup=0x12f1308, checkUnique=UNIQUE_CHECK_YES, indexUnchanged=true, heapRel=0x7acb2dbb0f08) at nbtinsert.c:225
#10 in btinsert (rel=0x7acb2dbb12e8, values=0x7ffcc4e39440, isnull=0x7ffcc4e39420, ht_ctid=0x12ebe20, heapRel=0x7acb2dbb0f08, checkUnique=UNIQUE_CHECK_YES, indexUnchanged=true, indexInfo=0x12f08a8) at nbtree.c:195
#11 in index_insert (indexRelation=0x7acb2dbb12e8, values=0x7ffcc4e39440, isnull=0x7ffcc4e39420, heap_t_ctid=0x12ebe20, heapRelation=0x7acb2dbb0f08, checkUnique=UNIQUE_CHECK_YES, indexUnchanged=true, indexInfo=0x12f08a8) at indexam.c:230
#12 in ExecInsertIndexTuples (resultRelInfo=0x12eaa00, slot=0x12ebdf0, estate=0x12ea560, update=true, noDupErr=false, specConflict=0x0, arbiterIndexes=0x0, onlySummarizing=false) at execIndexing.c:438
#13 in ExecUpdateEpilogue (context=0x7ffcc4e39870, updateCxt=0x7ffcc4e3962c, resultRelInfo=0x12eaa00, tupleid=0x7ffcc4e39732, oldtuple=0x0, slot=0x12ebdf0) at nodeModifyTable.c:2130
#14 in ExecUpdate (context=0x7ffcc4e39870, resultRelInfo=0x12eaa00, tupleid=0x7ffcc4e39732, oldtuple=0x0, slot=0x12ebdf0, canSetTag=true) at nodeModifyTable.c:2478
#15 in ExecOnConflictUpdate (context=0x7ffcc4e39870, resultRelInfo=0x12eaa00, conflictTid=0x7ffcc4e39732, excludedSlot=0x12f05b8, canSetTag=true, returning=0x7ffcc4e39738) at nodeModifyTable.c:2694
#16 in ExecInsert (context=0x7ffcc4e39870, resultRelInfo=0x12eaa00, slot=0x12f05b8, canSetTag=true, inserted_tuple=0x0, insert_destrel=0x0) at nodeModifyTable.c:1048
#17 in ExecModifyTable (pstate=0x12ea7f0) at nodeModifyTable.c:4059
#18 in ExecProcNodeFirst (node=0x12ea7f0) at execProcnode.c:464
#19 in ExecProcNode (node=0x12ea7f0) at ../../../src/include/executor/executor.h:274
#20 in ExecutePlan (estate=0x12ea560, planstate=0x12ea7f0, use_parallel_mode=false, operation=CMD_INSERT, sendTuples=true, numberTuples=0, direction=ForwardScanDirection, dest=0x12daac8, execute_once=true) at execMain.c:1646
#21 in standard_ExecutorRun (queryDesc=0x12dab58, direction=ForwardScanDirection, count=0, execute_once=true) at execMain.c:363
#22 in ExecutorRun (queryDesc=0x12dab58, direction=ForwardScanDirection, count=0, execute_once=true) at execMain.c:304
#23 in ProcessQuery (plan=0x12e1360, sourceText=0x12083b0 "INSERT INTO tbl VALUES(13,41011) on conflict(i) do update set n = tbl.n + 1 RETURNING n ", params=0x0, queryEnv=0x0, dest=0x12daac8, qc=0x7ffcc4e39ae0) at pquery.c:160
#24 in PortalRunMulti (portal=0x1289c90, isTopLevel=true, setHoldSnapshot=true, dest=0x12daac8, altdest=0x10382a0 <donothingDR>, qc=0x7ffcc4e39ae0) at pquery.c:1277
#25 in FillPortalStore (portal=0x1289c90, isTopLevel=true) at pquery.c:1026
#26 in PortalRun (portal=0x1289c90, count=9223372036854775807, isTopLevel=true, run_once=true, dest=0x12e14c0, altdest=0x12e14c0, qc=0x7ffcc4e39d30) at pquery.c:763
#27 in exec_simple_query (query_string=0x12083b0 "INSERT INTO tbl VALUES(13,41011) on conflict(i) do update set n = tbl.n + 1 RETURNING n ") at postgres.c:1274
-------------------------
INSERT INTO tbl VALUES(13,89318) on conflict(i) do update set n = tbl.n + 1 RETURNING n
#0 in epoll_wait (epfd=5, events=0x1203328, maxevents=1, timeout=-1) at ../sysdeps/unix/sysv/linux/epoll_wait.c:30
#1 in WaitEventSetWaitBlock (set=0x12032c0, cur_timeout=-1, occurred_events=0x7ffcc4e38f60, nevents=1) at latch.c:1570
#2 in WaitEventSetWait (set=0x12032c0, timeout=-1, occurred_events=0x7ffcc4e38f60, nevents=1, wait_event_info=50331653) at latch.c:1516
#3 in WaitLatch (latch=0x7acb2a2f4dbc, wakeEvents=33, timeout=0, wait_event_info=50331653) at latch.c:538
#4 in ProcSleep (locallock=0x122f670, lockMethodTable=0x1037340 <default_lockmethod>, dontWait=false) at proc.c:1355
#5 in WaitOnLock (locallock=0x122f670, owner=0x1247408, dontWait=false) at lock.c:1833
#6 in LockAcquireExtended (locktag=0x7ffcc4e39370, lockmode=5, sessionLock=false, dontWait=false, reportMemoryError=true, locallockp=0x0) at lock.c:1046
#7 in LockAcquire (locktag=0x7ffcc4e39370, lockmode=5, sessionLock=false, dontWait=false) at lock.c:739
#8 in XactLockTableWait (xid=3633, rel=0x7acb2dba66d8, ctid=0x1240a68, oper=XLTW_InsertIndex) at lmgr.c:701
#9 in _bt_doinsert (rel=0x7acb2dba66d8, itup=0x1240a68, checkUnique=UNIQUE_CHECK_YES, indexUnchanged=false, heapRel=0x7acb2dbb0f08) at nbtinsert.c:227
#10 in btinsert (rel=0x7acb2dba66d8, values=0x7ffcc4e395c0, isnull=0x7ffcc4e395a0, ht_ctid=0x12400e8, heapRel=0x7acb2dbb0f08, checkUnique=UNIQUE_CHECK_YES, indexUnchanged=false, indexInfo=0x1240500) at nbtree.c:195
#11 in index_insert (indexRelation=0x7acb2dba66d8, values=0x7ffcc4e395c0, isnull=0x7ffcc4e395a0, heap_t_ctid=0x12400e8, heapRelation=0x7acb2dbb0f08, checkUnique=UNIQUE_CHECK_YES, indexUnchanged=false, indexInfo=0x1240500) at indexam.c:230
#12 in ExecInsertIndexTuples (resultRelInfo=0x12eaa00, slot=0x12400b8, estate=0x12ea560, update=false, noDupErr=true, specConflict=0x7ffcc4e39722, arbiterIndexes=0x12e0998, onlySummarizing=false) at execIndexing.c:438
#13 in ExecInsert (context=0x7ffcc4e39870, resultRelInfo=0x12eaa00, slot=0x12400b8, canSetTag=true, inserted_tuple=0x0, insert_destrel=0x0) at nodeModifyTable.c:1095
#14 in ExecModifyTable (pstate=0x12ea7f0) at nodeModifyTable.c:4059
#15 in ExecProcNodeFirst (node=0x12ea7f0) at execProcnode.c:464
#16 in ExecProcNode (node=0x12ea7f0) at ../../../src/include/executor/executor.h:274
#17 in ExecutePlan (estate=0x12ea560, planstate=0x12ea7f0, use_parallel_mode=false, operation=CMD_INSERT, sendTuples=true, numberTuples=0, direction=ForwardScanDirection, dest=0x12daac8, execute_once=true) at execMain.c:1646
#18 in standard_ExecutorRun (queryDesc=0x12dab58, direction=ForwardScanDirection, count=0, execute_once=true) at execMain.c:363
#19 in ExecutorRun (queryDesc=0x12dab58, direction=ForwardScanDirection, count=0, execute_once=true) at execMain.c:304
#20 in ProcessQuery (plan=0x12e1360, sourceText=0x12083b0 "INSERT INTO tbl VALUES(13,89318) on conflict(i) do update set n = tbl.n + 1 RETURNING n ", params=0x0, queryEnv=0x0, dest=0x12daac8, qc=0x7ffcc4e39ae0) at pquery.c:160
#21 in PortalRunMulti (portal=0x1289c90, isTopLevel=true, setHoldSnapshot=true, dest=0x12daac8, altdest=0x10382a0 <donothingDR>, qc=0x7ffcc4e39ae0) at pquery.c:1277
#22 in FillPortalStore (portal=0x1289c90, isTopLevel=true) at pquery.c:1026
#23 in PortalRun (portal=0x1289c90, count=9223372036854775807, isTopLevel=true, run_once=true, dest=0x12e14c0, altdest=0x12e14c0, qc=0x7ffcc4e39d30) at pquery.c:763
#24 in exec_simple_query (query_string=0x12083b0 "INSERT INTO tbl VALUES(13,89318) on conflict(i) do update set n = tbl.n + 1 RETURNING n ") at postgres.c:1274
-------------------------
Also, at that time a REINDEX is in progress (although it is not reported as part of the deadlock). Without the REINDEX I am unable to reproduce the deadlock.
#0 in epoll_wait (epfd=5, events=0x1203328, maxevents=1, timeout=-1) at ../sysdeps/unix/sysv/linux/epoll_wait.c:30
#1 in WaitEventSetWaitBlock (set=0x12032c0, cur_timeout=-1, occurred_events=0x7ffcc4e38cd0, nevents=1) at latch.c:1570
#2 in WaitEventSetWait (set=0x12032c0, timeout=-1, occurred_events=0x7ffcc4e38cd0, nevents=1, wait_event_info=50331654) at latch.c:1516
#3 in WaitLatch (latch=0x7acb2a2ff0c4, wakeEvents=33, timeout=0, wait_event_info=50331654) at latch.c:538
#4 in ProcSleep (locallock=0x122f358, lockMethodTable=0x1037340 <default_lockmethod>, dontWait=false) at proc.c:1355
#5 in WaitOnLock (locallock=0x122f358, owner=0x12459f0, dontWait=false) at lock.c:1833
#6 in LockAcquireExtended (locktag=0x7ffcc4e390e0, lockmode=5, sessionLock=false, dontWait=false, reportMemoryError=true, locallockp=0x0) at lock.c:1046
#7 in LockAcquire (locktag=0x7ffcc4e390e0, lockmode=5, sessionLock=false, dontWait=false) at lock.c:739
#8 in VirtualXactLock (vxid=..., wait=true) at lock.c:4627
#9 in WaitForLockersMultiple (locktags=0x12327a8, lockmode=8, progress=true) at lmgr.c:955
#10 in ReindexRelationConcurrently (stmt=0x1208e08, relationOid=16401, params=0x7ffcc4e39528) at indexcmds.c:4154
#11 in ReindexIndex (stmt=0x1208e08, params=0x7ffcc4e39528, isTopLevel=true) at indexcmds.c:2814
#12 in ExecReindex (pstate=0x12329f0, stmt=0x1208e08, isTopLevel=true) at indexcmds.c:2743
#13 in ProcessUtilitySlow (pstate=0x12329f0, pstmt=0x1208f58, queryString=0x12083b0 "REINDEX INDEX CONCURRENTLY tbl_pkey;", context=PROCESS_UTILITY_TOPLEVEL, params=0x0, queryEnv=0x0, dest=0x1209318, qc=0x7ffcc4e39d30) at utility.c:1567
#14 in standard_ProcessUtility (pstmt=0x1208f58, queryString=0x12083b0 "REINDEX INDEX CONCURRENTLY tbl_pkey;", readOnlyTree=false, context=PROCESS_UTILITY_TOPLEVEL, params=0x0, queryEnv=0x0, dest=0x1209318, qc=0x7ffcc4e39d30) at utility.c:1067
#15 in ProcessUtility (pstmt=0x1208f58, queryString=0x12083b0 "REINDEX INDEX CONCURRENTLY tbl_pkey;", readOnlyTree=false, context=PROCESS_UTILITY_TOPLEVEL, params=0x0, queryEnv=0x0, dest=0x1209318, qc=0x7ffcc4e39d30) at utility.c:523
#16 in PortalRunUtility (portal=0x1289c90, pstmt=0x1208f58, isTopLevel=true, setHoldSnapshot=false, dest=0x1209318, qc=0x7ffcc4e39d30) at pquery.c:1158
#17 in PortalRunMulti (portal=0x1289c90, isTopLevel=true, setHoldSnapshot=false, dest=0x1209318, altdest=0x1209318, qc=0x7ffcc4e39d30) at pquery.c:1315
#18 in PortalRun (portal=0x1289c90, count=9223372036854775807, isTopLevel=true, run_once=true, dest=0x1209318, altdest=0x1209318, qc=0x7ffcc4e39d30) at pquery.c:791
#19 in exec_simple_query (query_string=0x12083b0 "REINDEX INDEX CONCURRENTLY tbl_pkey;") at postgres.c:1274
It looks like a deadlock caused by different sets of indexes being used as arbiter indexes (or by the indexes being taken in a different order).
Best regards,
Mikhail.
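For illustration, the transient index set that drives this behavior can be watched from a separate session while the test above is running; this is only an observation aid, not part of any fix. A minimal sketch, assuming the tbl table from the reproduction and the standard pg_index columns:

-- Sketch: list tbl's indexes and their flags; during REINDEX INDEX
-- CONCURRENTLY the transient tbl_pkey_ccnew (and later tbl_pkey_ccold)
-- entries show up here with changing indisvalid/indisready/indislive values.
SELECT indexrelid::regclass AS indexname, indisvalid, indisready, indislive
FROM pg_index
WHERE indrelid = 'tbl'::regclass;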
Hello, everyone!
Using the brand-new injection points support in specs, I created a spec to reproduce the issue.
It fails like this currently:
make -C src/test/modules/injection_points/ check
@@ -64,6 +64,7 @@
step s3_s1: <... completed>
step s2_s1: <... completed>
+ERROR: duplicate key value violates unique constraint "tbl_pkey_ccold"
starting permutation: s3_s1 s2_s1 s4_s1 s1_s1 s4_s2 s4_s3
injection_points_attach
@@ -129,3 +130,4 @@
step s3_s1: <... completed>
step s2_s1: <... completed>
+ERROR: duplicate key value violates unique constraint "tbl_pkey"
Best regards,
Mikhail.
Attachments
Some tests of stabilization, discussed in [0].

Also, an issue known for more than 1.5 years... Should we at least document it?

[0]: https://www.postgresql.org/message-id/flat/CADzfLwUc%3DjtSUEaQCtyt8zTeOJ-gHZ8%3Dw_KJsVjDOYSLqaY9Lg%40mail.gmail.com
Attachments
On Mon, Oct 20, 2025 at 09:27:00PM +0200, Mihail Nikalayeu wrote:
> Some tests of stabilization, discussed in [0].
>
> Also, an issue known for more then 1.5year... Should we at least document it?

Yes, I'm happy to push a patch documenting it. Would you like to propose the
specific doc patch? I regret lacking the bandwidth to review the fix patches.

> [0]: https://www.postgresql.org/message-id/flat/CADzfLwUc%3DjtSUEaQCtyt8zTeOJ-gHZ8%3Dw_KJsVjDOYSLqaY9Lg%40mail.gmail.com
Hello, Noah!

Thanks for the attention!

On Mon, Oct 27, 2025 at 7:06 PM Noah Misch <noah@leadboat.com> wrote:
> Yes, I'm happy to push a patch documenting it. Would you like to propose the
> specific doc patch? I regret lacking the bandwidth to review the fix patches.

Of course!

First version in attachment, waiting for your comment.

Best regards,
Mikhail.
Attachments
On Tue, Oct 28, 2025 at 02:19:00AM +0100, Mihail Nikalayeu wrote:
> On Mon, Oct 27, 2025 at 7:06 PM Noah Misch <noah@leadboat.com> wrote:
> > Yes, I'm happy to push a patch documenting it. Would you like to propose the
> > specific doc patch? I regret lacking the bandwidth to review the fix patches.
>
> Of course!
>
> First version in attachment, waiting for your comment.

Thanks. Does "ON CONFLICT ON CONSTRAINT constraint_name" avoid the problem w/
concurrent REINDEX CONCURRENTLY? A search of the thread found no mention of
"ON CONSTRAINT". It seems safe to assume that clause would avoid problems w/
CREATE INDEX CONCURRENTLY, but that's less certain for REINDEX.

The attached version has these changes:

- Mention ON CONSTRAINT as a workaround. Will remove if you find or suspect
  it's not effective.
- Limit the doc change to ON CONFLICT. I think mentioning it at the INDEX
  commands is undue emphasis.
- Use term "unique violation", a term used earlier on the same page, instead
  of "duplicate key ...".
- Remove the internals-focused point about the brief window.
- Remove some detail I considered insufficiently surprising, e.g. the point
  about "compatible with the index being built".
Attachments
Hello!

On Mon, Nov 3, 2025 at 12:21 AM Noah Misch <noah@leadboat.com> wrote:
> Thanks. Does "ON CONFLICT ON CONSTRAINT constraint_name" avoid the problem w/
> concurrent REINDEX CONCURRENTLY? A search of the thread found no mention of
> "ON CONSTRAINT". It seems safe to assume that clause would avoid problems w/
> CREATE INDEX CONCURRENTLY, but that's less certain for REINDEX.

It is also affected. There is a special
reindex_concurrently_upsert_on_constraint spec in the patch. And even a
special commit (0004) to fix it :)

But yes, it happens only in the case of REINDEX.

I removed the mention of "ON CONSTRAINT" and added a small comment near
infer_arbiter_indexes.

The doc patch is 0001; the others are specs and fixes for the future.

Best regards,
Mikhail.
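For reference, the ON CONSTRAINT form discussed above looks like the
following; this is a sketch written against the tbl table and tbl_pkey
constraint from the reproduction earlier in the thread, and, as noted above,
naming the constraint explicitly does not avoid the REINDEX CONCURRENTLY
problem:

-- Sketch: upsert that names the constraint instead of an inference column list.
INSERT INTO tbl VALUES (13, 1)
ON CONFLICT ON CONSTRAINT tbl_pkey
DO UPDATE SET n = tbl.n + 1;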
Attachments
- v11-0002-Specs-to-reproduce-the-issues-with-CREATE-INDEX-.patch
- v11-0005-Modify-the-ExecInitPartitionInfo-function-to-con.patch
- v11-0003-Modify-the-infer_arbiter_indexes-function-to-con.patch
- v11-0006-Revert-Doc-cover-index-CONCURRENTLY-causing-erro.patch
- v11-0004-Modify-the-infer_arbiter_indexes-function-to-als.patch
- v11-0001-Doc-cover-index-CONCURRENTLY-causing-errors-in-I.patch
On Mon, Nov 03, 2025 at 09:41:00PM +0100, Mihail Nikalayeu wrote:
> On Mon, Nov 3, 2025 at 12:21 AM Noah Misch <noah@leadboat.com> wrote:
> > Thanks. Does "ON CONFLICT ON CONSTRAINT constraint_name" avoid the problem w/
> > concurrent REINDEX CONCURRENTLY? A search of the thread found no mention of
> > "ON CONSTRAINT". It seems safe to assume that clause would avoid problems w/
> > CREATE INDEX CONCURRENTLY, but that's less certain for REINDEX.
>
> It is also affected. There is a special
> reindex_concurrently_upsert_on_constraint spec in the patch.
> And even a special commit (0004) to fix it :)

My mistake. I've redone the mbox search that I thought I had done. That
search should have found it yesterday, so I must have made a typo then.

> I removed the mention of "ON CONSTRAINT" and added a small comment
> near infer_arbiter_indexes.
>
> Doc patch is 0001, other - specs and fixes for future.

I re-flowed the new comment to the standard 78 columns and pushed 0001.

Thanks.
Hello, Noah!
> I re-flowed the new comment to the standard 78 columns and pushed 0001.
Thanks again!
Hello!

Rebased, excluded the committed part, updated the reverting part.

Best regards,
Mikhail.
Attachments
- v12-0002-Modify-the-infer_arbiter_indexes-function-to-con.patch
- v12-0004-Modify-the-ExecInitPartitionInfo-function-to-con.patch
- v12-0005-Revert-Doc-cover-index-CONCURRENTLY-causing-erro.patch
- v12-0003-Modify-the-infer_arbiter_indexes-function-to-als.patch
- v12-0001-Specs-to-reproduce-the-issues-with-CREATE-INDEX-.patch
On 2024-Aug-24, Michail Nikolaev wrote:
Hello, I'm working on getting the 0002 fix committed, with the tests it
fixes. I ran across this comment:
> @@ -813,7 +814,13 @@ infer_arbiter_indexes(PlannerInfo *root)
> idxRel = index_open(indexoid, rte->rellockmode);
> idxForm = idxRel->rd_index;
>
> - if (!idxForm->indisvalid)
> + /*
> + * We need to consider both indisvalid and indisready indexes because
> + * them may become indisvalid before execution phase. It is required
> + * to keep set of indexes used as arbiter to be the same for all
> + * concurrent transactions.
> + */
> + if (!idxForm->indisready)
> goto next;
>
> /*
I think this comment is wrong, or at least confusing. It says "we need
to consider both indisvalid and indisready indexes", which is somewhat
dubious: perhaps it wants to say "we need to consider indexes that have
indisvalid and indisready". But that is also wrong: I mean, if we
wanted to consider indexes that are marked indisready=false, then we
wouldn't "next" here (which essentially ignores such indexes). So,
really, I think what this wants to say is "we need to consider indexes
that are indisready regardless of whether or not they are indisvalid,
because they may become valid later".
Also, I think this comment lacks an explanation of _why_ indexes with
indisvalid=false are required. You wrote earlier[1] in the thread the
following:
> The reason is that these indexes are required for correct processing during
> the execution phase.
> If "ready" indexes are skipped as arbiters by one transaction, they may
> already have become "valid" for another concurrent transaction during its
> planning phase.
> As a result, both transactions could concurrently process the UPSERT
> command with different sets of arbiters (while using the same set of
> indexes for tuple insertion later).
> This can lead to unexpected "duplicate key value violates unique
> constraint" errors and deadlocks.
I think this text is also confusing or wrong. I think you meant 'If
"not valid" indexes are skipped as arbiters, they may have become
"valid" for another concurrent transaction'. The text in catalogs.sgml
for these columns is:
: indisvalid bool
:
: If true, the index is currently valid for queries. False means the
: index is possibly incomplete: it must still be modified by
: INSERT/UPDATE operations, but it cannot safely be used for queries.
: If it is unique, the uniqueness property is not guaranteed true
: either.
: indisready bool
:
: If true, the index is currently ready for inserts. False means the
: index must be ignored by INSERT/UPDATE operations.
It makes sense that indisready indexes must be ignored, because
basically the catalog state is not ready yet. So that part seems
correct.
The other critical point is that the uniqueness must hold, and an index
with indisvalid=false would not necessarily detect that because the tree
might not be built completely yet, so the heap tuple that would be a
duplicate of the one we're writing might not have been scanned yet. But
that's not a problem, because we require that a valid index exists, even
if we don't "infer" that one -- which means the uniqueness clause is
being enforced by that other index.
Given all of that, I have rewritten the comment thusly:
/*
* Ignore indexes that aren't indisready, because we cannot trust their
* catalog structure yet. However, if any indexes are marked
* indisready but not yet indisvalid, we still consider them, because
* they might turn valid while we're running. Doing it this way
* allows a concurrent transaction with a slightly later catalog
* snapshot to infer the same set of indexes, which is critical to
* prevent spurious 'duplicate key' errors.
*
* However, another critical aspect is that a unique index that isn't
* yet marked indisvalid=true might not be complete yet, meaning it
* wouldn't detect possible duplicate rows. In order to prevent false
* negatives, we require that we include in the set of inferred indexes
* at least one index that is marked valid.
*/
if (!idxForm->indisready)
goto next;
Am I understanding this correctly?
Thanks,
[1] https://postgr.es/m/CANtu0ogv+6wqRzPK241jik4U95s1pW3MCZ3rX5ZqbFdUysz7Qw@mail.gmail.com
--
Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/
"Learn about compilers. Then everything looks like either a compiler or
a database, and now you have two problems but one of them is fun."
https://twitter.com/thingskatedid/status/1456027786158776329
Hello, Álvaro!

> Hello, I'm working on getting the 0002 fix committed, with the tests it
> fixes. I ran across this comment:

Nice!

> Am I understanding this correctly?

Yes, you are right. Looks like I mixed up everything while writing the comment.

I attached an updated version; the changes are:
1) your fix for the comment
2) some updates for 0001 test stability (related to [0]).

I'll recheck the other patches too, maybe something needs to be adjusted also.

Best regards,
Mikhail.

[0]: https://www.postgresql.org/message-id/flat/CADzfLwUc%3DjtSUEaQCtyt8zTeOJ-gHZ8%3Dw_KJsVjDOYSLqaY9Lg%40mail.gmail.com
Attachments
- v13-0001-Specs-to-reproduce-the-issues-with-CREATE-INDEX-.patch
- v13-0005-Revert-Doc-cover-index-CONCURRENTLY-causing-erro.patch
- v13-0003-Modify-the-infer_arbiter_indexes-function-to-als.patch
- v13-0004-Modify-the-ExecInitPartitionInfo-function-to-con.patch
- v13-0002-Modify-the-infer_arbiter_indexes-function-to-con.patch
Hello,
On 2025-Nov-22, Mihail Nikalayeu wrote:
> I attached an updated version, changes are:
> 1) your fix for comment
> 2) some updates for 0001 test stability (related to [0]).
Thank you. I've been looking at this part of 0001 too:
> @@ -942,6 +943,8 @@ retry:
> econtext->ecxt_scantuple = save_scantuple;
>
> ExecDropSingleTupleTableSlot(existing_slot);
> + if (!conflict)
> + INJECTION_POINT("check-exclusion-or-unique-constraint-no-conflict", NULL);
and wondering whether it would make sense to pass the 'conflict' as an
argument to the injection point instead of being conditional on it.
That is, do it like
ExecDropSingleTupleTableSlot(existing_slot);
+ INJECTION_POINT("check-exclusion-or-unique-constraint-no-conflict", &conflict);
However, I don't see that the SQL injection point interface has the
ability to receive arguments, so I'm guessing that this is not workable?
Maybe something to consider for the future.
BTW I've split the patches to commit differently: instead of 0001 with a
bunch of failing tests and then 0002 with fixes for some of them, I
intend to push a modified 0002 with the tests in 0001 that it fixes,
then your 0003 with the tests it fixes, and so on. (I have already cut
it this way, so you don't need to resubmit anything at this point. But
I'll verify what changes you have in the v13-0001 compared to the one
before, to be certain I'm not missing any later changes there.)
Thanks!
--
Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/
"How strange it is to find the words "Perl" and "saner" in such close
proximity, with no apparent sense of irony. I doubt that Larry himself
could have managed it." (ncm, http://lwn.net/Articles/174769/)
On Sat, Nov 22, 2025 at 03:16:44PM +0100, Alvaro Herrera wrote:
> ExecDropSingleTupleTableSlot(existing_slot);
> + INJECTION_POINT("check-exclusion-or-unique-constraint-no-conflict", &conflict);
>
> However, I don't see that the SQL injection point interface has the
> ability to receive arguments, so I'm guessing that this is not workable?
> Maybe something to consider for the future.
Are you referring to something like 371f2db8b05e here? This has been
added in 18.
--
Michael
Attachments
On 2025-Nov-23, Michael Paquier wrote:
> On Sat, Nov 22, 2025 at 03:16:44PM +0100, Alvaro Herrera wrote:
> > ExecDropSingleTupleTableSlot(existing_slot);
> > + INJECTION_POINT("check-exclusion-or-unique-constraint-no-conflict", &conflict);
> >
> > However, I don't see that the SQL injection point interface has the
> > ability to receive arguments, so I'm guessing that this is not workable?
> > Maybe something to consider for the future.
>
> Are you referring to something like 371f2db8b05e here? This has been
> added in 18.
Yes, exactly that ... but can this be used by the SQL injection points
functionality? The test is an isolation .spec file, and I didn't find a
way to say "make me sleep when I hit this injection point, but only if
conflict is false". Or maybe I just missed it.
Thanks!
--
Álvaro Herrera Breisgau, Deutschland — https://www.EnterpriseDB.com/
"Estoy de acuerdo contigo en que la verdad absoluta no existe...
El problema es que la mentira sí existe y tu estás mintiendo" (G. Lama)
On Sun, Nov 23, 2025 at 01:14:37PM +0100, Alvaro Herrera wrote:
> Yes, exactly that ... but can this be used by the SQL injection points
> functionality? The test is an isolation .spec file, and I didn't find a
> way to say "make me sleep when I hit this injection point, but only if
> conflict is false". Or maybe I just missed it.

(Sorry for the low activity, last week was a crazy conference week and I'm
still recovering.)

Reading through v13-0001, there is currently no direct way with the existing
callbacks to do as you want, which would be to push down a conditional wait
inside the callback itself, based on a run-time stack. There would be two
ways to do that, by extending the facility:
- Simple one: addition of a new dedicated callback, that accepts one single
  boolean argument.
- More complicated one: extend the module injection_points so as it is
  possible to pass down conditions that should be checked at run-time. I've
  mentioned that in the past, folks felt meh.

Saying that, Mihail's patch to just run the injection point only if
conflict == false is OK, and that's what I have seen most hackers do as a
matter of simplicity. This makes the injection point footprint in the
backend slightly larger but it's not that bad, englobed inside an ifdef.

You could also use a secondary point for an else branch defined in
execIndexing.c, with a different name and a different callback attached to
it if you want to take a special action for the conflict == true case.
--
Michael
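To illustrate the secondary-point suggestion: once the two branches have
separately named points, a test can attach a different action to each of
them. A sketch, assuming the two point names quoted later in the thread and
the injection_points module's 'wait' and 'notice' actions:

-- Sketch: sleep on the no-conflict path, merely log the conflict path.
SELECT injection_points_attach('check-exclusion-or-unique-constraint-no-conflict', 'wait');
SELECT injection_points_attach('check-exclusion-or-unique-constraint-conflict', 'notice');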
Attachments
On 2025-Nov-22, Mihail Nikalayeu wrote:
> 2) some updates for 0001 test stability (related to [0]).

This patch would bring the committed test file up to date with what you last
submitted. However, I didn't understand what is the problem with the
original formulation, and I haven't seen the test fail ... can you explain?

--
Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/
"El número de instalaciones de UNIX se ha elevado a 10, y se espera que este
número aumente" (UPM, 1972)
Attachments
On Mon, Nov 24, 2025 at 06:49:47PM +0100, Alvaro Herrera wrote:
> This patch would bring the committed test file up to date with what you
> last submitted. However, I didn't understand what is the problem with
> the original formulation, and I haven't seen the test fail ... can you
> explain?
Reading through bc32a12e0db2, I am puzzled by the committed result
here:
+#ifdef USE_INJECTION_POINTS
+ if (conflict)
+ INJECTION_POINT("check-exclusion-or-unique-constraint-conflict", NULL);
+ else
+ INJECTION_POINT("check-exclusion-or-unique-constraint-no-conflict", NULL);
+#endif
The "no-conflict" point is used in the isolation test, but not the
other in the "conflict == true" path:
$ git grep check-exclusion-or-unique-constraint-conflict
src/backend/executor/execIndexing.c:
INJECTION_POINT("check-exclusion-or-unique-constraint-conflict", NULL);
If you have no plans for it in the long-term, I'd rather remove it
from the tree, rather than keep it. Of course, I would keep the
USE_INJECTION_POINTS block to avoid the extra boolean check in
non-USE_INJECTION_POINTS builds.
--
Michael
Attachments
Hello, Álvaro!

On Mon, Nov 24, 2025 at 6:49 PM Álvaro Herrera <alvherre@kurilemu.de> wrote:
> This patch would bring the committed test file up to date with what you
> last submitted. However, I didn't understand what is the problem with
> the original formulation, and I haven't seen the test fail ... can you
> explain?

Yes, it looks strange - but it is the best option I have found so far. I've
seen the tests fail a few times in CI due to a race. More details are
available at [0] and particularly [1].

In a few words - it is an attempt to make sure the test goes to wake up the
backend only after it actually enters wait mode. For that reason an
additional 'notice' point is used by the spec.

I have proposed another possible solution in the [0] thread.

Best regards,
Mikhail.

[0]: https://www.postgresql.org/message-id/flat/CADzfLwUc%3DjtSUEaQCtyt8zTeOJ-gHZ8%3Dw_KJsVjDOYSLqaY9Lg%40mail.gmail.com
[1]: https://www.postgresql.org/message-id/flat/aREW7Qo0GqjfiHn7%40paquier.xyz#fde8593a6239a594724f8badd33f7266
On 2025-Nov-24, Mihail Nikalayeu wrote:
> In a few words - it is an attempt to make sure the test goes to the
> wake-up backend only after it actually enters to wait mode. For that
> reason an additional 'notice' point is used by spec.
> I have proposed another possible solution for the [0] thread.
That makes sense. I pushed it now, many thanks.
On 2025-Nov-25, Michael Paquier wrote:
> Reading through bc32a12e0db2, I am puzzled by the committed result
> here:
> +#ifdef USE_INJECTION_POINTS
> + if (conflict)
> + INJECTION_POINT("check-exclusion-or-unique-constraint-conflict", NULL);
> + else
> + INJECTION_POINT("check-exclusion-or-unique-constraint-no-conflict", NULL);
> +#endif
> If you have no plans for it in the long-term, I'd rather remove it
> from the tree, rather than keep it.
Removed.
--
Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/
"We have labored long to build a heaven, only to find it
populated with horrors" (Prof. Milton Glass)
Hello,
We ran into one more problem with the new test, evidenced by timeouts by
buildfarm member prion. For CATCACHE_FORCE_RELEASE builds on two of the
tests, we get a few invalidations of the catalog snapshot ahead of what
we expect, and because we have an injection point to sleep there, those
tests get stuck.
Here's one possible fix. I had to take the attach operation on
invalidate-catalog-snapshot-end to a new step of s1, instead of
occurring in the setup block. I understand that this is because no step
can run until the setup of all steps completes, so if one setup gets
stuck, we're out of luck. And then, session s4 can do a conditional
wakeup of session s1.
Patch attached. Thoughts?
Maybe there's some other way to go about this -- for instance I
considered the idea of moving the injection point somewhere else from
InvalidateCatalogSnapshot(). I don't have any ideas about that though,
but I'm willing to listen if anybody has any.
The output in both cases is different (we get more notices in the weird
build and also s1 goes to sleep in different order), so I had to add an
alternative expected file. The diffs look okay to me -- essentially
they say, in the normal case, the injection_points_attach() returns
immediately, but in the other case, s1 goes to sleep and is awakened
when it sees the 'case' expression by session 4:
--- expected/index-concurrently-upsert-predicate.out 2025-11-26 19:13:58.702213673 +0100
+++ expected/index-concurrently-upsert-predicate_1.out 2025-11-26 19:04:16.745666956 +0100
@@ -1,8 +1,9 @@
Parsed test spec with 5 sessions
starting permutation: s1_attach_invalidate_catalog_snapshot s4_wakeup_s1_setup s5_noop s3_start_create_index
s1_start_upsert s4_wakeup_define_index_before_set_valid s2_start_upsert s5_wakeup_s1_from_invalidate_catalog_snapshot
s4_wakeup_s2 s4_wakeup_s1
+s1: NOTICE: notice triggered for injection point pre-invalidate-catalog-snapshot-end
injection_points_attach
-----------------------
(1 row)
@@ -14,18 +15,15 @@
injection_points_attach
-----------------------
(1 row)
+s1: NOTICE: notice triggered for injection point pre-invalidate-catalog-snapshot-end
+s1: NOTICE: notice triggered for injection point pre-invalidate-catalog-snapshot-end
step s1_attach_invalidate_catalog_snapshot:
SELECT injection_points_attach('invalidate-catalog-snapshot-end', 'wait');
-
-injection_points_attach
------------------------
-
-(1 row)
-
+ <waiting ...>
step s4_wakeup_s1_setup:
select case when
(select pid from pg_stat_activity
where wait_event_type = 'InjectionPoint' and
wait_event = 'invalidate-catalog-snapshot-end') is not null
@@ -35,10 +33,16 @@
case
----
(1 row)
+step s1_attach_invalidate_catalog_snapshot: <... completed>
+injection_points_attach
+-----------------------
+
+(1 row)
+
step s5_noop:
<waiting ...>
step s3_start_create_index:
CREATE UNIQUE INDEX CONCURRENTLY tbl_pkey_special_duplicate ON test.tbl(abs(i)) WHERE i < 10000;
<waiting ...>
--
Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/
"Los minutos y los segundos son mercadería de la ciudad, donde un infeliz se
afana por no perder ni siquiera un segundo y no advierto que obrando de ese
modo pierde una vida." ("La vuelta de Don Camilo", G. Guareschi)
Attachments
Hello!

On Wed, Nov 26, 2025 at 7:34 PM Álvaro Herrera <alvherre@kurilemu.de> wrote:
> We ran into one more problem with the new test, evidenced by timeouts by
> buildfarm member prion. For CATCACHE_FORCE_RELEASE builds on two of the
> tests, we get a few invalidations of the catalog snapshot ahead of what
> we expect, and because we have an injection point to sleep there, those
> tests get stuck.

Oh, I missed that. Not-yet-pushed tests are probably affected too.

> Here's one possible fix. I had to take the attach operation on
> invalidate-catalog-snapshot-end to a new step of s1, instead of
> occurring in the setup block. I understand that this is because no step
> can run until the setup of all steps completes, so if one setup gets
> stuck, we're out of luck. And then, session s4 can do a conditional
> wakeup of session s1.

I have tried to move the setup of invalidate-catalog-snapshot-end to
s1_start_upsert as the first command - but for some reason it wasn't working
the way I expected. But maybe I missed something.

> Patch attached. Thoughts?

The solution seems reasonable to me. Two other related ideas:
* replace the "select case when" with a function like
  injection_points_wakeup_if_waiting to avoid the possible race between the
  select and the wakeup (but AFAIK it is not possible in the current case);
  a rough sketch of that interface follows after this message
* introduce some injection_points function to enter an "ignore all runs, but
  still allow attach/detach" mode and a "normal" mode: as the first command
  of setup - enter such a "setup mode", as the last - back to normal

> Maybe there's some other way to go about this -- for instance I
> considered the idea of moving the injection point somewhere else from
> InvalidateCatalogSnapshot(). I don't have any ideas about that though,
> but I'm willing to listen if anybody has any.

AFAIU it is the only place.

Best regards,
Mikhail.
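Here is a rough SQL-level sketch of the injection_points_wakeup_if_waiting
idea mentioned in the first bullet above. The function is hypothetical (it
is not part of the injection_points module); it only assumes
injection_points_wakeup(text) and the pg_stat_activity wait-event columns
already used in this thread. Note that a plpgsql wrapper like this still
checks and wakes in two separate steps, so a server-side implementation
would be needed to fully close the race:

-- Hypothetical helper (sketch only): wake an injection point only if some
-- backend is currently waiting on it; returns whether a wakeup was issued.
CREATE FUNCTION injection_points_wakeup_if_waiting(point text)
RETURNS boolean LANGUAGE plpgsql AS $$
BEGIN
    IF EXISTS (SELECT 1 FROM pg_stat_activity
               WHERE wait_event_type = 'InjectionPoint'
                 AND wait_event = point)
    THEN
        PERFORM injection_points_wakeup(point);
        RETURN true;
    END IF;
    RETURN false;
END;
$$;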
Hi,

On 2025-Nov-27, Mihail Nikalayeu wrote:
> On Wed, Nov 26, 2025 at 7:34 PM Álvaro Herrera <alvherre@kurilemu.de> wrote:
> > We ran into one more problem with the new test, evidenced by timeouts by
> > buildfarm member prion. For CATCACHE_FORCE_RELEASE builds on two of the
> > tests, we get a few invalidations of the catalog snapshot ahead of what
> > we expect, and because we have an injection point to sleep there, those
> > tests get stuck.
>
> Oh, I missed that. Non-yet pushed tests are probably affected too.

Yeah, I suspect as much.

> > Here's one possible fix.
>
> I have tried to move the setup of invalidate-catalog-snapshot-end to
> s1_start_upsert as the first command - but for some reason it wasn't
> working the way I expected. But maybe I missed something.

Right, this is why I said that this is one possible fix. I mean, maybe
there are other ways to fix it. I'm not sure it's the simplest or the most
robust, but I don't want to spend too much time looking for other ways
either.

> Solution seems reasonable to me, another related ideas:
> * replace "select case when" with function like
>   injection_points_wakeup_if_waiting to avoid the possible race between
>   select and wake up (but AFAIK it is not possible in the current case)
> * introduce some injection_points function to enter "ignore all runs,
>   but still allowed to attach/detach" mode and "normal" mode.. As first
>   command of setup - enter such "setup mode", as last - back to normal

Ah, I had thought about the first one of these ideas, but not the second
one. I noted both in the commit message, in case somebody is motivated to
implement them.

Thanks for reviewing. I have pushed it now. Looking at the next one.

--
Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/
"Having your biases confirmed independently is how scientific progress is
made, and hence made our great society what it is today" (Mary Gardiner)
Hello,

Here's a slightly different approach for the fix proposed in your 0003.
I wasn't happy with the idea of opening all indexes twice in
infer_arbiter_indexes(), so I instead made it collect all Relations from
those indexes in an initial loop, then process them in the two places that
wanted them, and we close them all again together. I think this also makes
the code clearer. We no longer have the "next" goto label to close the
index at the bottom of the loop, but instead we can just do "continue"
cleanly.

I also rewrote some comments. I may not have done all the edits I wanted,
but ran out of time today and I think this is in pretty good shape.

I tried under CATCACHE_FORCE_RELEASE and saw no problems.

--
Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/
Attachments
Please also check my (very raw) proposal in the thread "Revisiting
{CREATE INDEX, REINDEX} CONCURRENTLY improvements"
TL;DR - use logical decoding for adding index entries for tuples added
during CIC.
Maybe this also makes the issue with ON CONFLICT UPDATE and REINDEX
CONCURRENTLY go away.
On Fri, Nov 28, 2025 at 6:30 PM Álvaro Herrera <alvherre@kurilemu.de> wrote:
>
> Hello,
>
> Here's a slightly different approach for the fix proposed in your 0003.
> I wasn't happy with the idea of opening all indexes twice in
> infer_arbiter_indexes(), so I instead made it collect all Relations from
> those indexes in an initial loop, then process them in the two places
> that wanted them, and we close them all again together. I think this
> also makes the code clearer. We no longer have the "next" goto label to
> close the index at the bottom of the loop, but instead we can just do
> "continue" cleanly.
>
> I also rewrote some comments. I may not have done all the edits I
> wanted, but ran out of time today and I think this is in pretty good
> shape.
>
> I tried under CATCACHE_FORCE_RELEASE and saw no problems.
>
>
> --
> Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/
Hello!

On Fri, Nov 28, 2025 at 6:30 PM Álvaro Herrera <alvherre@kurilemu.de> wrote:
> I wasn't happy with the idea of opening all indexes twice in
> infer_arbiter_indexes(), so I instead made it collect all Relations from
> those indexes in an initial loop, then process them in the two places
> that wanted them, and we close them all again together. I think this
> also makes the code clearer. We no longer have the "next" goto label to
> close the index at the bottom of the loop, but instead we can just do
> "continue" cleanly.

Yes, agreed - that looks better than my version. A few remarks:

> Second, if an attribute list was specified in the ON
> * CONFLICT clause, we use the list to find the indexes whose attributes
> * match that list.

I think we should also mention expressions and predicates here.

> /*
>  * Find the named constraint index to extract its attributes and
>  * predicates.
>  */
> foreach_ptr(RelationData, idxRel, indexRelList)

Should we add an assert to ensure we have actually found something?

Best regards,
Mikhail.
Hi!
On Fri, Nov 28, 2025 at 7:09 PM Hannu Krosing <hannuk@google.com> wrote:
> Please also check my (very raw) proposal in the thread "Revisiting
> {CREATE INDEX, REINDEX} CONCURRENTLY improvements"
>
> TL;DR - use logical decoding for adding index entries for tuples added
> during CIC.
No, the current issue is caused by another reason - it is more about
switching from old to new index.
Best regards,
Mikhail.
Also, I think it is good to mention changes in parse_clause.c in the commit message.
On 2025-11-28, Mihail Nikalayeu wrote:
> Also, I think it is good to mention changes in parse_clause.c in the
> commit message.

Oh rats, no, thanks for noticing - that's a mostly unrelated change that I
intend to commit separately, mentioned in another thread, that I happened to
merge here accidentally.

--
Álvaro Herrera
Hi Mihail,

Looking at 0004, I think IsIndexCompatibleAsArbiter() should map the
attribute numbers, in case the partition has a different column layout than
the parent (e.g. in case there are dropped columns or just different column
orders).

Regards

--
Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/
"We have labored long to build a heaven, only to find it
populated with horrors" (Prof. Milton Glass)
Hello!

On Sat, Nov 29, 2025 at 6:48 PM Álvaro Herrera <alvherre@kurilemu.de> wrote:
> Looking at 0004, I think IsIndexCompatibleAsArbiter() should map the
> attribute numbers, in case the partition has a different column layout
> than the parent (e.g. in case there are dropped columns or just
> different column orders)

Oh, yes, you're right. I'll prepare an updated version (also it looks like
some inner loops may be refactored in a more effective way).

Best regards,
Mikhail.
Hello, Álvaro!

On Sun, Nov 30, 2025 at 1:11 PM Mihail Nikalayeu <mihailnikalayeu@gmail.com> wrote:
> On Sat, Nov 29, 2025 at 6:48 PM Álvaro Herrera <alvherre@kurilemu.de> wrote:
> > Looking at 0004, I think IsIndexCompatibleAsArbiter() should map the
> > attribute numbers, in case the partition has a different column layout
> > than the parent (e.g. in case there are dropped columns or just
> > different column orders)
>
> Oh, yes, you're right. I'll prepare an updated version (also it looks
> like some inner loops may be refactored in a more effective way).

I was wrong - the function is called only for indexes from the same relation
(the actual partition). But I reworked the commit anyway - it is much
clearer now (and a little bit more effective).

Best regards,
Mikhail.
Attachments
Hello Alvaro,
27.11.2025 14:32, Álvaro Herrera wrote:
> Thanks for reviewing. I have pushed it now. Looking at the next one.
I've noticed two failures from skink you could find interesting: [1], [2].
The difference from [2]:
ok 3 - syscache-update-pruned 94198 ms
not ok 4 - index-concurrently-upsert 14008 ms
ok 5 - reindex-concurrently-upsert 14379 ms
--- /home/bf/bf-build/skink-master/HEAD/pgsql/src/test/modules/injection_points/expected/index-concurrently-upsert.out 2025-11-27 13:38:19.513528475 +0100
+++ /home/bf/bf-build/skink-master/HEAD/pgsql.build/testrun/injection_points/isolation/results/index-concurrently-upsert.out 2025-11-30 00:10:01.697938769 +0100
@@ -107,6 +107,7 @@
(1 row)
s1: NOTICE: notice triggered for injection point pre-invalidate-catalog-snapshot-end
+s1: NOTICE: notice triggered for injection point pre-invalidate-catalog-snapshot-end
step s1_start_upsert: <... completed>
step s2_start_upsert: <... completed>
step s3_start_create_index: <... completed>
I've managed to reproduce something similar to this diff when running
multiple test instances under Valgrind with parallel. With the attached
patch applied:
for i in {1..40}; do cp -r src/test/modules/injection_points/ src/test/modules/injection_points_$i/; sed -i.bak
"s|src/test/modules/injection_points|src/test/modules/injection_points_$i|"
src/test/modules/injection_points_$i/Makefile; done
make -s check -C src/test/modules/injection_points
and /tmp/extra.config containing:
log_min_messages = INFO
backtrace_functions = 'injection_notice'
for i in {1..10}; do np=5; echo "ITERATION $i"; parallel -j40 --linebuffer --tag PROVE_TESTS="t/099*" NO_TEMP_INSTALL=1
TEMP_CONFIG=/tmp/extra.config make -s check -C src/test/modules/injection_points_{} ::: `seq $np` || break; done
gives me:
ITERATION 1
...
4 # Failed test 'regression tests pass'
4 # at t/099_isolation_regress.pl line 52.
4 # got: '256'
4 # expected: '0'
src/test/modules/injection_points_4/tmp_check/log/regress_log_099_isolation_regress contains:
...
ok 11 - index-concurrently-upsert 5282 ms
not ok 12 - index-concurrently-upsert 6347 ms
ok 13 - index-concurrently-upsert 5723 ms
...
--- .../src/test/modules/injection_points_4/expected/index-concurrently-upsert.out 2025-11-30 14:24:29.385133831 +0200
+++ .../src/test/modules/injection_points_4/tmp_check/results/index-concurrently-upsert.out 2025-11-30 16:22:29.168920744 +0200
@@ -78,6 +78,7 @@
(1 row)
+s1: NOTICE: notice triggered for injection point pre-invalidate-catalog-snapshot-end
step s4_wakeup_s2:
SELECT injection_points_detach('exec-insert-before-insert-speculative');
SELECT injection_points_wakeup('exec-insert-before-insert-speculative');
=== EOF ===
I can see the following stack trace for this extra notice:
2025-11-30 16:22:28.465 EET|law|isolation_regression|692c531f.bb652|NOTICE: notice triggered for injection point
pre-invalidate-catalog-snapshot-end
2025-11-30 16:22:28.465 EET|law|isolation_regression|692c531f.bb652|BACKTRACE:
injection_notice at injection_points.c:278:3
InjectionPointRun at injection_point.c:555:1
InvalidateCatalogSnapshot at snapmgr.c:463:3
LocalExecuteInvalidationMessage at inval.c:831:4
ReceiveSharedInvalidMessages at sinval.c:113:18
AcceptInvalidationMessages at inval.c:970:23
LockRelationOid at lmgr.c:137:3
relation_open at relation.c:58:6
table_open at table.c:44:6
SearchCatCacheMiss at catcache.c:1550:13
SearchCatCacheInternal at catcache.c:1495:9
SearchCatCache1 at catcache.c:1368:1
SearchSysCache1 at syscache.c:227:1
build_coercion_expression at parse_coerce.c:852:8
coerce_type at parse_coerce.c:433:13
coerce_to_target_type at parse_coerce.c:105:11
transformAssignedExpr at parse_target.c:580:4
transformInsertRow at analyze.c:1121:10
transformInsertStmt at analyze.c:988:14
transformStmt at analyze.c:364:13
transformOptionalSelectInto at analyze.c:317:1
transformTopLevelStmt at analyze.c:266:11
parse_analyze_fixedparams at analyze.c:134:10
pg_analyze_and_rewrite_fixedparams at postgres.c:687:10
exec_simple_query at postgres.c:1195:20
PostgresMain at postgres.c:4777:6
BackendInitialize at backend_startup.c:142:1
postmaster_child_launch at launch_backend.c:269:3
BackendStartup at postmaster.c:3598:8
ServerLoop at postmaster.c:1716:10
PostmasterMain at postmaster.c:1403:11
main at main.c:236:2
I also observed:
2025-11-30 15:33:45.746 EET|law|isolation_regression|692c47b4.675e7|NOTICE: notice triggered for injection point
pre-invalidate-catalog-snapshot-end
2025-11-30 15:33:45.746 EET|law|isolation_regression|692c47b4.675e7|BACKTRACE:
injection_notice at injection_points.c:278:3
InjectionPointRun at injection_point.c:555:1
InvalidateCatalogSnapshot at snapmgr.c:463:3
LocalExecuteInvalidationMessage at inval.c:831:4
ReceiveSharedInvalidMessages at sinval.c:113:18
AcceptInvalidationMessages at inval.c:970:23
LockRelationOid at lmgr.c:137:3
relation_open at relation.c:58:6
table_open at table.c:44:6
CatalogCacheInitializeCache at catcache.c:1131:13
ConditionalCatalogCacheInitializeCache at catcache.c:1092:1
SearchCatCacheInternal at catcache.c:1424:15
SearchCatCache1 at catcache.c:1368:1
SearchSysCache1 at syscache.c:227:1
check_enable_rls at rls.c:66:10
get_row_security_policies at rowsecurity.c:130:15
fireRIRrules at rewriteHandler.c:2218:3
QueryRewrite at rewriteHandler.c:4581:11
pg_rewrite_query at postgres.c:822:20
pg_analyze_and_rewrite_fixedparams at postgres.c:696:19
exec_simple_query at postgres.c:1195:20
PostgresMain at postgres.c:4777:6
BackendInitialize at backend_startup.c:142:1
postmaster_child_launch at launch_backend.c:269:3
BackendStartup at postmaster.c:3598:8
ServerLoop at postmaster.c:1716:10
PostmasterMain at postmaster.c:1403:11
main at main.c:236:2
Could you please look into whether this can be fixed?
[1] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=skink&dt=2025-11-25%2021%3A55%3A00
[2] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=skink&dt=2025-11-29%2021%3A51%3A13
Best regards,
Alexander
Attachments
Hello!

On Sun, Nov 30, 2025 at 6:00 PM Alexander Lakhin <exclusion@gmail.com> wrote:
> s1: NOTICE: notice triggered for injection point pre-invalidate-catalog-snapshot-end
> +s1: NOTICE: notice triggered for injection point pre-invalidate-catalog-snapshot-end
> step s1_start_upsert: <... completed>
> step s2_start_upsert: <... completed>
> step s3_start_create_index: <... completed>

Oh, I'm afraid it may become an endless fight...

I think it is better to implement something in isolationtester to natively
support the case from [0] (in that case we don't need the NOTICE on
pre-invalidate-catalog-snapshot-end).

Best regards,
Mikhail.

[0]: https://www.postgresql.org/message-id/flat/CADzfLwUc%3DjtSUEaQCtyt8zTeOJ-gHZ8%3Dw_KJsVjDOYSLqaY9Lg%40mail.gmail.com
Hello!

On Sun, Nov 30, 2025 at 6:26 PM Mihail Nikalayeu <mihailnikalayeu@gmail.com> wrote:
> I think it is better to implement something in isolationtester to
> natively support the case from [0] (in such case we don't need NOTICE
> on .pre-invalidate-catalog-snapshot-end).

I think I have implemented a better solution - without any additional NOTICE.
It just actually waits for the other backend to hang on the injection point.

Attached (together with other changes).
Attachments
On 2025-Dec-01, Mihail Nikalayeu wrote:
> From af5e27d4150dd53d313122c02da7ce4d3c07f332 Mon Sep 17 00:00:00 2001
> From: Mikhail Nikalayeu <mihailnikalayeu@gmail.com>
> Date: Sun, 30 Nov 2025 16:48:55 +0100
> Subject: [PATCH v16 1/3] Modify the ExecInitPartitionInfo function to consider
>  partitioned indexes that are potentially processed by REINDEX CONCURRENTLY
>  as arbiters as well.
>
> This is necessary to ensure that all concurrent transactions use the same set of arbiter indexes.

Thanks, pushed this one after some more editorialization. I rewrote most
comments and renamed variables, but I also changed the code somewhat. For
instance I made it use the ResultRelInfo's array of indexes instead of doing
RelationGetIndexList, because it seemed to match better with the usage of
the list index to match the indexes in the array again later. Maybe now
would be a good time to dust off your stress tests and verify that
everything is still working as intended.

In doing these changes, I realized that there are several places in the code
(this one, but also others) that are using get_partition_ancestors(), which
I think might be a somewhat expensive routine. It obtains ancestors by
recursing up the hierarchy, and at each step does an indexscan on
pg_inherits. I bet this is not very nice for performance. I bet we can make
this visible in a profile with an inheritance hierarchy not terribly deep
and a few hundred partitions. We currently don't have a syscache on
pg_inherits, as far as I understand because back then we didn't think it was
going to give us any advantages, but maybe it would be useful for this and
other operations. If not a syscache, then maybe a different way to cache
ancestors for a relation, perhaps straight in the relcache entry.

Thanks for working on this,

--
Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/
"Before you were born your parents weren't as boring as they are now. They
got that way paying your bills, cleaning up your room and listening to you
tell them how idealistic you are." -- Charles J. Sykes' advice to teenagers
On 2025-Dec-01, Mihail Nikalayeu wrote:
> I think I have implemented a better solution - without any additional NOTICE.
> It just actually waits for the other backend to hang on the injection point.

Pushed. I changed it to be a loop in a DO block rather than a procedure
recursively calling itself, though, which seemed strangely complicated.
Should be pretty much the same, I hope.

--
Álvaro Herrera Breisgau, Deutschland — https://www.EnterpriseDB.com/
"Puedes vivir sólo una vez, pero si lo haces bien, una vez es suficiente"
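A minimal sketch of that DO-block approach (not the committed code): it
assumes the invalidate-catalog-snapshot-end point name from the spec
discussed above and simply polls pg_stat_activity until some backend is
actually waiting on the point, so the spec can then proceed to wake it up:

DO $$
BEGIN
    LOOP
        -- Refresh the activity snapshot; otherwise the view stays static
        -- for the rest of this transaction.
        PERFORM pg_stat_clear_snapshot();
        EXIT WHEN EXISTS (SELECT 1 FROM pg_stat_activity
                          WHERE wait_event_type = 'InjectionPoint'
                            AND wait_event = 'invalidate-catalog-snapshot-end');
        PERFORM pg_sleep(0.01);
    END LOOP;
END;
$$;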
> From 5dc3e4eb50e445a291a13663fc9ce93d0db96b1c Mon Sep 17 00:00:00 2001
> From: Mikhail Nikalayeu <mihailnikalayeu@gmail.com>
> Date: Sun, 30 Nov 2025 16:49:20 +0100
> Subject: [PATCH v16 2/3] Revert "Doc: cover index CONCURRENTLY causing errors
>  in INSERT ... ON CONFLICT."
>
> This reverts commit 8b18ed6dfbb8b3e4483801b513fea6b429140569.

Pushed this one also, and marked the commitfest item as committed.

Thanks!

--
Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/
Hello, Álvaro!

> Maybe now would be a good time to dust off your
> stress tests and verify that everything is still working as intended.

Done, passing without any issues.

> I bet we can make this visible in a profile with an
> inheritance hierarchy not terribly deep and a few hundred partitions.

I'll put it on the TODO list for some day.

> Pushed this one also, and marked the commitfest item as committed.

Thanks for pushing\reviewing\rewriting it all!

Best regards,
Mikhail.
Álvaro Herrera <alvherre@kurilemu.de> writes:
> Pushed this one also, and marked the commitfest item as committed.
BF member prion has been failing since 5dee7a603 went in. Apparently
that solution doesn't work under -DRELCACHE_FORCE_RELEASE and/or
-DCATCACHE_FORCE_RELEASE (so I'm expecting the CLOBBER_CACHE_ALWAYS
animals to fail too, whenever they next report).
regards, tom lane
Hi, Tom!

On Wed, Dec 3, 2025 at 5:43 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> BF member prion has been failing since 5dee7a603 went in. Apparently
> that solution doesn't work under -DRELCACHE_FORCE_RELEASE and/or
> -DCATCACHE_FORCE_RELEASE (so I'm expecting the CLOBBER_CACHE_ALWAYS
> animals to fail too, whenever they next report).

Oh, my bad, sorry.

Attached is a patch with an output variant for the
RELCACHE_FORCE_RELEASE\CATCACHE_FORCE_RELEASE case.

But for CLOBBER_CACHE_ALWAYS - it is just a mess, too many different kinds
of wakeup loops would need to be done. So, my proposals:
* either drop that test - currently it looks like a test that will fail 99%
  of the time due to relcache changes instead of an actual issue
* or add some kind of mechanism to skip that test for particular build
  arguments
* or replace it with some kind of small stress test, without any
  injection_points

Best regards,
Mikhail.
Attachments
On 2025-Dec-03, Mihail Nikalayeu wrote:
> Oh, my bad, sorry.
> Attached patch with output variant for
> RELCACHE_FORCE_RELEASE\CATCACHE_FORCE_RELEASE case.

Thanks.

> But for CLOBBER_CACHE_ALWAYS - it is just a mess, too many different
> kinds of wakeup loops need to be done.

Hmm, maybe we can do "SET debug_discard_caches=0" in the setup block of the
sessions that need it.

--
Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/
"La victoria es para quien se atreve a estar solo"
On 2025-Dec-03, Mihail Nikalayeu wrote:
> Oh, my bad, sorry.
> Attached patch with output variant for
> RELCACHE_FORCE_RELEASE\CATCACHE_FORCE_RELEASE case.

I have pushed this, thanks.

> But for CLOBBER_CACHE_ALWAYS - it is just a mess, too many different
> kinds of wakeup loops need to be done.

I ran the tests under CLOBBER_CACHE_ALWAYS, and as far as I can tell, they
succeed. We'll see what the CLOBBER_CACHE_ALWAYS members say ... assuming
the other problems [1] can be fixed.

[1] https://postgr.es/m/baf1ae02-83bd-4f5d-872a-1d04f11a9073@vondra.me

--
Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/
"If you want to have good ideas, you must have many ideas. Most of them
will be wrong, and what you have to learn is which ones to throw away."
(Linus Pauling)
The "partitioned" test failed recently also on flaviventris and adder: https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=flaviventris&dt=2025-12-04%2017%3A28%3A20 https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=adder&dt=2025-12-03%2009%3A23%3A08 https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=adder&dt=2025-12-02%2008%3A48%3A24 diff -U3 /home/bf/bf-build/flaviventris/HEAD/pgsql/src/test/modules/injection_points/expected/reindex-concurrently-upsert-partitioned.out /home/bf/bf-build/flaviventris/HEAD/pgsql.build/testrun/injection_points/isolation/results/reindex-concurrently-upsert-partitioned.out --- /home/bf/bf-build/flaviventris/HEAD/pgsql/src/test/modules/injection_points/expected/reindex-concurrently-upsert-partitioned.out 2025-12-02 14:28:16.233200773 +0100 +++ /home/bf/bf-build/flaviventris/HEAD/pgsql.build/testrun/injection_points/isolation/results/reindex-concurrently-upsert-partitioned.out 2025-12-04 18:31:29.340033507 +0100 @@ -61,7 +61,6 @@ (1 row) -step s1_start_upsert: <... completed> step s4_wakeup_s2: SELECT injection_points_detach('exec-insert-before-insert-speculative'); SELECT injection_points_wakeup('exec-insert-before-insert-speculative'); @@ -76,6 +75,7 @@ (1 row) +step s1_start_upsert: <... completed> step s2_start_upsert: <... completed> step s3_start_reindex: <... completed> -- Álvaro Herrera Breisgau, Deutschland — https://www.EnterpriseDB.com/ […] indem ich in meinem Leben oft an euch gedacht, euch glücklich zu machen. Seyd es! A menudo he pensado en vosotros, en haceros felices. ¡Sedlo, pues! Heiligenstädter Testament, L. v. Beethoven, 1802 https://de.wikisource.org/wiki/Heiligenstädter_Testament
Hello!

Seems like it may be easily fixed (see attached patch).

Btw, is it possible to somehow run the whole buildfarm over some branch?
That way it would be possible to fix such issues much earlier (some of them
are caught by the GitHub CI, but not all).

Best regards,
Mikhail.
Attachments
On 2025-Dec-04, Mihail Nikalayeu wrote:
> Hello!
>
> Seems like it may be easily fixed (see attached patch).

Makes sense -- thanks, pushed.

> Bwt, is it possible to somehow run the whole buildfarm over some branch?
> Such way it will be possible to fix such issues much earlier (some of
> them catched by github CI, but not all).

Nope.

--
Álvaro Herrera Breisgau, Deutschland — https://www.EnterpriseDB.com/
Hello Álvaro and Mihail,
02.12.2025 15:07, Álvaro Herrera wrote:
> Thanks, pushed this one after some more editorialization.
I've discovered that despite removing FIXME (in 90eae926a), the error
"invalid arbiter index list" can still be triggered with:
CREATE TABLE pt (a int PRIMARY KEY) PARTITION BY RANGE (a);
CREATE TABLE p1 PARTITION OF pt FOR VALUES FROM (1) to (2) PARTITION BY RANGE (a);
CREATE TABLE p1_1 PARTITION OF p1 FOR VALUES FROM (1) TO (2);
CREATE UNIQUE INDEX ON ONLY p1 (a);
INSERT INTO p1 VALUES (1) ON CONFLICT (a) DO NOTHING;
ERROR: XX000: invalid arbiter index list
LOCATION: ExecInitPartitionInfo, execPartition.c:863
The first commit it produced on with this script is bc32a12e0.
Best regards,
Alexander
Hello, Alexander!

On Sat, Dec 6, 2025 at 7:00 AM Alexander Lakhin <exclusion@gmail.com> wrote:
> I've discovered that despite removing FIXME (in 90eae926a), the error
> "invalid arbiter index list" can still be triggered with:

Wow, thanks for finding this.

> The first commit it produced on with this script is bc32a12e0.

Such commands add an indisready, but not indisvalid, index on pt, which is
added to the list of potential arbiters. It happens because of [0].

From my perspective, the correct solution is to just remove the error
message, because it looks obsolete now. Or somehow calculate the
compensation offset differently - but I am not sure that is a good idea.

Let's wait for Álvaro's decision - I'll prepare the fix and a regress test
then.

Best regards,
Mikhail.

[0]: https://github.com/postgres/postgres/blob/bc32a12e0db2df203a9cb2315461578e08568b9c/src/backend/commands/indexcmds.c#L1225-L1235
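One way to see the state described above is to inspect pg_index right after
running Alexander's script; a sketch (the pt/p1/p1_1 names come from that
script), which per the explanation above should show the index created with
CREATE UNIQUE INDEX ON ONLY as ready but not valid:

-- Sketch: show indisvalid/indisready for the indexes involved in the scenario.
SELECT i.indrelid::regclass AS rel,
       c.relname            AS indexname,
       i.indisvalid,
       i.indisready
FROM pg_index i
JOIN pg_class c ON c.oid = i.indexrelid
WHERE i.indrelid IN ('pt'::regclass, 'p1'::regclass, 'p1_1'::regclass);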
On 2025-Dec-06, Mihail Nikalayeu wrote:
> Such commands add an indisready, but not indisvalid index on pt, which
> is added to to the list of potential arbiters.
> It happens because of [0].
>
> From my perspective, the correct solution - is to just remove the
> error message, because it looks obsolete now. Or somehow calculate
> compensation offset differently - but I am not sure it is a good idea.

Hmm, as I recall it's quite intentional that the index on a partitioned
table is marked !indisvalid; such indexes are only supposed to be marked
valid once indexes on all partitions have been attached. As I recall, if
you remove that prohibition, some pg_dump scenarios fail.

I'd rather consider the idea of avoiding indexes marked !indisvalid on
partitioned tables as arbiter lists ... but then we need to verify the
scenario where there is one, and INSERT ON CONFLICT runs concurrently with
ALTER INDEX ATTACH PARTITION for the last partition lacking the index (which
is the point where the index is marked indisvalid on the partitioned table).
There may not be a problem with that, because we grab AccessExclusiveLock on
the index partition, so no query can be running concurrently ... unless the
INSERT is targeting a partition other than the one where the index is being
attached. (On the partitioned table and index, we only have
ShareUpdateExclusiveLock.)

--
Álvaro Herrera Breisgau, Deutschland — https://www.EnterpriseDB.com/
"Crear es tan difícil como ser libre" (Elsa Triolet)
Hello!

On Sun, Dec 7, 2025 at 10:07 PM Álvaro Herrera <alvherre@kurilemu.de> wrote:
> I'd rather consider the idea of avoiding indexes marked !indisvalid on
> partitioned tables as arbiter lists ... but then we need to verify the
> scenario where there is one, and INSERT ON CONFLICT runs concurrently
> with ALTER INDEX ATTACH PARTITION for the last partition lacking the
> index (which is the point where the index is marked indisvalid on the
> partitioned table). There may not be a problem with that, because we
> grab AccessExclusiveLock on the index partition, so no query can be
> running concurrently ... unless the INSERT is targeting a partition
> other than the one where the index is being attached. (On the
> partitioned table and index, we only have ShareUpdateExclusiveLock).

For my taste it feels too complicated for such a case. What about changing
the logic of this check to the following:

For each valid index used as an arbiter in the partitioned table we need a
valid one in the particular partition (but it is okay to also have a
ready-only one in addition). If some of the arbiters are invalid in the
partitioned table (while we still have a valid compatible one in any case) -
it is okay. We just have to have an appropriate "companion" for every valid
arbiter.

Such a check looks correct to me, at least at the very end of the weekend.

Thanks,
Mikhail.
On Sun, Dec 07, 2025 at 10:07:45PM +0100, Alvaro Herrera wrote:
> Hmm, as I recall it's quite intentional that the index on a partitioned
> table is marked !indisvalid; such indexes are only supposed to be marked
> valid once indexes on all partitions have been attached. As I recall,
> if you remove that prohibition, some pg_dump scenarios fail.

Right. If indisvalid is true on a partitioned table, then we are sure that all its partitions have valid indexes. If indisvalid is false, some of its partitions may have an invalid index.

In the false case, things can be a bit lossy as well. For example, an ALTER TABLE ONLY could switch a partition's indisvalid to true without switching the indisvalid of its partitioned table to true.
--
Michael
Attachments
On 2025-Dec-06, Mihail Nikalayeu wrote:

> From my perspective, the correct solution is simply to remove the error
> message, because it looks obsolete now.

Rereading this -- did you mean to propose that a possible fix was to remove the "invalid arbiter index list" error? I had understood something different.

Your idea downthread of changing the way that check works (so that we don't throw an error in this case, but we continue to double-check that the arbiter list is sensible) sounds reasonable to me. Do you want to propose a specific check for it?

--
Álvaro Herrera         Breisgau, Deutschland  —  https://www.EnterpriseDB.com/
<inflex> really, I see PHP as like a strange amalgamation of C, Perl, Shell
<crab> inflex: you know that "amalgam" means "mixture with mercury", more or less, right?
<crab> i.e., "deadly poison"
Hello, Álvaro!

On Mon, Dec 8, 2025 at 10:58 AM Álvaro Herrera <alvherre@kurilemu.de> wrote:
> Rereading this -- did you mean to propose that a possible fix was to
> remove the "invalid arbiter index list" error? I had understood
> something different.

Yes, it was the initial idea.

> Your idea downthread of changing the way that check works (so that we
> don't throw an error in this case, but we continue to double-check that
> the arbiter list is sensible) sounds reasonable to me. Do you want to
> propose a specific check for it?

I think the following logic is correct:

* for each indisvalid arbiter in the parent table, the partition should have AT LEAST ONE compatible indisvalid index (it may additionally have one or more ready-only ones);
* for each non-indisvalid arbiter in the parent, nothing is expected from the partition.

I'll try to create a patch along these lines later.

Best regards,
Mikhail.
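The first rule above can be sketched at the catalog level roughly as follows. This is only an approximation with hypothetical names (parent_tbl is the partitioned table, parent_idx a valid arbiter index on it), and it narrows "compatible index" down to "attached partition of the arbiter", which is stricter than the real check would need to be:

-- Rough sketch, hypothetical names: any partition returned here has no
-- valid attached partition of the valid arbiter index "parent_idx"
-- defined on the partitioned table "parent_tbl".
SELECT part.relname AS partition_without_valid_arbiter
FROM pg_inherits tp
JOIN pg_class part ON part.oid = tp.inhrelid
WHERE tp.inhparent = 'parent_tbl'::regclass
  AND NOT EXISTS (
        SELECT 1
        FROM pg_inherits ip
        JOIN pg_index ci ON ci.indexrelid = ip.inhrelid
        WHERE ip.inhparent = 'parent_idx'::regclass
          AND ci.indrelid = part.oid
          AND ci.indisvalid);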
Hello!

After some investigation I ended up with a much simpler fix. It is self-explanatory in the code and the commit message.

Best regards,
Mikhail.
Attachments
On 2025-Dec-09, Mihail Nikalayeu wrote:

> Hello!
>
> After some investigation I ended up with a much simpler fix. It is
> self-explanatory in the code and the commit message.

Yeah, that makes sense. It's what I was trying to say in
https://postgr.es/m/202512072050.hcyysny65ugj@alvherre.pgsql

Pushed your patch, thanks.

I still wonder if it's possible to break things by doing something like

CREATE TABLE pt (a int) PARTITION BY LIST (a);
CREATE TABLE p1 PARTITION OF pt FOR VALUES IN (1);
CREATE TABLE p2 PARTITION OF pt FOR VALUES IN (2);
CREATE UNIQUE INDEX pti ON ONLY pt (a);
CREATE UNIQUE INDEX p1i ON p1 (a);
CREATE UNIQUE INDEX p2i ON p2 (a);
ALTER INDEX pti ATTACH PARTITION p1i;

and then do the
INSERT INTO pt VALUES (1) ON CONFLICT (a) DO NOTHING;
dance concurrently with
ALTER INDEX pti ATTACH PARTITION p2i;

This would be a much smaller problem than the already fixed ones though, I think.

Thanks!

--
Álvaro Herrera         PostgreSQL Developer  —  https://www.EnterpriseDB.com/
"Learn about compilers. Then everything looks like either a compiler or a
database, and now you have two problems but one of them is fun."
            https://twitter.com/thingskatedid/status/1456027786158776329
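For what it's worth, the intermediate state that the script above creates (only p1i attached) can be inspected with a query like the one below. If I am not mistaken, it should show pti still marked !indisvalid while p1i and p2i are valid, which is exactly the window such a concurrent ON CONFLICT / ATTACH PARTITION race would have to hit:

-- Illustrative only: after the script above, pti should still be invalid
-- (p2i is not yet attached), while the per-partition indexes p1i and p2i
-- are both valid.
SELECT c.relname AS index_name, i.indisvalid, i.indisready
FROM pg_class c
JOIN pg_index i ON i.indexrelid = c.oid
WHERE c.relname IN ('pti', 'p1i', 'p2i')
ORDER BY c.relname;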
Hi,

I just saw a failure in CI for an unrelated patch,
https://api.cirrus-ci.com/v1/artifact/task/4641105595072512/testrun/build/testrun/injection_points/isolation/regression.diffs

diff -U3 /home/postgres/postgres/src/test/modules/injection_points/expected/reindex-concurrently-upsert.out /home/postgres/postgres/build/testrun/injection_points/isolation/results/reindex-concurrently-upsert.out
--- /home/postgres/postgres/src/test/modules/injection_points/expected/reindex-concurrently-upsert.out	2025-12-11 12:41:23.982167232+0000
+++ /home/postgres/postgres/build/testrun/injection_points/isolation/results/reindex-concurrently-upsert.out	2025-12-11 12:45:24.038078804 +0000
@@ -62,22 +62,13 @@
 (1 row)
 
 step s1_start_upsert: <... completed>
+step s2_start_upsert: <... completed>
+step s3_start_reindex: <... completed>
 step s4_wakeup_s2:
 	SELECT injection_points_detach('exec-insert-before-insert-speculative');
 	SELECT injection_points_wakeup('exec-insert-before-insert-speculative');
-injection_points_detach
------------------------
-
-(1 row)
-
-injection_points_wakeup
------------------------
-
-(1 row)
-
-step s2_start_upsert: <... completed>
-step s3_start_reindex: <... completed>
+ERROR:  could not detach injection point "exec-insert-before-insert-speculative"
 
 starting permutation: s3_setup_wait_before_swap s3_start_reindex s1_start_upsert s4_wakeup_to_swap s2_start_upsert s4_wakeup_s2 s4_wakeup_s1
 injection_points_attach

I haven't seen it anywhere else. I don't believe this to be a problem in the patch, because the patch is about partitions and this test case does not involve partitioned tables -- so I'm inclined to think it's a timing issue in the test itself.

If so, this can probably be fixed with additional constraints on one of the steps in the permutation ... perhaps making s2_start_upsert depend on s4_wakeup_s2? But since I can't easily reproduce this, I have no idea if this is a valid solution.

--
Álvaro Herrera         Breisgau, Deutschland  —  https://www.EnterpriseDB.com/
Hello, Álvaro!

On Thu, Dec 11, 2025 at 10:36 PM Álvaro Herrera <alvherre@kurilemu.de> wrote:
> I just saw a failure in CI for an unrelated patch

I'll try to dive deeper tomorrow to find a fix, but it feels like we are doing something wrong here.

The tests were good for proving the issue and demonstrating that it was fixed after some changes. But currently we are just trying (not for the first time) to make sure the OUTPUT of the test is EXACTLY equal to some expected variant.

I think a more correct approach here would be to test something like "the output does not contain a `duplicate key value violates unique constraint` message". Or, even better, the real case: pgbench running concurrent REINDEX + INSERT (it takes seconds to reproduce, but CPU usage is high). That way we test something essential that we want to keep unbroken, not the exact output of concurrent commands... But the current isolationtester does not support anything like that.

I am afraid the amount of time needed to stabilize such a test (in its output, not in its meaning) does not cover its potential value. Also, I can imagine someone changing something unrelated (catalog snapshot invalidation, for example) and the test starting to fail on some rare buildfarm animal once a week... Ugh.

Maybe I am biased by my main programming experience (Java, backends, distributed systems, etc.), but databases need to be much more accurate and strict, even if that is painful...

What do you think about it?

Best regards,
Mikhail.
On 2025-Dec-12, Mihail Nikalayeu wrote:

> Hello, Álvaro!
>
> On Thu, Dec 11, 2025 at 10:36 PM Álvaro Herrera <alvherre@kurilemu.de> wrote:
> > I just saw a failure in CI for an unrelated patch
>
> I'll try to dive deeper tomorrow to find a fix, but it feels like we
> are doing something wrong here.

Hmm, this is a good point.

> But currently we are just trying (not for the first time) to make sure
> the OUTPUT of the test is EXACTLY equal to some expected variant.

A low-cost option might be to add alternative expected file(s) that match the other variant(s). I think trying to make isolationtester "smart match" the output might be more complicated than is warranted.

> I am afraid the amount of time needed to stabilize such a test (in its
> output, not in its meaning) does not cover its potential value.

Yeah, could be.

> Also, I can imagine someone changing something unrelated (catalog
> snapshot invalidation, for example) and the test starting to fail on
> some rare buildfarm animal once a week... Ugh.

Another idea might be to rewrite these tests using BackgroundPsql under the TAP infrastructure. That's quite a bit more tedious to write, but we can be more precise in detecting whether some particular error message was thrown or not.

--
Álvaro Herrera         PostgreSQL Developer  —  https://www.EnterpriseDB.com/
"How amazing is that? I call it a night and come back to find that a bug has
been identified and patched while I sleep."                (Robert Davidson)
http://archives.postgresql.org/pgsql-sql/2006-03/msg00378.php
Hello, Álvaro!

On Fri, Dec 12, 2025 at 11:17 AM Álvaro Herrera <alvherre@kurilemu.de> wrote:
> Another idea might be to rewrite these tests using BackgroundPsql under
> the TAP infrastructure. That's quite a bit more tedious to write, but
> we can be more precise in detecting whether some particular error
> message was thrown or not.

I think I understood the race, and I am currently thinking about two possible approaches:

1) extract the LOOP that waits for the injection point into a function like "injection_points_await_waiter" and add it between almost every pair of steps, to enforce the ordering through guardrails (effectively reducing concurrency instead of making isolationtester report the data the same way);
2) rewrite the tests using the TAP infrastructure.

What do you think? Which one do you prefer?

Best regards,
Mikhail.
Hi,

On 2025-12-11 22:36:55 +0100, Álvaro Herrera wrote:
> I just saw a failure in CI for an unrelated patch,

There are a *lot* of these right now. CI on master currently has a failure rate of 14 failures / 50 runs on the postgres/postgres repo. All of the failures are due to this issue from what I can tell. That means there are a lot of false failures for cfbot.

Could we perhaps disable this test until it's been rewritten to be stable?

Greetings,

Andres Freund
Hello, Andres!

On Sun, Dec 14, 2025 at 4:19 PM Andres Freund <andres@anarazel.de> wrote:
> There are a *lot* of these right now. CI on master currently has a failure
> rate of 14 failures / 50 runs on the postgres/postgres repo. All of the
> failures are due to this issue from what I can tell. That means there are a
> lot of false failures for cfbot.

Oh, that's sad :(

> Could we perhaps disable this test until it's been rewritten to be stable?

Of course - if you can do it yourself, feel free to do so for all tests from the set:

index-concurrently-upsert
index-concurrently-upsert-predicate
reindex-concurrently-upsert
reindex-concurrently-upsert-on-constraint
reindex-concurrently-upsert-partitioned

You may even remove them for now - I plan to rewrite them using the TAP infrastructure, because the "endless fight" from [0] seems to have become reality.

[0]: https://www.postgresql.org/message-id/flat/CADzfLwUhvaEsPbYaT3Z5cMO779JruDqBbE5nijeaBcXiNPoCYw%40mail.gmail.com#97598dff6875c788765e62b18bc3ee63
On 2025-Dec-14, Mihail Nikalayeu wrote:

> Of course - if you can do it yourself, feel free to do so for all tests
> from the set:
>
> index-concurrently-upsert
> index-concurrently-upsert-predicate
> reindex-concurrently-upsert
> reindex-concurrently-upsert-on-constraint
> reindex-concurrently-upsert-partitioned

I've commented them out from the meson/makefile for now.

--
Álvaro Herrera         48°01'N 7°57'E  —  https://www.EnterpriseDB.com/
"I dream about dreams about dreams", sang the nightingale
under the pale moon (Sandman)