Thread: Issues with ON CONFLICT UPDATE and REINDEX CONCURRENTLY

Issues with ON CONFLICT UPDATE and REINDEX CONCURRENTLY

From: Michail Nikolaev
Hello, hackers.

While testing my work on [1], I was struggling with a strange issue involving ON CONFLICT UPDATE and REINDEX CONCURRENTLY.

After some time, I realized that the same issue persists on the master branch as well :)

I have prepared two TAP tests to reproduce the issues [2], also attached.

The first one does the following:

    CREATE UNLOGGED TABLE tbl(i int primary key, updated_at timestamp);
    CREATE INDEX idx ON tbl(i, updated_at); -- not required to reproduce the issue, but makes it happen faster

It then runs the following scripts concurrently with pgbench:

    1) INSERT INTO tbl VALUES(13,now()) on conflict(i) do update set updated_at = now();
    2) INSERT INTO tbl VALUES(42,now()) on conflict(i) do update set updated_at = now();
    3) INSERT INTO tbl VALUES(69,now()) on conflict(i) do update set updated_at = now();

Also, while pgbench runs, the following command is executed in a loop:

    REINDEX INDEX CONCURRENTLY tbl_pkey;

For some time everything looks more or less fine (except for live locks, but those are the subject of the second test).
But after about a minute or so (around the ~3000th REINDEX), it just fails like this:

                make -C src/test/modules/test_misc/ check PROVE_TESTS='t/006_*'

                # waiting for an about 3000, now is 2174, seconds passed : 84
                # waiting for an about 3000, now is 2175, seconds passed : 84
                # waiting for an about 3000, now is 2176, seconds passed : 84
                # waiting for an about 3000, now is 2177, seconds passed : 84
                # waiting for an about 3000, now is 2178, seconds passed : 84
                # waiting for an about 3000, now is 2179, seconds passed : 84
                # waiting for an about 3000, now is 2180, seconds passed : 84
                # waiting for an about 3000, now is 2181, seconds passed : 84
                # waiting for an about 3000, now is 2182, seconds passed : 84
                # waiting for an about 3000, now is 2183, seconds passed : 84
                # waiting for an about 3000, now is 2184, seconds passed : 84

                #   Failed test 'concurrent INSERTs, UPDATES and RC status (got 2 vs expected 0)'
                #   at t/006_concurrently_unique_fail.pl line 69.

                #   Failed test 'concurrent INSERTs, UPDATES and RC stderr /(?^:^$)/'
                #   at t/006_concurrently_unique_fail.pl line 69.
                #                   'pgbench: error: pgbench: error: client 4 script 0 aborted in command 1 query 0: ERROR:  duplicate key value violates unique constraint "tbl_pkey_ccnew"
                # DETAIL:  Key (i)=(13) already exists.
                # client 15 script 0 aborted in command 1 query 0: ERROR:  duplicate key value violates unique constraint "tbl_pkey_ccnew"
                # DETAIL:  Key (i)=(13) already exists.
                # pgbench: error: client 9 script 0 aborted in command 1 query 0: ERROR:  duplicate key value violates unique constraint "tbl_pkey_ccnew"
                # DETAIL:  Key (i)=(13) already exists.
                # pgbench: error: client 11 script 0 aborted in command 1 query 0: ERROR:  duplicate key value violates unique constraint "tbl_pkey_ccnew"
                # DETAIL:  Key (i)=(13) already exists.
                # pgbench: error: client 8 script 0 aborted in command 1 query 0: ERROR:  duplicate key value violates unique constraint "tbl_pkey_ccnew"
                # DETAIL:  Key (i)=(13) already exists.
                # pgbench: error: client 3 script 2 aborted in command 1 query 0: ERROR:  duplicate key value violates unique constraint "tbl_pkey_ccnew"
                # DETAIL:  Key (i)=(69) already exists.
                # pgbench: error: client 2 script 2 aborted in command 1 query 0: ERROR:  duplicate key value violates unique constraint "tbl_pkey_ccnew"
                # DETAIL:  Key (i)=(69) already exists.
                # pgbench: error: client 12 script 0 aborted in command 1 query 0: ERROR:  duplicate key value violates unique constraint "tbl_pkey_ccold"
                # DETAIL:  Key (i)=(13) already exists.
                # pgbench: error: client 10 script 0 aborted in command 1 query 0: ERROR:  duplicate key value violates unique constraint "tbl_pkey_ccold"
                # DETAIL:  Key (i)=(13) already exists.
                # pgbench: error: client 18 script 2 aborted in command 1 query 0: ERROR:  duplicate key value violates unique constraint "tbl_pkey_ccnew"
                # DETAIL:  Key (i)=(69) already exists.
                # pgbench: error: pgbench:client 14 script 0 aborted in command 1 query 0: ERROR:  duplicate key value violates unique constraint "tbl_pkey"
                # DETAIL:  Key (i)=(13) already exists.
                #  error: client 1 script 0 aborted in command 1 query 0: ERROR:  duplicate key value violates unique constraint "tbl_pkey"
                # DETAIL:  Key (i)=(13) already exists.
                # pgbench: error: client 0 script 2 aborted in command 1 query 0: ERROR:  duplicate key value violates unique constraint "tbl_pkey"
                # DETAIL:  Key (i)=(69) already exists.
                # pgbench: error: client 13 script 1 aborted in command 1 query 0: ERROR:  duplicate key value violates unique constraint "tbl_pkey_ccnew"
                # DETAIL:  Key (i)=(42) already exists.
                # pgbench: error: client 16 script 1 aborted in command 1 query 0: ERROR:  duplicate key value violates unique constraint "tbl_pkey_ccnew"
                # DETAIL:  Key (i)=(42) already exists.
                # pgbench: error: client 5 script 1 aborted in command 1 query 0: ERROR:  duplicate key value violates unique constraint "tbl_pkey_ccnew"
                # DETAIL:  Key (i)=(42) already exists.
                # pgbench: error: Run was aborted; the above results are incomplete.
                # '

Probably something is wrong with arbiter index selection for different backends. I am afraid it could be a symptom of a more serious issue.

-------------------------------------

The second test shows an interesting live lock state in a similar situation.

    CREATE UNLOGGED TABLE tbl(i int primary key, n int);
    CREATE INDEX idx ON tbl(i, n);
    INSERT INTO tbl VALUES(13,1);

pgbench concurrently runs a single command:

    INSERT INTO tbl VALUES(13,1) on conflict(i) do update set n = tbl.n + EXCLUDED.n;

And reindexing also runs in a loop:

    REINDEX INDEX CONCURRENTLY tbl_pkey;

Shortly after the start, a somewhat strange issue appears:

     make -C src/test/modules/test_misc/ check PROVE_TESTS='t/007_*'

    # going to start reindex, num tuples in table is 1
        # reindex 0 done in 0.00704598426818848 seconds, num inserted during reindex tuples is 0 speed is 0 per second
        # going to start reindex, num tuples in table is 7
        # reindex 1 done in 0.453176021575928 seconds, num inserted during reindex tuples is 632 speed is 1394.60158947115 per second
        # going to start reindex, num tuples in table is 647
    #  current n is 808, 808 per one second
    #  current n is 808, 0 per one second
    #  current n is 808, 0 per one second
    #  current n is 808, 0 per one second
    #  current n is 808, 0 per one second
    #  current n is 811, 3 per one second
    #  current n is 917, 106 per one second
    #  current n is 1024, 107 per one second
        # reindex 2 done in 8.4104950428009 seconds, num inserted during reindex tuples is 467 speed is 55.5258635340064 per second
        # going to start reindex, num tuples in table is 1136
    #  current n is 1257, 233 per one second
    #  current n is 1257, 0 per one second
    #  current n is 1257, 0 per one second
    #  current n is 1257, 0 per one second
    #  current n is 1257, 0 per one second
    #  current n is 1490, 233 per one second
        # reindex 3 done in 5.21368479728699 seconds, num inserted during reindex tuples is 411 speed is 78.8310026363446 per second
        # going to start reindex, num tuples in table is 1566

At some moments all CPUs are hot, but 30 connections are unable to perform any upsert. I think it may somehow be caused by the two arbiter indexes (the old one and the newly reindexed one).

Best regards,
Mikhail.

[1]: https://www.postgresql.org/message-id/flat/CANtu0ogBOtd9ravu1CUbuZWgq6qvn1rny38PGKDPk9zzQPH8_A%40mail.gmail.com#d4be02ff70f3002522f9fadbd165d631
[2]: https://github.com/michail-nikolaev/postgres/commit/9446f944b415306d9e5d5ab98f69938d8f5ee87f
Attachments

Re: Issues with ON CONFLICT UPDATE and REINDEX CONCURRENTLY

From: Michael Paquier
On Tue, Jun 11, 2024 at 01:00:00PM +0200, Michail Nikolaev wrote:
> Probably something is wrong with arbiter index selection for different
> backends. I am afraid it could be a symptom of a more serious issue.

ON CONFLICT may select an index that is being rebuilt in parallel by
the REINDEX, and its contents may be incomplete.  Isn't the
issue that we may select as arbiter indexes stuff that's !indisvalid?
Using the ccnew or ccold indexes would not be correct for the conflict
resolutions.
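
For illustration, the transient index states can be observed with a
catalog query like this (a sketch; tbl is the table from the test, and
the _ccnew/_ccold indexes exist only while the REINDEX is in progress):

    SELECT indexrelid::regclass AS indexname,
           indisvalid, indisready, indislive
    FROM pg_index
    WHERE indrelid = 'tbl'::regclass;
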
--
Michael

Attachments

Re: Issues with ON CONFLICT UPDATE and REINDEX CONCURRENTLY

From: Michail Nikolaev
Hello, Michael.

> Isn't the issue that we may select as arbiter indexes stuff that's !indisvalid?
As far as I can see (1), !indisvalid indexes are filtered out.

But... It looks like this choice is not locked in any way (2), so index_concurrently_swap or index_concurrently_set_dead can change the index after the decision is made, even despite WaitForLockersMultiple (3). In some cases, it may cause data loss...
However, I was unable to reproduce that using some random usleep() calls - maybe it is a wrong assumption.

Re: Issues with ON CONFLICT UPDATE and REINDEX CONCURRENTLY

From: Michail Nikolaev
Hello.

> However, I was unable to reproduce that using some random usleep() calls - maybe it is a wrong assumption.
It seems the assumption is correct - we may use an invalid index as an arbiter due to a race condition.

The attached patch adds a check for that case, and now the test fails like this:

    # pgbench: error: client 16 script 1 aborted in command 1 query 0: ERROR:  duplicate key value violates unique constraint "tbl_pkey_ccold"
    # DETAIL:  Key (i)=(42) already exists.
    # pgbench: error: client 9 script 1 aborted in command 1 query 0: ERROR:  ON CONFLICT does not support invalid indexes as arbiters
    # pgbench: error: client 0 script 2 aborted in command 1 query 0: ERROR:  duplicate key value violates unique constraint "tbl_pkey"
    # DETAIL:  Key (i)=(69) already exists.
    # pgbench: error: client 7 script 0 aborted in command 1 query 0: ERROR:  ON CONFLICT does not support invalid indexes as arbiters
    # pgbench: error: client 10 script 0 aborted in command 1 query 0: ERROR:  ON CONFLICT does not support invalid indexes as arbiters
    # pgbench: error: client 11 script 0 aborted in command 1 query 0: ERROR:  ON CONFLICT does not support invalid indexes as arbiters

I think it is even possible to see a !indislive index in the same situation (which is a worse case), but I have been unable to reproduce it so far.

Best regards,
Mikhail.

Attachments

Re: Issues with ON CONFLICT UPDATE and REINDEX CONCURRENTLY

From: Michail Nikolaev
Hello, everyone.

> I think it is even possible to see a !indislive index in the same situation (which is a worse case), but I have been unable to reproduce it so far.
Fortunately, it is not possible.

So, seems like I have found the source of the problem:

1) infer_arbiter_indexes calls RelationGetIndexList to get the list of candidates.
It does not lock the selected indexes in any additional way, so nothing prevents index_concurrently_swap from changing them (setting and clearing validity).

                RelationGetIndexList relcache.c:4857
                infer_arbiter_indexes plancat.c:780
                make_modifytable createplan.c:7097 ---------- node->arbiterIndexes = infer_arbiter_indexes(root);
                create_modifytable_plan createplan.c:2826
                create_plan_recurse createplan.c:532
                create_plan createplan.c:349
                standard_planner planner.c:421
                planner planner.c:282
                pg_plan_query postgres.c:904
                pg_plan_queries postgres.c:996
                exec_simple_query postgres.c:1193

2) another backend marks some index as invalid and commits

                index_concurrently_swap index.c:1600
                ReindexRelationConcurrently indexcmds.c:4115
                ReindexIndex indexcmds.c:2814
                ExecReindex indexcmds.c:2743
                ProcessUtilitySlow utility.c:1567
                standard_ProcessUtility utility.c:1067
                ProcessUtility utility.c:523
                PortalRunUtility pquery.c:1158
                PortalRunMulti pquery.c:1315
                PortalRun pquery.c:791
                exec_simple_query postgres.c:1274

3) the first backend invalidates its catalog snapshot because a new transaction snapshot is taken

                InvalidateCatalogSnapshot snapmgr.c:426
                GetTransactionSnapshot snapmgr.c:278
                PortalRunMulti pquery.c:1244
                PortalRun pquery.c:791
                exec_simple_query postgres.c:1274

4) the first backend copies the indexes selected using the previous catalog snapshot

                ExecInitModifyTable nodeModifyTable.c:4499 -------- resultRelInfo->ri_onConflictArbiterIndexes = node->arbiterIndexes;
                ExecInitNode execProcnode.c:177
                InitPlan execMain.c:966
                standard_ExecutorStart execMain.c:261
                ExecutorStart execMain.c:137
                ProcessQuery pquery.c:155
                PortalRunMulti pquery.c:1277
                PortalRun pquery.c:791
                exec_simple_query postgres.c:1274

5) then it reads the indexes using the new, fresh snapshot

              RelationGetIndexList relcache.c:4816
              ExecOpenIndices execIndexing.c:175
              ExecInsert nodeModifyTable.c:792 ------------- ExecOpenIndices(resultRelInfo, onconflict != ONCONFLICT_NONE);
              ExecModifyTable nodeModifyTable.c:4059
              ExecProcNodeFirst execProcnode.c:464
              ExecProcNode executor.h:274
              ExecutePlan execMain.c:1646
              standard_ExecutorRun execMain.c:363
              ExecutorRun execMain.c:304
              ProcessQuery pquery.c:160
              PortalRunMulti pquery.c:1277
              PortalRun pquery.c:791
              exec_simple_query postgres.c:1274

6) and uses the arbiter selected with the stale snapshot against the new view of the indexes (now marked as invalid)

            ExecInsert nodeModifyTable.c:1016 -------------- arbiterIndexes = resultRelInfo->ri_onConflictArbiterIndexes;
            ............

            ExecInsert nodeModifyTable.c:1048 ---------------if (!ExecCheckIndexConstraints(resultRelInfo, slot, estate, conflictTid, arbiterIndexes))
            ExecModifyTable nodeModifyTable.c:4059
            ExecProcNodeFirst execProcnode.c:464
            ExecProcNode executor.h:274
            ExecutePlan execMain.c:1646
            standard_ExecutorRun execMain.c:363
            ExecutorRun execMain.c:304
            ProcessQuery pquery.c:160
            PortalRunMulti pquery.c:1277
            PortalRun pquery.c:791
            exec_simple_query postgres.c:1274


I have attached an updated test for the issue (it quickly fails on an assert and uses only 2 backends).
The same issue may happen in case of CREATE/DROP INDEX CONCURRENTLY as well.

The simplest possible fix is to use ShareLock instead of ShareUpdateExclusiveLock in index_concurrently_swap:

            oldClassRel = relation_open(oldIndexId, ShareLock);
            newClassRel = relation_open(newIndexId, ShareLock);

But this is not a "concurrent" approach. Still, such an update should be fast enough, as far as I understand.

Best regards,
Mikhail.
Attachments

Re: Issues with ON CONFLICT UPDATE and REINDEX CONCURRENTLY

From: Michael Paquier
On Mon, Jun 17, 2024 at 07:00:51PM +0200, Michail Nikolaev wrote:
> The simplest possible fix is to use ShareLock
> instead of ShareUpdateExclusiveLock in index_concurrently_swap:
>
>             oldClassRel = relation_open(oldIndexId, ShareLock);
>             newClassRel = relation_open(newIndexId, ShareLock);
>
> But this is not a "concurrent" approach. Still, such an update should be fast
> enough, as far as I understand.

Nope, that won't fly far.  We should not use a ShareLock in this step
or we are going to conflict with row exclusive locks, impacting all
workloads when doing a REINDEX CONCURRENTLY.
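
For illustration, SHARE conflicts with the ROW EXCLUSIVE lock taken by
every INSERT/UPDATE/DELETE, so a sketch at the table level (the swap
would take the analogous lock on the index relations) blocks all
writers:

    -- session 1
    BEGIN;
    LOCK TABLE tbl IN SHARE MODE;

    -- session 2: blocks until session 1 commits, because the INSERT
    -- needs ROW EXCLUSIVE, which conflicts with SHARE
    INSERT INTO tbl VALUES (13, now());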

That may be a long shot, but the issue is that we do the swap of all
the indexes in a single transaction, but do not wait for them to
complete when committing the swap's transaction in phase 4.  Your
report is telling us that we really have a good reason to wait for all
the transactions that may use these indexes to finish.  One thing
coming to mind to keep things concurrent-safe while allowing
a clean use of the arbiter indexes would be to stick a
WaitForLockersMultiple() on AccessExclusiveLock just *before* the
transaction commit of phase 4, say, lacking the progress report part:
--- a/src/backend/commands/indexcmds.c
+++ b/src/backend/commands/indexcmds.c
@@ -4131,6 +4131,8 @@ ReindexRelationConcurrently(const ReindexStmt *stmt, Oid relationOid, const Rein
         CommandCounterIncrement();
     }

+    WaitForLockersMultiple(lockTags, AccessExclusiveLock, true);
+
     /* Commit this transaction and make index swaps visible */
     CommitTransactionCommand();
     StartTransactionCommand();

This is a non-fresh Friday-afternoon idea, but it would make sure that
we don't have any transactions using the indexes switched to _ccold
with indisvalid that are waiting for a drop in phase 5.  Your tests
seem to pass with that, and that keeps the operation intact
concurrent-wise (I'm really wishing for isolation tests with injection
points just now, because I could use them here).

> +        Assert(indexRelation->rd_index->indislive);
> +        Assert(indexRelation->rd_index->indisvalid);
> +
>          if (!indexRelation->rd_index->indimmediate)
>              ereport(ERROR,
>                      (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),

This kind of validation check may be a good idea in the long term.
That seems incredibly useful to me if we were to add more code paths
that do concurrent index rebuilds, to make sure that we don't rely on
an index we should not use at all.  That's a HEAD-only thing IMO,
though.
--
Michael

Attachments

Re: Issues with ON CONFLICT UPDATE and REINDEX CONCURRENTLY

From: Michail Nikolaev
Hello, Michael!

> This is a non-fresh Friday-afternoon idea, but it would make sure that
> we don't have any transactions using the indexes switched to _ccold
> with indisvalid that are waiting for a drop in phase 5.  Your tests
> seem to pass with that, and that keeps the operation intact
> concurrent-wise (I'm really wishing for isolation tests with injection
> points just now, because I could use them here).

Yes, I have also tried that approach, but it doesn't work, unfortunately.
You can make the test fail by increasing the number of connections:

'--no-vacuum --client=10 -j 2 --transactions=1000',

The source of the issue is not the swap of the indexes (and it is not related only to REINDEX CONCURRENTLY), but the fact that indexes are fetched once during planning (to find the arbiter) and then reread later with a new catalog snapshot for the actual execution.
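
Conceptually, the interleaving looks like this (a sketch in comments;
the timing cannot be reproduced deterministically in plain SQL without
injection points):

    -- S1: INSERT INTO tbl VALUES (13, now())
    --       ON CONFLICT (i) DO UPDATE SET updated_at = now();
    --     the planner picks the arbiter from the current catalog snapshot
    -- S2: REINDEX INDEX CONCURRENTLY tbl_pkey;
    --     swaps tbl_pkey and tbl_pkey_ccnew, then commits
    -- S1: the executor re-reads the index list with a fresh catalog
    --     snapshot and inserts into all live indexes, but checks
    --     conflicts only against the stale arbiter list, producing a
    --     spurious "duplicate key" error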

So, other possible fixes I see:
* fallback to replanning in case we see something changed during the execution
* select arbiter indexes during actual execution

> That's a HEAD-only thing IMO,
> though.
Do you mean that it needs to be moved to a separate patch?

Best regards,
Mikhail.

Re: Issues with ON CONFLICT UPDATE and REINDEX CONCURRENTLY

From: Michael Paquier
On Mon, Jun 17, 2024 at 07:00:51PM +0200, Michail Nikolaev wrote:
> The same issue may happen in case of CREATE/DROP INDEX CONCURRENTLY as well.

While looking at all that, I've been also curious about this specific
point, and it is indeed possible to finish in a state where a
duplicate key would be found in one of the indexes selected by the
executor during an INSERT ON CONFLICT while a concurrent set of CICs
and DICs are run, so you don't really need a REINDEX.  See for example
the attached test.
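
For example, a loop like the following (a sketch; tbl_pkey_alt is a
hypothetical name, and the attached test may differ) exercises the
same pattern while the upserts from the earlier tests are running:

    CREATE UNIQUE INDEX CONCURRENTLY tbl_pkey_alt ON tbl (i);
    DROP INDEX CONCURRENTLY tbl_pkey_alt;
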
--
Michael

Attachments

Re: Issues with ON CONFLICT UPDATE and REINDEX CONCURRENTLY

From: Michael Paquier
On Fri, Jun 21, 2024 at 11:31:21AM +0200, Michail Nikolaev wrote:
> Yes, I have also tried that approach, but it doesn't work, unfortunately.
> You can make the test fail by increasing the number of connections:
>
> '--no-vacuum --client=10 -j 2 --transactions=1000',
>
> The source of the issue is not the swap of the indexes (and it is not related
> only to REINDEX CONCURRENTLY), but the fact that indexes are fetched once
> during planning (to find the arbiter) and then reread later with a new
> catalog snapshot for the actual execution.

When I first saw this report, my main worry was that I had somehow
managed to break the state of the indexes leading to data corruption
because of an incorrect step in the concurrent operations.  However,
as far as I can see this is not the case, as an effect of two
properties we rely on for concurrent index operations, that hold in
the executor and the planner.  Simply put:
- The planner ignores indexes with !indisvalid.
- The executor ignores indexes with !indislive.

The critical point is that we wait in DROP INDEX CONC and REINDEX CONC
for any transactions still using an index that's waiting to be marked
as !indislive, because such indexes *must* not be used in the
executor.

> So, other possible fixes I see:
> * fallback to replanning in case we see something changed during the
> execution
> * select arbiter indexes during actual execution

These two properties make ON CONFLICT react the way it should
depending on the state of the indexes selected by the planner based on
the query clauses, with changes reflected at execution time, with two
patterns involved:
- An index may be created in a concurrent operation after the planner
has selected the arbiter indexes (the index may be defined, still not
valid yet, or just created after), then the query execution would need
to handle the extra index that becomes available at execution, with a failure
failure on a ccnew index.
- An index may be selected at planning phase, then a different index
could be used by a constraint once both indexes swap, with a failure
on a ccold index.

As far as I can see, it depends on what kind of query semantics and
the amount of transparency you are looking for here in your
application.  An error in the query itself can also be defined as
useful, so that your application is aware of what happens as an effect of
the concurrent index build (reindex or CIC/DIC), and it is not really
clear to me why silently falling back to a re-selection of the arbiter
indexes would always be better.  Replanning could actually be
dangerous if a workload is heavily concurrently REINDEX'd, as we could
fall into a trap where a query can never decide which index to use.
I'm not saying that it cannot be improved, but it's not completely
clear to me what query semantics are the best for all users because
the behavior of HEAD and your suggestions have merits and demerits.
Anything we could come up with would be an improvement anyway, AFAIU.

>> That's a HEAD-only thing IMO,
>> though.
>
> Do you mean that it needs to be moved to a separate patch?

It should, but I'm wondering if that's necessary for two reasons.

First, a check on indisvalid would be incorrect, because indexes
marked as !indisvalid && indislive mean that there is a concurrent
operation happening, and that this concurrent operation is waiting for
all transactions working with a lock on this index to finish before
flipping the live flag and making this index invalid for decisions taken
in the executor, like HOT updates, etc.

A check on indislive may be an idea; still, I'm slightly biased
regarding its additional value, because any indexes opened for a
relation are fetched from the relcache with RelationGetIndexList(),
which explains why !indislive indexes cannot be fetched, and we rely on
that in the executor for the indexes opened by a relation.
--
Michael

Attachments

Re: Issues with ON CONFLICT UPDATE and REINDEX CONCURRENTLY

From: Michail Nikolaev
Hello, Michael!

> As far as I can see, it depends on what kind of query semantics and
> the amount of transparency you are looking for here in your
> application.  An error in the query itself can also be defined as
> useful, so that your application is aware of what happens as an effect of
> the concurrent index build (reindex or CIC/DIC), and it is not really
> clear to me why silently falling back to a re-selection of the arbiter
> indexes would always be better.

From my point of view, INSERT ON CONFLICT UPDATE should never fail with "ERROR:  duplicate key value violates unique constraint", because the main idea of upsert is to avoid such situations.
So, it is what the majority expects and, probably, it is even documented.

On the other side, REINDEX CONCURRENTLY should not cause any queries to fail accidentally without any clear reason.

Also, as you can see from the first message in this thread, we could see errors like these:

* ERROR:  duplicate key value violates unique constraint "tbl_pkey"
* ERROR:  duplicate key value violates unique constraint "tbl_pkey_ccnew"
* ERROR:  duplicate key value violates unique constraint "tbl_pkey_ccold"

So, the first error message does not give the developer any clue as to what happened.

> - The planner ignores indexes with !indisvalid.
> - The executor ignores indexes with !indislive.

Yes, and it feels like we need one more flag here to distinguish !indisvalid indexes that are going to become valid from those that are going to become !indislive.

For example, let's name it indiscorrect (meaning the index contains all the data). In that case, we may use the following logic:

1) !indisvalid && !indiscorrect - the index is probably in the validation phase; do not use it as an arbiter because it does not contain all the data yet
2) !indisvalid && indiscorrect - the index will most likely be dropped. Do not plan new queries with it, but it may still be used by other queries (including upserts). So, we still need to include it in the arbiters.

And, during the reindex concurrently:

1) begin; mark the new index as indisvalid and indiscorrect; mark the old one as !indisvalid but still indiscorrect; invalidate the relcache; commit;

At this point, some queries are still using the old one as the arbiter, and some queries use both.

2) WaitForLockersMultiple

Now all queries use both indexes as arbiters.

3) begin; mark the old index as !indiscorrect, in addition to !indisvalid; invalidate the relcache; commit;

Now, some queries use only the new index, but some still use both.

4)  WaitForLockersMultiple;

Now, all queries use only the new index - so it is safe to mark the old one as !indislive.
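
The catalog states over time would look like this (a hypothetical
sketch; the indiscorrect flag does not exist today and is shown only to
illustrate the ordering):

    -- after step 1:  new index: indisvalid=t, indiscorrect=t
    --                old index: indisvalid=f, indiscorrect=t  (still an arbiter)
    -- after step 3:  old index: indisvalid=f, indiscorrect=f  (no longer an arbiter)
    -- after step 4:  old index: safe to set indislive=f and drop it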

> It should, but I'm wondering if that's necessary for two reasons.
In that case, it becomes:

    Assert(indexRelation->rd_index->indiscorrect);
    Assert(indexRelation->rd_index->indislive);

and it is always a valid check.

Best regards,
Mikhail.

Re: Issues with ON CONFLICT UPDATE and REINDEX CONCURRENTLY

From: Michail Nikolaev
Hello, Noah!


> On your other thread, it would be useful to see stack traces from the high-CPU
> processes once the live lock has ended all query completion.

I was wrong - it is not a livelock, it is actually a deadlock. I missed it because pgbench retries deadlocks automatically.

It looks like this:

2024-06-25 17:16:17.447 CEST [711743] 007_concurrently_unique_stuck.pl ERROR:  deadlock detected
2024-06-25 17:16:17.447 CEST [711743] 007_concurrently_unique_stuck.pl DETAIL:  Process 711743 waits for ShareLock on transaction 3633; blocked by process 711749.
Process 711749 waits for ShareLock on speculative token 2 of transaction 3622; blocked by process 711743.
Process 711743: INSERT INTO tbl VALUES(13,89318) on conflict(i) do update set n = tbl.n + 1 RETURNING n
Process 711749: INSERT INTO tbl VALUES(13,41011) on conflict(i) do update set n = tbl.n + 1 RETURNING n
2024-06-25 17:16:17.447 CEST [711743] 007_concurrently_unique_stuck.pl HINT:  See server log for query details.
2024-06-25 17:16:17.447 CEST [711743] 007_concurrently_unique_stuck.pl CONTEXT:  while inserting index tuple (15,145) in relation "tbl_pkey_ccnew"
2024-06-25 17:16:17.447 CEST [711743] 007_concurrently_unique_stuck.pl STATEMENT:  INSERT INTO tbl VALUES(13,89318) on conflict(i) do update set n = tbl.n + 1 RETURNING n

Stacktraces:

-------------------------

INSERT INTO tbl VALUES(13,41011) on conflict(i) do update set n = tbl.n + 1 RETURNING n

#0  in epoll_wait (epfd=5, events=0x1203328, maxevents=1, timeout=-1) at ../sysdeps/unix/sysv/linux/epoll_wait.c:30
#1  in WaitEventSetWaitBlock (set=0x12032c0, cur_timeout=-1, occurred_events=0x7ffcc4e38e30, nevents=1) at latch.c:1570
#2  in WaitEventSetWait (set=0x12032c0, timeout=-1, occurred_events=0x7ffcc4e38e30, nevents=1, wait_event_info=50331655) at latch.c:1516
#3  in WaitLatch (latch=0x7acb2a2f5f14, wakeEvents=33, timeout=0, wait_event_info=50331655) at latch.c:538
#4  in ProcSleep (locallock=0x122f778, lockMethodTable=0x1037340 <default_lockmethod>, dontWait=false) at proc.c:1355
#5  in WaitOnLock (locallock=0x122f778, owner=0x1247408, dontWait=false) at lock.c:1833
#6  in LockAcquireExtended (locktag=0x7ffcc4e39220, lockmode=5, sessionLock=false, dontWait=false, reportMemoryError=true, locallockp=0x0) at lock.c:1046
#7  in LockAcquire (locktag=0x7ffcc4e39220, lockmode=5, sessionLock=false, dontWait=false) at lock.c:739
#8  in SpeculativeInsertionWait (xid=3622, token=2) at lmgr.c:833
#9  in _bt_doinsert (rel=0x7acb2dbb12e8, itup=0x12f1308, checkUnique=UNIQUE_CHECK_YES, indexUnchanged=true, heapRel=0x7acb2dbb0f08) at nbtinsert.c:225
#10 in btinsert (rel=0x7acb2dbb12e8, values=0x7ffcc4e39440, isnull=0x7ffcc4e39420, ht_ctid=0x12ebe20, heapRel=0x7acb2dbb0f08, checkUnique=UNIQUE_CHECK_YES, indexUnchanged=true, indexInfo=0x12f08a8) at nbtree.c:195
#11 in index_insert (indexRelation=0x7acb2dbb12e8, values=0x7ffcc4e39440, isnull=0x7ffcc4e39420, heap_t_ctid=0x12ebe20, heapRelation=0x7acb2dbb0f08, checkUnique=UNIQUE_CHECK_YES, indexUnchanged=true, indexInfo=0x12f08a8) at indexam.c:230
#12 in ExecInsertIndexTuples (resultRelInfo=0x12eaa00, slot=0x12ebdf0, estate=0x12ea560, update=true, noDupErr=false, specConflict=0x0, arbiterIndexes=0x0, onlySummarizing=false) at execIndexing.c:438
#13 in ExecUpdateEpilogue (context=0x7ffcc4e39870, updateCxt=0x7ffcc4e3962c, resultRelInfo=0x12eaa00, tupleid=0x7ffcc4e39732, oldtuple=0x0, slot=0x12ebdf0) at nodeModifyTable.c:2130
#14 in ExecUpdate (context=0x7ffcc4e39870, resultRelInfo=0x12eaa00, tupleid=0x7ffcc4e39732, oldtuple=0x0, slot=0x12ebdf0, canSetTag=true) at nodeModifyTable.c:2478
#15 in ExecOnConflictUpdate (context=0x7ffcc4e39870, resultRelInfo=0x12eaa00, conflictTid=0x7ffcc4e39732, excludedSlot=0x12f05b8, canSetTag=true, returning=0x7ffcc4e39738) at nodeModifyTable.c:2694
#16 in ExecInsert (context=0x7ffcc4e39870, resultRelInfo=0x12eaa00, slot=0x12f05b8, canSetTag=true, inserted_tuple=0x0, insert_destrel=0x0) at nodeModifyTable.c:1048
#17 in ExecModifyTable (pstate=0x12ea7f0) at nodeModifyTable.c:4059
#18 in ExecProcNodeFirst (node=0x12ea7f0) at execProcnode.c:464
#19 in ExecProcNode (node=0x12ea7f0) at ../../../src/include/executor/executor.h:274
#20 in ExecutePlan (estate=0x12ea560, planstate=0x12ea7f0, use_parallel_mode=false, operation=CMD_INSERT, sendTuples=true, numberTuples=0, direction=ForwardScanDirection, dest=0x12daac8, execute_once=true) at execMain.c:1646
#21 in standard_ExecutorRun (queryDesc=0x12dab58, direction=ForwardScanDirection, count=0, execute_once=true) at execMain.c:363
#22 in ExecutorRun (queryDesc=0x12dab58, direction=ForwardScanDirection, count=0, execute_once=true) at execMain.c:304
#23 in ProcessQuery (plan=0x12e1360, sourceText=0x12083b0 "INSERT INTO tbl VALUES(13,41011) on conflict(i) do update set n = tbl.n + 1 RETURNING n ", params=0x0, queryEnv=0x0, dest=0x12daac8, qc=0x7ffcc4e39ae0) at pquery.c:160
#24 in PortalRunMulti (portal=0x1289c90, isTopLevel=true, setHoldSnapshot=true, dest=0x12daac8, altdest=0x10382a0 <donothingDR>, qc=0x7ffcc4e39ae0) at pquery.c:1277
#25 in FillPortalStore (portal=0x1289c90, isTopLevel=true) at pquery.c:1026
#26 in PortalRun (portal=0x1289c90, count=9223372036854775807, isTopLevel=true, run_once=true, dest=0x12e14c0, altdest=0x12e14c0, qc=0x7ffcc4e39d30) at pquery.c:763
#27 in exec_simple_query (query_string=0x12083b0 "INSERT INTO tbl VALUES(13,41011) on conflict(i) do update set n = tbl.n + 1 RETURNING n ") at postgres.c:1274


-------------------------

INSERT INTO tbl VALUES(13,89318) on conflict(i) do update set n = tbl.n + 1 RETURNING n

#0  in epoll_wait (epfd=5, events=0x1203328, maxevents=1, timeout=-1) at ../sysdeps/unix/sysv/linux/epoll_wait.c:30
#1  in WaitEventSetWaitBlock (set=0x12032c0, cur_timeout=-1, occurred_events=0x7ffcc4e38f60, nevents=1) at latch.c:1570
#2  in WaitEventSetWait (set=0x12032c0, timeout=-1, occurred_events=0x7ffcc4e38f60, nevents=1, wait_event_info=50331653) at latch.c:1516
#3  in WaitLatch (latch=0x7acb2a2f4dbc, wakeEvents=33, timeout=0, wait_event_info=50331653) at latch.c:538
#4  in ProcSleep (locallock=0x122f670, lockMethodTable=0x1037340 <default_lockmethod>, dontWait=false) at proc.c:1355
#5  in WaitOnLock (locallock=0x122f670, owner=0x1247408, dontWait=false) at lock.c:1833
#6  in LockAcquireExtended (locktag=0x7ffcc4e39370, lockmode=5, sessionLock=false, dontWait=false, reportMemoryError=true, locallockp=0x0) at lock.c:1046
#7  in LockAcquire (locktag=0x7ffcc4e39370, lockmode=5, sessionLock=false, dontWait=false) at lock.c:739
#8  in XactLockTableWait (xid=3633, rel=0x7acb2dba66d8, ctid=0x1240a68, oper=XLTW_InsertIndex) at lmgr.c:701
#9  in _bt_doinsert (rel=0x7acb2dba66d8, itup=0x1240a68, checkUnique=UNIQUE_CHECK_YES, indexUnchanged=false, heapRel=0x7acb2dbb0f08) at nbtinsert.c:227
#10 in btinsert (rel=0x7acb2dba66d8, values=0x7ffcc4e395c0, isnull=0x7ffcc4e395a0, ht_ctid=0x12400e8, heapRel=0x7acb2dbb0f08, checkUnique=UNIQUE_CHECK_YES, indexUnchanged=false, indexInfo=0x1240500) at nbtree.c:195
#11 in index_insert (indexRelation=0x7acb2dba66d8, values=0x7ffcc4e395c0, isnull=0x7ffcc4e395a0, heap_t_ctid=0x12400e8, heapRelation=0x7acb2dbb0f08, checkUnique=UNIQUE_CHECK_YES, indexUnchanged=false, indexInfo=0x1240500) at indexam.c:230
#12 in ExecInsertIndexTuples (resultRelInfo=0x12eaa00, slot=0x12400b8, estate=0x12ea560, update=false, noDupErr=true, specConflict=0x7ffcc4e39722, arbiterIndexes=0x12e0998, onlySummarizing=false) at execIndexing.c:438
#13 in ExecInsert (context=0x7ffcc4e39870, resultRelInfo=0x12eaa00, slot=0x12400b8, canSetTag=true, inserted_tuple=0x0, insert_destrel=0x0) at nodeModifyTable.c:1095
#14 in ExecModifyTable (pstate=0x12ea7f0) at nodeModifyTable.c:4059
#15 in ExecProcNodeFirst (node=0x12ea7f0) at execProcnode.c:464
#16 in ExecProcNode (node=0x12ea7f0) at ../../../src/include/executor/executor.h:274
#17 in ExecutePlan (estate=0x12ea560, planstate=0x12ea7f0, use_parallel_mode=false, operation=CMD_INSERT, sendTuples=true, numberTuples=0, direction=ForwardScanDirection, dest=0x12daac8, execute_once=true) at execMain.c:1646
#18 in standard_ExecutorRun (queryDesc=0x12dab58, direction=ForwardScanDirection, count=0, execute_once=true) at execMain.c:363
#19 in ExecutorRun (queryDesc=0x12dab58, direction=ForwardScanDirection, count=0, execute_once=true) at execMain.c:304
#20 in ProcessQuery (plan=0x12e1360, sourceText=0x12083b0 "INSERT INTO tbl VALUES(13,89318) on conflict(i) do update set n = tbl.n + 1 RETURNING n ", params=0x0, queryEnv=0x0, dest=0x12daac8, qc=0x7ffcc4e39ae0) at pquery.c:160
#21 in PortalRunMulti (portal=0x1289c90, isTopLevel=true, setHoldSnapshot=true, dest=0x12daac8, altdest=0x10382a0 <donothingDR>, qc=0x7ffcc4e39ae0) at pquery.c:1277
#22 in FillPortalStore (portal=0x1289c90, isTopLevel=true) at pquery.c:1026
#23 in PortalRun (portal=0x1289c90, count=9223372036854775807, isTopLevel=true, run_once=true, dest=0x12e14c0, altdest=0x12e14c0, qc=0x7ffcc4e39d30) at pquery.c:763
#24 in exec_simple_query (query_string=0x12083b0 "INSERT INTO tbl VALUES(13,89318) on conflict(i) do update set n = tbl.n + 1 RETURNING n ") at postgres.c:1274

-------------------------

Also, at that time a reindex is happening (though it is not reported as part of the deadlock). Without the reindex I am unable to reproduce the deadlock.

#0  in epoll_wait (epfd=5, events=0x1203328, maxevents=1, timeout=-1) at ../sysdeps/unix/sysv/linux/epoll_wait.c:30
#1  in WaitEventSetWaitBlock (set=0x12032c0, cur_timeout=-1, occurred_events=0x7ffcc4e38cd0, nevents=1) at latch.c:1570
#2  in WaitEventSetWait (set=0x12032c0, timeout=-1, occurred_events=0x7ffcc4e38cd0, nevents=1, wait_event_info=50331654) at latch.c:1516
#3  in WaitLatch (latch=0x7acb2a2ff0c4, wakeEvents=33, timeout=0, wait_event_info=50331654) at latch.c:538
#4  in ProcSleep (locallock=0x122f358, lockMethodTable=0x1037340 <default_lockmethod>, dontWait=false) at proc.c:1355
#5  in WaitOnLock (locallock=0x122f358, owner=0x12459f0, dontWait=false) at lock.c:1833
#6  in LockAcquireExtended (locktag=0x7ffcc4e390e0, lockmode=5, sessionLock=false, dontWait=false, reportMemoryError=true, locallockp=0x0) at lock.c:1046
#7  in LockAcquire (locktag=0x7ffcc4e390e0, lockmode=5, sessionLock=false, dontWait=false) at lock.c:739
#8  in VirtualXactLock (vxid=..., wait=true) at lock.c:4627
#9  in WaitForLockersMultiple (locktags=0x12327a8, lockmode=8, progress=true) at lmgr.c:955
#10 in ReindexRelationConcurrently (stmt=0x1208e08, relationOid=16401, params=0x7ffcc4e39528) at indexcmds.c:4154
#11 in ReindexIndex (stmt=0x1208e08, params=0x7ffcc4e39528, isTopLevel=true) at indexcmds.c:2814
#12 in ExecReindex (pstate=0x12329f0, stmt=0x1208e08, isTopLevel=true) at indexcmds.c:2743
#13 in ProcessUtilitySlow (pstate=0x12329f0, pstmt=0x1208f58, queryString=0x12083b0 "REINDEX INDEX CONCURRENTLY tbl_pkey;", context=PROCESS_UTILITY_TOPLEVEL, params=0x0, queryEnv=0x0, dest=0x1209318, qc=0x7ffcc4e39d30) at utility.c:1567
#14 in standard_ProcessUtility (pstmt=0x1208f58, queryString=0x12083b0 "REINDEX INDEX CONCURRENTLY tbl_pkey;", readOnlyTree=false, context=PROCESS_UTILITY_TOPLEVEL, params=0x0, queryEnv=0x0, dest=0x1209318, qc=0x7ffcc4e39d30) at utility.c:1067
#15 in ProcessUtility (pstmt=0x1208f58, queryString=0x12083b0 "REINDEX INDEX CONCURRENTLY tbl_pkey;", readOnlyTree=false, context=PROCESS_UTILITY_TOPLEVEL, params=0x0, queryEnv=0x0, dest=0x1209318, qc=0x7ffcc4e39d30) at utility.c:523
#16 in PortalRunUtility (portal=0x1289c90, pstmt=0x1208f58, isTopLevel=true, setHoldSnapshot=false, dest=0x1209318, qc=0x7ffcc4e39d30) at pquery.c:1158
#17 in PortalRunMulti (portal=0x1289c90, isTopLevel=true, setHoldSnapshot=false, dest=0x1209318, altdest=0x1209318, qc=0x7ffcc4e39d30) at pquery.c:1315
#18 in PortalRun (portal=0x1289c90, count=9223372036854775807, isTopLevel=true, run_once=true, dest=0x1209318, altdest=0x1209318, qc=0x7ffcc4e39d30) at pquery.c:791
#19 in exec_simple_query (query_string=0x12083b0 "REINDEX INDEX CONCURRENTLY tbl_pkey;") at postgres.c:1274


It looks like a deadlock caused by different sets of indexes being used as arbiter indexes (or by them being used in a different order).

Best regards,
Mikhail.



Re: Issues with ON CONFLICT UPDATE and REINDEX CONCURRENTLY

From: Michail Nikolaev
Hello, everyone!

Using the brand-new injection points support in specs, I created a spec to reproduce the issue.

It fails like this currently:

make -C src/test/modules/injection_points/ check

@@ -64,6 +64,7 @@
 
 step s3_s1: <... completed>
 step s2_s1: <... completed>
+ERROR:  duplicate key value violates unique constraint "tbl_pkey_ccold"
 
 starting permutation: s3_s1 s2_s1 s4_s1 s1_s1 s4_s2 s4_s3
 injection_points_attach
@@ -129,3 +130,4 @@
 
 step s3_s1: <... completed>
 step s2_s1: <... completed>
+ERROR:  duplicate key value violates unique constraint "tbl_pkey"

Best regards,
Mikhail.
Attachments

Re: Issues with ON CONFLICT UPDATE and REINDEX CONCURRENTLY

From: Міхаіл Нікалаеў

Re: Issues with ON CONFLICT UPDATE and REINDEX CONCURRENTLY

From: Mihail Nikalayeu

Re: Issues with ON CONFLICT UPDATE and REINDEX CONCURRENTLY

From: Noah Misch
On Mon, Oct 20, 2025 at 09:27:00PM +0200, Mihail Nikalayeu wrote:
> Some stabilization tests, discussed in [0].
> 
> Also, an issue known for more than 1.5 years... Should we at least document it?

Yes, I'm happy to push a patch documenting it.  Would you like to propose the
specific doc patch?  I regret lacking the bandwidth to review the fix patches.

> [0]: https://www.postgresql.org/message-id/flat/CADzfLwUc%3DjtSUEaQCtyt8zTeOJ-gHZ8%3Dw_KJsVjDOYSLqaY9Lg%40mail.gmail.com



Re: Issues with ON CONFLICT UPDATE and REINDEX CONCURRENTLY

From: Mihail Nikalayeu
Hello, Noah!

Thanks for the attention!

On Mon, Oct 27, 2025 at 7:06 PM Noah Misch <noah@leadboat.com> wrote:
> Yes, I'm happy to push a patch documenting it.  Would you like to propose the
> specific doc patch?  I regret lacking the bandwidth to review the fix patches.

Of course!

The first version is in the attachment; waiting for your comment.

Best regards,
Mikhail.

Attachments

Re: Issues with ON CONFLICT UPDATE and REINDEX CONCURRENTLY

From: Noah Misch
On Tue, Oct 28, 2025 at 02:19:00AM +0100, Mihail Nikalayeu wrote:
> On Mon, Oct 27, 2025 at 7:06 PM Noah Misch <noah@leadboat.com> wrote:
> > Yes, I'm happy to push a patch documenting it.  Would you like to propose the
> > specific doc patch?  I regret lacking the bandwidth to review the fix patches.
> 
> Of course!
> 
> First version in attachment, waiting for your comment.

Thanks.  Does "ON CONFLICT ON CONSTRAINT constraint_name" avoid the problem w/
concurrent REINDEX CONCURRENTLY?  A search of the thread found no mention of
"ON CONSTRAINT".  It seems safe to assume that clause would avoid problems w/
CREATE INDEX CONCURRENTLY, but that's less certain for REINDEX.
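
For reference, the clause in question looks like this (a sketch using
the table from the earlier tests; tbl_pkey is the primary key's
constraint name):

    INSERT INTO tbl VALUES (13, now())
    ON CONFLICT ON CONSTRAINT tbl_pkey
    DO UPDATE SET updated_at = now();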

The attached version has these changes:

- Mention ON CONSTRAINT as a workaround.  Will remove if you find or suspect
  it's not effective.
- Limit the doc change to ON CONFLICT.  I think mentioning it at the INDEX
  commands is undue emphasis.
- Use term "unique violation", a term used earlier on the same page, instead
  of "duplicate key ...".
- Remove the internals-focused point about the brief window.
- Remove some detail I considered insufficiently surprising, e.g. the point
  about "compatible with the index being built".

Attachments

Re: Issues with ON CONFLICT UPDATE and REINDEX CONCURRENTLY

From: Mihail Nikalayeu
Hello!

On Mon, Nov 3, 2025 at 12:21 AM Noah Misch <noah@leadboat.com> wrote:
> Thanks.  Does "ON CONFLICT ON CONSTRAINT constraint_name" avoid the problem w/
> concurrent REINDEX CONCURRENTLY?  A search of the thread found no mention of
> "ON CONSTRAINT".  It seems safe to assume that clause would avoid problems w/
> CREATE INDEX CONCURRENTLY, but that's less certain for REINDEX.

It is also affected. There is a special
reindex_concurrently_upsert_on_constraint spec in the patch.
And even a special commit (0004) to fix it :)

But yes, it happens only in the case of REINDEX.

I removed the mention of "ON CONSTRAINT" and added a small comment
near infer_arbiter_indexes.

The doc patch is 0001; the others are specs and fixes for the future.

Best regards,
Mikhail.

Attachments

Re: Issues with ON CONFLICT UPDATE and REINDEX CONCURRENTLY

From: Noah Misch
On Mon, Nov 03, 2025 at 09:41:00PM +0100, Mihail Nikalayeu wrote:
> On Mon, Nov 3, 2025 at 12:21 AM Noah Misch <noah@leadboat.com> wrote:
> > Thanks.  Does "ON CONFLICT ON CONSTRAINT constraint_name" avoid the problem w/
> > concurrent REINDEX CONCURRENTLY?  A search of the thread found no mention of
> > "ON CONSTRAINT".  It seems safe to assume that clause would avoid problems w/
> > CREATE INDEX CONCURRENTLY, but that's less certain for REINDEX.
> 
> It is also affected. There is a special
> reindex_concurrently_upsert_on_constraint spec in the patch.
> And even a special commit (0004) to fix it :)

My mistake.  I've redone the mbox search that I thought I had done.  That
search should have found it yesterday, so I must have made a typo then.

> I removed the mention of "ON CONSTRAINT" and added a small comment
> near infer_arbiter_indexes.
> 
> The doc patch is 0001; the others are specs and fixes for the future.

I re-flowed the new comment to the standard 78 columns and pushed 0001.
Thanks.



Re: Issues with ON CONFLICT UPDATE and REINDEX CONCURRENTLY

From: Mihail Nikalayeu
Hello, Noah!

> I re-flowed the new comment to the standard 78 columns and pushed 0001.

Thanks again!

Re: Issues with ON CONFLICT UPDATE and REINDEX CONCURRENTLY

From: Mihail Nikalayeu

Re: Issues with ON CONFLICT UPDATE and REINDEX CONCURRENTLY

From: Álvaro Herrera
On 2024-Aug-24, Michail Nikolaev wrote:

Hello, I'm working on getting the 0002 fix committed, with the tests it
fixes.  I ran across this comment:

> @@ -813,7 +814,13 @@ infer_arbiter_indexes(PlannerInfo *root)
>          idxRel = index_open(indexoid, rte->rellockmode);
>          idxForm = idxRel->rd_index;
>  
> -        if (!idxForm->indisvalid)
> +        /*
> +         * We need to consider both indisvalid and indisready indexes because
> +         * them may become indisvalid before execution phase. It is required
> +         * to keep set of indexes used as arbiter to be the same for all
> +         * concurrent transactions.
> +         */
> +        if (!idxForm->indisready)
>              goto next;
>  
>          /*

I think this comment is wrong, or at least confusing.  It says "we need
to consider both indisvalid and indisready indexes", which is somewhat
dubious: perhaps it wants to say "we need to consider indexes that have
indisvalid and indisready".  But that is also wrong: I mean, if we
wanted to consider indexes that are marked indisready=false, then we
wouldn't "next" here (which essentially ignores such indexes).  So,
really, I think what this wants to say is "we need to consider indexes
that are indisready regardless of whether or not they are indisvalid,
because they may become valid later".  

Also, I think this comment lacks an explanation of _why_ indexes with
indisvalid=false are required.  You wrote earlier[1] in the thread the
following:

> The reason is that these indexes are required for correct processing during
> the execution phase.
> If "ready" indexes are skipped as arbiters by one transaction, they may
> already have become "valid" for another concurrent transaction during its
> planning phase.
> As a result, both transactions could concurrently process the UPSERT
> command with different sets of arbiters (while using the same set of
> indexes for tuple insertion later).
> This can lead to unexpected "duplicate key value violates unique
> constraint" errors and deadlocks.

I think this text is also confusing or wrong.  I think you meant 'If
"not valid" indexes are skipped as arbiters, they may have become
"valid" for another concurrent transaction'.  The text in catalogs.sgml
for these columns is:

: indisvalid bool
:
: If true, the index is currently valid for queries. False means the
: index is possibly incomplete: it must still be modified by
: INSERT/UPDATE operations, but  it cannot safely be used for queries.
: If it is unique, the uniqueness property is not guaranteed true
: either.

: indisready bool
:
: If true, the index is currently ready for inserts. False means the
: index must be ignored by INSERT/UPDATE operations.

It makes sense that indisready indexes must be ignored, because
basically the catalog state is not ready yet.  So that part seems
correct.

The other critical point is that the uniqueness must hold, and an index
with indisvalid=false would not necessarily detect that because the tree
might not be built completely yet, so the heap tuple that would be a
duplicate of the one we're writing might not have been scanned yet.  But
that's not a problem, because we require that a valid index exists, even
if we don't "infer" that one -- which means the uniqueness clause is
being enforced by that other index.

Given all of that, I have rewritten the comment thusly:

    /*
     * Ignore indexes that aren't indisready, because we cannot trust their
     * catalog structure yet.  However, if any indexes are marked
     * indisready but not yet indisvalid, we still consider them, because
     * they might turn valid while we're running.  Doing it this way
     * allows a concurrent transaction with a slightly later catalog
     * snapshot infer the same set of indexes, which is critical to
     * prevent spurious 'duplicate key' errors.
     *
     * However, another critical aspect is that a unique index that isn't
     * yet marked indisvalid=true might not be complete yet, meaning it
     * wouldn't detect possible duplicate rows.  In order to prevent false
     * negatives, we require that we include in the set of inferred indexes
     * at least one index that is marked valid.
     */
    if (!idxForm->indisready)
        goto next;

Am I understanding this correctly?

Thanks,

[1] https://postgr.es/m/CANtu0ogv+6wqRzPK241jik4U95s1pW3MCZ3rX5ZqbFdUysz7Qw@mail.gmail.com

-- 
Álvaro Herrera               48°01'N 7°57'E  —  https://www.EnterpriseDB.com/
"Learn about compilers. Then everything looks like either a compiler or
a database, and now you have two problems but one of them is fun."
            https://twitter.com/thingskatedid/status/1456027786158776329



Re: Issues with ON CONFLICT UPDATE and REINDEX CONCURRENTLY

From: Mihail Nikalayeu
Hello, Álvaro!

> Hello, I'm working on getting the 0002 fix committed, with the tests it
> fixes.  I ran across this comment:

Nice!

> Am I understanding this correctly?
Yes, you are right. Looks like I mixed up everything while writing the comment.

I attached an updated version; the changes are:
1) your fix for the comment
2) some updates for 0001 test stability (related to [0]).

I'll recheck other patches too, maybe something needs to be adjusted also.

Best regards,
Mikhail.

[0]: https://www.postgresql.org/message-id/flat/CADzfLwUc%3DjtSUEaQCtyt8zTeOJ-gHZ8%3Dw_KJsVjDOYSLqaY9Lg%40mail.gmail.com

Attachments

Re: Issues with ON CONFLICT UPDATE and REINDEX CONCURRENTLY

From: Álvaro Herrera
Hello,

On 2025-Nov-22, Mihail Nikalayeu wrote:

> I attached an updated version; the changes are:
> 1) your fix for the comment
> 2) some updates for 0001 test stability (related to [0]).

Thank you.  I've been looking at this part of 0001 too:

> @@ -942,6 +943,8 @@ retry:
>         econtext->ecxt_scantuple = save_scantuple;
> 
>         ExecDropSingleTupleTableSlot(existing_slot);
> +       if (!conflict)
> +               INJECTION_POINT("check-exclusion-or-unique-constraint-no-conflict", NULL);

and wondering whether it would make sense to pass 'conflict' as an
argument to the injection point instead of making the call conditional
on it.  That is, do it like this:

        ExecDropSingleTupleTableSlot(existing_slot);
+       INJECTION_POINT("check-exclusion-or-unique-constraint-no-conflict", &conflict);

However, I don't see that the SQL injection point interface has the
ability to receive arguments, so I'm guessing that this is not workable?
Maybe something to consider for the future.


BTW I've split the patches to commit differently: instead of 0001 with a
bunch of failing tests and then 0002 with fixes for some of them, I
intend to push a modified 0002 with the tests in 0001 that it fixes,
then your 0003 with the tests it fixes, and so on.  (I have already cut
it this way, so you don't need to resubmit anything at this point.  But
I'll verify what changes you have in the v13-0001 compared to the one
before, to be certain I'm not missing any later changes there.)

Thanks!

-- 
Álvaro Herrera         PostgreSQL Developer  —  https://www.EnterpriseDB.com/
"How strange it is to find the words "Perl" and "saner" in such close
proximity, with no apparent sense of irony. I doubt that Larry himself
could have managed it."         (ncm, http://lwn.net/Articles/174769/)



Re: Issues with ON CONFLICT UPDATE and REINDEX CONCURRENTLY

From: Michael Paquier
On Sat, Nov 22, 2025 at 03:16:44PM +0100, Alvaro Herrera wrote:
>         ExecDropSingleTupleTableSlot(existing_slot);
> +       INJECTION_POINT("check-exclusion-or-unique-constraint-no-conflict", &conflict);
>
> However, I don't see that the SQL injection point interface has the
> ability to receive arguments, so I'm guessing that this is not workable?
> Maybe something to consider for the future.

Are you referring to something like 371f2db8b05e here?  This has been
added in 18.
--
Michael

Attachments

Re: Issues with ON CONFLICT UPDATE and REINDEX CONCURRENTLY

From: Álvaro Herrera
On 2025-Nov-23, Michael Paquier wrote:

> On Sat, Nov 22, 2025 at 03:16:44PM +0100, Alvaro Herrera wrote:
> >         ExecDropSingleTupleTableSlot(existing_slot);
> > +       INJECTION_POINT("check-exclusion-or-unique-constraint-no-conflict", &conflict);
> > 
> > However, I don't see that the SQL injection point interface has the
> > ability to receive arguments, so I'm guessing that this is not workable?
> > Maybe something to consider for the future.
> 
> Are you referring to something like 371f2db8b05e here?  This has been
> added in 18.

Yes, exactly that ... but can this be used by the SQL injection points
functionality?  The test is an isolation .spec file, and I didn't find a
way to say "make me sleep when I hit this injection point, but only if
conflict is false".  Or maybe I just missed it.

Thanks!

-- 
Álvaro Herrera        Breisgau, Deutschland  —  https://www.EnterpriseDB.com/
"Estoy de acuerdo contigo en que la verdad absoluta no existe...
El problema es que la mentira sí existe y tu estás mintiendo" (G. Lama)



Re: Issues with ON CONFLICT UPDATE and REINDEX CONCURRENTLY

From: Michael Paquier
On Sun, Nov 23, 2025 at 01:14:37PM +0100, Alvaro Herrera wrote:
> Yes, exactly that ... but can this be used by the SQL injection points
> functionality?  The test is an isolation .spec file, and I didn't find a
> way to say "make me sleep when I hit this injection point, but only if
> conflict is false".  Or maybe I just missed it.

(Sorry for the low activity, last week was a crazy conference week and
I'm still recovering.)

Reading through v13-0001, there is currently no direct way with the
existing callbacks to do as you want, which would be to push down a
conditional wait inside the callback itself, based on a run-time
stack.  There would be two ways to do that, by extending the facility:
- Simple one: addition of a new dedicated callback that accepts a
single boolean argument.
- More complicated one: extend the injection_points module so that it
is possible to pass down conditions that should be checked at run-time.
I've mentioned that in the past, but folks felt meh.

Saying that, Mihail's patch to just run the injection point only if
conflict == false is OK, and that's what I have seen most hackers do
as a matter of simplicity.  This makes the injection point footprint
in the backend slightly larger, but it's not that bad, enclosed inside
an ifdef.  You could also use a secondary point for an else branch
defined in execIndexing.c, with a different name and a different
callback attached to it if you want to take a special action for the
conflict == true case.
--
Michael

Attachments

Re: Issues with ON CONFLICT UPDATE and REINDEX CONCURRENTLY

From: Álvaro Herrera
On 2025-Nov-22, Mihail Nikalayeu wrote:

> 2) some updates for 0001 test stability (related to [0]).

This patch would bring the committed test file up to date with what you
last submitted.  However, I didn't understand what is the problem with
the original formulation, and I haven't seen the test fail ... can you
explain?


-- 
Álvaro Herrera         PostgreSQL Developer  —  https://www.EnterpriseDB.com/
"El número de instalaciones de UNIX se ha elevado a 10,
y se espera que este número aumente" (UPM, 1972)

Attachments

Re: Issues with ON CONFLICT UPDATE and REINDEX CONCURRENTLY

From: Michael Paquier
On Mon, Nov 24, 2025 at 06:49:47PM +0100, Alvaro Herrera wrote:
> This patch would bring the committed test file up to date with what you
> last submitted.  However, I didn't understand what is the problem with
> the original formulation, and I haven't seen the test fail ... can you
> explain?

Reading through bc32a12e0db2, I am puzzled by the committed result
here:
+#ifdef USE_INJECTION_POINTS
+   if (conflict)
+       INJECTION_POINT("check-exclusion-or-unique-constraint-conflict", NULL);
+   else
+       INJECTION_POINT("check-exclusion-or-unique-constraint-no-conflict", NULL);
+#endif

The "no-conflict" point is used in the isolation test, but the other
one, in the "conflict == true" path, is not:
$ git grep check-exclusion-or-unique-constraint-conflict
src/backend/executor/execIndexing.c:
INJECTION_POINT("check-exclusion-or-unique-constraint-conflict", NULL);

If there are no long-term plans for it, I'd rather remove it from the
tree than keep it.  Of course, I would keep the USE_INJECTION_POINTS
block to avoid the extra boolean check in non-USE_INJECTION_POINTS
builds.
--
Michael

Attachments

Re: Issues with ON CONFLICT UPDATE and REINDEX CONCURRENTLY

From
Mihail Nikalayeu
Date:
Hello, Álvaro!

On Mon, Nov 24, 2025 at 6:49 PM Álvaro Herrera <alvherre@kurilemu.de> wrote:
> This patch would bring the committed test file up to date with what you
> last submitted.  However, I didn't understand what the problem is with
> the original formulation, and I haven't seen the test fail ... can you
> explain?

Yes, it looks strange, but it is the best option I have found so far.
I've seen the tests fail a few times in CI due to a race.

More details are available at [0] and particularly [1].

In short, it is an attempt to make sure the test wakes up the backend
only after it has actually entered wait mode.  For that reason the
spec uses an additional 'notice' point.
I have proposed another possible solution in the [0] thread.
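
As a concrete illustration of the 'notice' point idea, this is roughly
what the spec does (a sketch only; both point names appear later in
this thread, and 'notice' and 'wait' are existing injection_points
actions):

    -- raise a NOTICE just before the wait point, so the test output
    -- shows that the backend reached it before anyone tries a wakeup
    SELECT injection_points_attach('pre-invalidate-catalog-snapshot-end', 'notice');
    SELECT injection_points_attach('invalidate-catalog-snapshot-end', 'wait');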

Best regards,
Mikhail.

[0]:
https://www.postgresql.org/message-id/flat/CADzfLwUc%3DjtSUEaQCtyt8zTeOJ-gHZ8%3Dw_KJsVjDOYSLqaY9Lg%40mail.gmail.com
[1]: https://www.postgresql.org/message-id/flat/aREW7Qo0GqjfiHn7%40paquier.xyz#fde8593a6239a594724f8badd33f7266



Re: Issues with ON CONFLICT UPDATE and REINDEX CONCURRENTLY

From
Álvaro Herrera
Date:
On 2025-Nov-24, Mihail Nikalayeu wrote:

> In short, it is an attempt to make sure the test wakes up the backend
> only after it has actually entered wait mode.  For that reason the
> spec uses an additional 'notice' point.
> I have proposed another possible solution in the [0] thread.

That makes sense.  I pushed it now, many thanks.


On 2025-Nov-25, Michael Paquier wrote:

> Reading through bc32a12e0db2, I am puzzled by the committed result
> here:
> +#ifdef USE_INJECTION_POINTS
> +   if (conflict)
> +       INJECTION_POINT("check-exclusion-or-unique-constraint-conflict", NULL);
> +   else
> +       INJECTION_POINT("check-exclusion-or-unique-constraint-no-conflict", NULL);
> +#endif

> If you have no plans for it in the long-term, I'd rather remove it
> from the tree, rather than keep it.

Removed.

-- 
Álvaro Herrera         PostgreSQL Developer  —  https://www.EnterpriseDB.com/
"We have labored long to build a heaven, only to find it
populated with horrors"                        (Prof. Milton Glass)



Re: Issues with ON CONFLICT UPDATE and REINDEX CONCURRENTLY

From
Álvaro Herrera
Date:
Hello,

We ran into one more problem with the new test, evidenced by timeouts
on buildfarm member prion.  In CATCACHE_FORCE_RELEASE builds, two of
the tests get a few invalidations of the catalog snapshot ahead of
what we expect, and because we have an injection point that sleeps
there, those tests get stuck.

Here's one possible fix.  I had to move the attach operation for
invalidate-catalog-snapshot-end into a new step of s1, instead of
having it occur in the setup block.  I understand that this is because
no step can run until the setup of all sessions completes, so if one
setup gets stuck, we're out of luck.  Then, session s4 can do a
conditional wakeup of session s1.

Patch attached.  Thoughts?

Maybe there's some other way to go about this -- for instance I
considered the idea of moving the injection point somewhere other than
InvalidateCatalogSnapshot().  I don't have any concrete ideas about
that, but I'm willing to listen if anybody has any.


The output in the two cases is different (we get more notices in the
weird build, and s1 also goes to sleep in a different order), so I had
to add an alternative expected file.  The diffs look okay to me --
essentially they say that in the normal case injection_points_attach()
returns immediately, but in the other case s1 goes to sleep and is
awakened by session s4's 'case' expression:

--- expected/index-concurrently-upsert-predicate.out    2025-11-26 19:13:58.702213673 +0100
+++ expected/index-concurrently-upsert-predicate_1.out    2025-11-26 19:04:16.745666956 +0100
@@ -1,8 +1,9 @@
 Parsed test spec with 5 sessions
 
starting permutation: s1_attach_invalidate_catalog_snapshot s4_wakeup_s1_setup s5_noop s3_start_create_index
s1_start_upsert s4_wakeup_define_index_before_set_valid s2_start_upsert s5_wakeup_s1_from_invalidate_catalog_snapshot
s4_wakeup_s2 s4_wakeup_s1
 
+s1: NOTICE:  notice triggered for injection point pre-invalidate-catalog-snapshot-end
 injection_points_attach
 -----------------------
                        
 (1 row)
 
@@ -14,18 +15,15 @@
 injection_points_attach
 -----------------------
                        
 (1 row)
 
+s1: NOTICE:  notice triggered for injection point pre-invalidate-catalog-snapshot-end
+s1: NOTICE:  notice triggered for injection point pre-invalidate-catalog-snapshot-end
 step s1_attach_invalidate_catalog_snapshot: 
     SELECT injection_points_attach('invalidate-catalog-snapshot-end', 'wait');
-
-injection_points_attach
------------------------
-                       
-(1 row)
-
+ <waiting ...>
 step s4_wakeup_s1_setup: 
     select case when
             (select pid from pg_stat_activity
                   where wait_event_type = 'InjectionPoint' and
                   wait_event = 'invalidate-catalog-snapshot-end') is not null
@@ -35,10 +33,16 @@
 case
 ----
     
 (1 row)
 
+step s1_attach_invalidate_catalog_snapshot: <... completed>
+injection_points_attach
+-----------------------
+                       
+(1 row)
+
 step s5_noop: 
  <waiting ...>
 step s3_start_create_index: 
     CREATE UNIQUE INDEX CONCURRENTLY tbl_pkey_special_duplicate ON test.tbl(abs(i)) WHERE i < 10000;
  <waiting ...>


-- 
Álvaro Herrera               48°01'N 7°57'E  —  https://www.EnterpriseDB.com/
"Los minutos y los segundos son mercadería de la ciudad, donde un infeliz se
afana por no perder ni siquiera un segundo y no advierto que obrando de ese
modo pierde una vida."              ("La vuelta de Don Camilo", G. Guareschi)

Attachments

Re: Issues with ON CONFLICT UPDATE and REINDEX CONCURRENTLY

From
Mihail Nikalayeu
Date:
Hello!

On Wed, Nov 26, 2025 at 7:34 PM Álvaro Herrera <alvherre@kurilemu.de> wrote:
> We ran into one more problem with the new test, evidenced by timeouts
> on buildfarm member prion.  In CATCACHE_FORCE_RELEASE builds, two of
> the tests get a few invalidations of the catalog snapshot ahead of
> what we expect, and because we have an injection point that sleeps
> there, those tests get stuck.

Oh, I missed that.  Not-yet-pushed tests are probably affected too.

> Here's one possible fix.  I had to move the attach operation for
> invalidate-catalog-snapshot-end into a new step of s1, instead of
> having it occur in the setup block.  I understand that this is because
> no step can run until the setup of all sessions completes, so if one
> setup gets stuck, we're out of luck.  Then, session s4 can do a
> conditional wakeup of session s1.

I tried to move the setup of invalidate-catalog-snapshot-end into
s1_start_upsert as the first command, but for some reason it didn't
work the way I expected.  Maybe I missed something.

> Patch attached.  Thoughts?

The solution seems reasonable to me; two related ideas:
* replace the "select case when" with a function like
injection_points_wakeup_if_waiting, to avoid a possible race between
the select and the wakeup (though AFAIK that race is not possible in
the current case)
* introduce an injection_points function to switch between an "ignore
all runs, but attach/detach still allowed" mode and the normal mode.
The first command of a setup block would enter that "setup mode", and
the last one would switch back to normal.

> Maybe there's some other way to go about this -- for instance I
> considered the idea of moving the injection point somewhere other
> than InvalidateCatalogSnapshot().  I don't have any concrete ideas
> about that, but I'm willing to listen if anybody has any.

AFAIU it is the only place.

Best regards,
Mikhail.



Re: Issues with ON CONFLICT UPDATE and REINDEX CONCURRENTLY

From
Álvaro Herrera
Date:
Hi,

On 2025-Nov-27, Mihail Nikalayeu wrote:

> On Wed, Nov 26, 2025 at 7:34 PM Álvaro Herrera <alvherre@kurilemu.de> wrote:
> > We ran into one more problem with the new test, evidenced by timeouts by
> > buildfarm member prion.  For CATCACHE_FORCE_RELEASE builds on two of the
> > tests, we get a few invalidations of the catalog snapshot ahead of what
> > we expect, and because we have an injection point to sleep there, those
> > tests get stuck.
> 
> Oh, I missed that.  Not-yet-pushed tests are probably affected too.

Yeah, I suspect as much.

> > Here's one possible fix.
> 
> I tried to move the setup of invalidate-catalog-snapshot-end into
> s1_start_upsert as the first command, but for some reason it didn't
> work the way I expected.  Maybe I missed something.

Right, this is why I said that this is one possible fix.  I mean, maybe
there are other ways to fix it.  I'm not sure it's the simplest or the
most robust, but I don't want to spend too much time looking for other
ways either.

> The solution seems reasonable to me; two related ideas:
> * replace the "select case when" with a function like
> injection_points_wakeup_if_waiting, to avoid a possible race between
> the select and the wakeup (though AFAIK that race is not possible in
> the current case)
> * introduce an injection_points function to switch between an "ignore
> all runs, but attach/detach still allowed" mode and the normal mode.
> The first command of a setup block would enter that "setup mode", and
> the last one would switch back to normal.

Ah, I had thought about the first one of these ideas, but not the second
one.  I noted both in the commit message, in case somebody is motivated
to implement them.

Thanks for reviewing.  I have pushed it now.  Looking at the next one.

-- 
Álvaro Herrera         PostgreSQL Developer  —  https://www.EnterpriseDB.com/
"Having your biases confirmed independently is how scientific progress is
made, and hence made our great society what it is today" (Mary Gardiner)



Re: Issues with ON CONFLICT UPDATE and REINDEX CONCURRENTLY

From
Álvaro Herrera
Date:
Hello,

Here's a slightly different approach to the fix proposed in your 0003.
I wasn't happy with the idea of opening all indexes twice in
infer_arbiter_indexes(), so I instead made it collect the Relations of
all those indexes in an initial loop, process them in the two places
that wanted them, and close them all together at the end.  I think
this also makes the code clearer.  We no longer need the "next" goto
label to close the index at the bottom of the loop; instead we can
just do "continue" cleanly.

I also rewrote some comments.  I may not have done all the edits I
wanted, since I ran out of time today, but I think this is in pretty
good shape.

I tried under CATCACHE_FORCE_RELEASE and saw no problems.


-- 
Álvaro Herrera               48°01'N 7°57'E  —  https://www.EnterpriseDB.com/

Attachments

Re: Issues with ON CONFLICT UPDATE and REINDEX CONCURRENTLY

From
Hannu Krosing
Date:
Please also check my (very raw) proposal in the thread "Revisiting
{CREATE INDEX, REINDEX} CONCURRENTLY improvements"

TL;DR - use logical decoding for adding index entries for tuples added
during CIC.

Maybe this also makes the issue with ON CONFLICT UPDATE and REINDEX
CONCURRENTLY go away.

On Fri, Nov 28, 2025 at 6:30 PM Álvaro Herrera <alvherre@kurilemu.de> wrote:
>
> Hello,
>
> Here's a slightly different approach to the fix proposed in your 0003.
> I wasn't happy with the idea of opening all indexes twice in
> infer_arbiter_indexes(), so I instead made it collect the Relations of
> all those indexes in an initial loop, process them in the two places
> that wanted them, and close them all together at the end.  I think
> this also makes the code clearer.  We no longer need the "next" goto
> label to close the index at the bottom of the loop; instead we can
> just do "continue" cleanly.
>
> I also rewrote some comments.  I may not have done all the edits I
> wanted, since I ran out of time today, but I think this is in pretty
> good shape.
>
> I tried under CATCACHE_FORCE_RELEASE and saw no problems.
>
>
> --
> Álvaro Herrera               48°01'N 7°57'E  —  https://www.EnterpriseDB.com/



Re: Issues with ON CONFLICT UPDATE and REINDEX CONCURRENTLY

From
Mihail Nikalayeu
Date:
Hello!

On Fri, Nov 28, 2025 at 6:30 PM Álvaro Herrera <alvherre@kurilemu.de> wrote:
> I wasn't happy with the idea of opening all indexes twice in
> infer_arbiter_indexes(), so I instead made it collect the Relations of
> all those indexes in an initial loop, process them in the two places
> that wanted them, and close them all together at the end.  I think
> this also makes the code clearer.  We no longer need the "next" goto
> label to close the index at the bottom of the loop; instead we can
> just do "continue" cleanly.

Yes, agreed - that looks better than my version.

A few remarks:

> Second, if an attribute list was specified in the ON
> * CONFLICT clause, we use the list to find the indexes whose attributes
> * match that list.
I think we may mention expressions and predicates here as well.

> /*
> * Find the named constraint index to extract its attributes and
> * predicates.
> */
> foreach_ptr(RelationData, idxRel, indexRelList)

Should we consider an assert to ensure we have actually found something?

Best regards,
Mikhail.



Re: Issues with ON CONFLICT UPDATE and REINDEX CONCURRENTLY

From
Mihail Nikalayeu
Date:
Hi!

On Fri, Nov 28, 2025 at 7:09 PM Hannu Krosing <hannuk@google.com> wrote:
> Please also check my (very raw) proposal in the thread "Revisiting
> {CREATE INDEX, REINDEX} CONCURRENTLY improvements"
>
> TL;DR - use logical decoding for adding index entries for tuples added
> during CIC.

No, the current issue has a different cause; it is more about
switching from the old index to the new one.

Best regards,
Mikhail.



Re: Issues with ON CONFLICT UPDATE and REINDEX CONCURRENTLY

From
Mihail Nikalayeu
Date:
Also, I think it would be good to mention the changes in
parse_clause.c in the commit message.



Re: Issues with ON CONFLICT UPDATE and REINDEX CONCURRENTLY

From
Álvaro Herrera
Date:
On 2025-11-28, Mihail Nikalayeu wrote:
> Also, I think it would be good to mention the changes in
> parse_clause.c in the commit message.

Oh rats, no, thanks for noticing.  That's a mostly unrelated change
that I intend to commit separately, mentioned in another thread, which
I happened to merge here accidentally.

--
Álvaro Herrera



Re: Issues with ON CONFLICT UPDATE and REINDEX CONCURRENTLY

From
Álvaro Herrera
Date:
Hi Mihail,

Looking at 0004, I think IsIndexCompatibleAsArbiter() should map the
attribute numbers, in case the partition has a different column layout
than the parent (e.g. in case there are dropped columns or just
different column orders).

Regards

-- 
Álvaro Herrera               48°01'N 7°57'E  —  https://www.EnterpriseDB.com/
"We have labored long to build a heaven, only to find it
populated with horrors"                        (Prof. Milton Glass)



Re: Issues with ON CONFLICT UPDATE and REINDEX CONCURRENTLY

From
Mihail Nikalayeu
Date:
Hello!

On Sat, Nov 29, 2025 at 6:48 PM Álvaro Herrera <alvherre@kurilemu.de> wrote:
> Looking at 0004, I think IsIndexCompatibleAsArbiter() should map the
> attribute numbers, in case the partition has a different column layout
> than the parent (e.g. in case there are dropped columns or just
> different column orders)

Oh, yes, you're right. I'll prepare an updated version (also it looks
like some inner loops could be refactored more efficiently).

Best regards,
Mikhail.



Re: Issues with ON CONFLICT UPDATE and REINDEX CONCURRENTLY

From
Mihail Nikalayeu
Date:
Hello, Álvaro!

On Sun, Nov 30, 2025 at 1:11 PM Mihail Nikalayeu
<mihailnikalayeu@gmail.com> wrote:
> On Sat, Nov 29, 2025 at 6:48 PM Álvaro Herrera <alvherre@kurilemu.de> wrote:
> > Looking at 0004, I think IsIndexCompatibleAsArbiter() should map the
> > attribute numbers, in case the partition has a different column layout
> > than the parent (e.g. in case there are dropped columns or just
> > different column orders)
>
> Oh, yes, you're right. I'll prepare an updated version (also it looks
> like some inner loops could be refactored more efficiently).

I was wrong: the function is called only for indexes of the same
relation (the actual partition).
But I reworked the commit anyway - it is much clearer now (and a
little more efficient).

Best regards,
Mikhail.

Attachments

Re: Issues with ON CONFLICT UPDATE and REINDEX CONCURRENTLY

From
Alexander Lakhin
Date:
Hello Alvaro,

27.11.2025 14:32, Álvaro Herrera wrote:
> Thanks for reviewing.  I have pushed it now.  Looking at the next one.

I've noticed two failures from skink that you may find interesting:
[1], [2].  The difference from [2]:
ok 3         - syscache-update-pruned                  94198 ms
not ok 4     - index-concurrently-upsert               14008 ms
ok 5         - reindex-concurrently-upsert             14379 ms

--- /home/bf/bf-build/skink-master/HEAD/pgsql/src/test/modules/injection_points/expected/index-concurrently-upsert.out	2025-11-27 13:38:19.513528475 +0100
+++ /home/bf/bf-build/skink-master/HEAD/pgsql.build/testrun/injection_points/isolation/results/index-concurrently-upsert.out	2025-11-30 00:10:01.697938769 +0100
@@ -107,6 +107,7 @@
  (1 row)

  s1: NOTICE:  notice triggered for injection point pre-invalidate-catalog-snapshot-end
+s1: NOTICE:  notice triggered for injection point pre-invalidate-catalog-snapshot-end
  step s1_start_upsert: <... completed>
  step s2_start_upsert: <... completed>
  step s3_start_create_index: <... completed>

I've managed to reproduce something similar to this diff when running
multiple test instances under Valgrind with GNU parallel.  With the
attached patch applied:

for i in {1..40}; do cp -r src/test/modules/injection_points/ src/test/modules/injection_points_$i/; sed -i.bak "s|src/test/modules/injection_points|src/test/modules/injection_points_$i|" src/test/modules/injection_points_$i/Makefile; done
make -s check -C src/test/modules/injection_points

and /tmp/extra.config containing:
log_min_messages = INFO
backtrace_functions = 'injection_notice'

for i in {1..10}; do np=5; echo "ITERATION $i"; parallel -j40 --linebuffer --tag PROVE_TESTS="t/099*" NO_TEMP_INSTALL=1 TEMP_CONFIG=/tmp/extra.config make -s check -C src/test/modules/injection_points_{} ::: `seq $np` || break; done

gives me:
ITERATION 1
...
4       #   Failed test 'regression tests pass'
4       #   at t/099_isolation_regress.pl line 52.
4       #          got: '256'
4       #     expected: '0'

src/test/modules/injection_points_4/tmp_check/log/regress_log_099_isolation_regress contains:
...
ok 11        - index-concurrently-upsert                5282 ms
not ok 12    - index-concurrently-upsert                6347 ms
ok 13        - index-concurrently-upsert                5723 ms
...
--- .../src/test/modules/injection_points_4/expected/index-concurrently-upsert.out	2025-11-30 14:24:29.385133831 +0200
+++ .../src/test/modules/injection_points_4/tmp_check/results/index-concurrently-upsert.out	2025-11-30 16:22:29.168920744 +0200
@@ -78,6 +78,7 @@

  (1 row)

+s1: NOTICE:  notice triggered for injection point pre-invalidate-catalog-snapshot-end
  step s4_wakeup_s2:
     SELECT injection_points_detach('exec-insert-before-insert-speculative');
     SELECT injection_points_wakeup('exec-insert-before-insert-speculative');
=== EOF ===

I see the following stack trace for this extra notice:
2025-11-30 16:22:28.465 EET|law|isolation_regression|692c531f.bb652|NOTICE:  notice triggered for injection point pre-invalidate-catalog-snapshot-end
2025-11-30 16:22:28.465 EET|law|isolation_regression|692c531f.bb652|BACKTRACE:
injection_notice at injection_points.c:278:3
InjectionPointRun at injection_point.c:555:1
InvalidateCatalogSnapshot at snapmgr.c:463:3
LocalExecuteInvalidationMessage at inval.c:831:4
ReceiveSharedInvalidMessages at sinval.c:113:18
AcceptInvalidationMessages at inval.c:970:23
LockRelationOid at lmgr.c:137:3
relation_open at relation.c:58:6
table_open at table.c:44:6
SearchCatCacheMiss at catcache.c:1550:13
SearchCatCacheInternal at catcache.c:1495:9
SearchCatCache1 at catcache.c:1368:1
SearchSysCache1 at syscache.c:227:1
build_coercion_expression at parse_coerce.c:852:8
coerce_type at parse_coerce.c:433:13
coerce_to_target_type at parse_coerce.c:105:11
transformAssignedExpr at parse_target.c:580:4
transformInsertRow at analyze.c:1121:10
transformInsertStmt at analyze.c:988:14
transformStmt at analyze.c:364:13
transformOptionalSelectInto at analyze.c:317:1
transformTopLevelStmt at analyze.c:266:11
parse_analyze_fixedparams at analyze.c:134:10
pg_analyze_and_rewrite_fixedparams at postgres.c:687:10
exec_simple_query at postgres.c:1195:20
PostgresMain at postgres.c:4777:6
BackendInitialize at backend_startup.c:142:1
postmaster_child_launch at launch_backend.c:269:3
BackendStartup at postmaster.c:3598:8
ServerLoop at postmaster.c:1716:10
PostmasterMain at postmaster.c:1403:11
main at main.c:236:2

I also observed:
2025-11-30 15:33:45.746 EET|law|isolation_regression|692c47b4.675e7|NOTICE:  notice triggered for injection point pre-invalidate-catalog-snapshot-end
2025-11-30 15:33:45.746 EET|law|isolation_regression|692c47b4.675e7|BACKTRACE:
injection_notice at injection_points.c:278:3
InjectionPointRun at injection_point.c:555:1
InvalidateCatalogSnapshot at snapmgr.c:463:3
LocalExecuteInvalidationMessage at inval.c:831:4
ReceiveSharedInvalidMessages at sinval.c:113:18
AcceptInvalidationMessages at inval.c:970:23
LockRelationOid at lmgr.c:137:3
relation_open at relation.c:58:6
table_open at table.c:44:6
CatalogCacheInitializeCache at catcache.c:1131:13
ConditionalCatalogCacheInitializeCache at catcache.c:1092:1
SearchCatCacheInternal at catcache.c:1424:15
SearchCatCache1 at catcache.c:1368:1
SearchSysCache1 at syscache.c:227:1
check_enable_rls at rls.c:66:10
get_row_security_policies at rowsecurity.c:130:15
fireRIRrules at rewriteHandler.c:2218:3
QueryRewrite at rewriteHandler.c:4581:11
pg_rewrite_query at postgres.c:822:20
pg_analyze_and_rewrite_fixedparams at postgres.c:696:19
exec_simple_query at postgres.c:1195:20
PostgresMain at postgres.c:4777:6
BackendInitialize at backend_startup.c:142:1
postmaster_child_launch at launch_backend.c:269:3
BackendStartup at postmaster.c:3598:8
ServerLoop at postmaster.c:1716:10
PostmasterMain at postmaster.c:1403:11
main at main.c:236:2

Could you please look into whether this can be fixed?

[1] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=skink&dt=2025-11-25%2021%3A55%3A00
[2] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=skink&dt=2025-11-29%2021%3A51%3A13

Best regards,
Alexander
Attachments

Re: Issues with ON CONFLICT UPDATE and REINDEX CONCURRENTLY

From
Mihail Nikalayeu
Date:
Hello!

On Sun, Nov 30, 2025 at 6:00 PM Alexander Lakhin <exclusion@gmail.com> wrote:
>   s1: NOTICE:  notice triggered for injection point pre-invalidate-catalog-snapshot-end
> +s1: NOTICE:  notice triggered for injection point pre-invalidate-catalog-snapshot-end
>   step s1_start_upsert: <... completed>
>   step s2_start_upsert: <... completed>
>   step s3_start_create_index: <... completed>

Oh, I'm afraid it may become an endless fight...

I think it is better to implement something in isolationtester to
natively support the case from [0] (in that case we wouldn't need the
NOTICE on pre-invalidate-catalog-snapshot-end).

Best regards,
Mikhail.

[0]:
https://www.postgresql.org/message-id/flat/CADzfLwUc%3DjtSUEaQCtyt8zTeOJ-gHZ8%3Dw_KJsVjDOYSLqaY9Lg%40mail.gmail.com



Re: Issues with ON CONFLICT UPDATE and REINDEX CONCURRENTLY

From
Mihail Nikalayeu
Date:
Hello!

On Sun, Nov 30, 2025 at 6:26 PM Mihail Nikalayeu
<mihailnikalayeu@gmail.com> wrote:
> I think it is better to implement something in isolationtester to
> natively support the case from [0] (in such case we don't need NOTICE
> on .pre-invalidate-catalog-snapshot-end).

I think I have implemented a better solution - without any additional NOTICE.
It just actually waits for the other backend to hang on the injection point.

Attached (together with other changes).

Attachments

Re: Issues with ON CONFLICT UPDATE and REINDEX CONCURRENTLY

From
Álvaro Herrera
Date:
On 2025-Dec-01, Mihail Nikalayeu wrote:

> From af5e27d4150dd53d313122c02da7ce4d3c07f332 Mon Sep 17 00:00:00 2001
> From: Mikhail Nikalayeu <mihailnikalayeu@gmail.com>
> Date: Sun, 30 Nov 2025 16:48:55 +0100
> Subject: [PATCH v16 1/3] Modify the ExecInitPartitionInfo function to consider
>   partitioned indexes that are potentially processed by REINDEX CONCURRENTLY
>  as  arbiters as well.
> 
> This is necessary to ensure that all concurrent transactions use the same set of arbiter indexes.

Thanks, pushed this one after some more editorialization.  I rewrote
most comments and renamed variables, but I also changed the code
somewhat.  For instance I made it use the ResultRelInfo's array of
indexes instead of doing RelationGetIndexList, because it seemed to
match better with the usage of the list index to match the indexes in
the array again later.  Maybe now would be a good time to dust off your
stress tests and verify that everything is still working as intended.

In doing these changes, I realized that there are several places in the
code (this one, but also others) that are using
get_partition_ancestors(), which I think might be a somewhat expensive
routine.  It obtains ancestors by recursing up the hierarchy, and at
each step does an indexscan on pg_inherits.  I bet this is not very nice
for performance.  I bet we can make this visible in a profile with an
inheritance hierarchy not terribly deep and a few hundred partitions.
We currently don't have a syscache on pg_inherits, as far as I
understand because back then we didn't think it was going to give us any
advantages, but maybe it would be useful for this and other operations.
If not a syscache, then maybe a different way to cache ancestors for a
relation, perhaps straight in the relcache entry.

Thanks for working on this,

-- 
Álvaro Herrera         PostgreSQL Developer  —  https://www.EnterpriseDB.com/
"Before you were born your parents weren't as boring as they are now. They
got that way paying your bills, cleaning up your room and listening to you
tell them how idealistic you are."  -- Charles J. Sykes' advice to teenagers



Re: Issues with ON CONFLICT UPDATE and REINDEX CONCURRENTLY

From
Álvaro Herrera
Date:
On 2025-Dec-01, Mihail Nikalayeu wrote:

> I think I have implemented a better solution - without any additional NOTICE.
> It just actually waits for the other backend to hang on the injection point.

Pushed.  I changed it to be a loop in a DO block rather than a procedure
recursively calling itself, though, which seemed strangely complicated.
Should be pretty much the same, I hope.
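
For reference, here is a minimal sketch of such a DO-block wait loop,
assuming the waiter is identified through pg_stat_activity (the
injection point name is just an example; the committed code may differ
in details):

    DO $$
    BEGIN
        LOOP
            -- done once some backend is blocked on the injection point
            PERFORM 1 FROM pg_stat_activity
              WHERE wait_event_type = 'InjectionPoint'
                AND wait_event = 'invalidate-catalog-snapshot-end';
            EXIT WHEN FOUND;
            PERFORM pg_sleep(0.01);
        END LOOP;
    END;
    $$;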

-- 
Álvaro Herrera        Breisgau, Deutschland  —  https://www.EnterpriseDB.com/
"Puedes vivir sólo una vez, pero si lo haces bien, una vez es suficiente"



Re: Issues with ON CONFLICT UPDATE and REINDEX CONCURRENTLY

From
Álvaro Herrera
Date:
> From 5dc3e4eb50e445a291a13663fc9ce93d0db96b1c Mon Sep 17 00:00:00 2001
> From: Mikhail Nikalayeu <mihailnikalayeu@gmail.com>
> Date: Sun, 30 Nov 2025 16:49:20 +0100
> Subject: [PATCH v16 2/3] Revert "Doc: cover index CONCURRENTLY causing errors
>  in INSERT ... ON CONFLICT."
> 
> This reverts commit 8b18ed6dfbb8b3e4483801b513fea6b429140569.

Pushed this one also, and marked the commitfest item as committed.

Thanks!

-- 
Álvaro Herrera         PostgreSQL Developer  —  https://www.EnterpriseDB.com/



Re: Issues with ON CONFLICT UPDATE and REINDEX CONCURRENTLY

From
Mihail Nikalayeu
Date:
Hello, Álvaro!

> Maybe now would be a good time to dust off your
> stress tests and verify that everything is still working as intended.
Done; they pass without any issues.

> I bet we can make this visible in a profile with an
> inheritance hierarchy not terribly deep and a few hundred partitions.
I'll put it on the TODO list for some day.

> Pushed this one also, and marked the commitfest item as committed.
Thanks for pushing/reviewing/rewriting it all!

Best regards,
Mikhail.



Re: Issues with ON CONFLICT UPDATE and REINDEX CONCURRENTLY

From
Tom Lane
Date:
Álvaro Herrera <alvherre@kurilemu.de> writes:
> Pushed this one also, and marked the commitfest item as committed.

BF member prion has been failing since 5dee7a603 went in.  Apparently
that solution doesn't work under -DRELCACHE_FORCE_RELEASE and/or
-DCATCACHE_FORCE_RELEASE (so I'm expecting the CLOBBER_CACHE_ALWAYS
animals to fail too, whenever they next report).

            regards, tom lane



Re: Issues with ON CONFLICT UPDATE and REINDEX CONCURRENTLY

From
Mihail Nikalayeu
Date:
Hi, Tom!

On Wed, Dec 3, 2025 at 5:43 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

> BF member prion has been failing since 5dee7a603 went in.  Apparently
> that solution doesn't work under -DRELCACHE_FORCE_RELEASE and/or
> -DCATCACHE_FORCE_RELEASE (so I'm expecting the CLOBBER_CACHE_ALWAYS
> animals to fail too, whenever they next report).

Oh, my bad, sorry.
Attached is a patch with an output variant for the
RELCACHE_FORCE_RELEASE/CATCACHE_FORCE_RELEASE case.

But CLOBBER_CACHE_ALWAYS is just a mess - too many different kinds of
wakeup loops would be needed.

So, my proposals:
* drop that test - currently it looks like a test that will fail 99%
of the time due to relcache changes rather than an actual issue
* add some mechanism to skip that test for particular build options
* replace it with some kind of small stress test, without any
injection points

Best regards,
Mikhail.

Attachments

Re: Issues with ON CONFLICT UPDATE and REINDEX CONCURRENTLY

From
Álvaro Herrera
Date:
On 2025-Dec-03, Mihail Nikalayeu wrote:

> Oh, my bad, sorry.
> Attached is a patch with an output variant for the
> RELCACHE_FORCE_RELEASE/CATCACHE_FORCE_RELEASE case.

Thanks.

> But CLOBBER_CACHE_ALWAYS is just a mess - too many different kinds of
> wakeup loops would be needed.

Hmm, maybe we can do "SET debug_discard_caches=0" in the setup block of
the sessions that need it.

-- 
Álvaro Herrera         PostgreSQL Developer  —  https://www.EnterpriseDB.com/
"La victoria es para quien se atreve a estar solo"



Re: Issues with ON CONFLICT UPDATE and REINDEX CONCURRENTLY

From
Álvaro Herrera
Date:
On 2025-Dec-03, Mihail Nikalayeu wrote:

> Oh, my bad, sorry.
> Attached is a patch with an output variant for the
> RELCACHE_FORCE_RELEASE/CATCACHE_FORCE_RELEASE case.

I have pushed this, thanks.

> But CLOBBER_CACHE_ALWAYS is just a mess - too many different kinds of
> wakeup loops would be needed.

I ran the tests under CLOBBER_CACHE_ALWAYS, and as far as I can tell,
they succeed.  We'll see what the CLOBBER_CACHE_ALWAYS members say ...
assuming the other problems [1] can be fixed.

[1] https://postgr.es/m/baf1ae02-83bd-4f5d-872a-1d04f11a9073@vondra.me


-- 
Álvaro Herrera         PostgreSQL Developer  —  https://www.EnterpriseDB.com/
"If you want to have good ideas, you must have many ideas.  Most of them
will be wrong, and what you have to learn is which ones to throw away."
                                                         (Linus Pauling)



Re: Issues with ON CONFLICT UPDATE and REINDEX CONCURRENTLY

From
Álvaro Herrera
Date:
The "partitioned" test also failed recently on flaviventris and adder:
 https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=flaviventris&dt=2025-12-04%2017%3A28%3A20
 https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=adder&dt=2025-12-03%2009%3A23%3A08
 https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=adder&dt=2025-12-02%2008%3A48%3A24

diff -U3 /home/bf/bf-build/flaviventris/HEAD/pgsql/src/test/modules/injection_points/expected/reindex-concurrently-upsert-partitioned.out /home/bf/bf-build/flaviventris/HEAD/pgsql.build/testrun/injection_points/isolation/results/reindex-concurrently-upsert-partitioned.out
--- /home/bf/bf-build/flaviventris/HEAD/pgsql/src/test/modules/injection_points/expected/reindex-concurrently-upsert-partitioned.out	2025-12-02 14:28:16.233200773 +0100
+++ /home/bf/bf-build/flaviventris/HEAD/pgsql.build/testrun/injection_points/isolation/results/reindex-concurrently-upsert-partitioned.out	2025-12-04 18:31:29.340033507 +0100
@@ -61,7 +61,6 @@
                        
 (1 row)
 
-step s1_start_upsert: <... completed>
 step s4_wakeup_s2: 
     SELECT injection_points_detach('exec-insert-before-insert-speculative');
     SELECT injection_points_wakeup('exec-insert-before-insert-speculative');
@@ -76,6 +75,7 @@
                        
 (1 row)
 
+step s1_start_upsert: <... completed>
 step s2_start_upsert: <... completed>
 step s3_start_reindex: <... completed>


-- 
Álvaro Herrera        Breisgau, Deutschland  —  https://www.EnterpriseDB.com/
[…] indem ich in meinem Leben oft an euch gedacht, euch glücklich zu machen. Seyd es!
A menudo he pensado en vosotros, en haceros felices. ¡Sedlo, pues!
        Heiligenstädter Testament, L. v. Beethoven, 1802
        https://de.wikisource.org/wiki/Heiligenstädter_Testament



Re: Issues with ON CONFLICT UPDATE and REINDEX CONCURRENTLY

From
Mihail Nikalayeu
Date:
Hello!

Seems like it can be easily fixed (see attached patch).
Btw, is it possible to somehow run the whole buildfarm over some branch?
That way it would be possible to fix such issues much earlier (some of
them are caught by the GitHub CI, but not all).

Best regards,
Mikhail.

Attachments

Re: Issues with ON CONFLICT UPDATE and REINDEX CONCURRENTLY

From
Álvaro Herrera
Date:
On 2025-Dec-04, Mihail Nikalayeu wrote:

> Hello!
> 
> Seems like it can be easily fixed (see attached patch).

Makes sense -- thanks, pushed.

> Btw, is it possible to somehow run the whole buildfarm over some branch?
> That way it would be possible to fix such issues much earlier (some of
> them are caught by the GitHub CI, but not all).

Nope.

-- 
Álvaro Herrera        Breisgau, Deutschland  —  https://www.EnterpriseDB.com/



Re: Issues with ON CONFLICT UPDATE and REINDEX CONCURRENTLY

From
Alexander Lakhin
Date:
Hello Álvaro and Mihail,

02.12.2025 15:07, Álvaro Herrera wrote:
Thanks, pushed this one after some more editorialization.

I've discovered that despite removing FIXME (in 90eae926a), the error
"invalid arbiter index list" can still be triggered with:
CREATE TABLE pt (a int PRIMARY KEY) PARTITION BY RANGE (a);
CREATE TABLE p1 PARTITION OF pt FOR VALUES FROM (1) to (2) PARTITION BY RANGE (a);
CREATE TABLE p1_1 PARTITION OF p1 FOR VALUES FROM (1) TO (2);
CREATE UNIQUE INDEX ON ONLY p1 (a);
INSERT INTO p1 VALUES (1) ON CONFLICT (a) DO NOTHING;
ERROR:  XX000: invalid arbiter index list
LOCATION:  ExecInitPartitionInfo, execPartition.c:863

The first commit this script produces the error on is bc32a12e0.

Best regards,
Alexander

Re: Issues with ON CONFLICT UPDATE and REINDEX CONCURRENTLY

From
Mihail Nikalayeu
Date:
Hello, Alexander!

On Sat, Dec 6, 2025 at 7:00 AM Alexander Lakhin <exclusion@gmail.com> wrote:
> I've discovered that despite removing FIXME (in 90eae926a), the error
> "invalid arbiter index list" can still be triggered with:

Wow, thanks for finding this.

> The first commit this script produces the error on is bc32a12e0.

Such commands add an indisready, but not indisvalid, index on pt,
which is added to the list of potential arbiters.
It happens because of [0].
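
To see the flag combination in question after running that script, one
can inspect pg_index (a sketch; p1 is the table from the reproducer
above, and the ONLY-created index shows up as not indisvalid):

    SELECT c.relname AS index_name, i.indisvalid, i.indisready
    FROM pg_index i
    JOIN pg_class c ON c.oid = i.indexrelid
    WHERE i.indrelid = 'p1'::regclass;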

From my perspective, the correct solution is to just remove the error
message, because it looks obsolete now.  Or to somehow calculate the
compensation offset differently - but I am not sure that is a good
idea.

Let's wait for Álvaro's decision - I'll prepare the fix and a
regression test then.

Best regards,
Mikhail.

[0]:
https://github.com/postgres/postgres/blob/bc32a12e0db2df203a9cb2315461578e08568b9c/src/backend/commands/indexcmds.c#L1225-L1235



Re: Issues with ON CONFLICT UPDATE and REINDEX CONCURRENTLY

From
Álvaro Herrera
Date:
On 2025-Dec-06, Mihail Nikalayeu wrote:

> Such commands add an indisready, but not indisvalid, index on pt,
> which is added to the list of potential arbiters.
> It happens because of [0].
> 
> From my perspective, the correct solution is to just remove the error
> message, because it looks obsolete now.  Or to somehow calculate the
> compensation offset differently - but I am not sure that is a good
> idea.

Hmm, as I recall it's quite intentional that the index on a partitioned
table is marked !indisvalid; such indexes are only supposed to be marked
valid once indexes on all partitions have been attached.  As I recall,
if you remove that prohibition, some pg_dump scenarios fail.

I'd rather consider the idea of avoiding indexes marked !indisvalid on
partitioned tables as arbiter lists ... but then we need to verify the
scenario where there is one, and INSERT ON CONFLICT runs concurrently
with ALTER INDEX ATTACH PARTITION for the last partition lacking the
index (which is the point where the index is marked indisvalid on the
partitioned table).  There may not be a problem with that, because we
grab AccessExclusiveLock on the index partition, so no query can be
running concurrently ... unless the INSERT is targeting a partition
other than the one where the index is being attached.  (On the
partitioned table and index, we only have ShareUpdateExclusiveLock).

-- 
Álvaro Herrera        Breisgau, Deutschland  —  https://www.EnterpriseDB.com/
"Crear es tan difícil como ser libre" (Elsa Triolet)



Re: Issues with ON CONFLICT UPDATE and REINDEX CONCURRENTLY

From
Mihail Nikalayeu
Date:
Hello!

On Sun, Dec 7, 2025 at 10:07 PM Álvaro Herrera <alvherre@kurilemu.de> wrote:
> I'd rather consider the idea of avoiding indexes marked !indisvalid on
> partitioned tables as arbiter lists ... but then we need to verify the
> scenario where there is one, and INSERT ON CONFLICT runs concurrently
> with ALTER INDEX ATTACH PARTITION for the last partition lacking the
> index (which is the point where the index is marked indisvalid on the
> partitioned table).  There may not be a problem with that, because we
> grab AccessExclusiveLock on the index partition, so no query can be
> running concurrently ... unless the INSERT is targeting a partition
> other than the one where the index is being attached.  (On the
> partitioned table and index, we only have ShareUpdateExclusiveLock).

For my taste, it feels too complicated for such a case.

What about changing the logic of this check to the following:
for each valid index used as an arbiter on the partitioned table, we
need a valid counterpart in the particular partition (but it is okay
to also have additional ready-only ones).

If one of the arbiters is invalid on the partitioned table (while we
still have a valid compatible one in any case), that is okay.  We just
have to have an appropriate valid "companion" for every valid arbiter.

Such a check looks correct to me, at least at the very end of the
weekend.

Thanks,
Mikhail.



Re: Issues with ON CONFLICT UPDATE and REINDEX CONCURRENTLY

From
Michael Paquier
Date:
On Sun, Dec 07, 2025 at 10:07:45PM +0100, Alvaro Herrera wrote:
> Hmm, as I recall it's quite intentional that the index on a partitioned
> table is marked !indisvalid; such indexes are only supposed to be marked
> valid once indexes on all partitions have been attached.  As I recall,
> if you remove that prohibition, some pg_dump scenarios fail.

Right.  If indisvalid is true on a partitioned table, then we are sure
that all its partitions have valid indexes.  If indisvalid is false,
some of its partitions may have an invalid index.  In the false case,
things can be a bit lossy as well.  For example, an ALTER TABLE ONLY
could switch a partition's indisvalid to true, without switching the
indisvalid of its partitioned table to true.
--
Michael

Attachments

Re: Issues with ON CONFLICT UPDATE and REINDEX CONCURRENTLY

From
Álvaro Herrera
Date:
On 2025-Dec-06, Mihail Nikalayeu wrote:

> From my perspective, the correct solution - is to just remove the
> error message, because it looks obsolete now.

Rereading this -- did you mean to propose that a possible fix was to
remove the "invalid arbiter index list" error?  I had understood
something different.

Your idea downthread of changing the way that check works (so that we
don't throw an error in this case, but we continue to double-check that
the arbiter list is sensible) sounds reasonable to me.  Do you want to
propose a specific check for it?

-- 
Álvaro Herrera        Breisgau, Deutschland  —  https://www.EnterpriseDB.com/
<inflex> really, I see PHP as like a strange amalgamation of C, Perl, Shell
<crab> inflex: you know that "amalgam" means "mixture with mercury",
       more or less, right?
<crab> i.e., "deadly poison"



Re: Issues with ON CONFLICT UPDATE and REINDEX CONCURRENTLY

From
Mihail Nikalayeu
Date:
Hello, Álvaro!

On Mon, Dec 8, 2025 at 10:58 AM Álvaro Herrera <alvherre@kurilemu.de> wrote:
> Rereading this -- did you mean to propose that a possible fix was to
> remove the "invalid arbiter index list" error?  I had understood
> something different.

Yes, that was the initial idea.

> Your idea downthread of changing the way that check works (so that we
> don't throw an error in this case, but we continue to double-check that
> the arbiter list is sensible) sounds reasonable to me.  Do you want to
> propose a specific check for it?

I think the following logic is correct:
* for each indisvalid arbiter on the parent table we should have AT
LEAST ONE compatible indisvalid counterpart in the partition (we may
have multiple, or a few more ready-only ones)
* for each non-indisvalid arbiter on the parent, nothing is expected
from the partition

I'll try to create a patch along those lines later.

Best regards,
Mikhail.



Re: Issues with ON CONFLICT UPDATE and REINDEX CONCURRENTLY

From
Mihail Nikalayeu
Date:
Hello!

After some investigation I ended up with a much simpler fix.
It is self-explanatory in the code and the commit message.

Best regards,
Mikhail.

Attachments

Re: Issues with ON CONFLICT UPDATE and REINDEX CONCURRENTLY

From
Álvaro Herrera
Date:
On 2025-Dec-09, Mihail Nikalayeu wrote:

> Hello!
> 
> After some investigation I ended up with a much simpler fix.
> It is self-explanatory in the code and the commit message.

Yeah, that makes sense.  It's what I was trying to say in
https://postgr.es/m/202512072050.hcyysny65ugj@alvherre.pgsql
Pushed your patch, thanks.

I still wonder if it's possible to break things by doing something like

CREATE TABLE pt (a int) PARTITION BY LIST (a);
CREATE TABLE p1 PARTITION OF pt FOR VALUES IN (1);
CREATE TABLE p2 PARTITION OF pt FOR VALUES IN (2);
CREATE UNIQUE INDEX pti ON ONLY pt (a);
CREATE UNIQUE INDEX p1i ON p1 (a);
CREATE UNIQUE INDEX p2i ON p2 (a);
ALTER INDEX pti ATTACH PARTITION p1i;

and then do the
  INSERT INTO pt VALUES (1) ON CONFLICT (a) DO NOTHING;
dance concurrently with
  ALTER INDEX pti ATTACH PARTITION p2i;

This would be a much smaller problem than the already fixed ones though,
I think.

Thanks!

-- 
Álvaro Herrera         PostgreSQL Developer  —  https://www.EnterpriseDB.com/
"Learn about compilers. Then everything looks like either a compiler or
a database, and now you have two problems but one of them is fun."
            https://twitter.com/thingskatedid/status/1456027786158776329



Re: Issues with ON CONFLICT UPDATE and REINDEX CONCURRENTLY

From
Álvaro Herrera
Date:
Hi,

I just saw a failure in CI for an unrelated patch,


https://api.cirrus-ci.com/v1/artifact/task/4641105595072512/testrun/build/testrun/injection_points/isolation/regression.diffs

diff -U3 /home/postgres/postgres/src/test/modules/injection_points/expected/reindex-concurrently-upsert.out /home/postgres/postgres/build/testrun/injection_points/isolation/results/reindex-concurrently-upsert.out
--- /home/postgres/postgres/src/test/modules/injection_points/expected/reindex-concurrently-upsert.out	2025-12-11 12:41:23.982167232 +0000
+++ /home/postgres/postgres/build/testrun/injection_points/isolation/results/reindex-concurrently-upsert.out	2025-12-11 12:45:24.038078804 +0000
@@ -62,22 +62,13 @@
 (1 row)
 
 step s1_start_upsert: <... completed>
+step s2_start_upsert: <... completed>
+step s3_start_reindex: <... completed>
 step s4_wakeup_s2: 
     SELECT injection_points_detach('exec-insert-before-insert-speculative');
     SELECT injection_points_wakeup('exec-insert-before-insert-speculative');
 
-injection_points_detach
------------------------
-                       
-(1 row)
-
-injection_points_wakeup
------------------------
-                       
-(1 row)
-
-step s2_start_upsert: <... completed>
-step s3_start_reindex: <... completed>
+ERROR:  could not detach injection point "exec-insert-before-insert-speculative"
 
 starting permutation: s3_setup_wait_before_swap s3_start_reindex s1_start_upsert s4_wakeup_to_swap s2_start_upsert
s4_wakeup_s2 s4_wakeup_s1
 
 injection_points_attach


I haven't seen this anywhere else.  I don't believe it to be a problem
in the patch, because the patch is about partitions and this test case
does not involve partitioned tables -- so I'm inclined to think it's a
timing issue in the test itself.  If so, this can probably be fixed
with additional constraints on one of the steps in the permutation ...
perhaps by making s2_start_upsert depend on s4_wakeup_s2?  But since I
can't easily reproduce this, I have no idea whether this is a valid
solution.

-- 
Álvaro Herrera        Breisgau, Deutschland  —  https://www.EnterpriseDB.com/



Re: Issues with ON CONFLICT UPDATE and REINDEX CONCURRENTLY

From
Mihail Nikalayeu
Date:
Hello, Álvaro!

On Thu, Dec 11, 2025 at 10:36 PM Álvaro Herrera <alvherre@kurilemu.de> wrote:
> I just saw a failure in CI for an unrelated patch

I'll try to dive deeper tomorrow to find a fix, but it feels like we
are doing something wrong here.

The tests were good for proving the issue and demonstrating that it
was fixed after some changes.

But currently we are just trying (not for the first time) to make
sure the OUTPUT of the test is EXACTLY equal to some variant.
I think a more correct approach here is to test something like "the
output does not contain a `duplicate key value violates unique
constraint` message".  Or, even better, a real case: a pgbench run of
concurrent REINDEX + INSERT (it takes seconds to reproduce, but CPU
usage is high).
That would test the essential property we want to keep unbroken,
rather than the exact output of concurrent commands... But the current
isolationtester does not support anything like that.

I am afraid the amount of time needed to stabilize such a test (in
its output, not its substance) does not cover its potential value.
Also, I can imagine someone changing something unrelated (catalog
snapshot invalidation, for example) and the test starting to fail on
some rare animal once a week... Ugh.

Maybe I am biased by my main programming experience (Java, backends,
distributed systems, etc.), but databases need to be much more
accurate and strict, even if it hurts...

What do you think about it?

Best regards,
Mikhail.



Re: Issues with ON CONFLICT UPDATE and REINDEX CONCURRENTLY

From
Álvaro Herrera
Date:
On 2025-Dec-12, Mihail Nikalayeu wrote:

> Hello, Álvaro!
> 
> On Thu, Dec 11, 2025 at 10:36 PM Álvaro Herrera <alvherre@kurilemu.de> wrote:
> > I just saw a failure in CI for an unrelated patch
> 
> I'll try to dive deeper tomorrow to find a fix, but it feels like we
> are doing something wrong here.

Hmm, this is a good point.

> But currently we are just trying (not for the first time) to make
> sure the OUTPUT of the test is EXACTLY equal to some variant.

A low-cost option might be to add alternative expected file(s) that
match the other variant(s).  I think trying to make isolationtester
"smart match" the output might be more complicated than is warranted.

> I am afraid the amount of time needed to stabilize such a test (in
> its output, not its substance) does not cover its potential value.

Yeah, could be.

> Also, I can imagine someone changing something unrelated (catalog
> snapshot invalidation, for example) and the test starting to fail on
> some rare animal once a week... Ugh.

Another idea might be to rewrite these tests using BackgroundPsql under
the TAP infrastructure.  That's quite a bit more tedious to write, but
we can be more precise on detecting whether some particular error
message was thrown or not.

-- 
Álvaro Herrera         PostgreSQL Developer  —  https://www.EnterpriseDB.com/
"How amazing is that? I call it a night and come back to find that a bug has
been identified and patched while I sleep."                (Robert Davidson)
               http://archives.postgresql.org/pgsql-sql/2006-03/msg00378.php



Re: Issues with ON CONFLICT UPDATE and REINDEX CONCURRENTLY

From
Mihail Nikalayeu
Date:
Hello, Álvaro!

On Fri, Dec 12, 2025 at 11:17 AM Álvaro Herrera <alvherre@kurilemu.de> wrote:
> Another idea might be to rewrite these tests using BackgroundPsql under
> the TAP infrastructure.  That's quite a bit more tedious to write, but
> we can be more precise on detecting whether some particular error
> message was thrown or not.

I think I understand the race now; I am currently considering two
possible approaches:

1) extract the LOOP that waits for the injection point into a function
like "injection_points_await_waiter" and add it between almost every
pair of steps, to ensure everything executes behind guardrails
(effectively reducing concurrency instead of making isolationtester
report the data the same way); see the sketch after this list
2) rewrite the tests using the TAP infrastructure
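
A sketch of that helper in plain SQL (injection_points_await_waiter is
the proposed name, not an existing function; the body mirrors the
DO-block wakeup loop committed earlier in this thread):

    CREATE FUNCTION injection_points_await_waiter(point text)
    RETURNS void LANGUAGE plpgsql AS $$
    BEGIN
        -- poll until some backend is blocked on the named injection point
        WHILE NOT EXISTS (
            SELECT FROM pg_stat_activity
            WHERE wait_event_type = 'InjectionPoint'
              AND wait_event = point)
        LOOP
            PERFORM pg_sleep(0.01);
        END LOOP;
    END;
    $$;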

What do you think about this? Which one do you prefer?

Best regards,
Mikhail.



Re: Issues with ON CONFLICT UPDATE and REINDEX CONCURRENTLY

From
Andres Freund
Date:
Hi,

On 2025-12-11 22:36:55 +0100, Álvaro Herrera wrote:
> I just saw a failure in CI for an unrelated patch,

There are a *lot* of these right now.  CI on master currently has a failure
rate of 14 failures / 50 runs on the postgres/postgres repo. All of the
failures are due to this issue from what I can tell.  That means there are a
lot of false failures for cfbot.

Could we perhaps disable this test until it's been rewritten to be stable?

Greetings,

Andres Freund



Re: Issues with ON CONFLICT UPDATE and REINDEX CONCURRENTLY

From
Mihail Nikalayeu
Date:
Hello, Andres!

On Sun, Dec 14, 2025 at 4:19 PM Andres Freund <andres@anarazel.de> wrote:
> There are a *lot* of these right now.  CI on master currently has a failure
> rate of 14 failures / 50 runs on the postgres/postgres repo. All of the
> failures are due to this issue from what I can tell.  That means there are a
> lot of false failures for cfbot.

Oh, that's sad :(

> Could we perhaps disable this test until it's been rewritten to be stable?

Of course, if you can do it yourself, feel free to do so for all tests
from the set:

index-concurrently-upsert
index-concurrently-upsert-predicate
reindex-concurrently-upsert
reindex-concurrently-upsert-on-constraint
reindex-concurrently-upsert-partitioned

You may even remove them for now - I plan to rewrite them using the
TAP infrastructure, because the "endless fight" from [0] seems to be
coming true.

[0]:
https://www.postgresql.org/message-id/flat/CADzfLwUhvaEsPbYaT3Z5cMO779JruDqBbE5nijeaBcXiNPoCYw%40mail.gmail.com#97598dff6875c788765e62b18bc3ee63



Re: Issues with ON CONFLICT UPDATE and REINDEX CONCURRENTLY

From
Álvaro Herrera
Date:
On 2025-Dec-14, Mihail Nikalayeu wrote:

> Of course, if you can do it yourself, feel free to do so for all tests
> from the set:
> 
> index-concurrently-upsert
> index-concurrently-upsert-predicate
> reindex-concurrently-upsert
> reindex-concurrently-upsert-on-constraint
> reindex-concurrently-upsert-partitioned

I've commented them out from the meson/makefile for now.

-- 
Álvaro Herrera               48°01'N 7°57'E  —  https://www.EnterpriseDB.com/
"I dream about dreams about dreams", sang the nightingale
under the pale moon (Sandman)