Discussion: BUG #19355: Attempt to insert data unexpectedly during concurrent update
BUG #19355: Attempt to insert data unexpectedly during concurrent update
From:
PG Bug reporting form
Date:
The following bug has been logged on the website:
Bug reference: 19355
Logged by: Bihua Wang
Email address: wangbihua.cn@gmail.com
PostgreSQL version: 18.1
Operating system: linux
Description:
Start two transactions and update the same tuple, so that a concurrent
update triggers EvalPlanQual. The session that goes through EvalPlanQual
does not successfully update the data, but instead incorrectly attempts to
insert a row.
-- postgresql version
psql (18.1)
Type "help" for help.
postgres=# SELECT version();
version
---------------------------------------------------------------------------------------------------------
PostgreSQL 18.1 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5
20150623 (EulerOS 4.8.5-28), 64-bit
-- prepare relation
drop table t1,t2,t3;
create table t1(a int primary key, b int);
create table t2(a int, b int);
create table t3(a int, b int);
insert into t1 values(1,0),(2,0);
insert into t2 values(1,1),(2,2);
insert into t3 values(3,3);
-- test sql
merge /*+ nestloop(t3 t1)) */ into t1 using
(select t2.a as a from t2 group by t2.a) as t3
on (t1.a = t3.a)
when matched then
update set b = t1.b + 1
when not matched then
insert (a,b) values (1,1);
-- make sure the plan looks like the one below; you may need to execute
-- "set enable_hashjoin=off" if necessary
postgres=*# explain merge /*+ nestloop(t3 t1)) */ into t1 using
(select t2.a as a from t2 group by t2.a) as t3
on (t1.a = t3.a)
when matched then
update set b = t1.b + 1
when not matched then
insert (a,b) values (1,1);
QUERY PLAN
-------------------------------------------------------------------------------
Merge on t1 (cost=38.41..124.75 rows=0 width=0)
-> Nested Loop Left Join (cost=38.41..124.75 rows=200 width=34)
-> Subquery Scan on t3 (cost=38.25..42.25 rows=200 width=32)
-> HashAggregate (cost=38.25..40.25 rows=200 width=4)
Group Key: t2.a
-> Seq Scan on t2 (cost=0.00..32.60 rows=2260 width=4)
-> Index Scan using t1_pkey on t1 (cost=0.15..0.41 rows=1 width=10)
Index Cond: (a = t3.a)
(8 rows)
-- session 1:
postgres=# begin;
BEGIN
postgres=*# explain merge /*+ nestloop(t3 t1)) */ into t1 using
(select t2.a as a from t2 group by t2.a) as t3
on (t1.a = t3.a)
when matched then
update set b = t1.b + 1
when not matched then
insert (a,b) values (1,1);
QUERY PLAN
-------------------------------------------------------------------------------
Merge on t1 (cost=38.41..124.75 rows=0 width=0)
-> Nested Loop Left Join (cost=38.41..124.75 rows=200 width=34)
-> Subquery Scan on t3 (cost=38.25..42.25 rows=200 width=32)
-> HashAggregate (cost=38.25..40.25 rows=200 width=4)
Group Key: t2.a
-> Seq Scan on t2 (cost=0.00..32.60 rows=2260 width=4)
-> Index Scan using t1_pkey on t1 (cost=0.15..0.41 rows=1 width=10)
Index Cond: (a = t3.a)
(8 rows)
postgres=*# merge /*+ nestloop(t3 t1)) */ into t1 using
(select t2.a as a from t2 group by t2.a) as t3
on (t1.a = t3.a)
when matched then
update set b = t1.b + 1
when not matched then
insert (a,b) values (1,1);
MERGE 2
postgres=*# end;
COMMIT
-- session 2
postgres=# begin;
BEGIN
postgres=*# explain merge /*+ nestloop(t3 t1)) */ into t1 using
(select t2.a as a from t2 group by t2.a) as t3
on (t1.a = t3.a)
when matched then
update set b = t1.b + 1
when not matched then
insert (a,b) values (1,1);
QUERY PLAN
-------------------------------------------------------------------------------
Merge on t1 (cost=38.41..124.75 rows=0 width=0)
-> Nested Loop Left Join (cost=38.41..124.75 rows=200 width=34)
-> Subquery Scan on t3 (cost=38.25..42.25 rows=200 width=32)
-> HashAggregate (cost=38.25..40.25 rows=200 width=4)
Group Key: t2.a
-> Seq Scan on t2 (cost=0.00..32.60 rows=2260 width=4)
-> Index Scan using t1_pkey on t1 (cost=0.15..0.41 rows=1 width=10)
Index Cond: (a = t3.a)
(8 rows)
postgres=*# merge /*+ nestloop(t3 t1)) */ into t1 using
(select t2.a as a from t2 group by t2.a) as t3
on (t1.a = t3.a)
when matched then
update set b = t1.b + 1
when not matched then
insert (a,b) values (1,1);
ERROR: duplicate key value violates unique constraint "t1_pkey"
DETAIL: Key (a)=(1) already exists.
On Mon, 2025-12-15 at 01:40 +0000, PG Bug reporting form wrote:
> Start two transaction and update on same tuple, raise concurrent update and
> evalplanqual. It will be found out that the session with evalplanqual did
> not successfully update the data, but instead attempted to insert a row of
> data incorrectly.

I'd say that is expected.

If you need a guarantee that either INSERT or UPDATE succeed, you have to use
INSERT ... ON CONFLICT ... DO UPDATE

Yours,
Laurenz Albe
The issue is that the MERGE INTO match condition is not updated.
In the MATCHED path of MERGE INTO, when the target row satisfies the match condition and the condition itself has not changed, the system should still be able to handle concurrent updates to the same target row by relying on EvalPlanQual (EPQ) to refetch the latest version of the tuple, and then proceed with the intended update.
However, in the current implementation, even though the concurrent update does not modify any columns relevant to the ON condition, the EPQ recheck unexpectedly results in a match condition failure, causing the update path that should remain MATCHED to be treated as NOT MATCHED.
In the scenario shown above, if no primary key exists, an extra row will be inserted.
Further investigation shows that execution plans using Hash Join do not exhibit this problem.
Could you please take a look at my explanation and let me know if anything is inaccurate? I’d also appreciate it if you could check the test scenario I provided. Thanks a lot!
Laurenz Albe <laurenz.albe@cybertec.at> wrote on Monday, December 15, 2025 at 14:25:
On Mon, 2025-12-15 at 01:40 +0000, PG Bug reporting form wrote:
> > Start two transaction and update on same tuple, raise concurrent update and
> > evalplanqual. It will be found out that the session with evalplanqual did
> > not successfully update the data, but instead attempted to insert a row of
> > data incorrectly.
>
> I'd say that is expected.
>
> If you need a guarantee that either INSERT or UPDATE succeed, you have to use
> INSERT ... ON CONFLICT ... DO UPDATE
>
> Yours,
> Laurenz Albe
On Mon, 22 Dec 2025 at 14:51, Bh W <wangbihua.cn@gmail.com> wrote:
>
> The issue is that the MERGE INTO match condition is not updated.
> In the MATCHED path of MERGE INTO, when the target row satisfies the match condition and the condition itself has not
> changed, the system should still be able to handle concurrent updates to the same target row by relying on EvalPlanQual
> (EPQ) to refetch the latest version of the tuple, and then proceed with the intended update.
> However, in the current implementation, even though the concurrent update does not modify any columns relevant to the
> ON condition, the EPQ recheck unexpectedly results in a match condition failure, causing the update path that should
> remain MATCHED to be treated as NOT MATCHED.
I spent a little time looking at this, and managed to reduce the
reproducer test case down to this:
-- Setup
drop table if exists t1,t2;
create table t1(a int primary key, b int);
create table t2(a int, b int);
insert into t1 values(1,0),(2,0);
insert into t2 values(1,1),(2,2);
-- Session 1
begin;
update t1 set b = b+1;
-- Session 2
merge into t1 using (values(1,1),(2,2)) as t3(a,b) on (t1.a = t3.a)
when matched then
update set b = t1.b + 1
when not matched then
insert (a,b) values (1,1);
-- Session 1
commit;
This works fine in PG17, but fails with a PK violation in PG18.
Git-bisecting points to this commit:
cbc127917e04a978a788b8bc9d35a70244396d5b is the first bad commit
commit cbc127917e04a978a788b8bc9d35a70244396d5b
Author: Amit Langote <amitlan@postgresql.org>
Date: Fri Feb 7 17:15:09 2025 +0900
Track unpruned relids to avoid processing pruned relations
Doing a little more debugging, it looks like the problem might be this
change in InitPlan():
- /* ignore "parent" rowmarks; they are irrelevant at runtime */
- if (rc->isParent)
+ /*
+ * Ignore "parent" rowmarks, because they are irrelevant at
+ * runtime. Also ignore the rowmarks belonging to child tables
+ * that have been pruned in ExecDoInitialPruning().
+ */
+ if (rc->isParent ||
+ !bms_is_member(rc->rti, estate->es_unpruned_relids))
continue;
which seems to cause it to incorrectly skip a rowmark, which I suspect
is what is causing EvalPlanQual() to return the wrong result.
Regards,
Dean
Hi,
On Tue, Dec 23, 2025 at 4:07 Dean Rasheed <dean.a.rasheed@gmail.com> wrote:
> [...]
> which seems to cause it to incorrectly skip a rowmark, which I suspect
> is what is causing EvalPlanQual() to return the wrong result.
Thanks for the detailed analysis and adding me to the thread, Dean.
I would think that a case that involves no partitioning at all would be untouchable by this code, but it looks like the logic I added is incorrectly affecting cases where pruning isn’t even relevant. I’ll need to look more carefully at why such a rowmark would exist in the rowmarks list if its relation isn’t in es_unpruned_relids. Maybe the set population is incorrect at some point, or perhaps it matters that the set is a copy in the EPQ estate.
I’m afk (on vacation) at the moment, so won’t be able to dig into this until next week.
— Amit
Amit Langote <amitlangote09@gmail.com> wrote on Wednesday, December 24, 2025 at 16:08:
> [...]
> I'll need to look more carefully at why such a rowmark would exist in the rowmarks list if its relation isn't in es_unpruned_relids. Maybe the set population is incorrect at some point, or perhaps it matters that the set is a copy in the EPQ estate.
I did some debugging, and I found that:
In add_rte_to_flat_rtable(), the RTE for the VALUES list was not added to glob->allRelids, because of the code below:
.....
if (newrte->rtekind == RTE_RELATION ||
    (newrte->rtekind == RTE_SUBQUERY && OidIsValid(newrte->relid)))
{
    glob->relationOids = lappend_oid(glob->relationOids, newrte->relid);
    glob->allRelids = bms_add_member(glob->allRelids,
                                     list_length(glob->finalrtable));
}
.....
The VALUES RTE does not satisfy the condition above, so it was not added to glob->allRelids.
Then in standard_planner(), we have:
....
result->unprunableRelids = bms_difference(glob->allRelids,
                                          glob->prunableRelids);
....
So result->unprunableRelids contains only RT index 1.
In InitPlan(), we have:
.....
/*
 * Ignore "parent" rowmarks, because they are irrelevant at
 * runtime.  Also ignore the rowmarks belonging to child tables
 * that have been pruned in ExecDoInitialPruning().
 */
if (rc->isParent ||
    !bms_is_member(rc->rti, estate->es_unpruned_relids))
    continue;
.....
The estate->es_unpruned_relids is equal to result->unprunableRelids, so the rowmark was skipped incorrectly.
I did a quick fix as the attached patch.
Any thoughts?
Thanks,
Tender Wang
Attachments
On Wed, 24 Dec 2025 at 12:07, Tender Wang <tndrwang@gmail.com> wrote:
>
> I did some debugging, and I found that:
> In add_rte_to_flat_rtable(), the RTE of value was not added into glob->AllRelids, because below codes:
> .....
> the estate->es_unpruned_relids equals with result->unprunableRelids contains. So the rowMark was skipped incorrectly.
>
> I did a quick fix as the attached patch.
> Any thoughts?

Yes. However, it's not sufficient to only add RTE_VALUES RTEs to what
gets included in PlannerGlobal.allRelids. Rowmarks can be attached to
other kinds of RTEs. An obvious example is an RTE_SUBQUERY RTE that is
an actual subquery that did not come from a view. So, for this
approach to work robustly, it really should include *all* RTEs in
PlannerGlobal.allRelids.

I took a slightly different approach, which was to change the test in
InitPlan() (and also in ExecInitLockRows() and ExecInitModifyTable())
to only ignore rowmarks for pruned relations that are plain
RTE_RELATION relations, since those are the only relations that are
ever actually pruned. So rowmarks attached to any other kind of RTE
are not ignored. I also added an isolation test case.

I'm somewhat conflicted as to which approach is better. I think maybe
there is less chance of other unintended side-effects if the set of
RTEs included in PlannerGlobal.allRelids, unprunableRelids, and
es_unpruned_relids is not changed. However, as it stands,
PlannerGlobal.allRelids is misnamed (it should probably have been
called "relationRelids", in line with the "relationOids" field).
Making it actually include all RTEs would solve that.

Regards,
Dean
Attachments
Dean Rasheed <dean.a.rasheed@gmail.com> wrote on Thursday, December 25, 2025 at 07:12:
On Wed, 24 Dec 2025 at 12:07, Tender Wang <tndrwang@gmail.com> wrote:
>
> I did some debugging, and I found that:
> In add_rte_to_flat_rtable(), the RTE of value was not added into glob->AllRelids, because below codes:
> .....
> the estate->es_unpruned_relids equals with result->unprunableRelids contains. So the rowMark was skipped incorrectly.
>
> I did a quick fix as the attached patch.
> Any thoughts?
> Yes. However, it's not sufficient to only add RTE_VALUES RTEs to what
> gets included in PlannerGlobal.allRelids. Rowmarks can be attached to
> other kinds of RTEs. An obvious example is an RTE_SUBQUERY RTE that is
> an actual subquery that did not come from a view. So, for this
> approach to work robustly, it really should include *all* RTEs in
> PlannerGlobal.allRelids.
Yes, it is. I can reproduce the same with a subquery as below:
when matched then
update set b = t1.b + 1
when not matched then
insert (a,b) values (1,1);
or
merge into t1 using (select generate_series(1,2) as a) as t3 on (t1.a = t3.a)
when matched then
update set b = t1.b + 1
when not matched then
insert (a,b) values (1,1);
Both reported the same error. My approach obviously cannot cover all cases.
> I took a slightly different approach, which was to change the test in
> InitPlan() (and also in ExecInitLockRows() and ExecInitModifyTable())
> to only ignore rowmarks for pruned relations that are plain
> RTE_RELATION relations, since those are the only relations that are
> ever actually pruned. So rowmarks attached to any other kind of RTE
> are not ignored. I also added an isolation test case.
I failed to apply your patch, with errors like "error: git apply: bad git-diff - expected /dev/null on line 2"
or "Patch format detection failed."
I looked through the patch, though, and it works for me.
> I'm somewhat conflicted as to which approach is better. I think maybe
> there is less chance of other unintended side-effects if the set of
> RTEs included in PlannerGlobal.allRelids, unprunableRelids, and
> es_unpruned_relids is not changed. However, as it stands,
> PlannerGlobal.allRelids is misnamed (it should probably have been
> called "relationRelids", in line with the "relationOids" field).
> Making it actually include all RTEs would solve that.
.....
/*
 * RT indexes of all relation RTEs in finalrtable (RTE_RELATION and
 * RTE_SUBQUERY RTEs of views)
 */
Bitmapset  *allRelids;
......
Per the comment, I guess Amit didn't plan to include all RTEs in his design.
Thanks,
Tender Wang
Re: BUG #19355: Attempt to insert data unexpectedly during concurrent update
From:
Bernice Southey
Date:
Hi,

I've spent some time looking at the code and history of this bug because I also ran into it [1], and I was deeply curious why concurrent updates worked with a hash join + seq scan but not a nested loop + index scan. I was also curious if this bug is why Yuri didn't see deadlocks in this related bug [2].

FWIW, I found a simpler way to reproduce this:

create table t(id int primary key, count int);
insert into t values (1, 0), (2, 0);

-- session 1
begin;
update t set count = count + 1;

-- session 2
set enable_seqscan = off;
update t set count = count + 1 from (values (1), (2)) v(id) where t.id = v.id;

Only one update succeeds in session 2.

Tender Wang wrote:
> Dean Rasheed wrote:
>> I'm somewhat conflicted as to which approach is better. I think maybe
>> there is less chance of other unintended side-effects if the set of
>> RTEs included in PlannerGlobal.allRelids, unprunableRelids, and
>> es_unpruned_relids is not changed. However, as it stands,
>> PlannerGlobal.allRelids is misnamed (it should probably have been
>> called "relationRelids", in line with the "relationOids" field).
>> Making it actually include all RTEs would solve that.
> Per the comment, I guess Amit didn't plan to include all RTEs in his design.

While digging, I noticed that allRelids seems only to be used to create unprunableRelids. But unprunableRelids is used, among other things, to create es_unpruned_relids and as an unpruned_relids parameter to ExecInitRangeTable. It's difficult to follow all the paths where the missing relids will matter. So I think Tender's approach is safer if amended to include all relids. I'm very new to postgres source code, so this is a very tentative opinion. Amit can best answer after his holiday.

I was curious enough to test in the meantime. I made an experimental patch that just adds all rels to allRelids. This fixed all the issues I'm aware of. It caused overexplain tests to fail by adding in new non-relation relIds, as expected. Apologies if I'm getting the terminology wrong.

I ran Yuri's reproducer on my patched branch and indeed got several deadlocks.

psql:crash.sql:3: ERROR: deadlock detected
DETAIL: Process 71673 waits for ShareLock on transaction 2766; blocked by process 71643.
Process 71643 waits for ShareLock on transaction 2777; blocked by process 71673.

Curiously, in the test I added (based on Dean's), enable_indexscan or enable_seqscan didn't make any difference to the plan. But when I run the reproducer manually, it does change the plan. Since the bug disappears with a hash join + seq scan, the plan is critical. So I included 'set enable_seqscan to 0' for reference and future proofing.

I didn't attempt to fix the comment because I don't understand it. I also didn't check all my tabs vs spaces, or run TAP tests, as this patch is just meant for info. Most of the code is still way over my head, so my apologies if none of this is helpful.

As to why this works for a hash join but not a nested loop, I have some ideas, but I'm still digging.

Happy New Year!
Bernice
Attachments
Re: BUG #19355: Attempt to insert data unexpectedly during concurrent update
From:
Bernice Southey
Date:
> > Dean Rasheed wrote:
> >> I'm somewhat conflicted as to which approach is better. I think maybe
> >> there is less chance of other unintended side-effects if the set of
> >> RTEs included in PlannerGlobal.allRelids, unprunableRelids, and
> >> es_unpruned_relids is not changed. However, as it stands,
> >> PlannerGlobal.allRelids is misnamed (it should probably have been
> >> called "relationRelids", in line with the "relationOids" field).
> >> Making it actually include all RTEs would solve that.

I did some more digging (postgres source code is addictive).

Up until v56-0004-Defer-locking-of-runtime-prunable-relations-to-e.patch [1], the existing unfiltered rtable is used to create unprunableRelids.

+++ b/src/backend/optimizer/plan/planner.c
@@ -555,6 +555,8 @@ standard_planner(Query *parse, const char *query_string, int cursorOptions,
     result->planTree = top_plan;
     result->partPruneInfos = glob->partPruneInfos;
     result->rtable = glob->finalrtable;
+    result->unprunableRelids = bms_difference(bms_add_range(NULL, 1, list_length(result->rtable)),
+                                              glob->prunableRelids);

In v56 [2], the filtered allRelids was added. I think this is when the bug was introduced, because the three places from Dean's patch are in both versions.

I've looked much harder at the code (I'm still at stumbling-around-in-the-dark-with-a-match level) and AFAICT, the two approaches are very similar. I think equal effort is required to check that PlannerGlobal.allRelids, unprunableRelids, and es_unpruned_relids are correct, whichever approach is used. I can't find any missed cases in either approach - with my matchlight.

Sorry for my ignorance: does a relId refer to a range table index, and a relation to a...for lack of a better word...table+?

I can really see these much appreciated extra features and enhancements don't come cheap.

Thanks,
Bernice

[1] https://www.postgresql.org/message-id/CA%2BHiwqE7%2BiwMH4NYtFi28Pt9fT_gRW%2BGt-%3DCvOX%3DPkquo%3DAN8w%40mail.gmail.com
[2] https://www.postgresql.org/message-id/CA%2BHiwqHRRFQN6yZ54fBydOTM6ncqZBCmewZ6n519RjRdDsO44g%40mail.gmail.com
Hi Dean,

Apologies for the delay in getting back to this thread (vacation got unexpectedly long for family reasons).

On Thu, Dec 25, 2025 at 8:12 AM Dean Rasheed <dean.a.rasheed@gmail.com> wrote:
>
> On Wed, 24 Dec 2025 at 12:07, Tender Wang <tndrwang@gmail.com> wrote:
> >
> > I did some debugging, and I found that:
> > In add_rte_to_flat_rtable(), the RTE of value was not added into glob->AllRelids, because below codes:
> > .....
> > the estate->es_unpruned_relids equals with result->unprunableRelids contains. So the rowMark was skipped incorrectly.
> >
> > I did a quick fix as the attached patch.
> > Any thoughts?
>
> Yes. However, it's not sufficient to only add RTE_VALUES RTEs to what
> gets included in PlannerGlobal.allRelids. Rowmarks can be attached to
> other kinds of RTEs. An obvious example is an RTE_SUBQUERY RTE that is
> an actual subquery that did not come from a view. So, for this
> approach to work robustly, it really should include *all* RTEs in
> PlannerGlobal.allRelids.
>
> I took a slightly different approach, which was to change the test in
> InitPlan() (and also in ExecInitLockRows() and ExecInitModifyTable())
> to only ignore rowmarks for pruned relations that are plain
> RTE_RELATION relations, since those are the only relations that are
> ever actually pruned. So rowmarks attached to any other kind of RTE
> are not ignored. I also added an isolation test case.
>
> I'm somewhat conflicted as to which approach is better. I think maybe
> there is less chance of other unintended side-effects if the set of
> RTEs included in PlannerGlobal.allRelids, unprunableRelids, and
> es_unpruned_relids is not changed.

Yes, I think the approach in your patch seems better for the reason you mentioned, at least for back-patching sanity. I intended all of these relid sets to account for prunable RELATION RTEs only.

Thanks Tender and Bernice for the additional analysis. I prefer Dean's fix-the-executor approach for back-patching.

Bernice, are there other related issues you're aware of beyond this rowmark bug? Want to make sure Dean's patch covers them too.

> However, as it stands,
> PlannerGlobal.allRelids is misnamed (it should probably have been
> called "relationRelids", in line with the "relationOids" field).
> Making it actually include all RTEs would solve that.

Yeah, I agree it should have been named relationRelids. Perhaps worth renaming in a separate cleanup, though not urgent.

Thanks for the patch! Do you intend to commit and back-patch this yourself, or would you like me to handle it?

--
Thanks, Amit Langote
Hi Bernice,

On Fri, Jan 2, 2026 at 6:58 AM Bernice Southey <bernice.southey@gmail.com> wrote:
>
> > > Dean Rasheed wrote:
> > >> I'm somewhat conflicted as to which approach is better. I think maybe
> > >> there is less chance of other unintended side-effects if the set of
> > >> RTEs included in PlannerGlobal.allRelids, unprunableRelids, and
> > >> es_unpruned_relids is not changed. However, as it stands,
> > >> PlannerGlobal.allRelids is misnamed (it should probably have been
> > >> called "relationRelids", in line with the "relationOids" field).
> > >> Making it actually include all RTEs would solve that.
>
> I did some more digging (postgres source code is addictive).
>
> Up until v56-0004-Defer-locking-of-runtime-prunable-relations-to-e.patch
> [1], the existing unfiltered rtable is used to create
> unprunableRelids.
>
> +++ b/src/backend/optimizer/plan/planner.c
> @@ -555,6 +555,8 @@ standard_planner(Query *parse, const char *query_string, int cursorOptions,
>      result->planTree = top_plan;
>      result->partPruneInfos = glob->partPruneInfos;
>      result->rtable = glob->finalrtable;
> +    result->unprunableRelids = bms_difference(bms_add_range(NULL, 1, list_length(result->rtable)),
> +                                              glob->prunableRelids);
>
> In v56 [2], the filtered allRelids was added. I think this is when the
> bug was introduced, because the three places from Dean's patch are in
> both versions.
>
> I've looked much harder at the code (I'm still at
> stumbling-around-in-the-dark-with-a-match level) and AFAICT, the two
> approaches are very similar. I think equal effort is required to check
> that PlannerGlobal.allRelids, unprunableRelids, and es_unpruned_relids
> are correct, whichever approach is used. I can't find any missed cases
> in either approach - with my matchlight.

Good catch on the history -- that's exactly when the bug was introduced.

I'd say Dean's approach is easier to verify, since it's a simple check at the point of use, rather than ensuring allRelids is built correctly across all planner code paths.

> Sorry for my ignorance: does a relId refer to a range table index and
> a relation to a...for lack of a better word...table+?

Depends on the context, but "relid" in the planner internal data structures refers to an RT index. In the planner output data structures (plan nodes, etc.), we typically use "rti" or "rtindex".

--
Thanks, Amit Langote
Re: BUG #19355: Attempt to insert data unexpectedly during concurrent update
From:
Bernice Southey
Date:
Amit Langote <amitlangote09@gmail.com> wrote:
> Bernice, are there other
> related issues you're aware of beyond this rowmark bug? Want to make
> sure Dean's patch covers them too.
What I found is that it takes a combination of a nested loop and an index
scan (or TID scan) to lose the update. The first update succeeds and
subsequent updates disappear. A nested loop with a seq scan is fine, and so
is an index scan with merge and hash joins. My beginner guess is that
IndexRecheck in EPQ is returning true on the first row, and then false
on subsequent rows (due to the missing rowmarks). SeqRecheck always
returns true.
explain update t set count = count + 1 from (values (1), (2)) v(id)
where t.id = v.id;
This loses updates:
Update on t (cost=0.15..16.38 rows=0 width=0)
-> Nested Loop (cost=0.15..16.38 rows=2 width=38)
-> Values Scan on "*VALUES*" (cost=0.00..0.03 rows=2 width=32)
-> Index Scan using t_pkey on t (cost=0.15..8.17 rows=1 width=14)
Index Cond: (id = "*VALUES*".column1)
This doesn't:
Update on t (cost=0.20..90.61 rows=0 width=0)
-> Hash Join (cost=0.20..90.61 rows=2 width=38)
Hash Cond: (t.id = "*VALUES*".column1)
-> Index Scan using t_pkey on t (cost=0.15..82.06 rows=2260 width=14)
-> Hash (cost=0.03..0.03 rows=2 width=32)
-> Values Scan on "*VALUES*" (cost=0.00..0.03 rows=2 width=32)
and neither does:
Update on t (cost=0.00..121.73 rows=0 width=0)
-> Nested Loop (cost=0.00..121.73 rows=2 width=38)
Join Filter: (t.id = "*VALUES*".column1)
-> Values Scan on "*VALUES*" (cost=0.00..0.03 rows=2 width=32)
-> Seq Scan on t (cost=0.00..32.60 rows=2260 width=14)
Thanks, Bernice
On Wed, Jan 7, 2026 at 6:45 PM Amit Langote <amitlangote09@gmail.com> wrote:
> On Fri, Jan 2, 2026 at 6:58 AM Bernice Southey
> <bernice.southey@gmail.com> wrote:
> > I've looked much harder at the code (I'm still at
> > stumbling-around-in-the-dark-with-a-match level) and AFAICT, the two
> > approaches are very similar. I think equal effort is required to check
> > that PlannerGlobal.allRelids, unprunableRelids, and es_unpruned_relids
> > are correct, whichever approach is used. I can't find any missed cases
> > in either approach - with my matchlight.
>
> Good catch on the history -- that's exactly when the bug was
> introduced.
I remembered more on why I insisted on putting only relations with OID
(RTE_RELATION and RTE_SUBQUERY with relid set (views)) into
PlannerGlobal.allRelids and thus PlannedStmt.unprunableRelids. The
reason is that I did not want loops iterating over unprunableRelids to
have to skip relations that do not have an OID. For example,
AcquireExecutorLocks() currently iterates over the rtable and skips
non-RELATION RTEs; in a now-reverted commit (525392d57) I had changed
it to iterate over unprunableRelids instead, which avoided the need
for that skip logic entirely, because unprunableRelids by design only
contains OID-bearing relations:
- foreach(lc2, plannedstmt->rtable)
+ while ((rtindex = bms_next_member(plannedstmt->unprunableRelids,
+ rtindex)) >= 0)
{
- RangeTblEntry *rte = (RangeTblEntry *) lfirst(lc2);
+ RangeTblEntry *rte = list_nth_node(RangeTblEntry,
+ plannedstmt->rtable,
+ rtindex - 1);
- if (!(rte->rtekind == RTE_RELATION ||
- (rte->rtekind == RTE_SUBQUERY && OidIsValid(rte->relid))))
- continue;
+ Assert(rte->rtekind == RTE_RELATION ||
+ (rte->rtekind == RTE_SUBQUERY && OidIsValid(rte->relid)));
This is why adding all RTEs to allRelids, as Tender suggested, would
be problematic -- it would defeat the design intent that allows loops
over unprunableRelids to skip the filtering logic. While there are no
such loops today (I reverted the patch that added one), keeping this
design makes sense. Dean's fix addresses the buggy bms_is_member()
check directly, without changing allRelids, though as has been
mentioned, the name is a bit misleading.
--
Thanks, Amit Langote
On Wed, 7 Jan 2026 at 09:37, Amit Langote <amitlangote09@gmail.com> wrote:
>
> Yes, I think the approach in your patch seems better for the reason
> you mentioned, at least for back-patching sanity.
>
> I intended all of these relid sets to account for prunable RELATION RTEs only.

Yes, I think that makes sense.

> Thanks Tender and Bernice for the additional analysis. I prefer Dean's
> fix-the-executor approach for back-patching. Bernice, are there other
> related issues you're aware of beyond this rowmark bug? Want to make
> sure Dean's patch covers them too.

It looks to me as though either approach would work, so I'm happy for
you to decide which approach fits best with your design.

> Thanks for the patch! Do you intend to commit and back-patch this
> yourself, or would you like me to handle it?

It's your code, and you're more familiar with it than me, so I'm happy
to leave it to you :-)

Regards,
Dean
On Wed, Jan 7, 2026 at 9:52 PM Dean Rasheed <dean.a.rasheed@gmail.com> wrote:
> On Wed, 7 Jan 2026 at 09:37, Amit Langote <amitlangote09@gmail.com> wrote:
> >
> > Yes, I think the approach in your patch seems better for the reason
> > you mentioned, at least for back-patching sanity.
> >
> > I intended all of these relid sets to account for prunable RELATION RTEs only.
>
> Yes, I think that makes sense.
>
> > Thanks Tender and Bernice for the additional analysis. I prefer Dean's
> > fix-the-executor approach for back-patching. Bernice, are there other
> > related issues you're aware of beyond this rowmark bug? Want to make
> > sure Dean's patch covers them too.
>
> It looks to me as though either approach would work, so I'm happy for
> you to decide which approach fits best with your design.

Ok, I thought about this some more.

I admit I'd be lying if I said I didn't have doubts about my original
design. It might be better for PlannedStmt.unprunableRelids and
es_unpruned_relids to cover *all* RTEs except those that are prunable
(decided by the planner) or pruned (decided during executor startup).
That way, code checking these sets wouldn't need to also verify the
RTE kind, as your patch currently does.

If we were to change the design, we could remove
PlannerGlobal.allRelids entirely on master, since it would no longer
need to be selectively populated -- unprunableRelids could just be
computed directly from the full rtable. But that would mean we'd be
leaving v18 with an unused field in PlannerGlobal. That's not great,
but having different designs in the two branches isn't ideal either.

So I'll go with your minimal fix for the back-patch. We can revisit
the design cleanup on master later if desired.

> Thanks for the patch! Do you intend to commit and back-patch this
> yourself, or would you like me to handle it?

It's your code, and you're more familiar with it than me, so I'm happy
to leave it to you :-)

Surely.

--
Thanks, Amit Langote
On Thu, Jan 8, 2026 at 10:16 PM Amit Langote <amitlangote09@gmail.com> wrote:
> On Wed, Jan 7, 2026 at 9:52 PM Dean Rasheed <dean.a.rasheed@gmail.com> wrote:
> >
> > It looks to me as though either approach would work, so I'm happy for
> > you to decide which approach fits best with your design.
>
> Ok, I thought about this some more.
>
> I admit I'd be lying if I said I didn't have doubts about my original
> design. It might be better for PlannedStmt.unprunableRelids and
> es_unpruned_relids to cover *all* RTEs except those that are prunable
> (decided by the planner) or pruned (decided during executor startup).
> That way, code checking these sets wouldn't need to also verify the
> RTE kind, as your patch currently does.
>
> If we were to change the design, we could remove
> PlannerGlobal.allRelids entirely on master, since it would no longer
> need to be selectively populated -- unprunableRelids could just be
> computed directly from the full rtable. But that would mean we'd be
> leaving v18 with an unused field in PlannerGlobal. That's not great,
> but having different designs in the two branches isn't ideal either.
>
> So I'll go with your minimal fix for the back-patch. We can revisit
> the design cleanup on master later if desired.
>
> > > Thanks for the patch! Do you intend to commit and back-patch this
> > > yourself, or would you like me to handle it?
> >
> > It's your code, and you're more familiar with it than me, so I'm happy
> > to leave it to you :-)
>
> Surely.

Here's Dean's patch with a commit message added. I plan to commit it
barring objections.

--
Thanks, Amit Langote
Attachments
On Thu, 15 Jan 2026 at 11:39, Amit Langote <amitlangote09@gmail.com> wrote:
>
> Here's Dean's patch with a commit message added. I plan to commit it
> barring objections.

Commit message LGTM.

Regards,
Dean
On Fri, Jan 16, 2026 at 1:35 AM Dean Rasheed <dean.a.rasheed@gmail.com> wrote:
> On Thu, 15 Jan 2026 at 11:39, Amit Langote <amitlangote09@gmail.com> wrote:
> >
> > Here's Dean's patch with a commit message added. I plan to commit it
> > barring objections.
>
> Commit message LGTM.

Thanks for checking, pushed.

--
Thanks, Amit Langote