Обсуждение: BUG #19355: Attempt to insert data unexpectedly during concurrent update
BUG #19355: Attempt to insert data unexpectedly during concurrent update
От
PG Bug reporting form
Дата:
The following bug has been logged on the website:
Bug reference: 19355
Logged by: Bihua Wang
Email address: wangbihua.cn@gmail.com
PostgreSQL version: 18.1
Operating system: linux
Description:
Start two transaction and update on same tuple, raise concurrent update and
evalplanqual. It will be found out that the session with evalplanqual did
not successfully update the data, but instead attempted to insert a row of
data incorrectly.
-- postgresql version
psql (18.1)
Type "help" for help.
postgres=# SELECT version();
version
---------------------------------------------------------------------------------------------------------
PostgreSQL 18.1 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5
20150623 (EulerOS 4.8.5-28), 64-bit
-- prepare relation
drop table t1,t2,t3;
create table t1(a int primary key, b int);
create table t2(a int, b int);
create table t3(a int, b int);
insert into t1 values(1,0),(2,0);
insert into t2 values(1,1),(2,2);
insert into t3 values(3,3);
-- test sql
merge /*+ nestloop(t3 t1)) */ into t1 using
(select t2.a as a from t2 group by t2.a) as t3
on (t1.a = t3.a)
when matched then
update set b = t1.b + 1
when not matched then
insert (a,b) values (1,1);
--make sure plan like this, may need to execute "set enable_hashjoin=off"
if necessay
postgres=*# explain merge /*+ nestloop(t3 t1)) */ into t1 using
(select t2.a as a from t2 group by t2.a) as t3
on (t1.a = t3.a)
when matched then
update set b = t1.b + 1
when not matched then
insert (a,b) values (1,1);
postgres-*# postgres-*# postgres-*# postgres-*# postgres-*# postgres-*#
QUERY PLAN
-------------------------------------------------------------------------------
Merge on t1 (cost=38.41..124.75 rows=0 width=0)
-> Nested Loop Left Join (cost=38.41..124.75 rows=200 width=34)
-> Subquery Scan on t3 (cost=38.25..42.25 rows=200 width=32)
-> HashAggregate (cost=38.25..40.25 rows=200 width=4)
Group Key: t2.a
-> Seq Scan on t2 (cost=0.00..32.60 rows=2260
width=4)
-> Index Scan using t1_pkey on t1 (cost=0.15..0.41 rows=1
width=10)
Index Cond: (a = t3.a)
(8 rows)
-- session 1:
postgres=# begin;
BEGIN
postgres=*# explain merge /*+ nestloop(t3 t1)) */ into t1 using
(select t2.a as a from t2 group by t2.a) as t3
on (t1.a = t3.a)
when matched then
update set b = t1.b + 1
when not matched then
insert (a,b) values (1,1);
postgres-*# postgres-*# postgres-*# postgres-*# postgres-*# postgres-*#
QUERY PLAN
-------------------------------------------------------------------------------
Merge on t1 (cost=38.41..124.75 rows=0 width=0)
-> Nested Loop Left Join (cost=38.41..124.75 rows=200 width=34)
-> Subquery Scan on t3 (cost=38.25..42.25 rows=200 width=32)
-> HashAggregate (cost=38.25..40.25 rows=200 width=4)
Group Key: t2.a
-> Seq Scan on t2 (cost=0.00..32.60 rows=2260
width=4)
-> Index Scan using t1_pkey on t1 (cost=0.15..0.41 rows=1
width=10)
Index Cond: (a = t3.a)
(8 rows)
postgres=*# merge /*+ nestloop(t3 t1)) */ into t1 using
(select t2.a as a from t2 group by t2.a) as t3
on (t1.a = t3.a)
when matched then
update set b = t1.b + 1
when not matched then
insert (a,b) values (1,1);
postgres-*# postgres-*# postgres-*# postgres-*# postgres-*# postgres-*#
MERGE 2
postgres=*# end;
COMMIT
-- session 2
postgres=# begin;
BEGIN
postgres=*# explain merge /*+ nestloop(t3 t1)) */ into t1 using
(select t2.a as a from t2 group by t2.a) as t3
on (t1.a = t3.a)
when matched then
update set b = t1.b + 1
when not matched then
insert (a,b) values (1,1);
postgres-*# postgres-*# postgres-*# postgres-*# postgres-*# postgres-*#
QUERY PLAN
-------------------------------------------------------------------------------
Merge on t1 (cost=38.41..124.75 rows=0 width=0)
-> Nested Loop Left Join (cost=38.41..124.75 rows=200 width=34)
-> Subquery Scan on t3 (cost=38.25..42.25 rows=200 width=32)
-> HashAggregate (cost=38.25..40.25 rows=200 width=4)
Group Key: t2.a
-> Seq Scan on t2 (cost=0.00..32.60 rows=2260
width=4)
-> Index Scan using t1_pkey on t1 (cost=0.15..0.41 rows=1
width=10)
Index Cond: (a = t3.a)
(8 rows)
postgres=*# merge /*+ nestloop(t3 t1)) */ into t1 using
(select t2.a as a from t2 group by t2.a) as t3
on (t1.a = t3.a)
when matched then
update set b = t1.b + 1
when not matched then
insert (a,b) values (1,1);
postgres-*# postgres-*# postgres-*# postgres-*# postgres-*# postgres-*#
ERROR: duplicate key value violates unique constraint "t1_pkey"
DETAIL: Key (a)=(1) already exists.
On Mon, 2025-12-15 at 01:40 +0000, PG Bug reporting form wrote: > Start two transaction and update on same tuple, raise concurrent update and > evalplanqual. It will be found out that the session with evalplanqual did > not successfully update the data, but instead attempted to insert a row of > data incorrectly. I'd say that is expected. If you need a guarantee that either INSERT or UPDATE succeed, you have to use INSERT ... ON CONFLICT ... DO UPDATE Yours, Laurenz Albe
The issue is that the MERGE INTO match condition is not updated.
In the MATCHED path of MERGE INTO, when the target row satisfies the match condition and the condition itself has not changed, the system should still be able to handle concurrent updates to the same target row by relying on EvalPlanQual (EPQ) to refetch the latest version of the tuple, and then proceed with the intended update.
However, in the current implementation, even though the concurrent update does not modify any columns relevant to the ON condition, the EPQ recheck unexpectedly results in a match condition failure, causing the update path that should remain MATCHED to be treated as NOT MATCHED.
In the scenario shown above, if no primary key exists, an extra row will be inserted.
Further investigation shows that execution plans using Hash Join do not exhibit this problem.
However, in the current implementation, even though the concurrent update does not modify any columns relevant to the ON condition, the EPQ recheck unexpectedly results in a match condition failure, causing the update path that should remain MATCHED to be treated as NOT MATCHED.
In the scenario shown above, if no primary key exists, an extra row will be inserted.
Further investigation shows that execution plans using Hash Join do not exhibit this problem.
Could you please take a look at my explanation and let me know if anything is inaccurate? I’d also appreciate it if you could check the test scenario I provided. Thanks a lot!
Laurenz Albe <laurenz.albe@cybertec.at> 于2025年12月15日周一 14:25写道:
On Mon, 2025-12-15 at 01:40 +0000, PG Bug reporting form wrote:
> Start two transaction and update on same tuple, raise concurrent update and
> evalplanqual. It will be found out that the session with evalplanqual did
> not successfully update the data, but instead attempted to insert a row of
> data incorrectly.
I'd say that is expected.
If you need a guarantee that either INSERT or UPDATE succeed, you have to use
INSERT ... ON CONFLICT ... DO UPDATE
Yours,
Laurenz Albe
On Mon, 22 Dec 2025 at 14:51, Bh W <wangbihua.cn@gmail.com> wrote:
>
> The issue is that the MERGE INTO match condition is not updated.
> In the MATCHED path of MERGE INTO, when the target row satisfies the match condition and the condition itself has not
changed,the system should still be able to handle concurrent updates to the same target row by relying on EvalPlanQual
(EPQ)to refetch the latest version of the tuple, and then proceed with the intended update.
> However, in the current implementation, even though the concurrent update does not modify any columns relevant to the
ONcondition, the EPQ recheck unexpectedly results in a match condition failure, causing the update path that should
remainMATCHED to be treated as NOT MATCHED.
I spent a little time looking at this, and managed to reduce the
reproducer test case down to this:
-- Setup
drop table if exists t1,t2;
create table t1(a int primary key, b int);
create table t2(a int, b int);
insert into t1 values(1,0),(2,0);
insert into t2 values(1,1),(2,2);
-- Session 1
begin;
update t1 set b = b+1;
-- Session 2
merge into t1 using (values(1,1),(2,2)) as t3(a,b) on (t1.a = t3.a)
when matched then
update set b = t1.b + 1
when not matched then
insert (a,b) values (1,1);
-- Session 1
commit;
This works fine in PG17, but fails with a PK violation in PG18.
Git-bisecting points to this commit:
cbc127917e04a978a788b8bc9d35a70244396d5b is the first bad commit
commit cbc127917e04a978a788b8bc9d35a70244396d5b
Author: Amit Langote <amitlan@postgresql.org>
Date: Fri Feb 7 17:15:09 2025 +0900
Track unpruned relids to avoid processing pruned relations
Doing a little more debugging, it looks like the problem might be this
change in InitPlan():
- /* ignore "parent" rowmarks; they are irrelevant at runtime */
- if (rc->isParent)
+ /*
+ * Ignore "parent" rowmarks, because they are irrelevant at
+ * runtime. Also ignore the rowmarks belonging to child tables
+ * that have been pruned in ExecDoInitialPruning().
+ */
+ if (rc->isParent ||
+ !bms_is_member(rc->rti, estate->es_unpruned_relids))
continue;
which seems to cause it to incorrectly skip a rowmark, which I suspect
is what is causing EvalPlanQual() to return the wrong result.
Regards,
Dean
Hi,
On Tue, Dec 23, 2025 at 4:07 Dean Rasheed <dean.a.rasheed@gmail.com> wrote:
On Mon, 22 Dec 2025 at 14:51, Bh W <wangbihua.cn@gmail.com> wrote:
>
> The issue is that the MERGE INTO match condition is not updated.
> In the MATCHED path of MERGE INTO, when the target row satisfies the match condition and the condition itself has not changed, the system should still be able to handle concurrent updates to the same target row by relying on EvalPlanQual (EPQ) to refetch the latest version of the tuple, and then proceed with the intended update.
> However, in the current implementation, even though the concurrent update does not modify any columns relevant to the ON condition, the EPQ recheck unexpectedly results in a match condition failure, causing the update path that should remain MATCHED to be treated as NOT MATCHED.
I spent a little time looking at this, and managed to reduce the
reproducer test case down to this:
-- Setup
drop table if exists t1,t2;
create table t1(a int primary key, b int);
create table t2(a int, b int);
insert into t1 values(1,0),(2,0);
insert into t2 values(1,1),(2,2);
-- Session 1
begin;
update t1 set b = b+1;
-- Session 2
merge into t1 using (values(1,1),(2,2)) as t3(a,b) on (t1.a = t3.a)
when matched then
update set b = t1.b + 1
when not matched then
insert (a,b) values (1,1);
-- Session 1
commit;
This works fine in PG17, but fails with a PK violation in PG18.
Git-bisecting points to this commit:
cbc127917e04a978a788b8bc9d35a70244396d5b is the first bad commit
commit cbc127917e04a978a788b8bc9d35a70244396d5b
Author: Amit Langote <amitlan@postgresql.org>
Date: Fri Feb 7 17:15:09 2025 +0900
Track unpruned relids to avoid processing pruned relations
Doing a little more debugging, it looks like the problem might be this
change in InitPlan():
- /* ignore "parent" rowmarks; they are irrelevant at runtime */
- if (rc->isParent)
+ /*
+ * Ignore "parent" rowmarks, because they are irrelevant at
+ * runtime. Also ignore the rowmarks belonging to child tables
+ * that have been pruned in ExecDoInitialPruning().
+ */
+ if (rc->isParent ||
+ !bms_is_member(rc->rti, estate->es_unpruned_relids))
continue;
which seems to cause it to incorrectly skip a rowmark, which I suspect
is what is causing EvalPlanQual() to return the wrong result.
Thanks for the detailed analysis and adding me to the thread, Dean.
I would think that a case that involves no partitioning at all would be untouchable by this code, but it looks like the logic I added is incorrectly affecting cases where pruning isn’t even relevant. I’ll need to look more carefully at why such a rowmark would exist in the rowmarks list if its relation isn’t in es_unpruned_relids. Maybe the set population is incorrect at some point, or perhaps it matters that the set is a copy in the EPQ estate.
I’m afk (on vacation) at the moment, so won’t be able to dig into this until next week.
— Amit
Amit Langote <amitlangote09@gmail.com> 于2025年12月24日周三 16:08写道:
Hi,On Tue, Dec 23, 2025 at 4:07 Dean Rasheed <dean.a.rasheed@gmail.com> wrote:On Mon, 22 Dec 2025 at 14:51, Bh W <wangbihua.cn@gmail.com> wrote:
>
> The issue is that the MERGE INTO match condition is not updated.
> In the MATCHED path of MERGE INTO, when the target row satisfies the match condition and the condition itself has not changed, the system should still be able to handle concurrent updates to the same target row by relying on EvalPlanQual (EPQ) to refetch the latest version of the tuple, and then proceed with the intended update.
> However, in the current implementation, even though the concurrent update does not modify any columns relevant to the ON condition, the EPQ recheck unexpectedly results in a match condition failure, causing the update path that should remain MATCHED to be treated as NOT MATCHED.
I spent a little time looking at this, and managed to reduce the
reproducer test case down to this:
-- Setup
drop table if exists t1,t2;
create table t1(a int primary key, b int);
create table t2(a int, b int);
insert into t1 values(1,0),(2,0);
insert into t2 values(1,1),(2,2);
-- Session 1
begin;
update t1 set b = b+1;
-- Session 2
merge into t1 using (values(1,1),(2,2)) as t3(a,b) on (t1.a = t3.a)
when matched then
update set b = t1.b + 1
when not matched then
insert (a,b) values (1,1);
-- Session 1
commit;
This works fine in PG17, but fails with a PK violation in PG18.
Git-bisecting points to this commit:
cbc127917e04a978a788b8bc9d35a70244396d5b is the first bad commit
commit cbc127917e04a978a788b8bc9d35a70244396d5b
Author: Amit Langote <amitlan@postgresql.org>
Date: Fri Feb 7 17:15:09 2025 +0900
Track unpruned relids to avoid processing pruned relations
Doing a little more debugging, it looks like the problem might be this
change in InitPlan():
- /* ignore "parent" rowmarks; they are irrelevant at runtime */
- if (rc->isParent)
+ /*
+ * Ignore "parent" rowmarks, because they are irrelevant at
+ * runtime. Also ignore the rowmarks belonging to child tables
+ * that have been pruned in ExecDoInitialPruning().
+ */
+ if (rc->isParent ||
+ !bms_is_member(rc->rti, estate->es_unpruned_relids))
continue;
which seems to cause it to incorrectly skip a rowmark, which I suspect
is what is causing EvalPlanQual() to return the wrong result.Thanks for the detailed analysis and adding me to the thread, Dean.
I would think that a case that involves no partitioning at all would be untouchable by this code, but it looks like the logic I added is incorrectly affecting cases where pruning isn’t even relevant. I’ll need to look more carefully at why such a rowmark would exist in the rowmarks list if its relation isn’t in es_unpruned_relids. Maybe the set population is incorrect at some point, or perhaps it matters that the set is a copy in the EPQ estate.
I did some debugging, and I found that:
In add_rte_to_flat_rtable(), the RTE of value was not added into glob->AllRelids, because below codes:
.....
if (newrte->rtekind == RTE_RELATION ||
(newrte->rtekind == RTE_SUBQUERY && OidIsValid(newrte->relid)))
{
glob->relationOids = lappend_oid(glob->relationOids, newrte->relid);
glob->allRelids = bms_add_member(glob->allRelids,
list_length(glob->finalrtable));
}
(newrte->rtekind == RTE_SUBQUERY && OidIsValid(newrte->relid)))
{
glob->relationOids = lappend_oid(glob->relationOids, newrte->relid);
glob->allRelids = bms_add_member(glob->allRelids,
list_length(glob->finalrtable));
}
....
The VALUE rte was not satisfied above if, so it was not added into the glob->allRelids.
Then in standard_planner(), we have:
....
result->unprunableRelids = bms_difference(glob->allRelids,
glob->prunableRelids);
glob->prunableRelids);
....
So the result->unprunableRelids contains only 1.
In InitPlan(), we have:
.....
/*
* Ignore "parent" rowmarks, because they are irrelevant at
* runtime. Also ignore the rowmarks belonging to child tables
* that have been pruned in ExecDoInitialPruning().
*/
if (rc->isParent ||
!bms_is_member(rc->rti, estate->es_unpruned_relids))
continue;
* Ignore "parent" rowmarks, because they are irrelevant at
* runtime. Also ignore the rowmarks belonging to child tables
* that have been pruned in ExecDoInitialPruning().
*/
if (rc->isParent ||
!bms_is_member(rc->rti, estate->es_unpruned_relids))
continue;
.....
the estate->es_unpruned_relids equals with result->unprunableRelids contains. So the rowMark was skipped incorrectly.
I did a quick fix as the attached patch.
Any thoughts?
Thanks,
Tender Wang