Обсуждение: BUG #19111: Using EXPLAIN ANALYZE with MERGE causes failed assert
The following bug has been logged on the website:
Bug reference: 19111
Logged by: Oleg Tselebrovskiy
Email address: o.tselebrovskiy@postgrespro.ru
PostgreSQL version: 17.6
Operating system: Ubuntu 22.04
Description:
On PostgreSQL 17+ if you do the following:
-- first psql
CREATE TABLE target (key int primary key, val text);
INSERT INTO target VALUES (1, 'setup1');
CREATE TABLE pa_target (key integer, val text)
PARTITION BY LIST (key);
CREATE TABLE part1 (key integer, val text);
CREATE TABLE part2 (val text, key integer);
CREATE TABLE part3 (key integer, val text);
ALTER TABLE pa_target ATTACH PARTITION part1 FOR VALUES IN (1,4);
ALTER TABLE pa_target ATTACH PARTITION part2 FOR VALUES IN (2,5,6);
ALTER TABLE pa_target ATTACH PARTITION part3 DEFAULT;
INSERT INTO pa_target VALUES (1, 'initial');
INSERT INTO pa_target VALUES (2, 'initial');
BEGIN ISOLATION LEVEL READ COMMITTED;
MERGE INTO target t
USING (SELECT 1 as key, 'merge1' as val) s
ON s.key = t.key
WHEN NOT MATCHED THEN
INSERT VALUES (s.key, s.val)
WHEN MATCHED THEN
UPDATE set key = t.key + 1, val = t.val || ' updated by ' || s.val;
-- second psql
BEGIN ISOLATION LEVEL READ COMMITTED;
EXPLAIN ANALYZE MERGE INTO target t
USING (SELECT 1 as key, 'merge2a' as val) s
ON s.key = t.key
WHEN NOT MATCHED THEN
INSERT VALUES (s.key, s.val)
WHEN MATCHED THEN
UPDATE set key = t.key + 1, val = t.val || ' updated by ' || s.val
WHEN NOT MATCHED BY SOURCE THEN
UPDATE set key = t.key + 1, val = t.val || ' source not matched by
merge2a'
RETURNING merge_action(), t.*;
-- first psql again
COMMIT;
Then the backend for the second psql crashes. With asserts turned on:
TRAP: failed Assert("skipped_path >= 0"), File: "explain.c", Line: 4732,
PID: 66557
postgres: postgres postgres [local]
EXPLAIN(ExceptionalCondition+0xbb)[0x5cd80c4ea887]
postgres: postgres postgres [local] EXPLAIN(+0x34feb6)[0x5cd80bf13eb6]
postgres: postgres postgres [local] EXPLAIN(+0x349ceb)[0x5cd80bf0dceb]
postgres: postgres postgres [local]
EXPLAIN(ExplainPrintPlan+0x1f5)[0x5cd80bf09804]
postgres: postgres postgres [local]
EXPLAIN(ExplainOnePlan+0x43f)[0x5cd80bf0907d]
postgres: postgres postgres [local]
EXPLAIN(standard_ExplainOneQuery+0x30f)[0x5cd80bf08845]
postgres: postgres postgres [local] EXPLAIN(+0x344530)[0x5cd80bf08530]
postgres: postgres postgres [local]
EXPLAIN(ExplainQuery+0x15e)[0x5cd80bf08198]
postgres: postgres postgres [local]
EXPLAIN(standard_ProcessUtility+0xa8a)[0x5cd80c2ec462]
postgres: postgres postgres [local]
EXPLAIN(ProcessUtility+0x13a)[0x5cd80c2eb9d1]
postgres: postgres postgres [local] EXPLAIN(+0x72628a)[0x5cd80c2ea28a]
postgres: postgres postgres [local] EXPLAIN(+0x725fc7)[0x5cd80c2e9fc7]
postgres: postgres postgres [local] EXPLAIN(PortalRun+0x249)[0x5cd80c2e9890]
postgres: postgres postgres [local] EXPLAIN(+0x71e04c)[0x5cd80c2e204c]
postgres: postgres postgres [local]
EXPLAIN(PostgresMain+0xb43)[0x5cd80c2e7843]
postgres: postgres postgres [local] EXPLAIN(+0x719525)[0x5cd80c2dd525]
postgres: postgres postgres [local]
EXPLAIN(postmaster_child_launch+0x174)[0x5cd80c1d8676]
postgres: postgres postgres [local] EXPLAIN(+0x61b1d7)[0x5cd80c1df1d7]
postgres: postgres postgres [local] EXPLAIN(+0x6186cd)[0x5cd80c1dc6cd]
postgres: postgres postgres [local]
EXPLAIN(PostmasterMain+0x159e)[0x5cd80c1dbf97]
postgres: postgres postgres [local] EXPLAIN(main+0x38c)[0x5cd80c072cca]
/lib/x86_64-linux-gnu/libc.so.6(+0x29d90)[0x77c287829d90]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x80)[0x77c287829e40]
postgres: postgres postgres [local] EXPLAIN(_start+0x25)[0x5cd80bcaa2c5]
On Thu, 13 Nov 2025 at 12:18, PG Bug reporting form <noreply@postgresql.org> wrote: > > On PostgreSQL 17+ if you do the following: > ... > Then the backend for the second psql crashes. With asserts turned on. Thanks for the report. What's happening here is that the MERGE in the second query has both NOT MATCHED BY SOURCE and NOT MATCHED BY TARGET actions, so it does a full join between the two tables. Initially there is a single matched row, but the concurrent update turns that into a not matched pair of rows and both actions are executed. So the ModifyTable node processes it as 2 rows, whereas its parent node only outputs 1 row, which is something the explain code doesn't like (because it computes the difference, interpreting that as the number of rows skipped). A possible solution would be something like the attached. It feels a little ugly, but I don't see any other easy fix. It's only a rough patch (it should have an isolation test case), but it fixes the problem by causing the parent (full join) node to report that it returned 2 rows, which it didn't really, but it would have done, if the other update had happened before the MERGE, rather than concurrently. (Of course we could just drop that Assert, and it wouldn't cause any harm, but it seems preferable to try to get the row counts right.) Regards, Dean
Вложения
On Thu, Nov 13, 2025 at 9:02 PM Dean Rasheed <dean.a.rasheed@gmail.com> wrote:
>
> On Thu, 13 Nov 2025 at 12:18, PG Bug reporting form
> <noreply@postgresql.org> wrote:
> >
> > On PostgreSQL 17+ if you do the following:
> > ...
> > Then the backend for the second psql crashes. With asserts turned on.
>
> Thanks for the report.
>
> What's happening here is that the MERGE in the second query has both
> NOT MATCHED BY SOURCE and NOT MATCHED BY TARGET actions, so it does a
> full join between the two tables. Initially there is a single matched
> row, but the concurrent update turns that into a not matched pair of
> rows and both actions are executed. So the ModifyTable node processes
> it as 2 rows, whereas its parent node only outputs 1 row, which is
> something the explain code doesn't like (because it computes the
> difference, interpreting that as the number of rows skipped).
>
> A possible solution would be something like the attached. It feels a
> little ugly, but I don't see any other easy fix.
>
> It's only a rough patch (it should have an isolation test case), but
> it fixes the problem by causing the parent (full join) node to report
> that it returned 2 rows, which it didn't really, but it would have
> done, if the other update had happened before the MERGE, rather than
> concurrently.
IMHO it makes sense to make a full join node to report 2 rows because
if you see internally merge is making the behavior as if it would have
returned 2 rows by parent [1], so I think it's right to fix the
instrument to report that, otherwise the plan might look confusing.
OTOH, someone might argue that we should just show in instrument what
really happened that the parent returned just 1 row and then it got
converted to 2 actions, and for doing that we may just remove the
assert.
I have attached an isolation test for the same.
[1]
if (!matched)
{
/*
* If a concurrent update turned a MATCHED case into a NOT MATCHED
* case, and we have both WHEN NOT MATCHED BY SOURCE and WHEN NOT
* MATCHED [BY TARGET] actions, and there is a RETURNING clause,
* ExecMergeMatched() may have already executed a WHEN NOT MATCHED BY
* SOURCE action, and computed the row to return. If so, we cannot
* execute a WHEN NOT MATCHED [BY TARGET] action now, so mark it as
* pending (to be processed on the next call to ExecModifyTable()).
* Otherwise, just process the action now.
*/
if (rslot == NULL)
rslot = ExecMergeNotMatched(context, resultRelInfo, canSetTag);
else
context->mtstate->mt_merge_pending_not_matched = context->planSlot;
}
--
Regards,
Dilip Kumar
Google
Вложения
On Sun, 16 Nov 2025 at 13:21, Dilip Kumar <dilipbalaut@gmail.com> wrote: > > On Thu, Nov 13, 2025 at 9:02 PM Dean Rasheed <dean.a.rasheed@gmail.com> wrote: > > > > A possible solution would be something like the attached. It feels a > > little ugly, but I don't see any other easy fix. > > > > It's only a rough patch (it should have an isolation test case), but > > it fixes the problem by causing the parent (full join) node to report > > that it returned 2 rows, which it didn't really, but it would have > > done, if the other update had happened before the MERGE, rather than > > concurrently. > > IMHO it makes sense to make a full join node to report 2 rows because > if you see internally merge is making the behavior as if it would have > returned 2 rows by parent [1], so I think it's right to fix the > instrument to report that, otherwise the plan might look confusing. > OTOH, someone might argue that we should just show in instrument what > really happened that the parent returned just 1 row and then it got > converted to 2 actions, and for doing that we may just remove the > assert. After thinking about this some more, I decided that it's preferable to report 2 rows rather than just removing the Assert because, in a more complex MERGE with conditional or DO NOTHING actions, there might be rows that are skipped, and doing it this way ensures that EXPLAIN ANALYZE correctly reports the number of rows skipped. > I have attached an isolation test for the same. I decided to add isolation tests to mege-update.spec, rather than adding a new spec file, because it already had the necessary setup, so the new tests just EXPLAIN ANALYZE existing test cases for which the expected results are already known. In addition, I added a function to filter the memory usage from the EXPLAIN output, because that could vary by platform (see similar code in other regression tests). Pushed, and back-patched to v17. Regards, Dean