Discussion: BUG #18986: SIGSEGV in nodeModifyTable.c during Parallel Execution
The following bug has been logged on the website:

Bug reference:      18986
Logged by:          Yaroslav Syrytsia
Email address:      y@lll.gd
PostgreSQL version: 17.5
Operating system:   Ubuntu 22.04.5
Description:

Hello,

During testing a flow in an app, I encountered a segfault that only happens
during parallel execution:
https://gist.github.com/joy4eg/708458e204f52129a8e54a13534586b7

Looking into the coredump, it looks like a NULL pointer dereference is
happening at
https://github.com/postgres/postgres/blob/5e2f3df49d4298c6097789364a5a53be172f6e85/src/backend/executor/nodeModifyTable.c#L752

After further investigation, that part of the code looks suspicious as well:

static void
ExecInitInsertProjection(ModifyTableState *mtstate,
                         ResultRelInfo *resultRelInfo)
{
    ...
    /* Build ProjectionInfo if needed (it probably isn't). */
    if (need_projection)
    {
        TupleDesc   relDesc = RelationGetDescr(resultRelInfo->ri_RelationDesc);

        /* need an expression context to do the projection */
        if (mtstate->ps.ps_ExprContext == NULL)
            ExecAssignExprContext(estate, &mtstate->ps);

        resultRelInfo->ri_projectNew =
            ExecBuildProjectionInfo(insertTargetList,
                                    mtstate->ps.ps_ExprContext,
                                    resultRelInfo->ri_newTupleSlot,
                                    &mtstate->ps,
                                    relDesc);
    }

    resultRelInfo->ri_projectNewInfoValid = true;   /* <-- set to true even
                                                     * when need_projection is
                                                     * false, leaving
                                                     * ri_projectNew NULL */
}

(gdb) frame 0
#0  ExecGetUpdateNewTuple (oldSlot=0x55defb35ae20, planSlot=0x55defb41d9d8,
    relinfo=0x55defb312108)
    at executor/./build/../src/backend/executor/nodeModifyTable.c:752
752         econtext = newProj->pi_exprContext;
(gdb) p relinfo
$1 = (ResultRelInfo *) 0x55defb312108
(gdb) p *relinfo
$2 = {type = T_ResultRelInfo, ri_RangeTableIndex = 15,
  ri_RelationDesc = 0x7f0c0eb80500, ri_NumIndices = 21,
  ri_IndexRelationDescs = 0x55defb358248,
  ri_IndexRelationInfo = 0x55defb358350, ri_RowIdAttNo = 1,
  ri_extraUpdatedCols = 0x55defb357300, ri_projectNew = 0x0,
  ri_newTupleSlot = 0x55defb323cf0, ri_oldTupleSlot = 0x55defb323ae8,
  ri_projectNewInfoValid = true, ri_needLockTagTuple = false,
  ri_TrigDesc = 0x55defb312310, ri_TrigFunctions = 0x55defb31a318,
  ri_TrigWhenExprs = 0x55defb2d8cf8, ri_TrigInstrument = 0x0,
  ri_ReturningSlot = 0x0, ri_TrigOldSlot = 0x55defb35ae20,
  ri_TrigNewSlot = 0x0, ri_FdwRoutine = 0x0, ri_FdwState = 0x0,
  ri_usesFdwDirectModify = false, ri_NumSlots = 0,
  ri_NumSlotsInitialized = 0, ri_BatchSize = 0, ri_Slots = 0x0,
  ri_PlanSlots = 0x0, ri_WithCheckOptions = 0x0,
  ri_WithCheckOptionExprs = 0x0, ri_ConstraintExprs = 0x0,
  ri_GeneratedExprsI = 0x0, ri_GeneratedExprsU = 0x55defb358140,
  ri_NumGeneratedNeededI = 0, ri_NumGeneratedNeededU = 2,
  ri_returningList = 0x55defb39c630, ri_projectReturning = 0x55defb323590,
  ri_onConflictArbiterIndexes = 0x0, ri_onConflict = 0x0,
  ri_MergeActions = {0x55defb324518, 0x0, 0x55defb328e50},
  ri_MergeJoinCondition = 0x0, ri_PartitionCheckExpr = 0x0,
  ri_ChildToRootMap = 0x0, ri_ChildToRootMapValid = false,
  ri_RootToChildMap = 0x0, ri_RootToChildMapValid = false,
  ri_RootResultRelInfo = 0x0, ri_PartitionTupleSlot = 0x0,
  ri_CopyMultiInsertBuffer = 0x0, ri_ancestorResultRels = 0x0}

Where:
- ri_projectNewInfoValid = true
- ri_projectNew = 0x0

Versions tested:
- 17.5 (Ubuntu, apt, with and without pgroonga)
- postgres:18beta1-bookworm (Docker, without pgroonga)
- postgres:17.5-bookworm (Docker, without pgroonga)

Yaroslav.
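P.S. For reference, the function at the crash site (lightly abridged here
from the nodeModifyTable.c link above, with annotations added) shows why
this combination of fields segfaults:

TupleTableSlot *
ExecGetUpdateNewTuple(ResultRelInfo *relinfo,
                      TupleTableSlot *planSlot,
                      TupleTableSlot *oldSlot)
{
    ProjectionInfo *newProj = relinfo->ri_projectNew;
    ExprContext *econtext;

    /* Passes: ri_projectNewInfoValid is true in the coredump */
    Assert(relinfo->ri_projectNewInfoValid);

    /* newProj is NULL (ri_projectNew = 0x0), so this load is the SIGSEGV */
    econtext = newProj->pi_exprContext;
    econtext->ecxt_outertuple = planSlot;
    econtext->ecxt_scantuple = oldSlot;
    return ExecProject(newProj);
}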
PG Bug reporting form <noreply@postgresql.org> writes:
> During testing a flow in an app, I encountered a segfault that only happens
> during parallel execution:
> https://gist.github.com/joy4eg/708458e204f52129a8e54a13534586b7

This report isn't terribly helpful, since you have not explained
how to trigger the crash. If we can't duplicate it, it's hard
to decide what is the appropriate fix.

			regards, tom lane
On Tue, Jul 15, 2025 at 02:13:53PM -0400, Tom Lane wrote:
> PG Bug reporting form <noreply@postgresql.org> writes:
> > During testing a flow in an app, I encountered a segfault that only happens
> > during parallel execution:
> > https://gist.github.com/joy4eg/708458e204f52129a8e54a13534586b7
>
> This report isn't terribly helpful, since you have not explained
> how to trigger the crash. If we can't duplicate it, it's hard
> to decide what is the appropriate fix.

For the sake of visibility, in case the information on GitHub gets lost:
the query can be summarized by this pattern, but we don't have any
information about the end of it:

WITH replaced AS (
    DELETE FROM replaceable_events_before_update
    WHERE replaced_by_id = ANY($1)
    RETURNING attr_list_1, [...]
),
source_data (attr_list_2) AS (
    SELECT * from replaced
    UNION ALL
    VALUES (attr_list_3 ...

That's most likely a MERGE generated by an ORM, given this many parameters
and columns listed. Without the full pattern or an idea of the schema
(this involves triggers), what we can do is limited, as Tom says.

And the relevant portion of the backtrace:

#7  ExecGetUpdateNewTuple (oldSlot=0x55b0842d0288, planSlot=0x55b084249d68,
    relinfo=0x55b0836a9b88)
    at executor/./build/../src/backend/executor/nodeModifyTable.c:752
#8  ExecBRUpdateTriggers (estate=0x55b08369a708, epqstate=0x55b0836a9a68,
    relinfo=0x55b0836a9b88, tupleid=0x7fffc373628a, fdw_trigtuple=0x0,
    newslot=0x55b083720eb0, tmresult=0x7fffc37361d0, tmfd=0x7fffc37362e0)
    at commands/./build/../src/backend/commands/trigger.c:2979
#9  0x000055b08158628e in ExecUpdatePrologue (context=context@entry=0x7fffc37362c0,
    resultRelInfo=resultRelInfo@entry=0x55b0836a9b88,
    tupleid=tupleid@entry=0x7fffc373628a, oldtuple=oldtuple@entry=0x0,
    slot=slot@entry=0x55b083720eb0, result=result@entry=0x7fffc37361d0)
    at executor/./build/../src/backend/executor/nodeModifyTable.c:1949
#10 0x000055b081587e75 in ExecMergeMatched (context=<optimized out>,
    resultRelInfo=<optimized out>, tupleid=0x7fffc373628a, oldtuple=0x0,
    canSetTag=false, matched=0x7fffc3736290)
    at executor/./build/../src/backend/executor/nodeModifyTable.c:3023

If you could extract the query string from the backtrace, that would be
one piece of information. With triggers and MERGE in mind, it may be
possible to reproduce something.
--
Michael
Hello,

I made a small reproducer here: https://github.com/joy4eg/pg-merge-crash

It creates a few tables and then executes MERGE statements in parallel.

Yaroslav.

> On 2025-07-15 20:13 CEST Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> PG Bug reporting form <noreply@postgresql.org> writes:
> > During testing a flow in an app, I encountered a segfault that only happens
> > during parallel execution:
> > https://gist.github.com/joy4eg/708458e204f52129a8e54a13534586b7
>
> This report isn't terribly helpful, since you have not explained
> how to trigger the crash. If we can't duplicate it, it's hard
> to decide what is the appropriate fix.
>
> 			regards, tom lane
On Wed, 16 Jul 2025 at 01:10, Michael Paquier <michael@paquier.xyz> wrote:
>
> And the relevant portion of the backtrace:
>
> #7  ExecGetUpdateNewTuple (oldSlot=0x55b0842d0288, planSlot=0x55b084249d68,
>     relinfo=0x55b0836a9b88)
>     at executor/./build/../src/backend/executor/nodeModifyTable.c:752
> #8  ExecBRUpdateTriggers (estate=0x55b08369a708, epqstate=0x55b0836a9a68,
>     relinfo=0x55b0836a9b88, tupleid=0x7fffc373628a, fdw_trigtuple=0x0,
>     newslot=0x55b083720eb0, tmresult=0x7fffc37361d0, tmfd=0x7fffc37362e0)
>     at commands/./build/../src/backend/commands/trigger.c:2979
> #9  0x000055b08158628e in ExecUpdatePrologue (context=context@entry=0x7fffc37362c0,
>     resultRelInfo=resultRelInfo@entry=0x55b0836a9b88,
>     tupleid=tupleid@entry=0x7fffc373628a, oldtuple=oldtuple@entry=0x0,
>     slot=slot@entry=0x55b083720eb0, result=result@entry=0x7fffc37361d0)
>     at executor/./build/../src/backend/executor/nodeModifyTable.c:1949
> #10 0x000055b081587e75 in ExecMergeMatched (context=<optimized out>,
>     resultRelInfo=<optimized out>, tupleid=0x7fffc373628a, oldtuple=0x0,
>     canSetTag=false, matched=0x7fffc3736290)
>     at executor/./build/../src/backend/executor/nodeModifyTable.c:3023
Hmm, I can see the problem. This particular stack trace should never happen:
#7 ExecGetUpdateNewTuple
#8 ExecBRUpdateTriggers
#9 ExecUpdatePrologue
#10 ExecMergeMatched
ExecBRUpdateTriggers() only calls ExecGetUpdateNewTuple() if
GetTupleForTrigger() sets epqslot_candidate to a non-null value, and
GetTupleForTrigger() should only set epqslot_candidate (*epqslot) to
non-null if we're not doing a MERGE (see commit 9321c79).
The problem, however, is that GetTupleForTrigger() tests for MERGE by doing

    if (estate->es_plannedstmt->commandType != CMD_MERGE)

which doesn't work if the MERGE is inside a CTE. So we need a different
way for ExecBRUpdateTriggers() / GetTupleForTrigger() to know that it's
being called from within a MERGE.
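To make that concrete, consider a query of this shape (a hypothetical
sketch; table and column names invented):

    WITH m AS (
        MERGE INTO target t USING source s ON t.id = s.id
        WHEN MATCHED THEN UPDATE SET val = s.val
        RETURNING t.id
    )
    SELECT count(*) FROM m;

Here estate->es_plannedstmt describes the outer SELECT, so its commandType
is CMD_SELECT rather than CMD_MERGE, and the test above takes the
non-MERGE branch. GetTupleForTrigger() then returns an EPQ slot candidate
as if this were a plain UPDATE, which sends ExecBRUpdateTriggers() into
ExecGetUpdateNewTuple() on a ResultRelInfo whose update projection was
never built.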
Regards,
Dean
On Wed, 16 Jul 2025 at 10:55, Dean Rasheed <dean.a.rasheed@gmail.com> wrote:
>
> The problem, however, is that GetTupleForTrigger() tests for MERGE by doing
>
>     if (estate->es_plannedstmt->commandType != CMD_MERGE)
>
> which doesn't work if the MERGE is inside a CTE. So we need a different
> way for ExecBRUpdateTriggers() / GetTupleForTrigger() to know that it's
> being called from within a MERGE.

Attached is a reproducer using the merge-match-recheck isolation test.

I believe that the bug only goes back to v17, because MERGE could not
appear inside a CTE prior to that.

I think the best thing to do is to pass the commandType to
ExecBRUpdateTriggers(), except that in v17 we shouldn't change the
signature of ExecBRUpdateTriggers(), so a wrapper function will be
needed, similar to what 9321c79 did.

Question: is it OK to change the signature of ExecBRUpdateTriggers()
in v18?

Regards,
Dean
On Wed, Jul 16, 2025 at 11:42:33AM +0100, Dean Rasheed wrote:
> Attached is a reproducer using the merge-match-recheck isolation test.
>
> I believe that the bug only goes back to v17, because MERGE could not
> appear inside a CTE prior to that.

That was fast, nice!

> I think the best thing to do is to pass the commandType to
> ExecBRUpdateTriggers(), except that in v17 we shouldn't change the
> signature of ExecBRUpdateTriggers(), so a wrapper function will be
> needed, similar to what 9321c79 did.

Yes, changing ExecBRUpdateTriggers() would not be a good idea on a
stable branch. I can see that at least timescaledb calls it directly,
so a minor-upgrade breakage would be bad for them.

> Question: is it OK to change the signature of ExecBRUpdateTriggers()
> in v18?

We are still in beta, so that's not a problem for v18 and HEAD.
--
Michael
On Wed, 16 Jul 2025 at 11:49, Michael Paquier <michael@paquier.xyz> wrote:
>
> > Question: is it OK to change the signature of ExecBRUpdateTriggers()
> > in v18?
>
> We are still in beta, so that's not a problem for v18 and HEAD.

Cool. I thought so, but I wanted to check.

Attached is a patch for HEAD/v18, and a slightly different one for v17,
preserving the trigger ABI in the standard way.

I decided to do this by adding an extra "is_merge_update" boolean
parameter, rather than passing the commandType, because that looked
slightly neater. It was also necessary to update ExecBRDeleteTriggers(),
since otherwise a concurrent MERGE DELETE could do the wrong thing
(execute the wrong action, rather than seg-faulting). That was picked up
by an existing isolation test case added by 9321c79, so there's no need
for more tests.

I haven't tested this against the OP's reproducer.

Regards,
Dean
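For anyone reading along without the patch files, the v17 patch follows
the usual back-branch ABI-preserving shape, roughly like this (a sketch
only; the new function's name is invented here, not necessarily what the
patch uses):

/* New entry point carrying the extra flag (name invented for this
 * sketch); its body is the existing ExecBRUpdateTriggers() logic, with
 * is_merge_update passed down to GetTupleForTrigger() in place of the
 * es_plannedstmt->commandType check. */
extern bool ExecBRUpdateTriggersNew(EState *estate, EPQState *epqstate,
                                    ResultRelInfo *relinfo,
                                    ItemPointer tupleid,
                                    HeapTuple fdw_trigtuple,
                                    TupleTableSlot *newslot,
                                    TM_Result *tmresult,
                                    TM_FailureData *tmfd,
                                    bool is_merge_update);

/* Old signature kept as a thin wrapper, so extensions compiled against
 * v17 (e.g. timescaledb) keep working across a minor release. */
bool
ExecBRUpdateTriggers(EState *estate, EPQState *epqstate,
                     ResultRelInfo *relinfo,
                     ItemPointer tupleid, HeapTuple fdw_trigtuple,
                     TupleTableSlot *newslot, TM_Result *tmresult,
                     TM_FailureData *tmfd)
{
    return ExecBRUpdateTriggersNew(estate, epqstate, relinfo, tupleid,
                                   fdw_trigtuple, newslot, tmresult, tmfd,
                                   false);
}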
On Wed, Jul 16, 2025 at 02:05:06PM +0100, Dean Rasheed wrote:
> I decided to do this by adding an extra "is_merge_update" boolean
> parameter, rather than passing the commandType, because that looked
> slightly neater. It was also necessary to update ExecBRDeleteTriggers(),
> since otherwise a concurrent MERGE DELETE could do the wrong thing
> (execute the wrong action, rather than seg-faulting). That was picked up
> by an existing isolation test case added by 9321c79, so there's no need
> for more tests.

Looks sensible here, for both the HEAD and v17 flavors.

> I haven't tested this against the OP's reproducer.

Other than waiting for the OP to actually test the fix, I'm not sure
there is much more we can do. Anyway, the stack that one gets with the
isolation test is very close to what the OP has reported, so I'm ready
to bet a coffee that you have managed to narrow this one down.

I have also raised an issue with timescale to make them aware of the
problem. I have no idea if this affects them; I'm just acting as a
reporter:
https://github.com/timescale/timescaledb/issues/8392
--
Michael
On Thu, 17 Jul 2025 at 06:45, Michael Paquier <michael@paquier.xyz> wrote:
>
> On Wed, Jul 16, 2025 at 02:05:06PM +0100, Dean Rasheed wrote:
> > I decided to do this by adding an extra "is_merge_update" boolean
> > parameter, rather than passing the commandType, because that looked
> > slightly neater. It was also necessary to update ExecBRDeleteTriggers(),
> > since otherwise a concurrent MERGE DELETE could do the wrong thing
> > (execute the wrong action, rather than seg-faulting). That was picked up
> > by an existing isolation test case added by 9321c79, so there's no need
> > for more tests.
>
> Looks sensible here, for both the HEAD and v17 flavors.

Thanks for looking.

> > I haven't tested this against the OP's reproducer.
>
> Other than waiting for the OP to actually test the fix, I'm not sure
> there is much more we can do.

I'll push this in a day or so in any case, since it's clearly fixing
*an* issue, even if it doesn't entirely fix the OP's issue.

> Anyway, the stack that one gets with the isolation test is very close
> to what the OP has reported, so I'm ready to bet a coffee that you
> have managed to narrow this one down.

Yeah, that was my thinking too.

> I have also raised an issue with timescale to make them aware of the
> problem. I have no idea if this affects them; I'm just acting as a
> reporter:
> https://github.com/timescale/timescaledb/issues/8392

Cool, thanks for doing that.

Regards,
Dean
On Thu, 17 Jul 2025 at 07:46, Dean Rasheed <dean.a.rasheed@gmail.com> wrote:
>
> I'll push this in a day or so in any case, since it's clearly fixing
> *an* issue, even if it doesn't entirely fix the OP's issue.

Pushed.

Regards,
Dean
Hello,
I just made a fresh build:
commit 91ad1bdef8e46983d82d2723c6e1c05e004f74c5 (HEAD -> 17_stable, origin/REL_17_STABLE)
Author: Dean Rasheed <dean.a.rasheed@gmail.com>
Date:   Fri Jul 18 10:01:31 2025 +0100

    Fix concurrent update trigger issues with MERGE in a CTE.
Can confirm that the issue is resolved now.
Thanks!
Yaroslav.
On Jul 18 2025, at 11:19 am, Dean Rasheed <dean.a.rasheed@gmail.com> wrote:

> On Thu, 17 Jul 2025 at 07:46, Dean Rasheed <dean.a.rasheed@gmail.com> wrote:
> >
> > I'll push this in a day or so in any case, since it's clearly fixing
> > *an* issue, even if it doesn't entirely fix the OP's issue.
>
> Pushed.
>
> Regards,
> Dean