Re: BUG #16213: segfault when running a query

From: Tom Lane
Subject: Re: BUG #16213: segfault when running a query
Date:
Msg-id: 635.1579233826@sss.pgh.pa.us
In response to: BUG #16213: segfault when running a query  (PG Bug reporting form <noreply@postgresql.org>)
List: pgsql-bugs
PG Bug reporting form <noreply@postgresql.org> writes:
> The above query produces an error in the server log:
> LOG:  server process (PID 108) was terminated by signal 11: Segmentation
> fault

Yeah, duplicated here.  (For anyone following along at home: you
can "create domain string as text", or just s/STRING/TEXT/g in the
given query.  That type's not relevant to the problem.)  The problem
is probably much more easily reproduced in a debug build, because it
boils down to a dangling-pointer bug.  I duplicated it back to 9.4,
and it's probably older than that.

The direct cause of the crash is that by the time we get to ExecutorEnd,
there are dangling pointers in the es_tupleTable list.  Those pointers
turn out to originate from ExecInitSubPlan's creation of TupleTableSlots
for the ProjectionInfo objects it creates when doing hashing.  And the
reason they're dangling is that the subplan is inside a VALUES list,
and nodeValuesscan.c does this remarkably bletcherous thing:

         * Build the expression eval state in the econtext's per-tuple memory.
         * This is a tad unusual, but we want to delete the eval state again
         * when we move to the next row, to avoid growth of memory
         * requirements over a long values list.
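
(For reference, the pattern that comment describes looks roughly like this,
as a simplified fragment rather than the exact nodeValuesscan.c code:

    oldContext = MemoryContextSwitchTo(econtext->ecxt_per_tuple_memory);
    /* Everything built here, including any SubPlanState and the slots it
     * creates, lives only until this per-tuple context is reset for the
     * next row. */
    exprstatelist = ExecInitExprList(exprlist, &node->ss.ps);
    MemoryContextSwitchTo(oldContext);

So any pointer that escapes from that init step into longer-lived state,
such as a slot hooked into the estate's es_tupleTable, is left dangling
as soon as we advance to the next row.)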

It turns out that just below that, there's already some messy hacking
to deal with subplans in the VALUES list.  But I guess we'd not hit
the case of a subplan using hashing within VALUES.

The attached draft patch fixes this by not letting ExecInitSubPlan hook
the slots it's making into the outer es_tupleTable list.  Ordinarily
that would be bad because it exposes us to leaking resources, if the
slots aren't cleared before ending execution.  But nodeSubplan.c is
already being (mostly) careful to clear those slots promptly, so it
doesn't cost us anything much to lose this backstop.
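
(To spell out what "backstop" means here: at shutdown, ExecEndPlan does
roughly

    /* clear every slot that was registered in the estate's tuple table */
    ExecResetTupleTable(estate->es_tupleTable, false);

so a slot made with MakeTupleTableSlot instead of ExecInitExtraTupleSlot
is no longer covered by that pass and has to be cleared by whoever owns
it, which is what nodeSubplan.c now does explicitly.)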

What that change fails to do is guarantee that there are no such
bugs elsewhere.  In the attached I made nodeValuesscan.c assert that
nothing has added slots to the es_tupleTable list, but of course
that only catches cases where there's a live bug.  Given how long
this case escaped detection, I don't feel like that's quite enough.
(Unsurprisingly, the existing regression tests don't trigger this
assert, even without the nodeSubplan.c fix.)

Another angle I've not run to ground is that the comments for the
existing subplan-related hacking in nodeValuesscan.c claim that
a problematic subplan could only occur in conjunction with LATERAL.
But there's no LATERAL in this example --- are those comments wrong
or obsolete, or just talking about a different case?

I didn't work on making a minimal test case for the regression tests,
either.

Anyway, thanks for the report!

            regards, tom lane

diff --git a/src/backend/executor/nodeSubplan.c b/src/backend/executor/nodeSubplan.c
index ff95317..8a4084b 100644
--- a/src/backend/executor/nodeSubplan.c
+++ b/src/backend/executor/nodeSubplan.c
@@ -133,11 +133,15 @@ ExecHashSubPlan(SubPlanState *node,
     slot = ExecProject(node->projLeft);

     /*
-     * Note: because we are typically called in a per-tuple context, we have
-     * to explicitly clear the projected tuple before returning. Otherwise,
-     * we'll have a double-free situation: the per-tuple context will probably
-     * be reset before we're called again, and then the tuple slot will think
-     * it still needs to free the tuple.
+     * Note: we have to explicitly clear the projected tuple before returning.
+     * This guards against two different hazards.  First, because we are
+     * typically called in a per-tuple context, the data that was stored into
+     * the slot may be shorter-lived than the slot.  If that data is cleared
+     * before we're called again, but the tuple slot still thinks it needs to
+     * free the tuple, we'll have a double-free crash.  Second, because the
+     * slot isn't hooked into any estate tuple-table list, there isn't anyone
+     * else who's going to clear it before executor shutdown.  So we risk
+     * leaking buffer pins or the like if we don't clear it here.
      */

     /*
@@ -605,21 +609,17 @@ buildSubPlanHash(SubPlanState *node, ExprContext *econtext)
         }

         /*
+         * As with projLeft, we must manually clear the slot after each use.
+         */
+        ExecClearTuple(slot);
+
+        /*
          * Reset innerecontext after each inner tuple to free any memory used
          * during ExecProject.
          */
         ResetExprContext(innerecontext);
     }

-    /*
-     * Since the projected tuples are in the sub-query's context and not the
-     * main context, we'd better clear the tuple slot before there's any
-     * chance of a reset of the sub-query's context.  Else we will have the
-     * potential for a double free attempt.  (XXX possibly no longer needed,
-     * but can't hurt.)
-     */
-    ExecClearTuple(node->projRight->pi_state.resultslot);
-
     MemoryContextSwitchTo(oldcontext);
 }

@@ -989,9 +989,15 @@ ExecInitSubPlan(SubPlan *subplan, PlanState *parent)
          * Fortunately we can just pass NULL for now and fill it in later
          * (hack alert!).  The righthand expressions will be evaluated in our
          * own innerecontext.
+         *
+         * We intentionally don't add the slots to the estate's es_tupleTable.
+         * That's to support execution of subplans within short-lived
+         * contexts: these slots might get freed before the estate ends
+         * execution.  Because of this, we have to be careful to explicitly
+         * clear each slot as soon as we're done with its current value.
          */
         tupDescLeft = ExecTypeFromTL(lefttlist);
-        slot = ExecInitExtraTupleSlot(estate, tupDescLeft, &TTSOpsVirtual);
+        slot = MakeTupleTableSlot(tupDescLeft, &TTSOpsVirtual);
         sstate->projLeft = ExecBuildProjectionInfo(lefttlist,
                                                    NULL,
                                                    slot,
@@ -999,7 +1005,7 @@ ExecInitSubPlan(SubPlan *subplan, PlanState *parent)
                                                    NULL);

         sstate->descRight = tupDescRight = ExecTypeFromTL(righttlist);
-        slot = ExecInitExtraTupleSlot(estate, tupDescRight, &TTSOpsVirtual);
+        slot = MakeTupleTableSlot(tupDescRight, &TTSOpsVirtual);
         sstate->projRight = ExecBuildProjectionInfo(righttlist,
                                                     sstate->innerecontext,
                                                     slot,
diff --git a/src/backend/executor/nodeValuesscan.c b/src/backend/executor/nodeValuesscan.c
index 8d916a0..f104d8c 100644
--- a/src/backend/executor/nodeValuesscan.c
+++ b/src/backend/executor/nodeValuesscan.c
@@ -94,6 +94,7 @@ ValuesNext(ValuesScanState *node)
     {
         MemoryContext oldContext;
         List       *oldsubplans;
+        List       *oldtupletable;
         List       *exprstatelist;
         Datum       *values;
         bool       *isnull;
@@ -131,15 +132,38 @@ ValuesNext(ValuesScanState *node)
         node->ss.ps.subPlan = NIL;

         /*
+         * A related problem is that SubPlans might try to create tuple slots
+         * and attach them to the outer estate's es_tupleTable; that would
+         * result in dangling pointers in the es_tupleTable after we delete
+         * the eval state.  We can prevent the dangling-pointer problem by
+         * similarly saving and restoring the es_tupleTable list.  This is
+         * only a partial solution though: any slots thereby excluded from the
+         * outer estate might result in resource leaks, if they're not cleared
+         * before ending execution.  Therefore, we decree that SubPlans
+         * mustn't add slots to the outer es_tupleTable.  The Assert below
+         * checks that.  It's only an Assert because resource leakage is way
+         * less problematic than a dangling pointer.
+         */
+        oldtupletable = econtext->ecxt_estate->es_tupleTable;
+        econtext->ecxt_estate->es_tupleTable = NIL;
+
+        /*
          * As the expressions are only ever used once, disable JIT for them.
          * This is worthwhile because it's common to insert significant
          * amounts of data via VALUES().
          */
         saved_jit_flags = econtext->ecxt_estate->es_jit_flags;
         econtext->ecxt_estate->es_jit_flags = PGJIT_NONE;
+
+        /* OK, finally we can init the expressions */
         exprstatelist = ExecInitExprList(exprlist, &node->ss.ps);
+
+        /* Undo the hacks above */
         econtext->ecxt_estate->es_jit_flags = saved_jit_flags;

+        Assert(econtext->ecxt_estate->es_tupleTable == NIL);
+        econtext->ecxt_estate->es_tupleTable = oldtupletable;
+
         node->ss.ps.subPlan = oldsubplans;

         /* parser should have checked all sublists are the same length */
