Обсуждение: BUG #16644: null value for defaults in OLD variable for trigger

Поиск

Список

Период

Сортировка

BUG #16644: null value for defaults in OLD variable for trigger

От

PG Bug reporting form

Дата:

30 сентября 2020 г., 00:50:15

The following bug has been logged on the website:

Bug reference:      16644
Logged by:          Fedor Erastov
Email address:      fedor_erastov@mail.ru
PostgreSQL version: 13.0
Operating system:   CentOS, MacOS
Description:

Start history:
https://postgresteam.slack.com/archives/C0FS3UTAP/p1601206489174900

Found weird postgres behavior (seems to work for >11 versions):
1. There is a table with data, and trigger before update for each row
2. Add a new column with not null default value
3. When trying to update the value in the old column, raise `ERROR: null
value in column violates not-null constraint`
Most likely this is because the default values in >11 versions are not
really put into the table when adding a column. And an important feature is
that if the trigger returns NEW, then there are no problems, and if OLD,
then an error appears. Although if you check these two variables, they will
be absolutely equal. 

Full PoC:

create table test(a integer);
create or replace function set_updated_at_column() returns trigger
    language plpgsql
as
$$
BEGIN
   RAISE NOTICE 'OLD: %, NEW: %, COMPARE: %', OLD, NEW, OLD = NEW;
   RETURN OLD;
END;
$$;
create trigger update_test
    before update
    on test
    for each row
execute procedure set_updated_at_column();
insert into test values(1);
-- adds new column
alter table test add column b integer not null default 1;
-- fails with a not null constraint violation, which is not the case, since
the tuple is (1,1) not (1,null)
update test set a=1 where a=1;

Interesting observation: if you reassign the value of old.b old.b := old.b;
the error is gone.

With the help of the slack user @easteregg, it turned out to be possible to
find the first bad commit in which this error occurs, that would be:
https://github.com/postgres/postgres/commit/ff11e7f4b9ae017585c3ba146db7ba39c31f209a

In addition, I have a suspicion that it has something to do with work "lazy"
defaults
https://dataegret.com/2018/03/waiting-for-postgresql-11-pain-free-add-column-with-non-null-defaults/

Re: BUG #16644: null value for defaults in OLD variable for trigger

От

Tom Lane

Дата:

30 сентября 2020 г., 05:21:34

PG Bug reporting form <noreply@postgresql.org> writes:
> Found weird postgres behavior (seems to work for >11 versions):
> 1. There is a table with data, and trigger before update for each row
> 2. Add a new column with not null default value
> 3. When trying to update the value in the old column, raise `ERROR: null
> value in column violates not-null constraint`

I confirm this bug in v12 and up.  There's no visible bug in v11,
so the attrmissing feature (which was added in 11) is not solely
to blame.  It's a component of the problem, though.

After tracing through it, what I find is that:

1. Where ExecBRUpdateTriggers fetches the "trigtuple" (line 2695
of trigger.c, in HEAD) what it gets is a raw copy of the old
on-disk tuple, with natts = 1.  That gets passed to the trigger
function as OLD, but it reads correctly because it's being decoded
with the relation's tupdesc, which has the needed info to fill in
attrmissing columns.

2. When the trigger does "return OLD", trigtuple is what gets
passed back, and it gets jammed into the slot that ExecUpdate
passed to ExecBRUpdateTriggers.  *That slot does not have any
attrmissing info*.  Thus, subsequent examination of the slot,
in particular by ExecConstraints, concludes that the recently-
added column is NULL.

I've not tried to find where is the difference between 11 and
12 that makes it fail or not fail.  It's likely some innocent
seeming aspect of the slot management hacking that Andres
was doing around that time.  It's a bit odd though, because
AFAICS the slot in question is going to be the result slot
of the JunkFilter used by ExecModifyTable, and I'd rather have
expected that that slot would never have had any constraints,
in any version.

If you ask me, the proximate cause of this bug was the decision
to stick attrmissing info into the "constr" field of tupledescs.
That seems pretty damfool, because there are large parts of the
system that consider that the constr info is optional and can
be discarded, or never built to begin with.  Perhaps it's too late
to revisit that, but we need to take a very very hard look at an
awful lot of places if we're to stick with that design.

I think we can band-aid this immediate problem by forcing
trigger.c to materialize the "old" tuples it fetches from disk
(or whatever needs to be done to substitute missing values into
them).  I've got about zero faith that there aren't dozens of
similar bugs lurking, however.

            regards, tom lane

Re: BUG #16644: null value for defaults in OLD variable for trigger

От

Andrew Dunstan

Дата:

30 сентября 2020 г., 20:02:39

On 9/29/20 10:21 PM, Tom Lane wrote:
>
> If you ask me, the proximate cause of this bug was the decision
> to stick attrmissing info into the "constr" field of tupledescs.
> That seems pretty damfool, because there are large parts of the
> system that consider that the constr info is optional and can
> be discarded, or never built to begin with.  Perhaps it's too late
> to revisit that, but we need to take a very very hard look at an
> awful lot of places if we're to stick with that design.
>

:-(

At nearly three years remove my memory is a bit hazy, but IIRC I looked
at all the places (or all I could identify) where it might have mattered
and it looked OK. But I'm not going to bet my life I caught everything,
and maybe I unintentionally provided for a future footgun, which from
your description this appears to be.

I'm not sure what the implications of changing it would be.

cheers

andrew

-- 
Andrew Dunstan                https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: BUG #16644: null value for defaults in OLD variable for trigger

От

Amit Langote

Дата:

23 октября 2020 г., 18:36:56

On Wed, Sep 30, 2020 at 11:21 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> After tracing through it, what I find is that:
>
> 1. Where ExecBRUpdateTriggers fetches the "trigtuple" (line 2695
> of trigger.c, in HEAD) what it gets is a raw copy of the old
> on-disk tuple, with natts = 1.  That gets passed to the trigger
> function as OLD, but it reads correctly because it's being decoded
> with the relation's tupdesc, which has the needed info to fill in
> attrmissing columns.
>
> 2. When the trigger does "return OLD", trigtuple is what gets
> passed back, and it gets jammed into the slot that ExecUpdate
> passed to ExecBRUpdateTriggers.  *That slot does not have any
> attrmissing info*.  Thus, subsequent examination of the slot,
> in particular by ExecConstraints, concludes that the recently-
> added column is NULL.
>
> I've not tried to find where is the difference between 11 and
> 12 that makes it fail or not fail.  It's likely some innocent
> seeming aspect of the slot management hacking that Andres
> was doing around that time.  It's a bit odd though, because
> AFAICS the slot in question is going to be the result slot
> of the JunkFilter used by ExecModifyTable, and I'd rather have
> expected that that slot would never have had any constraints,
> in any version.

In v11, GetTupleForTrigger() expands the HeapTuple, with this:

    /*
     * While this is not necessary anymore after 297d627e, as a defense
     * against C code that has not recompiled for minor releases after the
     * fix, continue to expand the tuple.
     */
    if (HeapTupleHeaderGetNatts(tuple.t_data) < relation->rd_att->natts)
        result = heap_expand_tuple(&tuple, relation->rd_att);
    else
        result = heap_copytuple(&tuple);
    ReleaseBuffer(buffer);

So the tuple that gets slammed into JunkFilter's result slot is an
expanded copy of the on-disk tuple, so it doesn't matter that its
TupleDesc doesn't have constraints (constr) initialized.

In v12 and further, I see that the tuple fetched by
GetTupleForTrigger() is slammed into JunkFilter's result slot
unexpanded and it remains so through ExecConstraint().

> I think we can band-aid this immediate problem by forcing
> trigger.c to materialize the "old" tuples it fetches from disk
> (or whatever needs to be done to substitute missing values into
> them).

Maybe something like the attached?

--
Amit Langote
EDB: http://www.enterprisedb.com

Вложения

expand-trigger-tuple.patch

Re: BUG #16644: null value for defaults in OLD variable for trigger

От

Tom Lane

Дата:

23 октября 2020 г., 18:52:09

Amit Langote <amitlangote09@gmail.com> writes:
> On Wed, Sep 30, 2020 at 11:21 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> I've not tried to find where is the difference between 11 and
>> 12 that makes it fail or not fail.

> In v11, GetTupleForTrigger() expands the HeapTuple, with this:

>     /*
>      * While this is not necessary anymore after 297d627e, as a defense
>      * against C code that has not recompiled for minor releases after the
>      * fix, continue to expand the tuple.
>      */
>     if (HeapTupleHeaderGetNatts(tuple.t_data) < relation->rd_att->natts)
>         result = heap_expand_tuple(&tuple, relation->rd_att);
>     else
>         result = heap_copytuple(&tuple);
>     ReleaseBuffer(buffer);

Ah, good sleuthing.

>> I think we can band-aid this immediate problem by forcing
>> trigger.c to materialize the "old" tuples it fetches from disk
>> (or whatever needs to be done to substitute missing values into
>> them).

> Maybe something like the attached?

Probably needs some attention to memory management (e.g.,
should_free_trig) but I'm okay with doing this as a short-term
fix.  I remain worried about similar instances elsewhere, but
I have no good ideas about how to find them.

            regards, tom lane

Re: BUG #16644: null value for defaults in OLD variable for trigger

От

Tom Lane

Дата:

25 октября 2020 г., 20:59:30

I wrote:
> Amit Langote <amitlangote09@gmail.com> writes:
>> Maybe something like the attached?

> Probably needs some attention to memory management (e.g.,
> should_free_trig) but I'm okay with doing this as a short-term
> fix.

I fixed the should_free business, and spent a fair amount of time
convincing myself that no other code paths in trigger.c need this,
and pushed it.

            regards, tom lane

Re: BUG #16644: null value for defaults in OLD variable for trigger

От

Tom Lane

Дата:

25 октября 2020 г., 23:53:35

I wrote:
> I fixed the should_free business, and spent a fair amount of time
> convincing myself that no other code paths in trigger.c need this,
> and pushed it.

No sooner had I pushed that than I thought of a potentially better
answer: why is it that the executor's slot hasn't got the right
descriptor, anyway?  The reason is that ExecInitModifyTable is relying
on ExecInitJunkFilter, and thence ExecCleanTypeFromTL, to build that
descriptor.  But we have the relation's *actual* descriptor right at
hand, and could use that instead.  This saves a few cycles ---
ExecCleanTypeFromTL isn't enormously expensive, but it's not free
either.  Moreover, it's more correct, even disregarding the problem
at hand, because the tlist isn't a perfectly accurate depiction of
the relation rowtype: ExecCleanTypeFromTL will not derive the correct
info for dropped columns.

We do have to refactor ExecInitJunkFilter a little to make this
possible, but it's not a big change.  (I initially tried to use the
existing ExecInitJunkFilterConversion function, but that does the
wrong thing for attisdropped columns.)  Otherwise, this reverts the
prior patch's code changes in triggers.c, but keeps the test case.

Thoughts?  I'm inclined to leave the previous patch alone in the
back branches, because that has fewer potential side-effects,
but I like this better for HEAD.

            regards, tom lane

diff --git a/src/backend/commands/trigger.c b/src/backend/commands/trigger.c
index 59289f8d4d..092ac1646d 100644
--- a/src/backend/commands/trigger.c
+++ b/src/backend/commands/trigger.c
@@ -89,8 +89,6 @@ static bool GetTupleForTrigger(EState *estate,
                                LockTupleMode lockmode,
                                TupleTableSlot *oldslot,
                                TupleTableSlot **newSlot);
-static HeapTuple MaterializeTupleForTrigger(TupleTableSlot *slot,
-                                            bool *shouldFree);
 static bool TriggerEnabled(EState *estate, ResultRelInfo *relinfo,
                            Trigger *trigger, TriggerEvent event,
                            Bitmapset *modifiedCols,
@@ -2674,7 +2672,7 @@ ExecBRUpdateTriggers(EState *estate, EPQState *epqstate,
                 ExecCopySlot(newslot, epqslot_clean);
         }

-        trigtuple = MaterializeTupleForTrigger(oldslot, &should_free_trig);
+        trigtuple = ExecFetchSlotHeapTuple(oldslot, true, &should_free_trig);
     }
     else
     {
@@ -3043,40 +3041,6 @@ GetTupleForTrigger(EState *estate,
     return true;
 }

-/*
- * Extract a HeapTuple that we can pass off to trigger functions.
- *
- * We must materialize the tuple and make sure it is not dependent on any
- * attrmissing data.  This is needed for the old row in BEFORE UPDATE
- * triggers, since they can choose to pass back this exact tuple as the update
- * result, causing the tuple to be inserted into an executor slot that lacks
- * the attrmissing data.
- *
- * Currently we don't seem to need to remove the attrmissing dependency in any
- * other cases, but keep this as a separate function to simplify fixing things
- * if that changes.
- */
-static HeapTuple
-MaterializeTupleForTrigger(TupleTableSlot *slot, bool *shouldFree)
-{
-    HeapTuple    tup;
-    TupleDesc    tupdesc = slot->tts_tupleDescriptor;
-
-    tup = ExecFetchSlotHeapTuple(slot, true, shouldFree);
-    if (HeapTupleHeaderGetNatts(tup->t_data) < tupdesc->natts &&
-        tupdesc->constr && tupdesc->constr->missing)
-    {
-        HeapTuple    newtup;
-
-        newtup = heap_expand_tuple(tup, tupdesc);
-        if (*shouldFree)
-            heap_freetuple(tup);
-        *shouldFree = true;
-        tup = newtup;
-    }
-    return tup;
-}
-
 /*
  * Is trigger enabled to fire?
  */
diff --git a/src/backend/executor/execJunk.c b/src/backend/executor/execJunk.c
index 40d700dd9e..1a822ff24b 100644
--- a/src/backend/executor/execJunk.c
+++ b/src/backend/executor/execJunk.c
@@ -54,23 +54,48 @@
  *
  * The source targetlist is passed in.  The output tuple descriptor is
  * built from the non-junk tlist entries.
- * An optional resultSlot can be passed as well.
+ * An optional resultSlot can be passed as well; otherwise, we create one.
  */
 JunkFilter *
 ExecInitJunkFilter(List *targetList, TupleTableSlot *slot)
 {
-    JunkFilter *junkfilter;
     TupleDesc    cleanTupType;
-    int            cleanLength;
-    AttrNumber *cleanMap;
-    ListCell   *t;
-    AttrNumber    cleanResno;

     /*
      * Compute the tuple descriptor for the cleaned tuple.
      */
     cleanTupType = ExecCleanTypeFromTL(targetList);

+    /*
+     * The rest is the same as ExecInitJunkFilterInsertion, ie, we want to map
+     * every non-junk targetlist column into the output tuple.
+     */
+    return ExecInitJunkFilterInsertion(targetList, cleanTupType, slot);
+}
+
+/*
+ * ExecInitJunkFilterInsertion
+ *
+ * Initialize a JunkFilter for insertions into a table.
+ *
+ * Here, we are given the target "clean" tuple descriptor rather than
+ * inferring it from the targetlist.  Although the target descriptor can
+ * contain deleted columns, that is not of concern here, since the targetlist
+ * should contain corresponding NULL constants (cf. ExecCheckPlanOutput).
+ * It is assumed that the caller has checked that the table's columns match up
+ * with the non-junk columns of the targetlist.
+ */
+JunkFilter *
+ExecInitJunkFilterInsertion(List *targetList,
+                            TupleDesc cleanTupType,
+                            TupleTableSlot *slot)
+{
+    JunkFilter *junkfilter;
+    int            cleanLength;
+    AttrNumber *cleanMap;
+    ListCell   *t;
+    AttrNumber    cleanResno;
+
     /*
      * Use the given slot, or make a new slot if we weren't given one.
      */
@@ -93,17 +118,18 @@ ExecInitJunkFilter(List *targetList, TupleTableSlot *slot)
     if (cleanLength > 0)
     {
         cleanMap = (AttrNumber *) palloc(cleanLength * sizeof(AttrNumber));
-        cleanResno = 1;
+        cleanResno = 0;
         foreach(t, targetList)
         {
             TargetEntry *tle = lfirst(t);

             if (!tle->resjunk)
             {
-                cleanMap[cleanResno - 1] = tle->resno;
+                cleanMap[cleanResno] = tle->resno;
                 cleanResno++;
             }
         }
+        Assert(cleanResno == cleanLength);
     }
     else
         cleanMap = NULL;
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index a33423c896..29e07b7228 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -2591,15 +2591,27 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
                 TupleTableSlot *junkresslot;

                 subplan = mtstate->mt_plans[i]->plan;
-                if (operation == CMD_INSERT || operation == CMD_UPDATE)
-                    ExecCheckPlanOutput(resultRelInfo->ri_RelationDesc,
-                                        subplan->targetlist);

                 junkresslot =
                     ExecInitExtraTupleSlot(estate, NULL,
                                            table_slot_callbacks(resultRelInfo->ri_RelationDesc));
-                j = ExecInitJunkFilter(subplan->targetlist,
-                                       junkresslot);
+
+                /*
+                 * For an INSERT or UPDATE, the result tuple must always match
+                 * the target table's descriptor.  For a DELETE, it won't
+                 * (indeed, there's probably no non-junk output columns).
+                 */
+                if (operation == CMD_INSERT || operation == CMD_UPDATE)
+                {
+                    ExecCheckPlanOutput(resultRelInfo->ri_RelationDesc,
+                                        subplan->targetlist);
+                    j = ExecInitJunkFilterInsertion(subplan->targetlist,
+                                                    RelationGetDescr(resultRelInfo->ri_RelationDesc),
+                                                    junkresslot);
+                }
+                else
+                    j = ExecInitJunkFilter(subplan->targetlist,
+                                           junkresslot);

                 if (operation == CMD_UPDATE || operation == CMD_DELETE)
                 {
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index b7978cd22e..0c48d2a519 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -156,6 +156,9 @@ extern void ResetTupleHashTable(TupleHashTable hashtable);
  */
 extern JunkFilter *ExecInitJunkFilter(List *targetList,
                                       TupleTableSlot *slot);
+extern JunkFilter *ExecInitJunkFilterInsertion(List *targetList,
+                                               TupleDesc cleanTupType,
+                                               TupleTableSlot *slot);
 extern JunkFilter *ExecInitJunkFilterConversion(List *targetList,
                                                 TupleDesc cleanTupType,
                                                 TupleTableSlot *slot);

Re: BUG #16644: null value for defaults in OLD variable for trigger

От

Amit Langote

Дата:

26 октября 2020 г., 06:11:23

On Mon, Oct 26, 2020 at 5:53 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I wrote:
> > I fixed the should_free business, and spent a fair amount of time
> > convincing myself that no other code paths in trigger.c need this,
> > and pushed it.

Thanks for that.

> No sooner had I pushed that than I thought of a potentially better
> answer: why is it that the executor's slot hasn't got the right
> descriptor, anyway?  The reason is that ExecInitModifyTable is relying
> on ExecInitJunkFilter, and thence ExecCleanTypeFromTL, to build that
> descriptor.  But we have the relation's *actual* descriptor right at
> hand, and could use that instead.  This saves a few cycles ---
> ExecCleanTypeFromTL isn't enormously expensive, but it's not free
> either.

Agreed.

> Moreover, it's more correct, even disregarding the problem
> at hand, because the tlist isn't a perfectly accurate depiction of
> the relation rowtype: ExecCleanTypeFromTL will not derive the correct
> info for dropped columns.

Hmm, I don't understand.  Isn't it the planner's job to make the
targetlist correctly account for dropped columns; what
expand_targetlist() does?

> We do have to refactor ExecInitJunkFilter a little to make this
> possible, but it's not a big change.  (I initially tried to use the
> existing ExecInitJunkFilterConversion function, but that does the
> wrong thing for attisdropped columns.)

So, with the map generated by ExecInitJunkFilterConversion(), dropped
attributes of the target composite type are effectively mapped to a
NULL datum.  That to me seems to be more or less the same thing as
ExecInitJunkFilterInsertion() mapping dropped attributes of the target
table to NULL datums in the source plan's targetlist.  What am I
missing?

>  Otherwise, this reverts the
> prior patch's code changes in triggers.c, but keeps the test case.
>
> Thoughts?  I'm inclined to leave the previous patch alone in the
> back branches, because that has fewer potential side-effects,
> but I like this better for HEAD.

I do agree that this is a better fix for HEAD.

--
Amit Langote
EDB: http://www.enterprisedb.com

Re: BUG #16644: null value for defaults in OLD variable for trigger

От

Tom Lane

Дата:

26 октября 2020 г., 06:19:04

Amit Langote <amitlangote09@gmail.com> writes:
> On Mon, Oct 26, 2020 at 5:53 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Moreover, it's more correct, even disregarding the problem
>> at hand, because the tlist isn't a perfectly accurate depiction of
>> the relation rowtype: ExecCleanTypeFromTL will not derive the correct
>> info for dropped columns.

> Hmm, I don't understand.  Isn't it the planner's job to make the
> targetlist correctly account for dropped columns; what
> expand_targetlist() does?

Yes, there are columns in the tlist to match them, but ExecCleanTypeFromTL
cannot mark those columns as "attisdropped".  The column data type
likely won't be right either.  The latter shouldn't matter, if the
column is being filled with a null ... but I'm a bit surprised that
we've gotten away this long with not being honest about attisdropped.

            regards, tom lane

Re: BUG #16644: null value for defaults in OLD variable for trigger

От

Amit Langote

Дата:

26 октября 2020 г., 11:51:40

On Mon, Oct 26, 2020 at 12:19 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Amit Langote <amitlangote09@gmail.com> writes:
> > On Mon, Oct 26, 2020 at 5:53 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >> Moreover, it's more correct, even disregarding the problem
> >> at hand, because the tlist isn't a perfectly accurate depiction of
> >> the relation rowtype: ExecCleanTypeFromTL will not derive the correct
> >> info for dropped columns.
>
> > Hmm, I don't understand.  Isn't it the planner's job to make the
> > targetlist correctly account for dropped columns; what
> > expand_targetlist() does?
>
> Yes, there are columns in the tlist to match them, but ExecCleanTypeFromTL
> cannot mark those columns as "attisdropped".

Ah, okay.

> The column data type
> likely won't be right either.  The latter shouldn't matter, if the
> column is being filled with a null ... but I'm a bit surprised that
> we've gotten away this long with not being honest about attisdropped.

Yeah, I guess most places downstream of ExecModifyTable() seem to rely
on getting that information indirectly via the isnull flag of the
tuple itself.

--
Amit Langote
EDB: http://www.enterprisedb.com

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Обсуждение: BUG #16644: null value for defaults in OLD variable for trigger

Вложения