Обсуждение: BUG #16644: null value for defaults in OLD variable for trigger
The following bug has been logged on the website: Bug reference: 16644 Logged by: Fedor Erastov Email address: fedor_erastov@mail.ru PostgreSQL version: 13.0 Operating system: CentOS, MacOS Description: Start history: https://postgresteam.slack.com/archives/C0FS3UTAP/p1601206489174900 Found weird postgres behavior (seems to work for >11 versions): 1. There is a table with data, and trigger before update for each row 2. Add a new column with not null default value 3. When trying to update the value in the old column, raise `ERROR: null value in column violates not-null constraint` Most likely this is because the default values in >11 versions are not really put into the table when adding a column. And an important feature is that if the trigger returns NEW, then there are no problems, and if OLD, then an error appears. Although if you check these two variables, they will be absolutely equal. Full PoC: create table test(a integer); create or replace function set_updated_at_column() returns trigger language plpgsql as $$ BEGIN RAISE NOTICE 'OLD: %, NEW: %, COMPARE: %', OLD, NEW, OLD = NEW; RETURN OLD; END; $$; create trigger update_test before update on test for each row execute procedure set_updated_at_column(); insert into test values(1); -- adds new column alter table test add column b integer not null default 1; -- fails with a not null constraint violation, which is not the case, since the tuple is (1,1) not (1,null) update test set a=1 where a=1; Interesting observation: if you reassign the value of old.b old.b := old.b; the error is gone. With the help of the slack user @easteregg, it turned out to be possible to find the first bad commit in which this error occurs, that would be: https://github.com/postgres/postgres/commit/ff11e7f4b9ae017585c3ba146db7ba39c31f209a In addition, I have a suspicion that it has something to do with work "lazy" defaults https://dataegret.com/2018/03/waiting-for-postgresql-11-pain-free-add-column-with-non-null-defaults/
PG Bug reporting form <noreply@postgresql.org> writes: > Found weird postgres behavior (seems to work for >11 versions): > 1. There is a table with data, and trigger before update for each row > 2. Add a new column with not null default value > 3. When trying to update the value in the old column, raise `ERROR: null > value in column violates not-null constraint` I confirm this bug in v12 and up. There's no visible bug in v11, so the attrmissing feature (which was added in 11) is not solely to blame. It's a component of the problem, though. After tracing through it, what I find is that: 1. Where ExecBRUpdateTriggers fetches the "trigtuple" (line 2695 of trigger.c, in HEAD) what it gets is a raw copy of the old on-disk tuple, with natts = 1. That gets passed to the trigger function as OLD, but it reads correctly because it's being decoded with the relation's tupdesc, which has the needed info to fill in attrmissing columns. 2. When the trigger does "return OLD", trigtuple is what gets passed back, and it gets jammed into the slot that ExecUpdate passed to ExecBRUpdateTriggers. *That slot does not have any attrmissing info*. Thus, subsequent examination of the slot, in particular by ExecConstraints, concludes that the recently- added column is NULL. I've not tried to find where is the difference between 11 and 12 that makes it fail or not fail. It's likely some innocent seeming aspect of the slot management hacking that Andres was doing around that time. It's a bit odd though, because AFAICS the slot in question is going to be the result slot of the JunkFilter used by ExecModifyTable, and I'd rather have expected that that slot would never have had any constraints, in any version. If you ask me, the proximate cause of this bug was the decision to stick attrmissing info into the "constr" field of tupledescs. That seems pretty damfool, because there are large parts of the system that consider that the constr info is optional and can be discarded, or never built to begin with. Perhaps it's too late to revisit that, but we need to take a very very hard look at an awful lot of places if we're to stick with that design. I think we can band-aid this immediate problem by forcing trigger.c to materialize the "old" tuples it fetches from disk (or whatever needs to be done to substitute missing values into them). I've got about zero faith that there aren't dozens of similar bugs lurking, however. regards, tom lane
On 9/29/20 10:21 PM, Tom Lane wrote: > > If you ask me, the proximate cause of this bug was the decision > to stick attrmissing info into the "constr" field of tupledescs. > That seems pretty damfool, because there are large parts of the > system that consider that the constr info is optional and can > be discarded, or never built to begin with. Perhaps it's too late > to revisit that, but we need to take a very very hard look at an > awful lot of places if we're to stick with that design. > :-( At nearly three years remove my memory is a bit hazy, but IIRC I looked at all the places (or all I could identify) where it might have mattered and it looked OK. But I'm not going to bet my life I caught everything, and maybe I unintentionally provided for a future footgun, which from your description this appears to be. I'm not sure what the implications of changing it would be. cheers andrew -- Andrew Dunstan https://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Wed, Sep 30, 2020 at 11:21 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: > After tracing through it, what I find is that: > > 1. Where ExecBRUpdateTriggers fetches the "trigtuple" (line 2695 > of trigger.c, in HEAD) what it gets is a raw copy of the old > on-disk tuple, with natts = 1. That gets passed to the trigger > function as OLD, but it reads correctly because it's being decoded > with the relation's tupdesc, which has the needed info to fill in > attrmissing columns. > > 2. When the trigger does "return OLD", trigtuple is what gets > passed back, and it gets jammed into the slot that ExecUpdate > passed to ExecBRUpdateTriggers. *That slot does not have any > attrmissing info*. Thus, subsequent examination of the slot, > in particular by ExecConstraints, concludes that the recently- > added column is NULL. > > I've not tried to find where is the difference between 11 and > 12 that makes it fail or not fail. It's likely some innocent > seeming aspect of the slot management hacking that Andres > was doing around that time. It's a bit odd though, because > AFAICS the slot in question is going to be the result slot > of the JunkFilter used by ExecModifyTable, and I'd rather have > expected that that slot would never have had any constraints, > in any version. In v11, GetTupleForTrigger() expands the HeapTuple, with this: /* * While this is not necessary anymore after 297d627e, as a defense * against C code that has not recompiled for minor releases after the * fix, continue to expand the tuple. */ if (HeapTupleHeaderGetNatts(tuple.t_data) < relation->rd_att->natts) result = heap_expand_tuple(&tuple, relation->rd_att); else result = heap_copytuple(&tuple); ReleaseBuffer(buffer); So the tuple that gets slammed into JunkFilter's result slot is an expanded copy of the on-disk tuple, so it doesn't matter that its TupleDesc doesn't have constraints (constr) initialized. In v12 and further, I see that the tuple fetched by GetTupleForTrigger() is slammed into JunkFilter's result slot unexpanded and it remains so through ExecConstraint(). > I think we can band-aid this immediate problem by forcing > trigger.c to materialize the "old" tuples it fetches from disk > (or whatever needs to be done to substitute missing values into > them). Maybe something like the attached? -- Amit Langote EDB: http://www.enterprisedb.com
Вложения
Amit Langote <amitlangote09@gmail.com> writes: > On Wed, Sep 30, 2020 at 11:21 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: >> I've not tried to find where is the difference between 11 and >> 12 that makes it fail or not fail. > In v11, GetTupleForTrigger() expands the HeapTuple, with this: > /* > * While this is not necessary anymore after 297d627e, as a defense > * against C code that has not recompiled for minor releases after the > * fix, continue to expand the tuple. > */ > if (HeapTupleHeaderGetNatts(tuple.t_data) < relation->rd_att->natts) > result = heap_expand_tuple(&tuple, relation->rd_att); > else > result = heap_copytuple(&tuple); > ReleaseBuffer(buffer); Ah, good sleuthing. >> I think we can band-aid this immediate problem by forcing >> trigger.c to materialize the "old" tuples it fetches from disk >> (or whatever needs to be done to substitute missing values into >> them). > Maybe something like the attached? Probably needs some attention to memory management (e.g., should_free_trig) but I'm okay with doing this as a short-term fix. I remain worried about similar instances elsewhere, but I have no good ideas about how to find them. regards, tom lane
I wrote: > Amit Langote <amitlangote09@gmail.com> writes: >> Maybe something like the attached? > Probably needs some attention to memory management (e.g., > should_free_trig) but I'm okay with doing this as a short-term > fix. I fixed the should_free business, and spent a fair amount of time convincing myself that no other code paths in trigger.c need this, and pushed it. regards, tom lane
I wrote: > I fixed the should_free business, and spent a fair amount of time > convincing myself that no other code paths in trigger.c need this, > and pushed it. No sooner had I pushed that than I thought of a potentially better answer: why is it that the executor's slot hasn't got the right descriptor, anyway? The reason is that ExecInitModifyTable is relying on ExecInitJunkFilter, and thence ExecCleanTypeFromTL, to build that descriptor. But we have the relation's *actual* descriptor right at hand, and could use that instead. This saves a few cycles --- ExecCleanTypeFromTL isn't enormously expensive, but it's not free either. Moreover, it's more correct, even disregarding the problem at hand, because the tlist isn't a perfectly accurate depiction of the relation rowtype: ExecCleanTypeFromTL will not derive the correct info for dropped columns. We do have to refactor ExecInitJunkFilter a little to make this possible, but it's not a big change. (I initially tried to use the existing ExecInitJunkFilterConversion function, but that does the wrong thing for attisdropped columns.) Otherwise, this reverts the prior patch's code changes in triggers.c, but keeps the test case. Thoughts? I'm inclined to leave the previous patch alone in the back branches, because that has fewer potential side-effects, but I like this better for HEAD. regards, tom lane diff --git a/src/backend/commands/trigger.c b/src/backend/commands/trigger.c index 59289f8d4d..092ac1646d 100644 --- a/src/backend/commands/trigger.c +++ b/src/backend/commands/trigger.c @@ -89,8 +89,6 @@ static bool GetTupleForTrigger(EState *estate, LockTupleMode lockmode, TupleTableSlot *oldslot, TupleTableSlot **newSlot); -static HeapTuple MaterializeTupleForTrigger(TupleTableSlot *slot, - bool *shouldFree); static bool TriggerEnabled(EState *estate, ResultRelInfo *relinfo, Trigger *trigger, TriggerEvent event, Bitmapset *modifiedCols, @@ -2674,7 +2672,7 @@ ExecBRUpdateTriggers(EState *estate, EPQState *epqstate, ExecCopySlot(newslot, epqslot_clean); } - trigtuple = MaterializeTupleForTrigger(oldslot, &should_free_trig); + trigtuple = ExecFetchSlotHeapTuple(oldslot, true, &should_free_trig); } else { @@ -3043,40 +3041,6 @@ GetTupleForTrigger(EState *estate, return true; } -/* - * Extract a HeapTuple that we can pass off to trigger functions. - * - * We must materialize the tuple and make sure it is not dependent on any - * attrmissing data. This is needed for the old row in BEFORE UPDATE - * triggers, since they can choose to pass back this exact tuple as the update - * result, causing the tuple to be inserted into an executor slot that lacks - * the attrmissing data. - * - * Currently we don't seem to need to remove the attrmissing dependency in any - * other cases, but keep this as a separate function to simplify fixing things - * if that changes. - */ -static HeapTuple -MaterializeTupleForTrigger(TupleTableSlot *slot, bool *shouldFree) -{ - HeapTuple tup; - TupleDesc tupdesc = slot->tts_tupleDescriptor; - - tup = ExecFetchSlotHeapTuple(slot, true, shouldFree); - if (HeapTupleHeaderGetNatts(tup->t_data) < tupdesc->natts && - tupdesc->constr && tupdesc->constr->missing) - { - HeapTuple newtup; - - newtup = heap_expand_tuple(tup, tupdesc); - if (*shouldFree) - heap_freetuple(tup); - *shouldFree = true; - tup = newtup; - } - return tup; -} - /* * Is trigger enabled to fire? */ diff --git a/src/backend/executor/execJunk.c b/src/backend/executor/execJunk.c index 40d700dd9e..1a822ff24b 100644 --- a/src/backend/executor/execJunk.c +++ b/src/backend/executor/execJunk.c @@ -54,23 +54,48 @@ * * The source targetlist is passed in. The output tuple descriptor is * built from the non-junk tlist entries. - * An optional resultSlot can be passed as well. + * An optional resultSlot can be passed as well; otherwise, we create one. */ JunkFilter * ExecInitJunkFilter(List *targetList, TupleTableSlot *slot) { - JunkFilter *junkfilter; TupleDesc cleanTupType; - int cleanLength; - AttrNumber *cleanMap; - ListCell *t; - AttrNumber cleanResno; /* * Compute the tuple descriptor for the cleaned tuple. */ cleanTupType = ExecCleanTypeFromTL(targetList); + /* + * The rest is the same as ExecInitJunkFilterInsertion, ie, we want to map + * every non-junk targetlist column into the output tuple. + */ + return ExecInitJunkFilterInsertion(targetList, cleanTupType, slot); +} + +/* + * ExecInitJunkFilterInsertion + * + * Initialize a JunkFilter for insertions into a table. + * + * Here, we are given the target "clean" tuple descriptor rather than + * inferring it from the targetlist. Although the target descriptor can + * contain deleted columns, that is not of concern here, since the targetlist + * should contain corresponding NULL constants (cf. ExecCheckPlanOutput). + * It is assumed that the caller has checked that the table's columns match up + * with the non-junk columns of the targetlist. + */ +JunkFilter * +ExecInitJunkFilterInsertion(List *targetList, + TupleDesc cleanTupType, + TupleTableSlot *slot) +{ + JunkFilter *junkfilter; + int cleanLength; + AttrNumber *cleanMap; + ListCell *t; + AttrNumber cleanResno; + /* * Use the given slot, or make a new slot if we weren't given one. */ @@ -93,17 +118,18 @@ ExecInitJunkFilter(List *targetList, TupleTableSlot *slot) if (cleanLength > 0) { cleanMap = (AttrNumber *) palloc(cleanLength * sizeof(AttrNumber)); - cleanResno = 1; + cleanResno = 0; foreach(t, targetList) { TargetEntry *tle = lfirst(t); if (!tle->resjunk) { - cleanMap[cleanResno - 1] = tle->resno; + cleanMap[cleanResno] = tle->resno; cleanResno++; } } + Assert(cleanResno == cleanLength); } else cleanMap = NULL; diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c index a33423c896..29e07b7228 100644 --- a/src/backend/executor/nodeModifyTable.c +++ b/src/backend/executor/nodeModifyTable.c @@ -2591,15 +2591,27 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags) TupleTableSlot *junkresslot; subplan = mtstate->mt_plans[i]->plan; - if (operation == CMD_INSERT || operation == CMD_UPDATE) - ExecCheckPlanOutput(resultRelInfo->ri_RelationDesc, - subplan->targetlist); junkresslot = ExecInitExtraTupleSlot(estate, NULL, table_slot_callbacks(resultRelInfo->ri_RelationDesc)); - j = ExecInitJunkFilter(subplan->targetlist, - junkresslot); + + /* + * For an INSERT or UPDATE, the result tuple must always match + * the target table's descriptor. For a DELETE, it won't + * (indeed, there's probably no non-junk output columns). + */ + if (operation == CMD_INSERT || operation == CMD_UPDATE) + { + ExecCheckPlanOutput(resultRelInfo->ri_RelationDesc, + subplan->targetlist); + j = ExecInitJunkFilterInsertion(subplan->targetlist, + RelationGetDescr(resultRelInfo->ri_RelationDesc), + junkresslot); + } + else + j = ExecInitJunkFilter(subplan->targetlist, + junkresslot); if (operation == CMD_UPDATE || operation == CMD_DELETE) { diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h index b7978cd22e..0c48d2a519 100644 --- a/src/include/executor/executor.h +++ b/src/include/executor/executor.h @@ -156,6 +156,9 @@ extern void ResetTupleHashTable(TupleHashTable hashtable); */ extern JunkFilter *ExecInitJunkFilter(List *targetList, TupleTableSlot *slot); +extern JunkFilter *ExecInitJunkFilterInsertion(List *targetList, + TupleDesc cleanTupType, + TupleTableSlot *slot); extern JunkFilter *ExecInitJunkFilterConversion(List *targetList, TupleDesc cleanTupType, TupleTableSlot *slot);
On Mon, Oct 26, 2020 at 5:53 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: > I wrote: > > I fixed the should_free business, and spent a fair amount of time > > convincing myself that no other code paths in trigger.c need this, > > and pushed it. Thanks for that. > No sooner had I pushed that than I thought of a potentially better > answer: why is it that the executor's slot hasn't got the right > descriptor, anyway? The reason is that ExecInitModifyTable is relying > on ExecInitJunkFilter, and thence ExecCleanTypeFromTL, to build that > descriptor. But we have the relation's *actual* descriptor right at > hand, and could use that instead. This saves a few cycles --- > ExecCleanTypeFromTL isn't enormously expensive, but it's not free > either. Agreed. > Moreover, it's more correct, even disregarding the problem > at hand, because the tlist isn't a perfectly accurate depiction of > the relation rowtype: ExecCleanTypeFromTL will not derive the correct > info for dropped columns. Hmm, I don't understand. Isn't it the planner's job to make the targetlist correctly account for dropped columns; what expand_targetlist() does? > We do have to refactor ExecInitJunkFilter a little to make this > possible, but it's not a big change. (I initially tried to use the > existing ExecInitJunkFilterConversion function, but that does the > wrong thing for attisdropped columns.) So, with the map generated by ExecInitJunkFilterConversion(), dropped attributes of the target composite type are effectively mapped to a NULL datum. That to me seems to be more or less the same thing as ExecInitJunkFilterInsertion() mapping dropped attributes of the target table to NULL datums in the source plan's targetlist. What am I missing? > Otherwise, this reverts the > prior patch's code changes in triggers.c, but keeps the test case. > > Thoughts? I'm inclined to leave the previous patch alone in the > back branches, because that has fewer potential side-effects, > but I like this better for HEAD. I do agree that this is a better fix for HEAD. -- Amit Langote EDB: http://www.enterprisedb.com
Amit Langote <amitlangote09@gmail.com> writes: > On Mon, Oct 26, 2020 at 5:53 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Moreover, it's more correct, even disregarding the problem >> at hand, because the tlist isn't a perfectly accurate depiction of >> the relation rowtype: ExecCleanTypeFromTL will not derive the correct >> info for dropped columns. > Hmm, I don't understand. Isn't it the planner's job to make the > targetlist correctly account for dropped columns; what > expand_targetlist() does? Yes, there are columns in the tlist to match them, but ExecCleanTypeFromTL cannot mark those columns as "attisdropped". The column data type likely won't be right either. The latter shouldn't matter, if the column is being filled with a null ... but I'm a bit surprised that we've gotten away this long with not being honest about attisdropped. regards, tom lane
On Mon, Oct 26, 2020 at 12:19 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > Amit Langote <amitlangote09@gmail.com> writes: > > On Mon, Oct 26, 2020 at 5:53 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: > >> Moreover, it's more correct, even disregarding the problem > >> at hand, because the tlist isn't a perfectly accurate depiction of > >> the relation rowtype: ExecCleanTypeFromTL will not derive the correct > >> info for dropped columns. > > > Hmm, I don't understand. Isn't it the planner's job to make the > > targetlist correctly account for dropped columns; what > > expand_targetlist() does? > > Yes, there are columns in the tlist to match them, but ExecCleanTypeFromTL > cannot mark those columns as "attisdropped". Ah, okay. > The column data type > likely won't be right either. The latter shouldn't matter, if the > column is being filled with a null ... but I'm a bit surprised that > we've gotten away this long with not being honest about attisdropped. Yeah, I guess most places downstream of ExecModifyTable() seem to rely on getting that information indirectly via the isnull flag of the tuple itself. -- Amit Langote EDB: http://www.enterprisedb.com