Обсуждение: Second RewriteQuery complains about first RewriteQuery in edge case

Поиск
Список
Период
Сортировка

Second RewriteQuery complains about first RewriteQuery in edge case

От
Bernice Southey
Дата:
Hi,
I get an odd error if a CTE inserts a GENERATED ALWAYS AS IDENTITY
column, and then tries to modify an automatically updatable view.

create table t(i int generated always as identity);
create table base(j int);
create view v as select * from base;

with cte as (insert into t default values returning i)
delete from v using cte where j = i;
ERROR: cannot insert a non-DEFAULT value into column "i" Column "i" is
an identity column defined as GENERATED ALWAYS.

After much digging and thinking, I discovered it's because
RewriteQuery is processed twice on the CTE, because of the updatable
view recursion. The first time, the target entry is added. The second
time, it errors because it's already added.
cte parse tree: ... {TARGETENTRY :expr {NEXTVALUEEXPR: seqid ...
In rewriteTargetListIU:
apply_default = ((new_tle == NULL && commandType == CMD_INSERT) || ...
if (att_tup->attidentity == ATTRIBUTE_IDENTITY_ALWAYS && !apply_default) ...
    ereport(ERROR, (errcode(ERRCODE_GENERATED_ALWAYS)

I wondered, can anything new be added by a re-rewrite of the same
cteList? Nothing in what I could follow of the subsequent RewriteQuery
code makes me believe so. This patch is premised on this belief, and
my above investigation. I may very well be mistaken, given my
inexperience and gaping knowledge gaps.

*  As a first-time patch submitter, and first-time much else here, I
fully expect to be flamed for stupidity. This is a crazy function for
me to touch.
* I deliberately didn't indent the if-block in the patch, because the
tab diffs add 140 superfluous interleaved lines of unreadable mess.
* I was uncertain where to add my new test, and if there's more tests
that should be added.
* Comments are the hardest part.

I've learned so much from attempting this, that it was worth it just for that.

Thanks, Bernice

Вложения

Re: Second RewriteQuery complains about first RewriteQuery in edge case

От
Kirill Reshke
Дата:
On Wed, 26 Nov 2025 at 13:34, Bernice Southey <bernice.southey@gmail.com> wrote:
>
> Hi,

Hi!

> I get an odd error if a CTE inserts a GENERATED ALWAYS AS IDENTITY
> column, and then tries to modify an automatically updatable view.
>
> create table t(i int generated always as identity);
> create table base(j int);
> create view v as select * from base;
>
> with cte as (insert into t default values returning i)
> delete from v using cte where j = i;
> ERROR: cannot insert a non-DEFAULT value into column "i" Column "i" is
> an identity column defined as GENERATED ALWAYS.
>
> After much digging and thinking, I discovered it's because
> RewriteQuery is processed twice on the CTE, because of the updatable
> view recursion. The first time, the target entry is added. The second
> time, it errors because it's already added.
> cte parse tree: ... {TARGETENTRY :expr {NEXTVALUEEXPR: seqid ...
> In rewriteTargetListIU:
> apply_default = ((new_tle == NULL && commandType == CMD_INSERT) || ...
> if (att_tup->attidentity == ATTRIBUTE_IDENTITY_ALWAYS && !apply_default) ...
>     ereport(ERROR, (errcode(ERRCODE_GENERATED_ALWAYS)


Nice catch!

> I wondered, can anything new be added by a re-rewrite of the same
> cteList? Nothing in what I could follow of the subsequent RewriteQuery
> code makes me believe so. This patch is premised on this belief, and
> my above investigation. I may very well be mistaken, given my
> inexperience and gaping knowledge gaps.

I think we have a bug worth fixing here. However, I am not sure about
the approach. For me, a safer option would be
to disallow queries which try to rewrite something for a second time...

> *  As a first-time patch submitter, and first-time much else here, I
> fully expect to be flamed for stupidity. This is a crazy function for
> me to touch.
> * I deliberately didn't indent the if-block in the patch, because the
> tab diffs add 140 superfluous interleaved lines of unreadable mess.


IMHO we can make a refactoring patch here and move this big chunk of
logic to separate functions... But this is a side-quest.

> * I was uncertain where to add my new test, and if there's more tests
> that should be added.

Added test are good, but two things:
1) Why with.sql and not generated.sql ?
2) Why do you create table and view as temp relations? The bug have
nothing to do with the temporality of objects?


> I've learned so much from attempting this, that it was worth it just for that.

Sure

-- 
Best regards,
Kirill Reshke



Re: Second RewriteQuery complains about first RewriteQuery in edge case

От
Dean Rasheed
Дата:
On Wed, 26 Nov 2025 at 12:13, Kirill Reshke <reshkekirill@gmail.com> wrote:
>
> On Wed, 26 Nov 2025 at 13:34, Bernice Southey <bernice.southey@gmail.com> wrote:
>
> > I get an odd error if a CTE inserts a GENERATED ALWAYS AS IDENTITY
> > column, and then tries to modify an automatically updatable view.
> >
> > create table t(i int generated always as identity);
> > create table base(j int);
> > create view v as select * from base;
> >
> > with cte as (insert into t default values returning i)
> > delete from v using cte where j = i;
> > ERROR: cannot insert a non-DEFAULT value into column "i" Column "i" is
> > an identity column defined as GENERATED ALWAYS.
>
> Nice catch!

Indeed.

> I think we have a bug worth fixing here. However, I am not sure about
> the approach. For me, a safer option would be
> to disallow queries which try to rewrite something for a second time...

No, I disagree, there's no reason a query like this should be
disallowed. The fact that the rewriter is attempting to rewrite parts
of the query multiple times is the bug, albeit one that was probably
harmless before the introduction of generated columns.

The majority of RewriteQuery() is safe if it's called a second time on
the same query, because it fully expands all non-SELECT rules and
auto-updatable target views, so the second time round, it would do
nothing of that kind. However, evidently it's not safe to call
rewriteTargetListIU() twice. Of course, we could attempt to make
rewriteTargetListIU() idempotent, but that might be more effort, and
in any case, processing parts of the query multiple times is a waste
of effort even if it doesn't fail, so I think the approach taken by
the patch is the right one.

Regards,
Dean



Re: Second RewriteQuery complains about first RewriteQuery in edge case

От
Tom Lane
Дата:
Dean Rasheed <dean.a.rasheed@gmail.com> writes:
> The majority of RewriteQuery() is safe if it's called a second time on
> the same query, because it fully expands all non-SELECT rules and
> auto-updatable target views, so the second time round, it would do
> nothing of that kind. However, evidently it's not safe to call
> rewriteTargetListIU() twice. Of course, we could attempt to make
> rewriteTargetListIU() idempotent, but that might be more effort, and
> in any case, processing parts of the query multiple times is a waste
> of effort even if it doesn't fail, so I think the approach taken by
> the patch is the right one.

Not sure.  As you say, RewriteQuery() was idempotent up until the
addition of generated columns.  Who knows what other places are
relying on that property, that haven't yet been tested against
cases including generated columns?  I think there's a good case for
trying to fix it by restoring that idempotency.

Of course, that might prove impractical.  Bernice's fix does have
the attraction of being short and sweet.

On the third hand ... if we need to rewrite the query twice in
the first place, is it really true that "Data-modifying WITH clauses
need only be rewritten once per query"?  How do we know that whatever
prompted the repeat rewrite didn't change the WITH clauses?

I've not dug into the code yet, just reacting to what I see in this
thread.

            regards, tom lane



Re: Second RewriteQuery complains about first RewriteQuery in edge case

От
Bernice Southey
Дата:
Kirill Reshke <reshkekirill@gmail.com> wrote:
> Added test are good, but two things:
> 1) Why with.sql and not generated.sql ?
This bug is "with" in combination with "generated identity" and
"updatable view". The current fix targets "with", so that made me pick
"with". It should move to "generated_stored" if the fix is idempotent
rewriteTargetListIU.

> 2) Why do you create table and view as temp relations? The bug have
> nothing to do with the temporality of objects?
I copied the other tests in the "with" file, assuming they needed to
be temp. I agree it should be removed.

Dean Rasheed <dean.a.rasheed@gmail.com> wrote:
> Of course, we could attempt to make
> rewriteTargetListIU() idempotent, but that might be more effort
This was my first thought. I still haven't figured my way through how
all the checks work.
This bug doesn't happen on a generated always column that's not
identity, because that follows a different path.

Tom Lane <tgl@sss.pgh.pa.us> wrote:
> How do we know that whatever
> prompted the repeat rewrite didn't change the WITH clauses?
I went through the history and it seemed to me the repeat rewrite was
accidental because of the two ways this method can recurse. Doesn't
mean it's not doing anything.

Thanks, Bernice



Re: Second RewriteQuery complains about first RewriteQuery in edge case

От
Bernice Southey
Дата:
Bernice Southey <bernice.southey@gmail.com> wrote:
> I went through the history and it seemed to me the repeat rewrite was
> accidental because of the two ways this method can recurse.
I mean the repeat rewrite of the cteList was accidental, not the
repeat rewrite of views. I couldn't think why a view would mean a
cteList needed re-rewriting, but I know very little about view rules.



Re: Second RewriteQuery complains about first RewriteQuery in edge case

От
Dean Rasheed
Дата:
On Wed, 26 Nov 2025 at 20:29, Bernice Southey <bernice.southey@gmail.com> wrote:
>
> Bernice Southey <bernice.southey@gmail.com> wrote:
> > I went through the history and it seemed to me the repeat rewrite was
> > accidental because of the two ways this method can recurse.
> I mean the repeat rewrite of the cteList was accidental, not the
> repeat rewrite of views. I couldn't think why a view would mean a
> cteList needed re-rewriting, but I know very little about view rules.

I agree that the repeat rewrites look accidental. However, I've been
thinking about this some more, and I realised that the patch as
written isn't correct.

I think it is true that RewriteQuery() doesn't need to rewrite
individual CTEs multiple times. However, that doesn't mean that it
doesn't need to process the cteList at all below the top-level. The
problem is that rule actions may add additional CTEs to the list,
which do need to be processed when recursing.

Here's a simple, dumb example:

---
drop table if exists t1, t1a, t1b, t2, t2a;

create table t1 (a int);
create table t1a (a int);
create table t1b (a int);
create rule t1r as on insert to t1 do instead
  with x as (insert into t1a values (1) returning *)
  insert into t1b select a from x returning *;

create table t2 (a int);
create table t2a (a int);
create rule t2r as on insert to t2 do instead
  with xx as (insert into t1 values (1) returning *)
  insert into t2a select a from xx returning *;

insert into t2 values (1);
---

Both rules should be triggered, resulting in 1 row in each of t1a,
t1b, and t2a, which is what happens in HEAD, but with the patch, it
fails to fire rule t1r when it recurses, so you get a row in t1
instead.

This could probably be fixed up fairly easily, by tracking how many of
the CTEs in the list have been processed, and only processing the new
ones, when recursing.

On the other hand, perhaps making rewriteTargetListIU() idempotent
would be a more robust solution. I've not looked in detail at what
would be needed for that though.

Regards,
Dean



Re: Second RewriteQuery complains about first RewriteQuery in edge case

От
Tom Lane
Дата:
Dean Rasheed <dean.a.rasheed@gmail.com> writes:
> I think it is true that RewriteQuery() doesn't need to rewrite
> individual CTEs multiple times. However, that doesn't mean that it
> doesn't need to process the cteList at all below the top-level. The
> problem is that rule actions may add additional CTEs to the list,
> which do need to be processed when recursing.

Yeah, I had come to the same conclusion but not yet built an example.

I think the core of the matter may lie in the stanza headed

     * If the original query has any CTEs, copy them into the rule action. But
     * we don't need them for a utility action.

at lines 565ff in rewriteHandler.c.  What we are doing there, I think,
is to combine already-rewritten CTEs from the original query with
not-yet-rewritten queries from the rule action.  This can't work
unless the subsequent rewriting of the merged list is idempotent.

I was toying with the idea of decoupling rewriting of WITHs from
the main recursion.  This would look roughly like

1. Pull out RewriteQuery's first loop into a new function, say
RewriteQueryCTEs.  Call this from QueryRewrite just before calling
RewriteQuery.

2. Also apply it to a rule action's CTE list in fireRules, before
we merge the original query's CTE list into that list.

Not sure if there are any other call sites needed (rewriteTargetView
might need its own call), or if this would even fix the problem.

We might need some more-explicit labeling of whether rewrite
has happened.  That would mean querytree changes, making the
fix not-back-patchable.  But given the lack of complaints
to date, maybe a HEAD-only fix is acceptable.

There's an awful lot of unfinished work about CTEs in this module,
eg the two different random implementation restrictions in the
same stanza that's combining the CTE lists, or the one at the
bottom of RewriteQuery.  I wonder if we have the ambition to
actually do anything about all that.  People tend to see the
rules system as a deprecated backwater, so maybe not.

            regards, tom lane



Re: Second RewriteQuery complains about first RewriteQuery in edge case

От
Dean Rasheed
Дата:
On Wed, 26 Nov 2025 at 21:57, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> I was toying with the idea of decoupling rewriting of WITHs from
> the main recursion.  This would look roughly like
>
> 1. Pull out RewriteQuery's first loop into a new function, say
> RewriteQueryCTEs.  Call this from QueryRewrite just before calling
> RewriteQuery.
>
> 2. Also apply it to a rule action's CTE list in fireRules, before
> we merge the original query's CTE list into that list.

Yes, I think that would work, but I think a simpler solution would be
to just have RewriteQuery() track which CTEs it had already rewritten,
which can be done just by passing around a list of CTE names.
Something like the attached.

I think this makes it more obvious how CTEs only get processed once,
and has the advantage of being a smaller, more localised change.

> given the lack of complaints
> to date, maybe a HEAD-only fix is acceptable.

I was thinking that we should try to create a back-patchable fix for
this. If it had been some complex query involving rules, then perhaps
a HEAD-only fix would be OK, but the original reproducer is a pretty
simple combination of more commonly-used features.

> There's an awful lot of unfinished work about CTEs in this module,
> eg the two different random implementation restrictions in the
> same stanza that's combining the CTE lists, or the one at the
> bottom of RewriteQuery.  I wonder if we have the ambition to
> actually do anything about all that.  People tend to see the
> rules system as a deprecated backwater, so maybe not.

Ugh, yeah, those limitations are not great. Something else I noticed
while trying to produce the test case with rules was that you can't
refer to the OLD or NEW pseudo-relations from within a CTE query in a
rule action. That pretty-much makes data-modifying CTEs in rule
actions useless, I think. But I tend to think of rules as a feature
that should be avoided at all costs in any real applications. In my
mind, that just makes them a maintenance burden, and I wouldn't want
to expend any effort trying to improve them, except if that made them
easier to maintain. That said, I know of real applications that do use
them (apparently without problems), so I can imagine we'd get pushback
if we tried to remove support for them.

Regards,
Dean

Вложения

Re: Second RewriteQuery complains about first RewriteQuery in edge case

От
Bernice Southey
Дата:
Dean Rasheed <dean.a.rasheed@gmail.com> wrote:
> Yes, I think that would work, but I think a simpler solution would be
> to just have RewriteQuery() track which CTEs it had already rewritten,
> which can be done just by passing around a list of CTE names.
> Something like the attached.
I was playing around with an idea I had based on my first attempt to
fix this, when I got your email.

FWIW, I think I found a way to fix my bug that doesn't break your
rules example. I added a bool to RewriteQuery, and used this to stop
reprocessing updatable view CTEs. This leaves the rules recursion path
unchanged. Which means rule CTEs might still be processed repeatedly.
But as they probably can't be data-modifiable, I think
rewriteTargetListIU is effectively idempotent? The advantage of your
approach is it's consistent for all views, and it doesn't rely on
improbable data-modifiability for idempotency. Plus I'm still very
uncertain if there's more traps in rules I'm unaware of.

I also took another look at rewriteTargetListIU. From what I can make
out, stored generated columns rely on SetToDefault for idempotency.
But identity can't use this, so this is why a change to querytree
could be needed?

Thanks, Bernice

Вложения

Re: Second RewriteQuery complains about first RewriteQuery in edge case

От
Tom Lane
Дата:
Dean Rasheed <dean.a.rasheed@gmail.com> writes:
> Yes, I think that would work, but I think a simpler solution would be
> to just have RewriteQuery() track which CTEs it had already rewritten,
> which can be done just by passing around a list of CTE names.
> Something like the attached.

Hmmm ... I don't love this particular implementation, because it
is doubling down on the already-undesirable assumption that the
rule CTEs have no name conflicts with the outer query's CTEs.
Still, unless somebody sets out to remove that restriction,
it won't matter.  (It'd be a good idea for the comments here
to point that out though.)

I do think there's another way we could attack it.  Similarly
to the way VALUES RTEs are either processed or skipped by
checking the rangetable length, we could pass down the length
of the outer query's cteList, and assume that the last N entries
in a product query's cteList have already been processed.
(Last N not first N because of the order in which the lists are
concatenated at line 596.)  Maybe that's too fragile, but the
approach seems to have worked all right for VALUES.

            regards, tom lane



Re: Second RewriteQuery complains about first RewriteQuery in edge case

От
Dean Rasheed
Дата:
On Thu, 27 Nov 2025 at 14:31, Bernice Southey <bernice.southey@gmail.com> wrote:
>
> FWIW, I think I found a way to fix my bug that doesn't break your
> rules example. I added a bool to RewriteQuery, and used this to stop
> reprocessing updatable view CTEs. This leaves the rules recursion path
> unchanged. Which means rule CTEs might still be processed repeatedly.

Unfortunately, that means that it suffers from a bug similar to the
original one, but with an INSTEAD OF rule instead of an updatable
view:

---
drop table if exists t1, t2, t3;

create table t1 (a int generated always as identity);
create table t2 (a int);
create table t3 (a int);
create rule t3r as on insert to t3 do instead
  insert into t2 values (new.a) returning *;

with x as (insert into t1 default values returning *)
insert into t3 select * from x;
---

That fails in HEAD and with your patch with the same error as before,
but it works correctly with the v2 patch.

Regards,
Dean



Re: Second RewriteQuery complains about first RewriteQuery in edge case

От
Dean Rasheed
Дата:
On Thu, 27 Nov 2025 at 16:55, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Hmmm ... I don't love this particular implementation, because it
> is doubling down on the already-undesirable assumption that the
> rule CTEs have no name conflicts with the outer query's CTEs.
> Still, unless somebody sets out to remove that restriction,
> it won't matter.  (It'd be a good idea for the comments here
> to point that out though.)
>
> I do think there's another way we could attack it.  Similarly
> to the way VALUES RTEs are either processed or skipped by
> checking the rangetable length, we could pass down the length
> of the outer query's cteList, and assume that the last N entries
> in a product query's cteList have already been processed.
> (Last N not first N because of the order in which the lists are
> concatenated at line 596.)  Maybe that's too fragile, but the
> approach seems to have worked all right for VALUES.
>

Yes, that was my original thinking, but I wasn't keen to add more code
that depended on the order in which some other function added things
to a list. It kind-of had to be that way for VALUES RTEs because they
don't have any other identifiers, but I thought that since CTEs do, it
might as well use them.

On the other hand, passing a single integer ought to be simpler than
passing a list round and iterating through it, so perhaps that way is
worth investigating.

Regards,
Dean



Re: Second RewriteQuery complains about first RewriteQuery in edge case

От
Dean Rasheed
Дата:
On Thu, 27 Nov 2025 at 17:55, Dean Rasheed <dean.a.rasheed@gmail.com> wrote:
>
> On Thu, 27 Nov 2025 at 16:55, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >
> > I do think there's another way we could attack it.  Similarly
> > to the way VALUES RTEs are either processed or skipped by
> > checking the rangetable length, we could pass down the length
> > of the outer query's cteList, and assume that the last N entries
> > in a product query's cteList have already been processed.
> > (Last N not first N because of the order in which the lists are
> > concatenated at line 596.)  Maybe that's too fragile, but the
> > approach seems to have worked all right for VALUES.
>

Here's an update, doing it that way. It does appear somewhat neater.

Regards,
Dean

Вложения

Re: Second RewriteQuery complains about first RewriteQuery in edge case

От
Tom Lane
Дата:
Dean Rasheed <dean.a.rasheed@gmail.com> writes:
> Here's an update, doing it that way. It does appear somewhat neater.

Yeah, this looks pretty good.  Two nitpicky suggestions:

* Perhaps using foreach_current_index() would be better than
adding the separate loop variable "i" in RewriteQuery.

* I think it would be wise to add a comment about this in
rewriteRuleAction too, perhaps along the lines of

-       /* OK, it's safe to combine the CTE lists */
+       /*
+        * OK, it's safe to combine the CTE lists.  Beware that RewriteQuery
+        * knows we concatenate the lists in this order.
+        */
        sub_action->cteList = list_concat(sub_action->cteList,
                                          copyObject(parsetree->cteList));

            regards, tom lane



Re: Second RewriteQuery complains about first RewriteQuery in edge case

От
Bernice Southey
Дата:
Dean Rasheed <dean.a.rasheed@gmail.com> wrote:
> Here's an update, doing it that way. It does appear somewhat neater.

This is neat.
Should the test be the rules one? Updatable views are more used, but
your rules example has more coverage.

Thanks, Bernice



Re: Second RewriteQuery complains about first RewriteQuery in edge case

От
Dean Rasheed
Дата:
On Thu, 27 Nov 2025 at 21:40, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Yeah, this looks pretty good.  Two nitpicky suggestions:
>
> * Perhaps using foreach_current_index() would be better than
> adding the separate loop variable "i" in RewriteQuery.

Ah, good point. I'd forgotten about that macro.

> * I think it would be wise to add a comment about this in
> rewriteRuleAction too, perhaps along the lines of
>
> -       /* OK, it's safe to combine the CTE lists */
> +       /*
> +        * OK, it's safe to combine the CTE lists.  Beware that RewriteQuery
> +        * knows we concatenate the lists in this order.
> +        */
>         sub_action->cteList = list_concat(sub_action->cteList,
>                                           copyObject(parsetree->cteList));

Agreed. Done that way.

Thanks for reviewing.

Regards,
Dean



Re: Second RewriteQuery complains about first RewriteQuery in edge case

От
Dean Rasheed
Дата:
On Fri, 28 Nov 2025 at 09:21, Bernice Southey <bernice.southey@gmail.com> wrote:
>
> This is neat.
> Should the test be the rules one? Updatable views are more used, but
> your rules example has more coverage.

Yes, I think that's probably better. I've tweaked the test to use both
rules and updatable views, so that it has to deal with more than one
CTE. That way, it validates that we skip already-processed CTEs from
the right end of the list.

Pushed and back-patched.

Regards,
Dean