Обсуждение: BUG #17777: An assert failed in nodeWindowAgg.c

Поиск
Список
Период
Сортировка

BUG #17777: An assert failed in nodeWindowAgg.c

От
PG Bug reporting form
Дата:
The following bug has been logged on the website:

Bug reference:      17777
Logged by:          Anban Company
Email address:      xinwen@stu.scu.edu.cn
PostgreSQL version: 15.1
Operating system:   Ubuntu 20.04
Description:

When executing the following query,assert failed may be triggered,  which
may be related to RANDOM():


CREATE TABLE table0 ( column0 INT ) ;
INSERT INTO table0 VALUES ( 1 ) , ( 2 ) , ( 3 ) , ( 4 ) , ( 5 ) , ( 6 ) , (
7 ) , ( 8 ) , ( 9 ) , ( 10 ) ;
SELECT WIDTH_BUCKET ( SUM ( 1 ) FILTER ( WHERE ( SELECT LAG ( TRUE , CAST (
RANDOM ( ) AS INT ) , column0 <= 1 ) OVER ( ) ) ) OVER ( PARTITION BY 1
ORDER BY column0 RANGE BETWEEN CURRENT ROW AND 1 FOLLOWING ) , 1 , 2 , 3 )
FROM table0 ;


I get a failed assertion with the following stacktrace:

Core was generated by `postgres: postgres postgres [local] SELECT
 '.
Program terminated with signal SIGABRT, Aborted.
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1  0x00007fab725fa859 in __GI_abort () at abort.c:79
#2  0x0000564bcbd68a88 in ExceptionalCondition
(conditionName=conditionName@entry=0x564bcbec5788
"peraggstate->transValueCount > 0", errorType=errorType@entry=0x564bcbdc64a0
"FailedAssertion", fileName=fileName@entry=0x564bcbec56b8
"/home/postgres/postgresql-15.1/original_bin-15.1/../src/backend/executor/nodeWindowAgg.c",
lineNumber=lineNumber@entry=475) at
/home/postgres/postgresql-15.1/original_bin-15.1/../src/backend/utils/error/assert.c:69
#3  0x0000564bcbae93ad in advance_windowaggregate_base
(winstate=0x564bcc969c20, perfuncstate=0x564bcc9a5610,
peraggstate=0x564bcc9a56a8) at
/home/postgres/postgresql-15.1/original_bin-15.1/../src/backend/executor/nodeWindowAgg.c:475
#4  0x0000564bcbaec69c in eval_windowaggregates (winstate=0x564bcc969c20) at
/home/postgres/postgresql-15.1/original_bin-15.1/../src/backend/executor/nodeWindowAgg.c:833
#5  ExecWindowAgg (pstate=0x564bcc969c20) at
/home/postgres/postgresql-15.1/original_bin-15.1/../src/backend/executor/nodeWindowAgg.c:2240
#6  0x0000564bcbaacc93 in ExecProcNode (node=0x564bcc969c20) at
/home/postgres/postgresql-15.1/original_bin-15.1/../src/include/executor/executor.h:259
#7  ExecutePlan (execute_once=<optimized out>, dest=0x564bcc989240,
direction=<optimized out>, numberTuples=0, sendTuples=<optimized out>,
operation=CMD_SELECT, use_parallel_mode=<optimized out>,
planstate=0x564bcc969c20, estate=0x564bcc97ce10) at
/home/postgres/postgresql-15.1/original_bin-15.1/../src/backend/executor/execMain.c:1636
#8  standard_ExecutorRun (queryDesc=0x564bcc8c35c0, direction=<optimized
out>, count=0, execute_once=<optimized out>) at
/home/postgres/postgresql-15.1/original_bin-15.1/../src/backend/executor/execMain.c:363
#9  0x0000564bcbc3fe5f in PortalRunSelect (portal=0x564bcc90f560,
forward=<optimized out>, count=0, dest=<optimized out>) at
/home/postgres/postgresql-15.1/original_bin-15.1/../src/backend/tcop/pquery.c:924
#10 0x0000564bcbc41431 in PortalRun (portal=portal@entry=0x564bcc90f560,
count=count@entry=9223372036854775807, isTopLevel=isTopLevel@entry=true,
run_once=run_once@entry=true, dest=dest@entry=0x564bcc989240,
altdest=altdest@entry=0x564bcc989240, qc=0x7ffebb433880) at
/home/postgres/postgresql-15.1/original_bin-15.1/../src/backend/tcop/pquery.c:768
#11 0x0000564bcbc3d202 in exec_simple_query ( query_string=0x564bcc8a2010
"SELECT WIDTH_BUCKET ( SUM ( 1 ) FILTER ( WHERE ( SELECT LAG ( TRUE , CAST (
RANDOM ( ) AS INT ) , column0 <= 1 ) OVER ( ) ) ) OVER ( PARTITION BY 1
ORDER BY column0 RANGE BETWEEN CURRENT ROW AND 1 FOL"...) at
/home/postgres/postgresql-15.1/original_bin-15.1/../src/backend/tcop/postgres.c:1250
#12 0x0000564bcbc3ef8c in PostgresMain (dbname=<optimized out>,
username=<optimized out>) at
/home/postgres/postgresql-15.1/original_bin-15.1/../src/backend/tcop/postgres.c:4581
#13 0x0000564bcbbabe8a in BackendRun (port=<optimized out>, port=<optimized
out>) at
/home/postgres/postgresql-15.1/original_bin-15.1/../src/backend/postmaster/postmaster.c:4504
#14 BackendStartup (port=<optimized out>) at
/home/postgres/postgresql-15.1/original_bin-15.1/../src/backend/postmaster/postmaster.c:4232
#15 ServerLoop () at
/home/postgres/postgresql-15.1/original_bin-15.1/../src/backend/postmaster/postmaster.c:1806
#16 0x0000564bcbbacffb in PostmasterMain (argc=<optimized out>,
argv=0x564bcc89c310) at
/home/postgres/postgresql-15.1/original_bin-15.1/../src/backend/postmaster/postmaster.c:1478
#17 0x0000564bcb8d7630 in main (argc=3, argv=0x564bcc89c310) at
/home/postgres/postgresql-15.1/original_bin-15.1/../src/backend/main/main.c:202


I also find this assert failed in 14.6, 13.9, 12.13 and 11.18 using the same
statement.


Re: BUG #17777: An assert failed in nodeWindowAgg.c

От
Andres Freund
Дата:
Hi,

On 2023-02-06 14:48:44 +0000, PG Bug reporting form wrote:
> When executing the following query,assert failed may be triggered,  which
> may be related to RANDOM():
>
>
> CREATE TABLE table0 ( column0 INT ) ;
> INSERT INTO table0 VALUES ( 1 ) , ( 2 ) , ( 3 ) , ( 4 ) , ( 5 ) , ( 6 ) , (
> 7 ) , ( 8 ) , ( 9 ) , ( 10 ) ;
> SELECT WIDTH_BUCKET ( SUM ( 1 ) FILTER ( WHERE ( SELECT LAG ( TRUE , CAST (
> RANDOM ( ) AS INT ) , column0 <= 1 ) OVER ( ) ) ) OVER ( PARTITION BY 1
> ORDER BY column0 RANGE BETWEEN CURRENT ROW AND 1 FOLLOWING ) , 1 , 2 , 3 )
> FROM table0 ;

Yes, you're right that this is caused by the random(), partially.

A simplified trigger of the problem is:

SELECT
  SUM (1)
    -- triggers the problem, volatile function in subplan, not removed due to correlated subquery
    FILTER (WHERE ( SELECT COALESCE(random() < 0.5, i = 0)))
    -- doesn't trigger the problem
    --FILTER (WHERE random() < 0.5 )
    OVER ( ORDER BY i RANGE BETWEEN CURRENT ROW AND 1 FOLLOWING)
FROM generate_series(0, 1000) g1(i);

If you replace 1 FOLLOWING with UNBOUNDED FOLLOWING, this crashes on 9.4-10
too, just a bit less reliably.


The problem is that normally we supress the moving aggregate optimization if a
volatile function is contained in the filter. But unfortunately,
contain_volatile_functions() doesn't descend into subplans. So we don't see
the volatile expression.

This is documented:

 * We will recursively look into Query nodes (i.e., SubLink sub-selects)
 * but not into SubPlans.  This is a bit odd, but intentional.  If we are
 * looking at a SubLink, we are probably deciding whether a query tree
 * transformation is safe, and a contained sub-select should affect that;
 * for example, duplicating a sub-select containing a volatile function
 * would be bad.  However, once we've got to the stage of having SubPlans,
 * subsequent planning need not consider volatility within those, since
 * the executor won't change its evaluation rules for a SubPlan based on
 * volatility.


Which seems, uh, a big assumption for something as generally named as
contain_volatile_functions().  It makes sense for pushdown / pullup
transformation, but in many, if not most, other cases, it seems pretty
broken. Why is that safe for equivclass.c, indxpath.c, partprune.c, etc?


Completely separately, but IMO the decision whether to use the moving
aggregate optimization ought to be a plan time decision, rather than an
execution time one...

Greetings,

Andres Freund



Re: BUG #17777: An assert failed in nodeWindowAgg.c

От
Andres Freund
Дата:
Hi,

Sorry for sending this again, accidentally used an old email of David's.

On 2023-02-06 14:48:44 +0000, PG Bug reporting form wrote:
> When executing the following query,assert failed may be triggered,  which
> may be related to RANDOM():
>
>
> CREATE TABLE table0 ( column0 INT ) ;
> INSERT INTO table0 VALUES ( 1 ) , ( 2 ) , ( 3 ) , ( 4 ) , ( 5 ) , ( 6 ) , (
> 7 ) , ( 8 ) , ( 9 ) , ( 10 ) ;
> SELECT WIDTH_BUCKET ( SUM ( 1 ) FILTER ( WHERE ( SELECT LAG ( TRUE , CAST (
> RANDOM ( ) AS INT ) , column0 <= 1 ) OVER ( ) ) ) OVER ( PARTITION BY 1
> ORDER BY column0 RANGE BETWEEN CURRENT ROW AND 1 FOLLOWING ) , 1 , 2 , 3 )
> FROM table0 ;

Yes, you're right that this is caused by the random(), partially.

A simplified trigger of the problem is:

SELECT
  SUM (1)
    -- triggers the problem, volatile function in subplan, not removed due to correlated subquery
    FILTER (WHERE ( SELECT COALESCE(random() < 0.5, i = 0)))
    -- doesn't trigger the problem
    --FILTER (WHERE random() < 0.5 )
    OVER ( ORDER BY i RANGE BETWEEN CURRENT ROW AND 1 FOLLOWING)
FROM generate_series(0, 1000) g1(i);

If you replace 1 FOLLOWING with UNBOUNDED FOLLOWING, this crashes on 9.4-10
too, just a bit less reliably.


The problem is that normally we supress the moving aggregate optimization if a
volatile function is contained in the filter. But unfortunately,
contain_volatile_functions() doesn't descend into subplans. So we don't see
the volatile expression.

This is documented:

 * We will recursively look into Query nodes (i.e., SubLink sub-selects)
 * but not into SubPlans.  This is a bit odd, but intentional.  If we are
 * looking at a SubLink, we are probably deciding whether a query tree
 * transformation is safe, and a contained sub-select should affect that;
 * for example, duplicating a sub-select containing a volatile function
 * would be bad.  However, once we've got to the stage of having SubPlans,
 * subsequent planning need not consider volatility within those, since
 * the executor won't change its evaluation rules for a SubPlan based on
 * volatility.


Which seems, uh, a big assumption for something as generally named as
contain_volatile_functions().  It makes sense for pushdown / pullup
transformation, but in many, if not most, other cases, it seems pretty
broken. Why is that safe for equivclass.c, indxpath.c, partprune.c, etc?


Completely separately, but IMO the decision whether to use the moving
aggregate optimization ought to be a plan time decision, rather than an
execution time one...

Greetings,

Andres Freund





Re: BUG #17777: An assert failed in nodeWindowAgg.c

От
Tom Lane
Дата:
Andres Freund <andres@anarazel.de> writes:
> The problem is that normally we supress the moving aggregate optimization if a
> volatile function is contained in the filter. But unfortunately,
> contain_volatile_functions() doesn't descend into subplans. So we don't see
> the volatile expression.

I would say that if a volatile function in the argument crashes things,
that's an executor bug.  You won't get any sympathy from me for
complaints about whether contain_volatile_functions noticed that,
because *immutability markings can be lies*.  It is not acceptable
to crash if they're wrong.

It looks to me like maybe we could just remove the Assert and do

-    if (peraggstate->transValueCount == 1)
+    if (peraggstate->transValueCount < 2)

a few lines further down?  I've not dug into the details though.

            regards, tom lane



Re: BUG #17777: An assert failed in nodeWindowAgg.c

От
Andres Freund
Дата:
Hi,

On 2023-02-10 18:12:32 -0500, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > The problem is that normally we supress the moving aggregate optimization if a
> > volatile function is contained in the filter. But unfortunately,
> > contain_volatile_functions() doesn't descend into subplans. So we don't see
> > the volatile expression.
>
> I would say that if a volatile function in the argument crashes things,
> that's an executor bug.  You won't get any sympathy from me for
> complaints about whether contain_volatile_functions noticed that,
> because *immutability markings can be lies*.  It is not acceptable
> to crash if they're wrong.

This leads to wrong query results, with correctly labeled functions. I don't
have a problem with bogus results if they're not correctly labeled, but they
should be correct, if correctly labeled.

Clearly we can't crash in production builds, due to a mislabeled function. I'm
a bit more on the fence about whether assertion failures are ok.



> It looks to me like maybe we could just remove the Assert and do
>
> -    if (peraggstate->transValueCount == 1)
> +    if (peraggstate->transValueCount < 2)
>
> a few lines further down?  I've not dug into the details though.

I'd make it an ERROR, similar to this case:
    if (peraggstate->transValueIsNull)
        elog(ERROR, "aggregate transition value is NULL before inverse transition");

afaics once we got to this point, the query results are always bogus. It's not
a reliable detection of mislabeled volatility, but still seems vastly better
than knowingly returning wrong results.

Greetings,

Andres Freund



Re: BUG #17777: An assert failed in nodeWindowAgg.c

От
Tom Lane
Дата:
Andres Freund <andres@anarazel.de> writes:
> On 2023-02-10 18:12:32 -0500, Tom Lane wrote:
>> It looks to me like maybe we could just remove the Assert and do
>>
>> -    if (peraggstate->transValueCount == 1)
>> +    if (peraggstate->transValueCount < 2)
>>
>> a few lines further down?  I've not dug into the details though.

> I'd make it an ERROR, similar to this case:
>     if (peraggstate->transValueIsNull)
>         elog(ERROR, "aggregate transition value is NULL before inverse transition");
> afaics once we got to this point, the query results are always bogus. It's not
> a reliable detection of mislabeled volatility, but still seems vastly better
> than knowingly returning wrong results.

Yeah, after looking a little closer, the problem is that we may get
inconsistent results about whether a row passes the filter between
when we add the row to the trans state and when we remove from it.
If transValueCount would go negative then we know that that happened,
but it might happen without us being able to detect it.

We could force a restart, but that would only fix the results going
forward, not fix any bad values we emitted before detecting the
problem (and there almost certainly were some).

I'm content with replacing the Assert with something like

    /*
     * There should still be an added but not yet removed value; if there
     * isn't, we presumably got inconsistent results from the aggfilter
     * expression, probably due to an allegedly-immutable expression
     * delivering changing results.
     */
    if (peraggstate->transValueCount <= 0)
        elog(ERROR, "aggregate inverse transition failed, probably due to volatile FILTER expression");

I might be more excited about it if there were a visible use-case
for volatile filter expressions, but I can't see one.  The presented
test case is obviously just a fuzzer, not a useful query.

            regards, tom lane



Re: BUG #17777: An assert failed in nodeWindowAgg.c

От
Andres Freund
Дата:
Hi,

On 2023-02-10 18:57:10 -0500, Tom Lane wrote:
> I might be more excited about it if there were a visible use-case
> for volatile filter expressions, but I can't see one.  The presented
> test case is obviously just a fuzzer, not a useful query.

I don't care about the performance of such a query, but it doesn't seem great
that the defense that we do have, doesn't work.  It's not like we don't have a
fallback execution path, it's just that we don't know that we have to use it.


Do you think all other uses of contain_volatile_functions() (and it looks like
contain_mutable_functions()) are fine with not detecting volatility in
subplans?


I don't think it's common, but I don't think it's crazy to have a volatile
function somewhere within an aggregate.  pg_current_wal_lsn(),
clock_timestamp(), pg_relation_size() or such.


I think we could just add a !contain_subplans() to the code deciding whether
it's safe to use the movable window optimization?

Greetings,

Andres Freund



Re: BUG #17777: An assert failed in nodeWindowAgg.c

От
Tom Lane
Дата:
Andres Freund <andres@anarazel.de> writes:
> I think we could just add a !contain_subplans() to the code deciding whether
> it's safe to use the movable window optimization?

Yeah, perhaps.  That doesn't seem like a mainstream use-case either.

Another idea, tying into your previous point, is to try to check
contain_volatile_functions in the planner before we've reduced
sublinks to subplans.  I'm not sure that would be convenient to do
though; subplan-conversion happens pretty early.

(I'm quite hesitant to move the goalposts on what
contain_volatile_functions detects.  As that comment indicates,
some thought has gone into its current behavior, and I think
we might hit some unwanted side-effects if we change it.)

            regards, tom lane



Re: BUG #17777: An assert failed in nodeWindowAgg.c

От
Andres Freund
Дата:
Hi,

On 2023-02-10 20:08:09 -0500, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > I think we could just add a !contain_subplans() to the code deciding whether
> > it's safe to use the movable window optimization?
>
> Yeah, perhaps.  That doesn't seem like a mainstream use-case either.

I suspect we ought to backpatch a fix and compared to some other ideas, that
seems not terribly invasive.


> Another idea, tying into your previous point, is to try to check
> contain_volatile_functions in the planner before we've reduced
> sublinks to subplans.  I'm not sure that would be convenient to do
> though; subplan-conversion happens pretty early.

Yea, that doesn't seem too promising.


What I was referencing is that we already moved most aggregate processing to
the planner, c.f. preprocess_aggref(), but we didn't do the same for window
functions. Even though there's a lot of similar code there.

To fix the bug, we could just do a minimal version of that, I think, and add a
new field to WindowFunc, that we populate somewhere around
optimize_window_clauses().


Perhaps we ought to add something similar to parallel_safe to SubPlan?


> (I'm quite hesitant to move the goalposts on what
> contain_volatile_functions detects.  As that comment indicates,
> some thought has gone into its current behavior, and I think
> we might hit some unwanted side-effects if we change it.)

Agreed. We could have a second "interface" function, using the same caller
though. But afaics, without adding information to the SubPlan nodes, we can't
really do better anyway?


Greetings,

Andres Freund



Re: BUG #17777: An assert failed in nodeWindowAgg.c

От
David Rowley
Дата:
On Sat, 11 Feb 2023 at 13:49, Andres Freund <andres@anarazel.de> wrote:
> I think we could just add a !contain_subplans() to the code deciding whether
> it's safe to use the movable window optimization?

I think this is a fair way to fix the bug.  I think we're pretty
unlikely to disappoint too many people by just disabling the inverse
transitions when the filter has a subplan. I think it'll be rare that
a WindowFunc would even have a filter, let alone one with a subplan.

I think there are other reasons that a subplan might not return the
same thing when it's executed again.  Maybe a synchronous seq scan
looking for some tuple with a subquery containing a LIMIT 1. If the
sync scan started in a different place each time then some other tuple
could be returned.  So I think checking if the filter contains
subplans is a good fix.

Here's a patch for that.

David

Вложения

Re: BUG #17777: An assert failed in nodeWindowAgg.c

От
Tom Lane
Дата:
David Rowley <dgrowleyml@gmail.com> writes:
> On Sat, 11 Feb 2023 at 13:49, Andres Freund <andres@anarazel.de> wrote:
>> I think we could just add a !contain_subplans() to the code deciding whether
>> it's safe to use the movable window optimization?

> I think this is a fair way to fix the bug.

Agreed.

> Here's a patch for that.

Why is it okay to check only the filter, and not the rest of the
WindowFunc's subexpressions?  The arguments we've just run through
seem to apply to a subplan in the direct or aggregated arguments
as well.

            regards, tom lane



Re: BUG #17777: An assert failed in nodeWindowAgg.c

От
David Rowley
Дата:
On Mon, 13 Feb 2023 at 05:19, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Why is it okay to check only the filter, and not the rest of the
> WindowFunc's subexpressions?  The arguments we've just run through
> seem to apply to a subplan in the direct or aggregated arguments
> as well.

Good point.  I had just been thinking in terms of the reported bug to
make sure we inverse transition the same rows we transition. We also
need to make sure the transition value matches in both transition
directions.

I've adjusted the patch accordingly.

David

Вложения

Re: BUG #17777: An assert failed in nodeWindowAgg.c

От
Andres Freund
Дата:
Hi,

On 2023-02-13 13:31:54 +1300, David Rowley wrote:
> On Mon, 13 Feb 2023 at 05:19, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > Why is it okay to check only the filter, and not the rest of the
> > WindowFunc's subexpressions?  The arguments we've just run through
> > seem to apply to a subplan in the direct or aggregated arguments
> > as well.
> 
> Good point.  I had just been thinking in terms of the reported bug to
> make sure we inverse transition the same rows we transition. We also
> need to make sure the transition value matches in both transition
> directions.

What about find_compatible_agg()? I don't think there's as severe
consequences, but it also doesn't seem right as-is.

Greetings,

Andres Freund



Re: BUG #17777: An assert failed in nodeWindowAgg.c

От
Tom Lane
Дата:
David Rowley <dgrowleyml@gmail.com> writes:
> On Mon, 13 Feb 2023 at 05:19, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Why is it okay to check only the filter, and not the rest of the
>> WindowFunc's subexpressions?  The arguments we've just run through
>> seem to apply to a subplan in the direct or aggregated arguments
>> as well.

> Good point.  I had just been thinking in terms of the reported bug to
> make sure we inverse transition the same rows we transition. We also
> need to make sure the transition value matches in both transition
> directions.
> I've adjusted the patch accordingly.

Code looks good now, but the comment still claims this is only
important in the FILTER clause.  I'd rewrite the whole thing
perhaps:

    * We also don't risk using moving aggregates when there are subplans
    * in the arguments or FILTER clause.  This is partly because
    * contain_volatile_functions() doesn't look inside subplans; but
    * there are other reasons why a subplan's output might be volatile.
    * For example, syncscan mode can render the results nonrepeatable.

            regards, tom lane



Re: BUG #17777: An assert failed in nodeWindowAgg.c

От
David Rowley
Дата:
On Mon, 13 Feb 2023 at 13:46, Andres Freund <andres@anarazel.de> wrote:
> What about find_compatible_agg()? I don't think there's as severe
> consequences, but it also doesn't seem right as-is.

This does not seem related to the issue being reported here. It seems
to me this would only affect the aggregate deduplication logic and
nothing else.

I agree it would be nice if we were more consistent with when volatile
functions are evaluated and when they are not. There are also a bunch
of weird issues with SRFs as I highlighted in [1].

The following shows another case where we change the volatile function
evaluation. This one we do on purpose, which might or might not be
good, depending on if you wanted the random() in the targetlist for
the same purpose as the random() in the ORDER BY.

# select random(), random(), random();
       random        |       random        |       random
---------------------+---------------------+--------------------
 0.15295030404849452 | 0.34841025373624745 | 0.7195551260981239

# select random(), random(), random() order by random();
       random       |       random       |       random
--------------------+--------------------+--------------------
 0.8569677358482639 | 0.8569677358482639 | 0.8569677358482639

It was just last week I wrote a query to fetch a random row from a
table by using ORDER BY random() LIMIT 1;  I also needed to call
random() somewhere else in the query for something unrelated but this
deduplication behaviour caused that not to work the way I wanted it
to. This caused my unrelated random() value to be consistently very
low due to it being shared with the random() for the sort. I had to
add a subquery for the 2nd random() to fix the problem.

For the case you're complaining about, I don't think shifting the
goalposts randomly in back-branches is a good idea.  For the one you
reported we might upset someone who has come to rely on the aggstate
deduplication either for performance or for the number of volatile
function evaluations. I'm not really on board with changing that
unless someone highlights that it's causing an actual problem.

David

[1] https://postgr.es/m/CAApHDvqKO00nNQtchBs65VTfjEMUsYiB5r2P4VUreTdXE9RY1g@mail.gmail.com



Re: BUG #17777: An assert failed in nodeWindowAgg.c

От
David Rowley
Дата:
On Mon, 13 Feb 2023 at 13:55, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Code looks good now, but the comment still claims this is only
> important in the FILTER clause.  I'd rewrite the whole thing
> perhaps:
>
>     * We also don't risk using moving aggregates when there are subplans
>     * in the arguments or FILTER clause.  This is partly because
>     * contain_volatile_functions() doesn't look inside subplans; but
>     * there are other reasons why a subplan's output might be volatile.
>     * For example, syncscan mode can render the results nonrepeatable.

That seems better, so I pushed it with that. Thanks.

David



Re: BUG #17777: An assert failed in nodeWindowAgg.c

От
Tom Lane
Дата:
David Rowley <dgrowleyml@gmail.com> writes:
> On Mon, 13 Feb 2023 at 13:46, Andres Freund <andres@anarazel.de> wrote:
>> What about find_compatible_agg()? I don't think there's as severe
>> consequences, but it also doesn't seem right as-is.

> This does not seem related to the issue being reported here. It seems
> to me this would only affect the aggregate deduplication logic and
> nothing else.

> For the case you're complaining about, I don't think shifting the
> goalposts randomly in back-branches is a good idea.  For the one you
> reported we might upset someone who has come to rely on the aggstate
> deduplication either for performance or for the number of volatile
> function evaluations. I'm not really on board with changing that
> unless someone highlights that it's causing an actual problem.

Yeah.  This behavior is at least semi-documented, IIRC.  We need to fix
moving-aggregate mode so it's not used when its assumptions don't hold,
because there's no way that that doesn't lead to insane behavior.  But
redefining the deduplication rules seems way more likely to break
existing queries than make anybody happy.

            regards, tom lane



Re: BUG #17777: An assert failed in nodeWindowAgg.c

От
Andres Freund
Дата:
Hi,

On 2023-02-13 16:21:11 +1300, David Rowley wrote:
> On Mon, 13 Feb 2023 at 13:46, Andres Freund <andres@anarazel.de> wrote:
> > What about find_compatible_agg()? I don't think there's as severe
> > consequences, but it also doesn't seem right as-is.
> 
> This does not seem related to the issue being reported here. It seems
> to me this would only affect the aggregate deduplication logic and
> nothing else.

Fair enough.

I guess I was influenced by my thinking that we ought to combine more of the
relevant logic between "plain" aggregates and window functions.

Greetings,

Andres Freund