Обсуждение: sublink [exists (select xxx group by grouping sets ())] causes an assertion error

Поиск
Список
Период
Сортировка

sublink [exists (select xxx group by grouping sets ())] causes an assertion error

От
"赵庭海(庭章)"
Дата:

Hello postgres hackers:

I recently notice these sql can lead to a assertion error in pg14 and older version. Here is an example:

postgres=> CREATE TABLE t1 (a int);
CREATE TABLE
postgres=> INSERT INTO t1 VALUES (1);
INSERT 0 1
postgres=> SELECT
EXISTS (
SELECT
* FROM t1
GROUP BY
GROUPING SETS ((a), generate_series (1, 262144))
) AS result;
server closed the connection unexpectedly      This probably means the server terminated abnormally      before or while processing the request.
Here is breaktrace (release v14.11)
#0  0x00007fdc4d58d387 in raise () from /lib64/libc.so.6
#1  0x00007fdc4d58ea78 in abort () from /lib64/libc.so.6
#2  0x00000000009479aa in ExceptionalCondition (conditionName=conditionName@entry=0xb27bf0 "!lt->writing || lt->buffer == NULL", errorType=errorType@entry=0x99a00b "FailedAssertion", fileName=fileName@entry=0xb27989 "logtape.c", lineNumber=lineNumber@en
try=1279) at assert.c:69
#3  0x000000000097715b in LogicalTapeSetBlocks (lts=<optimized out>) at logtape.c:1279
#4  0x00000000009793ab in tuplesort_free (state=0x1ee30e0) at tuplesort.c:1402
#5  0x000000000097f2b9 in tuplesort_end	(state=0x1ee30e0) at tuplesort.c:1468
#6  0x00000000006cd3a1 in ExecEndAgg (node=0x1ecf2c8) at nodeAgg.c:4401
#7  0x00000000006be3b1 in ExecEndNode (node=<optimized out>) at execProcnode.c:733
#8  0x00000000006b8514 in ExecEndPlan (estate=0x1ecf050, planstate=<optimized out>) at execMain.c:1416
#9  standard_ExecutorEnd (queryDesc=0x1e18af0) at execMain.c:493
#10 0x0000000000673779 in PortalCleanup	(portal=<optimized out>) at portalcmds.c:299
#11 0x0000000000972b34 in PortalDrop (portal=0x1e586c0, isTopCommit=<optimized out>) at portalmem.c:502
#12 0x0000000000834140 in exec_simple_query (query_string=0x1df5020 "SELECT\nEXISTS (\nSELECT\n* FROM t1\nGROUP BY\nGROUPING SETS ((a), generate_series (1, 26214400))\n) AS result;") at postgres.c:1223
#13 0x0000000000835e37 in PostgresMain (argc=argc@entry=1, argv=argv@entry=0x7fff746c4470, dbname=<optimized out>, username=<optimized out>) at postgres.c:4513
#14 0x00000000007acdcb in BackendRun (port=<optimized out>, port=<optimized out>) at postmaster.c:4540
#15 BackendStartup (port=<optimized out>) at postmaster.c:4262
#16 ServerLoop () at postmaster.c:1748
#17 0x00000000007adc45 in PostmasterMain (argc=argc@entry=3, argv=argv@entry=0x1defb50) at postmaster.c:1420
#18 0x000000000050a544 in main (argc=3, argv=0x1defb50) at main.c:209

The reason could be that,  there are mutiple phase in grouping sets in  this exists sublink. In executing phase, once function ExecAgg return a valid tupleslot, ExecSubPlan won't call exector_node repeatedly, ineed there is no needed. It causes unexpected status in sort_out and sort_in, so they don't pass assertion checking in ExecEndAgg.


I haven’t thought of a good solution yet. Only in my opinion, it is unreasonable to add processing for other specific types of execution nodes in function ExecSubplan, but there is no way to know in ExecAgg whether it is executed again.


Best regards,

Tinghai Zhao


Вложения
"=?UTF-8?B?6LW15bqt5rW3KOW6reeroCk=?=" <zhaotinghai.zth@alibaba-inc.com> writes:
> I recently notice these sql can lead to a assertion error in pg14 and older version. Here is an example:
> postgres=> CREATE TABLE t1 (a int);
> postgres=> INSERT INTO t1 VALUES (1);
> postgres=> SELECT EXISTS ( SELECT * FROM t1 GROUP BY GROUPING SETS ((a), generate_series (1, 262144)) ) AS result;
> server closed the connection unexpectedly

In current v14 this produces:
TRAP: FailedAssertion("!lt->writing || lt->buffer == NULL", File: "logtape.c", Line: 1279, PID: 928622)

Thanks for the report.  I did some bisecting and found that the crash
appears at Jeff's commit c8aeaf3ab (which introduced this assertion)
and disappears at Heikki's c4649cce3 (which removed it).  So I would
say that the problem is "this assertion is wrong", and we should fix
the problem by fixing the assertion, not by hacking around in distant
calling code.  On the whole, since this code has been dead for
several versions, I'd be inclined to just remove the assertion.
I think it's quite risky because of the possibility that we reach
this function during post-transaction-abort cleanup, when there's
no very good reason to assume that the tapeset's been closed down
cleanly.  (To be clear, that's not what's happening in the given
test case ... but I fear that it could.)

            regards, tom lane



On Fri, 2024-03-22 at 12:28 -0400, Tom Lane wrote:
> Thanks for the report.  I did some bisecting and found that the crash
> appears at Jeff's commit c8aeaf3ab (which introduced this assertion)
> and disappears at Heikki's c4649cce3 (which removed it).  So I would
> say that the problem is "this assertion is wrong", and we should fix
> the problem by fixing the assertion, not by hacking around in distant
> calling code.  On the whole, since this code has been dead for
> several versions, I'd be inclined to just remove the assertion.

c4649cce3 didn't add additional calls to LogicalTapeSetBlocks(), so I'm
not sure if the removal of the Assert was related to his changes, or if
he just realized the assertion was wrong and removed it along the way?

Also, without the assertion, the word "should" in the comment is
ambiguous (does it mean "must not" or something else), and it still
exists in master. Do we care about the calculation being wrong if
there's an unfinished write? If not, I'll just clarify that the
calculation doesn't take into account still-buffered data. If we do
care, then something might need to be fixed.

Regards,
    Jeff Davis




Jeff Davis <pgsql@j-davis.com> writes:
> On Fri, 2024-03-22 at 12:28 -0400, Tom Lane wrote:
>> Thanks for the report.  I did some bisecting and found that the crash
>> appears at Jeff's commit c8aeaf3ab (which introduced this assertion)
>> and disappears at Heikki's c4649cce3 (which removed it).  So I would
>> say that the problem is "this assertion is wrong", and we should fix
>> the problem by fixing the assertion, not by hacking around in distant
>> calling code.  On the whole, since this code has been dead for
>> several versions, I'd be inclined to just remove the assertion.

> c4649cce3 didn't add additional calls to LogicalTapeSetBlocks(), so I'm
> not sure if the removal of the Assert was related to his changes, or if
> he just realized the assertion was wrong and removed it along the way?

My guess is he just zapped it because the code block was dependent
on the "tape" abstraction which was going away.  Heikki?

> Also, without the assertion, the word "should" in the comment is
> ambiguous (does it mean "must not" or something else), and it still
> exists in master. Do we care about the calculation being wrong if
> there's an unfinished write?

I think it's actually fine.  The callers of LogicalTapeSetBlocks only
use the number for statistics or trace reporting, so precision isn't
critical in the first place, but I think what they care about is the
amount of data that's really been written out to files.  The
comment should be clarified, for sure.

            regards, tom lane