Обсуждение: sublink [exists (select xxx group by grouping sets ())] causes an assertion error
Hello postgres hackers:
I recently notice these sql can lead to a assertion error in pg14 and older version. Here is an example:
postgres=> CREATE TABLE t1 (a int);
CREATE TABLE
postgres=> INSERT INTO t1 VALUES (1);
INSERT 0 1
postgres=> SELECT
EXISTS (
SELECT
* FROM t1
GROUP BY
GROUPING SETS ((a), generate_series (1, 262144))
) AS result;
server closed the connection unexpectedly This probably means the server terminated abnormally before or while processing the request.
Here is breaktrace (release v14.11)
#0 0x00007fdc4d58d387 in raise () from /lib64/libc.so.6
#1 0x00007fdc4d58ea78 in abort () from /lib64/libc.so.6
#2 0x00000000009479aa in ExceptionalCondition (conditionName=conditionName@entry=0xb27bf0 "!lt->writing || lt->buffer == NULL", errorType=errorType@entry=0x99a00b "FailedAssertion", fileName=fileName@entry=0xb27989 "logtape.c", lineNumber=lineNumber@en
try=1279) at assert.c:69
#3 0x000000000097715b in LogicalTapeSetBlocks (lts=<optimized out>) at logtape.c:1279
#4 0x00000000009793ab in tuplesort_free (state=0x1ee30e0) at tuplesort.c:1402
#5 0x000000000097f2b9 in tuplesort_end (state=0x1ee30e0) at tuplesort.c:1468
#6 0x00000000006cd3a1 in ExecEndAgg (node=0x1ecf2c8) at nodeAgg.c:4401
#7 0x00000000006be3b1 in ExecEndNode (node=<optimized out>) at execProcnode.c:733
#8 0x00000000006b8514 in ExecEndPlan (estate=0x1ecf050, planstate=<optimized out>) at execMain.c:1416
#9 standard_ExecutorEnd (queryDesc=0x1e18af0) at execMain.c:493
#10 0x0000000000673779 in PortalCleanup (portal=<optimized out>) at portalcmds.c:299
#11 0x0000000000972b34 in PortalDrop (portal=0x1e586c0, isTopCommit=<optimized out>) at portalmem.c:502
#12 0x0000000000834140 in exec_simple_query (query_string=0x1df5020 "SELECT\nEXISTS (\nSELECT\n* FROM t1\nGROUP BY\nGROUPING SETS ((a), generate_series (1, 26214400))\n) AS result;") at postgres.c:1223
#13 0x0000000000835e37 in PostgresMain (argc=argc@entry=1, argv=argv@entry=0x7fff746c4470, dbname=<optimized out>, username=<optimized out>) at postgres.c:4513
#14 0x00000000007acdcb in BackendRun (port=<optimized out>, port=<optimized out>) at postmaster.c:4540
#15 BackendStartup (port=<optimized out>) at postmaster.c:4262
#16 ServerLoop () at postmaster.c:1748
#17 0x00000000007adc45 in PostmasterMain (argc=argc@entry=3, argv=argv@entry=0x1defb50) at postmaster.c:1420
#18 0x000000000050a544 in main (argc=3, argv=0x1defb50) at main.c:209
The reason could be that, there are mutiple phase in grouping sets in this exists sublink. In executing phase, once function ExecAgg return a valid tupleslot, ExecSubPlan won't call exector_node repeatedly, ineed there is no needed. It causes unexpected status in sort_out and sort_in, so they don't pass assertion checking in ExecEndAgg.
I haven’t thought of a good solution yet. Only in my opinion, it is unreasonable to add processing for other specific types of execution nodes in function ExecSubplan, but there is no way to know in ExecAgg whether it is executed again.
Best regards,
Tinghai Zhao
Вложения
"=?UTF-8?B?6LW15bqt5rW3KOW6reeroCk=?=" <zhaotinghai.zth@alibaba-inc.com> writes: > I recently notice these sql can lead to a assertion error in pg14 and older version. Here is an example: > postgres=> CREATE TABLE t1 (a int); > postgres=> INSERT INTO t1 VALUES (1); > postgres=> SELECT EXISTS ( SELECT * FROM t1 GROUP BY GROUPING SETS ((a), generate_series (1, 262144)) ) AS result; > server closed the connection unexpectedly In current v14 this produces: TRAP: FailedAssertion("!lt->writing || lt->buffer == NULL", File: "logtape.c", Line: 1279, PID: 928622) Thanks for the report. I did some bisecting and found that the crash appears at Jeff's commit c8aeaf3ab (which introduced this assertion) and disappears at Heikki's c4649cce3 (which removed it). So I would say that the problem is "this assertion is wrong", and we should fix the problem by fixing the assertion, not by hacking around in distant calling code. On the whole, since this code has been dead for several versions, I'd be inclined to just remove the assertion. I think it's quite risky because of the possibility that we reach this function during post-transaction-abort cleanup, when there's no very good reason to assume that the tapeset's been closed down cleanly. (To be clear, that's not what's happening in the given test case ... but I fear that it could.) regards, tom lane
On Fri, 2024-03-22 at 12:28 -0400, Tom Lane wrote: > Thanks for the report. I did some bisecting and found that the crash > appears at Jeff's commit c8aeaf3ab (which introduced this assertion) > and disappears at Heikki's c4649cce3 (which removed it). So I would > say that the problem is "this assertion is wrong", and we should fix > the problem by fixing the assertion, not by hacking around in distant > calling code. On the whole, since this code has been dead for > several versions, I'd be inclined to just remove the assertion. c4649cce3 didn't add additional calls to LogicalTapeSetBlocks(), so I'm not sure if the removal of the Assert was related to his changes, or if he just realized the assertion was wrong and removed it along the way? Also, without the assertion, the word "should" in the comment is ambiguous (does it mean "must not" or something else), and it still exists in master. Do we care about the calculation being wrong if there's an unfinished write? If not, I'll just clarify that the calculation doesn't take into account still-buffered data. If we do care, then something might need to be fixed. Regards, Jeff Davis
Jeff Davis <pgsql@j-davis.com> writes: > On Fri, 2024-03-22 at 12:28 -0400, Tom Lane wrote: >> Thanks for the report. I did some bisecting and found that the crash >> appears at Jeff's commit c8aeaf3ab (which introduced this assertion) >> and disappears at Heikki's c4649cce3 (which removed it). So I would >> say that the problem is "this assertion is wrong", and we should fix >> the problem by fixing the assertion, not by hacking around in distant >> calling code. On the whole, since this code has been dead for >> several versions, I'd be inclined to just remove the assertion. > c4649cce3 didn't add additional calls to LogicalTapeSetBlocks(), so I'm > not sure if the removal of the Assert was related to his changes, or if > he just realized the assertion was wrong and removed it along the way? My guess is he just zapped it because the code block was dependent on the "tape" abstraction which was going away. Heikki? > Also, without the assertion, the word "should" in the comment is > ambiguous (does it mean "must not" or something else), and it still > exists in master. Do we care about the calculation being wrong if > there's an unfinished write? I think it's actually fine. The callers of LogicalTapeSetBlocks only use the number for statistics or trace reporting, so precision isn't critical in the first place, but I think what they care about is the amount of data that's really been written out to files. The comment should be clarified, for sure. regards, tom lane