Обсуждение: BUG #19082: Failing assert in index scan

Поиск
Список
Период
Сортировка

BUG #19082: Failing assert in index scan

От
PG Bug reporting form
Дата:
The following bug has been logged on the website:

Bug reference:      19082
Logged by:          Matthijs van der Vleuten
Email address:      postgresql@zr40.nl
PostgreSQL version: 16.10
Operating system:   Debian sid
Description:

(16.10-1.pgdg+1 from the pgdg repository, which was built with
--enable-cassert)

Last week, I've noticed a crash due to a failed assertion that was
consistently being triggered by an application every 10 seconds. Here's the
log:

---
TRAP: failed Assert("!XLogRecPtrIsInvalid(so->currPos.lsn)"), File:
"./build/../src/backend/access/nbtree/nbtutils.c", Line: 1782, PID: 394782
postgres: mastodon mastodon_production [local]
SELECT(ExceptionalCondition+0x75) [0x55f9c3027ea5]
postgres: mastodon mastodon_production [local] SELECT(_bt_killitems+0x550)
[0x55f9c2b9d710]
postgres: mastodon mastodon_production [local] SELECT(+0x1eee95)
[0x55f9c2b94e95]
postgres: mastodon mastodon_production [local] SELECT(_bt_next+0x92)
[0x55f9c2b96472]
postgres: mastodon mastodon_production [local] SELECT(btgettuple+0x7c)
[0x55f9c2b92ddc]
postgres: mastodon mastodon_production [local]
SELECT(index_getnext_tid+0x56) [0x55f9c2b83b06]
postgres: mastodon mastodon_production [local]
SELECT(index_getnext_slot+0x7b) [0x55f9c2b83d4b]
postgres: mastodon mastodon_production [local] SELECT(+0x37bfbd)
[0x55f9c2d21fbd]
postgres: mastodon mastodon_production [local] SELECT(ExecScan+0xf8)
[0x55f9c2d03bc8]
postgres: mastodon mastodon_production [local] SELECT(+0x38bbd6)
[0x55f9c2d31bd6]
postgres: mastodon mastodon_production [local] SELECT(+0x37e3e9)
[0x55f9c2d243e9]
postgres: mastodon mastodon_production [local]
SELECT(standard_ExecutorRun+0x19b) [0x55f9c2cf913b]
postgres: mastodon mastodon_production [local] SELECT(+0x531141)
[0x55f9c2ed7141]
postgres: mastodon mastodon_production [local] SELECT(PortalRun+0x2b0)
[0x55f9c2ed8a10]
postgres: mastodon mastodon_production [local] SELECT(PostgresMain+0x19d9)
[0x55f9c2ed6449]
postgres: mastodon mastodon_production [local] SELECT(+0x49036b)
[0x55f9c2e3636b]
postgres: mastodon mastodon_production [local] SELECT(PostmasterMain+0xeee)
[0x55f9c2e3750e]
postgres: mastodon mastodon_production [local] SELECT(main+0x21c)
[0x55f9c2b1106c]
/lib/x86_64-linux-gnu/libc.so.6(+0x29ca8) [0x7effa4c2dca8]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x85) [0x7effa4c2dd65]
postgres: mastodon mastodon_production [local] SELECT(_start+0x21)
[0x55f9c2b11631]
2025-10-02 22:39:14 CEST 68db1e13.5c0 | LOG:  server process (PID 394782)
was terminated by signal 6: Aborted
2025-10-02 22:39:14 CEST 68db1e13.5c0 | DETAIL:  Failed process was running:
SELECT "mentions"."id" FROM "mentions" WHERE "mentions"."account_id" = $1
AND 1=1 ORDER BY "mentions"."id" ASC LIMIT $2
2025-10-02 22:39:14 CEST 68db1e13.5c0 | LOG:  terminating any other active
server processes
---

Since I noticed that the stack trace mentioned btrees, I reindexed all
indexes in this database and that appeared to resolve the symptom.

However, today I noticed it's happening again (although less frequently).
Logs:

---
TRAP: failed Assert("!XLogRecPtrIsInvalid(so->currPos.lsn)"), File:
"./build/../src/backend/access/nbtree/nbtutils.c", Line: 1782, PID: 27448
postgres: mastodon mastodon_production [local]
SELECT(ExceptionalCondition+0x75) [0x562490c9dea5]
postgres: mastodon mastodon_production [local] SELECT(_bt_killitems+0x550)
[0x562490813710]
postgres: mastodon mastodon_production [local] SELECT(+0x1eee95)
[0x56249080ae95]
postgres: mastodon mastodon_production [local] SELECT(_bt_next+0x92)
[0x56249080c472]
postgres: mastodon mastodon_production [local] SELECT(btgettuple+0x7c)
[0x562490808ddc]
postgres: mastodon mastodon_production [local]
SELECT(index_getnext_tid+0x56) [0x5624907f9b06]
postgres: mastodon mastodon_production [local]
SELECT(index_getnext_slot+0x7b) [0x5624907f9d4b]
postgres: mastodon mastodon_production [local] SELECT(+0x37bfbd)
[0x562490997fbd]
postgres: mastodon mastodon_production [local] SELECT(ExecScan+0xf8)
[0x562490979bc8]
postgres: mastodon mastodon_production [local] SELECT(+0x38bbd6)
[0x5624909a7bd6]
postgres: mastodon mastodon_production [local] SELECT(+0x37e3e9)
[0x56249099a3e9]
postgres: mastodon mastodon_production [local]
SELECT(standard_ExecutorRun+0x19b) [0x56249096f13b]
postgres: mastodon mastodon_production [local] SELECT(+0x531141)
[0x562490b4d141]
postgres: mastodon mastodon_production [local] SELECT(PortalRun+0x2b0)
[0x562490b4ea10]
postgres: mastodon mastodon_production [local] SELECT(PostgresMain+0x19d9)
[0x562490b4c449]
postgres: mastodon mastodon_production [local] SELECT(+0x49036b)
[0x562490aac36b]
postgres: mastodon mastodon_production [local] SELECT(PostmasterMain+0xeee)
[0x562490aad50e]
postgres: mastodon mastodon_production [local] SELECT(main+0x21c)
[0x56249078706c]
/lib/x86_64-linux-gnu/libc.so.6(+0x29ca8) [0x7f60b802eca8]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x85) [0x7f60b802ed65]
postgres: mastodon mastodon_production [local] SELECT(_start+0x21)
[0x562490787631]
2025-10-10 11:19:10 CEST 68e84d11.5e7 | LOG:  server process (PID 27448) was
terminated by signal 6: Aborted
2025-10-10 11:19:10 CEST 68e84d11.5e7 | DETAIL:  Failed process was running:
SELECT "mentions"."id" FROM "mentions" WHERE "mentions"."account_id" = $1
AND 1=1 ORDER BY "mentions"."id" ASC LIMIT $2
2025-10-10 11:19:10 CEST 68e84d11.5e7 | LOG:  terminating any other active
server processes
---

Note that the "mentions" table isn't the only one apparently triggering this
assertion failure, here's another one I found in the logs:
(I'm not sure if the concurrent DELETE by another connection is relevant,
but since it happened the same second and touches the same table I've
included anyway.)

---
2025-10-06 03:01:17 CEST 68e30f7a.93684 | mastodon@mastodon_production |
00000 LOG:  duration: 77.732 ms  execute <unnamed>: DELETE FROM "statuses"
WHERE ("statuses"."id") IN (SELECT "statuses"."id" FROM "statuses" WHERE
"statuses"."deleted_at" IS NULL AND "statuses"."id" IN ($1, $2, $3, $4, $5,
$6, $7, $8, $9, $10, $11, $12, $13, $14, $15, $16, $17, $18, $19, $20, $21,
$22, $23, $24, $25, $26, $27, $28, $29, $30, $31, $32, $33, $34, $35, $36,
$37, $38, $39, $40, $41, $42, $43, $44, $45, $46, $47, $48, $49, $50) ORDER
BY "statuses"."id" DESC)
2025-10-06 03:01:17 CEST 68e30f7a.93684 | mastodon@mastodon_production |
00000 DETAIL:  parameters: (... removed potentially sensitive data ...)
2025-10-06 03:01:17 CEST 68db1e13.5c0 | LOG:  server process (PID 603780)
was terminated by signal 6: Aborted
2025-10-06 03:01:17 CEST 68db1e13.5c0 | DETAIL:  Failed process was running:
DELETE FROM "statuses" WHERE ("statuses"."id") IN (SELECT "statuses"."id"
FROM "statuses" WHERE "statuses"."deleted_at" IS NULL AND "statuses"."id" IN
($1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11, $12, $13, $14, $15, $16, $17)
ORDER BY "statuses"."id" DESC)
TRAP: failed Assert("!XLogRecPtrIsInvalid(so->currPos.lsn)"), File:
"./build/../src/backend/access/nbtree/nbtutils.c", Line: 1782, PID: 603780
postgres: mastodon mastodon_production [local]
DELETE(ExceptionalCondition+0x75) [0x55f9c3027ea5]
postgres: mastodon mastodon_production [local] DELETE(_bt_killitems+0x550)
[0x55f9c2b9d710]
postgres: mastodon mastodon_production [local] DELETE(+0x1eee95)
[0x55f9c2b94e95]
postgres: mastodon mastodon_production [local] DELETE(_bt_next+0x92)
[0x55f9c2b96472]
postgres: mastodon mastodon_production [local] DELETE(btgettuple+0x7c)
[0x55f9c2b92ddc]
postgres: mastodon mastodon_production [local]
DELETE(index_getnext_tid+0x56) [0x55f9c2b83b06]
postgres: mastodon mastodon_production [local]
DELETE(index_getnext_slot+0x7b) [0x55f9c2b83d4b]
postgres: mastodon mastodon_production [local] DELETE(+0x37bfbd)
[0x55f9c2d21fbd]
postgres: mastodon mastodon_production [local] DELETE(ExecScan+0xf8)
[0x55f9c2d03bc8]
postgres: mastodon mastodon_production [local] DELETE(+0x3864d7)
[0x55f9c2d2c4d7]
postgres: mastodon mastodon_production [local]
DELETE(standard_ExecutorRun+0x19b) [0x55f9c2cf913b]
postgres: mastodon mastodon_production [local] DELETE(+0x398108)
[0x55f9c2d3e108]
postgres: mastodon mastodon_production [local]
DELETE(SPI_execute_snapshot+0xe6) [0x55f9c2d3ec76]
postgres: mastodon mastodon_production [local] DELETE(+0x60c750)
[0x55f9c2fb2750]
postgres: mastodon mastodon_production [local] DELETE(+0x60d730)
[0x55f9c2fb3730]
postgres: mastodon mastodon_production [local]
DELETE(RI_FKey_setnull_del+0x34) [0x55f9c2fb4a64]
postgres: mastodon mastodon_production [local] DELETE(+0x327ade)
[0x55f9c2ccdade]
postgres: mastodon mastodon_production [local] DELETE(+0x329f4c)
[0x55f9c2ccff4c]
postgres: mastodon mastodon_production [local]
DELETE(AfterTriggerEndQuery+0x88) [0x55f9c2cd52e8]
postgres: mastodon mastodon_production [local]
DELETE(standard_ExecutorFinish+0x147) [0x55f9c2cf9417]
postgres: mastodon mastodon_production [local] DELETE(+0x531678)
[0x55f9c2ed7678]
postgres: mastodon mastodon_production [local] DELETE(+0x53248f)
[0x55f9c2ed848f]
postgres: mastodon mastodon_production [local] DELETE(PortalRun+0x1a3)
[0x55f9c2ed8903]
postgres: mastodon mastodon_production [local] DELETE(PostgresMain+0x19d9)
[0x55f9c2ed6449]
postgres: mastodon mastodon_production [local] DELETE(+0x49036b)
[0x55f9c2e3636b]
postgres: mastodon mastodon_production [local] DELETE(PostmasterMain+0xeee)
[0x55f9c2e3750e]
postgres: mastodon mastodon_production [local] DELETE(main+0x21c)
[0x55f9c2b1106c]
/lib/x86_64-linux-gnu/libc.so.6(+0x29ca8) [0x7effa4c2dca8]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x85) [0x7effa4c2dd65]
postgres: mastodon mastodon_production [local] DELETE(_start+0x21)
[0x55f9c2b11631]
2025-10-06 03:01:17 CEST 68db1e13.5c0 | LOG:  terminating any other active
server processes
---


Re: BUG #19082: Failing assert in index scan

От
Álvaro Herrera
Дата:
On 2025-Oct-10, PG Bug reporting form wrote:

> Last week, I've noticed a crash due to a failed assertion that was
> consistently being triggered by an application every 10 seconds. Here's the
> log:
> 
> ---
> TRAP: failed Assert("!XLogRecPtrIsInvalid(so->currPos.lsn)"), File:
> "./build/../src/backend/access/nbtree/nbtutils.c", Line: 1782, PID: 394782
> postgres: mastodon mastodon_production [local]
> SELECT(ExceptionalCondition+0x75) [0x55f9c3027ea5]

AFAICS this assertion is new as of commit c7f25feb3862.  Peter?

I find it rather odd that you're running a production instance compiled
with assertions enabled, though.  They have a considerable impact on
performance, and should be used for development or testing only, not
production.  I would recommend to recompile with assertions disabled.
Also, it might well be that the assertion is buggy, and that nothing
would go wrong if that assertion is removed and just let the `if` test
below handle the case.

-- 
Álvaro Herrera               48°01'N 7°57'E  —  https://www.EnterpriseDB.com/



Re: BUG #19082: Failing assert in index scan

От
Peter Geoghegan
Дата:
On Fri, Oct 10, 2025 at 5:51 AM PG Bug reporting form
<noreply@postgresql.org> wrote:
> Last week, I've noticed a crash due to a failed assertion that was
> consistently being triggered by an application every 10 seconds. Here's the
> log:
>
> ---
> TRAP: failed Assert("!XLogRecPtrIsInvalid(so->currPos.lsn)"), File:
> "./build/../src/backend/access/nbtree/nbtutils.c", Line: 1782, PID: 394782
> postgres: mastodon mastodon_production [local]

Does this involve an unlogged table/index?

> Since I noticed that the stack trace mentioned btrees, I reindexed all
> indexes in this database and that appeared to resolve the symptom.

That would be effective in the short term, though only because it'll
reduce the number of dead tuples that can be LP_DEAD set in the index
in the short term.

--
Peter Geoghegan



Re: BUG #19082: Failing assert in index scan

От
"Matthijs van der Vleuten"
Дата:
On Fri, Oct 10, 2025, at 17:40, Peter Geoghegan wrote:
> Does this involve an unlogged table/index?

No, none of the tables and indexes in that database are unlogged or temporary.



Re: BUG #19082: Failing assert in index scan

От
Peter Geoghegan
Дата:
On Fri, Oct 10, 2025 at 12:13 PM Matthijs van der Vleuten
<postgresql@zr40.nl> wrote:
> No, none of the tables and indexes in that database are unlogged or temporary.

Are these backwards scans? In other words, does EXPLAIN show "Index
Scan Backwards" for these queries? I see that the first query "ORDER
BY "mentions"."id" ASC", but perhaps that index column is declared as
DESC (in which case it'd be a backwards scan)?

--
Peter Geoghegan



Re: BUG #19082: Failing assert in index scan

От
"Matthijs van der Vleuten"
Дата:
On Fri, Oct 10, 2025, at 18:28, Peter Geoghegan wrote:
> On Fri, Oct 10, 2025 at 12:13 PM Matthijs van der Vleuten
> <postgresql@zr40.nl> wrote:
>> No, none of the tables and indexes in that database are unlogged or temporary.
>
> Are these backwards scans? In other words, does EXPLAIN show "Index
> Scan Backwards" for these queries? I see that the first query "ORDER
> BY "mentions"."id" ASC", but perhaps that index column is declared as
> DESC (in which case it'd be a backwards scan)?

No, the index columns aren't declared as DESC, and, as far as I've been able to verify, the query plan doesn't use
backwardscans. 

I've isolated one particular row that triggers this assert. It's a different table but it fails the assertion the same
way.

zr40@[local]:5432 mastodon_production=# explain SELECT "preview_cards_statuses".* FROM "preview_cards_statuses" WHERE
"preview_cards_statuses"."status_id"= 114337794133378371 LIMIT 1; 
                                                   QUERY PLAN
-----------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.42..2.84 rows=1 width=81)
   ->  Index Scan using preview_cards_statuses_pkey on preview_cards_statuses  (cost=0.42..2.84 rows=1 width=81)
         Index Cond: (status_id = '114337794133378371'::bigint)
(3 rows)

zr40@[local]:5432 mastodon_production=# select "preview_cards_statuses".* FROM "preview_cards_statuses" WHERE
"preview_cards_statuses"."status_id"= 114337794133378371 LIMIT 1; 
server closed the connection unexpectedly
    This probably means the server terminated abnormally
    before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.
The connection to the server was lost. Attempting reset: Failed.

Server logs:

TRAP: failed Assert("!XLogRecPtrIsInvalid(so->currPos.lsn)"), File: "./build/../src/backend/access/nbtree/nbtutils.c",
Line:1782, PID: 50833 
postgres: zr40 mastodon_production [local] SELECT(ExceptionalCondition+0x75) [0x5654f247eea5]
postgres: zr40 mastodon_production [local] SELECT(_bt_killitems+0x550) [0x5654f1ff4710]
postgres: zr40 mastodon_production [local] SELECT(+0x1eee95) [0x5654f1febe95]
postgres: zr40 mastodon_production [local] SELECT(_bt_next+0x92) [0x5654f1fed472]
postgres: zr40 mastodon_production [local] SELECT(btgettuple+0x7c) [0x5654f1fe9ddc]
postgres: zr40 mastodon_production [local] SELECT(index_getnext_tid+0x56) [0x5654f1fdab06]
postgres: zr40 mastodon_production [local] SELECT(index_getnext_slot+0x7b) [0x5654f1fdad4b]
postgres: zr40 mastodon_production [local] SELECT(+0x37bfbd) [0x5654f2178fbd]
postgres: zr40 mastodon_production [local] SELECT(+0x37e3e9) [0x5654f217b3e9]
postgres: zr40 mastodon_production [local] SELECT(standard_ExecutorRun+0x19b) [0x5654f215013b]
postgres: zr40 mastodon_production [local] SELECT(+0x531141) [0x5654f232e141]
postgres: zr40 mastodon_production [local] SELECT(PortalRun+0x2b0) [0x5654f232fa10]
postgres: zr40 mastodon_production [local] SELECT(+0x52e651) [0x5654f232b651]
postgres: zr40 mastodon_production [local] SELECT(PostgresMain+0x1618) [0x5654f232d088]
postgres: zr40 mastodon_production [local] SELECT(+0x49036b) [0x5654f228d36b]
postgres: zr40 mastodon_production [local] SELECT(PostmasterMain+0xeee) [0x5654f228e50e]
postgres: zr40 mastodon_production [local] SELECT(main+0x21c) [0x5654f1f6806c]
/lib/x86_64-linux-gnu/libc.so.6(+0x29ca8) [0x7f0b6922eca8]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x85) [0x7f0b6922ed65]
postgres: zr40 mastodon_production [local] SELECT(_start+0x21) [0x5654f1f68631]
2025-10-10 19:03:26 CEST 68e8cff4.6ce1 | LOG:  server process (PID 50833) was terminated by signal 6: Aborted
2025-10-10 19:03:26 CEST 68e8cff4.6ce1 | DETAIL:  Failed process was running: select "preview_cards_statuses".* FROM
"preview_cards_statuses"WHERE "preview_cards_statuses"."status_id" = 114337794133378371 LIMIT 1; 
2025-10-10 19:03:26 CEST 68e8cff4.6ce1 | LOG:  terminating any other active server processes



Re: BUG #19082: Failing assert in index scan

От
Peter Geoghegan
Дата:
On Fri, Oct 10, 2025 at 1:09 PM Matthijs van der Vleuten
<postgresql@zr40.nl> wrote:
> No, the index columns aren't declared as DESC, and, as far as I've been able to verify, the query plan doesn't use
backwardscans. 
>
> I've isolated one particular row that triggers this assert. It's a different table but it fails the assertion the
sameway. 

Are you using wal_level = minimal, then?

--
Peter Geoghegan



Re: BUG #19082: Failing assert in index scan

От
Peter Geoghegan
Дата:
On Fri, Oct 10, 2025 at 2:10 PM Peter Geoghegan <pg@bowt.ie> wrote:
> Are you using wal_level = minimal, then?

Matthijs confirms (off list) that wal_level = minimal has been used.
So it looks like the assertion itself is wrong; it isn't accounting
for the possibility that the LSN from currPos is InvalidXLogRecPtr/0
due solely to the index build never needing to be logged.

I actually copied this _bt_killitems assertion from the similar code
path in GiST (the one at the top of gistkillitems). That won't cause
an assertion failure following an index build with wal_level = minimal
because the index build will still set the page LSN to GistBuildLSN.
That's not the case in nbtree -- nbtree doesn't use fake LSNs with
unlogged relations (although I think that it should, specifically to
remove special case handling of unlogged relations within
_bt_killitems).

Anyway, the fix is simple: remove the overzealous assertion. I'll take
care of that shortly.

--
Peter Geoghegan