Обсуждение: Re: PG17.6 wal apply bug (SIGSEGV)

Поиск
Список
Период
Сортировка

Re: PG17.6 wal apply bug (SIGSEGV)

От
"badfilez@gmail.com"
Дата:
backrtace

Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x000000000057eff2 in _bt_restore_page (page=0x7f6f48fd1000 "", from=0x7f6fe2eccd80 "", len=<optimized out>) at nbtxlog.c:63
63            itemsz = MAXALIGN(itemsz);
(gdb) bt full
#0  0x000000000057eff2 in _bt_restore_page (page=0x7f6f48fd1000 "", from=0x7f6fe2eccd80 "", len=<optimized out>) at nbtxlog.c:63
        itupdata = <optimized out>
        itemsz = 0
        end = 0x7f6fe2ecd8c0 "(\265/\375`\260\005\205\023"
        items = {0x0 <repeats 227 times>, 0x7f6f00000000 "\211\243\362hw\366\371\003\b", 0x7f6fe2eccd80 "" <repeats 180 times>}
        itemsizes = {24 <repeats 33 times>, 0 <repeats 375 times>}
        i = 1318
        nitems = <optimized out>
        __func__ = "_bt_restore_page"
        __errno_location = <optimized out>


On 20/10/2025 13:58, badfilez@gmail.com wrote:
Hello,

Postgres 17 cluster from official repo on RHEL8 (master and 2 replicas)

on both replicas, I get

2025-10-18 15:40:50.843 MSK [1448] LOG:  entering standby mode
2025-10-18 15:40:50.865 MSK [1448] LOG:  redo starts at 1F35/D08DE298
2025-10-18 15:41:14.553 MSK [1381] LOG:  startup process (PID 1448) was terminated by signal 11: Segmentation fault
2025-10-18 15:41:14.553 MSK [1381] LOG:  terminating any other active server processes
2025-10-18 15:41:14.555 MSK [1381] LOG:  shutting down due to startup process failure
2025-10-18 15:41:14.677 MSK [1381] LOG:  database system is shut down

After debugging,

replica recovery creates corrupted index file from wal,
waldump does not show any wal corruption, no prior io errors in logs
master has not crashed and working ok, no errors in log

the operation on which segfault happens is (if i stop recovery on previous operation it does not trigger segfault)

rmgr: Btree len (rec/tot): 3758/ 5774, tx: 1711720455, lsn: 1F36/30E3C7B8, prev 1F36/30E3C760, desc: SPLIT_L level: 0, firstrightoff: 140, newitemoff: 140, postingoff: 0, blkref #0: rel 1663/16385/151181595 blk 63203 FPW, blkref #1: rel 1663/16385/151181595 blk 112208, blkref #2: rel 1663/16385/151181595 blk 108144 FPW

the wal segment containing the instruction attached



Re: PG17.6 wal apply bug (SIGSEGV)

От
Peter Geoghegan
Дата:
On Mon, Oct 20, 2025 at 1:07 PM badfilez@gmail.com <badfilez@gmail.com> wrote:
> Program terminated with signal SIGSEGV, Segmentation fault.
> #0  0x000000000057eff2 in _bt_restore_page (page=0x7f6f48fd1000 "", from=0x7f6fe2eccd80 "", len=<optimized out>) at
nbtxlog.c:63
> 63            itemsz = MAXALIGN(itemsz);
> (gdb) bt full

"itemsz = 0" suggests that the index was already corrupt, before the
WAL record is applied.

I suggest that you use contrib/amcheck (or the pg_amcheck frontend
program) to ascertain the extent of any index corruption on this
database.

--
Peter Geoghegan



Re: PG17.6 wal apply bug (SIGSEGV)

От
"badfilez@gmail.com"
Дата:
Hi,

Thank you,
there still are 2 broken indexes in master DB,
one of them exactly matches the said relation 151181595.

still,
is it proper wal apply procedure, to segfault in such a case?


On 20/10/2025 20:18, Peter Geoghegan wrote:
> On Mon, Oct 20, 2025 at 1:07 PM badfilez@gmail.com <badfilez@gmail.com> wrote:
>> Program terminated with signal SIGSEGV, Segmentation fault.
>> #0  0x000000000057eff2 in _bt_restore_page (page=0x7f6f48fd1000 "", from=0x7f6fe2eccd80 "", len=<optimized out>) at
nbtxlog.c:63
>> 63            itemsz = MAXALIGN(itemsz);
>> (gdb) bt full
> "itemsz = 0" suggests that the index was already corrupt, before the
> WAL record is applied.
>
> I suggest that you use contrib/amcheck (or the pg_amcheck frontend
> program) to ascertain the extent of any index corruption on this
> database.
>




Re: PG17.6 wal apply bug (SIGSEGV)

От
"badfilez@gmail.com"
Дата:
Follow up:

the error detected on master is:

ERROR: XX002: item order invariant violated for index "2722_1401_trends_uint_pkey" DETAIL: Lower index tid=(63203,46) (points to heap tid=(96487,80)) higher index tid=(63203,47) (points to heap tid=(0,0)) page lsn=1F5B/CB8F8098. LOCATION: bt_target_page_check, verify_nbtree.c:1773 query was: SELECT "public".bt_index_check(index := c.oid, heapallindexed := false ) FROM pg_catalog.pg_class c, pg_catalog.pg_index i WHERE c.oid = 151181595 AND c.oid = i.indexrelid AND c.relpersistence != 't' AND i.indisready AND i.indisvalid AND i.indislivebtree index "zabbix._timescaledb_internal._hyper_9_2722_chunk_trends_uint_clock_idx":

Do I get in right, 
this corruption was somehow transferred to replicas first, and then wal was tried to apply over corrupted index?

Why it did not crash the master then?


On 22/10/2025 09:19, badfilez@gmail.com wrote:
Hi,

Thank you,
there still are 2 broken indexes in master DB,
one of them exactly matches the said relation 151181595.

still,
is it proper wal apply procedure, to segfault in such a case?


On 20/10/2025 20:18, Peter Geoghegan wrote:
On Mon, Oct 20, 2025 at 1:07 PM badfilez@gmail.com <badfilez@gmail.com> wrote:
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x000000000057eff2 in _bt_restore_page (page=0x7f6f48fd1000 "", from=0x7f6fe2eccd80 "", len=<optimized out>) at nbtxlog.c:63
63            itemsz = MAXALIGN(itemsz);
(gdb) bt full
"itemsz = 0" suggests that the index was already corrupt, before the
WAL record is applied.

I suggest that you use contrib/amcheck (or the pg_amcheck frontend
program) to ascertain the extent of any index corruption on this
database.



Re: PG17.6 wal apply bug (SIGSEGV)

От
Peter Geoghegan
Дата:
On Wed, Oct 22, 2025 at 4:44 AM badfilez@gmail.com <badfilez@gmail.com> wrote:
> Do I get in right,
> this corruption was somehow transferred to replicas first, and then wal was tried to apply over corrupted index?

I don't know what happened. There are just too many possibilities for
me to even guess.

> Why it did not crash the master then?

Probably because the REDO routine just doesn't run there. The
corruption on the primary *might* have led to a crash in some other
place.

In short, it's hard (perhaps impossible) to make a strong guarantee that
there won't be a SIGSEGV when the database is corrupt.

> On 22/10/2025 09:19, badfilez@gmail.com wrote:
>
> Hi,
>
> Thank you,
> there still are 2 broken indexes in master DB,
> one of them exactly matches the said relation 151181595.
>
> still,
> is it proper wal apply procedure, to segfault in such a case?

It's not ideal. We try to avoid that. But even if the REDO routine
didn't SIGSEGV, it would still have to fail in some other way (given
the kind of corruption that we see here).

The only advantage of not segfaulting is that the standby can at least
continue to accept queries for a while. But it will still inevitably
fall further and further behind, and so you'd still have to recreate
the replica (possibly only after resolving the corruption on the
primary) to get things working again. The important thing is to try to
determine where the corruption came from, to avoid the same underlying
problem causing more corruption in the future. And that you repair the
corruption.

--
Peter Geoghegan