The End of the WAL

Поиск
Список
Период
Сортировка
От Christoph Berg
Тема The End of the WAL
Дата
Msg-id YTsuKcwWwMOii+WB@msg.credativ.de
обсуждение исходный текст
Ответ на Re: trap instead of error on 32 TiB table  (Heikki Linnakangas <hlinnaka@iki.fi>)
Список pgsql-hackers
Re: Heikki Linnakangas
> On 09/09/2021 17:25, Tom Lane wrote:
> > Having done that, the check in md.c could be reduced to an Assert,
> > although there's something to be said for leaving it as-is as an
> > extra layer of defense.
> 
> Some operations call smgrextend() directly, like B-tree index creation. We
> don't want those operations to hit an assertion either.

Thanks, I can now see a proper error on 15devel with cassert enabled:

# insert into huge select generate_series(1,1000);
FEHLER:  54000: cannot extend relation base/13550/16384 beyond 4294967295 blocks
ORT:  ReadBuffer_common, bufmgr.c:831



Some months ago I had already tried what happens on running into
another limit, namely the end of WAL. Back then I was attributing the
result to "won't happen anyway", but since we are talking asserts,
there is the result:

/usr/lib/postgresql/15/bin/pg_resetwal -l 00000001FFFFFFFF000000FF -D $PWD
 select pg_current_wal_lsn();
 pg_current_wal_lsn
────────────────────
 FFFFFFFF/FF000150
create table foo (id bigint);
repeat a few times: insert into foo select generate_series(1,200000);

15devel, cassert:

TRAP: FailedAssertion("XLogRecPtrToBytePos(*EndPos) == endbytepos", File:
"./build/../src/backend/access/transam/xlog.c",Line: 1324, PID: 1651661)
 
postgres: 15/regress: cbe postgres ::1(45564) INSERT(ExceptionalCondition+0x9a)[0x564ad15461ba]
postgres: 15/regress: cbe postgres ::1(45564) INSERT(+0x223022)[0x564ad115f022]
postgres: 15/regress: cbe postgres ::1(45564) INSERT(XLogInsert+0x653)[0x564ad116adf3]
postgres: 15/regress: cbe postgres ::1(45564) INSERT(heap_insert+0x3ae)[0x564ad10f0a2e]
postgres: 15/regress: cbe postgres ::1(45564) INSERT(+0x1bf8e9)[0x564ad10fb8e9]
postgres: 15/regress: cbe postgres ::1(45564) INSERT(+0x35e30c)[0x564ad129a30c]
postgres: 15/regress: cbe postgres ::1(45564) INSERT(+0x35eedc)[0x564ad129aedc]
postgres: 15/regress: cbe postgres ::1(45564) INSERT(standard_ExecutorRun+0x115)[0x564ad12695b5]
postgres: 15/regress: cbe postgres ::1(45564) INSERT(+0x4da312)[0x564ad1416312]
postgres: 15/regress: cbe postgres ::1(45564) INSERT(+0x4db0ee)[0x564ad14170ee]
postgres: 15/regress: cbe postgres ::1(45564) INSERT(PortalRun+0x2ec)[0x564ad14176bc]
postgres: 15/regress: cbe postgres ::1(45564) INSERT(+0x4d72b6)[0x564ad14132b6]
postgres: 15/regress: cbe postgres ::1(45564) INSERT(PostgresMain+0x181c)[0x564ad1414edc]
postgres: 15/regress: cbe postgres ::1(45564) INSERT(+0x43fd80)[0x564ad137bd80]
postgres: 15/regress: cbe postgres ::1(45564) INSERT(PostmasterMain+0xca0)[0x564ad137cd10]
postgres: 15/regress: cbe postgres ::1(45564) INSERT(main+0x221)[0x564ad10973d1]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xea)[0x7f63bcbe1d0a]
postgres: 15/regress: cbe postgres ::1(45564) INSERT(_start+0x2a)[0x564ad10979ca]
2021-09-10 11:44:40.997 CEST [1651617] LOG:  request to flush past end of generated WAL; request FFFFFFFF/FFFFE000,
currentposition 0/50
 
2021-09-10 11:44:42.019 CEST [1651613] LOG:  Serverprozess (PID 1651661) wurde von Signal 6 beendet: Abgebrochen
2021-09-10 11:44:42.019 CEST [1651613] DETAIL:  Der fehlgeschlagene Prozess führte aus: insert into foo select
generate_series(1,200000);

The system properly (?) recovers, resuming at some FFFFFFFF/FFE614B8
position (i.e. discarding the part that was overflowing). However, if
I push it by moving closer to the end by doing smaller inserts, I can
get it into an infinite recovery loop:

2021-09-10 11:48:41.050 CEST [1652403] LOG:  Datenbanksystem wurde beim Herunterfahren unterbrochen; letzte bekannte
Aktionam 2021-09-10 11:48:40 CEST
 
2021-09-10 11:48:41.051 CEST [1652403] LOG:  Datenbanksystem wurde nicht richtig heruntergefahren; automatische
Wiederherstellungläuft
 
2021-09-10 11:48:41.051 CEST [1652403] LOG:  Redo beginnt bei FFFFFFFF/FFFFDF78
2021-09-10 11:48:41.051 CEST [1652403] LOG:  ungültige Datensatzlänge bei FFFFFFFF/FFFFDFB0: 24 erwartet, 0 erhalten
2021-09-10 11:48:41.051 CEST [1652403] LOG:  redo done at FFFFFFFF/FFFFDF78 system usage: CPU: Benutzer: 0,00 s,
System:0,00 s, verstrichen: 0,00 s
 
TRAP: FailedAssertion("((XLogPageHeader) cachedPos)->xlp_magic == XLOG_PAGE_MAGIC", File:
"./build/../src/backend/access/transam/xlog.c",Line: 1982, PID: 1652404)
 
postgres: 15/regress: checkpointer performing end-of-recovery checkpoint(ExceptionalCondition+0x9a)[0x564ad15461ba]
postgres: 15/regress: checkpointer performing end-of-recovery checkpoint(+0x2221e8)[0x564ad115e1e8]
postgres: 15/regress: checkpointer performing end-of-recovery checkpoint(XLogInsertRecord+0x587)[0x564ad115ea27]
postgres: 15/regress: checkpointer performing end-of-recovery checkpoint(XLogInsert+0x653)[0x564ad116adf3]
postgres: 15/regress: checkpointer performing end-of-recovery checkpoint(CreateCheckPoint+0x64e)[0x564ad11608ee]
postgres: 15/regress: checkpointer performing end-of-recovery checkpoint(CheckpointerMain+0x3d4)[0x564ad136db34]
postgres: 15/regress: checkpointer performing end-of-recovery checkpoint(AuxiliaryProcessMain+0xef)[0x564ad136bacf]
postgres: 15/regress: checkpointer performing end-of-recovery checkpoint(+0x43d116)[0x564ad1379116]
postgres: 15/regress: checkpointer performing end-of-recovery checkpoint(+0x43f71a)[0x564ad137b71a]
postgres: 15/regress: checkpointer performing end-of-recovery checkpoint(PostmasterMain+0xca0)[0x564ad137cd10]
postgres: 15/regress: checkpointer performing end-of-recovery checkpoint(main+0x221)[0x564ad10973d1]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xea)[0x7f63bcbe1d0a]
postgres: 15/regress: checkpointer performing end-of-recovery checkpoint(_start+0x2a)[0x564ad10979ca]
2021-09-10 11:48:41.791 CEST [1651613] LOG:  Checkpointer-Prozess (PID 1652404) wurde von Signal 6 beendet:
Abgebrochen
2021-09-10 11:48:41.791 CEST [1651613] LOG:  aktive Serverprozesse werden abgebrochen
2021-09-10 11:48:41.791 CEST [1651613] LOG:  alle Serverprozesse beendet; initialisiere neu
... goto 10

Doing the same test on 15devel without cassert, the inserting query
gets stuck in a busy loop (no wait_event) that ^C won't terminate.

Two backtraces from that running process:

(gdb) bt
#0  0x0000563604fdb63b in memset (__len=8192, __ch=0, __dest=0x7f710f9b6000) at
/usr/include/x86_64-linux-gnu/bits/string_fortified.h:71
#1  AdvanceXLInsertBuffer (upto=upto@entry=18446744073709543424, opportunistic=opportunistic@entry=false)
    at ./build/../src/backend/access/transam/xlog.c:2220
#2  0x0000563604fdb920 in GetXLogBuffer (ptr=18446744073709543424) at
./build/../src/backend/access/transam/xlog.c:1959
#3  0x0000563604fdd1b1 in CopyXLogRecordToWAL (EndPos=18446744073709543480, StartPos=18446744073709543392,
    rdata=0x56360562b1c0 <hdr_rdt>, isLogSwitch=false, write_len=63) at
./build/../src/backend/access/transam/xlog.c:1558
#4  XLogInsertRecord (rdata=rdata@entry=0x56360562b1c0 <hdr_rdt>, fpw_lsn=fpw_lsn@entry=18446744073709543392,
flags=<optimizedout>,
 
    num_fpi=num_fpi@entry=0) at ./build/../src/backend/access/transam/xlog.c:1123
#5  0x0000563604fe774a in XLogInsert (rmid=rmid@entry=10 '\n', info=info@entry=0 '\000')
    at ./build/../src/backend/access/transam/xloginsert.c:480
#6  0x0000563604f87b27 in heap_insert (relation=relation@entry=0x7f710f6d7938, tup=tup@entry=0x5636061bc288,
cid=cid@entry=0,
    options=options@entry=0, bistate=bistate@entry=0x0) at ./build/../src/backend/access/heap/heapam.c:2208
#7  0x0000563604f8fe89 in heapam_tuple_insert (relation=0x7f710f6d7938, slot=0x5636061bc1f8, cid=0, options=0,
bistate=0x0)
    at ./build/../src/backend/access/heap/heapam_handler.c:252
#8  0x00005636050ff35c in table_tuple_insert (bistate=0x0, options=0, cid=<optimized out>, slot=0x5636061bc1f8,
rel=<optimizedout>)
 
    at ./build/../src/include/access/tableam.h:1374
#9  ExecInsert (mtstate=0x5636061a5ad8, resultRelInfo=0x5636061a5ce8, slot=0x5636061bc1f8, planSlot=0x5636061bb778,
    estate=0x5636061a5868, canSetTag=<optimized out>) at ./build/../src/backend/executor/nodeModifyTable.c:934
#10 0x00005636051004c7 in ExecModifyTable (pstate=<optimized out>) at
./build/../src/backend/executor/nodeModifyTable.c:2561
#11 0x00005636050d599d in ExecProcNode (node=0x5636061a5ad8) at ./build/../src/include/executor/executor.h:257
#12 ExecutePlan (execute_once=<optimized out>, dest=0x5636061b22f8, direction=<optimized out>, numberTuples=0,
    sendTuples=<optimized out>, operation=CMD_INSERT, use_parallel_mode=<optimized out>, planstate=0x5636061a5ad8,
estate=0x5636061a5868)
    at ./build/../src/backend/executor/execMain.c:1551
#13 standard_ExecutorRun (queryDesc=0x563606109d08, direction=<optimized out>, count=0, execute_once=<optimized out>)
    at ./build/../src/backend/executor/execMain.c:361
#14 0x00005636052501e2 in ProcessQuery (plan=0x5636061b2218,
    sourceText=0x5636060e7128 "insert into foo select generate_series(1,500000);", params=0x0, queryEnv=0x0,
dest=0x5636061b22f8,
    qc=0x7ffdbe808cc0) at ./build/../src/backend/tcop/pquery.c:160
#15 0x0000563605250dd9 in PortalRunMulti (portal=portal@entry=0x56360614b3a8, isTopLevel=isTopLevel@entry=true,
    setHoldSnapshot=setHoldSnapshot@entry=false, dest=dest@entry=0x5636061b22f8, altdest=altdest@entry=0x5636061b22f8,
    qc=qc@entry=0x7ffdbe808cc0) at ./build/../src/backend/tcop/pquery.c:1266
#16 0x000056360525129c in PortalRun (portal=portal@entry=0x56360614b3a8, count=count@entry=9223372036854775807,
    isTopLevel=isTopLevel@entry=true, run_once=run_once@entry=true, dest=dest@entry=0x5636061b22f8,
    altdest=altdest@entry=0x5636061b22f8, qc=0x7ffdbe808cc0) at ./build/../src/backend/tcop/pquery.c:786
#17 0x000056360524d2dd in exec_simple_query (query_string=0x5636060e7128 "insert into foo select
generate_series(1,500000);")
    at ./build/../src/backend/tcop/postgres.c:1214
#18 0x000056360524f20f in PostgresMain (argc=argc@entry=1, argv=argv@entry=0x7ffdbe809190, dbname=<optimized out>,
    username=<optimized out>) at ./build/../src/backend/tcop/postgres.c:4488
#19 0x00005636051cc178 in BackendRun (port=<optimized out>, port=<optimized out>) at
./build/../src/backend/postmaster/postmaster.c:4521
#20 BackendStartup (port=<optimized out>) at ./build/../src/backend/postmaster/postmaster.c:4243
#21 ServerLoop () at ./build/../src/backend/postmaster/postmaster.c:1765
#22 0x00005636051cd000 in PostmasterMain (argc=argc@entry=5, argv=argv@entry=0x5636060e1ea0)
    at ./build/../src/backend/postmaster/postmaster.c:1437
#23 0x0000563604f457f1 in main (argc=5, argv=0x5636060e1ea0) at ./build/../src/backend/main/main.c:199

(gdb) bt
#0  0x0000563604fdb6cf in AdvanceXLInsertBuffer (upto=upto@entry=18446744073709543424,
opportunistic=opportunistic@entry=false)
    at ./build/../src/backend/access/transam/xlog.c:2147
#1  0x0000563604fdb920 in GetXLogBuffer (ptr=18446744073709543424) at
./build/../src/backend/access/transam/xlog.c:1959
#2  0x0000563604fdd1b1 in CopyXLogRecordToWAL (EndPos=18446744073709543480, StartPos=18446744073709543392,
    rdata=0x56360562b1c0 <hdr_rdt>, isLogSwitch=false, write_len=63) at
./build/../src/backend/access/transam/xlog.c:1558
#3  XLogInsertRecord (rdata=rdata@entry=0x56360562b1c0 <hdr_rdt>, fpw_lsn=fpw_lsn@entry=18446744073709543392,
flags=<optimizedout>,
 
    num_fpi=num_fpi@entry=0) at ./build/../src/backend/access/transam/xlog.c:1123
#4  0x0000563604fe774a in XLogInsert (rmid=rmid@entry=10 '\n', info=info@entry=0 '\000')
    at ./build/../src/backend/access/transam/xloginsert.c:480
#5  0x0000563604f87b27 in heap_insert (relation=relation@entry=0x7f710f6d7938, tup=tup@entry=0x5636061bc288,
cid=cid@entry=0,
    options=options@entry=0, bistate=bistate@entry=0x0) at ./build/../src/backend/access/heap/heapam.c:2208
...


Christoph
-- 
Senior Consultant, Tel.: +49 2166 9901 187
credativ GmbH, HRB Mönchengladbach 12080, USt-ID-Nummer: DE204566209
Trompeterallee 108, 41189 Mönchengladbach
Geschäftsführung: Dr. Michael Meskes, Sascha Heuer, Geoff Richardson,
Peter Lilley; Unser Umgang mit personenbezogenen Daten unterliegt
folgenden Bestimmungen: https://www.credativ.de/datenschutz



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Pavel Stehule
Дата:
Сообщение: Re: Schema variables - new implementation for Postgres 15
Следующее
От: Etsuro Fujita
Дата:
Сообщение: Re: BUG #16583: merge join on tables with different DB collation behind postgres_fdw fails