Re: pg ignores wal files in pg_wal, and instead tries to load them from archive/primary

Поиск
Список
Период
Сортировка
От hubert depesz lubaczewski
Тема Re: pg ignores wal files in pg_wal, and instead tries to load them from archive/primary
Дата
Msg-id YzZ1ngq++Tazursx@depesz.com
обсуждение исходный текст
Ответ на Re: pg ignores wal files in pg_wal, and instead tries to load them from archive/primary  (Michael Paquier <michael@paquier.xyz>)
Список pgsql-bugs
On Fri, Sep 30, 2022 at 09:49:28AM +0900, Michael Paquier wrote:
> On Thu, Sep 29, 2022 at 05:51:02PM +0200, hubert depesz lubaczewski wrote:
> > What am I missing? what is wrong? How can I fix it? The problem is not fixing
> > *this server*, because we are in process of upgrading LOTS and LOTS of servers,
> > and I need to know what is broken/how to work around it.
> So, your backup_label file points to 00000001000007E4000000A0 as being
> the first segment to begin recovery from.  This means that the startup

Well, *now* I don't have backup_label:

root@c273-pg2258:/var/lib/postgresql/14/main# ls -l
total 68520
-rw------- 1 postgres postgres        3 Sep 28 01:38 PG_VERSION
-rw------- 1 postgres postgres 61264030 Sep 30 04:42 autoprewarm.blocks
-rw------- 1 postgres postgres      231 Sep 28 01:38 backup_label.old
-rw------- 1 postgres postgres  8783060 Sep 28 01:38 backup_manifest
drwx------ 6 postgres postgres       74 Sep 28 01:38 base
-rw------- 1 postgres postgres       61 Sep 30 04:00 current_logfiles
drwx------ 2 postgres postgres     4096 Sep 29 14:25 global
drwx------ 2 postgres postgres        6 Sep 28 01:38 pg_commit_ts
drwx------ 2 postgres postgres        6 Sep 28 01:38 pg_dynshmem
drwx------ 4 postgres postgres       84 Sep 28 03:38 pg_logical
drwx------ 4 postgres postgres       48 Sep 28 01:38 pg_multixact
drwx------ 2 postgres postgres        6 Sep 28 01:38 pg_notify
drwx------ 2 postgres postgres        6 Sep 28 01:38 pg_replslot
drwx------ 2 postgres postgres        6 Sep 28 01:38 pg_serial
drwx------ 2 postgres postgres        6 Sep 28 01:38 pg_snapshots
drwx------ 2 postgres postgres        6 Sep 29 14:25 pg_stat
drwx------ 2 postgres postgres       43 Sep 29 14:25 pg_stat_tmp
drwx------ 2 postgres postgres      122 Sep 29 14:25 pg_subtrans
drwx------ 2 postgres postgres       27 Sep 28 01:38 pg_tblspc
drwx------ 2 postgres postgres        6 Sep 28 01:38 pg_twophase
drwx------ 3 postgres postgres   204800 Sep 28 03:38 pg_wal
drwx------ 2 postgres postgres     4096 Sep 28 01:38 pg_xact
-rw------- 1 postgres postgres      118 Sep 28 01:38 postgresql.auto.conf
-rw------- 1 postgres postgres      130 Sep 29 14:25 postmaster.opts
-rw------- 1 postgres postgres      103 Sep 29 14:25 postmaster.pid
-rw-r--r-- 1 postgres postgres        0 Sep 28 01:38 standby.signal

backup_label.old is:
root@c273-pg2258:/var/lib/postgresql/14/main# cat backup_label.old
START WAL LOCATION: 7D5/3B068338 (file 00000001000007D50000003B)
CHECKPOINT LOCATION: 7D5/3B076F78
BACKUP METHOD: streamed
BACKUP FROM: primary
START TIME: 2022-09-27 21:06:00 UTC
LABEL: pg_basebackup base backup
START TIMELINE: 1

> process is able to correctly kick recovery using the contents of the
> local pg_wal/ from 00000001000007E4000000A0 to 00000001000007E800000066.
> Once 00000001000007E800000067 cannot be read, the startup process
> fails to read this segment and decides to switch the source to get WAL
> from as streaming.  The order should be local pg_wal/, then archiving
> and finally streaming if all are configured, as far as I recall.
> What's the error you are seeing when reading 00000001000007E800000067?

I don't see any errors. Let me show you some data from log:
2022-09-30 04:44:43.833 UTC,"postgres","postgres",2716081,"[local]",6336743b.2971b1,3,"authentication",2022-09-30
04:44:43UTC,4/14269,0,LOG,00000,"connection authorized: user=postgres database=postgres
application_name=psql",,,,,,,,,"","clientbackend",,1147616880456321454 
2022-09-30 04:44:43.917 UTC,,,2716082,,6336743b.2971b2,1,,2022-09-30 04:44:43 UTC,,0,LOG,00000,"started streaming WAL
fromprimary at 7E8/67000000 on timeline 1",,,,,,,,,"","walreceiver",,0 
2022-09-30 04:44:43.918 UTC,,,2716082,,6336743b.2971b2,2,,2022-09-30 04:44:43 UTC,,0,FATAL,08P01,"could not receive
datafrom WAL stream: ERROR:  requested WAL segment 00000001000007E800000067 has already been
removed",,,,,,,,,"","walreceiver",,0
2022-09-30 04:44:48.922 UTC,,,2716187,,63367440.29721b,1,,2022-09-30 04:44:48 UTC,,0,LOG,00000,"started streaming WAL
fromprimary at 7E8/67000000 on timeline 1",,,,,,,,,"","walreceiver",,0 
2022-09-30 04:44:48.922 UTC,,,2716187,,63367440.29721b,2,,2022-09-30 04:44:48 UTC,,0,FATAL,08P01,"could not receive
datafrom WAL stream: ERROR:  requested WAL segment 00000001000007E800000067 has already been
removed",,,,,,,,,"","walreceiver",,0
2022-09-30 04:44:51.867 UTC,,,2716207,"[local]",63367443.29722f,1,"",2022-09-30 04:44:51 UTC,,0,LOG,00000,"connection
received:host=[local]",,,,,,,,,"","not initialized",,0 
2022-09-30 04:44:51.868 UTC,"postgres","db",2716207,"[local]",63367443.29722f,2,"authentication",2022-09-30 04:44:51
UTC,4/14271,0,LOG,00000,"connectionauthenticated: identity=""postgres"" method=peer
(/etc/postgresql/14/main/pg_hba.conf:11)",,,,,,,,,"","clientbackend",,-659920678270393094 
2022-09-30 04:44:51.868 UTC,"postgres","db",2716207,"[local]",63367443.29722f,3,"authentication",2022-09-30 04:44:51
UTC,4/14271,0,LOG,00000,"connectionauthorized: user=postgres database=db application_name=psql",,,,,,,,,"","client
backend",,-659920678270393094
2022-09-30 04:44:53.751 UTC,,,2716218,"127.0.0.1:39078",63367445.29723a,1,"",2022-09-30 04:44:53
UTC,,0,LOG,00000,"connectionreceived: host=127.0.0.1 port=39078",,,,,,,,,"","not initialized",,0 
2022-09-30 04:44:53.927 UTC,,,2716219,,63367445.29723b,1,,2022-09-30 04:44:53 UTC,,0,LOG,00000,"started streaming WAL
fromprimary at 7E8/67000000 on timeline 1",,,,,,,,,"","walreceiver",,0 
2022-09-30 04:44:53.927 UTC,,,2716219,,63367445.29723b,2,,2022-09-30 04:44:53 UTC,,0,FATAL,08P01,"could not receive
datafrom WAL stream: ERROR:  requested WAL segment 00000001000007E800000067 has already been
removed",,,,,,,,,"","walreceiver",,0
2022-09-30 04:44:58.931 UTC,,,2716255,,6336744a.29725f,1,,2022-09-30 04:44:58 UTC,,0,LOG,00000,"started streaming WAL
fromprimary at 7E8/67000000 on timeline 1",,,,,,,,,"","walreceiver",,0 
2022-09-30 04:44:58.931 UTC,,,2716255,,6336744a.29725f,2,,2022-09-30 04:44:58 UTC,,0,FATAL,08P01,"could not receive
datafrom WAL stream: ERROR:  requested WAL segment 00000001000007E800000067 has already been
removed",,,,,,,,,"","walreceiver",,0
2022-09-30 04:45:03.936 UTC,,,2716366,,6336744f.2972ce,1,,2022-09-30 04:45:03 UTC,,0,LOG,00000,"started streaming WAL
fromprimary at 7E8/67000000 on timeline 1",,,,,,,,,"","walreceiver",,0 
2022-09-30 04:45:03.936 UTC,,,2716366,,6336744f.2972ce,2,,2022-09-30 04:45:03 UTC,,0,FATAL,08P01,"could not receive
datafrom WAL stream: ERROR:  requested WAL segment 00000001000007E800000067 has already been
removed",,,,,,,,,"","walreceiver",,0
2022-09-30 04:45:08.751 UTC,,,2716401,"127.0.0.1:45926",63367454.2972f1,1,"",2022-09-30 04:45:08
UTC,,0,LOG,00000,"connectionreceived: host=127.0.0.1 port=45926",,,,,,,,,"","not initialized",,0 
2022-09-30 04:45:08.940 UTC,,,2716402,,63367454.2972f2,1,,2022-09-30 04:45:08 UTC,,0,LOG,00000,"started streaming WAL
fromprimary at 7E8/67000000 on timeline 1",,,,,,,,,"","walreceiver",,0 
2022-09-30 04:45:08.940 UTC,,,2716402,,63367454.2972f2,2,,2022-09-30 04:45:08 UTC,,0,FATAL,08P01,"could not receive
datafrom WAL stream: ERROR:  requested WAL segment 00000001000007E800000067 has already been
removed",,,,,,,,,"","walreceiver",,0
2022-09-30 04:45:12.960 UTC,,,2716425,"[local]",63367458.297309,1,"",2022-09-30 04:45:12 UTC,,0,LOG,00000,"connection
received:host=[local]",,,,,,,,,"","not initialized",,0 
2022-09-30 04:45:12.961 UTC,"postgres","postgres",2716425,"[local]",63367458.297309,2,"authentication",2022-09-30
04:45:12UTC,4/14277,0,LOG,00000,"connection authenticated: identity=""postgres"" method=peer
(/etc/postgresql/14/main/pg_hba.conf:11)",,,,,,,,,"","clientbackend",,1147616880456321454 
2022-09-30 04:45:12.961 UTC,"postgres","postgres",2716425,"[local]",63367458.297309,3,"authentication",2022-09-30
04:45:12UTC,4/14277,0,LOG,00000,"connection authorized: user=postgres database=postgres
application_name=psql",,,,,,,,,"","clientbackend",,1147616880456321454 
2022-09-30 04:45:13.946 UTC,,,2716431,,63367459.29730f,1,,2022-09-30 04:45:13 UTC,,0,LOG,00000,"started streaming WAL
fromprimary at 7E8/67000000 on timeline 1",,,,,,,,,"","walreceiver",,0 
2022-09-30 04:45:13.946 UTC,,,2716431,,63367459.29730f,2,,2022-09-30 04:45:13 UTC,,0,FATAL,08P01,"could not receive
datafrom WAL stream: ERROR:  requested WAL segment 00000001000007E800000067 has already been
removed",,,,,,,,,"","walreceiver",,0

> Is pg_waldump able to understand its internals?  This could point out
> to an issue with xlogreader.c, for example.

So... it can, but no records:

postgres@c273-pg2258:~/14/main$ /usr/lib/postgresql/14/bin/pg_waldump 00000001000007E800000067
pg_waldump: fatal: could not find a valid record after 7E8/67000000
postgres@c273-pg2258:~/14/main$ /usr/lib/postgresql/14/bin/pg_waldump 00000001000007E800000068
pg_waldump: fatal: could not find a valid record after 7E8/68000000

but previous one works:
postgres@c273-pg2258:~/14/main$ /usr/lib/postgresql/14/bin/pg_waldump 00000001000007E800000066 | tail -n 20
rmgr: Transaction len (rec/tot):     65/    65, tx:  140005495, lsn: 7E8/66A0B010, prev 7E8/66A0AFC8, desc: COMMIT
2022-09-2801:38:20.782352 UTC; origin: node 1, lsn CE64/A5086920, at 2022-09-28 01:38:20.780890 UTC 
rmgr: Heap        len (rec/tot):     54/    54, tx:  140005496, lsn: 7E8/66A0B058, prev 7E8/66A0B010, desc: LOCK off
55:xid 140005496: flags 0x00 LOCK_ONLY EXCL_LOCK KEYS_UPDATED , blkref #0: rel 17963/16644/195315099 blk 83995 
rmgr: Heap        len (rec/tot):    165/   165, tx:  140005496, lsn: 7E8/66A0B090, prev 7E8/66A0B058, desc: HOT_UPDATE
off55 xmax 140005496 flags 0x10 KEYS_UPDATED ; new off 74 xmax 140005496, blkref #0: rel 17963/16644/195315099 blk
83995
rmgr: Transaction len (rec/tot):     65/    65, tx:  140005496, lsn: 7E8/66A0B138, prev 7E8/66A0B090, desc: COMMIT
2022-09-2801:38:20.783036 UTC; origin: node 3, lsn CE64/A5087E00, at 2022-09-28 01:38:20.780992 UTC 
rmgr: XLOG        len (rec/tot):     51/   980, tx:          0, lsn: 7E8/66A0B180, prev 7E8/66A0B138, desc:
FPI_FOR_HINT, blkref #0: rel 17963/16644/195326137 blk 271085 FPW 
rmgr: Heap        len (rec/tot):     54/    54, tx:  140005497, lsn: 7E8/66A0B558, prev 7E8/66A0B180, desc: LOCK off
13:xid 140005497: flags 0x00 LOCK_ONLY EXCL_LOCK KEYS_UPDATED , blkref #0: rel 17963/16644/195326137 blk 271085 
rmgr: Heap        len (rec/tot):    261/   261, tx:  140005497, lsn: 7E8/66A0B590, prev 7E8/66A0B558, desc: HOT_UPDATE
off13 xmax 140005497 flags 0x10 KEYS_UPDATED ; new off 15 xmax 140005497, blkref #0: rel 17963/16644/195326137 blk
271085
rmgr: Transaction len (rec/tot):     65/    65, tx:  140005497, lsn: 7E8/66A0B698, prev 7E8/66A0B590, desc: COMMIT
2022-09-2801:38:20.784762 UTC; origin: node 1, lsn CE64/A508C7F0, at 2022-09-28 01:38:20.782858 UTC 
rmgr: Heap        len (rec/tot):     54/    54, tx:  140005498, lsn: 7E8/66A0B6E0, prev 7E8/66A0B698, desc: LOCK off 8:
xid140005498: flags 0x00 LOCK_ONLY EXCL_LOCK KEYS_UPDATED , blkref #0: rel 17963/16644/195328502 blk 3038500 
rmgr: Heap        len (rec/tot):     54/    54, tx:  140005498, lsn: 7E8/66A0B718, prev 7E8/66A0B6E0, desc: LOCK off 8:
xid140005498: flags 0x00 LOCK_ONLY EXCL_LOCK KEYS_UPDATED , blkref #0: rel 17963/16644/195328502 blk 3038500 
rmgr: Heap        len (rec/tot):   1821/  1821, tx:  140005498, lsn: 7E8/66A0B750, prev 7E8/66A0B718, desc: UPDATE off
8xmax 140005498 flags 0x10 KEYS_UPDATED ; new off 5 xmax 140005498, blkref #0: rel 17963/16644/195328502 blk 3038837,
blkref#1: rel 17963/16644/195328502 blk 3038500 
rmgr: Btree       len (rec/tot):     64/    64, tx:  140005498, lsn: 7E8/66A0BE70, prev 7E8/66A0B750, desc: INSERT_LEAF
off158, blkref #0: rel 17963/16644/195349680 blk 78396 
rmgr: Btree       len (rec/tot):     72/    72, tx:  140005498, lsn: 7E8/66A0BEB0, prev 7E8/66A0BE70, desc: INSERT_LEAF
off72, blkref #0: rel 17963/16644/419942198 blk 82241 
rmgr: Btree       len (rec/tot):     64/    64, tx:  140005498, lsn: 7E8/66A0BEF8, prev 7E8/66A0BEB0, desc: INSERT_LEAF
off268, blkref #0: rel 17963/16644/419946394 blk 18486 
rmgr: Btree       len (rec/tot):     64/    64, tx:  140005498, lsn: 7E8/66A0BF38, prev 7E8/66A0BEF8, desc: INSERT_LEAF
off69, blkref #0: rel 17963/16644/419946856 blk 56634 
rmgr: Btree       len (rec/tot):     64/    64, tx:  140005498, lsn: 7E8/66A0BF78, prev 7E8/66A0BF38, desc: INSERT_LEAF
off98, blkref #0: rel 17963/16644/419947709 blk 19044 
rmgr: Btree       len (rec/tot):     64/    64, tx:  140005498, lsn: 7E8/66A0BFB8, prev 7E8/66A0BF78, desc: INSERT_LEAF
off53, blkref #0: rel 17963/16644/419948176 blk 10715 
rmgr: Transaction len (rec/tot):     65/    65, tx:  140005498, lsn: 7E8/66A0BFF8, prev 7E8/66A0BFB8, desc: COMMIT
2022-09-2801:38:20.785822 UTC; origin: node 1, lsn CE64/A5091128, at 2022-09-28 01:38:20.784288 UTC 
rmgr: XLOG        len (rec/tot):     34/    34, tx:          0, lsn: 7E8/66A0C058, prev 7E8/66A0BFF8, desc: BACKUP_END
7D5/3B068338
rmgr: XLOG        len (rec/tot):     24/    24, tx:          0, lsn: 7E8/66A0C080, prev 7E8/66A0C058, desc: SWITCH


> Another thing that has changed in this area is related to continuation
> records.  What are the WAL contents at the end of
> 00000001000007E800000066?  Do you spot an OVERWRITE_CONTRECORD when
> using pg_waldump on it?

No. There is no such thing in pg_waldump on *66 file.

If I understand correctly - this would suggest that the .66 file is the end of
wal taken from pg_basebackup. What are the 67 and next files then?

Best regards,

depesz




В списке pgsql-bugs по дате отправления:

Предыдущее
От: Michael Paquier
Дата:
Сообщение: Re: pg ignores wal files in pg_wal, and instead tries to load them from archive/primary
Следующее
От: Frank van Vugt
Дата:
Сообщение: v15rc1 shadowbuild fails when using src from symlink