Re: pg ignores wal files in pg_wal, and instead tries to load them from archive/primary
От | hubert depesz lubaczewski |
---|---|
Тема | Re: pg ignores wal files in pg_wal, and instead tries to load them from archive/primary |
Дата | |
Msg-id | YzZ1ngq++Tazursx@depesz.com обсуждение исходный текст |
Ответ на | Re: pg ignores wal files in pg_wal, and instead tries to load them from archive/primary (Michael Paquier <michael@paquier.xyz>) |
Список | pgsql-bugs |
On Fri, Sep 30, 2022 at 09:49:28AM +0900, Michael Paquier wrote: > On Thu, Sep 29, 2022 at 05:51:02PM +0200, hubert depesz lubaczewski wrote: > > What am I missing? what is wrong? How can I fix it? The problem is not fixing > > *this server*, because we are in process of upgrading LOTS and LOTS of servers, > > and I need to know what is broken/how to work around it. > So, your backup_label file points to 00000001000007E4000000A0 as being > the first segment to begin recovery from. This means that the startup Well, *now* I don't have backup_label: root@c273-pg2258:/var/lib/postgresql/14/main# ls -l total 68520 -rw------- 1 postgres postgres 3 Sep 28 01:38 PG_VERSION -rw------- 1 postgres postgres 61264030 Sep 30 04:42 autoprewarm.blocks -rw------- 1 postgres postgres 231 Sep 28 01:38 backup_label.old -rw------- 1 postgres postgres 8783060 Sep 28 01:38 backup_manifest drwx------ 6 postgres postgres 74 Sep 28 01:38 base -rw------- 1 postgres postgres 61 Sep 30 04:00 current_logfiles drwx------ 2 postgres postgres 4096 Sep 29 14:25 global drwx------ 2 postgres postgres 6 Sep 28 01:38 pg_commit_ts drwx------ 2 postgres postgres 6 Sep 28 01:38 pg_dynshmem drwx------ 4 postgres postgres 84 Sep 28 03:38 pg_logical drwx------ 4 postgres postgres 48 Sep 28 01:38 pg_multixact drwx------ 2 postgres postgres 6 Sep 28 01:38 pg_notify drwx------ 2 postgres postgres 6 Sep 28 01:38 pg_replslot drwx------ 2 postgres postgres 6 Sep 28 01:38 pg_serial drwx------ 2 postgres postgres 6 Sep 28 01:38 pg_snapshots drwx------ 2 postgres postgres 6 Sep 29 14:25 pg_stat drwx------ 2 postgres postgres 43 Sep 29 14:25 pg_stat_tmp drwx------ 2 postgres postgres 122 Sep 29 14:25 pg_subtrans drwx------ 2 postgres postgres 27 Sep 28 01:38 pg_tblspc drwx------ 2 postgres postgres 6 Sep 28 01:38 pg_twophase drwx------ 3 postgres postgres 204800 Sep 28 03:38 pg_wal drwx------ 2 postgres postgres 4096 Sep 28 01:38 pg_xact -rw------- 1 postgres postgres 118 Sep 28 01:38 postgresql.auto.conf -rw------- 1 postgres postgres 130 Sep 29 14:25 postmaster.opts -rw------- 1 postgres postgres 103 Sep 29 14:25 postmaster.pid -rw-r--r-- 1 postgres postgres 0 Sep 28 01:38 standby.signal backup_label.old is: root@c273-pg2258:/var/lib/postgresql/14/main# cat backup_label.old START WAL LOCATION: 7D5/3B068338 (file 00000001000007D50000003B) CHECKPOINT LOCATION: 7D5/3B076F78 BACKUP METHOD: streamed BACKUP FROM: primary START TIME: 2022-09-27 21:06:00 UTC LABEL: pg_basebackup base backup START TIMELINE: 1 > process is able to correctly kick recovery using the contents of the > local pg_wal/ from 00000001000007E4000000A0 to 00000001000007E800000066. > Once 00000001000007E800000067 cannot be read, the startup process > fails to read this segment and decides to switch the source to get WAL > from as streaming. The order should be local pg_wal/, then archiving > and finally streaming if all are configured, as far as I recall. > What's the error you are seeing when reading 00000001000007E800000067? I don't see any errors. Let me show you some data from log: 2022-09-30 04:44:43.833 UTC,"postgres","postgres",2716081,"[local]",6336743b.2971b1,3,"authentication",2022-09-30 04:44:43UTC,4/14269,0,LOG,00000,"connection authorized: user=postgres database=postgres application_name=psql",,,,,,,,,"","clientbackend",,1147616880456321454 2022-09-30 04:44:43.917 UTC,,,2716082,,6336743b.2971b2,1,,2022-09-30 04:44:43 UTC,,0,LOG,00000,"started streaming WAL fromprimary at 7E8/67000000 on timeline 1",,,,,,,,,"","walreceiver",,0 2022-09-30 04:44:43.918 UTC,,,2716082,,6336743b.2971b2,2,,2022-09-30 04:44:43 UTC,,0,FATAL,08P01,"could not receive datafrom WAL stream: ERROR: requested WAL segment 00000001000007E800000067 has already been removed",,,,,,,,,"","walreceiver",,0 2022-09-30 04:44:48.922 UTC,,,2716187,,63367440.29721b,1,,2022-09-30 04:44:48 UTC,,0,LOG,00000,"started streaming WAL fromprimary at 7E8/67000000 on timeline 1",,,,,,,,,"","walreceiver",,0 2022-09-30 04:44:48.922 UTC,,,2716187,,63367440.29721b,2,,2022-09-30 04:44:48 UTC,,0,FATAL,08P01,"could not receive datafrom WAL stream: ERROR: requested WAL segment 00000001000007E800000067 has already been removed",,,,,,,,,"","walreceiver",,0 2022-09-30 04:44:51.867 UTC,,,2716207,"[local]",63367443.29722f,1,"",2022-09-30 04:44:51 UTC,,0,LOG,00000,"connection received:host=[local]",,,,,,,,,"","not initialized",,0 2022-09-30 04:44:51.868 UTC,"postgres","db",2716207,"[local]",63367443.29722f,2,"authentication",2022-09-30 04:44:51 UTC,4/14271,0,LOG,00000,"connectionauthenticated: identity=""postgres"" method=peer (/etc/postgresql/14/main/pg_hba.conf:11)",,,,,,,,,"","clientbackend",,-659920678270393094 2022-09-30 04:44:51.868 UTC,"postgres","db",2716207,"[local]",63367443.29722f,3,"authentication",2022-09-30 04:44:51 UTC,4/14271,0,LOG,00000,"connectionauthorized: user=postgres database=db application_name=psql",,,,,,,,,"","client backend",,-659920678270393094 2022-09-30 04:44:53.751 UTC,,,2716218,"127.0.0.1:39078",63367445.29723a,1,"",2022-09-30 04:44:53 UTC,,0,LOG,00000,"connectionreceived: host=127.0.0.1 port=39078",,,,,,,,,"","not initialized",,0 2022-09-30 04:44:53.927 UTC,,,2716219,,63367445.29723b,1,,2022-09-30 04:44:53 UTC,,0,LOG,00000,"started streaming WAL fromprimary at 7E8/67000000 on timeline 1",,,,,,,,,"","walreceiver",,0 2022-09-30 04:44:53.927 UTC,,,2716219,,63367445.29723b,2,,2022-09-30 04:44:53 UTC,,0,FATAL,08P01,"could not receive datafrom WAL stream: ERROR: requested WAL segment 00000001000007E800000067 has already been removed",,,,,,,,,"","walreceiver",,0 2022-09-30 04:44:58.931 UTC,,,2716255,,6336744a.29725f,1,,2022-09-30 04:44:58 UTC,,0,LOG,00000,"started streaming WAL fromprimary at 7E8/67000000 on timeline 1",,,,,,,,,"","walreceiver",,0 2022-09-30 04:44:58.931 UTC,,,2716255,,6336744a.29725f,2,,2022-09-30 04:44:58 UTC,,0,FATAL,08P01,"could not receive datafrom WAL stream: ERROR: requested WAL segment 00000001000007E800000067 has already been removed",,,,,,,,,"","walreceiver",,0 2022-09-30 04:45:03.936 UTC,,,2716366,,6336744f.2972ce,1,,2022-09-30 04:45:03 UTC,,0,LOG,00000,"started streaming WAL fromprimary at 7E8/67000000 on timeline 1",,,,,,,,,"","walreceiver",,0 2022-09-30 04:45:03.936 UTC,,,2716366,,6336744f.2972ce,2,,2022-09-30 04:45:03 UTC,,0,FATAL,08P01,"could not receive datafrom WAL stream: ERROR: requested WAL segment 00000001000007E800000067 has already been removed",,,,,,,,,"","walreceiver",,0 2022-09-30 04:45:08.751 UTC,,,2716401,"127.0.0.1:45926",63367454.2972f1,1,"",2022-09-30 04:45:08 UTC,,0,LOG,00000,"connectionreceived: host=127.0.0.1 port=45926",,,,,,,,,"","not initialized",,0 2022-09-30 04:45:08.940 UTC,,,2716402,,63367454.2972f2,1,,2022-09-30 04:45:08 UTC,,0,LOG,00000,"started streaming WAL fromprimary at 7E8/67000000 on timeline 1",,,,,,,,,"","walreceiver",,0 2022-09-30 04:45:08.940 UTC,,,2716402,,63367454.2972f2,2,,2022-09-30 04:45:08 UTC,,0,FATAL,08P01,"could not receive datafrom WAL stream: ERROR: requested WAL segment 00000001000007E800000067 has already been removed",,,,,,,,,"","walreceiver",,0 2022-09-30 04:45:12.960 UTC,,,2716425,"[local]",63367458.297309,1,"",2022-09-30 04:45:12 UTC,,0,LOG,00000,"connection received:host=[local]",,,,,,,,,"","not initialized",,0 2022-09-30 04:45:12.961 UTC,"postgres","postgres",2716425,"[local]",63367458.297309,2,"authentication",2022-09-30 04:45:12UTC,4/14277,0,LOG,00000,"connection authenticated: identity=""postgres"" method=peer (/etc/postgresql/14/main/pg_hba.conf:11)",,,,,,,,,"","clientbackend",,1147616880456321454 2022-09-30 04:45:12.961 UTC,"postgres","postgres",2716425,"[local]",63367458.297309,3,"authentication",2022-09-30 04:45:12UTC,4/14277,0,LOG,00000,"connection authorized: user=postgres database=postgres application_name=psql",,,,,,,,,"","clientbackend",,1147616880456321454 2022-09-30 04:45:13.946 UTC,,,2716431,,63367459.29730f,1,,2022-09-30 04:45:13 UTC,,0,LOG,00000,"started streaming WAL fromprimary at 7E8/67000000 on timeline 1",,,,,,,,,"","walreceiver",,0 2022-09-30 04:45:13.946 UTC,,,2716431,,63367459.29730f,2,,2022-09-30 04:45:13 UTC,,0,FATAL,08P01,"could not receive datafrom WAL stream: ERROR: requested WAL segment 00000001000007E800000067 has already been removed",,,,,,,,,"","walreceiver",,0 > Is pg_waldump able to understand its internals? This could point out > to an issue with xlogreader.c, for example. So... it can, but no records: postgres@c273-pg2258:~/14/main$ /usr/lib/postgresql/14/bin/pg_waldump 00000001000007E800000067 pg_waldump: fatal: could not find a valid record after 7E8/67000000 postgres@c273-pg2258:~/14/main$ /usr/lib/postgresql/14/bin/pg_waldump 00000001000007E800000068 pg_waldump: fatal: could not find a valid record after 7E8/68000000 but previous one works: postgres@c273-pg2258:~/14/main$ /usr/lib/postgresql/14/bin/pg_waldump 00000001000007E800000066 | tail -n 20 rmgr: Transaction len (rec/tot): 65/ 65, tx: 140005495, lsn: 7E8/66A0B010, prev 7E8/66A0AFC8, desc: COMMIT 2022-09-2801:38:20.782352 UTC; origin: node 1, lsn CE64/A5086920, at 2022-09-28 01:38:20.780890 UTC rmgr: Heap len (rec/tot): 54/ 54, tx: 140005496, lsn: 7E8/66A0B058, prev 7E8/66A0B010, desc: LOCK off 55:xid 140005496: flags 0x00 LOCK_ONLY EXCL_LOCK KEYS_UPDATED , blkref #0: rel 17963/16644/195315099 blk 83995 rmgr: Heap len (rec/tot): 165/ 165, tx: 140005496, lsn: 7E8/66A0B090, prev 7E8/66A0B058, desc: HOT_UPDATE off55 xmax 140005496 flags 0x10 KEYS_UPDATED ; new off 74 xmax 140005496, blkref #0: rel 17963/16644/195315099 blk 83995 rmgr: Transaction len (rec/tot): 65/ 65, tx: 140005496, lsn: 7E8/66A0B138, prev 7E8/66A0B090, desc: COMMIT 2022-09-2801:38:20.783036 UTC; origin: node 3, lsn CE64/A5087E00, at 2022-09-28 01:38:20.780992 UTC rmgr: XLOG len (rec/tot): 51/ 980, tx: 0, lsn: 7E8/66A0B180, prev 7E8/66A0B138, desc: FPI_FOR_HINT, blkref #0: rel 17963/16644/195326137 blk 271085 FPW rmgr: Heap len (rec/tot): 54/ 54, tx: 140005497, lsn: 7E8/66A0B558, prev 7E8/66A0B180, desc: LOCK off 13:xid 140005497: flags 0x00 LOCK_ONLY EXCL_LOCK KEYS_UPDATED , blkref #0: rel 17963/16644/195326137 blk 271085 rmgr: Heap len (rec/tot): 261/ 261, tx: 140005497, lsn: 7E8/66A0B590, prev 7E8/66A0B558, desc: HOT_UPDATE off13 xmax 140005497 flags 0x10 KEYS_UPDATED ; new off 15 xmax 140005497, blkref #0: rel 17963/16644/195326137 blk 271085 rmgr: Transaction len (rec/tot): 65/ 65, tx: 140005497, lsn: 7E8/66A0B698, prev 7E8/66A0B590, desc: COMMIT 2022-09-2801:38:20.784762 UTC; origin: node 1, lsn CE64/A508C7F0, at 2022-09-28 01:38:20.782858 UTC rmgr: Heap len (rec/tot): 54/ 54, tx: 140005498, lsn: 7E8/66A0B6E0, prev 7E8/66A0B698, desc: LOCK off 8: xid140005498: flags 0x00 LOCK_ONLY EXCL_LOCK KEYS_UPDATED , blkref #0: rel 17963/16644/195328502 blk 3038500 rmgr: Heap len (rec/tot): 54/ 54, tx: 140005498, lsn: 7E8/66A0B718, prev 7E8/66A0B6E0, desc: LOCK off 8: xid140005498: flags 0x00 LOCK_ONLY EXCL_LOCK KEYS_UPDATED , blkref #0: rel 17963/16644/195328502 blk 3038500 rmgr: Heap len (rec/tot): 1821/ 1821, tx: 140005498, lsn: 7E8/66A0B750, prev 7E8/66A0B718, desc: UPDATE off 8xmax 140005498 flags 0x10 KEYS_UPDATED ; new off 5 xmax 140005498, blkref #0: rel 17963/16644/195328502 blk 3038837, blkref#1: rel 17963/16644/195328502 blk 3038500 rmgr: Btree len (rec/tot): 64/ 64, tx: 140005498, lsn: 7E8/66A0BE70, prev 7E8/66A0B750, desc: INSERT_LEAF off158, blkref #0: rel 17963/16644/195349680 blk 78396 rmgr: Btree len (rec/tot): 72/ 72, tx: 140005498, lsn: 7E8/66A0BEB0, prev 7E8/66A0BE70, desc: INSERT_LEAF off72, blkref #0: rel 17963/16644/419942198 blk 82241 rmgr: Btree len (rec/tot): 64/ 64, tx: 140005498, lsn: 7E8/66A0BEF8, prev 7E8/66A0BEB0, desc: INSERT_LEAF off268, blkref #0: rel 17963/16644/419946394 blk 18486 rmgr: Btree len (rec/tot): 64/ 64, tx: 140005498, lsn: 7E8/66A0BF38, prev 7E8/66A0BEF8, desc: INSERT_LEAF off69, blkref #0: rel 17963/16644/419946856 blk 56634 rmgr: Btree len (rec/tot): 64/ 64, tx: 140005498, lsn: 7E8/66A0BF78, prev 7E8/66A0BF38, desc: INSERT_LEAF off98, blkref #0: rel 17963/16644/419947709 blk 19044 rmgr: Btree len (rec/tot): 64/ 64, tx: 140005498, lsn: 7E8/66A0BFB8, prev 7E8/66A0BF78, desc: INSERT_LEAF off53, blkref #0: rel 17963/16644/419948176 blk 10715 rmgr: Transaction len (rec/tot): 65/ 65, tx: 140005498, lsn: 7E8/66A0BFF8, prev 7E8/66A0BFB8, desc: COMMIT 2022-09-2801:38:20.785822 UTC; origin: node 1, lsn CE64/A5091128, at 2022-09-28 01:38:20.784288 UTC rmgr: XLOG len (rec/tot): 34/ 34, tx: 0, lsn: 7E8/66A0C058, prev 7E8/66A0BFF8, desc: BACKUP_END 7D5/3B068338 rmgr: XLOG len (rec/tot): 24/ 24, tx: 0, lsn: 7E8/66A0C080, prev 7E8/66A0C058, desc: SWITCH > Another thing that has changed in this area is related to continuation > records. What are the WAL contents at the end of > 00000001000007E800000066? Do you spot an OVERWRITE_CONTRECORD when > using pg_waldump on it? No. There is no such thing in pg_waldump on *66 file. If I understand correctly - this would suggest that the .66 file is the end of wal taken from pg_basebackup. What are the 67 and next files then? Best regards, depesz
В списке pgsql-bugs по дате отправления:
Предыдущее
От: Michael PaquierДата:
Сообщение: Re: pg ignores wal files in pg_wal, and instead tries to load them from archive/primary