Re: BUG #13822: Slave terminated - WAL contains references to invalid page
От | |
---|---|
Тема | Re: BUG #13822: Slave terminated - WAL contains references to invalid page |
Дата | |
Msg-id | 1451506840099.98963@tieto.com обсуждение исходный текст |
Ответ на | Re: BUG #13822: Slave terminated - WAL contains references to invalid page (Andres Freund <andres@anarazel.de>) |
Ответы |
Re: BUG #13822: Slave terminated - WAL contains references to
invalid page
(Michael Paquier <michael.paquier@gmail.com>)
|
Список | pgsql-bugs |
Hi,=0A= =0A= I created new replica (original one crashed 2015-12-15 13:05:39 CET) via: = =0A= =0A= $ pg_basebackup -h <master_IP> -D $PGDATA -R -P -U replication --xlog-metho= d=3Dstream=0A= $ cat $PGDATA/recovery.conf =0A= standby_mode =3D 'on'=0A= primary_conninfo =3D 'user=3Dreplication password=3D*** host=3D<master_IP> = port=3D5432 sslmode=3Dprefer sslcompression=3D1 krbsrvname=3Dpostgres'=0A= # service postgresql-9.4 start=0A= =0A= This new replica crashed 2015-12-22 00:25:11 CET.=0A= =0A= Ordinary base backups are done by:=0A= =0A= /usr/pgsql-9.4/bin/pg_basebackup -h <master_IP> -D <local_pg_basebackup_dir= > -Ft -z -v -Xf -U postgres=0A= =0A= =0A= Running Master pg_controldata:=0A= =0A= pg_control version number: 942=0A= Catalog version number: 201409291=0A= Database system identifier: 6225208935473175762=0A= Database cluster state: in production=0A= pg_control last modified: Wed 30 Dec 2015 08:00:02 PM CET=0A= Latest checkpoint location: 191/2E08DC90=0A= Prior checkpoint location: 191/2E0829E0=0A= Latest checkpoint's REDO location: 191/2E08DC58=0A= Latest checkpoint's REDO WAL file: 00000001000001910000002E=0A= Latest checkpoint's TimeLineID: 1=0A= Latest checkpoint's PrevTimeLineID: 1=0A= Latest checkpoint's full_page_writes: on=0A= Latest checkpoint's NextXID: 0/370154188=0A= Latest checkpoint's NextOID: 4803404=0A= Latest checkpoint's NextMultiXactId: 161565=0A= Latest checkpoint's NextMultiOffset: 410484=0A= Latest checkpoint's oldestXID: 187316348=0A= Latest checkpoint's oldestXID's DB: 16422=0A= Latest checkpoint's oldestActiveXID: 370154188=0A= Latest checkpoint's oldestMultiXid: 1=0A= Latest checkpoint's oldestMulti's DB: 16422=0A= Time of latest checkpoint: Wed 30 Dec 2015 08:00:01 PM CET=0A= Fake LSN counter for unlogged rels: 0/1=0A= Minimum recovery ending location: 0/0=0A= Min recovery ending loc's timeline: 0=0A= Backup start location: 0/0=0A= Backup end location: 0/0=0A= End-of-backup record required: no=0A= Current wal_level setting: hot_standby=0A= Current wal_log_hints setting: off=0A= Current max_connections setting: 600=0A= Current max_worker_processes setting: 8=0A= Current max_prepared_xacts setting: 0=0A= Current max_locks_per_xact setting: 64=0A= Maximum data alignment: 8=0A= Database block size: 8192=0A= Blocks per segment of large relation: 131072=0A= WAL block size: 8192=0A= Bytes per WAL segment: 16777216=0A= Maximum length of identifiers: 64=0A= Maximum columns in an index: 32=0A= Maximum size of a TOAST chunk: 1996=0A= Size of a large-object chunk: 2048=0A= Date/time type storage: 64-bit integers=0A= Float4 argument passing: by value=0A= Float8 argument passing: by value=0A= Data page checksum version: 0=0A= =0A= =0A= Running Slave pg_controldata:=0A= =0A= pg_control version number: 942=0A= Catalog version number: 201409291=0A= Database system identifier: 6225208935473175762=0A= Database cluster state: in archive recovery=0A= pg_control last modified: Wed 30 Dec 2015 08:00:40 PM CET=0A= Latest checkpoint location: 191/2E08DC90=0A= Prior checkpoint location: 191/2E0829E0=0A= Latest checkpoint's REDO location: 191/2E08DC58=0A= Latest checkpoint's REDO WAL file: 00000001000001910000002E=0A= Latest checkpoint's TimeLineID: 1=0A= Latest checkpoint's PrevTimeLineID: 1=0A= Latest checkpoint's full_page_writes: on=0A= Latest checkpoint's NextXID: 0/370154188=0A= Latest checkpoint's NextOID: 4803404=0A= Latest checkpoint's NextMultiXactId: 161565=0A= Latest checkpoint's NextMultiOffset: 410484=0A= Latest checkpoint's oldestXID: 187316348=0A= Latest checkpoint's oldestXID's DB: 16422=0A= Latest checkpoint's oldestActiveXID: 370154188=0A= Latest checkpoint's oldestMultiXid: 1=0A= Latest checkpoint's oldestMulti's DB: 16422=0A= Time of latest checkpoint: Wed 30 Dec 2015 08:00:01 PM CET=0A= Fake LSN counter for unlogged rels: 0/1=0A= Minimum recovery ending location: 191/2E093C68=0A= Min recovery ending loc's timeline: 1=0A= Backup start location: 0/0=0A= Backup end location: 0/0=0A= End-of-backup record required: no=0A= Current wal_level setting: hot_standby=0A= Current wal_log_hints setting: off=0A= Current max_connections setting: 600=0A= Current max_worker_processes setting: 8=0A= Current max_prepared_xacts setting: 0=0A= Current max_locks_per_xact setting: 64=0A= Maximum data alignment: 8=0A= Database block size: 8192=0A= Blocks per segment of large relation: 131072=0A= WAL block size: 8192=0A= Bytes per WAL segment: 16777216=0A= Maximum length of identifiers: 64=0A= Maximum columns in an index: 32=0A= Maximum size of a TOAST chunk: 1996=0A= Size of a large-object chunk: 2048=0A= Date/time type storage: 64-bit integers=0A= Float4 argument passing: by value=0A= Float8 argument passing: by value=0A= Data page checksum version: 0=0A= =0A= =0A= Slave pg_controldata after second PITR attempt:=0A= =0A= pg_control version number: 942=0A= Catalog version number: 201409291=0A= Database system identifier: 6225208935473175762=0A= Database cluster state: shut down in recovery=0A= pg_control last modified: Wed 30 Dec 2015 08:53:51 PM CET=0A= Latest checkpoint location: 187/9DA56658=0A= Prior checkpoint location: 186/D7000060=0A= Latest checkpoint's REDO location: 187/9D647E30=0A= Latest checkpoint's REDO WAL file: 00000001000001870000009D=0A= Latest checkpoint's TimeLineID: 1=0A= Latest checkpoint's PrevTimeLineID: 1=0A= Latest checkpoint's full_page_writes: on=0A= Latest checkpoint's NextXID: 0/367008737=0A= Latest checkpoint's NextOID: 4721484=0A= Latest checkpoint's NextMultiXactId: 160851=0A= Latest checkpoint's NextMultiOffset: 409011=0A= Latest checkpoint's oldestXID: 187316348=0A= Latest checkpoint's oldestXID's DB: 16422=0A= Latest checkpoint's oldestActiveXID: 367008737=0A= Latest checkpoint's oldestMultiXid: 1=0A= Latest checkpoint's oldestMulti's DB: 16422=0A= Time of latest checkpoint: Tue 22 Dec 2015 12:21:40 AM CET=0A= Fake LSN counter for unlogged rels: 0/1=0A= Minimum recovery ending location: 187/9FECB9A0=0A= Min recovery ending loc's timeline: 1=0A= Backup start location: 0/0=0A= Backup end location: 0/0=0A= End-of-backup record required: no=0A= Current wal_level setting: hot_standby=0A= Current wal_log_hints setting: off=0A= Current max_connections setting: 600=0A= Current max_worker_processes setting: 8=0A= Current max_prepared_xacts setting: 0=0A= Current max_locks_per_xact setting: 64=0A= Maximum data alignment: 8=0A= Database block size: 8192=0A= Blocks per segment of large relation: 131072=0A= WAL block size: 8192=0A= Bytes per WAL segment: 16777216=0A= Maximum length of identifiers: 64=0A= Maximum columns in an index: 32=0A= Maximum size of a TOAST chunk: 1996=0A= Size of a large-object chunk: 2048=0A= Date/time type storage: 64-bit integers=0A= Float4 argument passing: by value=0A= Float8 argument passing: by value=0A= Data page checksum version: 0=0A= =0A= =0A= # cat backup_label.old =0A= START WAL LOCATION: 186/D7000028 (file 0000000100000186000000D7)=0A= CHECKPOINT LOCATION: 186/D7000060=0A= BACKUP METHOD: streamed=0A= BACKUP FROM: master=0A= START TIME: 2015-12-21 19:00:03 CET=0A= LABEL: pg_basebackup base backup=0A= =0A= =0A= Slave's pg_controldata above after PITR has been created after second recov= ery attempt.=0A= To be honest I don't understand why this second PITR finished without PANIC= when previous attempt from the same base backup and archived wals failed.= =0A= See logs details below:=0A= =0A= First (crashed) PITR:=0A= =0A= 2015-12-30 08:31:19 CET @ LOG: database system was interrupted; last know= n up at 2015-12-21 20:15:02 CET=0A= 2015-12-30 08:31:19 CET @ LOG: starting point-in-time recovery to 2015-12= -22 00:30:00+01=0A= 2015-12-30 08:31:19 CET @ LOG: database system was not properly shut down= ; automatic recovery in progress=0A= 2015-12-30 08:31:19 CET @ LOG: redo starts at 186/D7260540=0A= 2015-12-30 08:31:19 CET @ LOG: consistent recovery state reached at 186/D= 8000000=0A= 2015-12-30 08:31:19 CET @ LOG: database system is ready to accept read on= ly connections=0A= 2015-12-30 08:31:19 CET @ LOG: restored log file "0000000100000186000000D= 8" from archive=0A= 2015-12-30 08:31:20 CET @ LOG: restored log file "0000000100000186000000D= 9" from archive=0A= /.../=0A= 2015-12-30 08:32:48 CET @ LOG: restored log file "00000001000001870000009= D" from archive=0A= 2015-12-30 08:32:50 CET @ LOG: restored log file "00000001000001870000009= E" from archive=0A= 2015-12-30 08:32:51 CET @ WARNING: page 73172 of relation base/16422/2325= 3 is uninitialized=0A= 2015-12-30 08:32:51 CET @ CONTEXT: xlog redo visible: rel 1663/16422/2325= 3; blk 73172=0A= 2015-12-30 08:32:51 CET @ PANIC: WAL contains references to invalid pages= =0A= 2015-12-30 08:32:51 CET @ CONTEXT: xlog redo visible: rel 1663/16422/2325= 3; blk 73172=0A= 2015-12-30 08:32:51 CET @ LOG: startup process (PID 29324) was terminated= by signal 6: Aborted=0A= 2015-12-30 08:32:51 CET @ LOG: terminating any other active server proces= ses=0A= =0A= For this one unfortunatelly I don't have pg_controldata.=0A= =0A= Second one:=0A= =0A= 2015-12-30 20:42:39 CET @ LOG: database system was interrupted; last know= n up at 2015-12-21 20:15:02 CET=0A= 2015-12-30 20:42:39 CET @ LOG: starting point-in-time recovery to 2015-12= -22 00:30:00+01=0A= 2015-12-30 20:42:39 CET @ LOG: restored log file "0000000100000186000000D= 7" from archive=0A= 2015-12-30 20:42:39 CET @ LOG: redo starts at 186/D7000028=0A= 2015-12-30 20:42:39 CET @ LOG: consistent recovery state reached at 186/D= 7265F80=0A= 2015-12-30 20:42:39 CET @ LOG: database system is ready to accept read on= ly connections=0A= 2015-12-30 20:42:40 CET @ LOG: restored log file "0000000100000186000000D= 8" from archive=0A= 2015-12-30 20:42:40 CET @ LOG: restored log file "0000000100000186000000D= 9" from archive=0A= /.../=0A= 2015-12-30 20:44:03 CET @ LOG: restored log file "00000001000001870000009= D" from archive=0A= 2015-12-30 20:44:05 CET @ LOG: restored log file "00000001000001870000009= E" from archive=0A= 2015-12-30 20:44:08 CET @ LOG: restored log file "00000001000001870000009= F" from archive=0A= 2015-12-30 20:44:10 CET @ LOG: recovery stopping before commit of transac= tion 367008747, time 2015-12-22 00:30:01.876776+01=0A= 2015-12-30 20:44:10 CET @ LOG: recovery has paused=0A= 2015-12-30 20:44:10 CET @ HINT: Execute pg_xlog_replay_resume() to contin= ue.=0A= =0A= Regards=0A= Marek=0A= ________________________________________=0A= From: Andres Freund <andres@anarazel.de>=0A= Sent: Wednesday, December 30, 2015 12:49 PM=0A= To: Petr Marek=0A= Cc: michael.paquier@gmail.com; pgsql-bugs@postgresql.org=0A= Subject: Re: [BUGS] BUG #13822: Slave terminated - WAL contains references = to invalid page=0A= =0A= Hi,=0A= =0A= On 2015-12-29 15:51:32 +0000, Marek.Petr@tieto.com wrote:=0A= > I used fresh base backup for slave after both crashes.=0A= > Also tried to scan archived wals several hours before last crash and foun= d only the following for string 71566:=0A= >=0A= > rmgr: Heap2 len (rec/tot): 20/ 52, tx: 0, lsn: 187/= 987859D8, prev 187/987859A0, bkp: 0000, desc: visible: rel 1663/16422/17216= ; blk 71566=0A= > rmgr: Heap2 len (rec/tot): 20/ 52, tx: 0, lsn: 187/= 9CC59020, prev 187/9CC58FE8, bkp: 0000, desc: visible: rel 1663/16422/17220= ; blk 71566=0A= > rmgr: Heap2 len (rec/tot): 20/ 52, tx: 0, lsn: 187/= 9E356D98, prev 187/9E356D60, bkp: 0000, desc: visible: rel 1663/16422/23253= ; blk 71566=0A= =0A= Could you detail how you take base backups and how you set them up as=0A= new replicas? Master's pg_controldata, the crashed standby's and the=0A= standby's backup_label.old would be helpful.=0A= =0A= Greetings,=0A= =0A= Andres Freund=
В списке pgsql-bugs по дате отправления:
Следующее
От: kou@clear-code.comДата:
Сообщение: BUG #13840: pg_dump generates unloadable SQL when third party string type index option is used