Thread: Replica (v 9.3.2) crashed with "PANIC: WAL contains references to invalid pages"

Replica (v 9.3.2) crashed with "PANIC: WAL contains references to invalid pages"

From: "Cassiano, Marco"

Hello everybody,

This weekend both replicas of our main db crashed at the same time with this error:

 

2014-02-09 11:42:51 GMT    0 52c671da.14da - PANIC:  WAL contains references to invalid pages
2014-02-09 11:42:51 GMT    0 52c671da.14da - CONTEXT:  xlog redo vacuum: rel 1663/16433/29449; blk 181466, lastBlockVacuumed 181463
2014-02-09 11:42:52 GMT    0 52c671d9.14d1 - LOG:  startup process (PID 5338) was terminated by signal 6: Aborted
2014-02-09 11:42:52 GMT    0 52c671d9.14d1 - LOG:  terminating any other active server processes
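
Note for readers of the thread: the CONTEXT line above names the relation as tablespace/database/relfilenode (1663/16433/29449) plus a block number. The following is a minimal Python sketch of how that reference could be mapped to a file under the data directory. It assumes a default build (8 kB pages, 1 GB segment files) and handles only the built-in pg_default tablespace (OID 1663). The function name `locate_block` and the returned keys are hypothetical, chosen for illustration.

```python
import re

BLOCK_SIZE = 8192        # assumption: default 8 kB pages (BLCKSZ)
SEGMENT_BLOCKS = 131072  # assumption: default 1 GB segment files (RELSEG_SIZE)
PG_DEFAULT_OID = 1663    # OID of the built-in pg_default tablespace


def locate_block(context_line):
    """Map a 'rel spc/db/relfilenode; blk N' reference to a file and offset."""
    match = re.search(r"rel (\d+)/(\d+)/(\d+); blk (\d+)", context_line)
    if match is None:
        raise ValueError("no relation/block reference found")
    tablespace, database, relfilenode, block = map(int, match.groups())

    # Relations are split into segment files: <relfilenode>, <relfilenode>.1, ...
    segment, block_in_segment = divmod(block, SEGMENT_BLOCKS)
    file_name = str(relfilenode) if segment == 0 else f"{relfilenode}.{segment}"

    # Only the default tablespace is handled here; others live under pg_tblspc/.
    path = f"base/{database}/{file_name}" if tablespace == PG_DEFAULT_OID else None

    return {
        "path": path,
        "segment": segment,
        "block_in_segment": block_in_segment,
        "byte_offset": block_in_segment * BLOCK_SIZE,
    }


if __name__ == "__main__":
    line = ("CONTEXT:  xlog redo vacuum: rel 1663/16433/29449; "
            "blk 181466, lastBlockVacuumed 181463")
    print(locate_block(line))
```

Under those assumptions the logged reference points at block 50394 of the segment file base/16433/29449.1. Comparing the size of that file on the primary and on a replica is one possible check, since the block must lie within the file for the page to exist.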

 

 

All three servers (main + two replicas) are on v. 9.3.2 running on CentOS 6.4.

One month ago we upgraded the main db from v 9.2.6 to 9.3.2 with pg_upgrade and had the replicas rebuilt on 9.3.2.

I searched the mailing lists and found someone who had the same problem in the past, but it seems that their problem was fixed by patches that have already been released
(see thread http://www.postgresql.org/message-id/675b7cee-b7f0-4e32-8e34-1efaf3ca5fe9@email.android.com).

So our problem seems to be a new one, since we are running the latest version.

Thank you for your help

 

 

Marco Cassiano

Manifatture del Nord srl unipersonale
Gruppo MaxMara

via Mazzacurati 6

C.P. n° 20 - San Maurizio

42122 Reggio Emilia RE
ITALY

Tel. +39 0522 358215
Fax +39 0522 268715
email : mcassiano@manord.com

 


 


Re: Replica (v 9.3.2) crashed with "PANIC: WAL contains references to invalid pages"

From: desmodemone




Hello,
could you please post some more details? Are your replicas on the same storage as the primary? Are they virtual or physical? Which hypervisor type?

Could you attach or post /var/log/messages and the postgres log?
Could you attach or post the pg_controldata output of the replica?
Did you verify the filesystem integrity of the replica?

Are the logs of the primary free of errors?

Thank you very much

Mat

R: Replica (v 9.3.2) crashed with "PANIC: WAL contains references to invalid pages"

From: "Cassiano, Marco"

I am resending this mail with the attachment gzipped because of the mailing list's message size limit.

--------------------------

 

Thank you Mat,

here is the additional information:

1) All three servers (main + 2 replicas) are virtual machines on VMware ESXi 5.0
2) Each server is on different storage and on a different VMware host
3) The logs of the primary show no errors
4) Attached: pg_controldata output, postgres log, and /var/log/messages
5) fsck on the volume containing the database folders reports no errors:

[root@pg64prod_rep /]# umount /dev/sdb1
[root@pg64prod_rep /]# fsck -n /dev/sdb1
fsck from util-linux-ng 2.17.2
e2fsck 1.41.12 (17-May-2010)
/dev/sdb1 has gone 193 days without being checked, check forced.
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
/dev/sdb1: 7886/13107200 files (8.3% non-contiguous), 38818825/52428119 blocks

Marco

 



Re: Replica (v 9.3.2) crashed with "PANIC: WAL contains references to invalid pages"

From: desmodemone

               


Hi Marco,

Thanks for the log files about the problem, and sorry for the late answer. In the source code of xlogutils.c I see that more information is logged at the DEBUG2 level, under this condition:

log_min_messages <= DEBUG2 || client_min_messages <= DEBUG2

So I think it would be a good idea to edit postgresql.conf on one standby and set log_min_messages = DEBUG2, so that the code invokes the function report_invalid_page and reports more information about the invalid page of the relation to investigate.
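
As a sketch of the change described above, the fragment below would go in postgresql.conf on the standby chosen for debugging. The comment line is illustrative; only the log_min_messages parameter and its value come from the message above.

```
# postgresql.conf on ONE standby only (debug levels are very verbose)
log_min_messages = debug2
```

Once the invalid-page details have been captured, the setting can be put back to its previous value (the default is warning).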


Bye

Mat