Thread: Postgresql 9.3.4 Streaming Replication Standby invalid Page block

Postgresql 9.3.4 Streaming Replication Standby invalid Page block

From
"Burgess, Freddie"
Date:
PostgreSQL version: 9.3.4
 Operating system:   rhel 6.4 linux
 Action: stream replication Master/Slave
 Description:

Last entries in the PostgreSQL log file before the standby crashed; the primary seems unaffected:

LOG: restored log file "0000000100001127000000cc" from archive
FATAL: invalid page in block 464698 of relation pg_tblspc/16435/PG_9.3_201306121/16444/125127698
CONTEXT: xlog redo vacuum: rel 16435/16444/125127698; blk 512019, lastBlockVacuumed 0
LOG: startup process (PID 27797) exited with exit code 1
LOG: terminating any other active server processes
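
To inspect the failing page on disk, the block number in the FATAL line first has to be mapped to a segment file and a byte offset. The following is a minimal Python sketch, assuming a default build (8 kB blocks, 1 GB segment files); `locate_block` is a hypothetical helper name, not a PostgreSQL tool, and the result is wrong if the server was compiled with other sizes.

```python
import re

BLCKSZ = 8192          # assumption: default block size
RELSEG_SIZE = 131072   # assumption: default blocks per 1 GB segment file

FATAL_RE = re.compile(r"invalid page in block (\d+) of relation (\S+)")


def locate_block(log_line):
    """Return (segment_path, byte_offset) for the block named in a FATAL line.

    The path is relative to the data directory, as printed in the log.
    """
    m = FATAL_RE.search(log_line)
    if m is None:
        raise ValueError("not an 'invalid page' message")
    block, relpath = int(m.group(1)), m.group(2)
    segno, block_in_seg = divmod(block, RELSEG_SIZE)
    # Segment 0 has no suffix; later segments are named <relfilenode>.<segno>.
    path = relpath if segno == 0 else "%s.%d" % (relpath, segno)
    return path, block_in_seg * BLCKSZ


if __name__ == "__main__":
    line = ("FATAL: invalid page in block 464698 of relation "
            "pg_tblspc/16435/PG_9.3_201306121/16444/125127698")
    print(locate_block(line))
```

Under those assumptions, block 464698 falls in the segment file with suffix `.3`, at byte offset 585580544.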

We restarted the database, and restoring log files from the archive has continued past this point, but is our standby server corrupted?

thanks

Re: Postgresql 9.3.4 Streaming Replication Standby invalid Page block

From
Heikki Linnakangas
Date:
On 07/02/2014 02:03 AM, Burgess, Freddie wrote:
>   PostgreSQL version: 9.3.4
>   Operating system:   rhel 6.4 linux
>   Action: stream replication Master/Slave
>   Description:
>
> Last entries in the PostgreSQL log file before the standby crashed, the primary seems unaffected
>
> LOG: restored log file "0000000100001127000000cc" from archive
> FATAL: invalid page in block 464698 of relation pg_tblspc/16435/PG_9.3_201306121/16444/125127698
> CONTEXT: xlog redo vacuum: rel 16435/16444/125127698; blk 512019, lastBlockVacuumed 0
> LOG: startup process (PID 27797) exited with exit code 1
> LOG: terminating any other active server processes
>
> We restarted the database, and restoring log files from the archive has continued past this point, but is our standby server corrupted?

Sounds exactly like this bug:

http://www.postgresql.org/message-id/flat/CAL_0b1s4QCkFy_55kk_8XWcJPs7wsgVWf8vn4=jXe6V4R7Hxmg@mail.gmail.com

but that was fixed in 9.3.3 already. Are you sure you're running 9.3.4
in the standby too?

- Heikki

Re: Postgresql 9.3.4 Streaming Replication Standby invalid Page block

From
Andres Freund
Date:
On 2014-07-02 14:02:27 +0300, Heikki Linnakangas wrote:
> On 07/02/2014 02:03 AM, Burgess, Freddie wrote:
> >  PostgreSQL version: 9.3.4
> >  Operating system:   rhel 6.4 linux
> >  Action: stream replication Master/Slave
> >  Description:
> >
> >Last entries in the PostgreSQL log file before the standby crashed, the primary seems unaffected
> >
> >LOG: restored log file "0000000100001127000000cc" from archive
> >FATAL: invalid page in block 464698 of relation pg_tblspc/16435/PG_9.3_201306121/16444/125127698
> >CONTEXT: xlog redo vacuum: rel 16435/16444/125127698; blk 512019, lastBlockVacuumed 0
> >LOG: startup process (PID 27797) exited with exit code 1
> >LOG: terminating any other active server processes
> >
> >We restarted the database, and restoring log files from the archive has continued past this point, but is our standby server corrupted?

Do you run with data checksums enabled?

> Sounds exactly like this bug:
>
> http://www.postgresql.org/message-id/flat/CAL_0b1s4QCkFy_55kk_8XWcJPs7wsgVWf8vn4=jXe6V4R7Hxmg@mail.gmail.com
>
> but that was fixed in 9.3.3 already. Are you sure you're running 9.3.4 in
> the standby too?

Hm - that bug was about uninitialized pages, not invalid ones. I don't
immediately see why it'd be legal to have an invalid page (as in
!PageIsVerified()) somewhere? At least not after reaching consistency.
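
The difference between an uninitialized (all-zero) page and an invalid one can be checked by hand on a copy of the relation file. Below is a minimal Python sketch that approximates the header sanity checks `PageIsVerified()` applies when data checksums are off. It is written from memory of the 9.3 page header layout, so treat the field order, `PD_VALID_FLAG_BITS`, and the alignment value as assumptions; it also assumes a little-endian x86_64 build with 8 kB blocks. `classify_page` and `read_block` are hypothetical helper names.

```python
import struct

BLCKSZ = 8192                # assumption: default block size
MAXALIGN = 8                 # assumption: maximum alignment on x86_64
PD_VALID_FLAG_BITS = 0x0007  # assumption: flag bits defined in 9.3

# Assumed page header layout: pd_lsn (two uint32), pd_checksum, pd_flags,
# pd_lower, pd_upper, pd_special, pd_pagesize_version (uint16 each),
# pd_prune_xid (uint32). Little-endian, 24 bytes in total.
HEADER = struct.Struct("<IIHHHHHHI")


def classify_page(page):
    """Return 'zeroed', 'ok' or 'invalid' for one raw page image."""
    if len(page) != BLCKSZ:
        raise ValueError("expected exactly one block")
    _, _, _, flags, lower, upper, special, _, _ = HEADER.unpack_from(page)
    if upper != 0:
        # Initialized page: the header pointers must be ordered and aligned.
        sane = ((flags & ~PD_VALID_FLAG_BITS) == 0
                and lower <= upper <= special <= BLCKSZ
                and special % MAXALIGN == 0)
        return "ok" if sane else "invalid"
    # pd_upper == 0 is acceptable only when the whole page is zero.
    return "zeroed" if bytes(page) == bytes(BLCKSZ) else "invalid"


def read_block(path, offset):
    """Read one block from a relation segment file (use a copy, not live data)."""
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(BLCKSZ)
```

A page reported as `zeroed` corresponds to the uninitialized case from the earlier bug report, while `invalid` corresponds to the FATAL message seen here. This sketch does not verify checksums or tuple contents.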

Greetings,

Andres Freund

--
 Andres Freund                       http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: Postgresql 9.3.4 Streaming Replication Standby invalid Page block

From
"Burgess, Freddie"
Date:
show data_checksums;
 data_checksums
----------------
 off

tabsdb=# select version();
                                                   version
--------------------------------------------------------------------------------------------------------------
PostgreSQL 9.3.4 on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-4). 64-bit

On both Master/Standby
--------------------------------------------------------------------------------------------------------------
The standby replayed all of the outstanding WAL files overnight and has now caught up with the primary database; streaming replication is running fine.

The relation "pg_tblspc/16435/PG_9.3_201306121/16444/125127698" belongs to a partition tablespace holding data from the year 2007. I verified that the row counts match between master and slave for the tables that reside in that tablespace.

Is there anything else I can do to verify the consistency of the standby?

thanks

________________________________________
From: Andres Freund [andres@2ndquadrant.com]
Sent: Wednesday, July 02, 2014 7:09 AM
To: Heikki Linnakangas
Cc: Burgess, Freddie; "PostgreSQL Bugs [pgsql-bugs@postgresql.org]"
Subject: Re: [BUGS] Postgresql 9.3.4 Streaming Replication Standby invalid Page block

Re: Postgresql 9.3.4 Streaming Replication Standby invalid Page block

From
"Burgess, Freddie"
Date:
Today we have the same error in the logs, but now the standby server will not restart at all. This error refers to a static partition holding historical data from 2006, so the problem has to be related to autovacuum.

FATAL: invalid page in block 420538 of relation pg_tblspc/16434/PG_9.3_201306121/16444/125127662
CONTEXT: xlog redo vacuum: rel 16434/16444/125127662; blk 582590, lastBlockVacuumed 0
LOG: startup process (PID 14307) exited with exit code 1
LOG: terminating any other active server processes

Are there any solutions?

thanks
________________________________________
From: pgsql-bugs-owner@postgresql.org [pgsql-bugs-owner@postgresql.org] on behalf of Burgess, Freddie [FBurgess@Radiantblue.com]
Sent: Wednesday, July 02, 2014 4:04 PM
To: Andres Freund; Heikki Linnakangas
Cc: "PostgreSQL Bugs [pgsql-bugs@postgresql.org]"
Subject: Re: [BUGS] Postgresql 9.3.4 Streaming Replication Standby invalid Page block