Thread: uninitialized page in standby recovery

uninitialized page in standby recovery

From
Ray Stell
Date:
I built a standby with 9.4.12 and about a day later the standby crashed 
with this:

2018-02-02 16:20:44 EST,0, WARNING:  page 1347460 of relation 
base/16391/16414 is uninitialized
2018-02-02 16:20:44 EST,0, CONTEXT:  xlog redo visible: rel 
1663/16391/16414; blk 1347460
2018-02-02 16:20:44 EST,0, PANIC:  WAL contains references to invalid pages
2018-02-02 16:20:44 EST,0, CONTEXT:  xlog redo visible: rel 
1663/16391/16414; blk 1347460
2018-02-02 16:20:44 EST,0, LOG:  startup process (PID 24057) was 
terminated by signal 6: Aborted
2018-02-02 16:20:44 EST,0, LOG:  terminating any other active server 
processes
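
For anyone who wants to inspect the page named in the log, here is a rough sketch of where that block would sit on disk. It assumes the default 8 kB block size and 1 GB relation segment files, neither of which is confirmed in this thread, and the function name is made up for illustration.

```python
# Assumed build defaults; a custom build could differ.
BLCKSZ = 8192          # bytes per page
RELSEG_SIZE = 131072   # pages per 1 GB segment file


def locate_block(rel_path, blkno, blcksz=BLCKSZ, relseg_size=RELSEG_SIZE):
    """Return (segment file path, byte offset) for a block of a relation."""
    segno, blk_in_seg = divmod(blkno, relseg_size)
    # The first segment has no suffix; later ones are named <path>.<segno>.
    seg_path = rel_path if segno == 0 else f"{rel_path}.{segno}"
    return seg_path, blk_in_seg * blcksz


if __name__ == "__main__":
    path, offset = locate_block("base/16391/16414", 1347460)
    print(path, offset)  # base/16391/16414.10 300974080
```

Under those assumptions, a page of all zeros at that offset on the standby would be consistent with the "is uninitialized" warning.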

Any hints as to where the corruption begins?  I don't see any disk I/O 
issues.  I'm not sure what to look for in the release notes.

I'll try to patch ASAP, but that is politically difficult to get done.



Re: uninitialized page in standby recovery

From
Ray Stell
Date:
On 2/5/18 9:06 AM, Ray Stell wrote:

> I built a standby with 9.4.12 and about a day later the standby 
> crashed with this:
>
> 2018-02-02 16:20:44 EST,0, WARNING:  page 1347460 of relation 
> base/16391/16414 is uninitialized
> 2018-02-02 16:20:44 EST,0, CONTEXT:  xlog redo visible: rel 
> 1663/16391/16414; blk 1347460
> 2018-02-02 16:20:44 EST,0, PANIC:  WAL contains references to invalid 
> pages
> 2018-02-02 16:20:44 EST,0, CONTEXT:  xlog redo visible: rel 
> 1663/16391/16414; blk 1347460
> 2018-02-02 16:20:44 EST,0, LOG:  startup process (PID 24057) was 
> terminated by signal 6: Aborted
> 2018-02-02 16:20:44 EST,0, LOG:  terminating any other active server 
> processes
>
> Any hints to where the corruption begins?  I don't see any disk i/o 
> issues.  Not sure what to look for in the release notes,
>
> but I'll try to patch asap, but that is difficult to get done 
> politically.
>
This looks like bug 13822, discussed here:
https://www.postgresql.org/message-id/20151217125025.6916.26898%40wrigleys.postgresql.org
That discussion concerns 9.4.5.  How can I follow the thread?  Is there a 
bug database to query?


Re: uninitialized page in standby recovery

From
Ray Stell
Date:
On 2/5/18 9:06 AM, Ray Stell wrote:

> I built a standby with 9.4.12 and about a day later the standby 
> crashed with this:
>
> 2018-02-02 16:20:44 EST,0, WARNING:  page 1347460 of relation 
> base/16391/16414 is uninitialized
> 2018-02-02 16:20:44 EST,0, CONTEXT:  xlog redo visible: rel 
> 1663/16391/16414; blk 1347460
> 2018-02-02 16:20:44 EST,0, PANIC:  WAL contains references to invalid 
> pages
> 2018-02-02 16:20:44 EST,0, CONTEXT:  xlog redo visible: rel 
> 1663/16391/16414; blk 1347460
> 2018-02-02 16:20:44 EST,0, LOG:  startup process (PID 24057) was 
> terminated by signal 6: Aborted
> 2018-02-02 16:20:44 EST,0, LOG:  terminating any other active server 
> processes
>
> Any hints to where the corruption begins?  I don't see any disk i/o 
> issues.  Not sure what to look for in the release notes,
>
> but I'll try to patch asap, but that is difficult to get done 
> politically.
>
I'm beginning to wonder about pg_basebackup in this old version.  I 
rebuilt the standby again, and this time when I started it I got:

LOG:  database system was not properly shut down; automatic recovery in 
progress
LOG:  redo starts at 2F45/1F4B7F8
FATAL:  could not access status of transaction 4053124744
DETAIL:  Could not read from file "pg_clog/0F19" at offset 90112: Success.
CONTEXT:  xlog redo commit: 2018-02-05 11:35:54.291398-05
LOG:  startup process (PID 130590) exited with exit code 1
LOG:  terminating any other active server processes
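
The file name and offset in the FATAL message line up with the transaction ID, if one assumes the stock commit-log layout: 2 status bits per transaction, 8 kB pages, 32 pages per segment file. Those constants are assumptions about this build, and the helper name is only illustrative.

```python
# Assumed stock layout of the commit log (pg_clog) in a default build.
BLCKSZ = 8192                 # bytes per page
CLOG_XACTS_PER_BYTE = 4       # 2 status bits per transaction
CLOG_XACTS_PER_PAGE = BLCKSZ * CLOG_XACTS_PER_BYTE
SLRU_PAGES_PER_SEGMENT = 32   # pages per segment file


def clog_location(xid):
    """Return (segment file path, byte offset of the page) for an xid."""
    pageno = xid // CLOG_XACTS_PER_PAGE
    segno, page_in_seg = divmod(pageno, SLRU_PAGES_PER_SEGMENT)
    # Segment files are named with four uppercase hex digits.
    return f"pg_clog/{segno:04X}", page_in_seg * BLCKSZ


if __name__ == "__main__":
    print(clog_location(4053124744))  # ('pg_clog/0F19', 90112)
```

So the startup process was looking for the page that holds the status of transaction 4053124744. The trailing "Success" usually points to a short read rather than an I/O error, which would be consistent with the standby's copy of pg_clog/0F19 being shorter than the primary's.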

Right or wrong, I rsync-ed pg_clog and it recovered.  Can you use 
pg_basebackup from a more current patch level against a 9.4.12 cluster?