Thread: help with error "unexpected pageaddr"

help with error "unexpected pageaddr"

From
"Scot Kreienkamp"
Date:

Hey everyone,

We have a PG 8.3.7 server that is doing WAL log shipping to 2 other servers that are remote mirrors.  This has been working well for almost two years.  Last night we did some massive data and structure changes to one of our databases.  Since then I get these errors on the two mirrors:

2010-09-15 08:35:05 EDT: LOG:  restored log file "0000000100000301000000D9" from archive
2010-09-15 08:35:27 EDT: LOG:  restored log file "0000000100000301000000DA" from archive
2010-09-15 08:35:40 EDT: LOG:  restored log file "0000000100000301000000DB" from archive
2010-09-15 08:35:40 EDT: LOG:  unexpected pageaddr 301/47000000 in log file 769, segment 219, offset 0
2010-09-15 08:35:40 EDT: LOG:  redo done at 301/DA370780
2010-09-15 08:35:40 EDT: LOG:  last completed transaction was at log time 2010-09-15 08:30:01.24936-04
2010-09-15 08:35:40 EDT: LOG:  restored log file "0000000100000301000000DA" from archive
2010-09-15 08:36:26 EDT: LOG:  selected new timeline ID: 2
2010-09-15 08:37:11 EDT: LOG:  archive recovery complete

I've taken two separate file level backups and tried to restart the mirrors, and every time on both servers I get a similar error message.  I seem to recall reading that it may have something to do with corruption in the timeline, which is why it's jumping to a new timeline ID.

1.  Can anyone tell me what this means?
2.  Is there some corruption in the database?
3.  If so, is there an easy way to fix it?

Also, one additional question.  I don't have a 00001.history file which makes the PITRTools complain constantly.  Is there any way to regenerate this file?

Any help would be much appreciated.  I'm rather worried that I've got corruption, and not having the mirrors running puts us at risk for data loss.

Re: help with error "unexpected pageaddr"

From
Tom Lane
Date:
"Scot Kreienkamp" <SKreien@la-z-boy.com> writes:
> We have a PG 8.3.7 server that is doing WAL log shipping to 2 other
> servers that are remote mirrors.  This has been working well for almost
> two years.  Last night we did some massive data and structure changes to
> one of our databases.  Since then I get these errors on the two mirrors:

> 2010-09-15 08:35:05 EDT: LOG:  restored log file
> "0000000100000301000000D9" from archive

> 2010-09-15 08:35:27 EDT: LOG:  restored log file
> "0000000100000301000000DA" from archive

> 2010-09-15 08:35:40 EDT: LOG:  restored log file
> "0000000100000301000000DB" from archive

> 2010-09-15 08:35:40 EDT: LOG:  unexpected pageaddr 301/47000000 in log
> file 769, segment 219, offset 0

This appears to indicate that you archived the wrong contents of log
file 0000000100000301000000DB.  If you don't still have the correct
contents on the master, I think the only way to recover is to take a
fresh base backup so you can make the slaves roll forward from a point
later than this log segment.  There's no reason to suppose that there's
data corruption on the master, just bad data in the WAL archive.

You'd probably be well advised to look closely at your WAL archiving
script to see if it has any race conditions that might be triggered by
very fast generation of WAL.

> Also, one additional question.  I don't have a 00001.history file which
> makes the PITRTools complain constantly.  Is there any way to regenerate
> this file?

Just ignore that, it's cosmetic (the file isn't supposed to exist).

            regards, tom lane
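
Editorial note, not part of the original reply: the numbers in the log message can be checked by hand. The sketch below assumes the WAL file naming used by 8.3 (24 hex digits: 8 for the timeline, 8 for the log file, 8 for the segment) and the default 16 MB segment size. The function names are hypothetical and only illustrate the arithmetic.

```python
SEGMENT_SIZE = 16 * 1024 * 1024  # assumption: default 16 MB WAL segments


def decode_wal_name(name):
    """Split a 24-hex-digit WAL file name into (timeline, log, segment)."""
    if len(name) != 24:
        raise ValueError("expected 24 hex digits, got %r" % name)
    return int(name[0:8], 16), int(name[8:16], 16), int(name[16:24], 16)


def expected_pageaddr(name):
    """Page address that the first page of this segment should carry,
    formatted the way it appears in the server log (log/offset in hex)."""
    _, log, seg = decode_wal_name(name)
    return "%X/%X" % (log, seg * SEGMENT_SIZE)


print(decode_wal_name("0000000100000301000000DB"))    # (1, 769, 219)
print(expected_pageaddr("0000000100000301000000DB"))  # 301/DB000000
```

So "log file 769, segment 219" is simply 0000000100000301000000DB in decimal, and its first page should be stamped 301/DB000000. The server found 301/47000000 instead, which is the address segment 0x47 of the same log file would carry. That is consistent with the file holding older contents under a newer name.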

Re: help with error "unexpected pageaddr"

From
Tom Lane
Date:
"Scot Kreienkamp" <SKreien@la-z-boy.com> writes:
> I tried to take a new base backup about 45 minutes ago.  The master has
> rolled forward a number of WAL files since I last tried, but it still
> fails.

> LOG:  restored log file "0000000100000301000000FE" from archive
> LOG:  restored log file "000000010000030200000000" from archive
> LOG:  restored log file "000000010000030200000001" from archive
> LOG:  restored log file "000000010000030200000002" from archive
> LOG:  restored log file "000000010000030200000003" from archive
> LOG:  unexpected pageaddr 301/50000000 in log file 770, segment 3,
> offset 0

Hmmm ... is it possible that your WAL archive contains log files
numbered higher than where your master is?

            regards, tom lane
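
Editorial note, not part of the original reply: one way to answer this question is to compare the names in the archive against the segment the master is currently writing. The sketch below is only an illustration. It assumes the master's current segment name has been obtained separately (on 8.3 this should be possible with `SELECT pg_xlogfile_name(pg_current_xlog_location());`), and the function name and the sample master position are hypothetical.

```python
import re

# Plain WAL segment names are exactly 24 uppercase hex digits.
WAL_NAME = re.compile(r"^[0-9A-F]{24}$")


def segments_ahead_of_master(archived_names, master_current):
    """Return archived segment names that sort after the master's current one.

    Only plain segment files on the same timeline are compared; .backup and
    .history files are skipped. Fixed-width hex names sort correctly as
    plain strings.
    """
    timeline = master_current[:8]
    return sorted(
        name for name in archived_names
        if WAL_NAME.match(name)
        and name[:8] == timeline
        and name > master_current
    )


# Illustrative listing; the master position here is made up for the example.
archive = [
    "0000000100000301000000FE",
    "000000010000030200000002",
    "000000010000030200000003",
    "000000010000030200000003.00000020.backup",
]
print(segments_ahead_of_master(archive, "000000010000030200000002"))
```

Any name this returns is a candidate for a stale file. The same comparison can be run against the pg_xlog directory copied along with a base backup.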

Re: help with error "unexpected pageaddr"

From
"Scot Kreienkamp"
Date:
Tom,

I tried to take a new base backup about 45 minutes ago.  The master has
rolled forward a number of WAL files since I last tried, but it still
fails.

LOG:  restored log file "0000000100000301000000FE" from archive
LOG:  restored log file "000000010000030200000000" from archive
LOG:  restored log file "000000010000030200000001" from archive
LOG:  restored log file "000000010000030200000002" from archive
LOG:  restored log file "000000010000030200000003" from archive
LOG:  unexpected pageaddr 301/50000000 in log file 770, segment 3, offset 0
LOG:  redo done at 302/2BCE828
LOG:  last completed transaction was at log time 2010-09-15 15:07:01.040854-04
LOG:  restored log file "000000010000030200000002" from archive
LOG:  selected new timeline ID: 2

My entire WAL archiving setup is four cp %p %f commands.  It's so short
that I don't even have a script; it's directly in the archive_command
setting in postgresql.conf.
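
Editorial note, not part of the original message: nothing in this thread shows that the cp commands were at fault. As a general precaution against the race conditions mentioned earlier, an archive step can copy to a temporary name, flush it to disk, and then publish it under the final name without overwriting. The sketch below is one way to do that, assuming a local archive directory on a filesystem that supports hard links. The function name and the ".tmp" suffix are hypothetical.

```python
import os
import shutil


def archive_segment(src_path, dest_dir, file_name):
    """Copy one WAL segment into dest_dir without ever overwriting a file.

    Returns 0 on success and 1 on failure, so a wrapper script can pass the
    value straight to sys.exit(); the server treats a nonzero exit status as
    "not archived" and retries later.
    """
    final_path = os.path.join(dest_dir, file_name)
    temp_path = final_path + ".tmp"
    try:
        with open(src_path, "rb") as src, open(temp_path, "wb") as dst:
            shutil.copyfileobj(src, dst)
            dst.flush()
            os.fsync(dst.fileno())  # make sure the bytes are on disk
        # os.link fails if final_path already exists, so an already
        # archived segment is never replaced by a different copy.
        os.link(temp_path, final_path)
    except OSError:
        return 1
    finally:
        # The temporary name is removed whether or not the copy succeeded.
        if os.path.exists(temp_path):
            os.remove(temp_path)
    return 0
```

A wrapper would be invoked from archive_command with %p, the archive directory, and %f as arguments. A half-written file never appears under the final name, and a second attempt to archive the same name fails instead of silently replacing it.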


Re: help with error "unexpected pageaddr"

From
"Scot Kreienkamp"
Date:
It shouldn't have; the only thing we did to the server was restart it and
run our database queries.  Clearing out all the WAL files from pg_xlog
along with taking a new base backup did fix it, though.

Thanks for the help Tom!

Scot Kreienkamp
skreien@la-z-boy.com