Re: corrupt pages detected by enabling checksums

From     Andres Freund
Subject  Re: corrupt pages detected by enabling checksums
Date
Msg-id   20130403235241.GA19178@awork2.anarazel.de
In reply to  corrupt pages detected by enabling checksums  (Jeff Janes <jeff.janes@gmail.com>)
Responses    Re: corrupt pages detected by enabling checksums  (andres@anarazel.de (Andres Freund))
List         pgsql-hackers
On 2013-04-03 15:57:49 -0700, Jeff Janes wrote:
> I've changed the subject from "regression test failed when enabling
> checksum" because I now know they are totally unrelated.
> 
> My test case didn't need to depend on archiving being on, and so with a
> simple tweak I rendered the two issues orthogonal.
> 
> 
> On Wed, Apr 3, 2013 at 12:15 PM, Jeff Davis <pgsql@j-davis.com> wrote:
> 
> > On Mon, 2013-04-01 at 19:51 -0700, Jeff Janes wrote:
> >
> > > What I would probably really want is the data as it existed after the
> > > crash but before recovery started, but since the postmaster
> > > immediately starts recovery after the crash, I don't know of a good
> > > way to capture this.
> >
> > Can you just turn off the restart_after_crash GUC? I had a chance to
> > look at this, and seeing the block before and after recovery would be
> > nice. I didn't see a log file in the data directory, but it didn't go
> > through recovery, so I assume it already did that.
> >
> 
> You don't know that the cluster is in the bad state until after it goes
> through recovery because most crashes recover perfectly fine.  So it would
> have to make a side-copy of the cluster after the crash, then recover the
> original and see how things go, then either retain or delete the side-copy.
>  Unfortunately my testing harness can't do this at the moment, because the
> perl script storing the consistency info needs to survive over the crash
> and recovery.   It might take me awhile to figure out how to make it do
> this.
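
A rough sketch of that side-copy dance, not the perl harness itself: the paths, the pg_ctl invocation and check_consistency() are hypothetical placeholders, and restart_after_crash = off is assumed so the postmaster leaves the crashed cluster alone.

# Rough sketch, standing in for the perl harness: with restart_after_crash = off
# the postmaster stays down after a crash, so we can keep a side-copy of the
# cluster as it looked *before* recovery and only throw it away if recovery
# turns out fine. Paths and check_consistency() are hypothetical placeholders.
import shutil
import subprocess

DATADIR   = "/tmp/test_cluster"            # crashed cluster, not yet recovered
SIDE_COPY = "/tmp/test_cluster.pre_recovery"

def check_consistency():
    # placeholder: run whatever verification queries the harness normally runs
    return True

# 1. snapshot the un-recovered cluster
shutil.copytree(DATADIR, SIDE_COPY)

# 2. start the original; this runs crash recovery
subprocess.check_call(["pg_ctl", "-D", DATADIR, "-w", "start"])

# 3. keep the pre-recovery copy only when something looks wrong afterwards
if check_consistency():
    shutil.rmtree(SIDE_COPY)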
> 
> I have the server log just go to stderr, where it gets mingled together
> with messages from my testing harness.  I had not uploaded that file.
> 
> Here is a new upload. It contains both a data_dir tarball including xlog,
> and the mingled stderr ("do_new.out")
> 
> https://drive.google.com/folderview?id=0Bzqrh1SO9FcEQmVzSjlmdWZvUHc&usp=sharing
> 
> The other 3 files in it constitute the harness.  It is quite a mess, with
> some hard-coded paths.  The just-posted fix for wal_keep_segments will also
> have to be applied.
> 
> 
> 
> >
> > The block is corrupt as far as I can tell. The first third is written,
> > and the remainder is all zeros. The header looks like this:
> >
> 
> Yes, that part is by my design.  Why it didn't get fixed from a FPI is not
> by my design, of course.

There are no FPIs (if by that you mean full-page images) afaics:

Your logs tell us about recovery:
27692  2013-04-03 10:09:15.621 PDT:LOG:  redo starts at 1/31000090
27692  2013-04-03 10:09:15.647 PDT:LOG:  incorrect resource manager data checksum in record at 1/31169A68
27692  2013-04-03 10:09:15.647 PDT:LOG:  redo done at 1/31169A38

And then the error:

27698 SELECT 2013-04-03 10:09:16.688 PDT:WARNING:  page verification failed, calculated checksum 16296 but expected 49517
27698 SELECT 2013-04-03 10:09:16.688 PDT:ERROR:  invalid page in block 90 of relation base/16384/835589
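
One way to peek at that block directly is a short script like the following (a sketch only, assuming an 8 kB BLCKSZ, a little-endian machine and the 9.3 page header layout; the path is the one from the error message, relative to the data directory):

# Sketch only: inspect the failing block's page header and its zeroed tail.
# Assumes BLCKSZ = 8192, a little-endian machine, and the 9.3 PageHeaderData
# layout (pd_lsn, pd_checksum, pd_flags, pd_lower, pd_upper, ...).
import struct

BLCKSZ = 8192
path = "base/16384/835589"   # relative to the data directory
blkno = 90

with open(path, "rb") as f:
    f.seek(blkno * BLCKSZ)
    page = f.read(BLCKSZ)

(xlogid, xrecoff, checksum, flags, lower, upper,
 special, pagesize_version, prune_xid) = struct.unpack_from("<IIHHHHHHI", page)

print("pd_lsn = %X/%X, pd_checksum = %u, pd_lower = %u, pd_upper = %u"
      % (xlogid, xrecoff, checksum, lower, upper))
# Jeff's description: the first third is written, the remainder is all zeros
print("trailing zero bytes: %d" % (len(page) - len(page.rstrip(b"\0"))))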


looking at xlog from that time, the first records are:
rmgr: XLOG        len (rec/tot):     72/   104, tx:          0, lsn: 1/31000028, prev 1/30000090, bkp: 0000, desc: checkpoint:redo 1/31000028; tli 1; prev tli 1; fpw true; xid 0/26254997; oid 835589; multi 1; offset 0; oldest xid 1799 in DB 1; oldest multi 1 in DB 1; oldest running xid 0; online

rmgr: XLOG        len (rec/tot):      4/    36, tx:          0, lsn: 1/31000090, prev 1/31000028, bkp: 0000, desc: nextOid:843781

rmgr: Storage     len (rec/tot):     16/    48, tx:          0, lsn: 1/310000B8, prev 1/31000090, bkp: 0000, desc: file create:base/16384/835589

rmgr: Heap        len (rec/tot):     37/  7905, tx:   26254997, lsn: 1/310000E8, prev 1/310000B8, bkp: 1000, desc: hot_update:rel 1663/16384/12660; tid 0/47 xmax 26254997 ; new tid 0/33 xmax 0

So the file was created after the last checkpoint, which means no full-page
images need to be logged for it. And we theoretically should be able to
recover from things like torn pages using the WAL alone, since it should
cover everything that ever happened to that file.

The bkp block 1 indicated in the 4th record above is the only one in
that period of WAL btw.
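
To double-check that the WAL really covers everything that happened to the file, a saved pg_xlogdump listing can simply be filtered for the relfilenode, along these lines (a sketch; "xlogdump.txt" is a hypothetical file holding the dump shown above):

# Sketch: pull out every record in a saved pg_xlogdump listing that mentions
# the relation, to check that the WAL covers everything done to the new file.
relnode = "16384/835589"          # database oid / relfilenode from the error

matches = 0
with open("xlogdump.txt") as f:
    for line in f:
        # matches both "base/16384/835589" and "rel 1663/16384/835589"
        if relnode in line:
            matches += 1
            print(line.rstrip())

print("%d records reference the relation" % matches)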

Looking a bit more, the last page that's covered by WAL is:
rmgr: Heap        len (rec/tot):     35/    67, tx:   26254998, lsn: 1/311699A8, prev 1/31169960, bkp: 0000, desc: insert:rel 1663/16384/835589; tid 44/56

which is far earlier than the "block 90" the error is found in, unless I
misunderstand something. Which is strange, since non-zero content for those
pages shouldn't go to disk until the respective WAL has been flushed. Huh,
are we missing something here?
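
One way to confirm the mismatch would be to scan the relation file for the highest block with any non-zero content and compare it with the last block the WAL claims to have touched (block 44 above). A sketch, again assuming an 8 kB BLCKSZ and running from inside the data directory:

# Sketch: find the highest block in the relation that has any non-zero
# content, to compare with the last block referenced by the WAL (block 44).
import os

BLCKSZ = 8192
path = "base/16384/835589"

highest = -1
with open(path, "rb") as f:
    nblocks = os.fstat(f.fileno()).st_size // BLCKSZ
    for blkno in range(nblocks):
        if f.read(BLCKSZ).strip(b"\0"):
            highest = blkno

print("%d blocks in file, highest non-zero block: %d" % (nblocks, highest))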

Greetings,

Andres Freund

--
 Andres Freund                     http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


