Re: 9.4 checksum errors in recovery with gin index

From: Andres Freund
Subject: Re: 9.4 checksum errors in recovery with gin index
Msg-id 20140507173421.GJ13397@awork2.anarazel.de
In reply to: Re: 9.4 checksum errors in recovery with gin index  (Jeff Janes <jeff.janes@gmail.com>)
Responses: Re: 9.4 checksum errors in recovery with gin index
List: pgsql-hackers
Hi,

On 2014-05-07 10:21:26 -0700, Jeff Janes wrote:
> On Wed, May 7, 2014 at 12:48 AM, Andres Freund <andres@2ndquadrant.com> wrote:
> 
> > Hi,
> >
> > On 2014-05-07 00:35:35 -0700, Jeff Janes wrote:
> > > When recovering from a crash (with injection of a partial page write at
> > > time of crash) against 7c7b1f4ae5ea3b1b113682d4d I get a checksum
> > > verification failure.
> > >
> > > 16396 is a gin index.
> >
> > Over which type? What was the load? make check?
> >
> 
> A gin index on text[].
> 
> The load is a variation of the crash recovery tester I've been using the
> last few years, this time adapted to use a gin index in a rather unnatural
> way.  I just increment a counter on a random row repeatedly via a unique
> key, but for this purpose that unique key is stuffed into text[] along with
> a bunch of cruft.  The cruft is text representations of negative integers,
> the actual key is text representation of nonnegative integers.
> 
> The test harness (patch to induce crashes, and two driving programs) and a
> preserved data directory are here:
> 
> https://drive.google.com/folderview?id=0Bzqrh1SO9FcESDZVeFk5djJaeHM&usp=sharing
> 
> (role jjanes, database jjanes)
> 
> As far as I can tell, this problem goes back to the beginning of page
> checksums.

Interesting.

> > > If I have it ignore checksum failures, there is no apparent misbehavior.
> > >  I'm trying to bisect it, but it could take a while and I thought someone
> > > might have some theories based on the log:
> >
> > If you have the WAL a pg_xlogdump grepping for everything referring to
> > that block would be helpful.
> >
> 
> The only record which mentions block 28486 by name is this one:

Hm, try running it with -b specified.
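
To make the "grep for everything referring to that block" step concrete, here is a minimal sketch of a filter over saved pg_xlogdump output. The helper name `records_for_block` is hypothetical, and the line layouts it matches are assumptions: the `node: spc/db/rel blkno: N` form is taken from the record quoted in this thread, while the `rel spc/db/rel; ... block: N` form is what I would expect the backup-block lines printed under -b to look like. Adjust the patterns if your output differs.

```python
import re


def records_for_block(lines, relnode, blkno):
    """Yield pg_xlogdump output lines mentioning the given relfilenode and block.

    Assumed layouts (not verified against every version):
      '... node: 1663/16384/16396 blkno: 28486'         (record description)
      '... rel 1663/16384/16396; ...; block: 28486; ...' (backup block, with -b)
    """
    # Match 'tablespace/database/relfilenode' with the wanted relfilenode.
    rel = re.compile(r"\b\d+/\d+/%d\b" % relnode)
    # Match either 'blkno: N' or 'block: N' for the wanted block number.
    blk = re.compile(r"\b(?:blkno|block):?\s*%d\b" % blkno)
    for line in lines:
        if rel.search(line) and blk.search(line):
            yield line.rstrip("\n")


# The record quoted in this thread, joined onto one line.
sample = [
    "rmgr: Gin         len (rec/tot):   1576/  1608, tx:   77882205, "
    "lsn: 11/30F4C2C0, prev 11/30F4C290, bkp: 0000, "
    "desc: Insert new list page, node: 1663/16384/16396 blkno: 28486\n",
]

for hit in records_for_block(sample, 16396, 28486):
    print(hit)
```

The same function could be fed the lines of a file holding the output of a pg_xlogdump -b run over the relevant WAL range. Word boundaries in the patterns keep block 28486 from also matching, say, 284860.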

> rmgr: Gin         len (rec/tot):   1576/  1608, tx:   77882205, lsn: 11/30F4C2C0, prev 11/30F4C290, bkp: 0000, desc: Insert new list page, node: 1663/16384/16396 blkno: 28486
> 
> However, I think that that record precedes the recovery start point.

If that's the case, it seems likely that a PageSetLSN() or PageSetDirty()
call is missing somewhere...

Greetings,

Andres Freund

-- 
Andres Freund                       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


