Re: WAL consistency check facility

Поиск
Список
Период
Сортировка
От Kuntal Ghosh
Тема Re: WAL consistency check facility
Дата
Msg-id CAGz5QCKWrfnSTZEPJEWB6XQo0mMViOuFAM6dyNjLMojC5M=NXg@mail.gmail.com
обсуждение исходный текст
Ответ на Re: WAL consistency check facility  (Michael Paquier <michael.paquier@gmail.com>)
Ответы Re: WAL consistency check facility
Список pgsql-hackers
On Thu, Nov 10, 2016 at 10:25 AM, Michael Paquier
<michael.paquier@gmail.com> wrote:
> On Wed, Nov 9, 2016 at 11:32 PM, Kuntal Ghosh
>> Thanks a lot for reviewing the patch. Based on your review, I've attached the

>> I've done a fair amount of testing which includes regression tests
>> and manual creation of inconsistencies in the page. I've also found the
>> reason behind inconsistency in brin revmap page.
>> Brin revmap page doesn't have standard page layout. Besides, It doesn't update
>> pd_upper and pd_lower in its operations as well. But, during WAL
>> insertions, it uses
>> REGBUF_STANDARD to register a reference in the WAL record. Hence, when we
>> restore image before consistency check, RestoreBlockImage fills the space
>> between pd_upper and pd_lower(page hole) with zero. I've posted this as a
>> separate thread.
>>
https://www.postgresql.org/message-id/flat/CAGz5QCJ%3D00UQjScSEFbV%3D0qO5ShTZB9WWz_Fm7%2BWd83zPs9Geg%40mail.gmail.com#CAGz5QCJ=00UQjScSEFbV=0qO5ShTZB9WWz_Fm7+Wd83zPs9Geg@mail.gmail.com
>
> Nice to have spotted the inconsistency. This tool is going to be useful..
>
> I have spent a couple of hours playing with the patch, and worked on
> it a bit more with a couple of minor changes:
> - In gindesc.c, the if blocks are more readable with brackets.
> - Addition of a comment when info is set, to mention that this is done
> at the beginning of the routine so as callers of XLogInsert() can pass
> the flag for consistency checks.
> - apply_image should be reset in ResetDecoder().
> - The BRIN problem is here:
> 2016-11-10 12:24:10 JST [65776]: [23-1] db=,user=,app=,client= FATAL:
> Inconsistent page found, rel 1663/16385/30625, forknum 0, blkno 1
> 2016-11-10 12:24:10 JST [65776]: [24-1] db=,user=,app=,client=
> CONTEXT:  xlog redo at 0/9BD31148 for BRIN/UPDATE+INIT: heapBlk 100
> pagesPerRange 1 old offnum 11, new offnum 1
> 2016-11-10 12:24:10 JST [65776]: [25-1] db=,user=,app=,client=
> WARNING:  buffer refcount leak: [4540] (rel=base/16385/30625,
> blockNum=1, flags=0x93800000, refcount=1 1)
> TRAP: FailedAssertion("!(RefCountErrors == 0)", File: "bufmgr.c", Line: 2506)
> Now the buffer refcount leak is not normal! The safest thing to do
> here is to release the buffer once a copy of it has been taken, and
> the leaks goes away when calling FATAL to report the inconsistency.
> - When checking for XLR_CHECK_CONSISTENCY, better to add a != 0 to get
> a boolean out of it.
> - guc_malloc() should be done as late as possible, this simplifies the
> code and prevents any memory leaks which is what your patch is doing
> when there is an error. (I have finally put my finger on what was
> itching me here).
> - In btree_mask, the lookup of BTP_DELETED can be deadly simplified.
> - I wondered also about putting assign_wal_consistency and
> check_wal_consistency in a separate file, say xlogparams.c, concluding
> that the current code does nothing bad either even if it adds rmgr.h
> in the list of headers in guc.c.
> - Some comment blocks are longer than 72~80 characters.
> - Typos!
All the changes make perfect sense to me.

> With the patch for BRIN applied, I am able to get installcheck-world
> working with wal_consistency = all and a standby doing the consistency
> checks behind. Adding wal_consistency = all in PostgresNode.pm, the
> recovery tests are passing. This patch is switched as "Ready for
> Committer". Thanks for completing this effort begun 3 years ago!
Thanks to you for reviewing all the patches in so much detail. Amit, Robert
and Dilip also helped me a lot in developing the feature. Thanks to them
as well.


-- 
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Mithun Cy
Дата:
Сообщение: Re: Patch: Implement failover on libpq connect level.
Следующее
От: Stephen Frost
Дата:
Сообщение: Re: Fix checkpoint skip logic on idle systems by tracking LSN progress