Re: new heapcheck contrib module

Поиск
Список
Период
Сортировка
От Peter Geoghegan
Тема Re: new heapcheck contrib module
Дата
Msg-id CAH2-Wz=AnzG_KvoR-KairFaZJavSzzDKfv2h30aJxM=RYxuUTg@mail.gmail.com
обсуждение исходный текст
Ответ на Re: new heapcheck contrib module  (Mark Dilger <mark.dilger@enterprisedb.com>)
Ответы Re: new heapcheck contrib module  (Robert Haas <robertmhaas@gmail.com>)
Re: new heapcheck contrib module  (Alvaro Herrera <alvherre@2ndquadrant.com>)
Список pgsql-hackers
On Tue, May 12, 2020 at 7:07 PM Mark Dilger
<mark.dilger@enterprisedb.com> wrote:
> Thank you yet again for reviewing.  I really appreciate the feedback!

Happy to help. It's important work.

> Ok, I take your point that the code cannot soldier on after the first error is returned.  I'll change that for v6 of
thepatch, moving on to the next relation after hitting the first corruption in any particular index.  Do you mind that
Irefactored the code to return the error rather than ereporting? 

try/catch seems like the way to do it. Not all amcheck errors come
from amcheck -- some are things that the backend code does, that are
known to appear in amcheck from time to time. I'm thinking in
particular of the
table_index_build_scan()/heapam_index_build_range_scan() errors, as
well as the errors from _bt_checkpage().

> Yes, I agree that reindexing is the most sensible remedy.  I certainly have no plans to implement some pg_fsck_index
typetool.  Even for tables, I'm not interested in creating such a tool. I just want a good tool for finding out what
thenature of the corruption is, as that might make it easier to debug what went wrong.  It's not just for debugging
productionsystems, but also for chasing down problems in half-baked code prior to release. 

All good goals.

>  * check_tuphdr_xids

> The point is that when checking the table for corruption I avoid calling anything that might assert (or segfault, or
whatever).

I don't think that you can expect to avoid assertion failures in
general. I'll stick with your example. You're calling
TransactionIdDidCommit() from check_tuphdr_xids(), which will
interrogate the commit log and pg_subtrans. It's just not under your
control. I'm sure that you could get an assertion failure somewhere in
there, and even if you couldn't that could change at any time.

You've quasi-duplicated some sensitive code to do that much, which
seems excessive. But it's also not enough.

> I'm starting to infer from your comments that you see the landmine detection vehicle as also driving at high speed,
detectinglandmines on occasion by seeing them first, but frequently by failing to see them and just blowing up. 

That's not it. I would certainly prefer if the landmine detector
didn't blow up. Not having that happen is certainly a goal I share --
that's why PageGetItemIdCareful() exists. But not at any cost,
especially not when "blow up" means an assertion failure that users
won't actually see in production. Avoiding assertion failures like the
one you showed is likely to have a high cost (removing defensive
asserts in low level access method code) for a low benefit. Any
attempt to avoid having the checker itself blow up rather than throw
an error message needs to be assessed pragmatically, on a case-by-case
basis.

> One of the delays in submitting the most recent version of the patch is that I was having trouble creating a
reliable,portable btree corrupting regression test. 

To be clear, I think that corrupting data is very helpful with ad-hoc
testing during development.

> I did however address (some?) issues that you and others mentioned about the table corrupting regression test.
Perhapsthere are remaining issues that will show up on machines with different endianness than I have thus far tested,
butI don't see that they will be insurmountable.  Are you fundamentally opposed to that test framework? 

I haven't thought about it enough just yet, but I am certainly suspicious of it.

--
Peter Geoghegan



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Kyotaro Horiguchi
Дата:
Сообщение: Re: PG 13 release notes, first draft
Следующее
От: Dilip Kumar
Дата:
Сообщение: Re: refactoring basebackup.c