Re: new heapcheck contrib module

From: Mark Dilger
Subject: Re: new heapcheck contrib module
Date:
Msg-id: DCB2CB70-1BF8-4601-9A55-A8FD02CF472B@enterprisedb.com
In reply to: Re: new heapcheck contrib module  (Peter Geoghegan <pg@bowt.ie>)
Responses: Re: new heapcheck contrib module  (Peter Geoghegan <pg@bowt.ie>)
List: pgsql-hackers

> On May 12, 2020, at 5:34 PM, Peter Geoghegan <pg@bowt.ie> wrote:
>
> On Mon, May 11, 2020 at 10:21 AM Mark Dilger
> <mark.dilger@enterprisedb.com> wrote:
>> 2) amcheck's btree checking functions have been refactored to be able to operate in two modes; the original mode in which all errors are reported via ereport, and a new mode for returning errors as rows from a set returning function.

Thank you yet again for reviewing.  I really appreciate the feedback!

> Somebody suggested that I make amcheck work in this way during its
> initial development. I rejected that idea at the time, though. It
> seems hard to make it work because the B-Tree index scan is a logical
> order index scan. It's quite possible that a corrupt index will have
> circular sibling links, and things like that. Making everything an
> error removes that concern. There are clearly some failures that we
> could just soldier on from, but the distinction gets rather blurred.

Ok, I take your point that the code cannot soldier on after the first error is returned.  I'll change that for v6 of the patch, moving on to the next relation after hitting the first corruption in any particular index.  Do you mind that I refactored the code to return the error rather than ereporting?  If it offends your sensibilities, I could rip that back out, at the expense of having to use try/catch logic in some other places.  I prefer to avoid the try/catch stuff, but I'm not going to put up a huge fuss.
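
To make concrete what I mean by returning the error rather than ereporting, here is roughly the shape of the reporting helper I have in mind.  This is only a sketch: the struct, function name, and output columns below are illustrative, not necessarily what the patch uses.

#include "postgres.h"

#include "storage/block.h"
#include "storage/off.h"
#include "utils/builtins.h"
#include "utils/tuplestore.h"

/*
 * Sketch of the two reporting modes: the same check code can either
 * ereport() immediately (the original amcheck behavior) or append a row
 * to a set-returning function's tuplestore.
 */
typedef struct CorruptionReporter
{
	bool			on_error_stop;	/* ereport instead of returning rows? */
	Tuplestorestate *tupstore;		/* SRF output, when returning rows */
	TupleDesc		tupdesc;		/* row shape: (blkno, offnum, msg) */
} CorruptionReporter;

static void
report_corruption(CorruptionReporter *reporter, BlockNumber blkno,
				  OffsetNumber offnum, const char *msg)
{
	if (reporter->on_error_stop)
		ereport(ERROR,
				(errcode(ERRCODE_DATA_CORRUPTED),
				 errmsg("corruption in block %u, offset %u: %s",
						blkno, offnum, msg)));
	else
	{
		Datum	values[3];
		bool	nulls[3] = {false, false, false};

		values[0] = Int64GetDatum((int64) blkno);
		values[1] = Int32GetDatum((int32) offnum);
		values[2] = CStringGetTextDatum(msg);
		tuplestore_putvalues(reporter->tupstore, reporter->tupdesc,
							 values, nulls);
	}
}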

> I understand why you want to do it this way. It makes sense that the
> heap stuff would report all inconsistencies together, at the end. I
> don't think that that's really workable (or even desirable) in the
> case of B-Tree indexes, though. When an index is corrupt, the solution
> is always to do root cause analysis, to make sure that the issue does
> not recur, and then to REINDEX. There isn't really a question about
> doing data recovery of the index structure.

Yes, I agree that reindexing is the most sensible remedy.  I certainly have no plans to implement some pg_fsck_index type tool.  Even for tables, I'm not interested in creating such a tool.  I just want a good tool for finding out what the nature of the corruption is, as that might make it easier to debug what went wrong.  It's not just for debugging production systems, but also for chasing down problems in half-baked code prior to release.

> Would it be possible to log the first B-Tree inconsistency, and then
> move on to the next high-level phase of verification? You don't have
> to throw an error, but it seems like a good idea for amcheck to still
> give up on further verification of the index.

Ok, good, it sounds like we're converging on the same idea.  I'm happy to do so.

> The assertion failure that you reported happens because of a generic
> assertion made from _bt_compare(). It doesn't have anything to do with
> amcheck (you'll see the same thing from regular index scans), really.

Oh, I know that already.  I could see that easily enough in the backtrace.  But if you look at the way I implemented verify_heapam, you might notice this:

/*
 * check_tuphdr_xids
 *
 *  Determine whether tuples are visible for verification.  Similar to
 *  HeapTupleSatisfiesVacuum, but with critical differences.
 *
 *  1) Does not touch hint bits.  It seems imprudent to write hint bits
 *     to a table during a corruption check.
 *  2) Only makes a boolean determination of whether verification should
 *     see the tuple, rather than doing extra work for vacuum-related
 *     categorization.
 *
 *  The caller should already have checked that xmin and xmax are not out of
 *  bounds for the relation.
 */

The point is that when checking the table for corruption I avoid calling anything that might assert (or segfault, or
whatever). I was talking about refactoring the btree checking code to be similarly careful. 
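
To illustrate the idea, the shape of the check is roughly the following.  This is a simplified sketch only; the real check_tuphdr_xids() in the patch handles more cases (xmax, multixacts, the bounds checks mentioned above, and so on), and the helper name here is just for illustration.

#include "postgres.h"

#include "access/htup_details.h"
#include "access/transam.h"
#include "storage/procarray.h"

/*
 * Decide whether the checker should look inside a tuple, using only hint
 * bits that are already set plus direct clog/procarray lookups, and never
 * writing hint bits back to the page.
 */
static bool
tuple_is_checkable(HeapTupleHeader tuphdr)
{
	TransactionId xmin = HeapTupleHeaderGetXmin(tuphdr);

	/* Trust hint bits that are already set, but never set new ones. */
	if (HeapTupleHeaderXminInvalid(tuphdr))
		return false;			/* inserting transaction aborted */
	if (HeapTupleHeaderXminCommitted(tuphdr))
		return true;

	if (!TransactionIdIsNormal(xmin))
		return true;			/* frozen or bootstrap xmin */

	/*
	 * Unlike HeapTupleSatisfiesVacuum(), consult the procarray and clog
	 * directly rather than going through code paths that would also update
	 * the page's hint bits.
	 */
	if (TransactionIdIsInProgress(xmin))
		return false;
	return TransactionIdDidCommit(xmin);
}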

> I think that removing that assertion would be the opposite of
> hardening. Even if you removed it, the backend will still crash once
> you come up with a slightly more evil index tuple. Maybe *that* could
> be mostly avoided with widespread hardening; we could in principle
> perform cross-checks of varlena headers against the tuple or page
> layout at any point reachable from _bt_compare(). That seems like
> something that would have unacceptable overhead, because the cost
> would be imposed on everything. And even then you've only ameliorated
> the problem.

I think we may have different mental models of how this all works in practice.  I am (or was) envisioning that the backend, during regular table and index scans, cannot afford to check for corruption at all steps along the way, and therefore does not, but that a corruption checking tool has a fundamentally different purpose, and can and should choose to operate in a way that won't blow up when checking a corrupt relation.  It's the difference between a car designed to drive down the highway at high speed vs. a military vehicle designed to drive over a minefield with a guy on the front bumper scanning for landmines, the whole while going half a mile an hour.

I'm starting to infer from your comments that you see the landmine detection vehicle as also driving at high speed, detecting landmines on occasion by seeing them first, but frequently by failing to see them and just blowing up.

> Code like amcheck's PageGetItemIdCareful() goes further than the
> equivalent backend macro (PageGetItemId()) to avoid assertion failures
> and crashes with corrupt data. I doubt that it is practical to take it
> much further than that, though. It's subject to diminishing returns.

Ok.
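
For reference, my mental model of the sort of extra validation PageGetItemIdCareful() layers on top of the plain PageGetItemId() macro is roughly the following.  This is only an illustrative sketch under my assumptions, not the real function, which lives in contrib/amcheck/verify_nbtree.c and checks more than this.

#include "postgres.h"

#include "access/nbtree.h"
#include "storage/bufpage.h"

/*
 * Fetch a line pointer, but validate it before anything dereferences it.
 */
static ItemId
get_itemid_carefully(Page page, BlockNumber block, OffsetNumber offset)
{
	ItemId		itemid = PageGetItemId(page, offset);

	/* The tuple the line pointer points at must end before the special space. */
	if (ItemIdGetOffset(itemid) + ItemIdGetLength(itemid) >
		BLCKSZ - MAXALIGN(sizeof(BTPageOpaqueData)))
		ereport(ERROR,
				(errcode(ERRCODE_INDEX_CORRUPTED),
				 errmsg("line pointer points past end of tuple space in index block %u",
						block)));

	/* A line pointer with storage must have a sane, aligned offset and length. */
	if (ItemIdHasStorage(itemid) &&
		(ItemIdGetLength(itemid) == 0 ||
		 ItemIdGetOffset(itemid) != MAXALIGN(ItemIdGetOffset(itemid))))
		ereport(ERROR,
				(errcode(ERRCODE_INDEX_CORRUPTED),
				 errmsg("invalid line pointer in index block %u", block)));

	return itemid;
}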

> In general, _bt_compare() calls user-defined code that is usually
> written in C. This C code could in principle feel entitled to do any
> number of scary things when you corrupt the input data. The amcheck
> module's dependency on user-defined operator code is totally
> unavoidable -- it is the single source of truth for the nbtree checks.

I don't really understand this argument, since users with buggy user defined operators are not the target audience, but I also don't think there is any point in arguing it, since I'm already resolved to take your advice about not hardening the btree stuff any further.

> It boils down to this: I think that regression tests that run on the
> buildfarm and actually corrupt data are not practical, at least in the
> case of the index checks -- though probably in all cases. Look at the
> pageinspect "btree.out" test output file -- it's very limited, because
> we have to work around a bunch of implementation details. It's no
> accident that the bt_page_items() test shows a palindrome value in the
> data column (the value is "01 00 00 00 00 00 00 01"). That's an
> endianness workaround.

One of the delays in submitting the most recent version of the patch is that I was having trouble creating a reliable, portable btree corrupting regression test.  Ultimately, I submitted v5 without any btree corrupting regression test, as it proved pretty difficult to write one good enough for submission, and I had already put a couple more days into developing v5 than I had intended.  So I can't argue too much with your point here.

I did however address (some?) issues that you and others mentioned about the table corrupting regression test.  Perhaps there are remaining issues that will show up on machines with different endianness than I have thus far tested, but I don't see that they will be insurmountable.  Are you fundamentally opposed to that test framework?  If you're going to vote against committing the patch with that test, I'll back down and just remove it from the patch, but it doesn't seem like a bad regression test to me.

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
