Обсуждение: amcheck's verify_heapam(), and HOT chain verification

Поиск
Список
Период
Сортировка

amcheck's verify_heapam(), and HOT chain verification

От
Peter Geoghegan
Дата:
I recently pushed a commit which formalized what we expect of HOT
chains during pruning -- see assertions and comments added by commit
5cd7eb1f, "Add various assertions to heap pruning code". I notice that
verify_heapam() doesn't really do anything with HOT chains, and I'd
say that that's a gap. Fortunately it should not be terribly difficult
to fill in this gap.

The short version is this: I think that verify_heapam() should
validate whole HOT chains, not just individual heap-only tuples. This
can be accomplished by taking a similar approach to the approach taken
in heap_prune_chain() already, for pruning.

For a given heap page, pruning fills in its PruneState variable to
build a coherent picture of the state of all HOT chains on the entire
page, before modifying anything (barring hint bit setting). We only
reach heap_page_prune_execute() when we have the whole picture about
HOT chains. This high level approach is essential when pruning.
Consider LP_REDIRECT items. We might need to set an existing
LP_REDIRECT item to LP_DEAD, rendering an entire HOT chain dead.
Whether or not that's the appropriate course of action for a given
LP_REDIRECT item will depend on the state of the HOT chain as a whole.

The HOT chain could of course have several heap-only tuples, and only
one visible tuple means that the LP_REDIRECT root item must not be set
LP_DEAD -- it should be made to link to the remaining live heap-only
tuple instead (while the dead intermediate heap-only tuples become
LP_UNUSED, never LP_DEAD). This is made even more tricky by the fact
that we have to consider HOT updaters with aborted transactions during
pruning.

We even have to consider the case where a HOT updater aborts, but
another HOT updater tries again and commits. Now we have a heap-only
tuple that is "disconnected from its original HOT chain" (or will be
until the next prune). Having a stray/disconnected heap-only tuple
like this ought to be fine -- the next pruning operation will get to
it whenever (see the special handling for these aborted heap-only
tuples at the top of heap_prune_chain()). But any disconnected
heap-only tuple had better be from an aborted xact -- otherwise we
have corruption. This is corruption that looks like index corruption,
because sequential scans still give correct answers (I've seen this
several times before, ask me how). But it's not index corruption,
contrary to appearances -- it's heap corruption.

Here are some specific checks I have in mind:

* Right now, we throw an error when an LP_REDIRECT points to an
LP_UNUSED item. But that existing test should go further, like the
assertions I added recently. They should make sure that the
LP_REDIRECT links directly to an LP_NORMAL item, which has tuple
storage. And, the stored tuple must be a heap-only tuple.

Note, in particular, that it's not okay if the LP_REDIRECT points to
an LP_DEAD item (which amcheck won't complain about today). That is at
least my position on it, and so far it looks like the right position
(no assertion failures reported yet).

* HOT chain validation: All visible/committed heap-only tuples should
be reachable through a "recognizable root item for their HOT chain, or
an earlier member of the HOT chain via its t_ctid link". The root item
can either be an LP_REDIRECT, or a regular not-heap-only tuple -- but
it can't be anything else.

* All other heap-only tuples (all heap-only tuples that don't seem to
be part of a valid HOT chain) must be from aborted transactions -- as
I said, this has to be corruption (heap corruption disguised as index
corruption, even).

Note that we're very permissive about broken HOT chains -- in general
a heap-only tuple's t_ctid link can have pretty wild values, and
that's okay. For example, it can point past the current bounds of the
page's line pointer array. Everything is still robust because of the
way that we validate HOT chains during chain traversal -- we match
tuple N's xmax to tuple N+1's xmin, all the time, inside code like
heap_hot_search_buffer() that is routinely used during index scans.
However, the root item of a HOT chain (usually an LP_REDIRECT)
absolutely must point to a valid heap-only tuple at all times --
that's very different indeed. That's the starting point for an index
scan -- it is the TID that is stored in indexes that needs to be
stable over time. And so pruning needs to leave behind coherent
looking whole entire HOT chains, just as we ought to verify that
pruning consistently gets it right.

-- 
Peter Geoghegan



Re: amcheck's verify_heapam(), and HOT chain verification

От
Mark Dilger
Дата:

> On Nov 5, 2021, at 7:51 PM, Peter Geoghegan <pg@bowt.ie> wrote:
>
> I recently pushed a commit which formalized what we expect of HOT
> chains during pruning -- see assertions and comments added by commit
> 5cd7eb1f, "Add various assertions to heap pruning code". I notice that
> verify_heapam() doesn't really do anything with HOT chains, and I'd
> say that that's a gap. Fortunately it should not be terribly difficult
> to fill in this gap.
>
> The short version is this: I think that verify_heapam() should
> validate whole HOT chains, not just individual heap-only tuples. This
> can be accomplished by taking a similar approach to the approach taken
> in heap_prune_chain() already, for pruning.
>
> For a given heap page, pruning fills in its PruneState variable to
> build a coherent picture of the state of all HOT chains on the entire
> page, before modifying anything (barring hint bit setting). We only
> reach heap_page_prune_execute() when we have the whole picture about
> HOT chains. This high level approach is essential when pruning.
> Consider LP_REDIRECT items. We might need to set an existing
> LP_REDIRECT item to LP_DEAD, rendering an entire HOT chain dead.
> Whether or not that's the appropriate course of action for a given
> LP_REDIRECT item will depend on the state of the HOT chain as a whole.
>
> The HOT chain could of course have several heap-only tuples, and only
> one visible tuple means that the LP_REDIRECT root item must not be set
> LP_DEAD -- it should be made to link to the remaining live heap-only
> tuple instead (while the dead intermediate heap-only tuples become
> LP_UNUSED, never LP_DEAD). This is made even more tricky by the fact
> that we have to consider HOT updaters with aborted transactions during
> pruning.
>
> We even have to consider the case where a HOT updater aborts, but
> another HOT updater tries again and commits. Now we have a heap-only
> tuple that is "disconnected from its original HOT chain" (or will be
> until the next prune). Having a stray/disconnected heap-only tuple
> like this ought to be fine -- the next pruning operation will get to
> it whenever (see the special handling for these aborted heap-only
> tuples at the top of heap_prune_chain()). But any disconnected
> heap-only tuple had better be from an aborted xact -- otherwise we
> have corruption. This is corruption that looks like index corruption,
> because sequential scans still give correct answers (I've seen this
> several times before, ask me how). But it's not index corruption,
> contrary to appearances -- it's heap corruption.
>
> Here are some specific checks I have in mind:
>
> * Right now, we throw an error when an LP_REDIRECT points to an
> LP_UNUSED item. But that existing test should go further, like the
> assertions I added recently. They should make sure that the
> LP_REDIRECT links directly to an LP_NORMAL item, which has tuple
> storage. And, the stored tuple must be a heap-only tuple.
>
> Note, in particular, that it's not okay if the LP_REDIRECT points to
> an LP_DEAD item (which amcheck won't complain about today). That is at
> least my position on it, and so far it looks like the right position
> (no assertion failures reported yet).
>
> * HOT chain validation: All visible/committed heap-only tuples should
> be reachable through a "recognizable root item for their HOT chain, or
> an earlier member of the HOT chain via its t_ctid link". The root item
> can either be an LP_REDIRECT, or a regular not-heap-only tuple -- but
> it can't be anything else.
>
> * All other heap-only tuples (all heap-only tuples that don't seem to
> be part of a valid HOT chain) must be from aborted transactions -- as
> I said, this has to be corruption (heap corruption disguised as index
> corruption, even).
>
> Note that we're very permissive about broken HOT chains -- in general
> a heap-only tuple's t_ctid link can have pretty wild values, and
> that's okay. For example, it can point past the current bounds of the
> page's line pointer array. Everything is still robust because of the
> way that we validate HOT chains during chain traversal -- we match
> tuple N's xmax to tuple N+1's xmin, all the time, inside code like
> heap_hot_search_buffer() that is routinely used during index scans.
> However, the root item of a HOT chain (usually an LP_REDIRECT)
> absolutely must point to a valid heap-only tuple at all times --
> that's very different indeed. That's the starting point for an index
> scan -- it is the TID that is stored in indexes that needs to be
> stable over time. And so pruning needs to leave behind coherent
> looking whole entire HOT chains, just as we ought to verify that
> pruning consistently gets it right.

Thanks, Peter, for this analysis.  It's getting a bit late in the day, but I'll try this out tomorrow, I expect.

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company






Re: amcheck's verify_heapam(), and HOT chain verification

От
Peter Geoghegan
Дата:
On Fri, Nov 5, 2021 at 8:18 PM Mark Dilger <mark.dilger@enterprisedb.com> wrote:
> Thanks, Peter, for this analysis.

It's strategically important work. What you've done so far has every
chance of catching corruption involving storage issues, which is
great. But we ought to do more to catch certain known general patterns
of corruption. The so-called "freeze the dead" bug (see commit
9c2f0a6c for the final fix) is the single weirdest and scariest bug
that I can recall (there were a couple of incorrect bug fixes that had
to be reverted), and it is very much the kind of thing that I'd like
to see verify_heapam.c handle exhaustively. That would de-risk a lot
of things. Note that the same variety of HOT chain corruption bugs
besides that one, so it's really a general class of corruption, not
one specific bug.

The heapallindexed verification option actually detected the "freeze
the dead" bug [1], although that was a total accident -- that checking
option happens to use CREATE INDEX infrastructure, which happens to
sanitize HOT chains by calling heap_get_root_tuples() and detecting
contradictions -- heap_get_root_tuples() is code that is right next to
the pruning code I mentioned. That level of detection is certainly
better than nothing, but it's still not good enough -- we need this in
verify_heapam, too.

Here's why I think that we need very thorough HOT chain verification
that lives directly in verify_heapam:

First of all, the heapallindexed option is rather expensive, whereas
the additional overhead for verify_heapam would be minimal -- you
probably wouldn't even need to make it optional. Second of all, why
should you need to use bt_index_check() to detect what is after all
heap corruption? And third of all, the approach of detecting
corruption by outsourcing detection to heapam_index_build_range_scan()
(which calls heap_get_root_tuples() itself) probably risks missing
corrupt HOT chains where the corruption doesn't look a certain way --
it was not designed with amcheck in mind.

heapam_index_build_range_scan() + heap_get_root_tuples() will notice a
heap-only tuple without a parent, which is a good start. But what
about an LP_REDIRECT item whose lp_off (i.e. page offset number
redirect) somehow points to any item on the page other than a valid
heap-only tuple? Also doesn't catch contradictory commit states for
heap-only tuples that form a hot chain through their t_ctid links. I'm
sure that there are other things that I haven't thought of, too --
there is a lot going on here.

This HOT chain verification business is a little like the checking
that we do in verify_nbtree.c, actually. The individual tuples (in
this case heap tuples) may well not be corrupt (or demonstrably
corrupt) in isolation. But taken together, in a particular page offset
number sequence order, they can nevertheless *imply* corruption --
it's implied if the tuples tell a contradictory story about what's
going on at the level of the whole page. A HOT chain LP_REDIRECT is
after all a little like a "mini index" stored on a heap page.
Sequential scans don't care about LP_REDIRECTs at all. But index scans
expect a great deal from LP_REDIRECTs.

[1] https://www.postgresql.org/message-id/CAH2-Wznm4rCrhFAiwKPWTpEw2bXDtgROZK7jWWGucXeH3D1fmA@mail.gmail.com
--
Peter Geoghegan



Re: amcheck's verify_heapam(), and HOT chain verification

От
Peter Geoghegan
Дата:
On Fri, Nov 5, 2021 at 7:51 PM Peter Geoghegan <pg@bowt.ie> wrote:
> Here are some specific checks I have in mind:

One more for the list:

* Validate PageIsAllVisible() for each page.

In other words, pg_visibility should be merged with verify_heapam.c
(or at least pg_visibility 's pg_check_frozen() and pg_check_visible()
functions should be moved, merged, or whatever). This would mean that
verify_heapam() would directly check if the page-level PD_ALL_VISIBLE
flag contradicts either the tuple headers of tuples with storage on
the page, the presence (or absence) of LP_DEAD stub line pointers on
the page, or the corresponding visibility map bit (e.g.,
VISIBILITYMAP_ALL_VISIBLE) for the page.

There is value in teaching verify_heapam() about any possible problem,
including with the visibility map, but it's certainly less valuable
than the HOT chain verification stuff -- and probably trickier to get
right. I'm mentioning it now to be exhaustive, but it's less of a
priority for me personally.

I am quite willing to help out with all this, if you're interested.

One more thing about HOT chain validation:

I can give you another example bug of the kind I'd expect
verify_heapam() to catch only with full HOT chain validation. This one
is a vintage MultiXact bug that has the same basic HOT chain
corruption, looks-like-index-corruption-but-isn't quality as the more
memorable freeze-the-dead bug (this one was fixed by commit 6bfa88ac):

https://www.postgresql.org/message-id/CAM3SWZTMQiCi5PV5OWHb%2BbYkUcnCk%3DO67w0cSswPvV7XfUcU5g%40mail.gmail.com

In general I think that reviewing historic examples of pernicious
corruption bugs is a valuable exercise when designing tools like
amcheck. Maybe even revert the fix during testing, to be sure it would
have been caught had the final tool been available. History doesn't
repeat itself, but it does rhyme.

-- 
Peter Geoghegan



Re: amcheck's verify_heapam(), and HOT chain verification

От
Mark Dilger
Дата:

> On Nov 6, 2021, at 3:09 PM, Peter Geoghegan <pg@bowt.ie> wrote:
>
> I am quite willing to help out with all this, if you're interested.

Yes, I am quite interested, though I will have to alternate between this work and the various patch sets that I've
alreadysubmitted for this development cycle. 

I think we need a corruption generating framework that can be deployed on the buildfarm.  What I have in mind is
inspiredby your comments about the "freeze the dead" bug.  We need to be able to build versions of the database with
knownbugs enabled, both real-world bugs from the past and contrived bugs we write only for this purpose, so as to
createnon-trivial corruption on the build farm.  I think it would be sufficient if such builds were run less
frequently,perhaps triggered by commits to a corruption testing branch?  We could keep the old bugs alive on that
branchwhile periodically merging in everything else from HEAD?  That seems a tad tiresome, but maybe it wouldn't be too
badif the intentional bugs were limited to just a few backend files, and as much as possible in code at the end of the
fileor in separate files to reduce merge conflicts? 

I'm cc'ing Andrew to get his thoughts about the buildfarm integration....

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company






Re: amcheck's verify_heapam(), and HOT chain verification

От
Peter Geoghegan
Дата:
On Sun, Nov 7, 2021 at 9:30 AM Mark Dilger <mark.dilger@enterprisedb.com> wrote:
> Yes, I am quite interested, though I will have to alternate between this work and the various patch sets that I've
alreadysubmitted for this development cycle.
 

Great! We still have a lot of work to do with HOT chain level
invariants. I have seen very clear evidence of that in the last 24
hours:

https://postgr.es/m/CAH2-Wzma=Y3O+LRx2Wj_HwGbbbeNwr6FoJzXni8hxOMw55pcZg@mail.gmail.com

> I think we need a corruption generating framework that can be deployed on the buildfarm.  What I have in mind is
inspiredby your comments about the "freeze the dead" bug.
 

I'm not sure that that's truly necessary. IMV the important thing is
that we formalize the invariants, and have tooling that can test them.
Maintaining tests that actually display specific broken behavior (as
opposed to its absence) seems like it might be quite a burden.

There are many specific ways that these invariants might break, but
the specifics shouldn't matter -- the invariants should cut through
that (to the extent that that's possible). The "freeze the dead" bug
is mostly useful as a way of framing the discussion (perhaps even in
code comments). I did this with a historic CREATE INDEX CONCURRENTLY
bug, in code comments for heapallindexed verification. Verification
using heapallindexed actually detected two more CIC bugs years later
-- not including the earlier fixes that didn't quite get everything
right (thinking of the prepared xact CIC bugs found using amcheck).
The invariants that heapallindexed tests generalize to many different
situations, including situations that I couldn't have possibly
anticipated with any kind of precision. Having the right high level
idea is what really mattered.

-- 
Peter Geoghegan