Discussion: [HACKERS] heap/SLRU verification, relfrozenxid cut-off, and freeze-the-dead bug (Was: amcheck (B-Tree integrity checking tool))

On Sun, Oct 16, 2016 at 6:46 PM, Noah Misch <noah@leadboat.com> wrote:
>> Noah and I discussed possible future directions for amcheck in person
>> recently. I would like to get Noah's thoughts again here on how a tool
>> like amcheck might reasonably target other access methods for
>> verification. In particular, the heapam. MultiXacts were mentioned as
>> a structure that could receive verification in a future iteration of
>> this tool, but I lack expertise there.
>
> Yes, a heap checker could examine plenty of things.  Incomplete list:

The heap checking enhancement that is under review in the next CF [1],
which checks an index against the table it indexes, doesn't directly
test any of the things you outlined in this mail almost exactly a year
ago. (That said, some of the same things are checked indirectly,
simply by using IndexBuildHeapScan().)

I do still think that there is a very real need for a "direct
heap/SLRU checker" function, though. I only put off implementing one
because there was still some low hanging fruit. I would like to go
over your suggestions again now. My understanding of what we *can* do
has evolved in the past several months.

> - Detect impossible conditions in the hint bits.  A tuple should not have both
>   HEAP_XMAX_COMMITTED and HEAP_XMAX_INVALID.  Every tuple bearing
>   HEAP_ONLY_TUPLE should bear HEAP_UPDATED.  HEAP_HASVARWIDTH should be true
>   if and only if the tuple has a non-NULL value in a negative-typlen column,
>   possibly a dropped column.  A tuple should not have both HEAP_KEYS_UPDATED
>   and HEAP_XMAX_LOCK_ONLY.

It certainly makes sense to check for clearly contradictory hint bits like this.
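To make the intra-tuple contradiction checks concrete, here is a minimal, self-contained sketch. The infomask bit values are copied from my reading of src/include/access/htup_details.h and should be treated as illustrative assumptions; a real checker would of course use the server's own definitions, and the HEAP_HASVARWIDTH check is omitted because it needs the tuple descriptor, not just the header.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Infomask bits, mirrored from src/include/access/htup_details.h.
 * The numeric values here are assumptions for illustration only.
 */
#define HEAP_XMAX_LOCK_ONLY   0x0080    /* t_infomask */
#define HEAP_XMAX_COMMITTED   0x0400    /* t_infomask */
#define HEAP_XMAX_INVALID     0x0800    /* t_infomask */
#define HEAP_UPDATED          0x2000    /* t_infomask */
#define HEAP_KEYS_UPDATED     0x2000    /* t_infomask2 */
#define HEAP_ONLY_TUPLE       0x8000    /* t_infomask2 */

/*
 * Return true if a tuple header's hint bits are internally
 * contradictory, per the conditions listed upthread.
 */
static bool
infomask_is_contradictory(uint16_t infomask, uint16_t infomask2)
{
    /* xmax cannot be both known-committed and known-invalid */
    if ((infomask & HEAP_XMAX_COMMITTED) && (infomask & HEAP_XMAX_INVALID))
        return true;

    /* a heap-only tuple must have been created by an UPDATE */
    if ((infomask2 & HEAP_ONLY_TUPLE) && !(infomask & HEAP_UPDATED))
        return true;

    /* a lock-only xmax cannot also claim to have updated key columns */
    if ((infomask2 & HEAP_KEYS_UPDATED) && (infomask & HEAP_XMAX_LOCK_ONLY))
        return true;

    return false;
}
```

The nice property of checks in this family is that they need no SLRU access at all; each tuple header either is or is not self-consistent.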

> - Verify agreement between CLOG, MULTIXACT, and hint bits.

This is where it gets complicated, I think. This is what I really want
to talk about.

The recent commit 20b65522 pretty clearly established that a tuple
could technically escape freezing (having HEAP_XMIN_FROZEN set)
despite actually being before a relfrozenxid cut-off. The idea was
that as long as we reliably set hint bits on heap-only tuples through
WAL-logging, it doesn't matter that their CLOG could be truncated,
because nobody will ever need to interrogate the CLOG anyway (to coin
a phrase, the heap-only tuple nevertheless still had its xmax
"logically frozen"). If nothing else, this leaves me with a very
squishy set of invariant conditions to work with when I go to verify
agreement with CLOG, MULTIXACT, and hint bits.

So, the question is: What is the exact set of invariant conditions
that I can check when I go to verify agreement between a heap relation
and the various SLRUs? In particular, at what precise XID-wise point
do CLOG and MULTIXACT stop reliably storing transaction status? And,
is there a different point for the xmax of tuples that happen to be
heap-only tuples?

Another important concern here following 20b65522 is: Why is it safe
to assume that nobody will ever call TransactionIdDidCommit() for
"logically frozen" heap-only tuples that are not at the end of their
HOT chain, and in so doing get a wrong answer? I can't find a rule
that implies that there is no dangerous ambiguity that's written down
anywhere. I *can* find a comment that suggests that it's wrong, though
-- the "N.B." comment at the top of heap_prepare_freeze_tuple().
(Granted, that comment doesn't seem to acknowledge the fact that the
caller does WAL-log as part of freezing; there has been some churn in
this area.)

I'm pretty sure that the classic rule for TransactionIdDidCommit()
when called from tqual.c was that the tuple cannot have
HEAP_XMIN_FROZEN set, which was straightforward enough back when a
frozen tuple was assumed to have necessarily committed (and to have a
"raw" xmin point that should logically be visible to everyone). This
business of "logically frozen" xmax values for dead heap-only tuples
makes things *way* more complicated, though. I'm concerned that we've
failed to account for that, and that TransactionIdDidCommit() calls
concerning pre-relfrozenxid XIDs can still happen.

I should point out that, for whatever reason, there is evidence
that we do in fact risk TransactionIdDidCommit() calls that give wrong
answers (independent of the ~2014 MULTIXACT bugs where that was
clearly the case, and even before 20b65522): Jeff Janes showed [2]
that there is probably some unaccounted-for case where "premature"
truncation takes place. This may be related to the recent HOT
chain/freezing bugs, and our (only partial [3]) fixes may have fixed
that, too -- I just don't know.

[1] https://commitfest.postgresql.org/15/1263/
[2] https://postgr.es/m/CAMkU=1y-TNP_7JdFg_ubXqDB8esMO280aCDGDwsa429HRtE8Lw@mail.gmail.com
[3] https://postgr.es/m/20171007232524.sdpi3jk2636fvjge@alvherre.pgsql
-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

On Mon, Oct 09, 2017 at 05:19:11PM -0700, Peter Geoghegan wrote:
> On Sun, Oct 16, 2016 at 6:46 PM, Noah Misch <noah@leadboat.com> wrote:
> > - Verify agreement between CLOG, MULTIXACT, and hint bits.
> 
> This is where it gets complicated, I think. This is what I really want
> to talk about.
> 
> The recent commit 20b65522 pretty clearly established that a tuple
> could technically escape freezing (having HEAP_XMIN_FROZEN set)
> despite actually being before a relfrozenxid cut-off. The idea was
> that as long as we reliably set hint bits on heap-only tuples through
> WAL-logging, it doesn't matter that their CLOG could be truncated,
> because nobody will ever need to interrogate the CLOG anyway (to coin
> a phrase, the heap-only tuple nevertheless still had its xmax
> "logically frozen"). If nothing else, this leaves me with a very
> squishy set of invariant conditions to work with when I go to verify
> agreement with CLOG, MULTIXACT, and hint bits.
> 
> So, the question is: What is the exact set of invariant conditions
> that I can check when I go to verify agreement between a heap relation
> and the various SLRUs? In particular, at what precise XID-wise point
> do CLOG and MULTIXACT stop reliably storing transaction status? And,
> is there a different point for the xmax of tuples that happen to be
> heap-only tuples?
> 
> Another important concern here following 20b65522 is: Why is it safe
> to assume that nobody will ever call TransactionIdDidCommit() for
> "logically frozen" heap-only tuples that are not at the end of their
> HOT chain, and in so doing get a wrong answer? I can't find a rule
> that implies that there is no dangerous ambiguity that's written down
> anywhere. I *can* find a comment that suggests that it's wrong, though
> -- the "N.B." comment at the top of heap_prepare_freeze_tuple().
> (Granted, that comment doesn't seem to acknowledge the fact that the
> caller does WAL-log as part of freezing; there has been some churn in
> this area.)

All good questions; I don't know offhand.  Discovering those answers is
perhaps the chief labor required of such a project.  The checker should
consider circumstances potentially carried from past versions via pg_upgrade.
Fortunately, if you get some details wrong, it's cheap to recover from checker
bugs.



On Fri, Oct 13, 2017 at 7:09 PM, Noah Misch <noah@leadboat.com> wrote:
> All good questions; I don't know offhand.  Discovering those answers is
> perhaps the chief labor required of such a project.

ISTM that by far the hardest part of the project is arriving at a
consensus around what a good set of invariants for CLOG and MultiXact
looks like.

I think that it's fair to say that this business with relfrozenxid now
appears to be more complicated than many of us would have thought. Or,
at least, more complicated than I thought when I first started
thinking about it. Once we're measuring this complexity (by having
checks), we should be in a better position to keep it under control,
and to avoid ambiguity.

> The checker should
> consider circumstances potentially carried from past versions via pg_upgrade.

Right. False positives are simply unacceptable.

> Fortunately, if you get some details wrong, it's cheap to recover from checker
> bugs.

Ideally, amcheck will become a formal statement of the contracts
provided by major subsystems, such as the heapam, the various SLRUs,
and so on. While I agree that having bugs there is much less severe
than having bugs in backend code, I would like the tool to reach a
point where it actually *defines* correctness (by community
consensus). If a bug in amcheck reflects a bug in our high level
thinking about correctness, then that actually is a serious problem.
Arguably, it's the most costly variety of bug that Postgres can have.

I may never be able to get general buy-in here; building broad
consensus like that is a lot harder than writing some code for a
contrib module. Making the checking code the *authoritative* record of
how invariants are *expected* to work is a major goal of the project,
though.

-- 
Peter Geoghegan



On Mon, Oct 16, 2017 at 12:57:39PM -0700, Peter Geoghegan wrote:
> On Fri, Oct 13, 2017 at 7:09 PM, Noah Misch <noah@leadboat.com> wrote:
> > The checker should
> > consider circumstances potentially carried from past versions via pg_upgrade.
> 
> Right. False positives are simply unacceptable.

False positives are bugs, but they're not exceptionally-odious bugs.

> > Fortunately, if you get some details wrong, it's cheap to recover from checker
> > bugs.
> 
> Ideally, amcheck will become a formal statement of the contracts
> provided by major subsystems, such as the heapam, the various SLRUs,
> and so on. While I agree that having bugs there is much less severe
> than having bugs in backend code, I would like the tool to reach a
> point where it actually *defines* correctness (by community
> consensus).

That presupposes construction of two pieces of software, the server and the
checker, such that every disagreement is a bug in the server.  But checkers
get bugs just like servers get bugs.  Checkers do provide a sort of
double-entry bookkeeping.  When a reproducible test case prompts a checker
complaint, we'll know *some* code is wrong.  That's an admirable contribution.

> If a bug in amcheck reflects a bug in our high level
> thinking about correctness, then that actually is a serious problem.

My notion of data file correctness is roughly this:
A data file is correct if the server's reads and mutations thereof will not
cause it to deviate from documented behavior.  Where the documentation isn't
specific, fall back on SQL standards.  Where no documentation or SQL standard
addresses a particular behavior, we should debate the matter and document the
decision.

I'm essentially saying that the server is innocent until proven guilty.  It
would be cool to have a self-contained specification of PostgreSQL data files,
but where the server disagrees with the spec without causing problem
behaviors, we'd ultimately update the spec to fit the server.

Thanks,
nm



On Mon, Oct 16, 2017 at 8:09 PM, Noah Misch <noah@leadboat.com> wrote:
> That presupposes construction of two pieces of software, the server and the
> checker, such that every disagreement is a bug in the server.  But checkers
> get bugs just like servers get bugs.

You make a good point, which is that *some* code must be wrong when an
error is raised and hardware is not to blame, but ISTM that the nuance
of that really matters. The checker seems much less likely to be where
bugs are, for three reasons:

* There is far less code for us to maintain as compared to the volume
of backend code that is effectively tested (again, not including the
hidden universe of complex, unauditable firmware code that could be
involved these days).

* Much of the actual checking (as much as possible) is outsourced to
core code that is already critically important. If that has bugs in
it, then it is unlikely to be defined as an amcheck bug.

* Knowing all this, we can go out of our way to do a good job of
getting the design right the first time. (A sound design is far more
important than actually having zero bugs.)

Obviously there could be unambiguous bugs; I'm not arguing otherwise.
I just hope that we can push this model as far as we need to, and
perhaps accommodate verifiability as a goal for *future* development
projects. We're almost doing that today; debuggability of on-disk
structures is something that the community already values. This is the
logical next step, IMV.

> Checkers do provide a sort of
> double-entry bookkeeping.  When a reproducible test case prompts a checker
> complaint, we'll know *some* code is wrong.

I really like your double entry bookkeeping analogy. A tiny
discrepancy will bubble up, even in a huge organization, and yet the
underlying principles are broad and not all that complicated.

> That's an admirable contribution.

Thank you. I just hope that it becomes something that other
contributors have some sense of ownership over.

> I'm essentially saying that the server is innocent until proven guilty.  It
> would be cool to have a self-contained specification of PostgreSQL data files,
> but where the server disagrees with the spec without causing problem
> behaviors, we'd ultimately update the spec to fit the server.

I might not have done a good job of explaining my position. I agree
with everything you say here. I would like to see amcheck become a
kind of vehicle for discussing things that we already discuss. You get
a nice tool at the end, that clarifies and increases confidence in the
original understanding over time (or acts as a canary-in-the-coalmine
forcing function when the original understanding turns out to be
faulty). The tool itself is ultimately just a bonus.

Bringing it back to the concrete freeze-the-dead issue, and the
question of an XID-cutoff for safely interrogating CLOG: I guess it
will be possible to assess a HOT chain as a whole. We can probably do
this safely while holding a super-exclusive lock on the buffer. I can
probably find a way to ensure this only needs to happen in a rare slow
path, when it looks like the invariant might be violated but we need
to make sure (I'm already following this pattern in a couple of
places). Realistically, there will be some amount of "try it and see"
here.

-- 
Peter Geoghegan



On Wed, Oct 18, 2017 at 12:45 PM, Peter Geoghegan <pg@bowt.ie> wrote:
> Bringing it back to the concrete freeze-the-dead issue, and the
> question of an XID-cutoff for safely interrogating CLOG: I guess it
> will be possible to assess a HOT chain as a whole. We can probably do
> this safely while holding a super-exclusive lock on the buffer. I can
> probably find a way to ensure this only needs to happen in a rare slow
> path, when it looks like the invariant might be violated but we need
> to make sure (I'm already following this pattern in a couple of
> places). Realistically, there will be some amount of "try it and see"
> here.

I would like to point out for the record/archives that I now believe
that Andres' pending do-over fix for the "Freeze the dead" bug [1]
will leave things in *much* better shape when it comes to
verification. Andres' patch neatly addresses *all* of the concerns
that I raised on this thread. The high-level idea of relfrozenxid as an
unambiguous cut-off point at which it must be safe to interrogate the
CLOG is restored.

Off hand, I'd say that the only interlock amcheck verification now
needs when verifying heap pages against the CLOG is a VACUUM-style
SHARE UPDATE EXCLUSIVE lock on the heap relation being verified. Every
heap tuple must either be observed to be frozen, or must only have
hint bits that are observably in agreement with CLOG. The only
complicated part is the comment that explains why this is
comprehensive and correct (i.e. does not risk false positives or false
negatives). We end up with something that is a bit like a "correct by
construction" design.
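As a rough illustration of that rule (and emphatically not Andres' or amcheck's actual code), the per-tuple xmin check might look like the following sketch. `XidStatus` and `check_xmin_against_clog` are hypothetical stand-ins for a real CLOG lookup and amcheck report, reduced to booleans for clarity:

```c
#include <stdbool.h>
#include <stddef.h>

/* Simplified transaction status, standing in for a real CLOG lookup */
typedef enum { XID_COMMITTED, XID_ABORTED, XID_IN_PROGRESS } XidStatus;

/*
 * Verify one tuple's xmin hint bits against the (simulated) CLOG.
 * Returns NULL when the tuple passes, or a complaint string on an
 * apparent invariant violation.
 *
 * A frozen tuple is exempt: its xmin may precede the CLOG truncation
 * point, so we must not interrogate the CLOG for it at all -- that is
 * precisely the cut-off restored by the do-over fix.
 */
static const char *
check_xmin_against_clog(bool frozen,
                        bool hint_committed,   /* HEAP_XMIN_COMMITTED set */
                        bool hint_invalid,     /* HEAP_XMIN_INVALID set */
                        XidStatus clog_status)
{
    if (frozen)
        return NULL;            /* never consult CLOG for frozen tuples */

    if (hint_committed && clog_status != XID_COMMITTED)
        return "hint bits say xmin committed, but CLOG disagrees";

    if (hint_invalid && clog_status == XID_COMMITTED)
        return "hint bits say xmin invalid, but CLOG says committed";

    return NULL;                /* unhinted tuples are not checkable here */
}
```

An unhinted, in-progress xmin is deliberately left alone in this sketch; checking those would require stronger interlocks than the SHARE UPDATE EXCLUSIVE lock contemplated above.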

The fact that Andres also proposes to add a bunch of new defensive
"can't happen" hard elog()s (mostly by promoting assertions) should
validate the design of tuple + multixact freezing, in the same way
that I hope amcheck can.

[1] https://postgr.es/m/20171114030341.movhteyakqeqx5pm@alap3.anarazel.de
-- 
Peter Geoghegan