Discussion: Re: pg_amcheck contrib application


Re: pg_amcheck contrib application

From: Mark Dilger

> On Mar 16, 2021, at 12:52 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Mon, Mar 15, 2021 at 10:10 PM Mark Dilger
> <mark.dilger@enterprisedb.com> wrote:
>> It is unfortunate that the failing test only runs pg_amcheck after creating numerous corruptions, as we can't know
>> if pg_amcheck would have complained about pg_statistic before the corruptions were created in other tables, or if it
>> only does so after.  The attached patch v7-0003 adds a call to pg_amcheck after all tables are created and populated,
>> but before any corruptions are caused.  This should help narrow down what is happening, and doesn't hurt to leave in
>> place long-term.
>>
>> I don't immediately see anything wrong with how pg_statistic uses a pseudo-type, but it leads me to want to poke a
>> bit more at pg_statistic on hornet and tern, though I don't have any regression tests specifically for doing so.
>>
>> Tests v7-0001 and v7-0002 are just repeats of the tests posted previously.
>
> Since we now know that shutting autovacuum off makes the problem go
> away, I don't see a reason to commit 0001. We should fix pg_amcheck
> instead, if, as presently seems to be the case, that's where the
> problem is.

If you get unlucky, autovacuum will process one of the tables that the test intentionally corrupted, with bad
consequences, ultimately causing intermittent build farm test failures.  We could wait to see if this ever happens
before fixing it, if you prefer.  I'm not sure what that buys us, though.

The right approach, I think, is to extend the contrib/amcheck tests with a regression test covering this particular
case.  I've written that test and verified that it fails against the old code and passes against the new.

> I just committed 0002.

Thanks!

> I think 0003 is probably a good idea, but I haven't committed it yet.

It won't do anything for us in this particular case, but it might make debugging failures quicker in the future.

> As for 0004, it seems to me that we might want to do a little more
> rewording of these messages and perhaps we should try to do it all at
> once. Like, for example, your other change to print out the toast
> value ID seems like a good idea, and could apply to any new messages
> as well as some existing ones. Maybe there are also more fields in the
> TOAST pointer for which we could add checks.

Of the toast pointer fields:

    int32       va_rawsize;     /* Original data size (includes header) */
    int32       va_extsize;     /* External saved size (doesn't) */
    Oid         va_valueid;     /* Unique ID of value within TOAST table */
    Oid         va_toastrelid;  /* RelID of TOAST table containing it */

all seem worth including as part of any toast error message, even if these fields themselves are not corrupt.  It just
makes it easier to understand the context of the error you're looking at.  At first I tried putting these into each
message, but it is very wordy to say things like "toast pointer with rawsize %u and extsize %u pointing at relation with
oid %u" and such.  It made more sense to just add these four fields to the verify_heapam tuple format.  That saves
putting them in the message text itself, and has the benefit that you could filter the rows coming from verify_heapam()
for ones where valueid is or is not null, for example.  This changes the external interface of verify_heapam, but I
didn't bother with an amcheck--1.3--1.4.sql because amcheck--1.2--1.3.sql was added as part of the v14 development work
and has not yet been released.  My assumption is that I can just change it, rather than making a new upgrade file.
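
For example, with those extra columns in place, the TOAST-related complaints could be isolated with a query along these lines (a sketch against the proposed output format; the table name is hypothetical, and the released verify_heapam() reports only blkno, offnum, attnum, and msg):

    -- Show only corruption reports that involve a toast pointer, using the
    -- proposed (not yet released) valueid output column.
    SELECT blkno, offnum, attnum, valueid, msg
    FROM verify_heapam('some_table', on_error_stop := false)
    WHERE valueid IS NOT NULL;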

These patches fix the visibility rules and add extra toast checking.  Some of the previous patch material is not
included, since it is not clear to me if you wanted to commit any of it.




—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Attachments

Re: pg_amcheck contrib application

From: Tom Lane
Mark Dilger <mark.dilger@enterprisedb.com> writes:
>> On Mar 16, 2021, at 12:52 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>> Since we now know that shutting autovacuum off makes the problem go
>> away, I don't see a reason to commit 0001. We should fix pg_amcheck
>> instead, if, as presently seems to be the case, that's where the
>> problem is.

> If you get unlucky, autovacuum will process one of the tables that the test intentionally corrupted, with bad
> consequences, ultimately causing intermittent build farm test failures.

Um, yeah, the test had better shut off autovacuum on any table that
it intentionally corrupts.
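
In a test that is a one-line setup step per corrupted table; a minimal sketch, with a hypothetical table name:

    ALTER TABLE corrupt_me SET (autovacuum_enabled = false);
    -- and, if the test also corrupts the table's TOAST relation:
    ALTER TABLE corrupt_me SET (toast.autovacuum_enabled = false);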

            regards, tom lane



Re: pg_amcheck contrib application

From: Robert Haas
On Thu, Mar 18, 2021 at 12:12 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Mark Dilger <mark.dilger@enterprisedb.com> writes:
> >> On Mar 16, 2021, at 12:52 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> >> Since we now know that shutting autovacuum off makes the problem go
> >> away, I don't see a reason to commit 0001. We should fix pg_amcheck
> >> instead, if, as presently seems to be the case, that's where the
> >> problem is.
>
> > If you get unlucky, autovacuum will process one of the tables that the test intentionally corrupted, with bad
> > consequences, ultimately causing intermittent build farm test failures.
>
> Um, yeah, the test had better shut off autovacuum on any table that
> it intentionally corrupts.

Right, good point. But... does that really apply to
005_opclass_damage.pl? I feel like with the kind of physical damage
we're doing in 003_check.pl, it makes a lot of sense to stop vacuum
from crashing headlong into that table. But, 005 is doing "logical"
damage rather than "physical" damage, and I don't see why autovacuum
should misbehave in that kind of case. In fact, the fact that
autovacuum can handle such cases is one of the selling points for the
whole design of vacuum, as opposed to, for example, retail index
lookups.
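
(For concreteness, the "logical" damage at issue looks roughly like the sketch below, which follows the shape of 005_opclass_damage.pl but uses hypothetical names: every index page stays physically intact, and only the comparison function is redefined out from under the index.)

    CREATE FUNCTION fickle_int4_cmp(int4, int4) RETURNS int
        LANGUAGE sql AS $$ SELECT btint4cmp($1, $2) $$;

    CREATE OPERATOR CLASS fickle_int4_ops FOR TYPE int4 USING btree AS
        OPERATOR 1 < , OPERATOR 2 <= , OPERATOR 3 = ,
        OPERATOR 4 >= , OPERATOR 5 > ,
        FUNCTION 1 fickle_int4_cmp(int4, int4);

    CREATE TABLE fickle_tbl (i int4);
    INSERT INTO fickle_tbl SELECT generate_series(1, 1000);
    CREATE INDEX fickle_idx ON fickle_tbl USING btree (i fickle_int4_ops);

    -- Break the ordering contract without touching any page on disk:
    -- reverse the comparison underneath the existing index.
    CREATE OR REPLACE FUNCTION fickle_int4_cmp(int4, int4) RETURNS int
        LANGUAGE sql AS $$ SELECT btint4cmp($2, $1) $$;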

Pending resolution of that question, I've committed the change to
disable autovacuum in 003, and also Mark's changes to have it run
pg_amcheck BEFORE corrupting anything, so that if the post-corruption
tests fail - say, by finding the wrong kind of corruption - we can see
whether it was also failing before the corruption was even introduced.

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Re: pg_amcheck contrib application

From: Mark Dilger

> On Mar 23, 2021, at 12:05 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>
> 005 is doing "logical"
> damage rather than "physical" damage, and I don't see why autovacuum
> should misbehave in that kind of case. In fact, the fact that
> autovacuum can handle such cases is one of the selling points for the
> whole design of vacuum, as opposed to, for example, retail index
> lookups.

That is a good point.  Checking that autovacuum behaves sensibly despite sort order breakage sounds reasonable, but
test 005 doesn't do that reliably, because it does nothing to make sure that autovacuum runs against the affected table
during the short window when the affected table exists.  All the same, I don't see that turning autovacuum off is
required.  If autovacuum is broken in this regard, we may get occasional, hard to reproduce build farm failures, but
that would be more informative than no failures at all.

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company






Re: pg_amcheck contrib application

From: Peter Geoghegan
On Tue, Mar 23, 2021 at 12:05 PM Robert Haas <robertmhaas@gmail.com> wrote:
> Right, good point. But... does that really apply to
> 005_opclass_damage.pl? I feel like with the kind of physical damage
> we're doing in 003_check.pl, it makes a lot of sense to stop vacuum
> from crashing headlong into that table. But, 005 is doing "logical"
> damage rather than "physical" damage, and I don't see why autovacuum
> should misbehave in that kind of case. In fact, the fact that
> autovacuum can handle such cases is one of the selling points for the
> whole design of vacuum, as opposed to, for example, retail index
> lookups.

FWIW that is only 99.9% true (contrary to what README.HOT says). This
is the case because nbtree page deletion will in fact search the tree
to find a downlink to the target page, which must be removed at the
same time -- see the call to _bt_search() made within nbtpage.c.

This is much less of a problem than you'd think, though, even with an
opclass that gives wrong answers all the time. Because it's also true
that _bt_getstackbuf() is remarkably tolerant when it actually goes to
locate the downlink -- because that happens via a linear search that
matches on downlink block number (it doesn't use the opclass for that
part). This means that we'll accidentally fail to fail if the page is
*somewhere* to the right in the "true" key space. Which probably means
that it has a greater than 50% chance of not failing with a 100%
broken opclass. Which probably makes our odds better with more
plausible levels of misbehavior (e.g. collation changes).

That being said, I should make _bt_lock_subtree_parent() return false
and back out of page deletion without raising an error in the case
where we really cannot locate a valid downlink. We really ought to
soldier on when that happens, since we'll do that for a bunch of other
reasons already. I believe that the only reason we throw an error
today is for parity with the page split case (the main
_bt_getstackbuf() call). But this isn't the same situation at all --
this is VACUUM.

I will make this change to HEAD soon, barring objections.

-- 
Peter Geoghegan



Re: pg_amcheck contrib application

From: Tom Lane
Peter Geoghegan <pg@bowt.ie> writes:
> That being said, I should make _bt_lock_subtree_parent() return false
> and back out of page deletion without raising an error in the case
> where we really cannot locate a valid downlink. We really ought to
> soldier on when that happens, since we'll do that for a bunch of other
> reasons already. I believe that the only reason we throw an error
> today is for parity with the page split case (the main
> _bt_getstackbuf() call). But this isn't the same situation at all --
> this is VACUUM.

> I will make this change to HEAD soon, barring objections.

+1.  Not deleting the upper page seems better than the alternatives.

            regards, tom lane



Re: pg_amcheck contrib application

From: Peter Geoghegan
On Tue, Mar 23, 2021 at 12:44 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > I will make this change to HEAD soon, barring objections.
>
> +1.  Not deleting the upper page seems better than the alternatives.

FWIW it might also work that way as a holdover from the old page
deletion algorithm. These days we decide exactly which pages (leaf
page plus possible internal pages) are to be deleted as a whole up
front (these are a subtree, though usually just a degenerate
single-leaf-page subtree -- internal page deletions are rare).

One of the advantages of this design is that we verify practically all
of the work involved in deleting an entire subtree up-front, inside
_bt_lock_subtree_parent(). It's clearly safe to back out of it if it
looks dicey.

--
Peter Geoghegan



Re: pg_amcheck contrib application

From: Peter Geoghegan
On Tue, Mar 23, 2021 at 12:53 PM Peter Geoghegan <pg@bowt.ie> wrote:
> One of the advantages of this design is that we verify practically all
> of the work involved in deleting an entire subtree up-front, inside
> _bt_lock_subtree_parent(). It's clearly safe to back out of it if it
> looks dicey.

That's taken care of. I just pushed a commit that teaches
_bt_lock_subtree_parent() to press on.

-- 
Peter Geoghegan



Re: pg_amcheck contrib application

From: Mark Dilger

> On Mar 17, 2021, at 9:00 PM, Mark Dilger <mark.dilger@enterprisedb.com> wrote:
>
> Of the toast pointer fields:
>
>    int32       va_rawsize;     /* Original data size (includes header) */
>    int32       va_extsize;     /* External saved size (doesn't) */
>    Oid         va_valueid;     /* Unique ID of value within TOAST table */
>    Oid         va_toastrelid;  /* RelID of TOAST table containing it */
>
> all seem worth including as part of any toast error message, even if these fields themselves are not corrupt.  It just
> makes it easier to understand the context of the error you're looking at.  At first I tried putting these into each
> message, but it is very wordy to say things like "toast pointer with rawsize %u and extsize %u pointing at relation with
> oid %u" and such.  It made more sense to just add these four fields to the verify_heapam tuple format.  That saves
> putting them in the message text itself, and has the benefit that you could filter the rows coming from verify_heapam()
> for ones where valueid is or is not null, for example.  This changes the external interface of verify_heapam, but I
> didn't bother with an amcheck--1.3--1.4.sql because amcheck--1.2--1.3.sql was added as part of the v14 development work
> and has not yet been released.  My assumption is that I can just change it, rather than making a new upgrade file.
>
> These patches fix the visibility rules and add extra toast checking.

These new patches address the same issues as v9 (which was never committed) and v10 (which was never even posted to
this list), with some changes.

Rather than print out all four toast pointer fields for each toast failure, va_rawsize, va_extsize, and va_toastrelid
are only mentioned in the corruption message if they are related to the specific corruption.  Otherwise, just the
va_valueid is mentioned in the corruption message.

The visibility rules fix is different in v11, relying on a visibility check which more closely follows the
implementation of HeapTupleSatisfiesVacuumHorizon.



—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Attachments

Re: pg_amcheck contrib application

From: Robert Haas
On Wed, Mar 24, 2021 at 2:13 AM Mark Dilger
<mark.dilger@enterprisedb.com> wrote:
> The visibility rules fix is different in v11, relying on a visibility check which more closely follows the
> implementation of HeapTupleSatisfiesVacuumHorizon.

Hmm. The header comment you wrote says "If a tuple might not be
visible to any running transaction, then we must not check it." But, I
don't find that statement very clear: does it mean "if there could be
even one transaction to which this tuple is not visible, we must not
check it"? Or does it mean "if the number of transactions that can see
this tuple could potentially be zero, then we must not check it"? I
don't think either of those is actually what we care about. I think
what we should be saying is "if the tuple could have been inserted by
a transaction that also added a column to the table, but which
ultimately did not commit, then the table's current TupleDesc might
differ from the one used to construct this tuple, so we must not check
it."

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Re: pg_amcheck contrib application

From: Robert Haas
On Wed, Mar 24, 2021 at 9:12 AM Robert Haas <robertmhaas@gmail.com> wrote:
> On Wed, Mar 24, 2021 at 2:13 AM Mark Dilger
> <mark.dilger@enterprisedb.com> wrote:
> > The visibility rules fix is different in v11, relying on a visibility check which more closely follows the
> > implementation of HeapTupleSatisfiesVacuumHorizon.
>
> Hmm. The header comment you wrote says "If a tuple might not be
> visible to any running transaction, then we must not check it." But, I
> don't find that statement very clear: does it mean "if there could be
> even one transaction to which this tuple is not visible, we must not
> check it"? Or does it mean "if the number of transactions that can see
> this tuple could potentially be zero, then we must not check it"? I
> don't think either of those is actually what we care about. I think
> what we should be saying is "if the tuple could have been inserted by
> a transaction that also added a column to the table, but which
> ultimately did not commit, then the table's current TupleDesc might
> differ from the one used to construct this tuple, so we must not check
> it."

Hit send too soon. And I was wrong, too. Wahoo. Thinking about the
buildfarm failure, I realized that there's a second danger here,
unrelated to the possibility of different TupleDescs, which we talked
about before: if the tuple is dead, we can't safely follow any TOAST
pointers, because the TOAST chunks might disappear at any time. So
technically we could split the return value up into something
three-way: if the inserter is known to have committed, we can check
the tuple itself, because the TupleDesc has to be reasonable. And, if
the tuple is known not to be dead already, and known not to be in a
state where it could become dead while we're doing stuff, we can
follow the TOAST pointer. I'm not sure whether it's worth trying to be
that fancy or not.

If we were only concerned about the mismatched-TupleDesc problem, this
function could return true in a lot more cases. Once we get to the
comment that says "Okay, the inserter committed..." we could just
return true. Similarly, the HEAP_MOVED_IN and HEAP_MOVED_OFF cases
could just skip all the interior test and return true, because if the
tuple is being moved, the original inserter has to have committed.
Conversely, however, the !HeapTupleHeaderXminCommitted ->
TransactionIdIsCurrentTransactionId case probably ought to always
return false. One could argue otherwise: if we're the inserter, then
the only in-progress transaction that might have changed the TupleDesc
is us, so we could just consider this case to be a true return value
also, regardless of what's going on with xmax. In that case, we're not
asking "did the inserter definitely commit?" but "are the inserter's
possible DDL changes definitely visible to us?" which might be an OK
definition too.

However, the could-the-TOAST-data-disappear problem is another story.
I don't see how we can answer that question correctly with the logic
you've got here, because you have no XID threshold. Consider the case
where we reach this code:

+       if (!(tuphdr->t_infomask & HEAP_XMAX_COMMITTED))
+       {
+               if (TransactionIdIsInProgress(HeapTupleHeaderGetRawXmax(tuphdr)))
+                       return true;            /* HEAPTUPLE_DELETE_IN_PROGRESS */
+               else if (!TransactionIdDidCommit(HeapTupleHeaderGetRawXmax(tuphdr)))
+
+                       /*
+                        * Not in Progress, Not Committed, so either Aborted or crashed
+                        */
+                       return true;            /* HEAPTUPLE_LIVE */
+
+               /* At this point the xmax is known committed */
+       }

If we reach the case where the code comment says
HEAPTUPLE_DELETE_IN_PROGRESS, we know that the tuple isn't dead right
now, and so the TOAST tuples aren't dead either. But, by the time we
go try to look at the TOAST tuples, they might have become dead and
been pruned away, because the deleting transaction can commit at any
time, and after that pruning can happen at any time. Our only
guarantee that that won't happen is if the deleting XID is new enough
that it's invisible to some snapshot that our backend has registered.
That's approximately why HeapTupleSatisfiesVacuumHorizon needs to set
*dead_after in this case and one other, and I think you have the same
requirement.

I just noticed that this whole thing has another, related problem:
check_tuple_header_and_visibilty() and check_tuple_attribute() are
called from within check_tuple(), which is called while we hold a
buffer lock on the heap page. We should not be going and doing complex
operations that might take their own buffer locks - like TOAST index
checks - while we're holding an lwlock. That's going to have to be
changed so that the TOAST pointer checking happens after
UnlockReleaseBuffer(); in other words, we'll need to remember the
TOAST pointers to go look up and actually look them up after
UnlockReleaseBuffer(). But, when we do that, then the HEAPTUPLE_LIVE
case above has the same race condition that is already present in the
HEAPTUPLE_DELETE_IN_PROGRESS case: after we release the buffer pin,
some other transaction might delete the tuple and the associated TOAST
tuples, and they might then commit, and the tuple might become dead
and get pruned away before we check the TOAST table.

On a related note, I notice that your latest patch removes all the
logic that complains about XIDs being out of bounds. I don't think
that's good. Those seem like important checks. They're important for
finding problems with the relation,  and I think we probably also need
them because of the XID-horizon issue mentioned above. One possible
way of looking at it is to say that the XID_BOUNDS_OK case has two
sub-cases: either the XID is within bounds and is one that cannot
become all-visible concurrently because it's not visible to all of our
backend's registered snapshots, or it's within bounds but does have
the possibility of becoming all-visible. In the former case, if it
appears as XMAX we can safely follow TOAST pointers found within the
tuple; in the latter case, we can't.

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Re: pg_amcheck contrib application

From: Robert Haas
Mark,

Here's a quick and very dirty sketch of what I think perhaps this
logic could look like. This is pretty much untested and it might be
buggy, but at least you can see whether we're thinking at all in the
same direction.

-- 
Robert Haas
EDB: http://www.enterprisedb.com

Attachments

Re: pg_amcheck contrib application

From: Mark Dilger

> On Mar 24, 2021, at 1:46 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>
> Mark,
>
> Here's a quick and very dirty sketch of what I think perhaps this
> logic could look like. This is pretty much untested and it might be
> buggy, but at least you can see whether we're thinking at all in the
> same direction.

Thanks!  The attached patch addresses your comments here and in your prior email.  In particular, this patch changes
the tuple visibility logic to not check tuples for which the inserting transaction aborted or is still in progress, and
to not check toast for tuples deleted in transactions older than our transaction snapshot's xmin.  A list of toasted
attributes which are safe to check is compiled per main table page during the scan of the page, then checked after the
buffer lock on the main page is released.

In the perhaps unusual case where verify_heapam() is called in a transaction which has also added tuples to the table
being checked, this patch's visibility logic chooses not to check such tuples.  I'm on the fence about this choice, and
am mostly following your lead.  I like that this decision maintains the invariant that we never check tuples which have
not yet been committed.

The patch includes a bit of refactoring.  In the old code, heap_check() performed clog bounds checking on xmin and xmax
prior to calling check_tuple_header_and_visibilty(), but I think that's not such a great choice.  If the tuple header is
garbled to have random bytes in the xmin and xmax fields, and we can detect that situation because other tuple header
fields are garbled in detectable ways, I'd rather get a report about the header being garbled than a report about the
xmin or xmax being out of bounds.  In the new code, the tuple header is checked first, then the visibility is checked,
then the tuple is checked against the current relation description, then the tuple attributes are checked.  I think the
layout is easier to follow, too.




—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Attachments

Re: pg_amcheck contrib application

From: Robert Haas
On Mon, Mar 29, 2021 at 1:45 PM Mark Dilger
<mark.dilger@enterprisedb.com> wrote:
> Thanks!  The attached patch addresses your comments here and in your prior email.  In particular, this patch changes
> the tuple visibility logic to not check tuples for which the inserting transaction aborted or is still in progress, and
> to not check toast for tuples deleted in transactions older than our transaction snapshot's xmin.  A list of toasted
> attributes which are safe to check is compiled per main table page during the scan of the page, then checked after the
> buffer lock on the main page is released.
>
> In the perhaps unusual case where verify_heapam() is called in a transaction which has also added tuples to the table
> being checked, this patch's visibility logic chooses not to check such tuples.  I'm on the fence about this choice, and
> am mostly following your lead.  I like that this decision maintains the invariant that we never check tuples which have
> not yet been committed.
>
> The patch includes a bit of refactoring.  In the old code, heap_check() performed clog bounds checking on xmin and
> xmax prior to calling check_tuple_header_and_visibilty(), but I think that's not such a great choice.  If the tuple
> header is garbled to have random bytes in the xmin and xmax fields, and we can detect that situation because other
> tuple header fields are garbled in detectable ways, I'd rather get a report about the header being garbled than a
> report about the xmin or xmax being out of bounds.  In the new code, the tuple header is checked first, then the
> visibility is checked, then the tuple is checked against the current relation description, then the tuple attributes
> are checked.  I think the layout is easier to follow, too.

Hmm, so this got ~10x bigger from my version. Could you perhaps
separate it out into a series of patches for easier review? Say, one
that just fixes the visibility logic, and then a second to avoid doing
the TOAST check with a buffer lock held, and then more than that if
there are other pieces that make sense to separate out?

--
Robert Haas
EDB: http://www.enterprisedb.com



Re: pg_amcheck contrib application

From: Mark Dilger

> On Mar 29, 2021, at 1:06 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> 
> Hmm, so this got ~10x bigger from my version. Could you perhaps
> separate it out into a series of patches for easier review? Say, one
> that just fixes the visibility logic, and then a second to avoid doing
> the TOAST check with a buffer lock held, and then more than that if
> there are other pieces that make sense to separate out?

Sure, here are four patches which do the same as the single v12 patch did.



—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Attachments

Re: pg_amcheck contrib application

From: Robert Haas
On Mon, Mar 29, 2021 at 7:16 PM Mark Dilger
<mark.dilger@enterprisedb.com> wrote:
> Sure, here are four patches which do the same as the single v12 patch did.

Thanks. Here are some comments on 0003 and 0004:

When you posted v11, you said that "Rather than print out all four
toast pointer fields for each toast failure, va_rawsize, va_extsize,
and va_toastrelid are only mentioned in the corruption message if they
are related to the specific corruption.  Otherwise, just the
va_valueid is mentioned in the corruption message." I like that
principle; in fact, as you know, I suggested it. But, with the v13
patches applied, exactly zero of the callers to
report_toast_corruption() appear to be following it, because none of
them include the value ID. I think you need to revise the messages,
e.g. "toasted value for attribute %u missing from toast table" ->
"toast value %u not found in toast table"; "final toast chunk number
%u differs from expected value %u" -> "toast value %u was expected to
end at chunk %u, but ended at chunk %u"; "toast chunk sequence number
is null" -> "toast value %u has toast chunk with null sequence
number". In the first of those example cases, I think you need not
mention the attribute number because it's already there in its own
column.

On a related note, it doesn't look like you are actually checking
va_toastrelid here. Doing so seems like it would be a good idea. It
also seems like it would be good to check that the compressed size is
less than or equal to the uncompressed size.

I do not like the name tuple_is_volatile, because volatile has a
couple of meanings already, and this isn't one of them. A
SQL-callable function is volatile if it might return different outputs
given the same inputs, even within the same SQL statement. A C
variable is volatile if it might be magically modified in ways not
known to the compiler. I had suggested tuple_cannot_die_now, which is
closer to the mark. If you want to be even more precise, you could
talk about whether the tuple is potentially prunable (e.g.
tuple_can_be_pruned, which inverts the sense). That's really what
we're worried about: whether MVCC rules would permit the tuple to be
pruned after we release the buffer lock and before we check TOAST.

I would ideally prefer not to rename report_corruption(). The old name
is clearer, and changing it produces a bunch of churn that I'd rather
avoid. Perhaps the common helper function could be called
report_corruption_internal(), and the callers could be
report_corruption() and report_toast_corruption().

Regarding 0001 and 0002, I think the logic in 0002 looks a lot closer
to correct now, but I want to go through it in more detail. I think,
though, that you've made some of my comments worse. For example, I
wrote: "It should be impossible for xvac to still be running, since
we've removed all that code, but even if it were, it ought to be safe
to read the tuple, since the original inserter must have committed.
But, if the xvac transaction committed, this tuple (and its associated
TOAST tuples) could be pruned at any time." You changed that to read
"We don't bother comparing against safe_xmin because the VACUUM FULL
must have committed prior to an upgrade and can't still be running."
Your comment is shorter, which is a point in its favor, but what I was
trying to emphasize is that the logic would be correct EVEN IF we
again started to use HEAP_MOVED_OFF and HEAP_MOVED_IN again. Your
version makes it sound like the code would need to be revised in that
case. If that's true, then my comment was wrong, but I didn't think it
was true, or I wouldn't have written the comment in that way.

Also, and maybe this is a job for a separate patch, but then again
maybe not, I wonder if it's really a good idea for get_xid_status to
return both a XidBoundsViolation and an XidCommitStatus. It seems to
me (without checking all that carefully) that it might be better to
just flatten all of that into a single enum, because right now it
seems like you often end up with two consecutive switch statements
where, perhaps, just one would suffice.

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Re: pg_amcheck contrib application

From: Mark Dilger

> On Mar 30, 2021, at 12:45 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Mon, Mar 29, 2021 at 7:16 PM Mark Dilger
> <mark.dilger@enterprisedb.com> wrote:
>> Sure, here are four patches which do the same as the single v12 patch did.
>
> Thanks. Here are some comments on 0003 and 0004:
>
> When you posted v11, you said that "Rather than print out all four
> toast pointer fields for each toast failure, va_rawsize, va_extsize,
> and va_toastrelid are only mentioned in the corruption message if they
> are related to the specific corruption.  Otherwise, just the
> va_valueid is mentioned in the corruption message." I like that
> principle; in fact, as you know, I suggested it. But, with the v13
> patches applied, exactly zero of the callers to
> report_toast_corruption() appear to be following it, because none of
> them include the value ID. I think you need to revise the messages,
> e.g.

These changes got lost between v11 and v12.  I've put them back, as well as updating to use your language.

> "toasted value for attribute %u missing from toast table" ->
> "toast value %u not found in toast table";

Changed.

> "final toast chunk number
> %u differs from expected value %u" -> "toast value %u was expected to
> end at chunk %u, but ended at chunk %u";

Changed.

> "toast chunk sequence number
> is null" -> "toast value %u has toast chunk with null sequence
> number".

Changed.

> In the first of those example cases, I think you need not
> mention the attribute number because it's already there in its own
> column.

Correct.  I'd removed that but lost that work in v12.

> On a related note, it doesn't look like you are actually checking
> va_toastrelid here. Doing so seems like it would be a good idea. It
> also seems like it would be good to check that the compressed size is
> less than or equal to the uncompressed size.

Yeah, those checks were in v11 but got lost when I changed things for v12.  They are back in v14.

> I do not like the name tuple_is_volatile, because volatile has a
> couple of meanings already, and this isn't one of them. A
> SQL-callable function is volatile if it might return different outputs
> given the same inputs, even within the same SQL statement. A C
> variable is volatile if it might be magically modified in ways not
> known to the compiler. I had suggested tuple_cannot_die_now, which is
> closer to the mark. If you want to be even more precise, you could
> talk about whether the tuple is potentially prunable (e.g.
> tuple_can_be_pruned, which inverts the sense). That's really what
> we're worried about: whether MVCC rules would permit the tuple to be
> pruned after we release the buffer lock and before we check TOAST.

I used "tuple_can_be_pruned".  I didn't like "tuple_cannot_die_now", and still don't like that name, as it has several
wrong interpretations.  One meaning of "cannot die now" is that it has become immortal.  Another is "cannot be deleted
from the table".

> I would ideally prefer not to rename report_corruption(). The old name
> is clearer, and changing it produces a bunch of churn that I'd rather
> avoid. Perhaps the common helper function could be called
> report_corruption_internal(), and the callers could be
> report_corruption() and report_toast_corruption().

Yes, hence the commit message in the previous patch set, "This patch can probably be left out if the committer believes
it creates more git churn than it is worth."  I've removed this patch from this next patch set, and used the function
names you suggest.

> Regarding 0001 and 0002, I think the logic in 0002 looks a lot closer
> to correct now, but I want to go through it in more detail. I think,
> though, that you've made some of my comments worse. For example, I
> wrote: "It should be impossible for xvac to still be running, since
> we've removed all that code, but even if it were, it ought to be safe
> to read the tuple, since the original inserter must have committed.
> But, if the xvac transaction committed, this tuple (and its associated
> TOAST tuples) could be pruned at any time." You changed that to read
> "We don't bother comparing against safe_xmin because the VACUUM FULL
> must have committed prior to an upgrade and can't still be running."
> Your comment is shorter, which is a point in its favor, but what I was
> trying to emphasize is that the logic would be correct EVEN IF we
> again started to use HEAP_MOVED_OFF and HEAP_MOVED_IN again. Your
> version makes it sound like the code would need to be revised in that
> case. If that's true, then my comment was wrong, but I didn't think it
> was true, or I wouldn't have written the comment in that way.

I think the logic would have to change if we brought back the old VACUUM FULL behavior.

I'm not looking at the old VACUUM FULL code, but my assumption is that if the xvac code were resurrected, then when a
tuple is moved off by a VACUUM FULL, the old tuple and associated toast cannot be pruned until concurrent transactions
end.  So, if amcheck is running more-or-less concurrently with the VACUUM FULL and has a snapshot xmin no newer than
the VACUUM FULL's xid, it can check the toast associated with the moved off tuple after the VACUUM FULL commits.
If instead the VACUUM FULL xid was older than amcheck's xmin, then the toast is in danger of being vacuumed away.  So
the logic in verify_heapam would need to change to think about this distinction.  We don't have to concern ourselves
about that, because VACUUM FULL cannot be running, and so the xid for it must be older than our xmin, and hence the
toast is unconditionally not safe to check.

I'm changing the comments back to how you had them, but I'd like to know why my reasoning is wrong.

> Also, and maybe this is a job for a separate patch, but then again
> maybe not, I wonder if it's really a good idea for get_xid_status to
> return both a XidBoundsViolation and an XidCommitStatus. It seems to
> me (without checking all that carefully) that it might be better to
> just flatten all of that into a single enum, because right now it
> seems like you often end up with two consecutive switch statements
> where, perhaps, just one would suffice.

get_xid_status was written to return XidBoundsViolation separately from returning by reference an XidCommitStatus
because, if you pass null for the XidCommitStatus parameter, the function can return earlier without taking the
XactTruncationLock and checking clog.  I think that design made a lot of sense at the time get_xid_status was written,
but there are no longer any callers passing null, so the function never returns early.

I am hesitant to refactor get_xid_status as you suggest until we're sure no such callers who pass null are needed.  So
perhaps your idea of having that change as a separate patch for after this patch series is done and committed is the
right strategy.

Also, even now, there are some places where the returned XidBoundsViolation is used right away, but some other
processing happens before the XidCommitStatus is finally used.  If they were one value in a merged enum, there would
still be two switches at least in the location I'm thinking of.



—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Attachments

Re: pg_amcheck contrib application

From: Robert Haas
On Wed, Mar 31, 2021 at 12:34 AM Mark Dilger
<mark.dilger@enterprisedb.com> wrote:
> I'm not looking at the old VACUUM FULL code, but my assumption is that if the xvac code were resurrected, then when a
> tuple is moved off by a VACUUM FULL, the old tuple and associated toast cannot be pruned until concurrent transactions
> end.  So, if amcheck is running more-or-less concurrently with the VACUUM FULL and has a snapshot xmin no newer than
> the VACUUM FULL's xid, it can check the toast associated with the moved off tuple after the VACUUM FULL commits.
> If instead the VACUUM FULL xid was older than amcheck's xmin, then the toast is in danger of being vacuumed away.  So
> the logic in verify_heapam would need to change to think about this distinction.  We don't have to concern ourselves
> about that, because VACUUM FULL cannot be running, and so the xid for it must be older than our xmin, and hence the
> toast is unconditionally not safe to check.
>
> I'm changing the comments back to how you had them, but I'd like to know why my reasoning is wrong.

Let's start by figuring out *whether* your reasoning is wrong. My
assumption was that old-style VACUUM FULL would move tuples without
retoasting. That is, if we decided to move a tuple from page 2 of the
main table to page 1, we would just write the tuple into page 1,
marking it moved-in, and on page 2 we would mark it moved-off. And
that we would not examine or follow any TOAST pointers at all, so
whatever TOAST entries existed would be shared between the two tuples.
One tuple or the other would eventually die, depending on whether xvac
went on to commit or abort, but either way the TOAST doesn't need
updating because there's always exactly 1 remaining tuple using
pointers to those TOAST values.

Your assumption seems to be the opposite, that the TOASTed values
would be retoasted as part of VF. If that is true, then your idea is
right.

Do you agree with this analysis? If so, we can check the code and find
out which way it actually worked.

--
Robert Haas
EDB: http://www.enterprisedb.com



Re: pg_amcheck contrib application

From: Mark Dilger

> On Mar 31, 2021, at 10:11 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Wed, Mar 31, 2021 at 12:34 AM Mark Dilger
> <mark.dilger@enterprisedb.com> wrote:
>> I'm not looking at the old VACUUM FULL code, but my assumption is that if the xvac code were resurrected, then when a
>> tuple is moved off by a VACUUM FULL, the old tuple and associated toast cannot be pruned until concurrent transactions
>> end.  So, if amcheck is running more-or-less concurrently with the VACUUM FULL and has a snapshot xmin no newer than
>> the VACUUM FULL's xid, it can check the toast associated with the moved off tuple after the VACUUM FULL commits.
>> If instead the VACUUM FULL xid was older than amcheck's xmin, then the toast is in danger of being vacuumed away.  So
>> the logic in verify_heapam would need to change to think about this distinction.  We don't have to concern ourselves
>> about that, because VACUUM FULL cannot be running, and so the xid for it must be older than our xmin, and hence the
>> toast is unconditionally not safe to check.
>>
>> I'm changing the comments back to how you had them, but I'd like to know why my reasoning is wrong.
>
> Let's start by figuring out *whether* your reasoning is wrong. My
> assumption was that old-style VACUUM FULL would move tuples without
> retoasting. That is, if we decided to move a tuple from page 2 of the
> main table to page 1, we would just write the tuple into page 1,
> marking it moved-in, and on page 2 we would mark it moved-off. And
> that we would not examine or follow any TOAST pointers at all, so
> whatever TOAST entries existed would be shared between the two tuples.
> One tuple or the other would eventually die, depending on whether xvac
> went on to commit or abort, but either way the TOAST doesn't need
> updating because there's always exactly 1 remaining tuple using
> pointers to those TOAST values.
>
> Your assumption seems to be the opposite, that the TOASTed values
> would be retoasted as part of VF. If that is true, then your idea is
> right.
>
> Do you agree with this analysis? If so, we can check the code and find
> out which way it actually worked.

Actually, that makes a lot of sense without even looking at the old code.  I was implicitly assuming that the toast
table was undergoing a VF also, and that the toast pointers in the main table tuples would be updated to point to the
new location, so we'd be unable to follow the pointers to the old location without danger of the old location entries
being vacuumed away.  But if the main table tuples get moved while keeping their toast pointers unaltered, then you
don't have to worry about that, although you do have to worry that a VF of the main table doesn't help so much with
toast table bloat.

We're only discussing this in order to craft the right comment for a bit of code with respect to a hypothetical
situation in which VF gets resurrected, so I'm not sure this should be top priority, but I'm curious enough now to go
read the old code....

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company






Re: pg_amcheck contrib application

From: Robert Haas
On Wed, Mar 31, 2021 at 1:31 PM Mark Dilger
<mark.dilger@enterprisedb.com> wrote:
> Actually, that makes a lot of sense without even looking at the old code.  I was implicitly assuming that the toast
> table was undergoing a VF also, and that the toast pointers in the main table tuples would be updated to point to the
> new location, so we'd be unable to follow the pointers to the old location without danger of the old location entries
> being vacuumed away.  But if the main table tuples get moved while keeping their toast pointers unaltered, then you
> don't have to worry about that, although you do have to worry that a VF of the main table doesn't help so much with
> toast table bloat.
>
> We're only discussing this in order to craft the right comment for a bit of code with respect to a hypothetical
> situation in which VF gets resurrected, so I'm not sure this should be top priority, but I'm curious enough now to go
> read the old code....

Right, well, we wouldn't be PostgreSQL hackers if we didn't spend lots
of time worrying about obscure details. Whether that's good software
engineering or mere pedantry is sometimes debatable.

I took a look at commit 0a469c87692d15a22eaa69d4b3a43dd8e278dd64,
which removed old-style VACUUM FULL, and AFAICS, it doesn't contain
any references to tuple deforming, varlena, HeapTupleHasExternal, or
anything else that would make me think it has the foggiest idea
whether the tuples it's moving around contain TOAST pointers, so I
think I had the right idea.

--
Robert Haas
EDB: http://www.enterprisedb.com



Re: pg_amcheck contrib application

From: Mark Dilger

> On Mar 31, 2021, at 10:31 AM, Mark Dilger <mark.dilger@enterprisedb.com> wrote:
>
>
>
>> On Mar 31, 2021, at 10:11 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>>
>> On Wed, Mar 31, 2021 at 12:34 AM Mark Dilger
>> <mark.dilger@enterprisedb.com> wrote:
>>> I'm not looking at the old VACUUM FULL code, but my assumption is that if the xvac code were resurrected, then when a
>>> tuple is moved off by a VACUUM FULL, the old tuple and associated toast cannot be pruned until concurrent transactions
>>> end.  So, if amcheck is running more-or-less concurrently with the VACUUM FULL and has a snapshot xmin no newer than
>>> the VACUUM FULL's xid, it can check the toast associated with the moved off tuple after the VACUUM FULL commits.
>>> If instead the VACUUM FULL xid was older than amcheck's xmin, then the toast is in danger of being vacuumed away.  So
>>> the logic in verify_heapam would need to change to think about this distinction.  We don't have to concern ourselves
>>> about that, because VACUUM FULL cannot be running, and so the xid for it must be older than our xmin, and hence the
>>> toast is unconditionally not safe to check.
>>>
>>> I'm changing the comments back to how you had them, but I'd like to know why my reasoning is wrong.
>>
>> Let's start by figuring out *whether* your reasoning is wrong. My
>> assumption was that old-style VACUUM FULL would move tuples without
>> retoasting. That is, if we decided to move a tuple from page 2 of the
>> main table to page 1, we would just write the tuple into page 1,
>> marking it moved-in, and on page 2 we would mark it moved-off. And
>> that we would not examine or follow any TOAST pointers at all, so
>> whatever TOAST entries existed would be shared between the two tuples.
>> One tuple or the other would eventually die, depending on whether xvac
>> went on to commit or abort, but either way the TOAST doesn't need
>> updating because there's always exactly 1 remaining tuple using
>> pointers to those TOAST values.
>>
>> Your assumption seems to be the opposite, that the TOASTed values
>> would be retoasted as part of VF. If that is true, then your idea is
>> right.
>>
>> Do you agree with this analysis? If so, we can check the code and find
>> out which way it actually worked.
>
> Actually, that makes a lot of sense without even looking at the old code.  I was implicitly assuming that the toast
> table was undergoing a VF also, and that the toast pointers in the main table tuples would be updated to point to the
> new location, so we'd be unable to follow the pointers to the old location without danger of the old location entries
> being vacuumed away.  But if the main table tuples get moved while keeping their toast pointers unaltered, then you
> don't have to worry about that, although you do have to worry that a VF of the main table doesn't help so much with
> toast table bloat.
>
> We're only discussing this in order to craft the right comment for a bit of code with respect to a hypothetical
> situation in which VF gets resurrected, so I'm not sure this should be top priority, but I'm curious enough now to go
> read the old code....

Well, that's annoying.  The documentation of postgres 8.2 for vacuum full [1] says,

  Selects "full" vacuum, which may reclaim more space, but takes much longer and exclusively locks the table.

I read "exclusively locks" as meaning it takes an ExclusiveLock, but the code shows that it takes an
AccessExclusiveLock.  I think the docs are pretty misleading here, though I understand that grammatically it is hard to
say "accessively exclusively locks" or such.  But a part of my analysis was based on the reasoning that if VF only takes
an ExclusiveLock, then there must be concurrent readers possible.  VF went away long enough ago that I had forgotten
exactly how inconvenient it was.

[1] https://www.postgresql.org/docs/8.2/sql-vacuum.html

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company






Re: pg_amcheck contrib application

From: Robert Haas
On Wed, Mar 31, 2021 at 1:44 PM Mark Dilger
<mark.dilger@enterprisedb.com> wrote:
> I read "exclusively locks" as meaning it takes an ExclusiveLock, but the code shows that it takes an
> AccessExclusiveLock.  I think the docs are pretty misleading here, though I understand that grammatically it is hard to
> say "accessively exclusively locks" or such.  But a part of my analysis was based on the reasoning that if VF only takes
> an ExclusiveLock, then there must be concurrent readers possible.  VF went away long enough ago that I had forgotten
> exactly how inconvenient it was.

It kinda depends on what you mean by concurrent readers, because a
transaction could start on Monday and acquire an XID, and then on
Tuesday you could run VACUUM FULL on relation "foo", and then on
Wednesday the transaction from before could get around to reading some
data from "foo". The two transactions are concurrent, in the sense
that the 3-day transaction was running before the VACUUM FULL, was
still running after VACUUM FULL, read the same pages that the VACUUM
FULL modified, and cares whether the XID from the VACUUM FULL
committed or aborted. But, it's not concurrent in the sense that you
never have a situation where the VACUUM FULL does some of its
modifications, then an overlapping transaction sees them, and then it
does the rest of them.

--
Robert Haas
EDB: http://www.enterprisedb.com



Re: pg_amcheck contrib application

From: Robert Haas
On Wed, Mar 31, 2021 at 12:34 AM Mark Dilger
<mark.dilger@enterprisedb.com> wrote:
> These changes got lost between v11 and v12.  I've put them back, as well as updating to use your language.

Here's an updated patch that includes your 0001 and 0002 plus a bunch
of changes by me. I intend to commit this version unless somebody
spots a problem with it.

Here are the things I changed:

- I renamed tuple_can_be_pruned to tuple_could_be_pruned because I
think it does a better job suggesting that we're uncertain about what
will happen.

- I got rid of bool header_garbled and changed it to bool result,
inverting the sense, because it didn't seem useful to have a function
that ended with if (some_boolean) return false; return true when I
could end the function with return some_other_boolean.

- I removed all the one-word comments that said /* corrupt */ or /*
checkable */ because they seemed redundant.

- In the xmin_status section of check_tuple_visibility(), I got rid of
the xmin_status == XID_IS_CURRENT_XID and xmin_status ==
XID_IN_PROGRESS cases because they were redundant with the xmin_status
!= XID_COMMITTED case.

- If xmax is a multi but seems to be garbled, I changed it to return
true rather than false. The inserter is known to have committed by
that point, so I think it's OK to try to deform the tuple. We just
shouldn't try to check TOAST.

- I changed both switches over xmax_status to break in each case and
then return true after instead of returning true for each case. I
think this is clearer.

- I changed get_xid_status() to perform the TransactionIdIs... checks
in the same order that HeapTupleSatisfies...() does them. I believe
that it's incorrect to conclude that the transaction must be in
progress because it neither IsCurrentTransaction nor DidCommit nor
DidAbort, because all of those things will be false for a transaction
that is running at the time of a system crash. The correct rule is
that if it neither IsCurrentTransaction nor IsInProgress nor DidCommit
then it's aborted.

- I moved a few comments and rewrote some others, including some of
the ones that you took from my earlier draft patch. The thing is, the
comment needs to be adjusted based on where you put it. Like, I had a
comment that says"It should be impossible for xvac to still be
running, since we've removed all that code, but even if it were, it
ought to be safe to read the tuple, since the original inserter must
have committed.  But, if the xvac transaction committed, this tuple
(and its associated TOAST tuples) could be pruned at any time." which
in my version was right before a TransactionIdDidCommit() test, and
explains why that test is there and why the code does what it does as
a result. But in your version you've moved it to a place where we've
already tested that the transaction has committed, and more
importantly, where we've already tested that it's not still running.
Saying that it "should" be impossible for it not to be running when
we've *actually checked that* doesn't make nearly as much sense as it
does when we haven't checked that and aren't going to do so.

-- 
Robert Haas
EDB: http://www.enterprisedb.com

Attachments

Re: pg_amcheck contrib application

From: Mark Dilger

> On Apr 1, 2021, at 8:08 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Wed, Mar 31, 2021 at 12:34 AM Mark Dilger
> <mark.dilger@enterprisedb.com> wrote:
>> These changes got lost between v11 and v12.  I've put them back, as well as updating to use your language.
>
> Here's an updated patch that includes your 0001 and 0002 plus a bunch
> of changes by me. I intend to commit this version unless somebody
> spots a problem with it.
>
> Here are the things I changed:
>
> - I renamed tuple_can_be_pruned to tuple_could_be_pruned because I
> think it does a better job suggesting that we're uncertain about what
> will happen.

+1

> - I got rid of bool header_garbled and changed it to bool result,
> inverting the sense, because it didn't seem useful to have a function
> that ended with if (some_boolean) return false; return true when I
> could end the function with return some_other_boolean.

+1

> - I removed all the one-word comments that said /* corrupt */ or /*
> checkable */ because they seemed redundant.

Ok.

> - In the xmin_status section of check_tuple_visibility(), I got rid of
> the xmin_status == XID_IS_CURRENT_XID and xmin_status ==
> XID_IN_PROGRESS cases because they were redundant with the xmin_status
> != XID_COMMITTED case.

Ok.

> - If xmax is a multi but seems to be garbled, I changed it to return
> true rather than false. The inserter is known to have committed by
> that point, so I think it's OK to try to deform the tuple. We just
> shouldn't try to check TOAST.

It is hard to know what to do when at least one tuple header field is corrupt.  You don't necessarily know which one it
is.  For example, if HEAP_XMAX_IS_MULTI is set, we try to interpret the xmax as an mxid, and if it is out of bounds, we
report it as corrupt.  But was the xmax corrupt?  Or was the HEAP_XMAX_IS_MULTI bit corrupt?  It's not clear.  I took
the view that if either xmin or xmax appear to be corrupt when interpreted in light of the various tuple header bits,
all we really know is that the set of fields/bits don't make sense as a whole, so we report corruption, don't trust any
of them, and abort further checking of the tuple.  You have the burden of proof the other way around.  If the xmin
appears fine, and xmax appears corrupt, then we only know that xmax is corrupt, so the tuple is checkable because
according to the xmin it committed.

I don't think how you have it causes undue problems, since deforming the tuple when you shouldn't merely risks a bunch
of extra not-so-helpful corruption messages.  And hey, maybe they're helpful to somebody clever enough to diagnose why
that particular bit of noise was generated.

> - I changed both switches over xmax_status to break in each case and
> then return true after instead of returning true for each case. I
> think this is clearer.

Ok.

> - I changed get_xid_status() to perform the TransactionIdIs... checks
> in the same order that HeapTupleSatisfies...() does them. I believe
> that it's incorrect to conclude that the transaction must be in
> progress because it neither IsCurrentTransaction nor DidCommit nor
> DidAbort, because all of those things will be false for a transaction
> that is running at the time of a system crash. The correct rule is
> that if it neither IsCurrentTransaction nor IsInProgress nor DidCommit
> then it's aborted.

Ok.
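
For anyone following along, the order being described works out to roughly the following (a sketch only; the enum
names are the ones from verify_heapam.c, and the surrounding validity checks are omitted):

    if (TransactionIdIsCurrentTransactionId(xid))
        *status = XID_IS_CURRENT_XID;
    else if (TransactionIdIsInProgress(xid))
        *status = XID_IN_PROGRESS;
    else if (TransactionIdDidCommit(xid))
        *status = XID_COMMITTED;
    else
        *status = XID_ABORTED;  /* a transaction running at crash time lands here */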

> - I moved a few comments and rewrote some others, including some of
> the ones that you took from my earlier draft patch. The thing is, the
> comment needs to be adjusted based on where you put it. Like, I had a
> comment that says"It should be impossible for xvac to still be
> running, since we've removed all that code, but even if it were, it
> ought to be safe to read the tuple, since the original inserter must
> have committed.  But, if the xvac transaction committed, this tuple
> (and its associated TOAST tuples) could be pruned at any time." which
> in my version was right before a TransactionIdDidCommit() test, and
> explains why that test is there and why the code does what it does as
> a result. But in your version you've moved it to a place where we've
> already tested that the transaction has committed, and more
> importantly, where we've already tested that it's not still running.
> Saying that it "should" be impossible for it not to be running when
> we've *actually checked that* doesn't make nearly as much sense as it
> does when we haven't checked that and aren't going to do so.


            * If xmin_status happens to be XID_IN_PROGRESS, then in theory

Did you mean to say XID_IS_CURRENT_XID here?

/* xmax is an MXID, not an MXID. Sanity check it. */

Is it an MXID or isn't it?

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company






Re: pg_amcheck contrib application

From
Robert Haas
Date:
On Thu, Apr 1, 2021 at 12:32 PM Mark Dilger
<mark.dilger@enterprisedb.com> wrote:
> > - If xmax is a multi but seems to be garbled, I changed it to return
> > true rather than false. The inserter is known to have committed by
> > that point, so I think it's OK to try to deform the tuple. We just
> > shouldn't try to check TOAST.
>
> It is hard to know what to do when at least one tuple header field is corrupt.  You don't necessarily know which
> one it is.  For example, if HEAP_XMAX_IS_MULTI is set, we try to interpret the xmax as an mxid, and if it is out of
> bounds, we report it as corrupt.  But was the xmax corrupt?  Or was the HEAP_XMAX_IS_MULTI bit corrupt?  It's not
> clear.  I took the view that if either xmin or xmax appears to be corrupt when interpreted in light of the various
> tuple header bits, all we really know is that the set of fields/bits doesn't make sense as a whole, so we report
> corruption, don't trust any of them, and abort further checking of the tuple.  You have the burden of proof the
> other way around.  If the xmin appears fine, and the xmax appears corrupt, then we only know that xmax is corrupt,
> so the tuple is checkable because according to the xmin it committed.

I agree that it's hard to be sure what's gone wrong once we start finding
corrupted data, but deciding that maybe xmin didn't really commit
because we see that there's something wrong with xmax seems excessive
to me. I thought about a related case: if xmax is a bad multi but is
also hinted invalid, should we try to follow TOAST pointers? I think
that's hard to say, because we don't know whether (1) the invalid
marking is in error, (2) it's wrong to consider it a multi rather than
an XID, (3) the stored multi got overwritten with a garbage value, or
(4) the stored multi got removed before the tuple was frozen. Not
knowing which of those is the case, how are we supposed to decide
whether the TOAST tuples might have been (or be about to get) pruned?

But, in the case we're talking about here, I don't think it's a
particularly close decision. All we need to say is that if xmax or the
infomask bits pertaining to it are corrupted, we're still going to
suppose that xmin and the infomask bits pertaining to it, which are
all different bytes and bits, are OK. To me, the contrary decision,
namely that a bogus xmax means xmin was probably lying about the
transaction having been committed in the first place, seems like a
serious overreaction. As you say:

> I don't think how you have it causes undue problems, since deforming the tuple when you shouldn't merely risks a
> bunch of extra not-so-helpful corruption messages.  And hey, maybe they're helpful to somebody clever enough to
> diagnose why that particular bit of noise was generated.

I agree. The biggest risk here is that we might emit >0 complaints
when only 0 are justified. That will panic users. The possibility that
we might emit >x complaints when only x are justified, for some x>0,
is also a risk, but it's not nearly as bad, because there's definitely
something wrong, and it's just a question of what it is exactly. So we
have to be really conservative about saying that X is corruption if
there's any possibility that it might be fine. But once we've
complained about one thing, we can take a more balanced approach about
whether to risk issuing more complaints. The possibility that
suppressing the additional complaints might complicate resolution of
the issue also needs to be considered.

>             * If xmin_status happens to be XID_IN_PROGRESS, then in theory
>
> Did you mean to say XID_IS_CURRENT_XID here?

Yes, I did, thanks.

> /* xmax is an MXID, not an MXID. Sanity check it. */
>
> Is it an MXID or isn't it?

Good catch.

New patch attached.

--
Robert Haas
EDB: http://www.enterprisedb.com

Attachments

Re: pg_amcheck contrib application

From
Mark Dilger
Date:

> On Apr 1, 2021, at 9:56 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Thu, Apr 1, 2021 at 12:32 PM Mark Dilger
> <mark.dilger@enterprisedb.com> wrote:
>>> - If xmax is a multi but seems to be garbled, I changed it to return
>>> true rather than false. The inserter is known to have committed by
>>> that point, so I think it's OK to try to deform the tuple. We just
>>> shouldn't try to check TOAST.
>>
>> It is hard to know what to do when at least one tuple header field is corrupt.  You don't necessarily know which
>> one it is.  For example, if HEAP_XMAX_IS_MULTI is set, we try to interpret the xmax as an mxid, and if it is out of
>> bounds, we report it as corrupt.  But was the xmax corrupt?  Or was the HEAP_XMAX_IS_MULTI bit corrupt?  It's not
>> clear.  I took the view that if either xmin or xmax appears to be corrupt when interpreted in light of the various
>> tuple header bits, all we really know is that the set of fields/bits doesn't make sense as a whole, so we report
>> corruption, don't trust any of them, and abort further checking of the tuple.  You have the burden of proof the
>> other way around.  If the xmin appears fine, and the xmax appears corrupt, then we only know that xmax is corrupt,
>> so the tuple is checkable because according to the xmin it committed.
>
> I agree that it's hard to be sure what's gone wrong once we start finding
> corrupted data, but deciding that maybe xmin didn't really commit
> because we see that there's something wrong with xmax seems excessive
> to me. I thought about a related case: if xmax is a bad multi but is
> also hinted invalid, should we try to follow TOAST pointers? I think
> that's hard to say, because we don't know whether (1) the invalid
> marking is in error, (2) it's wrong to consider it a multi rather than
> an XID, (3) the stored multi got overwritten with a garbage value, or
> (4) the stored multi got removed before the tuple was frozen. Not
> knowing which of those is the case, how are we supposed to decide
> whether the TOAST tuples might have been (or be about to get) pruned?
>
> But, in the case we're talking about here, I don't think it's a
> particularly close decision. All we need to say is that if xmax or the
> infomask bits pertaining to it are corrupted, we're still going to
> suppose that xmin and the infomask bits pertaining to it, which are
> all different bytes and bits, are OK. To me, the contrary decision,
> namely that a bogus xmax means xmin was probably lying about the
> transaction having been committed in the first place, seems like a
> serious overreaction. As you say:
>
>> I don't think how you have it causes undue problems, since deforming the tuple when you shouldn't merely risks a
>> bunch of extra not-so-helpful corruption messages.  And hey, maybe they're helpful to somebody clever enough to
>> diagnose why that particular bit of noise was generated.
>
> I agree. The biggest risk here is that we might emit >0 complaints
> when only 0 are justified. That will panic users. The possibility that
> we might emit >x complaints when only x are justified, for some x>0,
> is also a risk, but it's not nearly as bad, because there's definitely
> something wrong, and it's just a question of what it is exactly. So we
> have to be really conservative about saying that X is corruption if
> there's any possibility that it might be fine. But once we've
> complained about one thing, we can take a more balanced approach about
> whether to risk issuing more complaints. The possibility that
> suppressing the additional complaints might complicate resolution of
> the issue also needs to be considered.

This all seems fine to me.  The main thing is that we don't go on to check the toast, which we don't.

>>            * If xmin_status happens to be XID_IN_PROGRESS, then in theory
>>
>> Did you mean to say XID_IS_CURRENT_XID here?
>
> Yes, I did, thanks.

Ouch.  You've got a typo:  s/XID_IN_CURRENT_XID/XID_IS_CURRENT_XID/

>> /* xmax is an MXID, not an MXID. Sanity check it. */
>>
>> Is it an MXID or isn't it?
>
> Good catch.
>
> New patch attached.

Seems fine other than the typo.

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company






Re: pg_amcheck contrib application

From
Robert Haas
Date:
On Thu, Apr 1, 2021 at 1:06 PM Mark Dilger <mark.dilger@enterprisedb.com> wrote:
> Seems fine other than the typo.

OK, let's try that again.

-- 
Robert Haas
EDB: http://www.enterprisedb.com

Attachments

Re: pg_amcheck contrib application

From
Mark Dilger
Date:

> On Apr 1, 2021, at 10:20 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Thu, Apr 1, 2021 at 1:06 PM Mark Dilger <mark.dilger@enterprisedb.com> wrote:
>> Seems fine other than the typo.
>
> OK, let's try that again.

Looks good!

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company






Re: pg_amcheck contrib application

From
Robert Haas
Date:
On Thu, Apr 1, 2021 at 1:24 PM Mark Dilger <mark.dilger@enterprisedb.com> wrote:
> > On Apr 1, 2021, at 10:20 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> > OK, let's try that again.
> Looks good!

OK, committed. We still need to deal with what you had as 0003
upthread, so I guess the next step is for me to spend some time
reviewing that one.

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Re: pg_amcheck contrib application

From
Robert Haas
Date:
On Thu, Apr 1, 2021 at 1:41 PM Robert Haas <robertmhaas@gmail.com> wrote:
> OK, committed. We still need to deal with what you had as 0003
> upthread, so I guess the next step is for me to spend some time
> reviewing that one.

I did this, and it was a bit depressing. It appears that we now have
duplicate checks for xmin and xmax being out of the valid range.
Somehow you have the removal of those duplicate checks in 0003, but
why in the world didn't you put them into one of the previous patches
and just make 0003 about fixing the
holding-buffer-lock-while-following-TOAST-pointers problem? (And, gah,
why did I not look carefully enough to notice that you hadn't done
that?)

Other than that, I notice a few other things:

- There are a total of two (2) calls in the current source code to
palloc0fast, and hundreds of calls to palloc0. So I think you should
forget about using the fast variant and just do what almost every
other caller does.

- If you want to make this code faster, a better idea would be to
avoid doing all of this allocate and free work and just allocate an
array that's guaranteed to be big enough, and then keep track of how
many elements of that array are actually in use.

- #ifdef DECOMPRESSION_CORRUPTION_CHECKING is not a useful way of
introducing such a feature. Either we do it for real and expose it via
SQL and pg_amcheck as an optional behavior, or we rip it out and
revisit it later. Given the nearness of feature freeze, my vote is for
the latter.

- I'd remove the USE_LZ4 bit, too. Let's not define the presence of
LZ4 data in a non-LZ4-enabled cluster as corruption. If we do, then
people will expect to be able to use this to find places where they
are dependent on LZ4 if they want to move away from it -- and if we
don't recurse into composite datums, that will not actually work.

- check_toast_tuple() has an odd and slightly unclear calling
convention for which there are no comments. I wonder if it would be
better to reverse things and make bool *error the return value and
what is now the return value into a pointer argument, but whatever we
do I think it needs a few words in the comments. We don't need to
slavishly explain every argument -- I think toasttup and ctx and tctx
are reasonably clear -- but this is not.

- To me it would be more logical to reverse the order of the
toast_pointer.va_toastrelid != ctx->rel->rd_rel->reltoastrelid and
VARATT_EXTERNAL_GET_EXTSIZE(toast_pointer) > toast_pointer.va_rawsize
- VARHDRSZ checks. Whether we're pointing at the correct relation
feels more fundamental.

- If we moved the toplevel foreach loop in check_toasted_attributes()
out to the caller, say renaming the function to just
check_toasted_attribute(), we'd save a level of indentation in that
whole function and probably add a tad of clarity, too. You wouldn't
feel the need to Assert(ctx.toasted_attributes == NIL) in the caller
if the caller had just done list_free(ctx->toasted_attributes);
ctx->toasted_attributes = NIL.

- Is there a reason we need a cross-check on both the number of chunks
and on the total size? It seems to me that we should check that each
individual chunk has the size we expect, and that the total number of
chunks is what we expect. The overall size is then necessarily
correct.

- Why are all the changes to the tests in this patch? What do they
have to do with getting the TOAST checks out from under the buffer
lock? I really need you to structure the patch series so that each
patch is about one topic and, equally, so that each topic is only
covered by one patch. Otherwise it's just way too confusing.

- I think some of these messages need a bit of word-smithing, too, but
we can leave that for when we're closer to being done with this.

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Re: pg_amcheck contrib application

From
Mark Dilger
Date:

> On Apr 1, 2021, at 1:08 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>
>
>
> - There are a total of two (2) calls in the current source code to
> palloc0fast, and hundreds of calls to palloc0. So I think you should
> forget about using the fast variant and just do what almost every
> other caller does.

Done.

> - If you want to make this code faster, a better idea would be to
> avoid doing all of this allocate and free work and just allocate an
> array that's guaranteed to be big enough, and then keep track of how
> many elements of that array are actually in use.

Sounds like premature optimization to me.  I only used palloc0fast because the argument is compile-time known.  I
wasn't specifically attempting to speed anything up.

> - #ifdef DECOMPRESSION_CORRUPTION_CHECKING is not a useful way of
> introducing such a feature. Either we do it for real and expose it via
> SQL and pg_amcheck as an optional behavior, or we rip it out and
> revisit it later. Given the nearness of feature freeze, my vote is for
> the latter.
>
> - I'd remove the USE_LZ4 bit, too. Let's not define the presence of
> LZ4 data in a non-LZ4-enabled cluster as corruption. If we do, then
> people will expect to be able to use this to find places where they
> are dependent on LZ4 if they want to move away from it -- and if we
> don't recurse into composite datums, that will not actually work.

Ok, I have removed this bit.  I also removed the part of the patch that introduced a new corruption check,
decompressing the data to see if it decompresses without error.

> - check_toast_tuple() has an odd and slightly unclear calling
> convention for which there are no comments. I wonder if it would be
> better to reverse things and make bool *error the return value and
> what is now the return value into a pointer argument, but whatever we
> do I think it needs a few words in the comments. We don't need to
> slavishly explain every argument -- I think toasttup and ctx and tctx
> are reasonably clear -- but this is not.
...
> - Is there a reason we need a cross-check on both the number of chunks
> and on the total size? It seems to me that we should check that each
> individual chunk has the size we expect, and that the total number of
> chunks is what we expect. The overall size is then necessarily
> correct.

Good point.  I've removed the extra check on the total size, since it cannot be wrong if the checks on the individual
chunk sizes were all correct.  This eliminates the need for the odd calling convention for check_toast_tuple(), so
I've changed that to return void and not take any return-by-reference arguments.
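
To make the redundancy concrete, the expected size of each chunk can be computed from the chunk index alone.  A
sketch, assuming extsize holds the external size taken from the toast pointer and endchunk is the last expected chunk
number:

    /* every chunk is full-sized except possibly the last one */
    expected_size = curchunk < endchunk ? TOAST_MAX_CHUNK_SIZE
        : extsize - (endchunk * TOAST_MAX_CHUNK_SIZE);

If each chunk matches its expected size and the chunk count is right, the total size cannot be wrong.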

> - To me it would be more logical to reverse the order of the
> toast_pointer.va_toastrelid != ctx->rel->rd_rel->reltoastrelid and
> VARATT_EXTERNAL_GET_EXTSIZE(toast_pointer) > toast_pointer.va_rawsize
> - VARHDRSZ checks. Whether we're pointing at the correct relation
> feels more fundamental.

Done.

> - If we moved the toplevel foreach loop in check_toasted_attributes()
> out to the caller, say renaming the function to just
> check_toasted_attribute(), we'd save a level of indentation in that
> whole function and probably add a tad of clarity, too. You wouldn't
> feel the need to Assert(ctx.toasted_attributes == NIL) in the caller
> if the caller had just done list_free(ctx->toasted_attributes);
> ctx->toasted_attributes = NIL.

You're right.  It looks nicer that way.  Changed.

> - Why are all the changes to the tests in this patch? What do they
> have to do with getting the TOAST checks out from under the buffer
> lock? I really need you to structure the patch series so that each
> patch is about one topic and, equally, so that each topic is only
> covered by one patch. Otherwise it's just way too confusing.

v18-0001 - Finishes work started in commit 3b6c1259f9 that was overlooked owing to how I had separated the changes in
v17-0002 vs. v17-0003

v18-0002 - Postpones the toast checks for a page until after the main table page lock is released

v18-0003 - Improves the corruption messages in ways already discussed earlier in this thread.  Changes the tests to
expect the new messages, but adds no new checks

v18-0004 - Adding corruption checks of toast pointers.  Extends the regression tests to cover the new checks.

>
> - I think some of these messages need a bit of word-smithing, too, but
> we can leave that for when we're closer to being done with this.

Ok.




—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Attachments

Re: pg_amcheck contrib application

From
Robert Haas
Date:
On Sun, Apr 4, 2021 at 8:02 PM Mark Dilger <mark.dilger@enterprisedb.com> wrote:
> v18-0001 - Finishes work started in commit 3b6c1259f9 that was overlooked owing to how I had separated the changes in
> v17-0002 vs. v17-0003

Committed.

> v18-0002 - Postpones the toast checks for a page until after the main table page lock is released

Committed, but I changed list_free() to list_free_deep() to avoid a
memory leak, and I revised the commit message to mention the important
point that we need to avoid following TOAST pointers from
potentially-prunable tuples.

> v18-0003 - Improves the corruption messages in ways already discussed earlier in this thread.  Changes the tests to
> expect the new messages, but adds no new checks

Kibitzing your message wording:

"toast value %u chunk data is null" -> "toast value %u chunk %d has
null data". We can mention the chunk number this way.

"toast value %u corrupt extended chunk has invalid varlena header: %0x
(sequence number %d)" -> "toast value %u chunk %d has invalid varlena
header %0x". We can be more consistent about how we incorporate the
chunk number into the text, and we don't really need to include the
word corrupt, because all of these are corruption complaints, and I
think it looks better without the colon.

"toast value %u chunk sequence number %u does not match the expected
sequence number %u" -> "toast value %u contains chunk %d where chunk
%d was expected". Shorter. Uses %d for a sequence number instead of
%u, which I think is correct -- anyway we should have them all one way
or all the other. I think I'd rather ditch the "sequence number"
technology and just talk about "chunk %d" or whatever.

"toast value %u chunk sequence number %u exceeds the end chunk
sequence number %u" -> "toast value %u chunk %d follows last expected
chunk %d"

"toast value %u chunk size %u differs from the expected size %u" ->
"toast value %u chunk %d has size %u, but expected size %u"

Other complaints:

Your commit message fails to mention the addition of
VARATT_EXTERNAL_GET_POINTER, which is a significant change/bug fix
unrelated to message wording.

It feels like we have a non-minimal number of checks/messages for the
series of toast chunks. I think that you get a message if we find a
chunk after the last chunk we were expecting to find (curchunk >
endchunk), and you also get a message if we have the wrong number of
chunks in total (chunkno !=
(endchunk + 1)). Now maybe I'm wrong, but if the first message
triggers, it seems like the second message must also trigger. Is that
wrong? If not, maybe we can get rid of the first one entirely? That's
such a small change I think we could include it in this same patch, if
it's a correct idea.

On a related note, as I think I said before, I still think we should
be rejiggering this so that we're not testing both the size of each
individual chunk and the total size, because that ought to be
redundant. That might be better done as a separate patch but I think
we should try to clean it up.

> v18-0004 - Adding corruption checks of toast pointers.  Extends the regression tests to cover the new checks.

I think we could check that the result of
VARATT_EXTERNAL_GET_COMPRESS_METHOD is one of the values we expect to
see.
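
Something along these lines, perhaps (an untested sketch; the
compression ID macros are from access/toast_compression.h, and this
would only apply when the pointer claims to be compressed):

    if (VARATT_EXTERNAL_IS_COMPRESSED(toast_pointer))
    {
        uint32 cmid = VARATT_EXTERNAL_GET_COMPRESS_METHOD(toast_pointer);

        if (cmid != TOAST_PGLZ_COMPRESSION_ID &&
            cmid != TOAST_LZ4_COMPRESSION_ID)
            report_corruption(ctx,
                              psprintf("toast value %u has invalid compression method id %u",
                                       toast_pointer.va_valueid, cmid));
    }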

Using AllocSizeIsValid() seems pretty vile. I know that MaxAllocSize
is 0x3FFFFFFF in no small part because that's the maximum length that
can be represented by a varlena, but I'm not sure it's a good idea to
couple the concepts so closely like this. Maybe we can just #define
VARLENA_SIZE_LIMIT in this file and use that, and a message that says
size %u exceeds limit %u.
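
i.e. something like this (again just a sketch):

    #define VARLENA_SIZE_LIMIT 0x3FFFFFFF

    if (toast_pointer.va_rawsize > VARLENA_SIZE_LIMIT)
        report_corruption(ctx,
                          psprintf("toast value %u rawsize %d exceeds limit %d",
                                   toast_pointer.va_valueid,
                                   toast_pointer.va_rawsize,
                                   VARLENA_SIZE_LIMIT));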

I'm a little worried about whether the additional test cases are
Endian-dependent at all. I don't immediately know what might be wrong
with them, but I'm going to think about that some more later. Any
chance you have access to a Big-endian box where you can test this?

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Re: pg_amcheck contrib application

From
Mark Dilger
Date:

> On Apr 7, 2021, at 1:16 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Sun, Apr 4, 2021 at 8:02 PM Mark Dilger <mark.dilger@enterprisedb.com> wrote:
>> v18-0001 - Finishes work started in commit 3b6c1259f9 that was overlooked owing to how I had separated the changes
>> in v17-0002 vs. v17-0003
>
> Committed.

Thank you.

>> v18-0002 - Postpones the toast checks for a page until after the main table page lock is released
>
> Committed, but I changed list_free() to list_free_deep() to avoid a
> memory leak, and I revised the commit message to mention the important
> point that we need to avoid following TOAST pointers from
> potentially-prunable tuples.

Thank you, and yes, I agree with that change.

>> v18-0003 - Improves the corruption messages in ways already discussed earlier in this thread.  Changes the tests to
>> expect the new messages, but adds no new checks
>
> Kibitizing your message wording:
>
> "toast value %u chunk data is null" -> "toast value %u chunk %d has
> null data". We can mention the chunk number this way.

Changed.

> "toast value %u corrupt extended chunk has invalid varlena header: %0x
> (sequence number %d)" -> "toast value %u chunk %d has invalid varlena
> header %0x". We can be more consistent about how we incorporate the
> chunk number into the text, and we don't really need to include the
> word corrupt, because all of these are corruption complaints, and I
> think it looks better without the colon.

Changed.

> "toast value %u chunk sequence number %u does not match the expected
> sequence number %u" -> "toast value %u contains chunk %d where chunk
> %d was expected". Shorter. Uses %d for a sequence number instead of
> %u, which I think is correct -- anyway we should have them all one way
> or all the other. I think I'd rather ditch the "sequence number"
> technology and just talk about "chunk %d" or whatever.

I don't agree with this one.  I do agree with changing the message, but not to the message you suggest.

Imagine a toasted attribute with 18 chunks numbered [0..17].  Then we update the toast to have only 6 chunks numbered
[0..5] except we corruptly keep chunks numbered [12..17] in the toast table.  We'd rather see a report like this:

# heap table "postgres"."public"."test", block 0, offset 1, attribute 2:
#     toast value 16444 chunk 6 has sequence number 12, but expected sequence number 6
# heap table "postgres"."public"."test", block 0, offset 1, attribute 2:
#     toast value 16444 chunk 7 has sequence number 13, but expected sequence number 7
# heap table "postgres"."public"."test", block 0, offset 1, attribute 2:
#     toast value 16444 chunk 8 has sequence number 14, but expected sequence number 8
# heap table "postgres"."public"."test", block 0, offset 1, attribute 2:
#     toast value 16444 chunk 9 has sequence number 15, but expected sequence number 9
# heap table "postgres"."public"."test", block 0, offset 1, attribute 2:
#     toast value 16444 chunk 10 has sequence number 16, but expected sequence number 10
# heap table "postgres"."public"."test", block 0, offset 1, attribute 2:
#     toast value 16444 chunk 11 has sequence number 17, but expected sequence number 11
# heap table "postgres"."public"."test", block 0, offset 1, attribute 2:
#     toast value 16444 was expected to end at chunk 6, but ended at chunk 12

than one like this:

# heap table "postgres"."public"."test", block 0, offset 1, attribute 2:
#     toast value 16444 contains chunk 12 where chunk 6 was expected
# heap table "postgres"."public"."test", block 0, offset 1, attribute 2:
#     toast value 16444 contains chunk 13 where chunk 7 was expected
# heap table "postgres"."public"."test", block 0, offset 1, attribute 2:
#     toast value 16444 contains chunk 14 where chunk 8 was expected
# heap table "postgres"."public"."test", block 0, offset 1, attribute 2:
#     toast value 16444 contains chunk 15 where chunk 9 was expected
# heap table "postgres"."public"."test", block 0, offset 1, attribute 2:
#     toast value 16444 contains chunk 16 where chunk 10 was expected
# heap table "postgres"."public"."test", block 0, offset 1, attribute 2:
#     toast value 16444 contains chunk 17 where chunk 11 was expected
# heap table "postgres"."public"."test", block 0, offset 1, attribute 2:
#     toast value 16444 was expected to end at chunk 6, but ended at chunk 12

because saying the toast value ended at "chunk 12" after saying that it contains "chunk 17" is contradictory.  You need
the distinction between the chunk number and the chunk sequence number, since in corrupt circumstances they may not
be the same.

> "toast value %u chunk sequence number %u exceeds the end chunk
> sequence number %u" -> "toast value %u chunk %d follows last expected
> chunk %d"

Changed.

> "toast value %u chunk size %u differs from the expected size %u" ->
> "toast value %u chunk %d has size %u, but expected size %u"

Changed.

> Other complaints:
>
> Your commit message fails to mention the addition of
> VARATT_EXTERNAL_GET_POINTER, which is a significant change/bug fix
> unrelated to message wording.

Right you are.

> It feels like we have a non-minimal number of checks/messages for the
> series of toast chunks. I think that you get a message if we find a
> chunk after the last chunk we were expecting to find (curchunk >
> endchunk), and you also get a message if we have the wrong number of
> chunks in total (chunkno !=
> (endchunk + 1)). Now maybe I'm wrong, but if the first message
> triggers, it seems like the second message must also trigger. Is that
> wrong? If not, maybe we can get rid of the first one entirely? That's
> such a small change I think we could include it in this same patch, if
> it's a correct idea.

Motivated by discussions we had off-list, I dug into this one.

Purely as manual testing, and not part of the patch, I hacked the backend a bit to allow direct modification of the
toast table.  After corrupting the toast with the following bit of SQL:

    WITH chunk_limit AS (
        SELECT chunk_id, MAX(chunk_seq) AS maxseq
            FROM $toastname
            GROUP BY chunk_id)
        INSERT INTO $toastname (chunk_id, chunk_seq, chunk_data)
            (SELECT t.chunk_id,
                    t.chunk_seq + cl.maxseq + CASE WHEN t.chunk_seq < 3 THEN 1 ELSE 7 END,
                    t.chunk_data
                FROM $toastname t
                INNER JOIN chunk_limit cl
                ON t.chunk_id = cl.chunk_id)

pg_amcheck reports the following corruption messages:

# heap table "postgres"."public"."test", block 0, offset 1, attribute 2:
#     toast value 16444 chunk 6 follows last expected chunk 5
# heap table "postgres"."public"."test", block 0, offset 1, attribute 2:
#     toast value 16444 chunk 7 follows last expected chunk 5
# heap table "postgres"."public"."test", block 0, offset 1, attribute 2:
#     toast value 16444 chunk 8 follows last expected chunk 5
# heap table "postgres"."public"."test", block 0, offset 1, attribute 2:
#     toast value 16444 chunk 9 has sequence number 15, but expected sequence number 9
# heap table "postgres"."public"."test", block 0, offset 1, attribute 2:
#     toast value 16444 chunk 10 has sequence number 16, but expected sequence number 10
# heap table "postgres"."public"."test", block 0, offset 1, attribute 2:
#     toast value 16444 chunk 11 has sequence number 17, but expected sequence number 11
# heap table "postgres"."public"."test", block 0, offset 1, attribute 2:
#     toast value 16444 was expected to end at chunk 6, but ended at chunk 12

I think if we'd left out the first three messages, it would read strangely.  We would be complaining about three chunks
with the wrong sequence number, then conclude that there were six extra chunks.  A sufficiently savvy user might
deduce the presence of chunks 6, 7, and 8, but the problem is more obvious (to my eyes, at least) if we keep the first
three messages.  This seems like a judgement call and not a clear argument either way, so if you still want me to
change it, I guess I don't mind doing so.

> On a related note, as I think I said before, I still think we should
> be rejiggering this so that we're not testing both the size of each
> individual chunk and the total size, because that ought to be
> redundant. That might be better done as a separate patch but I think
> we should try to clean it up.

Can you point me to the exact check you are mentioning, and with which patch applied?  I don't see any examples of this
after applying the v18-0003.

>> v18-0004 - Adding corruption checks of toast pointers.  Extends the regression tests to cover the new checks.
>
> I think we could check that the result of
> VARATT_EXTERNAL_GET_COMPRESS_METHOD is one of the values we expect to
> see.

Yes.  I had that before, pulled it out along with other toast compression checks, but have put it back in for v19.

> Using AllocSizeIsValid() seems pretty vile. I know that MaxAllocSize
> is 0x3FFFFFFF in no small part because that's the maximum length that
> can be represented by a varlena, but I'm not sure it's a good idea to
> couple the concepts so closely like this. Maybe we can just #define
> VARLENA_SIZE_LIMIT in this file and use that, and a message that says
> size %u exceeds limit %u.

Changed.

> I'm a little worried about whether the additional test cases are
> Endian-dependent at all. I don't immediately know what might be wrong
> with them, but I'm going to think about that some more later. Any
> chance you have access to a Big-endian box where you can test this?

I don't have a Big-endian box, but I think one of them may be wrong now that you mention the issue:

# Corrupt column c's toast pointer va_extinfo field

The problem is that the 30-bit extsize and 2-bit cmid split is not being handled in the perl test, and I don't see an
easy way to have perl's pack/unpack do that for us.  There isn't any requirement that each possible corruption we
check actually be manifested in the regression tests.  The simplest solution is to remove this problematic test, so
that's what I did.  The other two new tests corrupt c_va_toastrelid and c_va_rawsize, both of which are read/written
using unpack/pack, so perl should handle the endianness for us (I hope).

If you'd rather not commit these two extra tests, you don't have to, as I've split them out into v19-0003.  But if you
do commit them, it makes more sense to me to be one commit with 0002+0003 together, rather than separately.  Not
committing the new tests just means that verify_heapam() is able to detect additional forms of corruption that we're
not covering in the regression tests.  But that's already true for some other corruption types, such as detecting
toast chunks with null sequence numbers.




—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Attachments

Re: pg_amcheck contrib application

From
Robert Haas
Date:
On Thu, Apr 8, 2021 at 3:02 PM Mark Dilger <mark.dilger@enterprisedb.com> wrote:
> Imagine a toasted attribute with 18 chunks numbered [0..17].  Then we update the toast to have only 6 chunks numbered
> [0..5] except we corruptly keep chunks numbered [12..17] in the toast table.  We'd rather see a report like this:
[ toast value NNN chunk NNN has sequence number NNN, but expected
sequence number NNN ]
> than one like this:
[ toast value NNN contains chunk NNN where chunk NNN was expected ]
>
> because saying the toast value ended at "chunk 12" after saying that it contains "chunk 17" is contradictory.  You
> need the distinction between the chunk number and the chunk sequence number, since in corrupt circumstances they may
> not be the same.

Hmm, I see your point, and that's a good example to illustrate it.
But, with that example in front of me, I am rather doubtful that
either of these is what users actually want. Consider the case where I
should have chunks 0..17 and chunk 1 is just plain gone. This, by the
way, seems like a pretty likely case to arise in practice, since all
we need is for a block to get truncated away or zeroed erroneously, or
for a tuple to get pruned that shouldn't. With either of the above
schemes, I guess we're going to get a message about every chunk from 2
to 17, complaining that they're all misnumbered. We might also get a
complaint that the last chunk is the wrong size, and that the total
number of chunks isn't right. What we really want is a single
complaint saying chunk 1 is missing.

Likewise, in your example, I sort of feel like what I really want,
rather than either of the above outputs, is to get some messages like
this:

toast value NNN contains unexpected extra chunk [12-17]

Both your phrasing for those messages and what I suggested make it
sound like the problem is that the chunk number is wrong. But that
doesn't seem like it's taking the right view of the situation. Chunks
12-17 shouldn't exist at all, and if they do, we should say that, e.g.
by complaining about something like "toast value 16444 chunk 12
follows last expected chunk 5"

In other words, I don't buy the idea that the user will accept the
idea that there's a chunk number and a chunk sequence number, and that
they should know the difference between those things and what each of
them are. They're entitled to imagine that there's just one thing, and
that we're going to tell them about values that are extra or missing.
The fact that we're not doing that seems like it's just a matter of
missing code. If we start the index scan and get chunk 4, we can
easily emit messages for chunks 0..3 right on the spot, declaring them
missing. Things do get a bit hairy if the index scan returns values
out of order: what if it gives us chunk_seq = 2 and then chunk_seq =
1? But I think we could handle that by just issuing a complaint in any
such case that "toast index returns chunks out of order for toast
value NNN" and stopping further checking of that toast value.

> Purely as manual testing, and not part of the patch, I hacked the backend a bit to allow direct modification of the
> toast table.  After corrupting the toast with the following bit of SQL:
>
>         WITH chunk_limit AS (
>                 SELECT chunk_id, MAX(chunk_seq) AS maxseq
>                         FROM $toastname
>                         GROUP BY chunk_id)
>                 INSERT INTO $toastname (chunk_id, chunk_seq, chunk_data)
>                         (SELECT t.chunk_id,
>                                         t.chunk_seq + cl.maxseq + CASE WHEN t.chunk_seq < 3 THEN 1 ELSE 7 END,
>                                         t.chunk_data
>                                 FROM $toastname t
>                                 INNER JOIN chunk_limit cl
>                                 ON t.chunk_id = cl.chunk_id)
>
> pg_amcheck reports the following corruption messages:
>
> # heap table "postgres"."public"."test", block 0, offset 1, attribute 2:
> #     toast value 16444 chunk 6 follows last expected chunk 5
> # heap table "postgres"."public"."test", block 0, offset 1, attribute 2:
> #     toast value 16444 chunk 7 follows last expected chunk 5
> # heap table "postgres"."public"."test", block 0, offset 1, attribute 2:
> #     toast value 16444 chunk 8 follows last expected chunk 5
> # heap table "postgres"."public"."test", block 0, offset 1, attribute 2:
> #     toast value 16444 chunk 9 has sequence number 15, but expected sequence number 9
> # heap table "postgres"."public"."test", block 0, offset 1, attribute 2:
> #     toast value 16444 chunk 10 has sequence number 16, but expected sequence number 10
> # heap table "postgres"."public"."test", block 0, offset 1, attribute 2:
> #     toast value 16444 chunk 11 has sequence number 17, but expected sequence number 11
> # heap table "postgres"."public"."test", block 0, offset 1, attribute 2:
> #     toast value 16444 was expected to end at chunk 6, but ended at chunk 12
>
> I think if we'd left out the first three messages, it would read strangely.  We would be complaining about three
> chunks with the wrong sequence number, then conclude that there were six extra chunks.  A sufficiently savvy user
> might deduce the presence of chunks 6, 7, and 8, but the problem is more obvious (to my eyes, at least) if we keep
> the first three messages.  This seems like a judgement call and not a clear argument either way, so if you still
> want me to change it, I guess I don't mind doing so.

I mean, looking at it, the question here is why it's not just using
the same message for all of them. The fact that the chunk numbers are
higher than 5 is the problem. The sequence numbers seem like just a
distraction.

> > On a related note, as I think I said before, I still think we should
> > be rejiggering this so that we're not testing both the size of each
> > individual chunk and the total size, because that ought to be
> > redundant. That might be better done as a separate patch but I think
> > we should try to clean it up.
>
> Can you point me to the exact check you are mentioning, and with which patch applied?  I don't see any examples of
> this after applying the v18-0003.

Hmm, my mistake, I think.

> > I'm a little worried about whether the additional test cases are
> > Endian-dependent at all. I don't immediately know what might be wrong
> > with them, but I'm going to think about that some more later. Any
> > chance you have access to a Big-endian box where you can test this?
>
> I don't have a Big-endian box, but I think one of them may be wrong now that you mention the issue:
>
> # Corrupt column c's toast pointer va_extinfo field
>
> The problem is that the 30-bit extsize and 2-bit cmid split is not being handled in the perl test, and I don't see an
> easy way to have perl's pack/unpack do that for us.  There isn't any requirement that each possible corruption we
> check actually be manifested in the regression tests.  The simplest solution is to remove this problematic test, so
> that's what I did.  The other two new tests corrupt c_va_toastrelid and c_va_rawsize, both of which are read/written
> using unpack/pack, so perl should handle the endianness for us (I hope).

I don't immediately see why this particular thing should be an issue.
The format of the varlena header itself is different on big-endian and
little-endian machines, which is why postgres.h has all this stuff
conditioned on WORDS_BIGENDIAN. But va_extinfo doesn't have any
similar treatment, so I'm not sure what could go wrong there, as long
as the 4-byte value as a whole is being packed and unpacked according
to the machine's endian-ness.
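
For reference, the split is nothing but bit arithmetic on the uint32
(paraphrasing the definitions in postgres.h):

    #define VARLENA_EXTSIZE_BITS    30
    #define VARLENA_EXTSIZE_MASK    ((1U << VARLENA_EXTSIZE_BITS) - 1)

    extsize = toast_pointer.va_extinfo & VARLENA_EXTSIZE_MASK;
    cmid = toast_pointer.va_extinfo >> VARLENA_EXTSIZE_BITS;

so a test that packs and unpacks va_extinfo as a single 32-bit integer
should round-trip it cleanly on either architecture.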

--
Robert Haas
EDB: http://www.enterprisedb.com



Re: pg_amcheck contrib application

From
Mark Dilger
Date:

> On Apr 8, 2021, at 1:05 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Thu, Apr 8, 2021 at 3:02 PM Mark Dilger <mark.dilger@enterprisedb.com> wrote:
>> Imagine a toasted attribute with 18 chunks numbered [0..17].  Then we update the toast to have only 6 chunks
>> numbered [0..5] except we corruptly keep chunks numbered [12..17] in the toast table.  We'd rather see a report
>> like this:
> [ toast value NNN chunk NNN has sequence number NNN, but expected
> sequence number NNN ]
>> than one like this:
> [ toast value NNN contains chunk NNN where chunk NNN was expected ]
>>
>> because saying the toast value ended at "chunk 12" after saying that it contains "chunk 17" is contradictory.  You
>> need the distinction between the chunk number and the chunk sequence number, since in corrupt circumstances they
>> may not be the same.
>
> Hmm, I see your point, and that's a good example to illustrate it.
> But, with that example in front of me, I am rather doubtful that
> either of these is what users actually want. Consider the case where I
> should have chunks 0..17 and chunk 1 is just plain gone. This, by the
> way, seems like a pretty likely case to arise in practice, since all
> we need is for a block to get truncated away or zeroed erroneously, or
> for a tuple to get pruned that shouldn't. With either of the above
> schemes, I guess we're going to get a message about every chunk from 2
> to 17, complaining that they're all misnumbered. We might also get a
> complaint that the last chunk is the wrong size, and that the total
> number of chunks isn't right. What we really want is a single
> complaint saying chunk 1 is missing.
>
> Likewise, in your example, I sort of feel like what I really want,
> rather than either of the above outputs, is to get some messages like
> this:
>
> toast value NNN contains unexpected extra chunk [12-17]
>
> Both your phrasing for those messages and what I suggested make it
> sound like the problem is that the chunk number is wrong. But that
> doesn't seem like it's taking the right view of the situation. Chunks
> 12-17 shouldn't exist at all, and if they do, we should say that, e.g.
> by complaining about something like "toast value 16444 chunk 12
> follows last expected chunk 5"
>
> In other words, I don't buy the idea that the user will accept the
> idea that there's a chunk number and a chunk sequence number, and that
> they should know the difference between those things and what each of
> them are. They're entitled to imagine that there's just one thing, and
> that we're going to tell them about values that are extra or missing.
> The fact that we're not doing that seems like it's just a matter of
> missing code.

Somehow, we have to get enough information about chunk_seq discontinuity into the output that, if the user forwards it
to -hackers or the output comes from a buildfarm critter, we have all the information we need to help diagnose what
went wrong.

As a specific example, if the va_rawsize suggests 2 chunks, and we find 150 chunks all with contiguous chunk_seq
values, that is different from a debugging point of view than if we find 150 chunks with chunk_seq values spread all
over the [0..MAXINT] range.  We can't just tell the user that there were 148 extra chunks.  We also shouldn't phrase
the error in terms of "extra chunks", since it might be the va_rawsize that is corrupt.

I agree that the current message output might be overly verbose in how it reports this information.  Conceptually, we
want to store up information about the chunk issues and report them all at once, but that's hard to do in general, as
the number of chunk_seq discontinuities might be quite large, much too large to fit reasonably into any one message.
Maybe we could report just the first N discontinuities rather than all of them, but if somebody wants to open a hex
editor and walk through the toast table, they won't appreciate having the corruption information truncated like that.

All this leads me to believe that we should report the following:

1) If the total number of chunks retrieved differs from the expected number, report how many we expected vs. how many
we got
2) If the chunk_seq numbers are discontiguous, report each discontiguity.
3) If the index scan returned chunks out of chunk_seq order, report that
4) If any chunk is not the expected size, report that

So, for your example of chunk 1 missing from chunks [0..17], we'd report that we got one fewer chunk than we expected,
that the second chunk seen was discontiguous from the first chunk seen, that the final chunk seen was smaller than
expected by M bytes, and that the total size was smaller than we expected by N bytes.  The third of those is somewhat
misleading, since the final chunk was presumably the right size; we just weren't expecting to hit a partial chunk
quite yet.  But I don't see how to make that better in the general case.
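
A sketch of how (2) and (3) might look inside the scan loop, with hypothetical variable names (last_seq starts at -1,
and report_corruption stands in for whatever reporting call we end up using):

    if (chunk_seq < last_seq)
        report_corruption(ctx,
                          psprintf("toast value %u index scan returned chunk %d following chunk %d",
                                   valueid, chunk_seq, last_seq));
    else if (chunk_seq > last_seq + 1)
        report_corruption(ctx,
                          psprintf("toast value %u missing chunks %d through %d",
                                   valueid, last_seq + 1, chunk_seq - 1));
    last_seq = chunk_seq;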

> If we start the index scan and get chunk 4, we can
> easily emit messages for chunks 0..3 right on the spot, declaring them
> missing. Things do get a bit hairy if the index scan returns values
> out of order: what if it gives us chunk_seq = 2 and then chunk_seq =
> 1? But I think we could handle that by just issuing a complaint in any
> such case that "toast index returns chunks out of order for toast
> value NNN" and stopping further checking of that toast value.
>
>> Purely as manual testing, and not part of the patch, I hacked the backend a bit to allow direct modification of the
>> toast table.  After corrupting the toast with the following bit of SQL:
>>
>>        WITH chunk_limit AS (
>>                SELECT chunk_id, MAX(chunk_seq) AS maxseq
>>                        FROM $toastname
>>                        GROUP BY chunk_id)
>>                INSERT INTO $toastname (chunk_id, chunk_seq, chunk_data)
>>                        (SELECT t.chunk_id,
>>                                        t.chunk_seq + cl.maxseq + CASE WHEN t.chunk_seq < 3 THEN 1 ELSE 7 END,
>>                                        t.chunk_data
>>                                FROM $toastname t
>>                                INNER JOIN chunk_limit cl
>>                                ON t.chunk_id = cl.chunk_id)
>>
>> pg_amcheck reports the following corruption messages:
>>
>> # heap table "postgres"."public"."test", block 0, offset 1, attribute 2:
>> #     toast value 16444 chunk 6 follows last expected chunk 5
>> # heap table "postgres"."public"."test", block 0, offset 1, attribute 2:
>> #     toast value 16444 chunk 7 follows last expected chunk 5
>> # heap table "postgres"."public"."test", block 0, offset 1, attribute 2:
>> #     toast value 16444 chunk 8 follows last expected chunk 5
>> # heap table "postgres"."public"."test", block 0, offset 1, attribute 2:
>> #     toast value 16444 chunk 9 has sequence number 15, but expected sequence number 9
>> # heap table "postgres"."public"."test", block 0, offset 1, attribute 2:
>> #     toast value 16444 chunk 10 has sequence number 16, but expected sequence number 10
>> # heap table "postgres"."public"."test", block 0, offset 1, attribute 2:
>> #     toast value 16444 chunk 11 has sequence number 17, but expected sequence number 11
>> # heap table "postgres"."public"."test", block 0, offset 1, attribute 2:
>> #     toast value 16444 was expected to end at chunk 6, but ended at chunk 12
>>
>> I think if we'd left out the first three messages, it would read strangely.  We would be complaining about three
>> chunks with the wrong sequence number, then conclude that there were six extra chunks.  A sufficiently savvy user
>> might deduce the presence of chunks 6, 7, and 8, but the problem is more obvious (to my eyes, at least) if we keep
>> the first three messages.  This seems like a judgement call and not a clear argument either way, so if you still
>> want me to change it, I guess I don't mind doing so.
>
> I mean, looking at it, the question here is why it's not just using
> the same message for all of them. The fact that the chunk numbers are
> higher than 5 is the problem. The sequence numbers seem like just a
> distraction.

Again, I don't think we can reach that conclusion.  You are biasing the corruption reports in favor of believing the
va_rawsize rather than believing the toast table.

>>> On a related note, as I think I said before, I still think we should
>>> be rejiggering this so that we're not testing both the size of each
>>> individual chunk and the total size, because that ought to be
>>> redundant. That might be better done as a separate patch but I think
>>> we should try to clean it up.
>>
>> Can you point me to the exact check you are mentioning, and with which patch applied?  I don't see any examples of
>> this after applying the v18-0003.
>
> Hmm, my mistake, I think.
>
>>> I'm a little worried about whether the additional test cases are
>>> Endian-dependent at all. I don't immediately know what might be wrong
>>> with them, but I'm going to think about that some more later. Any
>>> chance you have access to a Big-endian box where you can test this?
>>
>> I don't have a Big-endian box, but I think one of them may be wrong now that you mention the issue:
>>
>> # Corrupt column c's toast pointer va_extinfo field
>>
>> The problem is that the 30-bit extsize and 2-bit cmid split is not being handled in the perl test, and I don't see
>> an easy way to have perl's pack/unpack do that for us.  There isn't any requirement that each possible corruption
>> we check actually be manifested in the regression tests.  The simplest solution is to remove this problematic test,
>> so that's what I did.  The other two new tests corrupt c_va_toastrelid and c_va_rawsize, both of which are
>> read/written using unpack/pack, so perl should handle the endianness for us (I hope).
>
> I don't immediately see why this particular thing should be an issue.
> The format of the varlena header itself is different on big-endian and
> little-endian machines, which is why postgres.h has all this stuff
> conditioned on WORDS_BIGENDIAN. But va_extinfo doesn't have any
> similar treatment, so I'm not sure what could go wrong there, as long
> as the 4-byte value as a whole is being packed and unpacked according
> to the machine's endian-ness.

Good point.  Perhaps the test was ok after all.

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company






Re: pg_amcheck contrib application

From
Robert Haas
Date:
On Thu, Apr 8, 2021 at 5:21 PM Mark Dilger <mark.dilger@enterprisedb.com> wrote:
> All this leads me to believe that we should report the following:
>
> 1) If the total number of chunks retrieved differs from the expected number, report how many we expected vs. how many
> we got
> 2) If the chunk_seq numbers are discontiguous, report each discontiguity.
> 3) If the index scan returned chunks out of chunk_seq order, report that
> 4) If any chunk is not the expected size, report that
>
> So, for your example of chunk 1 missing from chunks [0..17], we'd report that we got one fewer chunk than we
> expected, that the second chunk seen was discontiguous from the first chunk seen, that the final chunk seen was
> smaller than expected by M bytes, and that the total size was smaller than we expected by N bytes.  The third of
> those is somewhat misleading, since the final chunk was presumably the right size; we just weren't expecting to hit
> a partial chunk quite yet.  But I don't see how to make that better in the general case.

Hmm, that might be OK. It seems like it's going to be a bit verbose in
simple cases like 1 missing chunk, but on the plus side, it avoids a
mountain of output if the raw size has been overwritten with a
gigantic bogus value. But, how is #2 different from #3? Those sound
like the same thing to me.

--
Robert Haas
EDB: http://www.enterprisedb.com



Re: pg_amcheck contrib application

From
Mark Dilger
Date:

> On Apr 8, 2021, at 3:11 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Thu, Apr 8, 2021 at 5:21 PM Mark Dilger <mark.dilger@enterprisedb.com> wrote:
>> All this leads me to believe that we should report the following:
>>
>> 1) If the total number of chunks retrieved differs from the expected number, report how many we expected vs. how
>> many we got
>> 2) If the chunk_seq numbers are discontiguous, report each discontiguity.
>> 3) If the index scan returned chunks out of chunk_seq order, report that
>> 4) If any chunk is not the expected size, report that
>>
>> So, for your example of chunk 1 missing from chunks [0..17], we'd report that we got one fewer chunk than we
>> expected, that the second chunk seen was discontiguous from the first chunk seen, that the final chunk seen was
>> smaller than expected by M bytes, and that the total size was smaller than we expected by N bytes.  The third of
>> those is somewhat misleading, since the final chunk was presumably the right size; we just weren't expecting to hit
>> a partial chunk quite yet.  But I don't see how to make that better in the general case.
>
> Hmm, that might be OK. It seems like it's going to be a bit verbose in
> simple cases like 1 missing chunk, but on the plus side, it avoids a
> mountain of output if the raw size has been overwritten with a
> gigantic bogus value. But, how is #2 different from #3? Those sound
> like the same thing to me.

#2 is if chunk_seq goes up but skips numbers.  #3 is if chunk_seq ever goes down, meaning the index scan did something
unexpected.

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company






Re: pg_amcheck contrib application

From
Robert Haas
Date:
On Thu, Apr 8, 2021 at 6:51 PM Mark Dilger <mark.dilger@enterprisedb.com> wrote:
> #2 is if chunk_seq goes up but skips numbers.  #3 is if chunk_seq ever goes down, meaning the index scan did
> something unexpected.

Yeah, sure. But I think we could probably treat those the same way.

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Re: pg_amcheck contrib application

From
Mark Dilger
Date:

> On Apr 8, 2021, at 3:11 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Thu, Apr 8, 2021 at 5:21 PM Mark Dilger <mark.dilger@enterprisedb.com> wrote:
>> All this leads me to believe that we should report the following:
>>
>> 1) If the total number of chunks retrieved differs from the expected number, report how many we expected vs. how
>> many we got
>> 2) If the chunk_seq numbers are discontiguous, report each discontiguity.
>> 3) If the index scan returned chunks out of chunk_seq order, report that
>> 4) If any chunk is not the expected size, report that
>>
>> So, for your example of chunk 1 missing from chunks [0..17], we'd report that we got one fewer chunk than we
>> expected, that the second chunk seen was discontiguous from the first chunk seen, that the final chunk seen was
>> smaller than expected by M bytes, and that the total size was smaller than we expected by N bytes.  The third of
>> those is somewhat misleading, since the final chunk was presumably the right size; we just weren't expecting to hit
>> a partial chunk quite yet.  But I don't see how to make that better in the general case.
>
> Hmm, that might be OK. It seems like it's going to be a bit verbose in
> simple cases like 1 missing chunk, but on the plus side, it avoids a
> mountain of output if the raw size has been overwritten with a
> gigantic bogus value. But, how is #2 different from #3? Those sound
> like the same thing to me.

I think #4, above, requires some clarification.  If there are missing chunks, the very definition of how large we expect subsequent chunks to be is ill-defined.  I took a fairly conservative approach to avoid lots of bogus complaints about chunks that are of unexpected size.  Not all such complaints are removed, but enough are removed that I needed to add a final complaint at the end about the total size seen not matching the total size expected.

Here are a set of corruptions with the corresponding corruption reports from before and from after the code changes.  The corruptions are *not* cumulative.

Honestly, I'm not totally convinced that these changes are improvements in all cases.  Let me know if you want further changes, or if you'd like to see other corruptions and their before and after results.


Corruption #1:

    UPDATE $toastname SET chunk_seq = chunk_seq + 1000

Before:

# heap table "postgres"."public"."test", block 0, offset 2, attribute 2:
#     toast value 16445 chunk 0 has sequence number 1000, but expected sequence number 0
# heap table "postgres"."public"."test", block 0, offset 2, attribute 2:
#     toast value 16445 chunk 1 has sequence number 1001, but expected sequence number 1
# heap table "postgres"."public"."test", block 0, offset 2, attribute 2:
#     toast value 16445 chunk 2 has sequence number 1002, but expected sequence number 2
# heap table "postgres"."public"."test", block 0, offset 2, attribute 2:
#     toast value 16445 chunk 3 has sequence number 1003, but expected sequence number 3
# heap table "postgres"."public"."test", block 0, offset 2, attribute 2:
#     toast value 16445 chunk 4 has sequence number 1004, but expected sequence number 4
# heap table "postgres"."public"."test", block 0, offset 2, attribute 2:
#     toast value 16445 chunk 5 has sequence number 1005, but expected sequence number 5

After:

# heap table "postgres"."public"."test", block 0, offset 2, attribute 2:
#     toast value 16445 missing chunks 0 through 999


Corruption #2:

    UPDATE $toastname SET chunk_seq = chunk_seq * 1000

Before:

# heap table "postgres"."public"."test", block 0, offset 2, attribute 2:
#     toast value 16445 chunk 1 has sequence number 1000, but expected sequence number 1
# heap table "postgres"."public"."test", block 0, offset 2, attribute 2:
#     toast value 16445 chunk 2 has sequence number 2000, but expected sequence number 2
# heap table "postgres"."public"."test", block 0, offset 2, attribute 2:
#     toast value 16445 chunk 3 has sequence number 3000, but expected sequence number 3
# heap table "postgres"."public"."test", block 0, offset 2, attribute 2:
#     toast value 16445 chunk 4 has sequence number 4000, but expected sequence number 4
# heap table "postgres"."public"."test", block 0, offset 2, attribute 2:
#     toast value 16445 chunk 5 has sequence number 5000, but expected sequence number 5

After:

# heap table "postgres"."public"."test", block 0, offset 2, attribute 2:
#     toast value 16445 missing chunks 1 through 999
# heap table "postgres"."public"."test", block 0, offset 2, attribute 2:
#     toast value 16445 missing chunks 1001 through 1999
# heap table "postgres"."public"."test", block 0, offset 2, attribute 2:
#     toast value 16445 missing chunks 2001 through 2999
# heap table "postgres"."public"."test", block 0, offset 2, attribute 2:
#     toast value 16445 missing chunks 3001 through 3999
# heap table "postgres"."public"."test", block 0, offset 2, attribute 2:
#     toast value 16445 missing chunks 4001 through 4999
# heap table "postgres"."public"."test", block 0, offset 3, attribute 2:


Corruption #3:

    UPDATE $toastname SET chunk_id = (chunk_id::integer + 10000000)::oid WHERE chunk_seq = 3

Before:

# heap table "postgres"."public"."test", block 0, offset 2, attribute 2:
#     toast value 16445 chunk 3 has sequence number 4, but expected sequence number 3
# heap table "postgres"."public"."test", block 0, offset 2, attribute 2:
#     toast value 16445 chunk 4 has sequence number 5, but expected sequence number 4
# heap table "postgres"."public"."test", block 0, offset 2, attribute 2:
#     toast value 16445 was expected to end at chunk 6, but ended at chunk 5

After:

# heap table "postgres"."public"."test", block 0, offset 2, attribute 2:
#     toast value 16445 missing chunk 3
# heap table "postgres"."public"."test", block 0, offset 2, attribute 2:
#     toast value 16445 chunk 4 has size 20, but expected size 1996
# heap table "postgres"."public"."test", block 0, offset 2, attribute 2:
#     toast value 16445 was expected to end at chunk 6, but ended at chunk 5
# heap table "postgres"."public"."test", block 0, offset 2, attribute 2:
#     toast value 16445 was expected to have size 10000, but had size 8004
# heap table "postgres"."public"."test", block 0, offset 3, attribute 2:



—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company






Re: pg_amcheck contrib application

From
Robert Haas
Date
On Fri, Apr 9, 2021 at 2:50 PM Mark Dilger <mark.dilger@enterprisedb.com> wrote:
> I think #4, above, requires some clarification.  If there are missing chunks, the very definition of how large we expect subsequent chunks to be is ill-defined.  I took a fairly conservative approach to avoid lots of bogus complaints about chunks that are of unexpected size.  Not all such complaints are removed, but enough are removed that I needed to add a final complaint at the end about the total size seen not matching the total size expected.

My instinct is to suppose that the size that we expect for future
chunks is independent of anything being wrong with previous chunks. So
if each chunk is supposed to be 2004 bytes (which probably isn't the
real number) and the value is 7000 bytes long, we expect chunks 0-2 to
be 2004 bytes each, chunk 3 to be 988 bytes, and chunk 4 and higher to
not exist. If chunk 1 happens to be missing or the wrong length or
whatever, our expectations for chunks 2 and 3 are utterly unchanged.
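
To spell out the arithmetic, here is a quick sketch of that expectation (the constants are illustrative; the real code works from TOAST_MAX_CHUNK_SIZE and the toast pointer's extsize):

    #include <stdio.h>

    int
    main(void)
    {
        const int chunk_size = 2004;    /* stand-in for TOAST_MAX_CHUNK_SIZE */
        const int extsize = 7000;       /* total externally stored size      */
        const int last = (extsize - 1) / chunk_size;  /* last expected chunk */

        for (int seq = 0; seq <= last; seq++)
        {
            int expected = (seq < last) ? chunk_size
                                        : extsize - last * chunk_size;
            printf("chunk %d: expect %d bytes\n", seq, expected);
        }
        /* prints: chunks 0-2 expect 2004 bytes each, chunk 3 expects 988 */
        return 0;
    }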

> Corruption #1:
>
>         UPDATE $toastname SET chunk_seq = chunk_seq + 1000
>
> Before:
>
> # heap table "postgres"."public"."test", block 0, offset 2, attribute 2:
> #     toast value 16445 chunk 0 has sequence number 1000, but expected sequence number 0
> # heap table "postgres"."public"."test", block 0, offset 2, attribute 2:
> #     toast value 16445 chunk 1 has sequence number 1001, but expected sequence number 1
> # heap table "postgres"."public"."test", block 0, offset 2, attribute 2:
> #     toast value 16445 chunk 2 has sequence number 1002, but expected sequence number 2
> # heap table "postgres"."public"."test", block 0, offset 2, attribute 2:
> #     toast value 16445 chunk 3 has sequence number 1003, but expected sequence number 3
> # heap table "postgres"."public"."test", block 0, offset 2, attribute 2:
> #     toast value 16445 chunk 4 has sequence number 1004, but expected sequence number 4
> # heap table "postgres"."public"."test", block 0, offset 2, attribute 2:
> #     toast value 16445 chunk 5 has sequence number 1005, but expected sequence number 5
>
> After:
>
> # heap table "postgres"."public"."test", block 0, offset 2, attribute 2:
> #     toast value 16445 missing chunks 0 through 999

Applying the above principle would lead to complaints that chunks 0-5
are missing, and 1000-1005 are extra.

> Corruption #2:
>
>         UPDATE $toastname SET chunk_seq = chunk_seq * 1000

Similarly here, except the extra chunk numbers are different.

> Corruption #3:
>
>         UPDATE $toastname SET chunk_id = (chunk_id::integer + 10000000)::oid WHERE chunk_seq = 3

And here we'd just get a complaint that chunk 3 is missing.

--
Robert Haas
EDB: http://www.enterprisedb.com



Re: pg_amcheck contrib application

From
Mark Dilger
Date

> On Apr 9, 2021, at 1:51 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Fri, Apr 9, 2021 at 2:50 PM Mark Dilger <mark.dilger@enterprisedb.com> wrote:
>> I think #4, above, requires some clarification.  If there are missing chunks, the very definition of how large we expect subsequent chunks to be is ill-defined.  I took a fairly conservative approach to avoid lots of bogus complaints about chunks that are of unexpected size.  Not all such complaints are removed, but enough are removed that I needed to add a final complaint at the end about the total size seen not matching the total size expected.
>
> My instinct is to suppose that the size that we expect for future
> chunks is independent of anything being wrong with previous chunks. So
> if each chunk is supposed to be 2004 bytes (which probably isn't the
> real number) and the value is 7000 bytes long, we expect chunks 0-2 to
> be 2004 bytes each, chunk 3 to be 988 bytes, and chunk 4 and higher to
> not exist. If chunk 1 happens to be missing or the wrong length or
> whatever, our expectations for chunks 2 and 3 are utterly unchanged.

Fair enough.

>> Corruption #1:
>>
>>        UPDATE $toastname SET chunk_seq = chunk_seq + 1000
>>
>> Before:
>>
>> # heap table "postgres"."public"."test", block 0, offset 2, attribute 2:
>> #     toast value 16445 chunk 0 has sequence number 1000, but expected sequence number 0
>> # heap table "postgres"."public"."test", block 0, offset 2, attribute 2:
>> #     toast value 16445 chunk 1 has sequence number 1001, but expected sequence number 1
>> # heap table "postgres"."public"."test", block 0, offset 2, attribute 2:
>> #     toast value 16445 chunk 2 has sequence number 1002, but expected sequence number 2
>> # heap table "postgres"."public"."test", block 0, offset 2, attribute 2:
>> #     toast value 16445 chunk 3 has sequence number 1003, but expected sequence number 3
>> # heap table "postgres"."public"."test", block 0, offset 2, attribute 2:
>> #     toast value 16445 chunk 4 has sequence number 1004, but expected sequence number 4
>> # heap table "postgres"."public"."test", block 0, offset 2, attribute 2:
>> #     toast value 16445 chunk 5 has sequence number 1005, but expected sequence number 5
>>
>> After:
>>
>> # heap table "postgres"."public"."test", block 0, offset 2, attribute 2:
>> #     toast value 16445 missing chunks 0 through 999
>
> Applying the above principle would lead to complaints that chunks 0-5
> are missing, and 1000-1005 are extra.

That sounds right.  It now reports:

# heap table "postgres"."public"."test", block 0, offset 16, attribute 2:
#     toast value 16459 missing chunks 0 through 4 with expected size 1996 and chunk 5 with expected size 20
# heap table "postgres"."public"."test", block 0, offset 16, attribute 2:
#     toast value 16459 unexpected chunks 1000 through 1004 each with size 1996 followed by chunk 1005 with size 20


>> Corruption #2:
>>
>>        UPDATE $toastname SET chunk_seq = chunk_seq * 1000
>
> Similarly here, except the extra chunk numbers are different.

It now reports:

# heap table "postgres"."public"."test", block 0, offset 17, attribute 2:
#     toast value 16460 missing chunks 1 through 4 with expected size 1996 and chunk 5 with expected size 20
# heap table "postgres"."public"."test", block 0, offset 17, attribute 2:
#     toast value 16460 unexpected chunk 1000 with size 1996
# heap table "postgres"."public"."test", block 0, offset 17, attribute 2:
#     toast value 16460 unexpected chunk 2000 with size 1996
# heap table "postgres"."public"."test", block 0, offset 17, attribute 2:
#     toast value 16460 unexpected chunk 3000 with size 1996
# heap table "postgres"."public"."test", block 0, offset 17, attribute 2:
#     toast value 16460 unexpected chunk 4000 with size 1996
# heap table "postgres"."public"."test", block 0, offset 17, attribute 2:
#     toast value 16460 unexpected chunk 5000 with size 20

I don't see any good way in this case to report the extra chunks in one row, as in the general case there could be arbitrarily many of them, with the message text getting arbitrarily large if it reported the chunks as "chunks 1000, 2000, 3000, 4000, 5000, ...".  I don't expect this sort of corruption to be particularly common, though, so I'm not too bothered about it.

>
>> Corruption #3:
>>
>>        UPDATE $toastname SET chunk_id = (chunk_id::integer + 10000000)::oid WHERE chunk_seq = 3
>
> And here we'd just get a complaint that chunk 3 is missing.

It now reports:

# heap table "postgres"."public"."test", block 0, offset 18, attribute 2:
#     toast value 16461 missing chunk 3 with expected size 1996
# heap table "postgres"."public"."test", block 0, offset 18, attribute 2:
#     toast value 16461 was expected to end at chunk 6 with total size 10000, but ended at chunk 5 with total size 8004

It sounds like you weren't expecting the second of these reports.  I think it is valuable, especially when there are multiple missing chunks and multiple extraneous chunks, as it makes it easier for the user to reconcile the missing chunks against the extraneous chunks.



—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Attachments

Re: pg_amcheck contrib application

From
Robert Haas
Date
On Mon, Apr 12, 2021 at 11:06 PM Mark Dilger
<mark.dilger@enterprisedb.com> wrote:
> It now reports:
>
> # heap table "postgres"."public"."test", block 0, offset 18, attribute 2:
> #     toast value 16461 missing chunk 3 with expected size 1996
> # heap table "postgres"."public"."test", block 0, offset 18, attribute 2:
> #     toast value 16461 was expected to end at chunk 6 with total size 10000, but ended at chunk 5 with total size 8004
>
> It sounds like you weren't expecting the second of these reports.  I think it is valuable, especially when there are multiple missing chunks and multiple extraneous chunks, as it makes it easier for the user to reconcile the missing chunks against the extraneous chunks.

I wasn't, but I'm not overwhelmingly opposed to it, either. I do think
I would be in favor of splitting this kind of thing up into two
messages:

#     toast value 16459 unexpected chunks 1000 through 1004 each with
size 1996 followed by chunk 1005 with size 20

We'll have fewer message variants and I don't think any real
regression in usability if we say:

#     toast value 16459 has unexpected chunks 1000 through 1004 each
with size 1996
#     toast value 16459 has unexpected chunk 1005 with size 20

(Notice that I also inserted "has" so that the sentence has a verb. Or we
could use "contains.")

I committed 0001.

--
Robert Haas
EDB: http://www.enterprisedb.com



Re: pg_amcheck contrib application

From
Mark Dilger
Date

> On Apr 14, 2021, at 10:17 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Mon, Apr 12, 2021 at 11:06 PM Mark Dilger
> <mark.dilger@enterprisedb.com> wrote:
>> It now reports:
>>
>> # heap table "postgres"."public"."test", block 0, offset 18, attribute 2:
>> #     toast value 16461 missing chunk 3 with expected size 1996
>> # heap table "postgres"."public"."test", block 0, offset 18, attribute 2:
>> #     toast value 16461 was expected to end at chunk 6 with total size 10000, but ended at chunk 5 with total size 8004
>>
>> It sounds like you weren't expecting the second of these reports.  I think it is valuable, especially when there are multiple missing chunks and multiple extraneous chunks, as it makes it easier for the user to reconcile the missing chunks against the extraneous chunks.
>
> I wasn't, but I'm not overwhelmingly opposed to it, either. I do think
> I would be in favor of splitting this kind of thing up into two
> messages:
>
> #     toast value 16459 unexpected chunks 1000 through 1004 each with
> size 1996 followed by chunk 1005 with size 20
>
> We'll have fewer message variants and I don't think any real
> regression in usability if we say:
>
> #     toast value 16459 has unexpected chunks 1000 through 1004 each
> with size 1996
> #     toast value 16459 has unexpected chunk 1005 with size 20

Changed.

> (Notice that I also inserted "has" so that the sentence has a verb. Or we
> could use "contains.")

I have added the verb "has" rather than "contains" because "has" is more consistent with the phrasing of other similar corruption reports.


—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Вложения

Re: pg_amcheck contrib application

From
Robert Haas
Date
On Thu, Apr 15, 2021 at 1:07 PM Mark Dilger
<mark.dilger@enterprisedb.com> wrote:
> I have added the verb "has" rather than "contains" because "has" is more consistent with the phrasing of other similar corruption reports.
 

That makes sense.

I think it's odd that a range of extraneous chunks is collapsed into a
single report if the size of each chunk happens to be
TOAST_MAX_CHUNK_SIZE and not otherwise. Why not just remember the
first and last extraneous chunk and the size of each? If the next
chunk you see is the next one in sequence and the same size as all the
others, extend your notion of the sequence end by 1. Otherwise, report
the range accumulated so far. It seems to me that this wouldn't be any
more code than you have now, and might actually be less.
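
Something like this rough sketch, say (all names and reporting calls invented for illustration, not the patch's code):

    #include <stdio.h>

    /* Accumulate a run of extraneous chunks; extend it while chunks arrive
     * in sequence with a matching size, and report it as one range when
     * anything breaks the pattern. */
    typedef struct
    {
        int first;   /* first chunk_seq in the run being accumulated */
        int last;    /* last chunk_seq in the run                    */
        int size;    /* size shared by every chunk in the run        */
        int active;  /* true while a run is being accumulated        */
    } ChunkRun;

    static void
    flush_run(ChunkRun *run)
    {
        if (!run->active)
            return;
        if (run->first == run->last)
            printf("unexpected chunk %d with size %d\n", run->first, run->size);
        else
            printf("unexpected chunks %d through %d each with size %d\n",
                   run->first, run->last, run->size);
        run->active = 0;
    }

    static void
    note_extraneous_chunk(ChunkRun *run, int seq, int size)
    {
        /* extend the run while chunks stay consecutive and same-sized */
        if (run->active && seq == run->last + 1 && size == run->size)
        {
            run->last = seq;
            return;
        }
        flush_run(run);              /* report whatever accumulated so far */
        run->first = run->last = seq;
        run->size = size;
        run->active = 1;
    }

    int
    main(void)
    {
        ChunkRun run = {0};
        note_extraneous_chunk(&run, 1000, 1996);
        note_extraneous_chunk(&run, 1001, 1996);
        note_extraneous_chunk(&run, 1002, 20);   /* size changes: flush */
        flush_run(&run);                         /* end of scan: flush  */
        return 0;
    }

A final flush_run() at the end of the scan reports any run still being accumulated.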

I think that report_missing_chunks() could probably just report the
range of missing chunks and not bother reporting how big they were
expected to be. But, if it is going to report how big they were
expected to be, I think it should have only 2 cases rather than 4:
either a range of missing chunks of equal size, or a single missing
chunk of some size. If, as I propose, it doesn't report the expected
size, then you still have just 2 cases: a range of missing chunks, or
a single missing chunk.
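
In other words, a sketch with just the two variants (illustrative only, not the patch's actual report_missing_chunks):

    #include <stdio.h>

    /* Sketch: report a missing range with no expected sizes, collapsing
     * to the two message variants suggested above. */
    static void
    report_missing_chunks(int first, int last)
    {
        if (first == last)
            printf("missing chunk %d\n", first);
        else
            printf("missing chunks %d through %d\n", first, last);
    }

    int
    main(void)
    {
        report_missing_chunks(3, 3);     /* missing chunk 3            */
        report_missing_chunks(0, 999);   /* missing chunks 0 through 999 */
        return 0;
    }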

Somehow I have a hard time feeling confident that check_toast_tuple()
is going to do the right thing. The logic is really complex and hard
to understand. 'chunkno' is a counter that advances every time we move
to the next chunk, and 'curchunk' is the value we actually find in the
TOAST tuple. This terminology is not easy to understand. Most messages
now report 'curchunk', but some still report 'chunkno'. Why does
'chunkno' need to exist at all? AFAICS the combination of 'curchunk'
and 'tctx->last_chunk_seen' ought to be sufficient. I can see no
particular reason why what you're calling 'chunkno' needs to exist
even as a local variable, let alone be printed out. Either we haven't
yet validated that the chunk_id extracted from the tuple is non-null
and greater than the last chunk number we saw, in which case we can
just complain about it if we find it to be otherwise, or we have
already done that validation, in which case we should complain about
that value and not 'chunkno' in any subsequent messages.

The conditionals between where you set max_valid_prior_chunk and where
you set last_chunk_seen seem hard to understand, particularly the
bifurcated way that missing chunks are reported. Initial missing
chunks are detected by (chunkno == 0 && max_valid_prior_chunk >= 0)
and later missing chunks are detected by (chunkno > 0 &&
max_valid_prior_chunk > tctx->last_chunk_seen). I'm not sure if this
is correct; I find it hard to get my head around what
max_valid_prior_chunk is supposed to represent. But in any case I
think it can be written more simply. Just keep track of what chunk_id
we expect to extract from the next TOAST tuple. Initially it's 0.
Then:

if (chunkno < tctx->expected_chunkno)
{
    // toast value %u index scan returned chunk number %d when chunk %d was expected
    // don't modify tctx->expected_chunkno here, just hope the next thing
    // matches our previous expectation
}
else
{
    if (chunkno > tctx->expected_chunkno)
        // chunks are missing from tctx->expected_chunkno through
        // Min(chunkno - 1, tctx->final_expected_chunk), provided that the
        // latter value is greater than or equal to the former
    tctx->expected_chunkno = chunkno + 1;
}

If you do this, you only need to report extraneous chunks when chunkno > tctx->final_expected_chunk, since chunkno < 0 is guaranteed to trigger the first of the two complaints shown above.

In check_tuple_attribute I suggest "bool valid = false" rather than
"bool invalid = true". I think it's easier to understand.

I object to check_toasted_attribute() using 'chunkno' in a message for
the same reasons as above in regards to check_toast_tuple() i.e. I
think it's a concept which should not exist.

I think this patch could possibly be split up into multiple patches.
There's some question in my mind whether it's getting too late to
commit any of this, since some of it looks suspiciously like new
features after feature freeze. However, I kind of hate to ship this
release without at least doing something about the chunkno vs.
curchunk stuff, which is even worse in the committed code than in your
patch, and which I think will confuse the heck out of users if those
messages actually fire for anyone.

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Re: pg_amcheck contrib application

From
Mark Dilger
Date

> On Apr 19, 2021, at 12:50 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Thu, Apr 15, 2021 at 1:07 PM Mark Dilger
> <mark.dilger@enterprisedb.com> wrote:
>> I have added the verb "has" rather than "contains" because "has" is more consistent with the phrasing of other similar corruption reports.
>
> That makes sense.
>
> I think it's odd that a range of extraneous chunks is collapsed into a
> single report if the size of each chunk happens to be
> TOAST_MAX_CHUNK_SIZE and not otherwise. Why not just remember the
> first and last extraneous chunk and the size of each? If the next
> chunk you see is the next one in sequence and the same size as all the
> others, extend your notion of the sequence end by 1. Otherwise, report
> the range accumulated so far. It seems to me that this wouldn't be any
> more code than you have now, and might actually be less.

In all cases of uncorrupted toasted attributes, the sequence of N chunks that make up the attribute should be N-1 chunks of TOAST_MAX_CHUNK_SIZE ending with a single chunk of up to TOAST_MAX_CHUNK_SIZE.  I'd like to refer to such sequences as "reasonably sized" sequences to make conversation easier.
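
A sketch of that definition, for concreteness (names invented):

    #include <stdio.h>

    /* Illustrative predicate for the "reasonably sized" notion above:
     * N-1 chunks of exactly max_chunk_size followed by a final chunk of
     * anywhere from 1 to max_chunk_size bytes. */
    static int
    reasonably_sized(const int *sizes, int nchunks, int max_chunk_size)
    {
        if (nchunks < 1)
            return 0;
        for (int i = 0; i < nchunks - 1; i++)
            if (sizes[i] != max_chunk_size)
                return 0;
        return sizes[nchunks - 1] >= 1 && sizes[nchunks - 1] <= max_chunk_size;
    }

    int
    main(void)
    {
        int ok[]  = {1996, 1996, 1996, 20};  /* reasonably sized    */
        int bad[] = {1996, 20, 1996, 1996};  /* short chunk mid-run */

        printf("%d %d\n", reasonably_sized(ok, 4, 1996),
               reasonably_sized(bad, 4, 1996));   /* prints: 1 0 */
        return 0;
    }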

If the toast pointer's va_extsize field leads us to believe that we should find 10 reasonably sized chunks, but instead we find 30 reasonably sized chunks, we know something is corrupt.  We shouldn't automatically prejudice the user against the additional 20 chunks.  We didn't expect them, but maybe that's because va_extsize was corrupt and gave us a false expectation.  We're not pointing fingers one way or the other.

On the other hand, if we expect 10 chunks and find an additional 20 unreasonably sized chunks, we can and should point fingers at the extra 20 chunks.  Even if we somehow knew that va_extsize was also corrupt, we'd still be justified in saying the 20 unreasonably sized chunks are each individually corrupt.

I tried to write the code to report one corruption message per corruption found.  There are some edge cases where this is a definitional challenge, so it's not easy to say that I've always achieved this goal, but I think I've done so where the definitions are clear.  As such, the only time I'd want to combine toast chunks into a single corruption message is when they are not in themselves necessarily *individually* corrupt.  That is why I wrote the code to use TOAST_MAX_CHUNK_SIZE rather than just storing up any series of equally sized chunks.

On a related note, when complaining about a sequence of toast chunks, often the sequence is something like [maximal, maximal, ..., maximal, partial], but sometimes it's just [maximal ... maximal], sometimes just [maximal], and sometimes just [partial].  If I'm complaining about that entire sequence, I'd really like to do so in just one message, otherwise it looks like separate complaints.

I can certainly change the code to be how you are asking, but I'd first like to know that you really understood what I was doing here and why the reports read the way they do.

> I think that report_missing_chunks() could probably just report the
> range of missing chunks and not bother reporting how big they were
> expected to be. But, if it is going to report how big they were
> expected to be, I think it should have only 2 cases rather than 4:
> either a range of missing chunks of equal size, or a single missing
> chunk of some size. If, as I propose, it doesn't report the expected
> size, then you still have just 2 cases: a range of missing chunks, or
> a single missing chunk.

Right, this is the same as above.  I'm trying not to split a single corruption complaint into separate reports.

> Somehow I have a hard time feeling confident that check_toast_tuple()
> is going to do the right thing. The logic is really complex and hard
> to understand. 'chunkno' is a counter that advances every time we move
> to the next chunk, and 'curchunk' is the value we actually find in the
> TOAST tuple. This terminology is not easy to understand. Most messages
> now report 'curchunk', but some still report 'chunkno'. Why does
> 'chunkno' need to exist at all? AFAICS the combination of 'curchunk'
> and 'tctx->last_chunk_seen' ought to be sufficient. I can see no
> particular reason why what you're calling 'chunkno' needs to exist
> even as a local variable, let alone be printed out. Either we haven't
> yet validated that the chunk_id extracted from the tuple is non-null
> and greater than the last chunk number we saw, in which case we can
> just complain about it if we find it to be otherwise, or we have
> already done that validation, in which case we should complain about
> that value and not 'chunkno' in any subsequent messages.

If we use tctx->last_chunk_seen as you propose, I imagine we'd set that to -1 prior to the first call to check_toast_tuple().  In the first call, we'd extract the toast chunk_seq and store it in curchunk and verify that it's one greater than tctx->last_chunk_seen.  That all seems fine.

But under corrupt conditions, curchunk = DatumGetInt32(fastgetattr(toasttup, 2, hctx->toast_rel->rd_att, &isnull)) could return -1.  That's invalid, of course, but now we don't know what to do.  We're supposed to complain when we get the same chunk_seq from the index scan more than once in a row, but we don't know if the value in last_chunk_seen is a real value or just the dummy initial value.  Worse still, when we get the next toast tuple back and it has a chunk_seq of -2, we want to complain that the index is returning tuples in reverse order, but we can't, because we still don't know if the -1 in last_chunk_seen is legitimate or a dummy value because that state information isn't carried over from the previous call.

Using chunkno solves this problem.  If chunkno == 0, it means this is our first call, and tctx->last_chunk_seen is uninitialized.  Otherwise, this is not the first call, and tctx->last_chunk_seen really is the chunk_seq seen in the prior call.  There is no ambiguity.
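
Roughly, as a sketch (not the patch's code):

    #include <stdio.h>

    /* Sketch of why a bare sentinel is ambiguous: last_chunk_seen = -1
     * could mean "no chunk seen yet" or could be a corrupt chunk_seq of -1
     * actually read from a toast tuple.  A separate first-call indicator --
     * the role chunkno == 0 plays -- removes the doubt. */
    typedef struct
    {
        int last_chunk_seen;
        int have_chunk;      /* true once any chunk has been seen */
    } ToastCheckState;

    static void
    observe_chunk_seq(ToastCheckState *st, int curchunk)
    {
        if (st->have_chunk && curchunk == st->last_chunk_seen)
            printf("chunk_seq %d repeated by index scan\n", curchunk);
        else if (st->have_chunk && curchunk < st->last_chunk_seen)
            printf("index scan went backwards: %d after %d\n",
                   curchunk, st->last_chunk_seen);
        st->last_chunk_seen = curchunk;
        st->have_chunk = 1;
    }

    int
    main(void)
    {
        ToastCheckState st = {-1, 0};
        observe_chunk_seq(&st, -1);  /* corrupt, but not a false "repeat" */
        observe_chunk_seq(&st, -2);  /* correctly reported as backwards   */
        return 0;
    }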

I could probably change "int chunkno" to "bool is_first_call" or similar.  I had previously used chunkno in the corruption report about chunks whose chunk_seq is null.  The idea was that if you have 100 chunks and the 30th chunk is corruptly nulled out, you could say something like "toast value 178337 has toast chunk 30 with null sequence number", but you had me change that to "toast value 178337 has toast chunk with null sequence number", so generation of that message no longer needs the chunkno.  I had kept chunkno around for the other purpose of knowing whether tctx->last_chunk_seen has been initialized yet, but a bool for that would now be sufficient.  In any event, though you disagree with me about this below, I think the caller of this code still needs to track chunkno.

> The conditionals between where you set max_valid_prior_chunk and where
> you set last_chunk_seen seem hard to understand, particularly the
> bifurcated way that missing chunks are reported. Initial missing
> chunks are detected by (chunkno == 0 && max_valid_prior_chunk >= 0)
> and later missing chunks are detected by (chunkno > 0 &&
> max_valid_prior_chunk > tctx->last_chunk_seen). I'm not sure if this
> is correct;

When we read a chunk_seq from a toast tuple, we need to determine if it indicates a gap in the chunk sequence, but we need to be careful.

The (chunkno == 0) and (chunkno > 0) stuff is just distinguishing between the first call and all subsequent calls.

For illustrative purposes, imagine that we expect chunks [0..4].

On the first call, we expect chunk_seq = 0, but that's not what we actually complain about if we get chunk_seq = 15.  We complain about all missing expected chunks, namely [0..4], not [0..14].  We also don't complain yet about seeing extraneous chunk 15, because it might be the first in a series of contiguous extraneous chunks, and we want to wait and report those all at once when the sequence finishes.  Simply complaining at this point that we didn't expect to see chunk_seq 15 is the kind of behavior that we already have committed and are trying to fix because the corruption reports are not on point.

On subsequent calls, we expect chunk_seq = last_chunk_seen+1, but that's also not what we actually complain about if we get some other value for chunk_seq.  What we complain about are the missing and extraneous sequences, not the individual chunk that had an unexpected value.

> I find it hard to get my head around what
> max_valid_prior_chunk is supposed to represent. But in any case I
> think it can be written more simply. Just keep track of what chunk_id
> we expect to extract from the next TOAST tuple. Initially it's 0.
> Then:
>
> if (chunkno < tctx->expected_chunkno)
> {
>     // toast value %u index scan returned chunk number %d when chunk %d was expected
>     // don't modify tctx->expected_chunkno here, just hope the next thing
>     // matches our previous expectation
> }
> else
> {
>     if (chunkno > tctx->expected_chunkno)
>         // chunks are missing from tctx->expected_chunkno through
>         // Min(chunkno - 1, tctx->final_expected_chunk), provided that the
>         // latter value is greater than or equal to the former
>     tctx->expected_chunkno = chunkno + 1;
> }
>
> If you do this, you only need to report extraneous chunks when chunkno > tctx->final_expected_chunk, since chunkno < 0 is guaranteed to trigger the first of the two complaints shown above.

In the example above, if we're expecting chunks [0..4] and get chunk_seq = 5, the max_valid_prior_chunk is 4.  If we instead get chunk_seq = 6, the max_valid_prior_chunk is still 4, because chunk 5 is out of bounds.

> In check_tuple_attribute I suggest "bool valid = false" rather than
> "bool invalid = true". I think it's easier to understand.

Yeah, I had it that way and changed it, because I don't much like having the only use of a boolean be a negation.

    bool foo = false; ... if (!foo) { ... }

seems worse to me than

    bool foo = true; ... if (foo) { ... }

But you're looking at it more from the perspective of English grammar, where "invalid = false" reads as a double-negative.  That's fine.  I can change it back.

> I object to check_toasted_attribute() using 'chunkno' in a message for
> the same reasons as above in regards to check_toast_tuple() i.e. I
> think it's a concept which should not exist.

So if we expect 100 chunks, get chunks [0..19, 80..99], you'd have me write the message as "expected 100 chunks but sequence ended at chunk 99"?  I think that's odd.  It makes infinitely more sense to me to say "expected 100 chunks but sequence ended at chunk 40", since chunkno counts the chunks actually seen (20 + 20 = 40).  Actually, this is an argument against changing "int chunkno" to "bool is_first_call", as I alluded to above, because we have to keep the chunkno around anyway.

> I think this patch could possibly be split up into multiple patches.
> There's some question in my mind whether it's getting too late to
> commit any of this, since some of it looks suspiciously like new
> features after feature freeze. However, I kind of hate to ship this
> release without at least doing something about the chunkno vs.
> curchunk stuff, which is even worse in the committed code than in your
> patch, and which I think will confuse the heck out of users if those
> messages actually fire for anyone.

I'm in favor of cleaning up the committed code to have easier to understand output.  I don't really agree with any of your proposed changes to my patch, though, which is I think a first.

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company






Re: pg_amcheck contrib application

From
Mark Dilger
Date

> On Apr 19, 2021, at 5:07 PM, Mark Dilger <mark.dilger@enterprisedb.com> wrote:
>
>
>
>> On Apr 19, 2021, at 12:50 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>>
>> On Thu, Apr 15, 2021 at 1:07 PM Mark Dilger
>> <mark.dilger@enterprisedb.com> wrote:
>>> I have added the verb "has" rather than "contains" because "has" is more consistent with the phrasing of other similar corruption reports.
>>
>> That makes sense.

I have refactored the patch to address your other concerns.  Breaking the patch into multiple pieces didn't add any clarity, but refactoring portions of it made things simpler to read, I think, so here it is as one patch file.



—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Attachments

Re: pg_amcheck contrib application

From
Robert Haas
Date
On Thu, Apr 22, 2021 at 7:28 PM Mark Dilger
<mark.dilger@enterprisedb.com> wrote:
> I have refactored the patch to address your other concerns.  Breaking the patch into multiple pieces didn't add any clarity, but refactoring portions of it made things simpler to read, I think, so here it is as one patch file.
 

I was hoping that this version was going to be smaller than the last
version, but instead it went from 300+ lines to 500+ lines.

The main thing I'm unhappy about in the status quo is the use of
chunkno in error messages. I have suggested several times making that
concept go away, because I think users will be confused. Here's a
minimal patch that does just that. It's 32 lines and results in a net
removal of 4 lines. It differs somewhat from my earlier suggestions,
because my priority here is to get reasonably understandable output
without needing a ton of code, and as I was working on this I found
that some of my earlier suggestions would have needed more code to
implement and I didn't think it bought enough to be worth it. It's
possible this is too simple, or that it's buggy, so let me know what
you think. But basically, I think what got committed before is
actually mostly fine and doesn't need major revision. It just needs
tidying up to avoid the confusing chunkno concept.

Now, the other thing we've talked about is adding a few more checks,
to verify for example that the toastrelid is what we expect, and I see
in your v22 you thought of a few other things. I think we can consider
those, possibly as things where we consider it tidying up loose ends
for v14, or else as improvements for v15. But I don't think that the
fairly large size of your patch comes primarily from additional
checks. I think it mostly comes from the code to produce error reports
getting a lot more complicated. I apologize if my comments have driven
that complexity, but they weren't intended to.

One tiny problem with the attached patch is that it does not make any
regression tests fail, which also makes it hard for me to tell if it
breaks anything, or if the existing code works. I don't know how
practical it is to do anything about that. Do you have a patch handy
that allows manual updates and deletes on TOAST tables, for manual
testing purposes?

-- 
Robert Haas
EDB: http://www.enterprisedb.com

Attachments

Re: pg_amcheck contrib application

From
Mark Dilger
Date

> On Apr 23, 2021, at 10:28 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Thu, Apr 22, 2021 at 7:28 PM Mark Dilger
> <mark.dilger@enterprisedb.com> wrote:
>> I have refactored the patch to address your other concerns.  Breaking the patch into multiple pieces didn't add any clarity, but refactoring portions of it made things simpler to read, I think, so here it is as one patch file.
>
> I was hoping that this version was going to be smaller than the last
> version, but instead it went from 300+ lines to 500+ lines.
>
> The main thing I'm unhappy about in the status quo is the use of
> chunkno in error messages. I have suggested several times making that
> concept go away, because I think users will be confused. Here's a
> minimal patch that does just that. It's 32 lines and results in a net
> removal of 4 lines. It differs somewhat from my earlier suggestions,
> because my priority here is to get reasonably understandable output
> without needing a ton of code, and as I was working on this I found
> that some of my earlier suggestions would have needed more code to
> implement and I didn't think it bought enough to be worth it. It's
> possible this is too simple, or that it's buggy, so let me know what
> you think. But basically, I think what got committed before is
> actually mostly fine and doesn't need major revision. It just needs
> tidying up to avoid the confusing chunkno concept.
>
> Now, the other thing we've talked about is adding a few more checks,
> to verify for example that the toastrelid is what we expect, and I see
> in your v22 you thought of a few other things. I think we can consider
> those, possibly as things where we consider it tidying up loose ends
> for v14, or else as improvements for v15. But I don't think that the
> fairly large size of your patch comes primarily from additional
> checks. I think it mostly comes from the code to produce error reports
> getting a lot more complicated. I apologize if my comments have driven
> that complexity, but they weren't intended to.
>
> One tiny problem with the attached patch is that it does not make any
> regression tests fail, which also makes it hard for me to tell if it
> breaks anything, or if the existing code works. I don't know how
> practical it is to do anything about that. Do you have a patch handy
> that allows manual updates and deletes on TOAST tables, for manual
> testing purposes?

Yes, I haven't been posting that with the patch, but I will test your patch and see what differs.

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company






Re: pg_amcheck contrib application

From
Mark Dilger
Date

> On Apr 23, 2021, at 10:31 AM, Mark Dilger <mark.dilger@enterprisedb.com> wrote:
>
> I will test your patch and see what differs.

Here are the differences between master and your patch:

UPDATE $toastname SET chunk_seq = chunk_seq + 1000 WHERE chunk_id = $value_id_to_corrupt

-                       qr/${header}toast value 16459 chunk 0 has sequence number 1000, but expected sequence number 0/,
-                       qr/${header}toast value 16459 chunk 1 has sequence number 1001, but expected sequence number 1/,
-                       qr/${header}toast value 16459 chunk 2 has sequence number 1002, but expected sequence number 2/,
-                       qr/${header}toast value 16459 chunk 3 has sequence number 1003, but expected sequence number 3/,
-                       qr/${header}toast value 16459 chunk 4 has sequence number 1004, but expected sequence number 4/,
-                       qr/${header}toast value 16459 chunk 5 has sequence number 1005, but expected sequence number 5/;
+               qr/${header}toast value 16459 index scan returned chunk 1000 when expecting chunk 0/,
+               qr/${header}toast value 16459 chunk 1000 follows last expected chunk 5/,
+               qr/${header}toast value 16459 chunk 1001 follows last expected chunk 5/,
+               qr/${header}toast value 16459 chunk 1002 follows last expected chunk 5/,
+               qr/${header}toast value 16459 chunk 1003 follows last expected chunk 5/,
+               qr/${header}toast value 16459 chunk 1004 follows last expected chunk 5/,
+               qr/${header}toast value 16459 chunk 1005 follows last expected chunk 5/;

UPDATE $toastname SET chunk_seq = chunk_seq * 1000 WHERE chunk_id = $value_id_to_corrupt

-                       qr/${header}toast value $value_id_to_corrupt chunk 1 has sequence number 1000, but expected sequence number 1/,
-                       qr/${header}toast value $value_id_to_corrupt chunk 2 has sequence number 2000, but expected sequence number 2/,
-                       qr/${header}toast value $value_id_to_corrupt chunk 3 has sequence number 3000, but expected sequence number 3/,
-                       qr/${header}toast value $value_id_to_corrupt chunk 4 has sequence number 4000, but expected sequence number 4/,
-                       qr/${header}toast value $value_id_to_corrupt chunk 5 has sequence number 5000, but expected sequence number 5/;
-
+               qr/${header}toast value 16460 index scan returned chunk 1000 when expecting chunk 1/,
+               qr/${header}toast value 16460 chunk 1000 follows last expected chunk 5/,
+               qr/${header}toast value 16460 index scan returned chunk 2000 when expecting chunk 1001/,
+               qr/${header}toast value 16460 chunk 2000 follows last expected chunk 5/,
+               qr/${header}toast value 16460 index scan returned chunk 3000 when expecting chunk 2001/,
+               qr/${header}toast value 16460 chunk 3000 follows last expected chunk 5/,
+               qr/${header}toast value 16460 index scan returned chunk 4000 when expecting chunk 3001/,
+               qr/${header}toast value 16460 chunk 4000 follows last expected chunk 5/,
+               qr/${header}toast value 16460 index scan returned chunk 5000 when expecting chunk 4001/,
+               qr/${header}toast value 16460 chunk 5000 follows last expected chunk 5/;

INSERT INTO $toastname (chunk_id, chunk_seq, chunk_data)
    (SELECT chunk_id,
            10*chunk_seq + 1000,
            chunk_data
        FROM $toastname
        WHERE chunk_id = $value_id_to_corrupt)

-                       qr/${header}toast value $value_id_to_corrupt chunk 6 has sequence number 1000, but expected sequence number 6/,
-                       qr/${header}toast value $value_id_to_corrupt chunk 7 has sequence number 1010, but expected sequence number 7/,
-                       qr/${header}toast value $value_id_to_corrupt chunk 8 has sequence number 1020, but expected sequence number 8/,
-                       qr/${header}toast value $value_id_to_corrupt chunk 9 has sequence number 1030, but expected sequence number 9/,
-                       qr/${header}toast value $value_id_to_corrupt chunk 10 has sequence number 1040, but expected sequence number 10/,
-                       qr/${header}toast value $value_id_to_corrupt chunk 11 has sequence number 1050, but expected sequence number 11/,
-                       qr/${header}toast value $value_id_to_corrupt was expected to end at chunk 6, but ended at chunk 12/;
+              qr/${header}toast value $value_id_to_corrupt index scan returned chunk 1000 when expecting chunk 6/,
+              qr/${header}toast value $value_id_to_corrupt chunk 1000 follows last expected chunk 5/,
+              qr/${header}toast value $value_id_to_corrupt index scan returned chunk 1010 when expecting chunk 1001/,
+              qr/${header}toast value $value_id_to_corrupt chunk 1010 follows last expected chunk 5/,
+              qr/${header}toast value $value_id_to_corrupt index scan returned chunk 1020 when expecting chunk 1011/,
+              qr/${header}toast value $value_id_to_corrupt chunk 1020 follows last expected chunk 5/,
+              qr/${header}toast value $value_id_to_corrupt index scan returned chunk 1030 when expecting chunk 1021/,
+              qr/${header}toast value $value_id_to_corrupt chunk 1030 follows last expected chunk 5/,
+              qr/${header}toast value $value_id_to_corrupt index scan returned chunk 1040 when expecting chunk 1031/,
+              qr/${header}toast value $value_id_to_corrupt chunk 1040 follows last expected chunk 5/,
+              qr/${header}toast value $value_id_to_corrupt index scan returned chunk 1050 when expecting chunk 1041/,
+              qr/${header}toast value $value_id_to_corrupt chunk 1050 follows last expected chunk 5/;


—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company






Re: pg_amcheck contrib application

From
Robert Haas
Date
On Fri, Apr 23, 2021 at 2:05 PM Mark Dilger
<mark.dilger@enterprisedb.com> wrote:
> Here are the differences between master and your patch:

Thanks. Those messages look reasonable to me.

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Re: pg_amcheck contrib application

From
Mark Dilger
Date

> On Apr 23, 2021, at 11:05 AM, Mark Dilger <mark.dilger@enterprisedb.com> wrote:
>
> Here are the differences between master and your patch:

Another difference I should probably mention is that a bunch of unrelated tests are failing with errors like:

    toast value 13465 chunk 0 has size 1995, but expected size 1996

which leads me to suspect your changes to how the size is calculated.

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company






Re: pg_amcheck contrib application

From
Robert Haas
Date
On Fri, Apr 23, 2021 at 2:15 PM Mark Dilger
<mark.dilger@enterprisedb.com> wrote:
> Another difference I should probably mention is that a bunch of unrelated tests are failing with errors like:
>
>     toast value 13465 chunk 0 has size 1995, but expected size 1996
>
> which leads me to suspect your changes to how the size is calculated.

That seems like a pretty reasonable suspicion, but I can't see the problem:

-       expected_size = curchunk < endchunk ? TOAST_MAX_CHUNK_SIZE
-               : VARATT_EXTERNAL_GET_EXTSIZE(ta->toast_pointer) - (endchunk * TOAST_MAX_CHUNK_SIZE);
+       expected_size = chunk_seq < last_chunk_seq ? TOAST_MAX_CHUNK_SIZE
+               : extsize % TOAST_MAX_CHUNK_SIZE;

What's different?

1. The variables are renamed.

2. It uses a new variable extsize instead of recomputing
VARATT_EXTERNAL_GET_EXTSIZE(ta->toast_pointer), but I think that
should have the same value.

3. I used modulo arithmetic (%) instead of subtracting endchunk *
TOAST_MAX_CHUNK_SIZE.

Is TOAST_MAX_CHUNK_SIZE 1996? How long a value did you insert?

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Re: pg_amcheck contrib application

From
Mark Dilger
Date

> On Apr 23, 2021, at 11:29 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Fri, Apr 23, 2021 at 2:15 PM Mark Dilger
> <mark.dilger@enterprisedb.com> wrote:
>> Another difference I should probably mention is that a bunch of unrelated tests are failing with errors like:
>>
>>    toast value 13465 chunk 0 has size 1995, but expected size 1996
>>
>> which leads me to suspect your changes to how the size is calculated.
>
> That seems like a pretty reasonable suspicion, but I can't see the problem:
>
> -       expected_size = curchunk < endchunk ? TOAST_MAX_CHUNK_SIZE
> -               : VARATT_EXTERNAL_GET_EXTSIZE(ta->toast_pointer) - (endchunk * TOAST_MAX_CHUNK_SIZE);
> +       expected_size = chunk_seq < last_chunk_seq ? TOAST_MAX_CHUNK_SIZE
> +               : extsize % TOAST_MAX_CHUNK_SIZE;
>
> What's different?
>
> 1. The variables are renamed.
>
> 2. It uses a new variable extsize instead of recomputing
> VARATT_EXTERNAL_GET_EXTSIZE(ta->toast_pointer), but I think that
> should have the same value.
>
> 3. I used modulo arithmetic (%) instead of subtracting endchunk *
> TOAST_MAX_CHUNK_SIZE.
>
> Is TOAST_MAX_CHUNK_SIZE 1996? How long a value did you insert?

On my laptop, yes, 1996 is TOAST_MAX_CHUNK_SIZE.

I'm not inserting anything.  These failures come from just regular tests that I have not changed.  I just applied your patch and ran `make check-world`, and these fail in src/bin/pg_amcheck.

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company






Re: pg_amcheck contrib application

From
Mark Dilger
Date

> On Apr 23, 2021, at 11:29 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>
> +       expected_size = chunk_seq < last_chunk_seq ? TOAST_MAX_CHUNK_SIZE
> +               : extsize % TOAST_MAX_CHUNK_SIZE;
>
> What's different?

For one thing, if a sequence of chunks happens to fit perfectly, the final chunk will have size TOAST_MAX_CHUNK_SIZE, but you're expecting no larger than one less than that, given how modulo arithmetic works.
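
Concretely, a quick check (assuming TOAST_MAX_CHUNK_SIZE = 1996 as on this machine, and a value that fills exactly two chunks):

    #include <stdio.h>

    int
    main(void)
    {
        const int max = 1996;            /* TOAST_MAX_CHUNK_SIZE here        */
        const int extsize = 2 * max;     /* value fits in exactly two chunks */

        /* the final chunk really holds 1996 bytes ... */
        printf("actual last chunk size: %d\n",
               extsize - (extsize - 1) / max * max);
        /* ... but the modulo expression expects 0 */
        printf("extsize %% max = %d\n", extsize % max);
        return 0;
    }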

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company






Re: pg_amcheck contrib application

From
Robert Haas
Date
On Fri, Apr 23, 2021 at 2:36 PM Mark Dilger
<mark.dilger@enterprisedb.com> wrote:
> > What's different?
>
> For one thing, if a sequence of chunks happens to fit perfectly, the final chunk will have size TOAST_MAX_CHUNK_SIZE, but you're expecting no larger than one less than that, given how modulo arithmetic works.
 

Good point.

Perhaps something like this, closer to the way you had it?

       expected_size = chunk_seq < last_chunk_seq ? TOAST_MAX_CHUNK_SIZE
               : extsize - (last_chunk_seq * TOAST_MAX_CHUNK_SIZE);
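
For the perfect-fit case that comes out right, under the assumption that last_chunk_seq is derived as (extsize - 1) / TOAST_MAX_CHUNK_SIZE: with extsize = 3992 and TOAST_MAX_CHUNK_SIZE = 1996, last_chunk_seq = 1, and the expected size of the final chunk is 3992 - 1 * 1996 = 1996, as it should be.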

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Re: pg_amcheck contrib application

From
Mark Dilger
Date

> On Apr 23, 2021, at 1:31 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>
> Perhaps something like this, closer to the way you had it?
>
>       expected_size = chunk_seq < last_chunk_seq ? TOAST_MAX_CHUNK_SIZE
>               : extsize - (last_chunk_seq * TOAST_MAX_CHUNK_SIZE);

It still suffers the same failures.  I'll try to post something that accomplishes the changes to the reports that you are looking for.

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company






Re: pg_amcheck contrib application

From
Mark Dilger
Date

> On Apr 23, 2021, at 3:01 PM, Mark Dilger <mark.dilger@enterprisedb.com> wrote:
>
> I'll try to post something that accomplishes the changes to the reports that you are looking for.

The attached patch changes amcheck corruption reports as discussed upthread.  This patch is submitted for the v14 development cycle as a bug fix, per your complaint that the committed code generates reports sufficiently confusing to a user as to constitute a bug.

All other code refactoring and additional checks discussed upthread are reserved for the v15 development cycle and are not included here.

The minimal patch (not attached) that does not rename any variables is 135 lines.  Your patch was 159 lines.  The patch (attached) which includes your variable renaming is 174 lines.



—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Attachments

Re: pg_amcheck contrib application

From
Robert Haas
Date
On Mon, Apr 26, 2021 at 1:52 PM Mark Dilger
<mark.dilger@enterprisedb.com> wrote:
> The attached patch changes amcheck corruption reports as discussed upthread.  This patch is submitted for the v14 development cycle as a bug fix, per your complaint that the committed code generates reports sufficiently confusing to a user as to constitute a bug.
>
> All other code refactoring and additional checks discussed upthread are reserved for the v15 development cycle and are not included here.
>
> The minimal patch (not attached) that does not rename any variables is 135 lines.  Your patch was 159 lines.  The patch (attached) which includes your variable renaming is 174 lines.

Hi,

I have compared this against my version. I found the following differences:

1. This version passes last_chunk_seq rather than extsize to
check_toast_tuple(). But this results in having to call
VARATT_EXTERNAL_GET_EXTSIZE() inside that function. I thought it was
nicer to do that in the caller, so that we don't do it twice.

2. You fixed some out-of-date comments.

3. You move the test for an unexpected chunk sequence further down in
the function. I don't see the point; I had put it by the related null
check, and still think that's better. You also deleted my comment /*
Either the TOAST index is corrupt, or we don't have all chunks. */
which I would have preferred to keep.

4. You don't return if chunk_seq > last_chunk_seq. That seems wrong,
because we cannot compute a sensible expected size in that case. I
think your code will subtract a larger value from a smaller one and,
this being unsigned arithmetic, say that the expected chunk size is
something gigantic. Returning and not issuing that complaint at all
seems better.

5. You fixed the incorrect formula I had introduced for the expected
size of the last chunk.

6. You changed the variable name in check_toasted_attribute() from
expected_chunkno to chunkno, and initialized it later in the function
instead of at declaration time. I don't find this to be an
improvement; including the word "expected" seems to me to be
substantially clearer. But I think I should have gone with
expected_chunk_seq for better consistency.

7. You restored the message "toast value %u was expected to end at
chunk %d, but ended at chunk %d" which my version deleted. I deleted
that message because I thought it was redundant, but I guess it's not:
there's nothing else to complain about if the sequence of chunks ends early.
I think we should change the test from != to < though, because if it's >,
then we must have already complained about unexpected chunks. Also,
I think the message is actually wrong, because even though you renamed
the variable, it still ends up being the expected next chunkno rather
than the last chunkno we actually saw.

PFA my counter-proposal based on the above analysis.

--
Robert Haas
EDB: http://www.enterprisedb.com

Attachments

Re: pg_amcheck contrib application

From
Mark Dilger
Date

> On Apr 30, 2021, at 9:39 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Mon, Apr 26, 2021 at 1:52 PM Mark Dilger
> <mark.dilger@enterprisedb.com> wrote:
>> The attached patch changes amcheck corruption reports as discussed upthread.  This patch is submitted for the v14 development cycle as a bug fix, per your complaint that the committed code generates reports sufficiently confusing to a user as to constitute a bug.
>>
>> All other code refactoring and additional checks discussed upthread are reserved for the v15 development cycle and are not included here.
>>
>> The minimal patch (not attached) that does not rename any variables is 135 lines.  Your patch was 159 lines.  The patch (attached) which includes your variable renaming is 174 lines.
>
> Hi,
>
> I have compared this against my version. I found the following differences:

Just to be clear, I did not use your patch v1 as the starting point.  I took the code as committed to master as the starting point, used your corruption report verbiage changes and at least some of your variable naming choices, but did not use the rest, in large part because it didn't work.  It caused corruption messages to be reported against tables that have no corruption.  For that matter, your v2 patch doesn't work either, and in the same way.  To wit:

  heap table "postgres"."pg_catalog"."pg_rewrite", block 6, offset 4, attribute 7:
     toast value 13461 chunk 0 has size 1995, but expected size 1996

I think there is something wrong with the way you are trying to calculate and use extsize, because I'm not corrupting pg_catalog.pg_rewrite.  You can get these same results by applying your patch to master, building, and running 'make check' from src/bin/pg_amcheck/.


> 1. This version passes last_chunk_seq rather than extsize to
> check_toast_tuple(). But this results in having to call
> VARATT_EXTERNAL_GET_EXTSIZE() inside that function. I thought it was
> nicer to do that in the caller, so that we don't do it twice.

I don't see that VARATT_EXTERNAL_GET_EXTSIZE() is worth too much concern, given that it is just a struct access and a bit mask.  You are avoiding calculating that twice, but at the expense of calculating last_chunk_seq twice, which involves division.  I don't think the division can be optimized as a mere bit shift, since TOAST_MAX_CHUNK_SIZE is not in general a power of two.  (For example, on my laptop it is 1996.)

I don't say this to nitpick at the performance one way vs. the other.  I doubt it makes any real difference.  I'm just confused why you want to change this particular thing right now, given that it is not a bug.

> 2. You fixed some out-of-date comments.

Yes, because they were wrong.  That's on me.  I failed to update them in a prior patch.

> 3. You move the test for an unexpected chunk sequence further down in
> the function. I don't see the point;

Relative to your patch, perhaps.  Relative to master, no tests have been moved.

> I had put it by the related null
> check, and still think that's better. You also deleted my comment /*
> Either the TOAST index is corrupt, or we don't have all chunks. */
> which I would have preferred to keep.

That's fine.  I didn't mean to remove it.  I was just taking a minimalist approach to constructing the patch.

> 4. You don't return if chunk_seq > last_chunk_seq. That seems wrong,
> because we cannot compute a sensible expected size in that case. I
> think your code will subtract a larger value from a smaller one and,
> this being unsigned arithmetic, say that the expected chunk size is
> something gigantic.

Your conclusion is probably right, but I think your analysis is based on a misreading of what "last_chunk_seq" means.
It's not the last one seen, but the last one expected.  (Should we rename the variable to avoid confusion?)  It won't
compute a gigantic size.  Rather, it will expect *every* chunk with chunk_seq >= last_chunk_seq to have whatever size
is appropriate for the last chunk.

> Returning and not issuing that complaint at all
> seems better.

That might be best.  I had been resisting that because I don't want the extraneous chunks to be reported without chunk
size information.  When debugging corrupted toast, it may be interesting to know the size of the extraneous chunks.
If there are 1000 extra chunks, somebody might want to see the sizes of them.

> 5. You fixed the incorrect formula I had introduced for the expected
> size of the last chunk.

Not really.  I just didn't introduce any change in that area.

> 6. You changed the variable name in check_toasted_attribute() from
> expected_chunkno to chunkno, and initialized it later in the function
> instead of at declaration time. I don't find this to be an
> improvement;

I think I just left the variable name and its initialization unchanged.

> including the word "expected" seems to me to be
> substantially clearer. But I think I should have gone with
> expected_chunk_seq for better consistency.

I agree that is a better name.

> 7. You restored the message "toast value %u was expected to end at
> chunk %d, but ended at chunk %d" which my version deleted. I deleted
> that message because I thought it was redundant, but I guess it's not:
> there's nothing else to complain about if the sequence of chunks ends early.
> I think we should change the test from != to < though, because if it's
> >, then we must have already complained about unexpected chunks.

We can do it that way if you like.  I considered that and had trouble deciding whether it made things less clear to
users who might be less familiar with the structure of toasted attributes.  If some of the attributes have that
message and others don't, they might conclude that only some of the attributes ended at the wrong chunk and fail to
make the inference that to you or me is obvious.

> Also,
> I think the message is actually wrong, because even though you renamed
> the variable, it still ends up being the expected next chunkno rather
> than the last chunkno we actually saw.

If we have seen any chunks, the variable is holding the expected next chunk seq, which is one greater than the last
chunk seq we saw.

If we expect chunks 0..3 and see chunk 0 but not chunk 1, it will complain ..."expected to end at chunk 4, but ended
at chunk 1".  This is clearly by design and not merely a bug, though I tend to agree with you that this is a strange
wording choice.  I can't remember exactly when and how we decided to word the message this way, but it has annoyed me
for a while, and I assumed it was something you suggested a while back, because I don't recall doing it.  Either way,
since you seem to also be bothered by this, I agree we should change it.

> PFA my counter-proposal based on the above analysis.




—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company






Re: pg_amcheck contrib application

From: Robert Haas
Date:
On Fri, Apr 30, 2021 at 2:31 PM Mark Dilger
<mark.dilger@enterprisedb.com> wrote:
> Just to be clear, I did not use your patch v1 as the starting point.

I thought that might be the case, but I was trying to understand what
you didn't like about my version, and comparing them seemed like a way
to figure that out.

> I took the code as committed to master as the starting point, used your corruption report verbiage changes and at
> least some of your variable naming choices, but did not use the rest, in large part because it didn't work.  It
> caused corruption messages to be reported against tables that have no corruption.  For that matter, your v2 patch
> doesn't work either, and in the same way.  To wit:
>
>   heap table "postgres"."pg_catalog"."pg_rewrite", block 6, offset 4, attribute 7:
>      toast value 13461 chunk 0 has size 1995, but expected size 1996
>
> I think there is something wrong with the way you are trying to calculate and use extsize, because I'm not
> corrupting pg_catalog.pg_rewrite.  You can get these same results by applying your patch to master, building, and
> running 'make check' from src/bin/pg_amcheck/.

Argh, OK, I didn't realize. Should be fixed in this version.

> > 4. You don't return if chunk_seq > last_chunk_seq. That seems wrong,
> > because we cannot compute a sensible expected size in that case. I
> > think your code will subtract a larger value from a smaller one and,
> > this being unsigned arithmetic, say that the expected chunk size is
> > something gigantic.
>
> Your conclusion is probably right, but I think your analysis is based on a misreading of what "last_chunk_seq"
> means.  It's not the last one seen, but the last one expected.  (Should we rename the variable to avoid confusion?)
> It won't compute a gigantic size.  Rather, it will expect *every* chunk with chunk_seq >= last_chunk_seq to have
> whatever size is appropriate for the last chunk.

I realize it's the last one expected. That's the point: we don't have
any expectation for the sizes of chunks higher than the last one we
expected to see. If the value is 2000 bytes and the chunk size is 1996
bytes, we expect chunk 0 to be 1996 bytes and chunk 1 to be 4 bytes.
If not, we can complain. But it makes no sense to complain about chunk
2 being of a size we don't expect. We don't expect it to exist in the
first place, so we have no notion of what size it ought to be.
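
In rough code form (a sketch only, not the patch itself, using the variable names we've been tossing around):

    /*
     * Example: a 2000-byte value with TOAST_MAX_CHUNK_SIZE = 1996 gives
     * last_chunk_seq = (2000 - 1) / 1996 = 1, so chunk 0 should be 1996
     * bytes and chunk 1 should be 4 bytes.
     */
    if (chunk_seq > last_chunk_seq)
        return;                 /* already reported as unexpected; there
                                 * is no sensible expected size for it */

    expected_size = chunk_seq < last_chunk_seq
        ? TOAST_MAX_CHUNK_SIZE
        : extsize - (uint32) last_chunk_seq * TOAST_MAX_CHUNK_SIZE;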

> If we have seen any chunks, the variable is holding the expected next chunk seq, which is one greater than the last
> chunk seq we saw.
>
> If we expect chunks 0..3 and see chunk 0 but not chunk 1, it will complain ..."expected to end at chunk 4, but
> ended at chunk 1".  This is clearly by design and not merely a bug, though I tend to agree with you that this is a
> strange wording choice.  I can't remember exactly when and how we decided to word the message this way, but it has
> annoyed me for a while, and I assumed it was something you suggested a while back, because I don't recall doing it.
> Either way, since you seem to also be bothered by this, I agree we should change it.

Can you review this version?

--
Robert Haas
EDB: http://www.enterprisedb.com

Attachments

Re: pg_amcheck contrib application

From: Mark Dilger
Date:

> On Apr 30, 2021, at 11:56 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Fri, Apr 30, 2021 at 2:31 PM Mark Dilger
> <mark.dilger@enterprisedb.com> wrote:
>> Just to be clear, I did not use your patch v1 as the starting point.
>
> I thought that might be the case, but I was trying to understand what
> you didn't like about my version, and comparing them seemed like a way
> to figure that out.
>
>> I took the code as committed to master as the starting point, used your corruption report verbiage changes and at
>> least some of your variable naming choices, but did not use the rest, in large part because it didn't work.  It
>> caused corruption messages to be reported against tables that have no corruption.  For that matter, your v2 patch
>> doesn't work either, and in the same way.  To wit:
>>
>>  heap table "postgres"."pg_catalog"."pg_rewrite", block 6, offset 4, attribute 7:
>>     toast value 13461 chunk 0 has size 1995, but expected size 1996
>>
>> I think there is something wrong with the way you are trying to calculate and use extsize, because I'm not
>> corrupting pg_catalog.pg_rewrite.  You can get these same results by applying your patch to master, building, and
>> running 'make check' from src/bin/pg_amcheck/.
>
> Argh, OK, I didn't realize. Should be fixed in this version.
>
>>> 4. You don't return if chunk_seq > last_chunk_seq. That seems wrong,
>>> because we cannot compute a sensible expected size in that case. I
>>> think your code will subtract a larger value from a smaller one and,
>>> this being unsigned arithmetic, say that the expected chunk size is
>>> something gigantic.
>>
>> Your conclusion is probably right, but I think your analysis is based on a misreading of what "last_chunk_seq"
>> means.  It's not the last one seen, but the last one expected.  (Should we rename the variable to avoid
>> confusion?)  It won't compute a gigantic size.  Rather, it will expect *every* chunk with chunk_seq >=
>> last_chunk_seq to have whatever size is appropriate for the last chunk.
>
> I realize it's the last one expected. That's the point: we don't have
> any expectation for the sizes of chunks higher than the last one we
> expected to see. If the value is 2000 bytes and the chunk size is 1996
> bytes, we expect chunk 0 to be 1996 bytes and chunk 1 to be 4 bytes.
> If not, we can complain. But it makes no sense to complain about chunk
> 2 being of a size we don't expect. We don't expect it to exist in the
> first place, so we have no notion of what size it ought to be.
>
>> If we have seen any chunks, the variable is holding the expected next chunk seq, which is one greater than the
>> last chunk seq we saw.
>>
>> If we expect chunks 0..3 and see chunk 0 but not chunk 1, it will complain ..."expected to end at chunk 4, but
>> ended at chunk 1".  This is clearly by design and not merely a bug, though I tend to agree with you that this is a
>> strange wording choice.  I can't remember exactly when and how we decided to word the message this way, but it has
>> annoyed me for a while, and I assumed it was something you suggested a while back, because I don't recall doing it.
>> Either way, since you seem to also be bothered by this, I agree we should change it.
>
> Can you review this version?
>
> --
> Robert Haas
> EDB: http://www.enterprisedb.com
> <simply-remove-chunkno-concept-v3.patch>

As requested off-list, here are NOT FOR COMMIT, WIP patches for testing only.

The first patch allows toast tables to be updated and adds regression tests of corrupted toasted attributes.  I never
quite got deletes from toast tables to work, and there are probably other gotchas still lurking even with inserts and
updates, but it limps along well enough for testing pg_amcheck.

The second patch updates the expected output of pg_amcheck to match the verbiage that you suggested upthread.



—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Attachments

Re: pg_amcheck contrib application

From: Mark Dilger
Date:

> On Apr 30, 2021, at 11:56 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>
> Can you review this version?

It looks mostly good to me.  There is an off-by-one error introduced with:

-   else if (chunkno != (endchunk + 1))
+   else if (expected_chunk_seq < last_chunk_seq)

I think that needs to be

+   else if (expected_chunk_seq <= last_chunk_seq)

because otherwise it won't complain if the only missing chunk is the very last one.
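
Sketching the end-of-scan check with the same variable names (the reporting call is schematic, not the exact code):

    /*
     * Chunks run from 0 through last_chunk_seq inclusive, and
     * expected_chunk_seq holds the next chunk we were still hoping to
     * see, so an intact value finishes the index scan with
     * expected_chunk_seq == last_chunk_seq + 1.  With '<', a scan that
     * stops at expected_chunk_seq == last_chunk_seq (only the final
     * chunk missing) would pass silently.
     */
    if (expected_chunk_seq <= last_chunk_seq)
        report_corruption(ctx,
                          psprintf("toast value %u index scan ended early while expecting chunk %d of %d",
                                   ta->toast_pointer.va_valueid,
                                   expected_chunk_seq, last_chunk_seq));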

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company






Re: pg_amcheck contrib application

From: Robert Haas
Date:
On Fri, Apr 30, 2021 at 3:26 PM Mark Dilger
<mark.dilger@enterprisedb.com> wrote:
> It looks mostly good to me.  There is an off-by-one error introduced with:
>
> -   else if (chunkno != (endchunk + 1))
> +   else if (expected_chunk_seq < last_chunk_seq)
>
> I think that needs to be
>
> +   else if (expected_chunk_seq <= last_chunk_seq)
>
> because otherwise it won't complain if the only missing chunk is the very last one.

OK, how about this version?

-- 
Robert Haas
EDB: http://www.enterprisedb.com

Attachments

Re: pg_amcheck contrib application

From: Mark Dilger
Date:

> On Apr 30, 2021, at 12:29 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>
> OK, how about this version?

I think that's committable.

The only nitpick might be

-                               psprintf("toast value %u was expected to end at chunk %d, but ended at chunk %d",
+                               psprintf("toast value %u index scan ended early while expecting chunk %d of %d",

When reporting to users about positions within a zero-based indexing scheme, what does "while expecting chunk 3 of 4"
mean?  Is it talking about the last chunk from the set [0..3], which has cardinality 4, or does it mean the
next-to-last chunk from [0..4], which ends with chunk 4, or what?  The prior language isn't any more clear than what
you have here, so I have no objection to committing this, but the prior language was probably as goofy as it was
because it was trying to deal with this issue.

Thoughts?

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company






Re: pg_amcheck contrib application

From: Robert Haas
Date:
On Fri, Apr 30, 2021 at 3:41 PM Mark Dilger
<mark.dilger@enterprisedb.com> wrote:
> I think that's committable.
>
> The only nitpick might be
>
> -                               psprintf("toast value %u was expected to end at chunk %d, but ended at chunk %d",
> +                               psprintf("toast value %u index scan ended early while expecting chunk %d of %d",
>
> When reporting to users about positions within a zero-based indexing scheme, what does "while expecting chunk 3 of
> 4" mean?  Is it talking about the last chunk from the set [0..3], which has cardinality 4, or does it mean the
> next-to-last chunk from [0..4], which ends with chunk 4, or what?  The prior language isn't any more clear than
> what you have here, so I have no objection to committing this, but the prior language was probably as goofy as it
> was because it was trying to deal with this issue.

Hmm, I think that might need adjustment, actually. What I was trying
to do is compensate for the fact that what we now have is the next
chunk_seq value we expect, not the last one we saw, nor the total
number of chunks we've seen regardless of what chunk_seq they had. But
I thought it would be too confusing to just give the chunk number we
were expecting and not say anything about how many chunks we thought
there would be in total. So maybe what I should do is change it to
something like this:

toast value %u was expected to end at chunk %d, but ended while
expecting chunk %d

i.e. same as the currently-committed code, except for changing "ended
at" to "ended while expecting."

--
Robert Haas
EDB: http://www.enterprisedb.com



Re: pg_amcheck contrib application

From: Mark Dilger
Date:

> On Apr 30, 2021, at 12:47 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>
> Hmm, I think that might need adjustment, actually. What I was trying
> to do is compensate for the fact that what we now have is the next
> chunk_seq value we expect, not the last one we saw, nor the total
> number of chunks we've seen regardless of what chunk_seq they had. But
> I thought it would be too confusing to just give the chunk number we
> were expecting and not say anything about how many chunks we thought
> there would be in total. So maybe what I should do is change it to
> something like this:
>
> toast value %u was expected to end at chunk %d, but ended while
> expecting chunk %d
>
> i.e. same as the currently-committed code, except for changing "ended
> at" to "ended while expecting."

I find the grammar of this new formulation anomalous, for hard-to-articulate reasons not quite the same as, but akin
to, mismatched verb aspect.

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company






Re: pg_amcheck contrib application

From: Mark Dilger
Date:

> On Apr 30, 2021, at 1:04 PM, Mark Dilger <mark.dilger@enterprisedb.com> wrote:
>
>> toast value %u was expected to end at chunk %d, but ended while
>> expecting chunk %d
>>
>> i.e. same as the currently-committed code, except for changing "ended
>> at" to "ended while expecting."
>
> I find the grammar of this new formulation anomalous, for hard-to-articulate reasons not quite the same as, but
> akin to, mismatched verb aspect.

After further reflection, no other verbiage seems any better.  I'd say go ahead and commit it this way.

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company






Re: pg_amcheck contrib application

From: Robert Haas
Date:
On Fri, Apr 30, 2021 at 4:26 PM Mark Dilger
<mark.dilger@enterprisedb.com> wrote:
> After further reflection, no other verbiage seems any better.  I'd say go ahead and commit it this way.

OK. I'll plan to do that on Monday, barring objections.

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Re: pg_amcheck contrib application

From: Robert Haas
Date:
On Fri, Apr 30, 2021 at 5:07 PM Robert Haas <robertmhaas@gmail.com> wrote:
> On Fri, Apr 30, 2021 at 4:26 PM Mark Dilger
> <mark.dilger@enterprisedb.com> wrote:
> > After further reflection, no other verbiage seems any better.  I'd say go ahead and commit it this way.
>
> OK. I'll plan to do that on Monday, barring objections.

Done now.

-- 
Robert Haas
EDB: http://www.enterprisedb.com