Re: new heapcheck contrib module

Поиск
Список
Период
Сортировка
От Mark Dilger
Тема Re: new heapcheck contrib module
Дата
Msg-id A4C9E0AE-46BA-457F-A445-4527E5AA5E3C@enterprisedb.com
обсуждение исходный текст
Ответ на Re: new heapcheck contrib module  (Robert Haas <robertmhaas@gmail.com>)
Ответы Re: new heapcheck contrib module  (Mark Dilger <mark.dilger@enterprisedb.com>)
Re: new heapcheck contrib module  (Mark Dilger <mark.dilger@enterprisedb.com>)
Re: new heapcheck contrib module  (Peter Geoghegan <pg@bowt.ie>)
Список pgsql-hackers
Robert, Peter, Andrey, Stephen, and Michael,

Attached is a new version based in part on your review comments, quoted and responded to below as necessary.

There remain a few open issues and/or things I did not implement:

- This version follows Robert's suggestion of using pg_class_aclcheck() to check that the caller has permission to
selectfrom the table being checked.  This is inconsistent with the btree checking logic, which does no such check.
Thesetwo approaches should be reconciled, but there was apparently no agreement on this issue. 

- The public facing documentation, currently live at https://www.postgresql.org/docs/13/amcheck.html, claims "amcheck
functionsmay only be used by superusers."  The docs on master still say the same.  This patch replaces that language
withalternate language explaining that execute permissions may be granted to non-superusers, along with a warning about
therisk of data leakage.  Perhaps some portion of that language in this patch should be back-patched? 

- Stephen's comments about restricting how much information goes into the returned corruption report depending on the
permissionsof the caller has not been implemented.  I may implement some of this if doing so is consistent with
whateverwe decide to do for the aclcheck issue, above, though probably not.  It seems overly complicated. 

- This version does not change clog handling, which leaves Andrey's concern unaddressed.  Peter also showed some
supportfor (or perhaps just a lack of opposition to) doing more of what Andrey suggests.  I may come back to this
issue,depending on time available and further feedback. 


Moving on to Michael's review....

> On Sep 28, 2020, at 10:56 PM, Michael Paquier <michael@paquier.xyz> wrote:
>
> On Tue, Aug 25, 2020 at 07:36:53AM -0700, Mark Dilger wrote:
>> Removed.
>
> This patch is failing to compile on Windows:
> C:\projects\postgresql\src\include\fe_utils/print.h(18): fatal error
>  C1083: Cannot open include file: 'libpq-fe.h': No such file or
>  directory [C:\projects\postgresql\pg_amcheck.vcxproj]
>
> It looks like you forgot to tweak the scripts in src/tools/msvc/.

Fixed, I think.  I have not tested on windows.


Moving on to Stephen's review....

> On Sep 23, 2020, at 6:46 AM, Stephen Frost <sfrost@snowman.net> wrote:
>
> Greetings,
>
> * Peter Geoghegan (pg@bowt.ie) wrote:
>> On Tue, Sep 22, 2020 at 12:41 PM Robert Haas <robertmhaas@gmail.com> wrote:
>>> But now I see that there's no secondary permission check in the
>>> verify_nbtree.c code. Is that intentional? Peter, what's the
>>> justification for that?
>>
>> As noted by comments in contrib/amcheck/sql/check_btree.sql (the
>> verify_nbtree.c tests), this is intentional. Note that we explicitly
>> test that a non-superuser role can perform verification following
>> GRANT EXECUTE ON FUNCTION ... .
>
>> As I mentioned earlier, this is supported (or at least it is supported
>> in my interpretation of things). It just isn't documented anywhere
>> outside the test itself.
>
> Would certainly be good to document this but I tend to agree with the
> comments that ideally-
>
> a) it'd be nice for a relatively low-privileged user/process could run
>   the tests in an ongoing manner
> b) we don't want to add more is-superuser checks
> c) users shouldn't really be given the ability to see rows they're not
>   supposed to have access to
>
> In other places in the code, when an error is generated and the user
> doesn't have access to the underlying table or doesn't have BYPASSRLS,
> we don't include the details or the actual data in the error.  Perhaps
> that approach would make sense here (or perhaps not, but it doesn't seem
> entirely crazy to me, anyway).  In other words:
>
> a) keep the ability for someone who has EXECUTE on the function to be
>   able to run the function against any relation
> b) when we detect an issue, perform a permissions check to see if the
>   user calling the function has rights to read the rows of the table
>   and, if RLS is enabled on the table, if they have BYPASSRLS
> c) if the user has appropriate privileges, log the detailed error, if
>   not, return a generic error with a HINT that details weren't
>   available due to lack of privileges on the relation
>
> I can appreciate the concerns regarding dead rows ending up being
> visible to someone who wouldn't normally be able to see them but I'd
> argue we could simply document that fact rather than try to build
> something to address it, for this particular case.  If there's push back
> on that then I'd suggest we have a "can read dead rows" or some such
> capability that can be GRANT'd (in the form of a default role, I would
> think) which a user would also have to have in order to get detailed
> error reports from this function.

There wasn't enough agreement on the thread about how this should work, so I left this idea unimplemented.

I'm a bit concerned that restricting the results for non-superusers would create a perverse incentive to use a
superuserrole to connect and check tables.  On the other hand, there would not be any difference in the output in the
commoncase that no corruption exists, so maybe the perverse incentive would not be too significant. 

Implementing the idea you outline would complicate the patch a fair amount, as we'd need to tailor all the reports in
thisway, and extend the tests to verify we're not leaking any information to non-superusers.  I would prefer to find a
simplersolution. 


Moving on to Robert's review....

> On Sep 21, 2020, at 2:09 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Tue, Aug 25, 2020 at 10:36 AM Mark Dilger
> <mark.dilger@enterprisedb.com> wrote:
>> Thanks for the review!
>
> +                                                         msg OUT text
> +                                                         )
>
> Looks like atypical formatting.
>
> +REVOKE ALL ON FUNCTION
> +verify_heapam(regclass, boolean, boolean, cstring, bigint, bigint)
> +FROM PUBLIC;
>
> This too.

Changed in this next version.

> +-- Don't want this to be available to public
>
> Add "by default, but superusers can grant access" or so?

Hmm.  I borrowed the verbiage from elsewhere.

contrib/pg_buffercache/pg_buffercache--1.2.sql:-- Don't want these to be available to public.
contrib/pg_freespacemap/pg_freespacemap--1.1.sql:-- Don't want these to be available to public.
contrib/pg_visibility/pg_visibility--1.1.sql:-- Don't want these to be available to public.

> I think there should be a call to pg_class_aclcheck() here, just like
> the one in pg_prewarm, so that if the superuser does choose to grant
> access, users given access can check tables they anyway have
> permission to access, but not others. Maybe put that in
> check_relation_relkind_and_relam() and rename it. Might want to look
> at the pg_surgery precedent, too.

I don't think there are any great options here, but for this next version I've done it with pg_class_aclcheck().

> Oh, and that functions header
> comment is also wrong.

Changed in this next version.

> I think that the way the checks on the block range are performed could
> be improved. Generally, we want to avoid reporting the same problem
> with a variety of different message strings, because it adds burden
> for translators and is potentially confusing for users. You've got two
> message strings that are only going to be used for empty relations and
> a third message string that is only going to be used for non-empty
> relations. What stops you from just ripping off the way that this is
> done in pg_prewarm, which requires only 2 messages? Then you'd be
> adding a net total of 0 new messages instead of 3, and in my view they
> would be clearer than your third message, "block range is out of
> bounds for relation with block count %u: " INT64_FORMAT " .. "
> INT64_FORMAT, which doesn't say very precisely what the problem is,
> and also falls afoul of our usual practice of avoiding the use of
> INT64_FORMAT in error messages that are subject to translation. I
> notice that pg_prewarm just silently does nothing if the start and end
> blocks are swapped, rather than generating an error. We could choose
> to do differently here, but I'm not sure why we should bother.

This next version borrows pg_prewarm's messages as you suggest, except that pg_prewarm embeds INT64_FORMAT in the
messagestrings, which are replaced with %u in this next patch.  Also, there is no good way to report an invalid block
rangefor empty tables using these messages, so the patch now just exists early in such a case for invalid ranges
withoutthrowing an error.  This is a little bit non-orthogonal with how invalid block ranges are handled on non-empty
tables,but perhaps that's ok.  

>
> +                       all_frozen = mapbits & VISIBILITYMAP_ALL_VISIBLE;
> +                       all_visible = mapbits & VISIBILITYMAP_ALL_FROZEN;
> +
> +                       if ((all_frozen && skip_option ==
> SKIP_PAGES_ALL_FROZEN) ||
> +                               (all_visible && skip_option ==
> SKIP_PAGES_ALL_VISIBLE))
> +                       {
> +                               continue;
> +                       }
>
> This isn't horrible style, but why not just get rid of the local
> variables? e.g. if (skip_option == SKIP_PAGES_ALL_FROZEN) { if
> ((mapbits & VISIBILITYMAP_ALL_FROZEN) != 0) continue; } else { ... }
>
> Typically no braces around a block containing only one line.

Changed in this next version.

> + * table contains corrupt all frozen bits, a concurrent vacuum might skip the
>
> all-frozen?

Changed in this next version.

> + * relfrozenxid beyond xid.) Reporting the xid as valid under such conditions
> + * seems acceptable, since if we had checked it earlier in our scan it would
> + * have truly been valid at that time, and we break no MVCC guarantees by
> + * failing to notice the concurrent change in its status.
>
> I agree with the first half of this sentence, but I don't know what
> MVCC guarantees have to do with anything. I'd just delete the second
> part, or make it a lot clearer.

Changed in this next version to simply omit the MVCC related language.

>
> + * Some kinds of tuple header corruption make it unsafe to check the tuple
> + * attributes, for example when the tuple is foreshortened and such checks
> + * would read beyond the end of the line pointer (and perhaps the page).  In
>
> I think of foreshortening mostly as an art term, though I guess it has
> other meanings. Maybe it would be clearer to say something like "Some
> kinds of corruption make it unsafe to check the tuple attributes, for
> example when the line pointer refers to a range of bytes outside the
> page"?
>
> + * Other kinds of tuple header corruption do not bare on the question of
>
> bear

Changed.

> +                                                 pstrdup(_("updating
> transaction ID marked incompatibly as keys updated and locked
> only")));
> +                                                 pstrdup(_("updating
> transaction ID marked incompatibly as committed and as a
> multitransaction ID")));
>
> "updating transaction ID" might scare somebody who thinks that you are
> telling them that you changed something. That's not what it means, but
> it might not be totally clear. Maybe:
>
> tuple is marked as only locked, but also claims key columns were updated
> multixact should not be marked committed

Changed to use your verbiage.

> +
> psprintf(_("data offset differs from expected: %u vs. %u (1 attribute,
> has nulls)"),
>
> For these, how about:
>
> tuple data should begin at byte %u, but actually begins at byte %u (1
> attribute, has nulls)
> etc.

Is it ok to embed interpolated values into the message string like that?  I thought that made it harder for
translators. I agree that your language is easier to understand, and have used it in this next version of the patch.
Manyof your comments that follow raise the same issue, but I'm using your verbiage anyway. 

> +
> psprintf(_("old-style VACUUM FULL transaction ID is in the future:
> %u"),
> +
> psprintf(_("old-style VACUUM FULL transaction ID precedes freeze
> threshold: %u"),
> +
> psprintf(_("old-style VACUUM FULL transaction ID is invalid in this
> relation: %u"),
>
> old-style VACUUM FULL transaction ID %u is in the future
> old-style VACUUM FULL transaction ID %u precedes freeze threshold %u
> old-style VACUUM FULL transaction ID %u out of range %u..%u
>
> Doesn't the second of these overlap with the third?

Good point.  If the second one reports, so will the third.  I've changed it to use if/else if logic to avoid that, and
touse your suggested verbiage. 

>
> Similarly in other places, e.g.
>
> +
> psprintf(_("inserting transaction ID is in the future: %u"),
>
> I think this should change to: inserting transaction ID %u is in the future

Changed, along with similarly formatted messages.

>
> +       else if (VARATT_IS_SHORT(chunk))
> +               /*
> +                * could happen due to heap_form_tuple doing its thing
> +                */
> +               chunksize = VARSIZE_SHORT(chunk) - VARHDRSZ_SHORT;
>
> Add braces here, since there are multiple lines.

Changed.

>
> +                                                 psprintf(_("toast
> chunk sequence number not the expected sequence number: %u vs. %u"),
>
> toast chunk sequence number %u does not match expected sequence number %u
>
> There are more instances of this kind of thing.

Changed.

> +
> psprintf(_("toasted attribute has unexpected TOAST tag: %u"),
>
> Remove colon.

Changed.

> +
> psprintf(_("attribute ends at offset beyond total tuple length: %u vs.
> %u (attribute length %u)"),
>
> Let's try to specify the attribute number in the attribute messages
> where we can, e.g.
>
> +
> psprintf(_("attribute ends at offset beyond total tuple length: %u vs.
> %u (attribute length %u)"),
>
> How about: attribute %u with length %u should end at offset %u, but
> the tuple length is only %u

I had omitted the attribute numbers from the attribute corruption messages because attnum is one of the OUT parameters
fromverify_heapam.  I'm including attnum in the message text for this next version, as you request. 

> +               if (TransactionIdIsNormal(ctx->relfrozenxid) &&
> +                       TransactionIdPrecedes(xmin, ctx->relfrozenxid))
> +               {
> +                       report_corruption(ctx,
> +                                                         /*
> translator: Both %u are transaction IDs. */
> +
> psprintf(_("inserting transaction ID is from before freeze cutoff: %u
> vs. %u"),
> +
>    xmin, ctx->relfrozenxid));
> +                       fatal = true;
> +               }
> +               else if (!xid_valid_in_rel(xmin, ctx))
> +               {
> +                       report_corruption(ctx,
> +                                                         /*
> translator: %u is a transaction ID. */
> +
> psprintf(_("inserting transaction ID is in the future: %u"),
> +
>    xmin));
> +                       fatal = true;
> +               }
>
> This seems like good evidence that xid_valid_in_rel needs some
> rethinking. As far as I can see, every place where you call
> xid_valid_in_rel, you have checks beforehand that duplicate some of
> what it does, so that you can give a more accurate error message.
> That's not good. Either the message should be adjusted so that it
> covers all the cases "e.g. tuple xmin %u is outside acceptable range
> %u..%u" or we should just get rid of xid_valid_in_rel() and have
> separate error messages for each case, e.g. tuple xmin %u precedes
> relfrozenxid %u".

This next version is refactored, removing the function xid_valid_in_rel entirely, and structuring get_xid_status
differently.

> I think it's OK to use terms like xmin and xmax in
> these messages, rather than inserting transaction ID etc. We have
> existing instances of that, and while someone might judge it
> user-unfriendly, I disagree. A person who is qualified to interpret
> this output must know what 'tuplex min' means immediately, but whether
> they can understand that 'inserting transaction ID' means the same
> thing is questionable, I think.

Done.

> This is not a full review, but in general I think that this is getting
> pretty close to being committable. The error messages seem to still
> need some polishing and I wouldn't be surprised if there are a few
> more bugs lurking yet, but I think it's come a long way.

This next version has some other message rewording.  While testing, I found it odd to report an xid as out of bounds
(inthe future, or before the freeze threshold, etc.), without mentioning the xid value against which it is being
comparedunfavorably.  We don't normally need to think about the epoch when comparing two xids against each other, as
theymust both make sense relative to the current epoch; but for corruption, you can't assume the corrupt xid was
writtenrelative to any particular epoch, and only the 32-bit xid value can be printed since the epoch is unknown.  The
otherxid value (freeze threshold, etc) can be printed with the epoch information, but printing the epoch+xid merely as
xid8outdoes (in other words, as a UINT64) makes the messages thoroughly confusing.  I went with the equivalent of
sprintf("%u:%u",epoch, xid), which follows the precedent from pg_controldata.c, gistdesc.c, and elsewhere. 


Moving on to Peter's reviews....

> On Sep 22, 2020, at 4:18 PM, Peter Geoghegan <pg@bowt.ie> wrote:
>
> On Mon, Sep 21, 2020 at 2:09 PM Robert Haas <robertmhaas@gmail.com> wrote:
>> +REVOKE ALL ON FUNCTION
>> +verify_heapam(regclass, boolean, boolean, cstring, bigint, bigint)
>> +FROM PUBLIC;
>>
>> This too.
>
> Do we really want to use a cstring as an enum-like argument?

Perhaps not.  This next version has that as text.

>
> I think that I see a bug at this point in check_tuple() (in
> v15-0001-Adding-function-verify_heapam-to-amcheck-module.patch):
>
>> +   /* If xmax is a multixact, it should be within valid range */
>> +   xmax = HeapTupleHeaderGetRawXmax(ctx->tuphdr);
>> +   if ((infomask & HEAP_XMAX_IS_MULTI) && !mxid_valid_in_rel(xmax, ctx))
>> +   {
>
> *** SNIP ***
>
>> +   }
>> +
>> +   /* If xmax is normal, it should be within valid range */
>> +   if (TransactionIdIsNormal(xmax))
>> +   {
>
> Why should it be okay to call TransactionIdIsNormal(xmax) at this
> point? It isn't certain that xmax is an XID at all (could be a
> MultiXactId, since you called HeapTupleHeaderGetRawXmax() to get the
> value in the first place). Don't you need to check "(infomask &
> HEAP_XMAX_IS_MULTI) == 0" here?

I think you are right.  This check you suggest is used in this next version.


> On Sep 22, 2020, at 5:16 PM, Peter Geoghegan <pg@bowt.ie> wrote:
>
> On Sat, Aug 29, 2020 at 10:48 AM Mark Dilger
> <mark.dilger@enterprisedb.com> wrote:
>> I had an earlier version of the verify_heapam patch that included a non-throwing interface to clog.  Ultimately, I
rippedthat out.  My reasoning was that a simpler patch submission was more likely to be acceptable to the community. 
>
> Isn't some kind of pragmatic compromise possible?
>
>> But I don't want to make this patch dependent on that hypothetical patch getting written and accepted.
>
> Fair enough, but if you're alluding to what I said then about
> check_tuphdr_xids()/clog checking a while back then FWIW I didn't
> intend to block progress on clog/xact status verification at all.

I don't recall your comments factoring into my thinking on this specific issue, but rather a conversation I had
off-listwith Robert.  The clog interface may be a hot enough code path that adding a flag for non-throwing behavior
merelyto support a contrib module might be resisted.  If folks generally like such a change to the clog interface, I
couldconsider adding that as a third patch in this set. 

> I
> just don't think that it is sensible to impose an iron clad guarantee
> about having no assertion failures with corrupt clog data -- that
> leads to far too much code duplication. But why should you need to
> provide an absolute guarantee of that?
>
> I for one would be fine with making the clog checks an optional extra,
> that rescinds the no crash guarantee that you're keen on -- just like
> with the TOAST checks that you have already in v15. It might make
> sense to review how often crashes occur with simulated corruption, and
> then to minimize the number of occurrences in the real world. Maybe we
> could tolerate a usually-no-crash interface to clog -- if it could
> still have assertion failures. Making a strong guarantee about
> assertions seems unnecessary.
>
> I don't see how verify_heapam will avoid raising an error during basic
> validation from PageIsVerified(), which will violate the guarantee
> about not throwing errors. I don't see that as a problem myself, but
> presumably you will.

My concern is not so much that verify_heapam will stop with an error, but rather that it might trigger a panic that
stopsall backends.  Stopping with an error merely because it hits corruption is not ideal, as I would rather it
completedthe scan and reported all corruptions found, but that's minor compared to the damage done if verify_heapam
createsdowntime in a production environment offering high availability guarantees.  That statement might seem nuts,
giventhat the corrupt table itself would be causing downtime, but that analysis depends on assumptions about table
accesspatterns, and there is no a priori reason to think that corrupt pages are necessarily ever being accessed, or
accessedin a way that causes crashes (rather than merely wrong results) outside verify_heapam scanning the whole table. 




—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Вложения

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Alvaro Herrera
Дата:
Сообщение: Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?
Следующее
От: Tom Lane
Дата:
Сообщение: Recent failures on buildfarm member hornet