Re: new heapcheck contrib module

Поиск
Список
Период
Сортировка
От Mark Dilger
Тема Re: new heapcheck contrib module
Дата
Msg-id 0C0349EB-DD85-4DA8-82C7-DF70AB714592@enterprisedb.com
обсуждение исходный текст
Ответ на Re: new heapcheck contrib module  (Dilip Kumar <dilipbalaut@gmail.com>)
Ответы Re: new heapcheck contrib module  (Dilip Kumar <dilipbalaut@gmail.com>)
Список pgsql-hackers

> On Jun 28, 2020, at 9:05 AM, Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> Some more comments on v9_0001.
> 1.
> + LWLockAcquire(XidGenLock, LW_SHARED);
> + nextFullXid = ShmemVariableCache->nextFullXid;
> + ctx.oldestValidXid = ShmemVariableCache->oldestXid;
> + LWLockRelease(XidGenLock);
> + ctx.nextKnownValidXid = XidFromFullTransactionId(nextFullXid);
> ...
> ...
> +
> + for (ctx.blkno = startblock; ctx.blkno < endblock; ctx.blkno++)
> + {
> + int32 mapbits;
> + OffsetNumber maxoff;
> + PageHeader ph;
> +
> + /* Optionally skip over all-frozen or all-visible blocks */
> + if (skip_all_frozen || skip_all_visible)
> + {
> + mapbits = (int32) visibilitymap_get_status(ctx.rel, ctx.blkno,
> +    &vmbuffer);
> + if (skip_all_visible && (mapbits & VISIBILITYMAP_ALL_VISIBLE) != 0)
> + continue;
> + if (skip_all_frozen && (mapbits & VISIBILITYMAP_ALL_FROZEN) != 0)
> + continue;
> + }
> +
> + /* Read and lock the next page. */
> + ctx.buffer = ReadBufferExtended(ctx.rel, MAIN_FORKNUM, ctx.blkno,
> + RBM_NORMAL, ctx.bstrategy);
> + LockBuffer(ctx.buffer, BUFFER_LOCK_SHARE);
>
> I might be missing something, but it appears that first we are getting
> the nextFullXid and after that, we are scanning the block by block.
> So while we are scanning the block if the nextXid is advanced and it
> has updated some tuple in the heap pages,  then it seems the current
> logic will complain about out of range xid.  I did not test this
> behavior so please point me to the logic which is protecting this.

We know the oldest valid Xid cannot advance, because we hold a lock that would prevent it from doing so.  We cannot
knowthat the newest Xid will not advance, but when we see an Xid beyond the end of the known valid range, we check its
validity,and either report it as a corruption or advance our idea of the newest valid Xid, depending on that check.
Thatlogic is in TransactionIdValidInRel. 

> 2.
> /*
> * Helper function to construct the TupleDesc needed by verify_heapam.
> */
> static TupleDesc
> verify_heapam_tupdesc(void)
>
> From function name, it appeared that it is verifying tuple descriptor
> but this is just creating the tuple descriptor.

In amcheck--1.2--1.3.sql we define a function named verify_heapam which returns a set of records.  This is the tuple
descriptorfor that function.  I understand that the name can be parsed as verify_(heapam_tupdesc), but it is meant as
(verify_heapam)_tupdesc. Do you have a name you would prefer? 

> 3.
> + /* Optionally skip over all-frozen or all-visible blocks */
> + if (skip_all_frozen || skip_all_visible)
> + {
> + mapbits = (int32) visibilitymap_get_status(ctx.rel, ctx.blkno,
> +    &vmbuffer);
> + if (skip_all_visible && (mapbits & VISIBILITYMAP_ALL_VISIBLE) != 0)
> + continue;
> + if (skip_all_frozen && (mapbits & VISIBILITYMAP_ALL_FROZEN) != 0)
> + continue;
> + }
>
> Here, do we want to test that in VM the all visible bit is set whereas
> on the page it is not set?  That can lead to a wrong result in an
> index-only scan.

If the caller has specified that the corruption check should skip over all-frozen or all-visible data, then we cannot
loadthe page that the VM claims is all-frozen or all-visible without defeating the purpose of the caller having
specifiedthese options.  Without loading the page, we cannot check the page's header bits. 

When not skipping all-visible or all-frozen blocks, we might like to pin both the heap page and the visibility map page
inorder to compare the two, being careful not to hold a pin on the one while performing I/O on the other.  See for
examplethe logic in heap_delete().  But I'm not sure what guarantees the system makes about agreement between these two
bits. Certainly, the VM should not claim a page is all visible when it isn't, but are we guaranteed that a page that is
all-visiblewill always have its all-visible bit set?  I don't know if (possibly transient) disagreement between these
twobits constitutes corruption.  Perhaps others following this thread can advise? 

> 4. One cosmetic comment
>
> + /* Skip non-varlena values, but update offset first */
> ..
> +
> + /* Ok, we're looking at a varlena attribute. */
>
> Throughout the patch, I have noticed that some of your single-line
> comments have "full stop" whereas other don't.  Can we keep them
> consistent?

I try to use a "full stop" at the end of sentences, but not at the end of sentence fragments.  To me, a "full stop"
meansthat a sentence has reached its conclusion.  I don't intentionally use one at the end of a fragment, unless the
fragmentprecedes a full sentence, in which case the "full stop" is needed to separate the two.  Of course, I may have
violatedmy own rule in a few places, but before I submit a v10 patch with comment punctuation changes, perhaps we can
agreeon what the rule is?  (This has probably been discussed before and agreed before.  A link to the appropriate email
threadwould be sufficient.) 

For example:

    /* red, green, or blue */
    /* set to pink */
    /* set to blue.  We have not closed the file. */
    /* At this point, we have chosen the color. */

The first comment is not a sentence, but the fourth is.  The third comment is a fragment followed by a full sentence,
anda "full stop" separates the two.  As for the second comment, as I recall, verb phrases can be interpreted as a full
sentence,as in "Close the door!", when they are meant as commands to the listener, but not otherwise.  "set to pink" is
nota command to the reader, but rather a description of what the code is doing at that point, so I think of it as a
mereverb phrase and not a full sentence. 

Making matters even more complicated, portions of the logic in verify_heapam were taken from sections of code that
wouldereport(), elog(), or Assert() on corruption, and when I took such code, I sometimes also took the comments in
unmodifiedform.  That means that my normal commenting rules don't apply, as I'm not the comment author in such cases. 

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company






В списке pgsql-hackers по дате отправления:

Предыдущее
От: Tom Lane
Дата:
Сообщение: Re: Fwd: PostgreSQL: WolfSSL support
Следующее
От: Magnus Hagander
Дата:
Сообщение: Re: Commitfest 2020-07