Re: new heapcheck contrib module

Поиск
Список
Период
Сортировка
От Dilip Kumar
Тема Re: new heapcheck contrib module
Дата
Msg-id CAFiTN-uYg_PiZ854=5R-D9skratapAHKhaDAG7fhJv-Vn5UhaQ@mail.gmail.com
обсуждение исходный текст
Ответ на Re: new heapcheck contrib module  (Mark Dilger <mark.dilger@enterprisedb.com>)
Ответы Re: new heapcheck contrib module  (Mark Dilger <mark.dilger@enterprisedb.com>)
Список pgsql-hackers
On Sat, Jun 13, 2020 at 2:36 AM Mark Dilger
<mark.dilger@enterprisedb.com> wrote:
>
>
>
> > On Jun 11, 2020, at 11:35 PM, Dilip Kumar <dilipbalaut@gmail.com> wrote:
> >
> > On Fri, Jun 12, 2020 at 12:40 AM Mark Dilger
> > <mark.dilger@enterprisedb.com> wrote:
> >>
> >>
> >>
> >>> On Jun 11, 2020, at 9:14 AM, Dilip Kumar <dilipbalaut@gmail.com> wrote:
> >>>
> >>> I have just browsed through the patch and the idea is quite
> >>> interesting.  I think we can expand it to check that whether the flags
> >>> set in the infomask are sane or not w.r.t other flags and xid status.
> >>> Some examples are
> >>>
> >>> - If HEAP_XMAX_LOCK_ONLY is set in infomask then HEAP_KEYS_UPDATED
> >>> should not be set in new_infomask2.
> >>> - If HEAP_XMIN(XMAX)_COMMITTED is set in the infomask then can we
> >>> actually cross verify the transaction status from the CLOG and check
> >>> whether is matching the hint bit or not.
> >>>
> >>> While browsing through the code I could not find that we are doing
> >>> this kind of check,  ignore if we are already checking this.
> >>
> >> Thanks for taking a look!
> >>
> >> Having both of those bits set simultaneously appears to fall into a different category than what I wrote
verify_heapam.cto detect. 
> >
> > Ok
> >
> >
> >>  It doesn't violate any assertion in the backend, nor does it cause
> >> the code to crash.  (At least, I don't immediately see how it does
> >> either of those things.)  At first glance it appears invalid to have
> >> those bits both set simultaneously, but I'm hesitant to enforce that
> >> without good reason.  If it is a good thing to enforce, should we also
> >> change the backend code to Assert?
> >
> > Yeah, it may not hit assert or crash but it could lead to a wrong
> > result.  But I agree that it could be an assertion in the backend
> > code.
>
> For v7, I've added an assertion for this.  Per heap/README.tuplock, "We currently never set the HEAP_XMAX_COMMITTED
whenthe HEAP_XMAX_IS_MULTI bit is set."  I added an assertion for that, too.  Both new assertions are in
RelationPutHeapTuple(). I'm not sure if that is the best place to put the assertion, but I am confident that the
assertionneeds to only check tuples destined for disk, as in memory tuples can and do violate the assertion. 
>
> Also for v7, I've updated contrib/amcheck to report these two conditions as corruption.
>
> > What about the other check, like hint bit is saying the
> > transaction is committed but actually as per the clog the status is
> > something else.  I think in general processing it is hard to check
> > such things in backend no? because if the hint bit is set saying that
> > the transaction is committed then we will directly check its
> > visibility with the snapshot.  I think a corruption checker may be a
> > good tool for catching such anomalies.
>
> I already made some design changes to this patch to avoid taking the CLogTruncationLock too often.  I'm happy to
incorporatethis idea, but perhaps you could provide a design on how to do it without all the extra locking?  If not, I
cantry to get this into v8 as an optional check, so users can turn it on at their discretion.  Having the check enabled
bydefault is probably a non-starter. 

Okay,  even I can't think a way to do it without an extra locking.

I have looked into 0001 patch and I have a few comments.

1.
+
+ /* Skip over unused/dead/redirected line pointers */
+ if (!ItemIdIsUsed(ctx.itemid) ||
+ ItemIdIsDead(ctx.itemid) ||
+ ItemIdIsRedirected(ctx.itemid))
+ continue;

Isn't it a good idea to verify the Redirected Itemtid?  Because we
will still access the redirected item id to find the
actual tuple from the index scan.  Maybe not exactly at this level,
but we can verify that the link itemid store in that
is within the itemid range of the page or not.

2.

+ /* Check for tuple header corruption */
+ if (ctx->tuphdr->t_hoff < SizeofHeapTupleHeader)
+ {
+ confess(ctx,
+ psprintf("t_hoff < SizeofHeapTupleHeader (%u < %u)",
+ ctx->tuphdr->t_hoff,
+ (unsigned) SizeofHeapTupleHeader));
+ fatal = true;
+ }

I think we can also check that if there is no NULL attributes (if
(!(t_infomask & HEAP_HASNULL)) then
ctx->tuphdr->t_hoff should be equal to SizeofHeapTupleHeader.


3.
+ ctx->offset = 0;
+ for (ctx->attnum = 0; ctx->attnum < ctx->natts; ctx->attnum++)
+ {
+ if (!check_tuple_attribute(ctx))
+ break;
+ }
+ ctx->offset = -1;
+ ctx->attnum = -1;

So we are first setting ctx->offset to 0, then inside
check_tuple_attribute, we will keep updating the offset as we process
the attributes and after the loop is over we set ctx->offset to -1,  I
did not understand that why we need to reset it to -1, do we ever
check for that.  We don't even initialize the ctx->offset to -1 while
initializing the context for the tuple so I do not understand what is
the meaning of the random value -1.

4.
+ if (!VARATT_IS_EXTENDED(chunk))
+ {
+ chunksize = VARSIZE(chunk) - VARHDRSZ;
+ chunkdata = VARDATA(chunk);
+ }
+ else if (VARATT_IS_SHORT(chunk))
+ {
+ /*
+ * could happen due to heap_form_tuple doing its thing
+ */
+ chunksize = VARSIZE_SHORT(chunk) - VARHDRSZ_SHORT;
+ chunkdata = VARDATA_SHORT(chunk);
+ }
+ else
+ {
+ /* should never happen */
+ confess(ctx,
+ pstrdup("toast chunk is neither short nor extended"));
+ return;
+ }

I think the error message "toast chunk is neither short nor extended".
Because ideally, the toast chunk should not be further toasted.
So I think the check is correct, but the error message is not correct.

5.

+ ctx.rel = relation_open(relid, ShareUpdateExclusiveLock);
+ check_relation_relkind_and_relam(ctx.rel);
+
+ /*
+ * Open the toast relation, if any, also protected from concurrent
+ * vacuums.
+ */
+ if (ctx.rel->rd_rel->reltoastrelid)
+ {
+ int offset;
+
+ /* Main relation has associated toast relation */
+ ctx.toastrel = table_open(ctx.rel->rd_rel->reltoastrelid,
+   ShareUpdateExclusiveLock);
+ offset = toast_open_indexes(ctx.toastrel,
....
+ if (TransactionIdIsNormal(ctx.relfrozenxid) &&
+ TransactionIdPrecedes(ctx.relfrozenxid, ctx.oldestValidXid))
+ {
+ confess(&ctx, psprintf("relfrozenxid %u precedes global "
+    "oldest valid xid %u ",
+    ctx.relfrozenxid, ctx.oldestValidXid));
+ PG_RETURN_NULL();
+ }

Don't we need to close the relation/toastrel/toastindexrel in such
return which is without an abort? IIRC, we
will get relcache leak WARNING on commit if we left them open in commit path.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Michael Paquier
Дата:
Сообщение: Re: pg_regress cleans up tablespace twice.
Следующее
От: Michael Paquier
Дата:
Сообщение: Re: tag typos in "catalog.sgml"