Re: amcheck assert failure

Поиск
Список
Период
Сортировка
От Peter Geoghegan
Тема Re: amcheck assert failure
Дата
Msg-id CAH2-WzmkurhCqnyLHxk0VkOZqd49+ZZsp1xAJOg7j2x7dmp_XQ@mail.gmail.com
обсуждение исходный текст
Ответ на Re: amcheck assert failure  (Tom Lane <tgl@sss.pgh.pa.us>)
Ответы Re: amcheck assert failure  (Tom Lane <tgl@sss.pgh.pa.us>)
Re: amcheck assert failure  (Grigory Smolkin <g.smolkin@postgrespro.ru>)
Список pgsql-bugs
On Mon, Apr 22, 2019 at 12:51 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Nonetheless ... do we really want an assertion failure rather than
> some more-mundane error report?  Seems like the point of amcheck
> is to detect data corruption, so I'd rather get an ERROR than an
> assertion (which would not fire in production builds anyway).

(Apologies for sending this a second time, Tom -- somehow failed to CC
the list the first time)

I think that the size cross-check must have failed to catch the problem:

        /*
         * lp_len should match the IndexTuple reported length exactly, since
         * lp_len is completely redundant in indexes, and both sources of
         * tuple length are MAXALIGN()'d.  nbtree does not use lp_len all that
         * frequently, and is surprisingly tolerant of corrupt lp_len fields.
         */
        if (tupsize != ItemIdGetLength(itemid))
            ereport(ERROR,
                    (errcode(ERRCODE_INDEX_CORRUPTED),
                     errmsg("index tuple size does not equal lp_len in
index \"%s\"",
                            RelationGetRelationName(state->rel)),
                     errdetail_internal("Index tid=(%u,%u) tuple
size=%zu lp_len=%u page lsn=%X/%X.",
                                        state->targetblock, offset,
                                        tupsize, ItemIdGetLength(itemid),
                                        (uint32) (state->targetlsn >> 32),
                                        (uint32) state->targetlsn),
                     errhint("This could be a torn page problem.")));

It seems like the length from tupsize (IndexTupleSize()) happened to
match the redundant ItemIdGetLength()-reported size from the line
pointer. No idea how that could actually be possible, since the value
returned by ItemIdGetLength() would be 12592. What are the chances of
that accidentally matching what the wild itup pointer reported as its
IndexTupleSize()?

The only explanation I can think of is that the assertion failure was
from the core code -- _bt_check_natts() is itself called within core
code assertions, so it's just about possible that that was how this
happened, before amcheck was even called. Doesn't seem like a very
good explanation. Do you think that that's possible, Grigory?

In any case, you're clearly right that amcheck could be more defensive
here. My pg_hexedit tool detected that there was an LP_UNUSED item
pointer with storage (i.e. whose lp_len > 0), which is a check that
amcheck can and should perform, but doesn't. Especially because it
would prevent a deference of a likely-wild pointer (IndexTuple).

We could:

* Throw an error when any line item reports lp_len == 0.

* Throw an error when there is a line item that's either LP_UNUSED, or
LP_REDIRECT. The corrupt page sent by Grigory had both.

What do you think of the idea of committing some simple checks to do
this with contrib/amcheck on v12?

-- 
Peter Geoghegan



В списке pgsql-bugs по дате отправления:

Предыдущее
От: Tom Lane
Дата:
Сообщение: Re: amcheck assert failure
Следующее
От: Tom Lane
Дата:
Сообщение: Re: amcheck assert failure