Re: Corrupted btree index on HEAD because of covering indexes

Поиск
Список
Период
Сортировка
От Peter Geoghegan
Тема Re: Corrupted btree index on HEAD because of covering indexes
Дата
Msg-id CAH2-Wzm5brT8oHnWKda-KWV8LeA+u-B3jCkVjZddqW5BqO0=MA@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Corrupted btree index on HEAD because of covering indexes  (Teodor Sigaev <teodor@sigaev.ru>)
Ответы Re: Corrupted btree index on HEAD because of covering indexes  (Teodor Sigaev <teodor@sigaev.ru>)
Re: Corrupted btree index on HEAD because of covering indexes  (Teodor Sigaev <teodor@sigaev.ru>)
Список pgsql-hackers
On Thu, Apr 19, 2018 at 9:42 AM, Teodor Sigaev <teodor@sigaev.ru> wrote:
> Interesting, contrib/amcheck doesn't find any error in index. Seems, it's
> subject for further improvement.

I think that you're right that this should be detectable by
bt_index_parent_check(). I have an idea about how we can make this
happen with low overhead:

Iff we're readonly (that is, called through bt_index_parent_check()),
use a bloom filter to fingerprint downlink block numbers on each
non-leaf level, starting from the root. On the next level down, check
that we encountered a downlink at the parent level for non-P_IGNORE()
pages. If we didn't, throw an error. This should only need a small
Bloom filter to work well. I suppose we could only do it when
"heapallindexed = true" was specified to bt_index_parent_check(), and
size the Bloom filter based on the same principle as heapallindexed
verification.

This check is correct because the downlink in the parent is atomically
removed when the page (that the downlink points to) is marked
half-dead by VACUUM (and because there cannot be a concurrent VACUUM,
since we have a ShareLock on the relation). If we find a
non-P_IGNORE() page whose block number was not fingerprinted one level
up, then something must be wrong. (We should also throw an error
within bt_downlink_check() if the page turns out to be P_IGNORE(), to
make our testing "symmetrical".)

The really scary thing about corruption like this is not that it leads
to wrong answers to queries. It's that it *doesn't* lead to wrong
answers to queries (I mentioned this issue to Alexander a few weeks
ago, actually). This is true because we'll move right one level down,
maybe more than once, because it just looks like there was a
concurrent page split to index scans.

(There is actually another subtlety about this new
bt_index_parent_check() check, which is legitimate incomplete page
splits that we detect using P_INCOMPLETE_SPLIT(). The proposed check
still seems totally doable, though.)

I would like to go and implement this check now, and if all goes well
I may propose that you commit it to the master branch for v11. I don't
see this as a new feature. I see it as essential testing
infrastructure. What do you think about adding this new check soon?

> Nevertheless, seems, I found. In _bt_mark_page_halfdead() we use truncated
> high key IndexTuple as a storage of blocknumber of top parent to remove. And
> sets BTreeTupleSetNAtts(&trunctuple, 0) - it's stored in ip_posid.
>
> But some later, in _bt_unlink_halfdead_page() we check ItemPointer high key
> with ItemPointerIsValid macro - and it returns false, because offset is
> actually InvalidOffsetNumber - i.e. 0 which was set by BTreeTupleSetNAtts.
> And some wrong decisions are follows, I didn't look at that.

I'm not at all surprised. I strongly suspected that it was some simple
issue with the representation, since the INCLUDE patch didn't actually
change the page deletion algorithm in any way.

> Trivial and naive fix is attached, but for me it looks a bit annoing that we
> store pointer (leafhikey) somewhere inside unlocked page.

Well, it has worked that way for a long time. No reason to change it now.

I also think that we could have better conventional regression test
coverage here.

-- 
Peter Geoghegan


В списке pgsql-hackers по дате отправления:

Предыдущее
От: Stephen Frost
Дата:
Сообщение: Re: Built-in connection pooling
Следующее
От: Pavan Deolasee
Дата:
Сообщение: Re: VM map freeze corruption