Re: Corrupted btree index on HEAD because of covering indexes

Поиск
Список
Период
Сортировка
От Peter Geoghegan
Тема Re: Corrupted btree index on HEAD because of covering indexes
Дата
Msg-id CAH2-Wzncb9OZ8BSMEGPVwi3m9oYF=g1Yhy-M4f0DyxkQTLoWCg@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Corrupted btree index on HEAD because of covering indexes  (Teodor Sigaev <teodor@sigaev.ru>)
Список pgsql-hackers
On Thu, Apr 19, 2018 at 11:41 PM, Teodor Sigaev <teodor@sigaev.ru> wrote:
>> heard of people using bt_index_parent_check() in production, but only
>> when they already knew that their database was corrupt, and wanted to
>> isolate the problem. I imagine that people that use
>> bt_index_parent_check() in production do so because they want as much
>> information as possible, and are willing to do whatever it takes to
>> get more information.
>
> That why I think we need improve amcheck - ideally, it should not miss any
> mistakes in tree structure.

I have finished writing a prototype that works, and passes all tests.
Without the fix, this is what the complaint from VACUUM looks like on
my machine:

ERROR:  right sibling 265 of block 134 is not next child 395 of block
135 in index "delete_test_table_pkey"

After the final VACUUM in your test case runs (the one that throws
this error), my working copy of amcheck can detect the same issue with
low overhead:

pg@~[20995]=# select bt_index_parent_check('delete_test_table_pkey', true);
ERROR:  index block lacks downlink in index "delete_test_table_pkey"
DETAIL:  Block=265 level=1 page lsn=0/909F4B18.

There is no complaint from amcheck if called before the final VACUUM
in your test case, since that's when the trouble starts (and ends).
It's easy to account for concurrent page splits within amcheck by
checking for BTP_SPLIT_END on the child page that could lack a
downlink. Don't confuse this with checking the BTP_INCOMPLETE_SPLIT
flag, which is what we set on the left half/sibling of a split -- not
the "new" right half/sibling. Of course, the right sibling is the type
of block that can lack a downlink until after the second phase of a
pagesplit (i.e. until it's fully finished), or after the first phase
of page deletion. (As you know, page deletion is like a page split "in
reverse".)

I said concurrent page split, but I really meant incomplete page
split, since a concurrent insert/split is impossible given that
amcheck holds a ShareLock here. (See commit 40dae7ec5.)

Of course, we don't see details about the next level up in the amcheck
error, unlike the VACUUM error. That shouldn't matter, because we're
already verifying a lot about the relationship between blocks at the
next level up, and even between the two levels. This enhancement adds
the one piece we were missing. It could in theory detect other
problems that VACUUM cannot detect, too.

Expect me to post a patch during the work day on Monday or Tuesday
(Pacific time). I need to polish this patch some more. I particularly
need to think about the use of memory for the Bloom filter (right now
I just use work_mem).

> Could you write it? I'm afraid I can't give good explanation why we believe
> that nobody update this page ant it's  high key while page is unlocked but
> pinned.

Okay. I'll come up with something soon.

> Hm, it seems to me, that 350ms is short enough to place it in both core and
> amcheck test. I think, basic functionality should be covered by core tests
> as we test insert/create.

I think you're right. There is really no excuse for not having
thorough code coverage for a module like nbtree.

-- 
Peter Geoghegan


В списке pgsql-hackers по дате отправления:

Предыдущее
От: Andrew Gierth
Дата:
Сообщение: Re: Add read-only param to set_config(...) / SET that effects (at least) customized runtime options
Следующее
От: Chapman Flack
Дата:
Сообщение: Re: Add read-only param to set_config(...) / SET that effects (atleast) customized runtime options