Re: Corrupted btree index on HEAD because of covering indexes

From Peter Geoghegan
Subject Re: Corrupted btree index on HEAD because of covering indexes
Date
Msg-id CAH2-WzmRA7eg2QK73d3Oekp9wYucf9L+1VZ5cvJB+2cy4DoLtA@mail.gmail.com
In response to Re: Corrupted btree index on HEAD because of covering indexes  (Peter Geoghegan <pg@bowt.ie>)
Responses Re: Corrupted btree index on HEAD because of covering indexes  (Teodor Sigaev <teodor@sigaev.ru>)
List pgsql-hackers
On Sat, Apr 21, 2018 at 6:02 PM, Peter Geoghegan <pg@bowt.ie> wrote:
> I refined the amcheck enhancement quite a bit today. It will not just
> check that a downlink is not missing; it will also confirm that it
> wasn't a legitimately interrupted multi-level deletion, by descending
> to the leaf page to match the leaf high key pointer to the topmost
> parent, which should be the target page (the page that lacked a
> downlink according to the new Bloom filter). We need to worry about
> multi-level deletions that are interrupted by an error or a hard
> crash, which presents a legitimate case where there'll be no downlink
> for an internal page in its parent. VACUUM is okay with that, so we
> must be too.

Attached patch lets amcheck detect the issue when
bt_index_parent_check() is called, though only when heapallindexed
verification was requested (that is, only when bt_index_parent_check()
is doing something with a Bloom filter already). The new checks will
probably also detect any possible issue with multi-level page
deletions. The patch tightens up our general expectations around
half-dead and fully deleted pages, which seems necessary but also
independently useful.
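
As an illustration only (this is not code from the attached patch), the
downlink check can be modelled with a toy Bloom filter in C. The sketch
assumes a fixed-size filter with two simple hash functions; every name in
it (DownlinkFilter, filter_add_downlink(), filter_maybe_has_downlink(),
count_missing_downlinks()) is hypothetical. The idea it shows: fingerprint
the child block number of each downlink while visiting internal pages,
then probe the filter for each block on the level below. A Bloom filter
has no false negatives, so a failed probe means no downlink to that block
was ever added.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define FILTER_BITS 8192u		/* toy size, chosen arbitrarily */

typedef uint32_t BlockNo;		/* stand-in for a block number */

typedef struct DownlinkFilter
{
	uint8_t		bits[FILTER_BITS / 8];
} DownlinkFilter;

/* Two cheap illustrative hash functions; a real filter would use better ones */
static uint32_t
hash1(BlockNo blk)
{
	return (blk * 2654435761u) % FILTER_BITS;
}

static uint32_t
hash2(BlockNo blk)
{
	return (blk * 40503u + 7u) % FILTER_BITS;
}

static void
filter_init(DownlinkFilter *f)
{
	assert(f != NULL);
	memset(f->bits, 0, sizeof(f->bits));
}

static void
set_bit(DownlinkFilter *f, uint32_t i)
{
	f->bits[i / 8] |= (uint8_t) (1u << (i % 8));
}

static bool
get_bit(const DownlinkFilter *f, uint32_t i)
{
	return ((f->bits[i / 8] >> (i % 8)) & 1u) != 0;
}

/* Pass 1: call once for every downlink found on an internal page */
static void
filter_add_downlink(DownlinkFilter *f, BlockNo child)
{
	set_bit(f, hash1(child));
	set_bit(f, hash2(child));
}

/* Pass 2: false means "definitely never added"; true means "probably added" */
static bool
filter_maybe_has_downlink(const DownlinkFilter *f, BlockNo blk)
{
	return get_bit(f, hash1(blk)) && get_bit(f, hash2(blk));
}

/* Count the blocks on a level for which no downlink was fingerprinted */
static size_t
count_missing_downlinks(const DownlinkFilter *f,
						const BlockNo *blocks, size_t nblocks)
{
	size_t		missing = 0;

	for (size_t i = 0; i < nblocks; i++)
	{
		if (!filter_maybe_has_downlink(f, blocks[i]))
			missing++;
	}
	return missing;
}
```

A block reported as missing here is only a candidate for corruption; as
the quoted text explains, an interrupted multi-level deletion is a
legitimate reason for an internal page to lack a downlink, so that case
has to be ruled out before raising an error.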

I'm using work_mem to constrain the size of the second Bloom filter,
whereas the heapallindexed Bloom filter is constrained by
maintenance_work_mem. This seems fine to me, since we have always used
an additional work_mem budget for spool2 when building a unique index
within nbtsort.c. Besides, when we have adequate memory for both Bloom
filters, it will probably be very common for the downlink Bloom filter
to be very small: less than 1% of the size of the first Bloom filter.
I thought about mentioning this work_mem allocation in the
docs, but decided that there was no need, since the CREATE INDEX
spool2 work_mem stuff isn't documented anywhere either.
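
One way to see where an estimate like "less than 1%" could come from is
the back-of-the-envelope calculation below. It is a sketch under stated
assumptions, not something taken from the patch: it assumes each filter
is sized in proportion to the number of elements it fingerprints (index
tuples for the heapallindexed filter, roughly one downlink per index
page for the second filter). The function names and the tuples-per-page
figure are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Pages needed to hold ntuples at a given density (ceiling division) */
static uint64_t
pages_for_tuples(uint64_t ntuples, uint64_t tuples_per_page)
{
	assert(tuples_per_page > 0);
	return (ntuples + tuples_per_page - 1) / tuples_per_page;
}

/*
 * Ratio of downlink-filter elements (assumed: one per page) to
 * heapallindexed-filter elements (assumed: one per index tuple).
 */
static double
downlink_filter_fraction(uint64_t ntuples, uint64_t npages)
{
	assert(ntuples > 0);
	return (double) npages / (double) ntuples;
}
```

For instance, 10 million index tuples at an assumed 200 tuples per page
occupy 50,000 pages, which gives a fraction of 0.005, i.e. 0.5%.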

Note that the "c.relkind = 'i'" change in the docs is about not
breaking the amcheck query when local partitioned indexes happen to be
in use (when the user changed the sample SQL query to not just look at
pg_catalog indexes). See the "Local partitioned indexes and
pageinspect" thread I just started for full details.

The new P_ISDELETED() test within bt_downlink_missing_check() is what
actually detects the corruption that the test case causes, since the
fully deleted leaf page still has a sane top parent block number left
behind (that is, we don't get as far as testing "if
(BTreeTupleGetTopParent(itup) == state->targetblock)"; that's not how
the leaf page can get corrupt in the test case that Michael
posted). Note that there are also two new similar P_ISDELETED() tests
added to two existing functions (bt_downlink_check() and
bt_check_level_from_leftmost()), but those tests won't detect the
corruption that we saw. They're really there to nail down how we think
about fully deleted pages.
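
The order of the tests described above can be sketched as a small
decision function. This is a simplified model, not the body of
bt_downlink_missing_check(): LeafState, Verdict and
classify_missing_downlink() are hypothetical names, and the struct
fields merely stand in for P_ISDELETED(), P_ISHALFDEAD() and
BTreeTupleGetTopParent() applied to the leaf page reached by descending
from the target page.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t BlockNo;		/* stand-in for a block number */

/* Hypothetical, simplified view of the leaf page reached by the descent */
typedef struct LeafState
{
	bool		is_deleted;		/* models P_ISDELETED() */
	bool		is_half_dead;	/* models P_ISHALFDEAD() */
	BlockNo		top_parent;		/* models BTreeTupleGetTopParent() on high key */
} LeafState;

typedef enum Verdict
{
	INTERRUPTED_DELETION_OK,	/* legitimate: VACUUM can resume later */
	CORRUPT_DELETED_LEAF,		/* descent landed on a fully deleted page */
	CORRUPT_MISSING_DOWNLINK	/* no legitimate explanation found */
} Verdict;

/*
 * Classify a target page that was found to lack a downlink.
 *
 * The fully-deleted test comes first: a deleted leaf can still carry a
 * plausible top parent block number, so testing the top parent alone
 * would not catch it.
 */
static Verdict
classify_missing_downlink(BlockNo targetblock, const LeafState *leaf)
{
	assert(leaf != NULL);

	if (leaf->is_deleted)
		return CORRUPT_DELETED_LEAF;

	if (leaf->is_half_dead && leaf->top_parent == targetblock)
		return INTERRUPTED_DELETION_OK;

	return CORRUPT_MISSING_DOWNLINK;
}
```

In this model, a fully deleted leaf is reported as corruption even when
its leftover top parent equals the target block, which mirrors the point
that the top parent comparison is never reached in the test case.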

-- 
Peter Geoghegan
