Re: BUG #17245: Index corruption involving deduplicated entries

Поиск

Список

Период

Сортировка

От	Peter Geoghegan
Тема	Re: BUG #17245: Index corruption involving deduplicated entries
Дата	28 октября 2021 г. 01:36:11
Msg-id	CAH2-Wzm0GZnNgNeHx-W-4ogR1RB1gGk_efMiWuinXFyMWzjb=w@mail.gmail.com обсуждение исходный текст
Ответ на	Re: BUG #17245: Index corruption involving deduplicated entries (Kamigishi Rei <iijima.yun@koumakan.jp>)
Ответы	Re: BUG #17245: Index corruption involving deduplicated entries Re: BUG #17245: Index corruption involving deduplicated entries Re: BUG #17245: Index corruption involving deduplicated entries
Список	pgsql-bugs

Дерево обсуждения

On Wed, Oct 27, 2021 at 12:00 AM Kamigishi Rei <iijima.yun@koumakan.jp> wrote:
> page and transcode.

Thanks!

One more question: Did you use pg_restore with -j (jobs) to make the
restore faster?

Just one more request: Can I have some more pages from the same
indexes? You have already provided me with
"mediawiki.transcode_key_idx.block1.page". But could you now also
provide me with blocks 2, 8, and 16 from the same index? This is a
pretty random choice. I just want to see if they all contain corrupt
posting list tuples, like page 1 (I suspect that they all will).

Note that I pushed some additional hardening for posting list splits
today. This will be in Postgres 14.1. The new code won't fix the
problem, but it will detect problems earlier, limiting the damage.

> > If my hypothesis is true, then we might expect problems to not even go
> > away following a REINDEX. Or maybe they would go away sometimes, but
> > not other times.
>
> REINDEX *seems* to have helped in my case (tested with both
> page_main_title and the three that showed up in pg_amcheck output). I am
> considering writing a script that would SELECT COUNT(*) for each
> page_title to detect the next possible page_main_title index corruption.

I took another look at your 3 page images today. I noticed some new things.

Out of your 3 pages, I saw that 2 pages were the leftmost leaf page in
the indexes (mediawiki.transcode_key_idx.block1.page and
mediawiki.page_len.block1.page), and 1 was a very low block number
(block 14 from mediawiki.page_redirect_namespace_len.block14.page).
Also, the amount of posting list corruption on each page was higher
than I first realized. For example, almost all of the tuples on the
page mediawiki.page_len.block1.page had duplicate TIDs (corruption).

We know that the underlying table (for 2 out of the 3 indexes) is at
least ~720 blocks, so these are not tiny indexes. And yet amcheck
typically complains on the first leaf page it looks at, or not at all.
So the particular indexes I've taken a look at probably completely
full of corrupt posting lists. And so it seems as if most indexes have
no detectable corruption at all, but at least a few are almost totally
corrupt. It seems to be either one extreme or the other.

Another thing that I noticed today is that all 3 of the pages (from 3
indexes) looked like they have probably never had a regular
deduplication pass, even once. It's also unlikely that they were ever
split. I can tell because the limit on posting list size is different
during CREATE INDEX. The pages looked like there were a few posting
list splits, which is normal when you have some non-HOT updates, and
probably not relevant. These pages look like CREATE INDEX was run
recently, despite all the corruption. Plus Herman (CC'd) has said that
sometimes CREATE INDEX doesn't correct his own similar problem on
Postgres 14.

And so this still very much looks like a bug in CREATE INDEX,
somewhere. This suspicion started with the fact that corruption
appeared concentrated in only 2 (really just 1) tables.

I'll take another guess: I wonder if commit 56788d21 ("Allocate
consecutive blocks during parallel seqscans") is somehow causing
parallel CREATE INDEX to produce wrong results. The author (David
Rowley) is CC'd. Does a bug in that commit seem like it might explain
this problem, David? Might parallel workers in a parallel index build
somehow become confused about which worker is supposed to scan which
heap block, leading to duplicate TIDs in the final index?

I notice that the faulty duplicate TIDs tend to be high-ish heap block
offset numbers, and often appear in groups of 3 or even 4 duplicates.
Does that seem like it might be consistent with my (admittedly not
strong) theory about commit 56788d21 being behind all this.

-- 
Peter Geoghegan

В списке pgsql-bugs по дате отправления:

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Re: BUG #17245: Index corruption involving deduplicated entries