Re: BUG #17245: Index corruption involving deduplicated entries

From: Andres Freund
Subject: Re: BUG #17245: Index corruption involving deduplicated entries
Msg-id: 20211029011923.utmolntkasenzreh@alap3.anarazel.de
In response to: Re: BUG #17245: Index corruption involving deduplicated entries  (Peter Geoghegan <pg@bowt.ie>)
Responses: Re: BUG #17245: Index corruption involving deduplicated entries  (Peter Geoghegan <pg@bowt.ie>)
List: pgsql-bugs

Hi,

It's not the cause of this problem, but I did find a minor issue: the retry
path in lazy_scan_prune() loses track of the deleted tuple count when
retrying.
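
To make the counting issue concrete, here is a toy C model of the pattern. It
is not the vacuumlazy.c code: ToyPage, toy_prune() and toy_needs_retry() are
hypothetical stand-ins, and the per-pass numbers are made up. It only shows how
resetting a per-page counter at a retry label drops what the earlier pass
already counted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical page state: only tracks how many prune passes have run. */
typedef struct ToyPage
{
    int passes;
} ToyPage;

/*
 * Hypothetical stand-in for a pruning call. Returns the number of tuples
 * removed by this pass: 3 on the first pass, 1 on any later pass.
 */
static int
toy_prune(ToyPage *page)
{
    page->passes++;
    return page->passes == 1 ? 3 : 1;
}

/* Assumption for the demo: exactly one retry is needed. */
static bool
toy_needs_retry(const ToyPage *page)
{
    return page->passes < 2;
}

/* Counter is reset at the retry label, so the first pass's count is lost. */
int
count_deleted_lossy(ToyPage *page)
{
    int tuples_deleted;

    assert(page != NULL);
retry:
    tuples_deleted = 0;                 /* reset on every pass */
    tuples_deleted += toy_prune(page);
    if (toy_needs_retry(page))
        goto retry;
    return tuples_deleted;
}

/* Counter is kept across retries, so every pass contributes. */
int
count_deleted_cumulative(ToyPage *page)
{
    int total = 0;

    assert(page != NULL);
    do
    {
        total += toy_prune(page);
    } while (toy_needs_retry(page));
    return total;
}
```

With a fresh ToyPage, the lossy variant reports only what the last pass
removed, while the cumulative variant reports the sum over both passes.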

The retry codepath also made me wonder if there could be problems if we do
FreezeMultiXactId() multiple times due to retry. I think we can end up
creating multiple multixactids for the same tuple (if the members change,
which is likely in the retry path). But that should be fine, I think.


On 2021-10-28 16:04:44 -0700, Peter Geoghegan wrote:
> > Didn't 14 change the logic when index vacuums are done? That could cause
> > previously existing issues to manifest with a higher likelihood.
>
> I don't follow. The new logic that skips index vacuuming kicks in 1)
> in an anti-wraparound vacuum emergency, and 2) when there are very few
> LP_DEAD line pointers in the heap. We can rule 1 out, I think, because
> the XIDs we see are in the low millions, and our starting point was a
> database that was upgraded via a dump and reload.

Right.


> The second criterion for skipping index vacuuming (the "less than 2% of
> heap pages have any LP_DEAD items" thing) might well have been hit on
> these tables -- it is after all very common. But I don't see how that
> could matter. We're never going to get to a code path inside
> vacuumlazy.c that sets LP_DEAD items from VACUUM's dead_tuples array
> to LP_UNUSED (how could we reach such a code path without also index
> vacuuming, given the way things are set up inside lazy_vacuum()?).
> We're always going to have the opportunity to do index vacuuming with
> any left-behind LP_DEAD line pointers in the next VACUUM -- right
> after the later VACUUM successfully returns from
> lazy_vacuum_all_indexes().
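
For anyone following along, the two skip conditions described in the quoted
text can be written as a toy predicate. This is only a sketch of that
description: toy_skip_index_vacuum() and its parameters are hypothetical names,
and whatever additional conditions the real code in vacuumlazy.c applies are
not modeled here.

```c
#include <assert.h>
#include <stdbool.h>

/* The "less than 2% of heap pages" figure from the quoted description. */
#define TOY_BYPASS_FRACTION 0.02

/*
 * Toy predicate: skip index vacuuming either in a wraparound emergency, or
 * when fewer than 2% of heap pages have any LP_DEAD items.
 *
 * rel_pages          total heap pages in the table
 * lpdead_item_pages  heap pages that have at least one LP_DEAD item
 * emergency          true if the anti-wraparound emergency path is active
 */
bool
toy_skip_index_vacuum(long rel_pages, long lpdead_item_pages, bool emergency)
{
    assert(rel_pages >= 0 && lpdead_item_pages >= 0);

    if (emergency)
        return true;
    return (double) lpdead_item_pages < (double) rel_pages * TOY_BYPASS_FRACTION;
}
```

Either way, skipping leaves the LP_DEAD line pointers in place for a later
VACUUM, which is the point the quoted text is making.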

Shrug. It doesn't seem that hard to believe that repeatedly trying to prune
the same page could unearth some bugs. E.g. via the heap_prune_record_unused()
path in heap_prune_chain().

Hm. I assume somebody checked and verified that old_snapshot_threshold is not
in use? Seems unlikely, but wrongly entering that heap_prune_record_unused()
path could certainly cause issues like we're observing.

Greetings,

Andres Freund
