Re: BUG #17257: (auto)vacuum hangs within lazy_scan_prune()

From: Matthias van de Meent
Subject: Re: BUG #17257: (auto)vacuum hangs within lazy_scan_prune()
Msg-id: CAEze2Wj7O5tnM_U151Baxr5ObTJafwH=71_JEmgJV+6eBgjL7g@mail.gmail.com
In reply to: Re: BUG #17257: (auto)vacuum hangs within lazy_scan_prune()  (Peter Geoghegan <pg@bowt.ie>)
Responses: Re: BUG #17257: (auto)vacuum hangs within lazy_scan_prune()  (Matthias van de Meent <boekewurm+postgres@gmail.com>)
List: pgsql-bugs

On Fri, 29 Oct 2021 at 20:17, Peter Geoghegan <pg@bowt.ie> wrote:
>
> On Fri, Oct 29, 2021 at 6:30 AM Alexander Lakhin <exclusion@gmail.com> wrote:
> > I can propose the debugging patch to reproduce the issue that replaces
> > the hang with the assert and modifies a pair of crash-causing test
> > scripts to simplify the reproducing. (Sorry, I have no time now to prune
> > down the scripts further as I have to leave for a week.)
>
> This bug is similar to the one fixed in commit d9d8aa9b. And so I
> wonder if code like GlobalVisTestFor() is missing something that it
> needs for partitioned tables.

Without `autovacuum = off; fsync = off` I could not reproduce the
issue in the configured 10-minute time window; with those options set,
I got the reported trace within minutes.

I think I have also found the culprit, which is something we
talked about in [0]: GlobalVisState->maybe_needed was not guaranteed
to never move backwards when recalculated. Because vacuum can
update its snapshot bounds (heap_prune_satisfies_vacuum ->
GlobalVisTestIsRemovableFullXid -> GlobalVisUpdate), this maybe_needed
could move backwards, resulting in the observed behaviour.
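
To make the failure mode concrete, here is a toy model of the
disagreement. It is not PostgreSQL code: ToyXid, toy_prune_removes,
toy_vacuum_sees_dead and toy_scan_passes are hypothetical names, and
the two visibility rules are reduced to plain integer comparisons.
It only assumes that pruning tests against maybe_needed while the
later check tests against a fixed OldestXmin, and that the scan
retries whenever pruning kept a tuple that the later check calls dead.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a 64-bit transaction id (not PostgreSQL's type). */
typedef uint64_t ToyXid;

/* Toy pruning rule: removable only if the deleter is older than maybe_needed. */
static bool
toy_prune_removes(ToyXid deleter_xid, ToyXid maybe_needed)
{
    return deleter_xid < maybe_needed;
}

/* Toy vacuum rule: dead if the deleter is older than the fixed OldestXmin. */
static bool
toy_vacuum_sees_dead(ToyXid deleter_xid, ToyXid oldest_xmin)
{
    return deleter_xid < oldest_xmin;
}

/*
 * Returns the pass on which the two rules agree about the tuple, or -1 if
 * they still disagree after max_passes (the real loop has no such limit).
 */
static int
toy_scan_passes(ToyXid deleter_xid, ToyXid oldest_xmin,
                ToyXid maybe_needed, int max_passes)
{
    assert(max_passes > 0);

    for (int pass = 1; pass <= max_passes; pass++)
    {
        if (toy_prune_removes(deleter_xid, maybe_needed))
            return pass;        /* tuple was pruned, nothing left to check */
        if (!toy_vacuum_sees_dead(deleter_xid, oldest_xmin))
            return pass;        /* both rules agree the tuple must stay */
        /* pruning kept the tuple, but vacuum calls it dead: retry */
    }
    return -1;
}
```

With OldestXmin = 110 and a tuple deleted by xid 100, the toy scan
finishes in one pass while maybe_needed is 120, but never converges
once maybe_needed has retreated to 95.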

It was my understanding based on the mail conversation that Andres
would fix this observed issue too while fixing [0] (whose fix was
included with beta 2), but apparently I was wrong; I can't find the
code for 'maybe_needed'-won't-move-backwards-in-a-backend.

I (again) propose the attached patch, which ensures that this
maybe_needed field will not move backwards for a backend. It is
based on 14, but should be applied to HEAD as well, because the
fix is missing there too.
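
The idea of "never move backwards" can be sketched as a simple clamp.
This is a minimal illustration and not the attached patch: ToyXid,
ToyVisState and toy_vis_update are hypothetical names, and the real
GlobalVisState has more fields than the single one modelled here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a 64-bit transaction id (not PostgreSQL's type). */
typedef uint64_t ToyXid;

/* Hypothetical, reduced model of the per-backend visibility state. */
typedef struct ToyVisState
{
    ToyXid      maybe_needed;   /* xids >= this might still be needed */
} ToyVisState;

/*
 * Install a freshly recomputed horizon, but only if it is newer than the
 * one this backend already holds, so the stored value never retreats.
 */
static void
toy_vis_update(ToyVisState *state, ToyXid recomputed)
{
    assert(state != NULL);

    if (recomputed > state->maybe_needed)
        state->maybe_needed = recomputed;
}
```

Starting from maybe_needed = 100, an update to 90 leaves the value at
100, while an update to 120 advances it to 120.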

Another alternative would be to replace the use of vacrel->OldestXmin
with `vacrel->vistest->maybe_needed` in lazy_scan_prune, but I believe
that is not legal in how vacuum works (we cannot unilaterally decide
that we want to retain tuples < OldestXmin).

Note: After fixing the issue with retreating maybe_needed I also hit
your segfault, and I'm still trying to find out what the source of
that issue might be. I do think it is an issue separate from the stuck
vacuum, though.


Kind regards,

Matthias van de Meent

[0]
https://www.postgresql.org/message-id/flat/20210609184506.rqm5rikoikm47csf%40alap3.anarazel.de#e9d55b5cfff34238a24dc85c8c75a46f
