Re: pg14b1 stuck in lazy_scan_prune/heap_page_prune of pg_statistic

From	Peter Geoghegan
Subject	Re: pg14b1 stuck in lazy_scan_prune/heap_page_prune of pg_statistic
Date	June 8, 2021, 23:52:40
Msg-id	CAH2-Wzk_X17FLaETCiOx5krvoY3kTSTncBSoeJ9BKEHfGtr3sQ@mail.gmail.com
In reply to	Re: pg14b1 stuck in lazy_scan_prune/heap_page_prune of pg_statistic (Justin Pryzby <pryzby@telsasoft.com>)
Responses	Re: pg14b1 stuck in lazy_scan_prune/heap_page_prune of pg_statistic
List	pgsql-hackers

On Tue, Jun 8, 2021 at 12:27 PM Justin Pryzby <pryzby@telsasoft.com> wrote:
> > They're running this:
> > | PGOPTIONS="--deadlock_timeout=333ms -cstatement-timeout=3600s" psql -c "REINDEX INDEX CONCURRENTLY $i"
> > And if it times out, it then runs: $PSQL "DROP INDEX CONCURRENTLY $bad"
> ...
> > $ date -d @1623121264
> > Mon Jun  7 22:01:04 CDT 2021

Perhaps REINDEX was waiting on the VACUUM process to finish, while
VACUUM was (in effect) busy waiting on the REINDEX to finish. If the
bug is hard to reproduce, then it might just be that the circumstances
that lead to livelock require that things line up exactly at the heap
page + XID level -- which I'd expect to be tricky to reproduce. As I
said upthread, I'm almost certain that the "goto retry" added by
commit 8523492d is a factor here -- that is what I mean by busy
waiting inside VACUUM. It's possible that busy waiting like this
happens much more often than an actual undetected deadlock/livelock.
We only expect to "goto retry" in the event of a concurrently aborting
transaction.

The other bug that you reported back in July of last year [1] (which
involved a "REINDEX INDEX pg_class_tblspc_relfilenode_index") was
pretty easy to recreate, just by running the REINDEX in a tight loop.
Could you describe how tricky it is to repro this issue now?

If you instrument the "goto retry" code added to lazy_scan_prune() by
commit 8523492d, then you might notice that it is hit in contexts that
it was never intended to work with. If you can reduce the problem to
simply hitting that goto in the absence of an aborted transaction,
then it might be a lot easier to produce a simple repro.
The livelock/deadlock is probably nothing more than the worst
consequence of the same issue, and so may not need to be reproduced
directly to fix the issue.
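
To illustrate the kind of instrumentation meant here, below is a minimal
standalone sketch. It is not the lazy_scan_prune() code: every name in it
(RetryStats, note_retry, scan_page, the two callbacks) is hypothetical, and
the max_attempts cap exists only so the simulation terminates. The idea is to
count every trip through the retry path and to flag the ones that cannot be
explained by a concurrently aborted transaction.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical counters; nothing here exists in PostgreSQL. */
typedef struct RetryStats
{
    unsigned long retries_total;
    unsigned long retries_without_abort;    /* the suspicious case */
} RetryStats;

/*
 * Hypothetical callback: returns true when the page must be rescanned,
 * and sets *saw_abort when an aborted transaction explains the rescan.
 */
typedef bool (*disagree_fn) (unsigned attempt, bool *saw_abort);

/* Record one trip through the retry path and log the unexpected ones. */
static void
note_retry(RetryStats *stats, bool saw_abort, unsigned block, unsigned attempt)
{
    stats->retries_total++;
    if (!saw_abort)
    {
        stats->retries_without_abort++;
        fprintf(stderr, "unexpected retry: block %u, attempt %u\n",
                block, attempt);
    }
}

/* Simulated page scan; returns the number of attempts that were made. */
static unsigned
scan_page(RetryStats *stats, unsigned block, disagree_fn disagree,
          unsigned max_attempts)
{
    unsigned    attempt = 0;
    bool        saw_abort;

retry:
    attempt++;
    saw_abort = false;
    if (attempt < max_attempts && disagree(attempt, &saw_abort))
    {
        note_retry(stats, saw_abort, block, attempt);
        goto retry;
    }
    return attempt;
}

/* Expected case: a single retry, explained by an aborted transaction. */
static bool
abort_once(unsigned attempt, bool *saw_abort)
{
    *saw_abort = (attempt == 1);
    return attempt == 1;
}

/* Busy-wait case: every attempt asks for a retry with no abort in sight. */
static bool
never_settles(unsigned attempt, bool *saw_abort)
{
    (void) attempt;
    *saw_abort = false;
    return true;
}
```

Calling scan_page() with abort_once should leave retries_without_abort at
zero, while never_settles drives it up on every attempt until the cap is
reached. A nonzero retries_without_abort is the signal worth chasing for a
simple repro.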

[1] https://www.postgresql.org/message-id/CAH2-WzkjjCoq5Y4LeeHJcjYJVxGm3M3SAWZ0%3D6J8K1FPSC9K0w%40mail.gmail.com
-- 
Peter Geoghegan
