Re: pg14b1 stuck in lazy_scan_prune/heap_page_prune of pg_statistic

From: Matthias van de Meent
Subject: Re: pg14b1 stuck in lazy_scan_prune/heap_page_prune of pg_statistic
Date:
Msg-id: CAEze2Wi6WrXo_PajFmwfved1AsU1mdXdA=+NsBqZ5E3sXszX1w@mail.gmail.com
In reply to: Re: pg14b1 stuck in lazy_scan_prune/heap_page_prune of pg_statistic  (Justin Pryzby <pryzby@telsasoft.com>)
Responses: Re: pg14b1 stuck in lazy_scan_prune/heap_page_prune of pg_statistic  (Justin Pryzby <pryzby@telsasoft.com>)
List: pgsql-hackers
On Tue, 8 Jun 2021 at 13:03, Justin Pryzby <pryzby@telsasoft.com> wrote:
>
> On Sun, Jun 06, 2021 at 11:00:38AM -0700, Peter Geoghegan wrote:
> > On Sun, Jun 6, 2021 at 9:35 AM Justin Pryzby <pryzby@telsasoft.com> wrote:
> > > I'll leave the instance running for a little bit before restarting (or kill-9)
> > > in case someone requests more info.
> >
> > How about dumping the page image out, and sharing it with the list?
> > This procedure should work fine from gdb:
> >
> > https://wiki.postgresql.org/wiki/Getting_a_stack_trace_of_a_running_PostgreSQL_backend_on_Linux/BSD#Dumping_a_page_image_from_within_GDB
> >
> > I suggest that you dump the "page" pointer inside lazy_scan_prune(). I
> > imagine that you have the instance already stuck in an infinite loop,
> > so what we'll probably see from the page image is the page after the
> > first prune and another no-progress prune.
>
> The cluster was again rejecting with "too many clients already".
>
> I was able to open a shell this time, but it immediately froze when I tried to
> tab complete "pg_stat_acti"...
>
> I was able to dump the page image, though - attached.  I can send you its
> "data" privately, if desirable.  I'll also try to step through this.

Could you attach a dump of lazy_scan_prune's vacrel, all the global
visibility states (GlobalVisCatalogRels, and possibly
GlobalVisSharedRels, GlobalVisDataRels, and GlobalVisTempRels), and
heap_page_prune's PruneState?

Additionally, the locals of lazy_scan_prune (more specifically, the
'offnum' when it enters heap_page_prune) would also be appreciated, as
they help identify the tuple involved.

I've been looking into what might have caused this, and I'm currently
stuck for lack of information about GlobalVisCatalogRels and the
PruneState.

One curiosity that I did notice is that the t_xmax of the problematic
tuples has been exactly one lower than the OldestXmin. Not weird, but
a curiosity.
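
To make it concrete why that off-by-one could matter, here is a toy
sketch of the kind of disagreement being looked for. It is not
PostgreSQL code: XIDs are plain integers, wraparound is ignored, and
'prune_horizon', 'scan_page' and the retry cap are made up for
illustration (the cap exists only so the sketch terminates). It assumes
the pruning check uses a horizon that can be older than OldestXmin.

```python
# Toy model only: not PostgreSQL code. XIDs are plain integers and
# wraparound is ignored.

def vacuum_says_dead(t_xmax: int, oldest_xmin: int) -> bool:
    """Stand-in for a HeapTupleSatisfiesVacuum-style check: a deleted
    tuple counts as DEAD once its xmax precedes OldestXmin."""
    return t_xmax < oldest_xmin


def prune_says_removable(t_xmax: int, prune_horizon: int) -> bool:
    """Stand-in for a heap_prune_satisfies_vacuum-style check, which
    consults its own horizon (the hypothetical 'prune_horizon')."""
    return t_xmax < prune_horizon


def scan_page(xmaxes, oldest_xmin, prune_horizon, max_retries=5):
    """Prune, then re-check every surviving tuple; retry if a survivor
    still looks DEAD. Returns the number of retries used, or None when
    the retry budget runs out without any progress."""
    tuples = list(xmaxes)
    for attempt in range(max_retries + 1):
        # "Prune": drop whatever the pruning check considers removable.
        tuples = [x for x in tuples
                  if not prune_says_removable(x, prune_horizon)]
        # "Re-check": any survivor that looks DEAD forces a retry.
        if not any(vacuum_says_dead(x, oldest_xmin) for x in tuples):
            return attempt
    return None


if __name__ == "__main__":
    oldest_xmin = 1000
    t_xmax = oldest_xmin - 1  # exactly one lower than OldestXmin
    print(scan_page([t_xmax], oldest_xmin, prune_horizon=1000))
    print(scan_page([t_xmax], oldest_xmin, prune_horizon=999))
```

When both checks use the same horizon the page is done on the first
pass; when the pruning horizon is even one XID older, the tuple is never
pruned yet always looks DEAD on the re-check, so nothing changes between
retries.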


With regards,

Matthias van de Meent.


PS. I've attached a few of my current research notes, which are mainly
comparisons between heap_prune_satisfies_vacuum and
HeapTupleSatisfiesVacuum.
