Re: BUG #17741: vacuum process hangs after pg_surgery manipulations

From: Masahiko Sawada
Subject: Re: BUG #17741: vacuum process hangs after pg_surgery manipulations
Date:
Msg-id: CAD21AoBYvTfc9E+3p6ecN2n=UsftggWaQiZo1xtYnObQ-uTiQQ@mail.gmail.com
In reply to: Re: BUG #17741: vacuum process hangs after pg_surgery manipulations (Alvaro Herrera <alvherre@alvh.no-ip.org>)
Replies: Re: BUG #17741: vacuum process hangs after pg_surgery manipulations
List: pgsql-bugs
On Tue, Jan 17, 2023 at 12:37 AM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
>
> On 2023-Jan-09, PG Bug reporting form wrote:
>
> > On the REL_15_STABLE, you can hang vacuum freeze. Maybe this is not
> > desired?
> > https://www.postgresql.org/docs/current/pgsurgery.html
> >
> > reproduce script:
> > create extension pg_surgery;
>
> Using pg_surgery is the equivalent of introducing corruption in your
> data.  It has, of course, completely valid uses, but if you break the
> system while using it, it's on you to fix it.
>
> The pg_surgery documentation you cite states:
>
> : These functions are unsafe by design and using them may corrupt (or
> : further corrupt) your database.
>
> So, you've been warned.

While this is completely true and I agree, can we improve this
situation somewhat so that it ends with an error instead of hanging?

In this case, the tuple with a = 1, the root of the HOT chain, was
killed, and the tuple with a = 2 was a heap-only tuple that had been
HOT-updated. Normally heap_page_prune() prunes the tuple with a = 2 as
part of pruning its HOT chain, but since the root tuple had already
been killed, it could not reach this tuple and left it in place. We
then retried heap_page_prune(), because the tuple looked as if it had
become dead after heap_page_prune() examined it. Retrying normally
works, but here, since the root tuple is gone, pruning misses the
tuple again, so we keep retrying and vacuum hangs. I think we didn't
have this hang before 8523492d4e3, even with the same corruption. One
idea to improve this situation is to add a sanity check that detects
when we have retried because of the same tuple.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com


