Re: Avoid erroring out when unable to remove or parse logical rewrite files to save checkpoint work

Поиск
Список
Период
Сортировка
От Bharath Rupireddy
Тема Re: Avoid erroring out when unable to remove or parse logical rewrite files to save checkpoint work
Дата
Msg-id CALj2ACV+acrnWUdwSNUXNzXLNA+kFkfuT8t=wiMoPhBWKrWUeA@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Avoid erroring out when unable to remove or parse logical rewrite files to save checkpoint work  (Andres Freund <andres@anarazel.de>)
Ответы Re: Avoid erroring out when unable to remove or parse logical rewrite files to save checkpoint work  (Julien Rouhaud <rjuju123@gmail.com>)
Список pgsql-hackers
On Fri, Jan 14, 2022 at 1:08 AM Andres Freund <andres@anarazel.de> wrote:
>
> Hi,
>
> On 2021-12-31 18:12:37 +0530, Bharath Rupireddy wrote:
> > Currently the server is erroring out when unable to remove/parse a
> > logical rewrite file in CheckPointLogicalRewriteHeap wasting the
> > amount of work the checkpoint has done and preventing the checkpoint
> > from finishing.
>
> This seems like it'd make failures to remove the files practically
> invisible. Which'd have it's own set of problems?
>
> What motivated proposing this change?

We had an issue where there were many mapping files generated during
the crash recovery and end-of-recovery checkpoint was taking a lot of
time. We had to manually intervene and delete some of the mapping
files (although it may not sound sensible) to make end-of-recovery
checkpoint faster. Because of the race condition between manual
deletion and checkpoint deletion, the unlink error occurred which
crashed the server and the server entered the recovery again wasting
the entire earlier recovery work.

In summary, with the changes (emitting LOG-only messages for unlink
failures and continuing with the other files) proposed for
CheckPointLogicalRewriteHeap in this thread and the existing code in
CheckPointSnapBuild, I'm sure it will help not waste the recovery
that's has been done in case  unlink fails for any reasons.

Regards,
Bharath Rupireddy.



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Julien Rouhaud
Дата:
Сообщение: Re: missing indexes in indexlist with partitioned tables
Следующее
От: Julien Rouhaud
Дата:
Сообщение: Re: pg_replslotdata - a tool for displaying replication slot information