Re: [PATCH] Accept connections post recovery without waiting for RemoveOldXlogFiles

From	Nitin Motiani
Subject	Re: [PATCH] Accept connections post recovery without waiting for RemoveOldXlogFiles
Date	9 September 18:03:42
Msg-id	CAH5HC95Gmmb=f3LxHLw-x5wSae7A-9uNasUL0wU8V0HaKmfGfw@mail.gmail.com
In reply to	Re: [PATCH] Accept connections post recovery without waiting for RemoveOldXlogFiles (Fujii Masao <masao.fujii@gmail.com>)
List	pgsql-hackers

On Tue, Sep 9, 2025 at 1:23 PM Fujii Masao <masao.fujii@gmail.com> wrote:
>
> On Mon, Sep 8, 2025 at 6:33 PM Nitin Motiani <nitinmotiani@google.com> wrote:
> >
> > Hi Hackers,
> >
> > I'd like to propose a patch to allow accepting connections post recovery without waiting for the removal of old
> > xlogfiles.
>
> As another idea, could crash recovery avoid waiting for the end-of-recovery
> checkpoint itself to finish, similar to archive recovery? In other words,
> crash recovery would write the end-of-recovery WAL record and request
> a checkpoint, but not block until it completes. Thought?
>

Thanks for the feedback, Fujii. I'll look into this, although based on
Dilip's reply it is probably not feasible.

> One concern, though: in your case, the first checkpoint after crash recovery
> could take a very long time, since it needs to remove a large number of
> WAL files. This could delay subsequent checkpoints beyond checkpoint_timeout.
> If so, perhaps we'd need to limit how many WAL files a single checkpoint
> can remove.
>

Do we want to limit WAL file removal only for this checkpoint, or for
all checkpoints in general? A couple of thoughts on these options:

1. If we only do it for the post-end-of-recovery checkpoint, we will
have to add special handling for that case, which perhaps reduces
the simplicity of this approach. Also, if we only do it for the first
checkpoint after recovery, a later checkpoint might again spend a long
time removing these files and delay subsequent checkpoints.
2. We can do it for all checkpoints, but that can cause the bloat to
last for a far longer period.

One alternative might be to provide a GUC to set the number/size of WAL
files, or a timeout for this step, which would require some tuning from
users. Also, what do you think of the simple method of skipping the
removal of files at recovery time and letting future checkpoints take
care of it?
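To make the GUC idea a bit more concrete, here is a rough sketch of a bounded removal step. It assumes a hypothetical GUC `max_wal_removals_per_checkpoint` (0 meaning unlimited), and `remove_segment()` / `remove_old_segments_bounded()` are made-up stand-ins, not the actual RemoveOldXlogFiles() code. Each checkpoint removes at most the cap, and whatever is left over stays pending for the next checkpoint.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical cap on WAL segment removals per checkpoint (0 = unlimited).
 * This is NOT an existing PostgreSQL GUC; it only illustrates the idea.
 */
static int max_wal_removals_per_checkpoint = 0;

/* Hypothetical stand-in for unlinking one segment file; always succeeds here. */
static bool
remove_segment(unsigned long segno)
{
    (void) segno;               /* real code would unlink() the file */
    return true;
}

/*
 * Remove up to the cap of the oldest pending segments in segs[0 .. *count).
 * Segments not removed are moved to the front of the array so that a later
 * checkpoint can continue from them.  Returns the number removed.
 */
static size_t
remove_old_segments_bounded(unsigned long *segs, size_t *count)
{
    size_t      limit;
    size_t      removed = 0;

    assert(segs != NULL || *count == 0);

    /* Apply the cap only when it is set and smaller than the backlog. */
    limit = (max_wal_removals_per_checkpoint > 0 &&
             (size_t) max_wal_removals_per_checkpoint < *count)
        ? (size_t) max_wal_removals_per_checkpoint
        : *count;

    while (removed < limit && remove_segment(segs[removed]))
        removed++;

    /* Shift the leftovers down; they remain pending for the next checkpoint. */
    for (size_t i = removed; i < *count; i++)
        segs[i - removed] = segs[i];
    *count -= removed;

    return removed;
}
```

With a cap of 2 and a backlog of 5 segments, three consecutive checkpoints would remove 2, 2 and 1 segments. The bloat drains gradually instead of being removed in one long step, which is the trade-off between the two options above.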

One reason I went with this solution over the others is that, in the
current state, the system is unavailable for the entire time it takes
to remove the files. With this patch, the only thing that might be
delayed is the checkpoint, which seems like an improvement. But it
would be great to get your thoughts on this and the other alternatives.

Thanks & Regards,
Nitin Motiani
Google
