Re: [PATCH] Accept connections post recovery without waiting for RemoveOldXlogFiles
From | Amit Kapila |
---|---|
Subject | Re: [PATCH] Accept connections post recovery without waiting for RemoveOldXlogFiles |
Date | |
Msg-id | CAA4eK1LANwLdEhavTfTtmOD8LJ8uUoMY7FtPX_3YF7ge=Z7TcA@mail.gmail.com |
In reply to | [PATCH] Accept connections post recovery without waiting for RemoveOldXlogFiles (Nitin Motiani <nitinmotiani@google.com>) |
Responses |
Re: [PATCH] Accept connections post recovery without waiting for RemoveOldXlogFiles
|
List | pgsql-hackers |

On Mon, Sep 8, 2025 at 3:03 PM Nitin Motiani <nitinmotiani@google.com> wrote:
>
> I'd like to propose a patch to allow accepting connections post recovery without waiting for the removal of old xlog files.
>
> Why : We have seen instances where the crash recovery takes very long (tens of minutes to hours) if a large number of accumulated WAL files need to be cleaned up (eg : Cleaning up 2M old WAL files took close to 4 hours).
>
> This WAL accumulation is usually caused by :
>
> 1. Inactive replication slot
> 2. PITR failing to keep up
>
> In the above cases when the resolution (deleting inactive slot/disabling PITR) is followed by a crash (before checkpoint could run), we see the recovery take a very long time. Note that in these cases the actual WAL replay is done relatively quickly and most of the delay is due to RemoveOldXlogFiles().
>

Isn't it better to fix the reasons for WAL accumulation? Because even without recovery, this can fill up the disk. For example, one can use idle_replication_slot_timeout for inactive slots. Similarly, we can see what leads to slow PITR and try to avoid that.

--
With Regards,
Amit Kapila.
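
For readers who want to try the idle_replication_slot_timeout suggestion, here is a minimal sketch. It assumes a PostgreSQL release that provides this setting (it was added in PostgreSQL 18) and a superuser session. The one-day timeout is an illustrative value, not something proposed in this thread.

```sql
-- Assumption: '1d' is an arbitrary example value; choose a timeout that
-- suits how long your replicas or consumers may legitimately stay idle.
-- The default of 0 leaves idle-timeout invalidation disabled.
ALTER SYSTEM SET idle_replication_slot_timeout = '1d';

-- Reload the configuration so the new value takes effect.
SELECT pg_reload_conf();

-- Inspect the slots: inactive_since shows how long a slot has been idle,
-- and invalidation_reason is set once a slot has been invalidated.
SELECT slot_name, active, wal_status, inactive_since, invalidation_reason
FROM pg_replication_slots
ORDER BY inactive_since NULLS LAST;
```

As far as the documentation describes it, idle slots are invalidated during a checkpoint rather than at the moment the timeout expires, so WAL held by such a slot should become removable only after the next checkpoint.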