Re: Unnecessary delay in streaming replication due to replay lag

From Asim R P
Subject Re: Unnecessary delay in streaming replication due to replay lag
Date
Msg-id CANXE4TewY1WNgu5J5ek38RD+2m9F2K=fgbWubjv9yG0BeyFxRQ@mail.gmail.com
In reply to Re: Unnecessary delay in streaming replication due to replay lag  (Michael Paquier <michael@paquier.xyz>)
Responses Re: Unnecessary delay in streaming replication due to replay lag
List pgsql-hackers
On Fri, Jan 17, 2020 at 11:08 AM Michael Paquier <michael@paquier.xyz> wrote:
>
> On Fri, Jan 17, 2020 at 09:34:05AM +0530, Asim R P wrote:
> >
> >     0001 - TAP test to demonstrate the problem.
>
> There is no real need for debug_replay_delay because we have already
> recovery_min_apply_delay, no?  That would count only after consistency
> has been reached, and only for COMMIT records, but your test would be
> enough with that.
>

Indeed, we didn't know about recovery_min_apply_delay.  Thank you for
the suggestion; the updated test is attached.
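Michael's description of recovery_min_apply_delay (it only counts after consistency has been reached, and only for COMMIT records) can be summarised as a small decision rule. The Python sketch below is a simplified model, not PostgreSQL's C implementation; `RecordKind`, `WalRecord` and `apply_delay_ms` are hypothetical names, and timestamps are plain integers in milliseconds for illustration.

```python
from dataclasses import dataclass
from enum import Enum, auto


class RecordKind(Enum):
    # Hypothetical, simplified classification of WAL records.
    COMMIT = auto()
    OTHER = auto()


@dataclass
class WalRecord:
    kind: RecordKind
    commit_time_ms: int = 0  # commit timestamp; only meaningful for COMMIT


def apply_delay_ms(record: WalRecord, reached_consistency: bool,
                   now_ms: int, min_apply_delay_ms: int) -> int:
    """Return how many milliseconds replay should wait before applying
    `record` under a simplified model of recovery_min_apply_delay."""
    if min_apply_delay_ms <= 0:
        return 0  # feature disabled
    if not reached_consistency:
        return 0  # the delay only applies after a consistent point
    if record.kind is not RecordKind.COMMIT:
        return 0  # only commit records are delayed
    delay_until = record.commit_time_ms + min_apply_delay_ms
    return max(0, delay_until - now_ms)


# A commit stamped at t=1000 ms, seen at t=1500 ms with a 2000 ms delay,
# must wait until t=3000 ms.
print(apply_delay_ms(WalRecord(RecordKind.COMMIT, 1000), True, 1500, 2000))  # 1500
```

In this model, a test that wants to build up replay lag has to generate commits after the standby is already consistent, since nothing is delayed before that point.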

>
> > This is a POC, we are looking for early feedback on whether the
> > problem is worth solving and if it makes sense to solve it along this
> > route.
>
> You are not the first person interested in this problem, we have a
> patch registered in this CF to control the timing when a WAL receiver
> is started at recovery:
> https://commitfest.postgresql.org/26/1995/
> https://www.postgresql.org/message-id/b271715f-f945-35b0-d1f5-c9de3e56f65e@postgrespro.ru
>

Great to know about this patch and the discussion.  The test case,
together with the part of our patch that saves the next start point in
the control file, can be combined with Konstantin's patch to solve this
problem.  Let me work on that.

> I am pretty sure that we should not change the default behavior to
> start the WAL receiver after replaying everything from the archives to
> avoid copying some WAL segments for nothing, so being able to use a
> GUC switch should be the way to go, and Konstantin's latest patch was
> using this approach.  Your patch 0002 adds visibly a third mode: start
> immediately on top of the two ones already proposed:
> - Start after replaying all WAL available locally and in the
> archives.
> - Start after reaching a consistent point.

A consistent point should be reached fairly quickly, even with a large
replay lag.  The minimum recovery point is updated during an XLOG flush,
which happens when a commit record is replayed, and commits should occur
frequently in the WAL stream.  So I do not see much value in starting
the WAL receiver immediately compared to starting it after reaching a
consistent point.  Does that make sense?
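The three start modes in the quoted text, together with the consistency argument above, can be modelled as a single decision function. This is a hypothetical sketch: the enum members and function names are illustrative and are not the GUC values used by Konstantin's patch or by patch 0002. LSNs are modelled as plain integers, and the consistency check is reduced to comparing the last replayed LSN with the minimum recovery point, ignoring any other conditions PostgreSQL may also check.

```python
from enum import Enum, auto


class WalReceiverStart(Enum):
    # Hypothetical names for the three modes under discussion.
    AFTER_LOCAL_WAL = auto()  # after all local and archived WAL is replayed (default)
    AT_CONSISTENCY = auto()   # once a consistent point is reached
    IMMEDIATELY = auto()      # as soon as recovery starts (patch 0002)


def reached_consistency(last_replayed_lsn: int, min_recovery_point: int) -> bool:
    # Simplified: consistent once replay has reached the minimum recovery point.
    return last_replayed_lsn >= min_recovery_point


def should_start_walreceiver(mode: WalReceiverStart, last_replayed_lsn: int,
                             min_recovery_point: int,
                             local_wal_exhausted: bool) -> bool:
    """Decide whether the startup process should launch the WAL receiver now."""
    if mode is WalReceiverStart.IMMEDIATELY:
        return True
    if mode is WalReceiverStart.AT_CONSISTENCY:
        return reached_consistency(last_replayed_lsn, min_recovery_point)
    # Default behaviour: wait until local and archived WAL is exhausted.
    return local_wal_exhausted


# Replay is at LSN 150 with a minimum recovery point of 120, but archived
# WAL remains: only the two non-default modes would start streaming now.
for mode in WalReceiverStart:
    print(mode.name, should_start_walreceiver(mode, 150, 120, False))
```

If, as argued above, the minimum recovery point is reached early in replay, AT_CONSISTENCY and IMMEDIATELY differ only during a short window at the start of recovery.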

That said, is there anything obviously wrong with starting the WAL
receiver immediately, even before reaching a consistent state?  One
consequence is that the WAL receiver may overwrite a WAL segment while
the startup process is reading and replaying WAL from it.  But that does
not appear to be a problem, because the overwrite should write content
identical to what was already there.
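The claim that an overwrite with identical content is harmless can be illustrated with a short Python sketch: a reader consumes part of a file, a writer rewrites the whole file in place with the same bytes, and the reader then finishes. `read_across_identical_overwrite` is a hypothetical helper and the temporary file merely stands in for a WAL segment. This only shows the consequence of the premise; it does not show that the WAL receiver actually writes identical bytes, which is the question being asked.

```python
import os
import tempfile


def read_across_identical_overwrite(segment: bytes, split: int) -> bytes:
    """Read a file in two steps, rewriting it in place with the same
    bytes between the steps, and return everything the reader saw."""
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(segment)  # the "segment" already on disk
        # Unbuffered reader, so the second read really goes back to the file.
        with open(path, "rb", buffering=0) as reader:
            first = reader.read(split)  # startup process reads part of it
            # Stand-in for the WAL receiver rewriting the same segment.
            with open(path, "r+b") as writer:
                writer.write(segment)
            rest = reader.read()  # startup process reads the remainder
        return first + rest
    finally:
        os.remove(path)


segment = bytes(range(256)) * 32  # 8 KiB of sample data
print(read_across_identical_overwrite(segment, 3000) == segment)  # True
```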

Asim
Attachments
