Re: Synchronizing slots from primary to standby

Поиск
Список
Период
Сортировка
От shveta malik
Тема Re: Synchronizing slots from primary to standby
Дата
Msg-id CAJpy0uDY+dyj+PBXeeuZ=f2bxh-L_Mv42_sGKaj1shTULHadxQ@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Synchronizing slots from primary to standby  (Amit Kapila <amit.kapila16@gmail.com>)
Список pgsql-hackers
On Mon, Jan 22, 2024 at 12:28 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Fri, Jan 19, 2024 at 3:55 PM shveta malik <shveta.malik@gmail.com> wrote:
> >
> > On Fri, Jan 19, 2024 at 10:35 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > >
> > >
> > > Thank you for updating the patch. I have some comments:
> > >
> > > ---
> > > +        latestWalEnd = GetWalRcvLatestWalEnd();
> > > +        if (remote_slot->confirmed_lsn > latestWalEnd)
> > > +        {
> > > +                elog(ERROR, "exiting from slot synchronization as the
> > > received slot sync"
> > > +                         " LSN %X/%X for slot \"%s\" is ahead of the
> > > standby position %X/%X",
> > > +                         LSN_FORMAT_ARGS(remote_slot->confirmed_lsn),
> > > +                         remote_slot->name,
> > > +                         LSN_FORMAT_ARGS(latestWalEnd));
> > > +        }
> > >
> > > IIUC GetWalRcvLatestWalEnd () returns walrcv->latestWalEnd, which is
> > > typically the primary server's flush position and doesn't mean the LSN
> > > where the walreceiver received/flushed up to.
> >
> > yes. I think it makes more sense to use something which actually tells
> > flushed-position. I gave it a try by replacing GetWalRcvLatestWalEnd()
> > with GetWalRcvFlushRecPtr() but I see a problem here. Lets say I have
> > enabled the slot-sync feature in a running standby, in that case we
> > are all good (flushedUpto is the same as actual flush-position
> > indicated by LogstreamResult.Flush). But if I restart standby, then I
> > observed that the startup process sets flushedUpto to some value 'x'
> > (see [1]) while when the wal-receiver starts, it sets
> > 'LogstreamResult.Flush' to another value (see [2]) which is always
> > greater than 'x'. And we do not update flushedUpto with the
> > 'LogstreamResult.Flush' value in walreceiver until we actually do an
> > operation on primary. Performing a data change on primary sends WALs
> > to standby which then hits XLogWalRcvFlush() and updates flushedUpto
> > same as LogstreamResult.Flush. Until then we have a situation where
> > slots received on standby are ahead of flushedUpto and thus slotsync
> > worker keeps one erroring out. I am yet to find out why flushedUpto is
> > set to a lower value than 'LogstreamResult.Flush' at the start of
> > standby.  Or maybe am I using the wrong function
> > GetWalRcvFlushRecPtr() and should be using something else instead?
> >
>
> Can we think of using GetStandbyFlushRecPtr()? We probably need to
> expose this function, if this works for the required purpose.

I think we can. For the records, the problem while using flushedUpto
(or GetWalRcvFlushRecPtr()) directly is that it is not set to the
latest flushed position immediately after startup.  It points to some
prior location (perhaps segment or page start) after startup until
some data  is flushed next which then updates it to the latest flushed
position, thus we can not use it directly.  GetStandbyFlushRecPtr()
OTOH takes care of it i.e. it returns correct flushed-location at any
point of time. I have changed v65 to use this one.

thanks
Shveta



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Bharath Rupireddy
Дата:
Сообщение: Re: introduce dynamic shared memory registry
Следующее
От: shveta malik
Дата:
Сообщение: Re: Synchronizing slots from primary to standby