Re: Synchronizing slots from primary to standby

From shveta malik
Subject Re: Synchronizing slots from primary to standby
Msg-id CAJpy0uDfgZcJxTcmnRoiL8zNfRht-iMGb51_CyGJ6ybxjT4H2w@mail.gmail.com
In reply to Re: Synchronizing slots from primary to standby  (Masahiko Sawada <sawada.mshk@gmail.com>)
Responses Re: Synchronizing slots from primary to standby
List pgsql-hackers
On Fri, Jan 19, 2024 at 10:35 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
>
> Thank you for updating the patch. I have some comments:
>
> ---
> +        latestWalEnd = GetWalRcvLatestWalEnd();
> +        if (remote_slot->confirmed_lsn > latestWalEnd)
> +        {
> +                elog(ERROR, "exiting from slot synchronization as the
> received slot sync"
> +                         " LSN %X/%X for slot \"%s\" is ahead of the
> standby position %X/%X",
> +                         LSN_FORMAT_ARGS(remote_slot->confirmed_lsn),
> +                         remote_slot->name,
> +                         LSN_FORMAT_ARGS(latestWalEnd));
> +        }
>
> IIUC GetWalRcvLatestWalEnd () returns walrcv->latestWalEnd, which is
> typically the primary server's flush position and doesn't mean the LSN
> where the walreceiver received/flushed up to.

Yes. I think it makes more sense to use something which actually tells
the flushed position. I gave it a try by replacing GetWalRcvLatestWalEnd()
with GetWalRcvFlushRecPtr(), but I see a problem here. Let's say I have
enabled the slot-sync feature in a running standby; in that case we
are all good (flushedUpto is the same as the actual flush position
indicated by LogstreamResult.Flush). But if I restart the standby, then I
observe that the startup process sets flushedUpto to some value 'x'
(see [1]), while the walreceiver, when it starts, sets
'LogstreamResult.Flush' to another value (see [2]) which is always
greater than 'x'. And we do not update flushedUpto with the
'LogstreamResult.Flush' value in the walreceiver until we actually do an
operation on the primary. Performing a data change on the primary sends WAL
to the standby, which then hits XLogWalRcvFlush() and updates flushedUpto
to be the same as LogstreamResult.Flush. Until then we have a situation where
slots received on the standby are ahead of flushedUpto, and thus the slotsync
worker keeps on erroring out. I am yet to find out why flushedUpto is
set to a lower value than 'LogstreamResult.Flush' at the start of the
standby. Or maybe I am using the wrong function,
GetWalRcvFlushRecPtr(), and should be using something else instead?

[1]:
Startup process sets 'flushedUpto' here:
ReadPageInternal-->XLogPageRead-->WaitForWALToBecomeAvailable-->RequestXLogStreaming

[2]:
Walreceiver sets 'LogstreamResult.Flush' here but does not update
'flushedUpto':
WalReceiverMain():  LogstreamResult.Write = LogstreamResult.Flush =
GetXLogReplayRecPtr(NULL)


> Does it really happen
> that the slot's confirmed_flush_lsn is higher than the primary's flush
> lsn?

It may happen if we have not configured standby_slot_names on the primary.
In such a case, slots may get updated without confirming that the standby has
received the change, and thus the slot-sync worker may fetch slots which
have LSNs ahead of the latest WAL position on the standby.
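
A rough sketch of that case in C, under simplifying assumptions: the
function names are hypothetical, and "waiting for the standby" is reduced
to capping how far logical decoding may send, which in turn bounds the
slot's confirmed LSN. It is only meant to show why the check can trip
when standby_slot_names is not set.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t XLogRecPtr;    /* stand-in for PostgreSQL's XLogRecPtr */

/*
 * Hypothetical helper: upper bound on the LSN a logical slot on the
 * primary can confirm. If the primary waits for the standby, the bound is
 * the standby's confirmed position; otherwise it is the primary's own
 * flush position.
 */
static XLogRecPtr
toy_logical_send_limit(XLogRecPtr primary_flush,
                       XLogRecPtr standby_confirmed,
                       bool waits_for_standby)
{
    if (waits_for_standby && standby_confirmed < primary_flush)
        return standby_confirmed;
    return primary_flush;
}

/* Same comparison as in the quoted patch hunk. */
static bool
toy_slot_ahead_of_standby(XLogRecPtr confirmed_lsn, XLogRecPtr standby_pos)
{
    return confirmed_lsn > standby_pos;
}
```

For example, with the primary flushed to 200 and the standby at 150, the
bound is 200 without waiting (so a synced slot can be ahead of the
standby) and 150 with waiting (so it cannot).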

thanks
Shveta


