Re: Switching XLog source from archive to streaming when primary available

From Bharath Rupireddy
Subject Re: Switching XLog source from archive to streaming when primary available
Date
Msg-id CALj2ACUXb1ngQBeagiPBdb_6X2VN_QNFfRFEfy8hguDHbxM85A@mail.gmail.com
In response to Switching XLog source from archive to streaming when primary available  (SATYANARAYANA NARLAPURAM <satyanarlapuram@gmail.com>)
Responses Re: Switching XLog source from archive to streaming when primary available
List pgsql-hackers
On Mon, Nov 29, 2021 at 1:30 AM SATYANARAYANA NARLAPURAM
<satyanarlapuram@gmail.com> wrote:
>
> Hi Hackers,
>
> When the standby couldn't connect to the primary it switches the XLog
> source from streaming to archive and continues in that state until it
> can get the WAL from the archive location. On a server with high WAL
> activity, typically getting the WAL from the archive is slower than
> streaming it from the primary and couldn't exit from that state. This
> not only increases the lag on the standby but also adversely impacts
> the primary as the WAL gets accumulated, and vacuum is not able to
> collect the dead tuples. DBAs as a mitigation can however
> remove/advance the slot or remove the restore_command on the standby
> but this is a manual work I am trying to avoid. I would like to
> propose the following, please let me know your thoughts.
>
> Automatically attempt to switch the source from Archive to streaming
> when the primary_conninfo is set after replaying 'N' wal segment
> governed by the GUC retry_primary_conn_after_wal_segments
> when retry_primary_conn_after_wal_segments is set to -1 then the
> feature is disabled
> When the retry attempt fails, then switch back to the archive

I've gone through the state machine in WaitForWALToBecomeAvailable and
I understand it this way: a failure to receive WAL records from the
primary switches the current source to archive, and the standby then
keeps fetching WAL records from the archive location. Unless a failure
occurs there, the current source never switches back to streaming.
Given that fetching WAL from the archive location adds delay in
production environments, we miss the chance to take advantage of
reconnecting to the primary after a previously failed attempt.
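
To make that reading concrete, here is a toy model of the behavior as I
describe it above. It is not PostgreSQL code: the names Source,
next_source and simulate are invented for illustration, and it ignores
pg_wal and any waiting between attempts. The only rule it encodes is
that the source changes when the current source fails.

```python
from enum import Enum


class Source(Enum):
    """Hypothetical stand-in for the standby's current WAL source."""
    ARCHIVE = "archive"
    STREAM = "stream"


def next_source(current: Source, fetch_ok: bool) -> Source:
    """Keep the current source while it succeeds; flip only on failure."""
    if fetch_ok:
        return current
    return Source.ARCHIVE if current is Source.STREAM else Source.STREAM


def simulate(outcomes, start=Source.STREAM):
    """Return the source used for each fetch, given per-fetch success flags."""
    source = start
    history = []
    for ok in outcomes:
        history.append(source)
        source = next_source(source, ok)
    return history
```

With one streaming failure followed by archive fetches that keep
succeeding, simulate([False, True, True, True]) stays on the archive for
every later fetch, which is the situation described in the original
report.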

So basically, we would attempt to switch from archive to streaming
(even though fetching from archive can still succeed) after a certain
amount of time or a certain number of WAL segments. I prefer a
timing-based switch over one based on the number of WAL segments
fetched from archive. Right now, wal_retrieve_retry_interval is used
to wait before switching to archive after a failed streaming attempt.
IMO, a similar GUC can be used to switch from archive to streaming
after the specified amount of time: its timer would start once the
source switches from streaming to archive, and on timeout the source
would switch to streaming again.
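
A rough sketch of the timing-based idea, again as a toy model rather
than PostgreSQL code. The class SourceChooser and the setting name
retry_streaming_after are hypothetical, and treating a negative value
as "disabled" is my assumption, borrowed from the -1 convention in the
quoted proposal. The caller passes the clock in explicitly so the
behavior is easy to check.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SourceChooser:
    # Hypothetical setting, in seconds; a negative value disables the retry.
    retry_streaming_after: float
    source: str = "stream"
    archive_since: Optional[float] = None  # when we fell back to archive

    def on_stream_failure(self, now: float) -> None:
        """Streaming failed: fall back to archive and start the timer."""
        self.source = "archive"
        self.archive_since = now

    def before_next_fetch(self, now: float) -> str:
        """Pick the source for the next fetch, retrying streaming on timeout."""
        if (self.source == "archive"
                and self.retry_streaming_after >= 0
                and now - self.archive_since >= self.retry_streaming_after):
            self.source = "stream"
            self.archive_since = None
        return self.source
```

For example, with retry_streaming_after=300 and a streaming failure at
t=10, before_next_fetch keeps returning "archive" until t=310 and then
returns "stream", even if the archive is still delivering WAL. If that
streaming attempt fails, on_stream_failure restarts the timer.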

Thoughts?

Regards,
Bharath Rupireddy.


