Re: Synchronizing slots from primary to standby

From: shveta malik
Subject: Re: Synchronizing slots from primary to standby
Msg-id: CAJpy0uB+EHfGpd31L4q=qUfew5H7Uc91UCfykwxgSw5DZd0T2Q@mail.gmail.com
In reply to: Re: Synchronizing slots from primary to standby  (shveta malik <shveta.malik@gmail.com>)
Replies: Re: Synchronizing slots from primary to standby  ("Drouvot, Bertrand" <bertranddrouvot.pg@gmail.com>)
List: pgsql-hackers
On Wed, Oct 4, 2023 at 9:56 AM shveta malik <shveta.malik@gmail.com> wrote:
>
> On Wed, Oct 4, 2023 at 5:36 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Tue, Oct 3, 2023 at 9:27 PM shveta malik <shveta.malik@gmail.com> wrote:
> > >
> > > On Tue, Oct 3, 2023 at 7:56 PM Drouvot, Bertrand
> > > <bertranddrouvot.pg@gmail.com> wrote:
> > > >
> > > > Hi,
> > > >
> > > > On 10/3/23 12:54 PM, Amit Kapila wrote:
> > > > > On Mon, Oct 2, 2023 at 11:39 AM Drouvot, Bertrand
> > > > > <bertranddrouvot.pg@gmail.com> wrote:
> > > > >>
> > > > >> On 9/29/23 1:33 PM, Amit Kapila wrote:
> > > > >>> On Thu, Sep 28, 2023 at 6:31 PM Drouvot, Bertrand
> > > > >>> <bertranddrouvot.pg@gmail.com> wrote:
> > > > >>>>
> > > > >>>
> > > > >>>> - probably open corner cases like: what if a standby is down? Would that mean
> > > > >>>> that synchronize_slot_names not being sent to the primary would allow the decoding
> > > > >>>> on the primary to go ahead?
> > > > >>>>
> > > > >>>
> > > > >>> Good question. BTW, irrespective of whether we have a
> > > > >>> 'standby_slot_names' parameter or not, how should we behave if the
> > > > >>> standby is down? Say 'synchronize_slot_names' is only specified on the
> > > > >>> standby; in such a situation, the primary won't even be aware that some
> > > > >>> of the logical walsenders need to wait.
> > > > >>
> > > > >> Exactly, that's why I was thinking of keeping standby_slot_names to address
> > > > >> this scenario. In such a case one could simply decide to keep or remove
> > > > >> the associated physical replication slot from standby_slot_names. Keeping it would
> > > > >> mean "wait" and removing it would mean allowing decoding on the primary.
> > > > >>
> > > > >>> OTOH, one can say that users
> > > > >>> should configure 'synchronize_slot_names' on both primary and standby,
> > > > >>> but note that this value could be different for different standbys,
> > > > >>> so we can't configure it on the primary.
> > > > >>>
> > > > >>
> > > > >> Yeah, I think that's a good use case for standby_slot_names, what do you think?
> > > > >>
> > > > >
> > > > > But even if we keep 'standby_slot_names' for this purpose, the
> > > > > primary doesn't know the value of 'synchronize_slot_names' once the
> > > > > standby is down and/or the primary is restarted. So, how will we know
> > > > > which logical WAL senders need to wait for 'standby_slot_names'?
> > > > >
> > > >
> > > > Yeah right, I also think we'd need:
> > > >
> > > > - synchronize_slot_names on both primary and standby
> > > >
> > > > But now we would need to take care of different standbys having different
> > > > values (as you said upthread)...
> > > >
> > > > Thinking out loud: what about a single GUC on the primary (neither standby_slot_names nor
> > > > synchronize_slot_names), say logical_slots_wait_for_standby, that could be a list of
> > > > "logical_slot_name:physical_slot" pairs?
> > > >
> > > > I think this GUC would help us define each walsender's behavior (whether the standby(s)
> > > > are up or down):
> > > >
> > >
> > > It may help in defining the walsender's behaviour better for sure. But
> > > the problem I see once we start defining sync-slot-names on the primary
> > > (in any form, whether as an independent GUC or as the above mapping GUC) is
> > > that it then needs to be kept in sync with the standbys, as each standby
> > > certainly needs to maintain its own sync-slot-names GUC to know
> > > what it needs to sync.
> >
> > Yes, I also think so. Also, defining such a GUC when the user wants to
> > sync all the slots, which would normally be the case, would be a
> > nightmare for users.
> >
> > >
> > > This brings us back to the original question of
> > > how we actually keep these configurations in sync between primary
> > > and standby if we plan to maintain them on both.
> > >
> > >
> > > > - don't wait if its associated logical_slot is not listed in this GUC
> > > > - or wait based on its associated "list" of mapped physical slots (would probably
> > > > have to deal with the min restart_lsn of all the corresponding mapped ones).
> > > >
> > > > I don't think we can avoid having to define at least one GUC on the primary (at least to
> > > > handle the case of standby(s) being down).
> > > >
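A minimal C sketch of how a walsender might consult the mapping GUC floated above (logical_slots_wait_for_standby is only a proposal in this thread). It assumes a whitespace-free, comma-separated list of "logical_slot:physical_slot" pairs, uses a standalone uint64_t stand-in for XLogRecPtr, and treats a listed pair whose physical slot does not exist as "keep waiting". PhysicalSlot and logical_slot_wait_limit are hypothetical names, not PostgreSQL code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

typedef uint64_t XLogRecPtr;            /* stand-in for PostgreSQL's XLogRecPtr */
#define InvalidXLogRecPtr ((XLogRecPtr) 0)

typedef struct PhysicalSlot             /* hypothetical view of a physical slot */
{
    const char *name;
    XLogRecPtr  restart_lsn;
} PhysicalSlot;

/*
 * Returns false if "logical" is not listed in the mapping, i.e. its
 * walsender need not wait.  Otherwise returns true and sets *limit to the
 * minimum restart_lsn over all physical slots mapped to it; a mapping to a
 * non-existent physical slot yields InvalidXLogRecPtr (keep waiting).
 */
static bool
logical_slot_wait_limit(const char *mapping, const char *logical,
                        const PhysicalSlot *phys, int nphys,
                        XLogRecPtr *limit)
{
    size_t      llen;
    bool        listed = false;
    XLogRecPtr  min_lsn = UINT64_MAX;
    const char *p = mapping;

    assert(mapping != NULL && logical != NULL && limit != NULL);
    llen = strlen(logical);

    while (*p != '\0')
    {
        const char *end = strchr(p, ',');
        size_t      len = end ? (size_t) (end - p) : strlen(p);
        const char *colon = memchr(p, ':', len);

        /* Does this "logical:physical" entry belong to our logical slot? */
        if (colon != NULL && (size_t) (colon - p) == llen &&
            strncmp(p, logical, llen) == 0)
        {
            const char *pname = colon + 1;
            size_t      plen = len - (size_t) (pname - p);

            listed = true;
            for (int i = 0; i < nphys; i++)
                if (strlen(phys[i].name) == plen &&
                    strncmp(phys[i].name, pname, plen) == 0 &&
                    phys[i].restart_lsn < min_lsn)
                    min_lsn = phys[i].restart_lsn;
        }
        p = end ? end + 1 : p + len;
    }

    if (!listed)
        return false;
    *limit = (min_lsn == UINT64_MAX) ? InvalidXLogRecPtr : min_lsn;
    return true;
}
```

For example, with the mapping "lsub1:phys1,lsub1:phys2", the walsender for lsub1 would be held at the smaller of the two physical slots' restart_lsn, while a walsender whose slot is not listed would not wait at all.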
> >
> > How about an alternate scheme where we define sync_slot_names on the
> > standby but then store the physical_slot_name in the corresponding
> > logical slot (ReplicationSlotPersistentData) to be synced? So, the
> > standby will send the list of 'sync_slot_names' and the primary will
> > add the physical standby's slot_name to each of the corresponding
> > sync slots. If we do this, then even after a restart we should be
> > able to know which physical slot each logical slot needs to wait for.
> > We can even provide an SQL API to reset the value of
> > standby_slot_names in logical slots as a way to unblock decoding in
> > an emergency (for example, when the corresponding physical standby
> > never comes up).
> >
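A hypothetical sketch of the alternate scheme: the primary records, inside a logical slot's persistent data, the physical slot of each standby that reported syncing it, and a reset call clears those entries to unblock decoding. LogicalSlotWaiters, MAX_STANDBYS, add_standby, reset_standbys and slot_must_wait are illustrative names, not PostgreSQL's ReplicationSlotPersistentData or any existing SQL function, and the fixed capacity is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define SLOT_NAME_LEN 64    /* mirrors PostgreSQL's default NAMEDATALEN */
#define MAX_STANDBYS  8     /* hypothetical fixed capacity per logical slot */

/* Hypothetical extra fields a logical slot's persistent data could carry. */
typedef struct LogicalSlotWaiters
{
    char        name[SLOT_NAME_LEN];
    int         nstandbys;
    char        standbys[MAX_STANDBYS][SLOT_NAME_LEN];  /* physical slot names */
} LogicalSlotWaiters;

/* Primary side: the standby using slot "physical" reported that it syncs this slot. */
static bool
add_standby(LogicalSlotWaiters *slot, const char *physical)
{
    assert(slot != NULL && physical != NULL);
    for (int i = 0; i < slot->nstandbys; i++)
        if (strcmp(slot->standbys[i], physical) == 0)
            return true;            /* already recorded; idempotent */
    if (slot->nstandbys == MAX_STANDBYS)
        return false;               /* out of preallocated space */
    snprintf(slot->standbys[slot->nstandbys], SLOT_NAME_LEN, "%s", physical);
    slot->nstandbys++;
    return true;
}

/* Emergency reset: forget one standby, or all of them when physical is NULL. */
static void
reset_standbys(LogicalSlotWaiters *slot, const char *physical)
{
    assert(slot != NULL);
    if (physical == NULL)
    {
        slot->nstandbys = 0;
        return;
    }
    for (int i = 0; i < slot->nstandbys; i++)
        if (strcmp(slot->standbys[i], physical) == 0)
        {
            /* order does not matter: move the last entry into the hole */
            memmove(slot->standbys[i], slot->standbys[slot->nstandbys - 1],
                    SLOT_NAME_LEN);
            slot->nstandbys--;
            return;
        }
}

/* The walsender for this logical slot waits only if some standby is recorded. */
static bool
slot_must_wait(const LogicalSlotWaiters *slot)
{
    return slot->nstandbys > 0;
}
```

Because the names would live in the slot's persistent data under this proposal, the wait decision would still be available after the primary restarts, without any GUC on the primary.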
>
>
> Looks like a better approach to me. It solves most of the pain points:
> 1) Avoids the need for multiple GUCs.
> 2) Primary and standby need not be kept in sync, as they would have to be
> if we maintained a sync-slot-names GUC on both.
> 3) The user still gets the flexibility to remove a standby from the wait-list
> of the primary's logical walsenders using the reset SQL API.
>
> Now some initial thoughts:
> 1) Since each logical slot may need to be synced by multiple
> physical standbys, ReplicationSlotPersistentData needs to hold a list
> of standby names. This raises the question of how much we should
> allocate initially in shared memory. Should it be max_replication_slots
> (the worst-case scenario) in each ReplicationSlotPersistentData to hold
> physical-standby names?
>
> 2) If the standby sends '*', then we need to update each logical slot with
> that standby name. Or is there a better way to deal with '*'? Need to
> think more on this.
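For the two open questions, a rough sketch under stated assumptions: the worst-case space if every logical slot reserves max_replication_slots standby names of NAMEDATALEN bytes (64 by default in PostgreSQL), and a list matcher in which '*' in a standby's list covers every logical slot. The function names are hypothetical and the parser assumes a whitespace-free list.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define NAMEDATALEN 64      /* PostgreSQL's default NAMEDATALEN */

/*
 * Worst-case space if each of max_slots logical slots reserves room for
 * max_slots physical-standby names.
 */
static size_t
worst_case_waiter_bytes(int max_slots)
{
    assert(max_slots >= 0);
    return (size_t) max_slots * (size_t) max_slots * NAMEDATALEN;
}

/* Does a standby's comma-separated sync list cover slot_name?  "*" covers all. */
static bool
sync_list_covers(const char *list, const char *slot_name)
{
    size_t      nlen = strlen(slot_name);
    const char *p = list;

    while (*p != '\0')
    {
        const char *end = strchr(p, ',');
        size_t      len = end ? (size_t) (end - p) : strlen(p);

        if ((len == 1 && p[0] == '*') ||
            (len == nlen && strncmp(p, slot_name, len) == 0))
            return true;
        p = end ? end + 1 : p + len;
    }
    return false;
}
```

With the default max_replication_slots of 10, the worst case is 10 * 10 * 64 = 6400 bytes in total; at 100 slots it grows quadratically to 640,000 bytes. And since '*' covers every slot, a standby sending it would have its name written into every logical slot on the primary.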
>
> JFYI, along similar lines, ReplicationSlotPersistentData currently
> maintains a flag for the slot-sync feature:
>
>         bool            synced; /* Is this a slot created by a sync-slot worker? */
>
> This flag currently holds significance only on the physical standby. It
> was added to distinguish between a slot created by the user for
> logical decoding and the ones being synced from the primary. It is
> needed when we have to choose obsolete slots (synced ones) to drop on
> the standby, or to block get_changes on the standby for synced slots.
> It can be reused on the primary for the above approach if needed.
>
> thanks
> Shveta
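A minimal sketch of the two standby-side checks the synced flag is described as enabling. SlotInfo, slot_is_obsolete and can_get_changes are hypothetical; "obsolete" is assumed here to mean a synced slot that the primary no longer reports, and user-created slots are never treated as obsolete.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical, reduced view of a slot on the standby. */
typedef struct SlotInfo
{
    const char *name;
    bool        synced;     /* created by the slot-sync worker? */
} SlotInfo;

/* A synced slot the primary no longer reports is obsolete and may be dropped. */
static bool
slot_is_obsolete(const SlotInfo *slot, const char *const *primary_slots, int n)
{
    assert(slot != NULL);
    if (!slot->synced)
        return false;       /* never drop slots the user created */
    for (int i = 0; i < n; i++)
        if (strcmp(primary_slots[i], slot->name) == 0)
            return false;
    return true;
}

/* On a standby, decoding changes from a synced slot is refused. */
static bool
can_get_changes(const SlotInfo *slot, bool on_standby)
{
    return !(on_standby && slot->synced);
}
```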


The simplest approach would be:

1) maintain the standby_slot_names GUC on the primary
2) maintain the synchronize_slot_names GUC on the physical standby alone.
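A minimal sketch of the wait check this simplest layout implies on the primary: any logical walsender may send WAL up to wait_for only once every physical slot named in standby_slot_names has a restart_lsn at or beyond it. PhysicalSlot, the uint64_t XLogRecPtr stand-in and standbys_caught_up are hypothetical; treating a missing slot as "not caught up" and requiring a whitespace-free list are assumptions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

typedef uint64_t XLogRecPtr;            /* stand-in for PostgreSQL's XLogRecPtr */

typedef struct PhysicalSlot
{
    const char *name;
    XLogRecPtr  restart_lsn;            /* position reported by that standby */
} PhysicalSlot;

/*
 * Every logical walsender may send WAL up to wait_for only once all slots
 * named in standby_slot_names have reached it.
 */
static bool
standbys_caught_up(const char *standby_slot_names,
                   const PhysicalSlot *phys, int nphys, XLogRecPtr wait_for)
{
    const char *p = standby_slot_names;

    assert(p != NULL);
    while (*p != '\0')
    {
        const char *end = strchr(p, ',');
        size_t      len = end ? (size_t) (end - p) : strlen(p);
        bool        reached = false;    /* missing slot: keep waiting */

        for (int i = 0; i < nphys; i++)
            if (strlen(phys[i].name) == len &&
                strncmp(phys[i].name, p, len) == 0)
                reached = phys[i].restart_lsn >= wait_for;

        if (len > 0 && !reached)
            return false;
        p = end ? end + 1 : p + len;
    }
    return true;
}
```

Note that the check never looks at which logical slot is asking; that is exactly what makes it simple.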

On the primary, let all logical walsenders wait for the physical standbys
configured in the standby_slot_names GUC. This will work and will avoid
all the complexity involved in the designs discussed above. But this
simplistic approach comes with disadvantages like the following:

1) Even if the slot associated with a logical walsender is not part of
any physical standby's synchronize_slot_names, the walsender still
waits for all the configured standbys to catch up.
2) If the slot associated with a logical walsender is part of
standby1's synchronize_slot_names, the walsender still waits for
standby2, standby3, etc. to catch up, i.e. for the rest of the standbys
configured in standby_slot_names, which have not even listed that
logical slot in their synchronize_slot_names.
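To make the two disadvantages concrete, a hypothetical sketch that counts how many standbys in standby_slot_names a logical walsender waits on even though their synchronize_slot_names does not include its slot. StandbyConfig, list_contains and needless_waits are illustrative names; '*' is assumed to mean "all slots", and lists are assumed to be whitespace-free.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

typedef struct StandbyConfig        /* hypothetical per-standby settings */
{
    const char *physical_slot;      /* its slot on the primary */
    const char *sync_slot_names;    /* its synchronize_slot_names */
} StandbyConfig;

/* Comma-separated membership test; "*" matches any name. */
static bool
list_contains(const char *list, const char *name)
{
    size_t      nlen = strlen(name);
    const char *p = list;

    while (*p != '\0')
    {
        const char *end = strchr(p, ',');
        size_t      len = end ? (size_t) (end - p) : strlen(p);

        if ((len == 1 && p[0] == '*') ||
            (len == nlen && strncmp(p, name, len) == 0))
            return true;
        p = end ? end + 1 : p + len;
    }
    return false;
}

/*
 * Number of standbys a walsender for logical_slot waits on under the
 * simplest approach even though they do not sync that slot.
 */
static int
needless_waits(const char *standby_slot_names, const StandbyConfig *standbys,
               int n, const char *logical_slot)
{
    int         count = 0;

    assert(standbys != NULL || n == 0);
    for (int i = 0; i < n; i++)
        if (list_contains(standby_slot_names, standbys[i].physical_slot) &&
            !list_contains(standbys[i].sync_slot_names, logical_slot))
            count++;
    return count;
}
```

With sb1 syncing lsub1, sb2 syncing lsub2 and sb3 syncing nothing, a walsender for lsub1 would wait needlessly on two standbys (disadvantage 2), and one for a slot no standby syncs would wait needlessly on all three (disadvantage 1).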

So we need to weigh our options here.

thanks
Shveta


