Re: Synchronizing slots from primary to standby
From | Drouvot, Bertrand |
---|---|
Subject | Re: Synchronizing slots from primary to standby |
Date | |
Msg-id | 64056e35-1916-461c-a816-26e40ffde3a0@gmail.com |
In reply to | Re: Synchronizing slots from primary to standby (shveta malik <shveta.malik@gmail.com>) |
List | pgsql-hackers |
Hi,

On 11/10/23 4:31 AM, shveta malik wrote:
> On Thu, Nov 9, 2023 at 9:15 PM Drouvot, Bertrand
> <bertranddrouvot.pg@gmail.com> wrote:
>> Yeah I think so, because there is a time window when one could "use" the slot
>> after the promotion and before it is removed. Producing things like:
>>
>> "
>> 2023-11-09 15:16:50.294 UTC [2580462] LOG: dropped replication slot "logical_slot2" of dbid 5 as it was not sync-ready
>> 2023-11-09 15:16:50.295 UTC [2580462] LOG: dropped replication slot "logical_slot3" of dbid 5 as it was not sync-ready
>> 2023-11-09 15:16:50.297 UTC [2580462] LOG: dropped replication slot "logical_slot4" of dbid 5 as it was not sync-ready
>> 2023-11-09 15:16:50.297 UTC [2580462] ERROR: replication slot "logical_slot5" is active for PID 2594628
>> "
>>
>> After the promotion one was able to use logical_slot5 and now we can now drop it.
>
> Yes, I was suspicious about this small window which may allow others
> to use this slot, that is why I was thinking of putting it in the
> promotion flow and thus asked that question earlier. But the slot-sync
> worker may end up creating it again in case it has not exited.

Sorry, there is a typo up-thread, I meant "After the promotion one was able to use logical_slot5 and now we can NOT drop it.". We cannot drop it because it is in use.

> So we
> need to carefully decide at what all places we need to put 'not-in
> recovery' checks in slot-sync workers. In the previous version,
> synchronize_one_slot() had that check and it was skipping sync if
> '!RecoveryInProgress'. But I have removed that check in v32 thinking
> that the slots which the worker has already fetched from the primary,
> let them all get synced and exit after that instead of syncing half
> and leaving rest. But now on rethinking, was the previous behaviour
> correct i.e. skip sync at that point onward where we see it is no
> longer in standby-mode while few of the slots have already been synced
> in that sync-cycle. Thoughts?
>

I think we still need to think about and discuss the promotion flow.

I think we would need to shut down the slot sync worker during promotion (as suggested by Amit in [1]), but before that, let the slot sync worker know that it is now acting under promotion. Something like:

- let the sync worker know it is now acting under promotion
- do what needs to be done while acting under promotion
- shut down the sync worker

That way we would avoid any "risk" of the sync worker doing something we don't expect while it is no longer in recovery.

Regarding "do what needs to be done while acting under promotion":

- ensure all slots in 'r' state are synced
- drop slots that are in 'i' state

Thoughts?

[1]: https://www.postgresql.org/message-id/CAA4eK1J2Pc%3D5TOgty5u4bp--y7ZHaQx3_2eWPL%3DVPJ7A_0JF2g%40mail.gmail.com

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
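The proposed ordering (notify the worker, let it finish its promotion-time work, then shut it down) can be illustrated with a small, self-contained C model. This is a hypothetical sketch, not code from the patch: the types and functions (`SyncWorker`, `sync_worker_notify_promotion()`, `sync_worker_regular_cycle_allowed()`, `sync_worker_finish_promotion()`) are invented for illustration, the 'r' and 'i' states are assumed to mean "ready" and "initiated", and a real implementation would be driven by the startup process and operate on shared-memory slot data rather than a plain array.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical model of the proposed promotion sequence. None of these
 * names come from the patch; 'r' and 'i' mirror the sync states in the
 * mail (assumed here to mean "ready" and "initiated").
 */
typedef enum
{
    SYNC_STATE_INITIATED = 'i',     /* created but not yet sync-ready */
    SYNC_STATE_READY = 'r'          /* caught up, usable after failover */
} SyncState;

typedef struct
{
    const char   *name;
    SyncState     state;
    bool          exists;           /* false once the slot has been dropped */
    unsigned long remote_lsn;       /* last position fetched from the primary */
    unsigned long local_lsn;        /* position currently applied locally */
} SyncedSlot;

typedef enum
{
    WORKER_SYNCING,                 /* normal standby operation */
    WORKER_PROMOTING,               /* told that promotion has started */
    WORKER_STOPPED                  /* promotion work done, worker exited */
} WorkerPhase;

typedef struct
{
    WorkerPhase phase;
    SyncedSlot *slots;
    int         nslots;
} SyncWorker;

/* Step 1: the promoting process tells the worker it is acting under promotion. */
void
sync_worker_notify_promotion(SyncWorker *w)
{
    if (w->phase == WORKER_SYNCING)
        w->phase = WORKER_PROMOTING;
}

/*
 * Regular sync cycles are refused once promotion has started, so the
 * worker cannot create or advance slots behind the promotion's back.
 */
bool
sync_worker_regular_cycle_allowed(const SyncWorker *w)
{
    return w->phase == WORKER_SYNCING;
}

/*
 * Step 2: bring 'r' slots up to date and drop 'i' slots.
 * Step 3: mark the worker as stopped. Returns the number of dropped slots.
 */
int
sync_worker_finish_promotion(SyncWorker *w)
{
    int dropped = 0;

    assert(w->phase == WORKER_PROMOTING);

    for (int i = 0; i < w->nslots; i++)
    {
        SyncedSlot *s = &w->slots[i];

        if (!s->exists)
            continue;

        if (s->state == SYNC_STATE_READY)
            s->local_lsn = s->remote_lsn;   /* finish syncing it */
        else
        {
            s->exists = false;              /* not sync-ready: drop it */
            dropped++;
        }
    }

    w->phase = WORKER_STOPPED;              /* shut the worker down */
    return dropped;
}
```

In this model, once `sync_worker_notify_promotion()` has been called, `sync_worker_regular_cycle_allowed()` returns false, so the worker cannot start a new cycle that would re-create a slot it is about to drop. It only orders the worker's own actions; it does not by itself stop another backend from acquiring a slot such as logical_slot5 in the window described earlier in the thread.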