Re: Synchronizing slots from primary to standby

From shveta malik
Subject Re: Synchronizing slots from primary to standby
Msg-id CAJpy0uC6t6hZVrkDM9RErCie2-rM7EETqx+AHcjAVKiB1JzYQA@mail.gmail.com
In response to Re: Synchronizing slots from primary to standby  (Bertrand Drouvot <bertranddrouvot.pg@gmail.com>)
List pgsql-hackers
On Fri, Dec 22, 2023 at 7:59 PM Bertrand Drouvot
<bertranddrouvot.pg@gmail.com> wrote:
>
> Hi,
>
> On Fri, Dec 22, 2023 at 04:02:21PM +0530, shveta malik wrote:
> > PFA v53. Changes are:
>
> Thanks!
>
> > patch002:
> > 2) Addressed comments in [2] for v52-002.
> > 3) Fixed CFBot failure. The failure was caused by an assert in
> > wait_for_primary_slot_catchup() for null confirmed_lsn received. In
> > wait_for_primary_slot_catchup(), we had an assumption that if
> > restart_lsn is valid and 'conflicting' is also false, then we must
> > have non-null confirmed_lsn. But this is not true. It is possible to
> > get null values for confirmed_lsn and catalog_xmin if on the primary
> > server the slot is just created with a valid restart_lsn and slot-sync
> > worker has fetched the slot before the primary server could set valid
> > confirmed_lsn and catalog_xmin. In
> > pg_create_logical_replication_slot(), there is a small window between
> > CreateInitDecodingContext-->ReplicationSlotReserveWal() which sets
> > restart_lsn and DecodingContextFindStartpoint() which sets
> > confirmed_lsn. If the slot-sync worker fetches the slot in this
> > window, the confirmed_lsn received will be NULL. Corrected the code to
> > remove the assert and added one additional condition: confirmed_lsn
> > must be valid before the slot is moved to 'r'.
> >
>
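
To make the corrected condition concrete, here is a minimal sketch in Python. It is a toy model, not the patch's C code: the `RemoteSlot` class and the `can_mark_ready` helper are hypothetical names introduced only for illustration, and NULL values fetched from the primary are modeled as `None`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RemoteSlot:
    """Hypothetical view of a slot as fetched from the primary."""
    name: str
    restart_lsn: Optional[int]    # set by ReplicationSlotReserveWal()
    confirmed_lsn: Optional[int]  # set later by DecodingContextFindStartpoint()
    catalog_xmin: Optional[int]   # may also still be NULL in the same window
    conflicting: bool


def can_mark_ready(slot: RemoteSlot) -> bool:
    """Return True only if the slot may be moved to 'r'.

    A valid restart_lsn and conflicting = false are not sufficient:
    confirmed_lsn must also be valid, because the slot may have been
    fetched in the window before the primary sets it.
    """
    return (
        slot.restart_lsn is not None
        and not slot.conflicting
        and slot.confirmed_lsn is not None
    )
```

A slot fetched inside the window (restart_lsn set, confirmed_lsn still NULL) is simply not moved to 'r' yet, instead of tripping an assert.
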
> Looking at v53-0002 commit message:
>
> It states:
>
> "
> If a logical slot on the primary is valid but is invalidated on the standby,
> then that slot is dropped and recreated on the standby in next sync-cycle.
> "
>
> and one of the reasons mentioned is:
>
> "
>     - The primary changes wal_level to a level lower than logical.
> "
>
> I think that as long as there is still a logical replication slot on the primary,
> that should not be possible. The primary should fail to start with messages like:
>
> "
> 2023-12-22 14:06:09.281 UTC [31824] FATAL:  logical replication slot "logical_slot" exists, but wal_level < logical
> "

Yes, right. It fails in such a case.
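
For reference, the startup behaviour can be modeled roughly as below. This is a Python sketch of the logic only, not the server code: `WalLevel`, `StartupError` and `check_slots_at_startup` are hypothetical names, and the only detail taken from the real server is the message text quoted above.

```python
from enum import IntEnum


class WalLevel(IntEnum):
    """wal_level settings, ordered from least to most WAL detail."""
    MINIMAL = 0
    REPLICA = 1
    LOGICAL = 2


class StartupError(Exception):
    """Stands in for a FATAL error raised during server startup."""


def check_slots_at_startup(wal_level, logical_slot_names):
    """Refuse to start if any logical slot exists while wal_level < logical."""
    for name in logical_slot_names:
        if wal_level < WalLevel.LOGICAL:
            raise StartupError(
                f'logical replication slot "{name}" exists, '
                "but wal_level < logical"
            )
```

So lowering wal_level on the primary can only succeed after every logical slot there has been dropped.
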

>
> Now, if:
>
> - The standby is shut down
> - All the logical replication slots are removed on the primary
> - wal_level is set to < logical on the primary and it is restarted
>
> Then when the standby starts, the "synced" slots will be invalidated and later
> removed but not re-created on the next sync-cycle (because they don't exist
> anymore on the primary).
>
> Worth rewording that part a bit?

Yes, will change these details. Thanks!
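
The distinction the reworded text needs to capture is roughly the following. Again a Python sketch with a hypothetical helper name (`handle_invalidated_synced_slot`), not the worker's actual code:

```python
def handle_invalidated_synced_slot(name, primary_slot_names):
    """Return the actions for a synced slot that is invalidated on the standby.

    primary_slot_names: names of the logical slots that currently exist
    on the primary (assumed to be valid there).
    """
    if name in primary_slot_names:
        # Slot is still valid on the primary: drop the invalidated copy
        # and re-create it in the next sync-cycle.
        return ["drop", "recreate"]
    # Slot no longer exists on the primary (for example, all logical slots
    # were removed before wal_level was lowered): drop it, nothing to re-create.
    return ["drop"]
```

In the scenario above, the primary's slot set is empty by the time the standby starts, so every invalidated synced slot takes the second branch.
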

> Regards,
>
> --
> Bertrand Drouvot
> PostgreSQL Contributors Team
> RDS Open Source Databases
> Amazon Web Services: https://aws.amazon.com


