Re: Synchronizing slots from primary to standby

From Amit Kapila
Subject Re: Synchronizing slots from primary to standby
Date
Msg-id CAA4eK1+ORhnb2zUUpO_FcgWUUbwvQX5+=tdCehx7x2nB2x8VMg@mail.gmail.com
In reply to Re: Synchronizing slots from primary to standby  (Amit Kapila <amit.kapila16@gmail.com>)
Responses Re: Synchronizing slots from primary to standby  (Bertrand Drouvot <bertranddrouvot.pg@gmail.com>)
Re: Synchronizing slots from primary to standby  (Masahiko Sawada <sawada.mshk@gmail.com>)
List pgsql-hackers
On Tue, Jan 9, 2024 at 6:39 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> +static bool
> +synchronize_one_slot(WalReceiverConn *wrconn, RemoteSlot *remote_slot)
> {
> ...
> +    /* Slot ready for sync, so sync it. */
> +    else
> +    {
> +        /*
> +         * Sanity check: With hot_standby_feedback enabled and
> +         * invalidations handled appropriately as above, this should never
> +         * happen.
> +         */
> +        if (remote_slot->restart_lsn < slot->data.restart_lsn)
> +            elog(ERROR,
> +                 "cannot synchronize local slot \"%s\" LSN(%X/%X)"
> +                 " to remote slot's LSN(%X/%X) as synchronization"
> +                 " would move it backwards", remote_slot->name,
> +                 LSN_FORMAT_ARGS(slot->data.restart_lsn),
> +                 LSN_FORMAT_ARGS(remote_slot->restart_lsn));
> ...
> }
>
> I was thinking about the above code in the patch, and as far as I can
> tell this can only occur if a slot with the same name is re-created
> with an earlier restart_lsn after the existing slot is dropped.
> Normally, the newly created slot (with the same name) will have a
> higher restart_lsn, but one can mimic this case by copying some older
> slot using pg_copy_logical_replication_slot().
>
> Contrary to what the comment says, I don't think the above can happen
> even if hot_standby_feedback is temporarily set to off. That can only
> lead to invalidated slots on the standby.
>
> To close the above race, I could think of the following ways:
> 1. Drop and re-create the slot.
> 2. Emit LOG/WARNING in this case and once remote_slot's LSN moves
> ahead of local_slot's LSN then we can update it; but as mentioned in
> your previous comment, we need to update all other fields as well. If
> we follow this then we probably need to have a check for catalog_xmin
> as well.
>

The second point as written is slightly misleading, so let me rephrase
it: emit a LOG/WARNING in this case, and once remote_slot's LSN moves
ahead of local_slot's LSN, update the local slot; at that point we also
need to update all the other fields, such as two_phase. If we follow
this approach, we probably need to check catalog_xmin as well, along
with remote_slot's restart_lsn.
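
To make the second option a bit more concrete, here is a rough standalone
sketch of the decision it implies. This is not code from the patch: the
types SlotState and SyncAction and the functions xid_precedes(),
decide_sync() and sync_slot() are hypothetical stand-ins, the warning goes
to stderr instead of through ereport(), and the XID comparison is a
simplified wraparound-aware check that ignores special XIDs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-ins; these are NOT the PostgreSQL definitions. */
typedef uint64_t Lsn;
typedef uint32_t Xid;

typedef struct SlotState
{
    Lsn  restart_lsn;
    Xid  catalog_xmin;
    bool two_phase;
} SlotState;

typedef enum SyncAction
{
    SYNC_SKIP_AND_WARN,   /* remote is behind: leave the local slot alone */
    SYNC_UPDATE_ALL       /* remote is at or ahead: copy every field */
} SyncAction;

/* Simplified wraparound-aware "a is older than b"; ignores special XIDs. */
static bool
xid_precedes(Xid a, Xid b)
{
    return (int32_t) (a - b) < 0;
}

/* Decide what to do without touching either slot. */
static SyncAction
decide_sync(const SlotState *local, const SlotState *remote)
{
    assert(local != NULL && remote != NULL);

    /* Check both restart_lsn and catalog_xmin, as suggested above. */
    if (remote->restart_lsn < local->restart_lsn ||
        xid_precedes(remote->catalog_xmin, local->catalog_xmin))
        return SYNC_SKIP_AND_WARN;

    return SYNC_UPDATE_ALL;
}

/* Apply the decision: warn and skip, or overwrite all local fields. */
static SyncAction
sync_slot(SlotState *local, const SlotState *remote, const char *name)
{
    SyncAction action = decide_sync(local, remote);

    if (action == SYNC_SKIP_AND_WARN)
    {
        fprintf(stderr,
                "WARNING: skipping sync of slot \"%s\": "
                "remote slot is behind local slot\n", name);
        return action;
    }

    /* Remote has caught up: update every field, not just the LSN. */
    *local = *remote;
    return action;
}
```

The point of the sketch is only that the local slot is left untouched while
the remote slot is behind, and that once it catches up all fields
(including two_phase) are copied together rather than the LSN alone.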

> Now, related to this the other case which needs some handling is what
> if the remote_slot's restart_lsn is greater than local_slot's
> restart_lsn but it is a re-created slot with the same name. In that
> case, I think the other properties like 'two_phase', 'plugin' could be
> different. So, is simply copying those sufficient or do we need to do
> something else as well?
>

Bertrand, Dilip, Sawada-San, and others, please share your opinion on
this problem as I think it is important to handle this race condition.

--
With Regards,
Amit Kapila.


