Re: Synchronizing slots from primary to standby

Поиск
Список
Период
Сортировка
От Amit Kapila
Тема Re: Synchronizing slots from primary to standby
Дата
Msg-id CAA4eK1LDzUJDt3eeuGLh78fEvN8aJ_c86KW8c+hncPdt9_7xxw@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Synchronizing slots from primary to standby  (Bertrand Drouvot <bertranddrouvot.pg@gmail.com>)
Ответы Re: Synchronizing slots from primary to standby  (Bertrand Drouvot <bertranddrouvot.pg@gmail.com>)
Список pgsql-hackers
On Thu, Jan 11, 2024 at 9:11 PM Bertrand Drouvot
<bertranddrouvot.pg@gmail.com> wrote:
>
> On Thu, Jan 11, 2024 at 04:22:56PM +0530, Amit Kapila wrote:
> > On Tue, Jan 9, 2024 at 6:39 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > >
> > > +static bool
> > > +synchronize_one_slot(WalReceiverConn *wrconn, RemoteSlot *remote_slot)
> > > {
> > > ...
> > > + /* Slot ready for sync, so sync it. */
> > > + else
> > > + {
> > > + /*
> > > + * Sanity check: With hot_standby_feedback enabled and
> > > + * invalidations handled appropriately as above, this should never
> > > + * happen.
> > > + */
> > > + if (remote_slot->restart_lsn < slot->data.restart_lsn)
> > > + elog(ERROR,
> > > + "cannot synchronize local slot \"%s\" LSN(%X/%X)"
> > > + " to remote slot's LSN(%X/%X) as synchronization"
> > > + " would move it backwards", remote_slot->name,
> > > + LSN_FORMAT_ARGS(slot->data.restart_lsn),
> > > + LSN_FORMAT_ARGS(remote_slot->restart_lsn));
> > > ...
> > > }
> > >
> > > I was thinking about the above code in the patch and as far as I can
> > > think this can only occur if the same name slot is re-created with
> > > prior restart_lsn after the existing slot is dropped. Normally, the
> > > newly created slot (with the same name) will have higher restart_lsn
> > > but one can mimic it by copying some older slot by using
> > > pg_copy_logical_replication_slot().
> > >
> > > I don't think as mentioned in comments even if hot_standby_feedback is
> > > temporarily set to off, the above shouldn't happen. It can only lead
> > > to invalidated slots on standby.
>
> I also think so.
>
> > >
> > > To close the above race, I could think of the following ways:
> > > 1. Drop and re-create the slot.
> > > 2. Emit LOG/WARNING in this case and once remote_slot's LSN moves
> > > ahead of local_slot's LSN then we can update it; but as mentioned in
> > > your previous comment, we need to update all other fields as well. If
> > > we follow this then we probably need to have a check for catalog_xmin
> > > as well.
>
> IIUC, this would be a sync slot (so not usable until promotion) that could
> not be used anyway (invalidated), so I'll vote for drop / re-create then.
>

No, it can happen for non-sync slots as well.

> > > Now, related to this the other case which needs some handling is what
> > > if the remote_slot's restart_lsn is greater than local_slot's
> > > restart_lsn but it is a re-created slot with the same name. In that
> > > case, I think the other properties like 'two_phase', 'plugin' could be
> > > different. So, is simply copying those sufficient or do we need to do
> > > something else as well?
> > >
> >
>
> I'm not sure to follow here. If the remote slot is re-created then it would
> be also dropped / re-created locally, or am I missing something?
>

As our slot-syncing mechanism is asynchronous (from time to time we
check the slot information on primary), isn't it possible that the
same name slot is dropped and recreated between slot-sync worker's
checks?

--
With Regards,
Amit Kapila.



В списке pgsql-hackers по дате отправления:

Предыдущее
От: torikoshia
Дата:
Сообщение: Re: POC PATCH: copy from ... exceptions to: (was Re: VLDB Features)
Следующее
От: Amit Kapila
Дата:
Сообщение: Re: speed up a logical replica setup