Re: Synchronizing slots from primary to standby

From shveta malik
Subject Re: Synchronizing slots from primary to standby
Date
Msg-id CAJpy0uBY1x_mjqUk6dyD3iGtihwboy5mnrnL4tzZxTD3vy7X4A@mail.gmail.com
In response to RE: Synchronizing slots from primary to standby  ("Zhijie Hou (Fujitsu)" <houzj.fnst@fujitsu.com>)
Responses Re: Synchronizing slots from primary to standby
List pgsql-hackers
On Fri, Dec 22, 2023 at 3:11 PM Zhijie Hou (Fujitsu)
<houzj.fnst@fujitsu.com> wrote:
>
> On Thursday, December 21, 2023 5:39 PM Bertrand Drouvot <bertranddrouvot.pg@gmail.com> wrote:
> >
> > On Thu, Dec 21, 2023 at 02:23:12AM +0000, Zhijie Hou (Fujitsu) wrote:
> > > On Wednesday, December 20, 2023 8:42 PM Zhijie Hou (Fujitsu)
> > <houzj.fnst@fujitsu.com> wrote:
> > > >
> > > > Attach the V51 patch set which addressed Kuroda-san's comments.
> > > > I also tried to improve the test in 0003 to make it stable.
> > >
> > > The patches conflict with a recent commit dc21234.
> > > Here is the rebased V51_2 version, there is no code changes in this version.
> > >
> >
> > Thanks!
> >
> > I've a few remarks regarding 0001:
>
> Thanks for the comments!
>
> >
> > 1 ===
> >
> > In the commit message what about replacing "Allow logical walsenders to wait
> > for the physical standbys" with "Force some logical walsenders to wait for the
> > physical standbys"?
>
> I feel 'Allow' is OK, as the GUC standby_slot_names is optional for user. ISTM, 'force'
> means we always wait for physical standbys regardless of the GUC.
>
> >
> > Also I think it would be better to first explain what we are trying to achieve and
> > after explain how we do it (adding a new flag in CREATE SUBSCRIPTION and so
> > on).
>
> Noted. We are about to split the patches, so will improve each commit message after that.
>
> >
> > 4 ===
> >
> > @@ -248,10 +262,13 @@ ReplicationSlotValidateName(const char *name, int
> > elevel)
> >   *     during getting changes, if the two_phase option is enabled it can skip
> >   *     prepare because by that time start decoding point has been moved. So
> > the
> >   *     user will only get commit prepared.
> > + * failover: If enabled, allows the slot to be synced to physical standbys so
> > + *     that logical replication can be resumed after failover.
> >
> > s/allows/forces ?
>
> I think whether the slot is synced also depends on the
> GUC setting on standby, so I feel 'allow' is fine here.
>
> >
> > 5 ===
> >
> > +       bool            ok;
> >
> > parse_ok maybe?
>
> The flag is also used to store the slot type check result, so I feel 'ok' is
> better here.
>
> >
> > 6 ===
> >
> > +       /* Need a modifiable copy of string. */
> > +       rawname = pstrdup(*newval);
> >
> > It seems to me that the single line comments in the neighborhood functions
> > (see
> > RestoreSlotFromDisk() for example) don't finish with ".". Worth to follow the
> > same format for all what we add in slot.c?
>
> I felt we have both styles in slot.c, but it seems Kuroda-san also
> prefer removing the ".", so will address.
>
> >
> > 7 ===
> >
> > +static void
> > +parseAlterReplSlotOptions(AlterReplicationSlotCmd *cmd, bool *failover)
> >
> > ParseAlterReplSlotOptions instead?
>
> I think it followed parseCreateReplSlotOptions, but I agree that it looks
> inconsistent with other names. Will address.
>
> > 11 ===
> >
> > +    * When the wait event is WAIT_FOR_STANDBY_CONFIRMATION, wait on
> > another
> > +    * CV that is woken up by physical walsenders when the walreceiver has
> > +    * confirmed the receipt of LSN.
> >
> > s/that is woken up by/that is broadcasted by/ ?
>
> Will reword the comment here.
>
> >
> > 12 ===
> >
> > We are mentioning in several places that the replication can be resumed after a
> > failover. Should we add a few words about possible lag? (see [1])
> >
> > [1]:
> > https://www.postgresql.org/message-id/CAA4eK1KihniOK21mEVYtSOHRQiG
> > NyToUmENWp7hPbH_PMsqzkA%40mail.gmail.com
>
> It feels like the implementation detail to me, but noted. We will think more
> about the document.
>
>
> The comments not mentioned above look good to me.
>
> Best Regards,
> Hou zj


PFA v53. Changes are:

patch001:
1) Addressed comments in [1] for v51-001. Thanks Hou-san for working on this.

patch002:
2) Addressed comments in [2] for v52-002.
3) Fixed the CFBot failure. It was caused by an assert in
wait_for_primary_slot_catchup() that fired when a null confirmed_lsn
was received. That function assumed that if restart_lsn is valid and
'conflicting' is false, then confirmed_lsn must be non-null. But this
is not true. Both confirmed_lsn and catalog_xmin can be null if the
slot has just been created on the primary server with a valid
restart_lsn and the slot-sync worker fetches it before the primary
sets valid values for them. In pg_create_logical_replication_slot(),
there is a small window between
CreateInitDecodingContext-->ReplicationSlotReserveWal(), which sets
restart_lsn, and DecodingContextFindStartpoint(), which sets
confirmed_lsn. If the slot-sync worker fetches the slot in this
window, the confirmed_lsn it receives will be NULL. Corrected the code
by removing the assert and adding a condition that confirmed_lsn
must be valid before the slot is moved to 'r'.
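
To illustrate the shape of the fix in (3), here is a minimal standalone
sketch. It is not the patch code: RemoteSlotInfo, Lsn, INVALID_LSN and
remote_slot_is_ready() are hypothetical names made up for this example,
and a NULL confirmed_lsn from the primary is modelled as an invalid
(zero) LSN. The point is only that a missing confirmed_lsn is handled as
an ordinary "not ready yet" condition rather than asserted away.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for illustration; not PostgreSQL's real types. */
typedef uint64_t Lsn;
#define INVALID_LSN ((Lsn) 0)

/* Values fetched from the primary for one slot (illustrative struct). */
typedef struct RemoteSlotInfo
{
    Lsn  restart_lsn;    /* set first, when WAL is reserved */
    Lsn  confirmed_lsn;  /* set later, once the start point is found */
    bool conflicting;
} RemoteSlotInfo;

/*
 * Returns true only when the fetched slot is complete enough to be moved
 * to the ready state. A valid restart_lsn alone is not enough: the slot
 * may have been fetched before confirmed_lsn was set on the primary.
 */
static bool
remote_slot_is_ready(const RemoteSlotInfo *remote)
{
    if (remote->conflicting)
        return false;
    if (remote->restart_lsn == INVALID_LSN)
        return false;
    /* Checked as a normal condition instead of being asserted. */
    if (remote->confirmed_lsn == INVALID_LSN)
        return false;
    return true;
}
```

With this shape, a slot fetched inside the window simply stays not-ready
and can be re-checked on a later fetch, once the primary has set
confirmed_lsn.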

[1]: https://www.postgresql.org/message-id/ZYQHvgBpH0GgQaJK%40ip-10-97-1-34.eu-west-3.compute.internal
[2]: https://www.postgresql.org/message-id/TY3PR01MB98893274D5A4FD4F86CC04A0F595A%40TY3PR01MB9889.jpnprd01.prod.outlook.com

Thanks,
Shveta
