RE: Synchronizing slots from primary to standby

Поиск
Список
Период
Сортировка
От Zhijie Hou (Fujitsu)
Тема RE: Synchronizing slots from primary to standby
Дата
Msg-id OS0PR01MB57166FB3812761AD4B0DE60A943F2@OS0PR01MB5716.jpnprd01.prod.outlook.com
обсуждение исходный текст
Ответ на RE: Synchronizing slots from primary to standby  ("Zhijie Hou (Fujitsu)" <houzj.fnst@fujitsu.com>)
Ответы Re: Synchronizing slots from primary to standby
Re: Synchronizing slots from primary to standby
Список pgsql-hackers
On Monday, April 1, 2024 8:56 AM Zhijie Hou (Fujitsu) <houzj.fnst@fujitsu.com> wrote:
> 
> On Friday, March 29, 2024 2:50 PM Amit Kapila <amit.kapila16@gmail.com>
> wrote:
> >
> > On Fri, Mar 29, 2024 at 6:36 AM Zhijie Hou (Fujitsu)
> > <houzj.fnst@fujitsu.com>
> > wrote:
> > >
> > >
> > > Attach a new version patch which fixed an un-initialized variable
> > > issue and added some comments.
> > >
> >
> > The other approach to fix this issue could be that the slotsync worker
> > get the serialized snapshot using pg_read_binary_file() corresponding
> > to restart_lsn and writes those at standby. But there are cases when
> > we won't have such a file like (a) when we initially create the slot
> > and reach the consistent_point, or (b) also by the time the slotsync
> > worker starts to read the remote snapshot file, the snapshot file
> > could have been removed by the checkpointer on the primary (if the
> > restart_lsn of the remote has been advanced in this window). So, in
> > such cases, we anyway need to advance the slot. I think these could be
> optimizations that we could do in the future.
> >
> > Few comments:
> 
> Thanks for the comments.
> 
> > =============
> > 1.
> > - if (slot->data.database != MyDatabaseId)
> > + if (slot->data.database != MyDatabaseId && !fast_forward)
> >   ereport(ERROR,
> >   (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
> >   errmsg("replication slot \"%s\" was not created in this database",
> > @@ -526,7
> > +527,7 @@ CreateDecodingContext(XLogRecPtr start_lsn,
> >   * Do not allow consumption of a "synchronized" slot until the standby
> >   * gets promoted.
> >   */
> > - if (RecoveryInProgress() && slot->data.synced)
> > + if (RecoveryInProgress() && slot->data.synced &&
> > + !IsSyncingReplicationSlots())
> >
> >
> > Add comments at both of the above places.
> 
> Added.
> 
> >
> >
> > 2.
> > +extern XLogRecPtr pg_logical_replication_slot_advance(XLogRecPtr
> moveto,
> > +   bool *found_consistent_point);
> > +
> >
> > This API looks a bit awkward as the functionality doesn't match the
> > name. How about having a function with name
> > LogicalSlotAdvanceAndCheckReadynessForDecoding(moveto,
> > ready_for_decoding) with the same functionality as your patch has for
> > pg_logical_replication_slot_advance() and then invoke it both from
> > pg_logical_replication_slot_advance and slotsync.c. The function name
> > is too big, we can think of a shorter name. Any ideas?
> 
> How about LogicalSlotAdvanceAndCheckDecodingState() Or just
> LogicalSlotAdvanceAndCheckDecoding()? (I used the suggested
> LogicalSlotAdvanceAndCheckReadynessForDecoding in this version, It can be
> renamed in next version if we agree).
> 
> Attach the V3 patch which addressed above comments and Kuroda-san's
> comments[1]. I also adjusted the tap-test to only check the confirmed_flush_lsn
> after syncing, as the restart_lsn could be different from the remote one due to
> the new slot_advance() call. I am also testing some optimization idea locally and
> will share if ready.

Attach the V4 patch which includes the optimization to skip the decoding if
the snapshot at the syncing restart_lsn is already serialized. It can avoid most
of the duplicate decoding in my test, and I am doing some more tests locally.

Best Regards,
Hou zj

Вложения

В списке pgsql-hackers по дате отправления:

Предыдущее
От: "Andrey M. Borodin"
Дата:
Сообщение: Re: Commitfest Manager for March
Следующее
От: "Andrey M. Borodin"
Дата:
Сообщение: Re: Commitfest Manager for March