Re: Synchronizing slots from primary to standby

Поиск
Список
Период
Сортировка
От shveta malik
Тема Re: Synchronizing slots from primary to standby
Дата
Msg-id CAJpy0uCALOpGY+9vdD=qaCgoPOeLfGpDzVOt3qEJHCnpQ9SqWw@mail.gmail.com
обсуждение исходный текст
Ответ на RE: Synchronizing slots from primary to standby  ("Hayato Kuroda (Fujitsu)" <kuroda.hayato@fujitsu.com>)
Список pgsql-hackers
On Mon, Oct 2, 2023 at 4:29 PM Hayato Kuroda (Fujitsu)
<kuroda.hayato@fujitsu.com> wrote:
>
> Dear Shveta,
>
> Thank you for updating the patch!
>
> I found another ERROR due to the slot removal. Is this a real issue?
>
> 1. applied add_sleep.txt, which emulated the case the tablesync worker stucked
>    and the primary crashed during the
>    initial sync.
> 2. executed test_0925_v2.sh (You attached in [1])
> 3. secondary could not start the logical replication because the slot was not
>    created (log files were also attached).
>
>
> Here is my analysis. The cause is that the slotsync worker aborts the slot creation
> on secondary server because the restart_lsn of secondary ahead the primary's one.
> IIUC it can be occurred when tablesync workers finishes initial copy before
> walsenders stream changes. In this case, the relstate of the worker is set to
> SUBREL_STATE_CATCHUP and the apply worker waits till the relation becomes
> SUBREL_STATE_SYNCDONE. From here the slot on primary will not be updated until
> the relation is caught up. If some changes are come and the primary crashes at
> that time, the syncslot worker will abort the slot creation.
>

Kuroda-San, we need to let slot-creation on standby finish before we
start expecting it to support logical replication on failover. In the
current case, as you stated the slot-creation itself is aborted and
thus it can not support logical-replication later.  We are currently
trying to think of possibilities to advance remote_lsn on primary
internally by slot-sync workers in order to accelerate slot-creation
on standby for cases where slot-creation is stuck due to primary's
restart_lsn lagging behind standby's restart_lsn. But till then, the
way to proceed for testing is to execute workload on primary for such
cases in order to accelerate slot-creation.

thanks
Shveta



В списке pgsql-hackers по дате отправления:

Предыдущее
От: shveta malik
Дата:
Сообщение: Re: Synchronizing slots from primary to standby
Следующее
От: Tatsuo Ishii
Дата:
Сообщение: Re: Row pattern recognition