RE: Exit walsender before confirming remote flush in logical replication

Поиск
Список
Период
Сортировка
От Hayato Kuroda (Fujitsu)
Тема RE: Exit walsender before confirming remote flush in logical replication
Дата
Msg-id TYAPR01MB58665435C61D6CDBA67CA644F5C19@TYAPR01MB5866.jpnprd01.prod.outlook.com
обсуждение исходный текст
Ответ на Re: Exit walsender before confirming remote flush in logical replication  (Kyotaro Horiguchi <horikyota.ntt@gmail.com>)
Ответы Re: Exit walsender before confirming remote flush in logical replication  (Amit Kapila <amit.kapila16@gmail.com>)
Список pgsql-hackers
Dear Horiguchi-san,

> If I'm grabbing the discussion here correctly, in my memory, it is
> because: physical replication needs all records that have written on
> primary are written on standby for switchover to succeed. It is
> annoying that normal shutdown occasionally leads to switchover
> failure. Thus WalSndDone explicitly waits for remote flush/write
> regardless of the setting of synchronous_commit.

AFAIK the condition (sentPtr == replicatedPtr) seemed to be introduced for the purpose[1].
You meant to say that the conditon (!pq_is_send_pending()) has same motivation, right?

> Thus apply delay
> doesn't affect shutdown (AFAICS), and that is sufficient since all the
> records will be applied at the next startup.

I was not clear the word "next startup", but I agreed that we can shut down the
walsender in case of recovery_min_apply_delay > 0 and synchronous_commit = remote_apply.
The startup process will be not terminated even if the primary crashes, so I
think the process will apply the transaction sooner or later.

> In logical replication apply preceeds write and flush so we have no
> indication whether a record is "replicated" to standby by other than
> apply LSN. On the other hand, logical recplication doesn't have a
> business with switchover so that assurarance is useless. Thus I think
> we can (practically) ignore apply_lsn at shutdown. It seems subtly
> irregular, though.

Another consideration is that the condition (!pq_is_send_pending()) ensures that
there are no pending messages, including other packets. Currently we force walsenders
to clean up all messages before shutting down, even if it is a keepalive one.
I cannot have any problems caused by this, but I can keep the condition in case of
logical replication.

I updated the patch accordingly. Also, I found that the previous version
did not work well in case of streamed transactions. When a streamed transaction
is committed on publisher but the application is delayed on subscriber, the
process sometimes waits until there is no pending write. This is done in
ProcessPendingWrites(). I added another termination path in the function.

[1]: https://github.com/postgres/postgres/commit/985bd7d49726c9f178558491d31a570d47340459

Best Regards,
Hayato Kuroda
FUJITSU LIMITED


Вложения

В списке pgsql-hackers по дате отправления:

Предыдущее
От: "Hayato Kuroda (Fujitsu)"
Дата:
Сообщение: RE: Exit walsender before confirming remote flush in logical replication
Следующее
От: Peter Eisentraut
Дата:
Сообщение: Re: pgsql: Doc: add XML ID attributes to and tags.