Improving the latch handling between logical replication launcher and worker processes.

From: vignesh C
Subject: Improving the latch handling between logical replication launcher and worker processes.
Msg-id CALDaNm01_KEgHM1tKtgXkCGLJ5209SMSmGw3UmhZbOz365_=eA@mail.gmail.com
Responses: RE: Improving the latch handling between logical replication launcher and worker processes.  ("Zhijie Hou (Fujitsu)" <houzj.fnst@fujitsu.com>)
RE: Improving the latch handling between logical replication launcher and worker processes.  ("Hayato Kuroda (Fujitsu)" <kuroda.hayato@fujitsu.com>)
List: pgsql-hackers
Hi,

Currently the launcher's latch is used for the following: a) worker
process attach, b) worker process exit, and c) subscription creation.
Since the same latch is used for all of these, the launcher process
cannot handle concurrent scenarios such as: a) the launcher has
started a new apply worker and is waiting for it to attach, and b)
the creation of subscription sub2 sends the launcher a wake-up
signal. In this scenario both events set the launcher's latch, and
the launcher cannot tell that two things have happened: 1) the worker
has attached and 2) a subscription has been created and an apply
worker should be started for it. As a result, the apply worker for
the newly created subscription is not started immediately; it is
started only after the timeout of 180 seconds.
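
To make the race concrete, here is a toy model of it in Python. It is
only a sketch: the Latch class and both functions are hypothetical
names invented for illustration, not PostgreSQL code. The one
assumption it relies on is that a latch behaves like a single sticky
flag, so setting it twice is indistinguishable from setting it once.

```python
class Latch:
    """Toy stand-in for a process latch: one sticky boolean flag."""

    def __init__(self):
        self.is_set = False

    def set(self):
        # Setting an already-set latch changes nothing, so two wake-up
        # reasons collapse into one.
        self.is_set = True

    def reset(self):
        self.is_set = False


def wait_for_worker_attach(latch, worker_attached):
    """Hypothetical attach wait: consumes the latch, then checks the worker."""
    if latch.is_set:
        latch.reset()  # also wipes any other pending wake-up reason
    return worker_attached


def launcher_main_iteration(latch):
    """Return True if the main loop would rescan subscriptions right away."""
    if latch.is_set:
        latch.reset()
        return True
    return False  # nothing pending: the launcher sleeps until the timeout


launcher_latch = Latch()
launcher_latch.set()  # event 1: the apply worker for sub1 attaches
launcher_latch.set()  # event 2: subscription sub2 is created

attached = wait_for_worker_attach(launcher_latch, worker_attached=True)
rescans_now = launcher_main_iteration(launcher_latch)
print(attached, rescans_now)  # True False
```

The second value is False because the attach wait has already reset
the shared latch, so the main loop sees no pending wake-up for sub2.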

I have started a new thread for this based on suggestions at [1].

We could improve this in one of the following ways:
a) Introduce a new latch to handle worker attach and exit.
b) Add a new GUC launcher_retry_time, which gives more flexibility to
users, as suggested by Amit at [1]. Before 5a3a953,
wal_retrieve_retry_interval played a role similar to the suggested
new GUC launcher_retry_time, e.g. even if a worker was launched, the
launcher waited only wal_retrieve_retry_interval before the next
round.
c) Don't reset the latch at worker attach, and allow the launcher's
main loop to identify and handle it. For this there is a patch
v6-0002 available at [2].
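
Options a) and c) can be compared with the same kind of toy model.
Again this is a hedged sketch, not the patch at [2]: Latch, option_a
and option_c are hypothetical names, and each function returns whether
the wake-up for the new subscription is still pending when the
launcher's main loop runs next.

```python
class Latch:
    """Toy stand-in for a process latch: one sticky boolean flag."""

    def __init__(self):
        self.is_set = False

    def set(self):
        self.is_set = True

    def reset(self):
        self.is_set = False


def option_a():
    """Option a): a separate latch for worker attach and exit."""
    main_latch = Latch()    # woken by subscription creation
    worker_latch = Latch()  # woken by worker attach and exit
    worker_latch.set()      # the apply worker attaches
    main_latch.set()        # a new subscription is created
    worker_latch.reset()    # the attach wait consumes only its own latch
    return main_latch.is_set


def option_c():
    """Option c): one shared latch that the attach wait never resets."""
    latch = Latch()
    latch.set()  # the apply worker attaches
    latch.set()  # a new subscription is created
    # The attach wait inspects the worker state but leaves the latch
    # set, so the main loop still sees a pending wake-up.
    return latch.is_set


print(option_a(), option_c())  # True True
```

In both variants the main loop still finds its latch set, so it can
rescan subscriptions at once instead of sleeping until the timeout.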

I'm not sure which approach is better in this case. I am leaning
towards adding a new latch to handle worker attach and exit.
Thoughts?

[1] - https://www.postgresql.org/message-id/CAA4eK1KR29XfBi5rObgV06xcBLn7y%2BLCuxcSMdKUkKZK740L2w%40mail.gmail.com
[2] - https://www.postgresql.org/message-id/CALDaNm10R7L0Dxq%2B-J%3DPp3AfM_yaokpbhECvJ69QiGH8-jQquw%40mail.gmail.com

Regards,
Vignesh


