Re: Improving the latch handling between logical replication launcher and worker processes.

From: vignesh C
Subject: Re: Improving the latch handling between logical replication launcher and worker processes.
Date:
Msg-id: CALDaNm1TdrV6VjzYgNLDkzgBe=rf1ZP7Tc0hM-k+zi8foo4D5w@mail.gmail.com
In response to: Re: Improving the latch handling between logical replication launcher and worker processes.  (Heikki Linnakangas <hlinnaka@iki.fi>)
Responses: Re: Improving the latch handling between logical replication launcher and worker processes.
Re: Improving the latch handling between logical replication launcher and worker processes.
List: pgsql-hackers
On Fri, 5 Jul 2024 at 18:38, Heikki Linnakangas <hlinnaka@iki.fi> wrote:
>
> On 05/07/2024 14:07, vignesh C wrote:
> > On Thu, 4 Jul 2024 at 16:52, Heikki Linnakangas <hlinnaka@iki.fi> wrote:
> >>
> >> I don't quite understand the problem we're trying to fix:
> >>
> >>> Currently the launcher's latch is used for the following: a) worker
> >>> process attach b) worker process exit and c) subscription creation.
> >>> Since this same latch is used for multiple cases, the launcher process
> >>> is not able to handle concurrent scenarios like: a) the launcher has
> >>> started a new apply worker and is waiting for the apply worker to
> >>> attach, and b) CREATE SUBSCRIPTION sub2 sends the launcher a wakeup
> >>> signal. In this scenario, both of them will set the latch of the
> >>> launcher process, but the launcher is not able to identify that both
> >>> operations have occurred: 1) the worker has attached, and 2) a
> >>> subscription has been created and an apply worker should be started.
> >>> As a result, the apply worker does not get started immediately for the
> >>> newly created subscription and only gets started after the timeout of
> >>> 180 seconds.
> >>
> >> I don't see how we could miss a notification. Yes, both events will set
> >> the same latch. Why is that a problem?
> >
> > The launcher process waits for the apply worker to attach in the
> > WaitForReplicationWorkerAttach function. The launcher's latch gets set
> > concurrently by both another create subscription and the apply worker
> > attaching. The launcher detects that the latch is set while waiting in
> > WaitForReplicationWorkerAttach, resets the latch, proceeds to the main
> > loop and waits for DEFAULT_NAPTIME_PER_CYCLE (as the latch has already
> > been reset). Further details are provided below.
> >
> >> The loop will see that the new
> >> worker has attached, and that the new subscription has been created, and
> >> process both events. Right?
> >
> > Since the latch is reset at WaitForReplicationWorkerAttach, it skips
> > processing the create subscription event.
> >
> > To explain in a bit more detail:
> > Consider executing two concurrent create subscription commands, first
> > CREATE SUBSCRIPTION sub1, followed immediately by CREATE SUBSCRIPTION
> > sub2. Occasionally, the following events may unfold:
> > After the first create subscription command (sub1), the backend will
> > set the launcher's latch because of ApplyLauncherWakeupAtCommit.
> > Subsequently, the launcher process will reset the latch and identify
> > the addition of a new subscription in the pg_subscription list. The
> > launcher process will proceed to request the postmaster to start the
> > apply worker background process (sub1) and request the postmaster to
> > notify the launcher once the apply worker(sub1) has been started.
> > Launcher will then wait for the apply worker(sub1) to attach in the
> > WaitForReplicationWorkerAttach function.
> > Meanwhile, the second CREATE SUBSCRIPTION command (sub2), which was
> > executed concurrently, also sets the launcher's latch (because of
> > ApplyLauncherWakeupAtCommit).
> > While the launcher remains in the WaitForReplicationWorkerAttach
> > function waiting for the apply worker of sub1 to start, the postmaster
> > will start the apply worker for subscription sub1 and send a SIGUSR1
> > signal to the launcher process via ReportBackgroundWorkerPID. Upon
> > receiving this signal, the launcher process will then set its latch.
> >
> > Now the launcher's latch has been set concurrently by both the start
> > of the apply worker for sub1 and the execution of the CREATE
> > SUBSCRIPTION sub2 command.
> >
> > At this juncture, the launcher, which had been awaiting the attachment
> > of the apply worker, detects that the latch is set and proceeds to
> > reset it. The reset operation of the latch occurs within the following
> > section of code in WaitForReplicationWorkerAttach:
> > ...
> > rc = WaitLatch(MyLatch,
> >     WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH,
> >     10L, WAIT_EVENT_BGWORKER_STARTUP);
> >
> > if (rc & WL_LATCH_SET)
> > {
> >     ResetLatch(MyLatch);
> >     CHECK_FOR_INTERRUPTS();
> > }
> > ...
> >
> > After resetting the latch here, the activation signal intended to
> > start the apply worker for subscription sub2 is no longer present. The
> > launcher will return to the ApplyLauncherMain function, where it will
> > wait for DEFAULT_NAPTIME_PER_CYCLE (180 seconds) before
> > processing the create subscription request (i.e., creating a new apply
> > worker for sub2).
> > The issue arises from the latch being reset in
> > WaitForReplicationWorkerAttach, which can occasionally delay the
> > synchronization of table data for the subscription.
>
> Ok, I see it now. Thanks for the explanation. So the problem isn't using
> the same latch for different purposes per se. It's that we're trying to
> use it in a nested fashion, resetting it in the inner loop.
>
> Looking at the proposed patch more closely:
>
> > @@ -221,13 +224,13 @@ WaitForReplicationWorkerAttach(LogicalRepWorker *worker,
> >                * We need timeout because we generally don't get notified via latch
> >                * about the worker attach.  But we don't expect to have to wait long.
> >                */
> > -             rc = WaitLatch(MyLatch,
> > +             rc = WaitLatch(&worker->launcherWakeupLatch,
> >                                          WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH,
> >                                          10L, WAIT_EVENT_BGWORKER_STARTUP);
> >
> >               if (rc & WL_LATCH_SET)
> >               {
> > -                     ResetLatch(MyLatch);
> > +                     ResetLatch(&worker->launcherWakeupLatch);
> >                       CHECK_FOR_INTERRUPTS();
> >               }
> >       }
>
> The comment here is now outdated, right? The patch adds an explicit
> SetLatch() call to ParallelApplyWorkerMain(), to notify the launcher
> about the attachment.

I will update the comment for this.
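
Something along these lines, perhaps (just a draft wording, I'll adjust
it in the next version of the patch):

/*
 * The worker now wakes us up via launcherWakeupLatch once it has
 * attached, so we normally get notified promptly.  Keep a small timeout
 * anyway as a fallback, e.g. in case the worker fails to start at all.
 */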

> Is the launcherWakeupLatch field protected by some lock, to protect
> which process owns it? OwnLatch() is called in
> logicalrep_worker_stop_internal() without holding a lock for example. Is
> there a guarantee that only one process can do that at the same time?

I have analyzed a few scenarios; I will analyze the remaining
scenarios and report back on this.

> What happens if a process never calls DisownLatch(), e.g. because it
> bailed out with an error while waiting for the worker to attach or stop?

I tried a lot of scenarios by erroring out after the OwnLatch call and
have not hit the PANIC code path in OwnLatch yet. I will try a few more
scenarios that I have in mind, see if we can hit the PANIC in
OwnLatch, and report back on this.
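
If it does turn out to be reachable, one way to guarantee that the latch
is always disowned could be to wrap the wait in PG_TRY/PG_FINALLY,
roughly like this (only a sketch, not something I have tested):

OwnLatch(&worker->launcherWakeupLatch);
PG_TRY();
{
    /* the existing wait loop, which can error out, e.g. via CHECK_FOR_INTERRUPTS() */
}
PG_FINALLY();
{
    /* make sure ownership is released even on error */
    DisownLatch(&worker->launcherWakeupLatch);
}
PG_END_TRY();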

> As an alternative, smaller fix, I think we could do the attached. It
> forces the launcher's main loop to do another iteration, if it calls
> logicalrep_worker_launch(). That extra iteration should pick up any
> missed notifications.

This also works. I had proposed another solution along similar lines
at [1], where we don't reset the latch in the inner loop. I'm not sure
which solution we should proceed with. Thoughts?
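
For comparison, I read your smaller fix as boiling down to roughly the
following in ApplyLauncherMain (I'm paraphrasing from your description;
the attached patch may of course do it differently):

if (!w)
{
    /* launch the apply worker for this subscription (existing code) */
    logicalrep_worker_launch(...);

    /*
     * Force another iteration of the main loop, so that a latch that was
     * set concurrently (e.g. by another CREATE SUBSCRIPTION) is processed
     * without waiting for DEFAULT_NAPTIME_PER_CYCLE.
     */
    wait_time = 0;
}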

> Your solution with an additional latch seems better because it makes
> WaitForReplicationWorkerAttach() react more quickly, without the 10 s
> wait. I'm surprised we have that in the first place, 10 s seems like a
> pretty long time to wait for a parallel apply worker to start. Why was
> that ever OK?

Here 10 means 10 milliseconds, so it just waits for 10 milliseconds
before going to the next iteration.
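
For reference, the WaitLatch() prototype in latch.h is (roughly):

extern int WaitLatch(Latch *latch, int wakeEvents, long timeout,
                     uint32 wait_event_info);

where the timeout argument is in milliseconds.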

[1] - https://www.postgresql.org/message-id/CALDaNm10R7L0Dxq%2B-J%3DPp3AfM_yaokpbhECvJ69QiGH8-jQquw%40mail.gmail.com

Regards,
Vignesh


