Thread: Fix stats reporting delays in logical parallel apply worker

Fix stats reporting delays in logical parallel apply worker

From
"Zhijie Hou (Fujitsu)"
Date:
Hi,

When implementing another feature, I noticed that parallel apply workers
currently do not report statistics while idle in their main loop. This can cause
stats from the last processed transaction to be arbitrarily delayed, especially
when there are long gaps between streamed transactions.

The issue is demonstrated in 0002, where a TAP test fails when attempting to
collect stats from a parallel apply worker that has no subsequent transaction to
trigger a stats report.

0001 fixes this issue by forcing a stats report when the worker is idle in the
main loop, matching the behavior already present in LogicalRepApplyLoop() for
regular logical apply workers.
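
A minimal standalone sketch of the control flow described above, not the patch itself. The `SIM_` constants, the `SimWorker` struct, and both functions are hypothetical stand-ins: they model the WaitLatch() result bits, IsTransactionState(), and a forced stats flush (presumably something like pgstat_report_stat(true)) so the idea can be compiled without the PostgreSQL tree.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the WaitLatch() result bits. */
#define SIM_WL_LATCH_SET (1 << 0)
#define SIM_WL_TIMEOUT   (1 << 3)

/* Hypothetical model of a parallel apply worker's state. */
typedef struct SimWorker
{
    bool in_transaction;   /* models IsTransactionState() */
    int  pending_stats;    /* accumulated locally, not yet reported */
    int  reported_stats;   /* already visible to other backends */
} SimWorker;

/* Models a forced stats flush: everything pending becomes visible. */
static void
sim_report_stats(SimWorker *w)
{
    assert(w != NULL);
    w->reported_stats += w->pending_stats;
    w->pending_stats = 0;
}

/*
 * One pass through the idle branch of the main loop, given the wait
 * result rc. Returns true if a stats report was forced.
 */
static bool
idle_loop_iteration(SimWorker *w, int rc)
{
    if (rc & SIM_WL_LATCH_SET)
    {
        /* A real loop would reset the latch and look for new messages. */
    }

    /* Idle long enough to time out and not mid-transaction: flush stats. */
    if ((rc & SIM_WL_TIMEOUT) && !w->in_transaction)
    {
        sim_report_stats(w);
        return true;
    }
    return false;
}
```

Without the timeout branch, `pending_stats` in this model would stay unreported until another transaction arrived, which is the delay described above.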

Best Regards,
Hou zj

Attachments

RE: Fix stats reporting delays in logical parallel apply worker

From
"Zhijie Hou (Fujitsu)"
Date:
On Friday, April 17, 2026 11:01 AM Zhijie Hou (Fujitsu) <houzj.fnst@fujitsu.com> wrote:
> Hi,
>
> When implementing another feature, I noticed that parallel apply workers
> currently do not report statistics while idle in their main loop. This can cause
> stats from the last processed transaction to be arbitrarily delayed, especially
> when there are long gaps between streamed transactions.
>
> The issue is demonstrated in 0002, where a TAP test fails when attempting to
> collect stats from a parallel apply worker that has no subsequent transaction
> to
> trigger a stats report.
>
> 0001 fixes this issue by forcing a stats report when the worker is idle in the
> main loop, matching the behavior already present in LogicalRepApplyLoop()
> for
> regular logical apply workers.

Regarding 0002, I realized that the streaming option is now set to 'parallel' by
default, so the test can avoid setting the option again. The test needs to be
adjusted to increase the worker limit so that a parallel worker can start. Here
are the updated patches.

Best Regards,
Hou zj

Attachments

Re: Fix stats reporting delays in logical parallel apply worker

From
Amit Kapila
Date:
On Fri, Apr 17, 2026 at 8:31 AM Zhijie Hou (Fujitsu)
<houzj.fnst@fujitsu.com> wrote:
>
> When implementing another feature, I noticed that parallel apply workers
> currently do not report statistics while idle in their main loop. This can cause
> stats from the last processed transaction to be arbitrarily delayed, especially
> when there are long gaps between streamed transactions.
>
> The issue is demonstrated in 0002, where a TAP test fails when attempting to
> collect stats from a parallel apply worker that has no subsequent transaction to
> trigger a stats report.
>
> 0001 fixes this issue by forcing a stats report when the worker is idle in the
> main loop, matching the behavior already present in LogicalRepApplyLoop() for
> regular logical apply workers.
>

LGTM. We should backpatch this change.

--
With Regards,
Amit Kapila.



Re: Fix stats reporting delays in logical parallel apply worker

From
Chao Li
Date:

> On Apr 17, 2026, at 11:35, Zhijie Hou (Fujitsu) <houzj.fnst@fujitsu.com> wrote:
>
> On Friday, April 17, 2026 11:01 AM Zhijie Hou (Fujitsu) <houzj.fnst@fujitsu.com> wrote:
>> Hi,
>>
>> When implementing another feature, I noticed that parallel apply workers
>> currently do not report statistics while idle in their main loop. This can cause
>> stats from the last processed transaction to be arbitrarily delayed, especially
>> when there are long gaps between streamed transactions.
>>
>> The issue is demonstrated in 0002, where a TAP test fails when attempting to
>> collect stats from a parallel apply worker that has no subsequent transaction
>> to
>> trigger a stats report.
>>
>> 0001 fixes this issue by forcing a stats report when the worker is idle in the
>> main loop, matching the behavior already present in LogicalRepApplyLoop()
>> for
>> regular logical apply workers.
>
> Regarding 0002, I realized that the streaming option is now set to 'parallel' by
> default so can avoid adjusting the option again. The test needs to be adjusted
> to increase the worker limit so that a parallel worker can start. Here are the
> updated patches.
>
> Best Regards,
> Hou zj
>
<v2-0001-Fix-stats-reporting-delays-in-parallel-apply-work.patch><v2-0002-Test-the-stats-report-in-parallel-apply-worker.patch>

I think WaitLatch will never return WL_LATCH_SET and WL_TIMEOUT together, so we
can use “else if ((rc & WL_TIMEOUT) && !IsTransactionState())”, so that upon
WL_LATCH_SET it skips the WL_TIMEOUT check, which could be slightly more
efficient.

Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/







RE: Fix stats reporting delays in logical parallel apply worker

From
"Zhijie Hou (Fujitsu)"
Date:
On Friday, April 17, 2026 3:41 PM Chao Li <li.evan.chao@gmail.com> wrote:
> 
> > On Apr 17, 2026, at 11:35, Zhijie Hou (Fujitsu) <houzj.fnst@fujitsu.com>
> wrote:
> >
> > On Friday, April 17, 2026 11:01 AM Zhijie Hou (Fujitsu)
> <houzj.fnst@fujitsu.com> wrote:
> >> Hi,
> >>
> >> When implementing another feature, I noticed that parallel apply workers
> >> currently do not report statistics while idle in their main loop. This can
> cause
> >> stats from the last processed transaction to be arbitrarily delayed,
> especially
> >> when there are long gaps between streamed transactions.
> >>
> >> The issue is demonstrated in 0002, where a TAP test fails when attempting
> to
> >> collect stats from a parallel apply worker that has no subsequent
> transaction
> >> to
> >> trigger a stats report.
> >>
> >> 0001 fixes this issue by forcing a stats report when the worker is idle in the
> >> main loop, matching the behavior already present in
> LogicalRepApplyLoop()
> >> for
> >> regular logical apply workers.
> >
> > Regarding 0002, I realized that the streaming option is now set to 'parallel'
> by
> > default so can avoid adjusting the option again. The test needs to be
> adjusted
> > to increase the worker limit so that a parallel worker can start. Here are the
> > updated patches.
> >
> > Best Regards,
> > Hou zj
> > <v2-0001-Fix-stats-reporting-delays-in-parallel-apply-work.patch><v2-
> 0002-Test-the-stats-report-in-parallel-apply-worker.patch>
> 
> I think WaitLatch will never return WL_LATCH_SET and WL_TIMEOUT
> together, so we can do “else if (rc & WL_TIMEOUT)
> && !IsTransactionState())”, so that upon WL_LATCH_SET, it skips the
> WL_TIMEOUT check, which could be slightly more efficient.

I'm not sure we should assume that WaitLatch will set only one flag at a time.
Even if that assumption holds for this specific case, handling bit flags this way looks a bit odd.
AFAICS, we don't use this style elsewhere in the code.
Currently, users of WL_TIMEOUT (in basebackup_throttle.c, walreceiver.c, worker.c)
all test each flag with a separate "if" rather than an "else if" chain.
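
To make the two styles under discussion concrete, here is a small standalone sketch. Everything in it is hypothetical: the `SIM_` constants stand in for the WaitLatch() result bits, and the two handlers only record which branches ran. It is meant to show when the styles agree and when they would diverge, not how any PostgreSQL caller is written.

```c
#include <assert.h>

/* Hypothetical stand-ins for the WaitLatch() result bits. */
#define SIM_WL_LATCH_SET (1 << 0)
#define SIM_WL_TIMEOUT   (1 << 3)

/* Bits recording which branches a handler executed. */
#define DID_LATCH   (1 << 0)
#define DID_TIMEOUT (1 << 1)

/* Independent tests: every bit present in rc gets handled. */
static int
handle_independent(int rc)
{
    int done = 0;

    assert((rc & ~(SIM_WL_LATCH_SET | SIM_WL_TIMEOUT)) == 0);
    if (rc & SIM_WL_LATCH_SET)
        done |= DID_LATCH;
    if (rc & SIM_WL_TIMEOUT)
        done |= DID_TIMEOUT;
    return done;
}

/* Chained tests: at most one branch runs, the first one that matches. */
static int
handle_chained(int rc)
{
    int done = 0;

    assert((rc & ~(SIM_WL_LATCH_SET | SIM_WL_TIMEOUT)) == 0);
    if (rc & SIM_WL_LATCH_SET)
        done |= DID_LATCH;
    else if (rc & SIM_WL_TIMEOUT)
        done |= DID_TIMEOUT;
    return done;
}
```

When exactly one bit is set, both handlers behave identically. They differ only if both bits were ever set at once, in which case the chained version would skip the timeout work. The independent form does not depend on how the result mask is produced.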

Best Regards,
Hou zj

Re: Fix stats reporting delays in logical parallel apply worker

From
Chao Li
Date:

> On Apr 17, 2026, at 17:20, Zhijie Hou (Fujitsu) <houzj.fnst@fujitsu.com> wrote:
>
> On Friday, April 17, 2026 3:41 PM Chao Li <li.evan.chao@gmail.com> wrote:
>>
>>> On Apr 17, 2026, at 11:35, Zhijie Hou (Fujitsu) <houzj.fnst@fujitsu.com>
>> wrote:
>>>
>>> On Friday, April 17, 2026 11:01 AM Zhijie Hou (Fujitsu)
>> <houzj.fnst@fujitsu.com> wrote:
>>>> Hi,
>>>>
>>>> When implementing another feature, I noticed that parallel apply workers
>>>> currently do not report statistics while idle in their main loop. This can
>> cause
>>>> stats from the last processed transaction to be arbitrarily delayed,
>> especially
>>>> when there are long gaps between streamed transactions.
>>>>
>>>> The issue is demonstrated in 0002, where a TAP test fails when attempting
>> to
>>>> collect stats from a parallel apply worker that has no subsequent
>> transaction
>>>> to
>>>> trigger a stats report.
>>>>
>>>> 0001 fixes this issue by forcing a stats report when the worker is idle in the
>>>> main loop, matching the behavior already present in
>> LogicalRepApplyLoop()
>>>> for
>>>> regular logical apply workers.
>>>
>>> Regarding 0002, I realized that the streaming option is now set to 'parallel'
>> by
>>> default so can avoid adjusting the option again. The test needs to be
>> adjusted
>>> to increase the worker limit so that a parallel worker can start. Here are the
>>> updated patches.
>>>
>>> Best Regards,
>>> Hou zj
>>> <v2-0001-Fix-stats-reporting-delays-in-parallel-apply-work.patch><v2-
>> 0002-Test-the-stats-report-in-parallel-apply-worker.patch>
>>
>> I think WaitLatch will never return WL_LATCH_SET and WL_TIMEOUT
>> together, so we can do “else if (rc & WL_TIMEOUT)
>> && !IsTransactionState())”, so that upon WL_LATCH_SET, it skips the
>> WL_TIMEOUT check, which could be slightly more efficient.
>
> I'm not sure we should assume that WaitLatch will set only one flag at a time.
> even if that assumption holds for this specific case, handling bit flags this way looks a bit odd.
> AFAICS, we don't use this style elsewhere in the code.
> Currently, users of WL_TIMEOUT (in basebackup_throttle.c, walreceiver.c, worker.c)
> all use if ... if logic.
>
> Best Regards,
> Hou zj

WL_TIMEOUT is not a real event. If we look at the code of WaitLatch:
```
   if (WaitEventSetWait(LatchWaitSet,
                   (wakeEvents & WL_TIMEOUT) ? timeout : -1,
                   &event, 1,
                   wait_event_info) == 0)
      return WL_TIMEOUT;
   else
      return event.events;
```
So WL_TIMEOUT is never combined with any other event in the return value.
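
The same shape can be exercised outside the server with a small standalone model. This is only a sketch: `sim_wait_event_set_wait()` is a hypothetical stand-in that returns 0 on timeout and 1 otherwise, filling in the events that fired, and the `SIM_` constants stand in for the real flag bits.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the WaitLatch() result bits. */
#define SIM_WL_LATCH_SET (1 << 0)
#define SIM_WL_TIMEOUT   (1 << 3)

typedef struct SimEvent
{
    int events;   /* bits for the events that fired */
} SimEvent;

/*
 * Hypothetical stand-in for the wait call: "fired" is the set of events
 * that occurred before the deadline. Returns the number of events
 * reported, which is 0 when the wait timed out.
 */
static int
sim_wait_event_set_wait(int fired, SimEvent *event)
{
    assert(event != NULL);
    if (fired == 0)
        return 0;
    event->events = fired;
    return 1;
}

/* Same branch structure as the excerpt: timeout XOR real events. */
static int
sim_wait_latch(int fired)
{
    SimEvent event = {0};

    if (sim_wait_event_set_wait(fired, &event) == 0)
        return SIM_WL_TIMEOUT;
    else
        return event.events;
}
```

In this model the timeout bit can only come from the first branch, so it never appears alongside a bit reported by the wait call.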

Anyway, that’s not a big concern.

Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/







Re: Fix stats reporting delays in logical parallel apply worker

From
Amit Kapila
Date:
On Fri, Apr 17, 2026 at 12:49 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Fri, Apr 17, 2026 at 8:31 AM Zhijie Hou (Fujitsu)
> <houzj.fnst@fujitsu.com> wrote:
> >
> > When implementing another feature, I noticed that parallel apply workers
> > currently do not report statistics while idle in their main loop. This can cause
> > stats from the last processed transaction to be arbitrarily delayed, especially
> > when there are long gaps between streamed transactions.
> >
> > The issue is demonstrated in 0002, where a TAP test fails when attempting to
> > collect stats from a parallel apply worker that has no subsequent transaction to
> > trigger a stats report.
> >
> > 0001 fixes this issue by forcing a stats report when the worker is idle in the
> > main loop, matching the behavior already present in LogicalRepApplyLoop() for
> > regular logical apply workers.
> >
>
> LGTM. We should backpatch this change.
>

Pushed now.

--
With Regards,
Amit Kapila.