Discussion: Fix stats reporting delays in logical parallel apply worker
Hi,

When implementing another feature, I noticed that parallel apply workers currently do not report statistics while idle in their main loop. This can cause stats from the last processed transaction to be arbitrarily delayed, especially when there are long gaps between streamed transactions.

The issue is demonstrated in 0002, where a TAP test fails when attempting to collect stats from a parallel apply worker that has no subsequent transaction to trigger a stats report.

0001 fixes this issue by forcing a stats report when the worker is idle in the main loop, matching the behavior already present in LogicalRepApplyLoop() for regular logical apply workers.

Best Regards,
Hou zj
Attachments
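For readers without the patch at hand, the idea described in the message above can be sketched roughly as follows. This is a minimal mock, not the code from 0001: `MockWorker`, `mock_report_stat()`, `mock_handle_wakeup()` and the `MOCK_WL_*` constants are hypothetical stand-ins for the real worker state, the forced stats flush, and the WaitLatch() result bits, and the condition is inferred from the one quoted later in the thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the WaitLatch() result bits. */
#define MOCK_WL_LATCH_SET (1 << 0)
#define MOCK_WL_TIMEOUT   (1 << 3)

/* Hypothetical worker state; the real worker keeps none of this in one struct. */
typedef struct MockWorker
{
    bool in_transaction;  /* stand-in for IsTransactionState() */
    int  pending_stats;   /* counters accumulated but not yet published */
    int  reported_stats;  /* counters visible to stats readers */
} MockWorker;

/* Stand-in for a forced stats flush: publish everything that is pending. */
static void
mock_report_stat(MockWorker *w)
{
    assert(w != NULL);
    w->reported_stats += w->pending_stats;
    w->pending_stats = 0;
}

/* Handle one wakeup of the idle loop, given the wait result rc. */
static void
mock_handle_wakeup(MockWorker *w, int rc)
{
    if (rc & MOCK_WL_LATCH_SET)
    {
        /* A real worker would reset its latch and look for new messages. */
    }

    /* Idle and outside a transaction: flush so stats are not delayed. */
    if ((rc & MOCK_WL_TIMEOUT) && !w->in_transaction)
        mock_report_stat(w);
}
```

In this sketch, counters left over from the last transaction stay pending across latch wakeups and become visible on the first timeout that arrives while the worker is outside a transaction.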
On Friday, April 17, 2026 11:01 AM Zhijie Hou (Fujitsu) <houzj.fnst@fujitsu.com> wrote:
> Hi,
>
> When implementing another feature, I noticed that parallel apply workers currently do not report statistics while idle in their main loop. This can cause stats from the last processed transaction to be arbitrarily delayed, especially when there are long gaps between streamed transactions.
>
> The issue is demonstrated in 0002, where a TAP test fails when attempting to collect stats from a parallel apply worker that has no subsequent transaction to trigger a stats report.
>
> 0001 fixes this issue by forcing a stats report when the worker is idle in the main loop, matching the behavior already present in LogicalRepApplyLoop() for regular logical apply workers.

Regarding 0002, I realized that the streaming option is now set to 'parallel' by default, so we can avoid adjusting the option again. The test needs to be adjusted to increase the worker limit so that a parallel worker can start.

Here are the updated patches.

Best Regards,
Hou zj
Attachments
On Fri, Apr 17, 2026 at 8:31 AM Zhijie Hou (Fujitsu) <houzj.fnst@fujitsu.com> wrote:
>
> When implementing another feature, I noticed that parallel apply workers currently do not report statistics while idle in their main loop. This can cause stats from the last processed transaction to be arbitrarily delayed, especially when there are long gaps between streamed transactions.
>
> The issue is demonstrated in 0002, where a TAP test fails when attempting to collect stats from a parallel apply worker that has no subsequent transaction to trigger a stats report.
>
> 0001 fixes this issue by forcing a stats report when the worker is idle in the main loop, matching the behavior already present in LogicalRepApplyLoop() for regular logical apply workers.
>

LGTM. We should backpatch this change.

--
With Regards,
Amit Kapila.
> On Apr 17, 2026, at 11:35, Zhijie Hou (Fujitsu) <houzj.fnst@fujitsu.com> wrote:
>
> On Friday, April 17, 2026 11:01 AM Zhijie Hou (Fujitsu) <houzj.fnst@fujitsu.com> wrote:
>> Hi,
>>
>> When implementing another feature, I noticed that parallel apply workers currently do not report statistics while idle in their main loop. This can cause stats from the last processed transaction to be arbitrarily delayed, especially when there are long gaps between streamed transactions.
>>
>> The issue is demonstrated in 0002, where a TAP test fails when attempting to collect stats from a parallel apply worker that has no subsequent transaction to trigger a stats report.
>>
>> 0001 fixes this issue by forcing a stats report when the worker is idle in the main loop, matching the behavior already present in LogicalRepApplyLoop() for regular logical apply workers.
>
> Regarding 0002, I realized that the streaming option is now set to 'parallel' by default so can avoid adjusting the option again. The test needs to be adjusted to increase the worker limit so that a parallel worker can start. Here are the updated patches.
>
> Best Regards,
> Hou zj
> <v2-0001-Fix-stats-reporting-delays-in-parallel-apply-work.patch><v2-0002-Test-the-stats-report-in-parallel-apply-worker.patch>

I think WaitLatch will never return WL_LATCH_SET and WL_TIMEOUT together, so we can do "else if ((rc & WL_TIMEOUT) && !IsTransactionState())", so that upon WL_LATCH_SET it skips the WL_TIMEOUT check, which could be slightly more efficient.

Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/
On Friday, April 17, 2026 3:41 PM Chao Li <li.evan.chao@gmail.com> wrote:
>
> > On Apr 17, 2026, at 11:35, Zhijie Hou (Fujitsu) <houzj.fnst@fujitsu.com> wrote:
> >
> > On Friday, April 17, 2026 11:01 AM Zhijie Hou (Fujitsu) <houzj.fnst@fujitsu.com> wrote:
> >> Hi,
> >>
> >> When implementing another feature, I noticed that parallel apply workers currently do not report statistics while idle in their main loop. This can cause stats from the last processed transaction to be arbitrarily delayed, especially when there are long gaps between streamed transactions.
> >>
> >> The issue is demonstrated in 0002, where a TAP test fails when attempting to collect stats from a parallel apply worker that has no subsequent transaction to trigger a stats report.
> >>
> >> 0001 fixes this issue by forcing a stats report when the worker is idle in the main loop, matching the behavior already present in LogicalRepApplyLoop() for regular logical apply workers.
> >
> > Regarding 0002, I realized that the streaming option is now set to 'parallel' by default so can avoid adjusting the option again. The test needs to be adjusted to increase the worker limit so that a parallel worker can start. Here are the updated patches.
> >
> > Best Regards,
> > Hou zj
> > <v2-0001-Fix-stats-reporting-delays-in-parallel-apply-work.patch><v2-0002-Test-the-stats-report-in-parallel-apply-worker.patch>
>
> I think WaitLatch will never return WL_LATCH_SET and WL_TIMEOUT together, so we can do "else if ((rc & WL_TIMEOUT) && !IsTransactionState())", so that upon WL_LATCH_SET it skips the WL_TIMEOUT check, which could be slightly more efficient.

I'm not sure we should assume that WaitLatch will set only one flag at a time. Even if that assumption holds for this specific case, handling bit flags this way looks a bit odd. AFAICS, we don't use this style elsewhere in the code.

Currently, users of WL_TIMEOUT (in basebackup_throttle.c, walreceiver.c, worker.c) all use if ... if logic.

Best Regards,
Hou zj
> On Apr 17, 2026, at 17:20, Zhijie Hou (Fujitsu) <houzj.fnst@fujitsu.com> wrote:
>
> On Friday, April 17, 2026 3:41 PM Chao Li <li.evan.chao@gmail.com> wrote:
>>
>>> On Apr 17, 2026, at 11:35, Zhijie Hou (Fujitsu) <houzj.fnst@fujitsu.com>
>> wrote:
>>>
>>> On Friday, April 17, 2026 11:01 AM Zhijie Hou (Fujitsu)
>> <houzj.fnst@fujitsu.com> wrote:
>>>> Hi,
>>>>
>>>> When implementing another feature, I noticed that parallel apply workers
>>>> currently do not report statistics while idle in their main loop. This can
>> cause
>>>> stats from the last processed transaction to be arbitrarily delayed,
>> especially
>>>> when there are long gaps between streamed transactions.
>>>>
>>>> The issue is demonstrated in 0002, where a TAP test fails when attempting
>> to
>>>> collect stats from a parallel apply worker that has no subsequent
>> transaction
>>>> to
>>>> trigger a stats report.
>>>>
>>>> 0001 fixes this issue by forcing a stats report when the worker is idle in the
>>>> main loop, matching the behavior already present in
>> LogicalRepApplyLoop()
>>>> for
>>>> regular logical apply workers.
>>>
>>> Regarding 0002, I realized that the streaming option is now set to 'parallel'
>> by
>>> default so can avoid adjusting the option again. The test needs to be
>> adjusted
>>> to increase the worker limit so that a parallel worker can start. Here are the
>>> updated patches.
>>>
>>> Best Regards,
>>> Hou zj
>>> <v2-0001-Fix-stats-reporting-delays-in-parallel-apply-work.patch><v2-
>> 0002-Test-the-stats-report-in-parallel-apply-worker.patch>
>>
>> I think WaitLatch will never return WL_LATCH_SET and WL_TIMEOUT
>> together, so we can do "else if ((rc & WL_TIMEOUT)
>> && !IsTransactionState())", so that upon WL_LATCH_SET, it skips the
>> WL_TIMEOUT check, which could be slightly more efficient.
>
> I'm not sure we should assume that WaitLatch will set only one flag at a time.
> even if that assumption holds for this specific case, handling bit flags this way looks a bit odd.
> AFAICS, we don't use this style elsewhere in the code.
> Currently, users of WL_TIMEOUT (in basebackup_throttle.c, walreceiver.c, worker.c)
> all use if ... if logic.
>
> Best Regards,
> Hou zj
WL_TIMEOUT is not a real event. If we look at the code of WaitLatch:
```
    if (WaitEventSetWait(LatchWaitSet,
                         (wakeEvents & WL_TIMEOUT) ? timeout : -1,
                         &event, 1,
                         wait_event_info) == 0)
        return WL_TIMEOUT;
    else
        return event.events;
```
WL_TIMEOUT is never combined with other events: it is returned on its own, and only when WaitEventSetWait() reports zero events.
Anyway, that’s not a big concern.
Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/
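The two positions in this exchange can be compared with a small self-contained mock. It is a sketch under stated assumptions, not PostgreSQL code: `mock_wait_latch()` only imitates the return logic of the WaitLatch excerpt quoted above, and the `MOCK_WL_*` constants and both `should_report_*()` helpers are hypothetical names introduced for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the WaitLatch() result bits. */
#define MOCK_WL_LATCH_SET (1 << 0)
#define MOCK_WL_TIMEOUT   (1 << 3)

/*
 * Imitates the quoted return logic: zero ready events means a timeout,
 * otherwise the fired event mask is returned unchanged.
 */
static int
mock_wait_latch(int nevents_ready, int fired_events)
{
    assert(nevents_ready >= 0);
    if (nevents_ready == 0)
        return MOCK_WL_TIMEOUT;
    return fired_events;
}

/* "if ... if" style: each flag is tested independently. */
static bool
should_report_if_if(int rc, bool in_xact)
{
    bool report = false;

    if (rc & MOCK_WL_LATCH_SET)
    {
        /* latch handling would go here */
    }
    if ((rc & MOCK_WL_TIMEOUT) && !in_xact)
        report = true;
    return report;
}

/* "if ... else if" style: the timeout test is skipped when the latch is set. */
static bool
should_report_else_if(int rc, bool in_xact)
{
    bool report = false;

    if (rc & MOCK_WL_LATCH_SET)
    {
        /* latch handling would go here */
    }
    else if ((rc & MOCK_WL_TIMEOUT) && !in_xact)
        report = true;
    return report;
}
```

For every value `mock_wait_latch()` can return, the two helpers agree, which is the point made in the message above. They diverge only for a combined mask such as `MOCK_WL_LATCH_SET | MOCK_WL_TIMEOUT`, which this mock never produces; that hypothetical case is the one the "if ... if" style stays robust against.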
On Fri, Apr 17, 2026 at 12:49 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Fri, Apr 17, 2026 at 8:31 AM Zhijie Hou (Fujitsu) <houzj.fnst@fujitsu.com> wrote:
> >
> > When implementing another feature, I noticed that parallel apply workers currently do not report statistics while idle in their main loop. This can cause stats from the last processed transaction to be arbitrarily delayed, especially when there are long gaps between streamed transactions.
> >
> > The issue is demonstrated in 0002, where a TAP test fails when attempting to collect stats from a parallel apply worker that has no subsequent transaction to trigger a stats report.
> >
> > 0001 fixes this issue by forcing a stats report when the worker is idle in the main loop, matching the behavior already present in LogicalRepApplyLoop() for regular logical apply workers.
> >
>
> LGTM. We should backpatch this change.
>

Pushed now.

--
With Regards,
Amit Kapila.