Re: Implement waiting for wal lsn replay: reloaded

Поиск
Список
Период
Сортировка
От Xuneng Zhou
Тема Re: Implement waiting for wal lsn replay: reloaded
Дата
Msg-id CABPTF7UieOYbOgH3EnQCasaqcT1T4N6V2wammwrWCohQTnD_Lw@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Implement waiting for wal lsn replay: reloaded  (Tomas Vondra <tomas@vondra.me>)
Ответы Re: Implement waiting for wal lsn replay: reloaded
Список pgsql-hackers
Hi Tomas,

On Fri, Nov 14, 2025 at 4:32 AM Tomas Vondra <tomas@vondra.me> wrote:
>
> On 11/5/25 10:51, Alexander Korotkov wrote:
> > Hi!
> >
> > On Mon, Nov 3, 2025 at 5:13 PM Andres Freund <andres@anarazel.de> wrote:
> >> On 2025-11-03 16:06:58 +0100, Álvaro Herrera wrote:
> >>> On 2025-Nov-03, Alexander Korotkov wrote:
> >>>
> >>>> I'd like to give this subject another chance for pg19.  I'm going to
> >>>> push this if no objections.
> >>>
> >>> Sure.  I don't understand why patches 0002 and 0003 are separate though.
> >>
> >> FWIW, I appreciate such splits. Even if the functionality isn't usable
> >> independently, it's still different type of code that's affected. And the
> >> patches are each big enough to make that worthwhile for easier review.
> >
> > Thank you for the feedback, pushed.
> >
>
> Hi,
>
> The new TAP test 049_wait_for_lsn.pl introduced by this commit, because
> it takes a long time - about 65 seconds on my laptop. That's about 25%
> of the whole src/test/recovery, more than any other test.
>
> And most of the time there's nothing happening - these are the two log
> messages showing the 60-second wait:
>
> 2025-11-13 21:12:39.949 CET checkpointer[562597] LOG:  checkpoint
> complete: wrote 9 buffers (7.0%), wrote 3 SLRU buffers; 0 WAL file(s)
> added, 0 removed, 2 recycled; write=0.906 s, sync=0.001 s, total=0.907
> s; sync files=0, longest=0.000 s, average=0.000 s; distance=32768 kB,
> estimate=32768 kB; lsn=0/040000B8, redo lsn=0/04000060
>
> 2025-11-13 21:13:38.994 CET client backend[562727] 049_wait_for_lsn.pl
> ERROR:  recovery is not in progress
>
> So there's a checkpoint, 60 seconds of nothing, and then a failure. I
> haven't looked into why it waits for 1 minute exactly, but adding 60
> seconds to check-world is somewhat annoying.

Thanks for looking into this!

I did a quick analysis for this prolonged waiting:

In WaitLSNWakeup() (xlogwait.c:267), the fast-path check incorrectly
handled InvalidXLogRecPtr:
/* Fast path check */
if (pg_atomic_read_u64(&waitLSNState->minWaitedLSN[i]) > currentLSN)
    return;  // Issue: Returns early when currentLSN = 0

When currentLSN = InvalidXLogRecPtr (0), meaning "wake all waiters",
the check compared:
- minWaitedLSN (e.g., 0x570CC048) > 0 → TRUE
- Result: function returned early without waking anyone

When It Happened
During standby promotion, xlog.c:6246 calls:

WaitLSNWakeup(WAIT_LSN_TYPE_REPLAY, InvalidXLogRecPtr);

This should wake all LSN waiters, but the bug prevented it. WAIT FOR
LSN commands could wait indefinitely. Test 049_wait_for_lsn.pl took 68
seconds instead of ~9 seconds.

if the above analysis is sound, the fix could be like:

Proposed fix:
Added a validity check before the comparison:
/*
 * Fast path check.  Skip if currentLSN is InvalidXLogRecPtr, which means
 * "wake all waiters" (e.g., during promotion when recovery ends).
 */
if (XLogRecPtrIsValid(currentLSN) &&
    pg_atomic_read_u64(&waitLSNState->minWaitedLSN[i]) > currentLSN)
    return;

Result:
Test time: 68s → 9s
WAIT FOR LSN exits immediately on promotion (62ms vs 60s)

> While at it, I noticed a couple comments refer to WaitForLSNReplay, but
> but I think that got renamed simply to WaitForLSN.

Please check the attached patch for replacing them.

--
Best,
Xuneng

Вложения

В списке pgsql-hackers по дате отправления: