Обсуждение: sync request forward function ForwardSyncRequest() might hang for some time in a corner case?

Поиск
Список
Период
Сортировка
Hi hackers,

I found this when reading the related code. Here is the scenario:

bool
RegisterSyncRequest(const FileTag *ftag, SyncRequestType type,
                    bool retryOnError)

For the case retryOnError is true, the function would in loop call
ForwardSyncRequest() until it succeeds, but in ForwardSyncRequest(),
we can see if we run into the below branch, RegisterSyncRequest() will
need to loop until the checkpointer absorbs the existing requests so
ForwardSyncRequest() might hang for some time until a checkpoint
request is triggered. This scenario seems to be possible in theory
though the chance is not high.

ForwardSyncRequest():

    if (CheckpointerShmem->checkpointer_pid == 0 ||
        (CheckpointerShmem->num_requests >= CheckpointerShmem->max_requests &&
         !CompactCheckpointerRequestQueue()))
    {
        /*
         * Count the subset of writes where backends have to do their own
         * fsync
         */
        if (!AmBackgroundWriterProcess())
            CheckpointerShmem->num_backend_fsync++;
        LWLockRelease(CheckpointerCommLock);
        return false;
    }

One fix is to add below similar code in RegisterSyncRequest(), trigger
a checkpoint for the scenario.

// checkpointer_triggered: variable for one trigger only.
if (!ret && retryOnError && ProcGlobal->checkpointerLatch &&
!checkpointer_triggered)
        SetLatch(ProcGlobal->checkpointerLatch);

Any comments?

Regards,
Paul Guo (Vmware)



On Tue, May 25, 2021 at 4:39 PM Paul Guo <paulguo@gmail.com> wrote:
>
> Hi hackers,
>
> I found this when reading the related code. Here is the scenario:
>
> bool
> RegisterSyncRequest(const FileTag *ftag, SyncRequestType type,
>                     bool retryOnError)
>
> For the case retryOnError is true, the function would in loop call
> ForwardSyncRequest() until it succeeds, but in ForwardSyncRequest(),
> we can see if we run into the below branch, RegisterSyncRequest() will
> need to loop until the checkpointer absorbs the existing requests so
> ForwardSyncRequest() might hang for some time until a checkpoint
> request is triggered. This scenario seems to be possible in theory
> though the chance is not high.

It seems like a really unlikely scenario, but maybe possible if you
use a lot of unlogged tables maybe (as you could eventually
dirty/evict more than NBuffers buffers without triggering enough WALs
activity to trigger a checkpoint with any sane checkpoint
configuration).

> ForwardSyncRequest():
>
>     if (CheckpointerShmem->checkpointer_pid == 0 ||
>         (CheckpointerShmem->num_requests >= CheckpointerShmem->max_requests &&
>          !CompactCheckpointerRequestQueue()))
>     {
>         /*
>          * Count the subset of writes where backends have to do their own
>          * fsync
>          */
>         if (!AmBackgroundWriterProcess())
>             CheckpointerShmem->num_backend_fsync++;
>         LWLockRelease(CheckpointerCommLock);
>         return false;
>     }
>
> One fix is to add below similar code in RegisterSyncRequest(), trigger
> a checkpoint for the scenario.
>
> // checkpointer_triggered: variable for one trigger only.
> if (!ret && retryOnError && ProcGlobal->checkpointerLatch &&
> !checkpointer_triggered)
>         SetLatch(ProcGlobal->checkpointerLatch);
>
> Any comments?

It looks like you intended to set the checkpointer_triggered var but
didn't.  Also this will wake up the checkpointer but won't force a
checkpoint (unlike RequestCheckpoint()).  It may be a good thing
though as it would only absorb the requests and go back to sleep if no
other threshold is reachrf.  Apart from the implementation details it
seems like it could help in this unlikely event.



On Thu, May 27, 2021 at 7:11 PM Julien Rouhaud <rjuju123@gmail.com> wrote:
>
> On Tue, May 25, 2021 at 4:39 PM Paul Guo <paulguo@gmail.com> wrote:
> >
> > Hi hackers,
> >
> > I found this when reading the related code. Here is the scenario:
> >
> > bool
> > RegisterSyncRequest(const FileTag *ftag, SyncRequestType type,
> >                     bool retryOnError)
> >
> > For the case retryOnError is true, the function would in loop call
> > ForwardSyncRequest() until it succeeds, but in ForwardSyncRequest(),
> > we can see if we run into the below branch, RegisterSyncRequest() will
> > need to loop until the checkpointer absorbs the existing requests so
> > ForwardSyncRequest() might hang for some time until a checkpoint
> > request is triggered. This scenario seems to be possible in theory
> > though the chance is not high.
>
> It seems like a really unlikely scenario, but maybe possible if you
> use a lot of unlogged tables maybe (as you could eventually
> dirty/evict more than NBuffers buffers without triggering enough WALs
> activity to trigger a checkpoint with any sane checkpoint
> configuration).

RegisterSyncRequest() handles SYNC_UNLINK_REQUEST and
SYNC_FORGET_REQUEST scenarios, besides the usual SYNC_REQUEST type for
buffer sync.

> > ForwardSyncRequest():
> >
> >     if (CheckpointerShmem->checkpointer_pid == 0 ||
> >         (CheckpointerShmem->num_requests >= CheckpointerShmem->max_requests &&
> >          !CompactCheckpointerRequestQueue()))
> >     {
> >         /*
> >          * Count the subset of writes where backends have to do their own
> >          * fsync
> >          */
> >         if (!AmBackgroundWriterProcess())
> >             CheckpointerShmem->num_backend_fsync++;
> >         LWLockRelease(CheckpointerCommLock);
> >         return false;
> >     }
> >
> > One fix is to add below similar code in RegisterSyncRequest(), trigger
> > a checkpoint for the scenario.
> >
> > // checkpointer_triggered: variable for one trigger only.
> > if (!ret && retryOnError && ProcGlobal->checkpointerLatch &&
> > !checkpointer_triggered)
> >         SetLatch(ProcGlobal->checkpointerLatch);
> >
> > Any comments?
>
> It looks like you intended to set the checkpointer_triggered var but

Yes this is just pseduo code.

> didn't.  Also this will wake up the checkpointer but won't force a
> checkpoint (unlike RequestCheckpoint()).  It may be a good thing

I do not expect an immediate checkpoint. AbsorbSyncRequests()
is enough since after that RegisterSyncRequest() could finish.

> though as it would only absorb the requests and go back to sleep if no
> other threshold is reachrf.  Apart from the implementation details it
> seems like it could help in this unlikely event.



-- 
Paul Guo (Vmware)



On Thu, May 27, 2021 at 9:59 PM Paul Guo <paulguo@gmail.com> wrote:
>
> On Thu, May 27, 2021 at 7:11 PM Julien Rouhaud <rjuju123@gmail.com> wrote:
> >
> > On Tue, May 25, 2021 at 4:39 PM Paul Guo <paulguo@gmail.com> wrote:
> > >
> > > Hi hackers,
> > >
> > > I found this when reading the related code. Here is the scenario:
> > >
> > > bool
> > > RegisterSyncRequest(const FileTag *ftag, SyncRequestType type,
> > >                     bool retryOnError)
> > >
> > > For the case retryOnError is true, the function would in loop call
> > > ForwardSyncRequest() until it succeeds, but in ForwardSyncRequest(),
> > > we can see if we run into the below branch, RegisterSyncRequest() will
> > > need to loop until the checkpointer absorbs the existing requests so
> > > ForwardSyncRequest() might hang for some time until a checkpoint
> > > request is triggered. This scenario seems to be possible in theory
> > > though the chance is not high.
> >
> > It seems like a really unlikely scenario, but maybe possible if you
> > use a lot of unlogged tables maybe (as you could eventually
> > dirty/evict more than NBuffers buffers without triggering enough WALs
> > activity to trigger a checkpoint with any sane checkpoint
> > configuration).
>
> RegisterSyncRequest() handles SYNC_UNLINK_REQUEST and
> SYNC_FORGET_REQUEST scenarios, besides the usual SYNC_REQUEST type for
> buffer sync.
>
> > > ForwardSyncRequest():
> > >
> > >     if (CheckpointerShmem->checkpointer_pid == 0 ||
> > >         (CheckpointerShmem->num_requests >= CheckpointerShmem->max_requests &&
> > >          !CompactCheckpointerRequestQueue()))
> > >     {
> > >         /*
> > >          * Count the subset of writes where backends have to do their own
> > >          * fsync
> > >          */
> > >         if (!AmBackgroundWriterProcess())
> > >             CheckpointerShmem->num_backend_fsync++;
> > >         LWLockRelease(CheckpointerCommLock);
> > >         return false;
> > >     }
> > >
> > > One fix is to add below similar code in RegisterSyncRequest(), trigger
> > > a checkpoint for the scenario.
> > >
> > > // checkpointer_triggered: variable for one trigger only.
> > > if (!ret && retryOnError && ProcGlobal->checkpointerLatch &&
> > > !checkpointer_triggered)
> > >         SetLatch(ProcGlobal->checkpointerLatch);
> > >
> > > Any comments?
> >
> > It looks like you intended to set the checkpointer_triggered var but
>
> Yes this is just pseduo code.
>
> > didn't.  Also this will wake up the checkpointer but won't force a
> > checkpoint (unlike RequestCheckpoint()).  It may be a good thing
>
> I do not expect an immediate checkpoint. AbsorbSyncRequests()
> is enough since after that RegisterSyncRequest() could finish.
>
> > though as it would only absorb the requests and go back to sleep if no
> > other threshold is reachrf.  Apart from the implementation details it
> > seems like it could help in this unlikely event.
>

Also note that ForwardSyncRequest() does wake up the checkpointer if
it thinks the requests in shared memory are "too full", but does not
wake up when the request is actually full. This does not seem to be reasonable.
See below code in ForwardSyncRequest

    /* If queue is more than half full, nudge the checkpointer to empty it */
    too_full = (CheckpointerShmem->num_requests >=
                CheckpointerShmem->max_requests / 2);

    /* ... but not till after we release the lock */
    if (too_full && ProcGlobal->checkpointerLatch)
        SetLatch(ProcGlobal->checkpointerLatch);

-- 
Paul Guo (Vmware)



On Thu, May 27, 2021 at 9:59 PM Paul Guo <paulguo@gmail.com> wrote:
>
> > It seems like a really unlikely scenario, but maybe possible if you
> > use a lot of unlogged tables maybe (as you could eventually
> > dirty/evict more than NBuffers buffers without triggering enough WALs
> > activity to trigger a checkpoint with any sane checkpoint
> > configuration).
>
> RegisterSyncRequest() handles SYNC_UNLINK_REQUEST and
> SYNC_FORGET_REQUEST scenarios, besides the usual SYNC_REQUEST type for
> buffer sync.

I know, but the checkpointer can hold up to NBuffers requests, so I
highly doubt that you can end up filling the buffer with those.

> > > ForwardSyncRequest():
> > >
> > >     if (CheckpointerShmem->checkpointer_pid == 0 ||
> > >         (CheckpointerShmem->num_requests >= CheckpointerShmem->max_requests &&
> > >          !CompactCheckpointerRequestQueue()))
> > >     {
> > >         /*
> > >          * Count the subset of writes where backends have to do their own
> > >          * fsync
> > >          */
> > >         if (!AmBackgroundWriterProcess())
> > >             CheckpointerShmem->num_backend_fsync++;
> > >         LWLockRelease(CheckpointerCommLock);
> > >         return false;
> > >     }
> > >
> > > One fix is to add below similar code in RegisterSyncRequest(), trigger
> > > a checkpoint for the scenario.
> > >
> > > // checkpointer_triggered: variable for one trigger only.
> > > if (!ret && retryOnError && ProcGlobal->checkpointerLatch &&
> > > !checkpointer_triggered)
> > >         SetLatch(ProcGlobal->checkpointerLatch);
> > >
> > > Any comments?
> >
> > It looks like you intended to set the checkpointer_triggered var but
>
> Yes this is just pseduo code.
>
> > didn't.  Also this will wake up the checkpointer but won't force a
> > checkpoint (unlike RequestCheckpoint()).  It may be a good thing
>
> I do not expect an immediate checkpoint. AbsorbSyncRequests()
> is enough since after that RegisterSyncRequest() could finish.

You said "trigger a checkpoint", which sounded more like forcing a
checkpointer rather than waking up the checkpointer so that it can
absorb the pending requests, so it seems worth to mention what it
would really do.



> You said "trigger a checkpoint", which sounded more like forcing a
> checkpointer rather than waking up the checkpointer so that it can
> absorb the pending requests, so it seems worth to mention what it
> would really do.

Yes it is not accurate. Thanks for the clarification.

-- 
Paul Guo (Vmware)



On Thu, May 27, 2021 at 10:05 PM Paul Guo <paulguo@gmail.com> wrote:
>
> Also note that ForwardSyncRequest() does wake up the checkpointer if
> it thinks the requests in shared memory are "too full", but does not
> wake up when the request is actually full. This does not seem to be reasonable.
> See below code in ForwardSyncRequest
>
>     /* If queue is more than half full, nudge the checkpointer to empty it */
>     too_full = (CheckpointerShmem->num_requests >=
>                 CheckpointerShmem->max_requests / 2);
>
>     /* ... but not till after we release the lock */
>     if (too_full && ProcGlobal->checkpointerLatch)
>         SetLatch(ProcGlobal->checkpointerLatch);

Ah indeed.  Well it means that the checkpointer it woken up early
enough to avoid reaching that point.  I'm not sure that it's actually
possible to reach a point where the list if full and the checkpointer
is sitting idle.



On Thu, May 27, 2021 at 10:22 PM Julien Rouhaud <rjuju123@gmail.com> wrote:
>
> On Thu, May 27, 2021 at 10:05 PM Paul Guo <paulguo@gmail.com> wrote:
> >
> > Also note that ForwardSyncRequest() does wake up the checkpointer if
> > it thinks the requests in shared memory are "too full", but does not
> > wake up when the request is actually full. This does not seem to be reasonable.
> > See below code in ForwardSyncRequest
> >
> >     /* If queue is more than half full, nudge the checkpointer to empty it */
> >     too_full = (CheckpointerShmem->num_requests >=
> >                 CheckpointerShmem->max_requests / 2);
> >
> >     /* ... but not till after we release the lock */
> >     if (too_full && ProcGlobal->checkpointerLatch)
> >         SetLatch(ProcGlobal->checkpointerLatch);
>
> Ah indeed.  Well it means that the checkpointer it woken up early
> enough to avoid reaching that point.  I'm not sure that it's actually
> possible to reach a point where the list if full and the checkpointer
> is sitting idle.

In theory this is possible (when the system is under heavy parallel write)
else we could remove that part code (CompactCheckpointerRequestQueue())
:-), though the chance is not high.

If we encounter this issue those affected queries would suddenly hang
until the next checkpointer wakeup.