Re: O(n) tasks cause lengthy startups and checkpoints

From Andres Freund
Subject Re: O(n) tasks cause lengthy startups and checkpoints
Date
Msg-id 20220703172732.wembjsb55xl63vuw@awork3.anarazel.de
In reply to Re: O(n) tasks cause lengthy startups and checkpoints  (Nathan Bossart <nathandbossart@gmail.com>)
Responses Re: O(n) tasks cause lengthy startups and checkpoints  (Nathan Bossart <nathandbossart@gmail.com>)
List pgsql-hackers
Hi,

On 2022-07-03 10:07:54 -0700, Nathan Bossart wrote:
> Thanks for the prompt review.
> 
> On Sat, Jul 02, 2022 at 03:54:56PM -0700, Andres Freund wrote:
> > On 2022-07-02 15:05:54 -0700, Nathan Bossart wrote:
> >> +        /* Obtain requested tasks */
> >> +        SpinLockAcquire(&CustodianShmem->cust_lck);
> >> +        flags = CustodianShmem->cust_flags;
> >> +        CustodianShmem->cust_flags = 0;
> >> +        SpinLockRelease(&CustodianShmem->cust_lck);
> > 
> > Just resetting the flags to 0 is problematic. Consider what happens if there
> > are two tasks and the one processed first errors out. You'll lose information
> > about needing to run the second task.
> 
> I think we also want to retry any failed tasks.

I don't think so, at least not if it's just going to retry that task straight
away - then we'll get stuck on that one task forever. If we had the ability to
"queue" it at the end, to be processed after the other already-dequeued tasks,
it'd be a different story.
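
Roughly like this (a sketch only - the task table, flag names, and error
handling are invented here, not the patch's actual code):

    /*
     * Sketch of a custodian loop that doesn't lose tasks: a task's flag is
     * already cleared from the snapshot, but on failure the task is
     * re-requested so it runs behind the other pending tasks on the next
     * pass instead of being retried immediately or dropped.  (Memory
     * context cleanup in the error path is elided.)
     */
    for (int i = 0; i < NUM_CUSTODIAN_TASKS; i++)
    {
        int         task_flag = CustodianTasks[i].flag;

        if ((flags & task_flag) == 0)
            continue;

        PG_TRY();
        {
            CustodianTasks[i].func();
        }
        PG_CATCH();
        {
            /* report the error without killing the worker ... */
            EmitErrorReport();
            FlushErrorState();

            /* ... and queue the failed task behind any newer requests */
            RequestCustodian(task_flag);
        }
        PG_END_TRY();
    }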


> The way v6 handles this is by requesting all tasks after an exception.

Ick. That strikes me as a bad idea.


> >> +/*
> >> + * RequestCustodian
> >> + *        Called to request a custodian task.
> >> + */
> >> +void
> >> +RequestCustodian(int flags)
> >> +{
> >> +    SpinLockAcquire(&CustodianShmem->cust_lck);
> >> +    CustodianShmem->cust_flags |= flags;
> >> +    SpinLockRelease(&CustodianShmem->cust_lck);
> >> +
> >> +    if (ProcGlobal->custodianLatch)
> >> +        SetLatch(ProcGlobal->custodianLatch);
> >> +}
> > 
> > With this representation we can't really implement waiting for a task or
> > such. And it doesn't seem like a great API for the caller to just specify a
> > mix of flags.
> 
> At the moment, the idea is that nothing should need to wait for a task
> because the custodian only handles things that are relatively non-critical.

Which is just plainly not true as the patchset stands...

I think we're going to have to block if some cleanup as part of a checkpoint
hasn't been completed by the next checkpoint - otherwise it'll just end up
being way too confusing and there's absolutely no backpressure anymore.


> If that changes, this could probably be expanded to look more like
> RequestCheckpoint().
> 
> What would you suggest using instead of a mix of flags?

I suspect an array of tasks with requested and completed counters or such?
With a condition variable to wait on?
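
To make that concrete, a rough sketch (all names invented; loosely modeled
on the ckpt_started/ckpt_done counters RequestCheckpoint() uses):

#include "postgres.h"

#include "storage/condition_variable.h"
#include "storage/latch.h"
#include "storage/proc.h"
#include "storage/spin.h"

#define NUM_CUSTODIAN_TASKS 4       /* invented for this sketch */

typedef struct CustodianTaskState
{
    uint64      requested;          /* bumped by RequestCustodian() */
    uint64      completed;          /* bumped by the custodian when done */
} CustodianTaskState;

typedef struct CustodianShmemStruct
{
    slock_t     lock;
    ConditionVariable done_cv;      /* broadcast after each completion */
    CustodianTaskState tasks[NUM_CUSTODIAN_TASKS];
} CustodianShmemStruct;

static CustodianShmemStruct *CustodianShmem;

/* Request a task; if wait, block until this request has been serviced. */
void
RequestCustodian(int task, bool wait)
{
    uint64      target;

    SpinLockAcquire(&CustodianShmem->lock);
    target = ++CustodianShmem->tasks[task].requested;
    SpinLockRelease(&CustodianShmem->lock);

    if (ProcGlobal->custodianLatch)
        SetLatch(ProcGlobal->custodianLatch);

    if (!wait)
        return;

    ConditionVariablePrepareToSleep(&CustodianShmem->done_cv);
    for (;;)
    {
        uint64      done;

        SpinLockAcquire(&CustodianShmem->lock);
        done = CustodianShmem->tasks[task].completed;
        SpinLockRelease(&CustodianShmem->lock);

        if (done >= target)
            break;
        /* WAIT_EVENT_CUSTODIAN is invented for this sketch */
        ConditionVariableSleep(&CustodianShmem->done_cv, WAIT_EVENT_CUSTODIAN);
    }
    ConditionVariableCancelSleep();
}

The custodian side would copy requested under the spinlock, run the task,
bump completed, and ConditionVariableBroadcast(&done_cv). That same wait
path is what would give the checkpointer the blocking / backpressure
behavior mentioned above.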


> >> +    /* let the custodian know what it can remove */
> >> +    CustodianSetLogicalRewriteCutoff(cutoff);
> > 
> > Setting this variable in a custodian datastructure and then fetching it from
> > there seems architecturally wrong to me.
> 
> Where do you think it should go?  I previously had it in the checkpointer's
> shared memory, but you didn't like that the functions were declared in
> bgwriter.h (along with the other checkpoint stuff).  If the checkpointer
> shared memory is the right place, should we create checkpointer.h and use
> that instead?

Well, so far I have not understood what the whole point of the shared state
is, so I have a bit of a hard time answering this ;)


> >> +/*
> >> + * Remove all mappings not needed anymore based on the logical restart LSN saved
> >> + * by the checkpointer.  We use this saved value instead of calling
> >> + * ReplicationSlotsComputeLogicalRestartLSN() so that we don't interfere with an
> >> + * ongoing call to CheckPointLogicalRewriteHeap() that is flushing mappings to
> >> + * disk.
> >> + */
> > 
> > What interference could there be?
> 
> My concern is that the custodian could obtain a later cutoff than what the
> checkpointer does, which might cause files to be concurrently unlinked and
> fsync'd.  If we always use the checkpointer's cutoff, that shouldn't be a
> problem.  This could probably be better explained in this comment.

How about having a Datum argument to RequestCustodian() that is forwarded to
the task?
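
I.e. something like this (sketch only; the flag name and signature are
invented, and independent of the counter sketch above):

    /*
     * Hypothetical variant of RequestCustodian() that stores a Datum with
     * the request and hands it to the task when it runs.
     */
    extern void RequestCustodian(int flags, Datum arg);

    /*
     * In CheckPointLogicalRewriteHeap(), instead of stashing the cutoff in
     * custodian shared memory:
     */
    RequestCustodian(CUSTODIAN_REMOVE_REWRITE_MAPPINGS,
                     LSNGetDatum(cutoff));

RemoveOldLogicalRewriteMappings() would then get the checkpointer's cutoff
via DatumGetLSN() instead of reading it back out of shared state.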


> >> +void
> >> +RemoveOldLogicalRewriteMappings(void)
> >> +{
> >> +    XLogRecPtr    cutoff;
> >> +    DIR           *mappings_dir;
> >> +    struct dirent *mapping_de;
> >> +    char        path[MAXPGPATH + 20];
> >> +    bool        value_set = false;
> >> +
> >> +    cutoff = CustodianGetLogicalRewriteCutoff(&value_set);
> >> +    if (!value_set)
> >> +        return;
> > 
> > Afaics nothing clears value_set - is that a good idea?
> 
> I'm using value_set to differentiate the case where InvalidXLogRecPtr means
> the checkpointer hasn't determined a value yet versus the case where it
> has.  In the former, we don't want to take any action.  In the latter, we
> want to unlink all the files.  Since we're moving to a request model for
> the custodian, I might be able to remove this value_set stuff completely.
> If that's not possible, it probably deserves a better comment.

It would.

Greetings,

Andres Freund


