Re: Refactoring backend fork+exec code

Поиск
Список
Период
Сортировка
От Andres Freund
Тема Re: Refactoring backend fork+exec code
Дата
Msg-id 20240214213711.2gjlezqll37mg7lg@awork3.anarazel.de
обсуждение исходный текст
Ответ на Re: Refactoring backend fork+exec code  (Heikki Linnakangas <hlinnaka@iki.fi>)
Ответы Re: Refactoring backend fork+exec code  (Robert Haas <robertmhaas@gmail.com>)
Список pgsql-hackers
Hi,

On 2024-02-08 13:19:53 +0200, Heikki Linnakangas wrote:
> > > -    /*
> > > -     * Assign the ProcSignalSlot for an auxiliary process.  Since it doesn't
> > > -     * have a BackendId, the slot is statically allocated based on the
> > > -     * auxiliary process type (MyAuxProcType).  Backends use slots indexed in
> > > -     * the range from 1 to MaxBackends (inclusive), so we use MaxBackends +
> > > -     * AuxProcType + 1 as the index of the slot for an auxiliary process.
> > > -     *
> > > -     * This will need rethinking if we ever want more than one of a particular
> > > -     * auxiliary process type.
> > > -     */
> > > -    ProcSignalInit(MaxBackends + MyAuxProcType + 1);
> > > +    ProcSignalInit();
> >
> > Now that we don't need the offset here, we could move ProcSignalInit() into
> > BsaeInit() I think?
>
> Hmm, doesn't feel right to me. BaseInit() is mostly concerned with setting
> up backend-private structures, and it's also called for a standalone
> backend.

It already initializes a lot of shared subsystems (pgstat, replication slots
and arguable things like the buffer pool, temporary file access and WAL). And
note that it already requires that MyProc is already set (but it's not yet
"added" to the procarray, i.e. doesn't do visibility stuff at that stage).

I don't think that BaseInit() being called by standalone backends really poses
a problem? So is InitPostgres(), which does call ProcSignalInit() in
standalone processes.

My mental model is that BaseInit() is for stuff that's shared between
processes that do attach to databases and those that don't. Right now the
initialization flow is something like this ascii diagram:

standalone:     \
    /-> StartupXLOG() \
 
                 -> InitProcess()      -\                 /-> ProcArrayAdd() -> SharedInvalBackendInit() ->
ProcSignalInit()-                  -> pgstat_beinit() -> attach to db -> pgstat_bestart()
 
normal backend: /                        \               /
                                          -> BaseInit() -
aux process:    InitAuxiliaryProcess()  -/               \--                                             ->
ProcSignalInit()                   -> pgstat_beinit()                 -> pgstat_bestart()
 


The only reason ProcSignalInit() happens kinda late is that historically we
used BackendIds as the index, which were only assigned in
SharedInvalBackendInit() for normal processes. But that doesn't make sense
anymore after your changes.

Similarly, we do pgstat_beinit() quite late, but that's again only because it
uses MyBackendId, which today is only assigned during
SharedInvalBackendInit().  I don't think we can do pgstat_bestart() earlier
though, which is a shame, given the four calls to it inside InitPostgres().


> I feel the process initialization codepaths could use some cleanup in
> general. Not sure what exactly.

Very much agreed.


> > > +/*
> > > + * BackendIdGetProc -- get a backend's PGPROC given its backend ID
> > > + *
> > > + * The result may be out of date arbitrarily quickly, so the caller
> > > + * must be careful about how this information is used.  NULL is
> > > + * returned if the backend is not active.
> > > + */
> > > +PGPROC *
> > > +BackendIdGetProc(int backendID)
> > > +{
> > > +    PGPROC       *result;
> > > +
> > > +    if (backendID < 1 || backendID > ProcGlobal->allProcCount)
> > > +        return NULL;
> >
> > Hm, doesn't calling BackendIdGetProc() with these values a bug? That's not
> > about being out of date or such.
>
> Perhaps. I just followed the example of the old implementation, which also
> returns NULL on bogus inputs.

Fair enough. Makes it harder to not notice bugs, but that's not on this patchset to fix...



> I think the last remaining question here is about the 0- vs 1-based indexing
> of BackendIds. Is it a good idea to switch to 0-based indexing? And if we do
> it, should we reserve PGPROC 0. I'm on the fence on this one.

I lean towards it being a good idea. Having two internal indexing schemes was
bad enough so far, but at least one would fairly quickly notice if one used
the wrong one. If they're just offset by 1, it might end up taking longer,
because that'll often also be a valid id.  But I think you have the author's
prerogative on this one.

If we do so, I think it might be better to standardize on MyProcNumber instead
of MyBackendId. That'll force looking at code where indexing shifts by 1 - and
it also seems more descriptive, as inside postgres it's imo clearer what a
"proc number" is than what a "backend id" is. Particularly because the latter
is also used for things that aren't backends...


The only exception are SQL level users, for those I think it might make sense
to keep the current 1 based indexing, there's just a few functions where we'd
need to translate.



> @@ -791,6 +792,7 @@ ProcArrayEndTransactionInternal(PGPROC *proc, TransactionId latestXid)
>  static void
>  ProcArrayGroupClearXid(PGPROC *proc, TransactionId latestXid)
>  {
> +    int            pgprocno = GetNumberFromPGProc(proc);
>      PROC_HDR   *procglobal = ProcGlobal;
>      uint32        nextidx;
>      uint32        wakeidx;

This one is the only one where I could see the additional math done in
GetNumberFromPGProc() hurting.  Which is somewhat silly, because the proc
passed in is always MyProc.  In the most unrealistic workload imaginable (many
backends doing nothing but assigning xids and committing, server-side), it
indeed seems to make a tiny difference. But not enough to worry about, I think.

FWIW, if I use GetNumberFromPGProc(MyProc) instead of MyProcNumber in
LWLockQueueSelf(), that does show up a bit more noticeable.


>  void
> -ProcSignalInit(int pss_idx)
> +ProcSignalInit(void)
>  {
>      ProcSignalSlot *slot;
>      uint64        barrier_generation;
>
> -    Assert(pss_idx >= 1 && pss_idx <= NumProcSignalSlots);
> -
> -    slot = &ProcSignal->psh_slot[pss_idx - 1];
> +    if (MyBackendId <= 0)
> +        elog(ERROR, "MyBackendId not set");
> +    if (MyBackendId > NumProcSignalSlots)
> +        elog(ERROR, "unexpected MyBackendId %d in ProcSignalInit (max %d)", MyBackendId, NumProcSignalSlots);
> +    slot = &ProcSignal->psh_slot[MyBackendId - 1];
>
>      /* sanity check */
>      if (slot->pss_pid != 0)
>          elog(LOG, "process %d taking over ProcSignal slot %d, but it's not empty",
> -             MyProcPid, pss_idx);
> +             MyProcPid, (int) (slot - ProcSignal->psh_slot));

Hm, why not use MyBackendId - 1 as above? Am I missing something?


>  /*
> @@ -212,11 +211,7 @@ ProcSignalInit(int pss_idx)
>  static void
>  CleanupProcSignalState(int status, Datum arg)
>  {
> -    int            pss_idx = DatumGetInt32(arg);
> -    ProcSignalSlot *slot;
> -
> -    slot = &ProcSignal->psh_slot[pss_idx - 1];
> -    Assert(slot == MyProcSignalSlot);
> +    ProcSignalSlot *slot = MyProcSignalSlot;

Maybe worth asserting that MyProcSignalSlot isn't NULL? Previously that was
checked via the assertion above.


> +            if (i != segP->numProcs - 1)
> +                segP->pgprocnos[i] = segP->pgprocnos[segP->numProcs - 1];
> +            break;

Hm. This means the list will be out-of-order more and more over time, leading
to less cache efficient access patterns. Perhaps we should keep this sorted,
like we do for ProcGlobal->xids etc?


> @@ -148,19 +148,11 @@ pg_log_backend_memory_contexts(PG_FUNCTION_ARGS)
>      PGPROC       *proc;
>      BackendId    backendId = InvalidBackendId;
>
> -    proc = BackendPidGetProc(pid);
> -
>      /*
>       * See if the process with given pid is a backend or an auxiliary process.
> -     *
> -     * If the given process is a backend, use its backend id in
> -     * SendProcSignal() later to speed up the operation. Otherwise, don't do
> -     * that because auxiliary processes (except the startup process) don't
> -     * have a valid backend id.
>       */
> -    if (proc != NULL)
> -        backendId = proc->backendId;
> -    else
> +    proc = BackendPidGetProc(pid);
> +    if (proc == NULL)
>          proc = AuxiliaryPidGetProc(pid);
>
>      /*
> @@ -183,6 +175,8 @@ pg_log_backend_memory_contexts(PG_FUNCTION_ARGS)
>          PG_RETURN_BOOL(false);
>      }
>
> +    if (proc != NULL)
> +        backendId = GetBackendIdFromPGProc(proc);

How can proc be NULL here?



Greetings,

Andres Freund



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Melanie Plageman
Дата:
Сообщение: Re: index prefetching
Следующее
От: Melanie Plageman
Дата:
Сообщение: Re: index prefetching