Re: dynamic background workers, round two

Поиск
Список
Период
Сортировка
От Andres Freund
Тема Re: dynamic background workers, round two
Дата
Msg-id 20130724213406.GC10784@alap2.anarazel.de
обсуждение исходный текст
Ответ на dynamic background workers, round two  (Robert Haas <robertmhaas@gmail.com>)
Ответы Re: dynamic background workers, round two  (Michael Paquier <michael.paquier@gmail.com>)
Re: dynamic background workers, round two  (Michael Paquier <michael.paquier@gmail.com>)
Re: dynamic background workers, round two  (Robert Haas <robertmhaas@gmail.com>)
Список pgsql-hackers
Hi,

On 2013-07-24 12:46:21 -0400, Robert Haas wrote:
> The attached patch attempts to remedy this problem.  When you register
> a background worker, you can obtain a "handle" that can subsequently
> be used to query for the worker's PID.  If you additionally initialize
> bgw_notify_pid = MyProcPid, then the postmaster will send you SIGUSR1
> when worker startup has been attempted (successfully or not).  You can
> wait for this signal by passing your handle to
> WaitForBackgroundWorkerStartup(), which will return only when either
> (1) an attempt has been made to start the worker or (2) the postmaster
> is determined to have died.  This interface seems adequate for
> something like worker_spi, where it's useful to know whether the child
> was started or not (and return the PID if so) but that's about all
> that's really needed.

This seems like a sensible idea to me. But, in the context of dynamic
query, don't we also need the reverse infrastructure of notifying a
bgworker that the client, that requested it to be started, has died?
Ending up with a dozen bgworkers running because the normal backend
FATALed doesn't seem nice.

> More complex notification interfaces can also be coded using the
> primitives introduced by this patch.  Setting bgw_notify_pid will
> cause the postmaster to send SIGUSR1 every time it starts the worker
> and every time the worker dies.  Every time you receive that signal,
> you can call GetBackgroundWorkerPid() for each background worker to
> find out which ones have started or terminated.  The only slight
> difficulty is in waiting for receipt of that signal, which I
> implemented by adding a new global called set_latch_on_sigusr1.
> WaitForBackgroundWorkerStartup() uses that flag internally, but
> there's nothing to prevent a caller from using it as part of their own
> event loop.  I find the set_latch_on_sigusr1 flag  to be slightly
> ugly, but it seems better and far safer than having the postmaster try
> to actually set the latch itself - for example, it'd obviously be
> unacceptable to pass a Latch * to the postmaster and have the
> postmaster assume that pointer valid.

I am with you with finding it somewhat ugly.


> --- a/contrib/worker_spi/worker_spi.c
> +++ b/contrib/worker_spi/worker_spi.c

Btw, I've posted a minimal regression test for bworkers/worker_spi in
20130724175742.GD10713@alap2.anarazel.de - I'd like to see some coverage
of this...

> +    int         bgw_notify_pid;

It really should be pid_t even though we're not consistently doing that
in the code.



> diff --git a/src/backend/commands/async.c b/src/backend/commands/async.c
> index f7ebd1a..6414291 100644
> --- a/src/backend/commands/async.c
> +++ b/src/backend/commands/async.c
> @@ -207,8 +207,6 @@ typedef struct QueueBackendStatus
>      QueuePosition pos;            /* backend has read queue up to here */
>  } QueueBackendStatus;
>  

> -#define InvalidPid                (-1)
> -

> diff --git a/src/include/miscadmin.h b/src/include/miscadmin.h
> index edced29..0aa540a 100644
> --- a/src/include/miscadmin.h
> +++ b/src/include/miscadmin.h
> @@ -28,6 +28,8 @@
>  
>  #define PG_BACKEND_VERSIONSTR "postgres (PostgreSQL) " PG_VERSION "\n"
>  
> +#define InvalidPid                (-1)
> +

That value strikes me as a rather bad idea. As a pid that represents
init's processgroup. I.e. kill(InvalidPid) will try to kill init... I'd
rather not spread that outside async.c.

> +/*
> + * Report the PID of a newly-launched background worker in shared memory.
> + *
> + * This function should only be called from the postmaster.
> + */
> +void
> +ReportBackgroundWorkerPID(RegisteredBgWorker *rw)
> +{
> +    BackgroundWorkerSlot *slot;
> +
> +    Assert(rw->rw_shmem_slot < max_worker_processes);
> +    slot = &BackgroundWorkerData->slot[rw->rw_shmem_slot];
> +    slot->pid = rw->rw_pid;
> +
> +    if (rw->rw_worker.bgw_notify_pid != 0)
> +        kill(rw->rw_worker.bgw_notify_pid, SIGUSR1);
> +}

Any reason not to use SendProcSignal() properly here? Just that you
don't know the BackendId? I seems unclean to reuse SIGUSR1 without using
the infrastructure built for that.

I first thought you didn't want to do it because you it touches shmem,
but given that ReportBackgroundWorkerPID() does so itself...

> + * If handle != NULL, we'll set *handle to a pointer that can subsequently
> + * be used as an argument to GetBackgroundWorkerPid().  The caller can
> + * free this pointer using pfree(), if desired.
>   */
>  bool
> -RegisterDynamicBackgroundWorker(BackgroundWorker *worker)
> +RegisterDynamicBackgroundWorker(BackgroundWorker *worker,
> +
> BackgroundWorkerHandle **handle)

I'd just let the caller provide the memory, but whatever ;)

> +/*
> + * Wait for a background worker to start up.
> + *
> + * The return value is the same as for GetBackgroundWorkerPID, except that
> + * we only return InvalidPid if the postmaster has died (and therefore we
> + * know that the worker will never be started).
> + */
> +int
> +WaitForBackgroundWorkerStartup(BackgroundWorkerHandle *handle)
> +{
> +    int        pid;
> +    int        rc;
> +
> +    set_latch_on_sigusr1 = true;
> +
> +    PG_TRY();
> +    {

None of the containing code should be allowed to raise signals
afaics. Was there anything specific you were thinking of? Or just
defense against leaking set_latch_on_sigusr1 = true?

> +        for (;;)
> +        {
> +            pid = GetBackgroundWorkerPid(handle);
> +            if (pid != InvalidPid)
> +                break;
> +            rc = WaitLatch(&MyProc->procLatch,
> +                           WL_LATCH_SET | WL_POSTMASTER_DEATH, 0);
> +            if (rc & WL_POSTMASTER_DEATH)
> +                return InvalidPid;

This basically ignores postmaster death. With unix_latch.c that
shouldn't be a problem because afaics it's implementation correctly
should report an error in the next WaitLatch without waiting as well. I
am not sure about the win32 implementation, is that ok there as well?

Also, shouldn't we somewhere check for interrupts in the loop? This
looks like it would run forever if we start to wait while postmaster is
about to shut down, because afair postmaster won't start new children in
that case. But we won't notice that we're asked to get shut down.

The usual and documented pattern for latches is to do* for (;;)* {*       ResetLatch();*       if (work to do)*
 Do Stuff();*       WaitLatch();* }
 
any reason for differing here? I don't really see a problem with the
pattern used here, but ...

> +    set_latch_on_sigusr1 = false;

So, the set_latch_on_sigusr1 logical exists so MyProc->procLatch doesn't
get SetLatch'ed unless we're in +WaitForBackgroundWorkerStartup()?

> +/*
> + * When a backend asks to be notified about worker state changes, we
> + * set a flag in its backend entry.  The background worker machinery needs
> + * to know when such backends exit.
> + */
> +bool
> +PostmasterMarkPIDForWorkerNotify(int pid)
> +{
> +    dlist_iter    iter;
> +    Backend    *bp;
> +
> +    dlist_foreach(iter, &BackendList)
> +    {
> +        bp = dlist_container(Backend, elem, iter.cur);
> +        if (bp->pid == pid)
> +        {
> +            bp->bgworker_notify = true;
> +            return true;
> +        }
> +    }
> +    return false;
> +}

Maybe we should have MyBackend?

Btw, you seem to want to support this in bgworkers started by a
bgworker. That's not going to work without some changes if the
"intermediate" bgworker is one without a backend since those don't use
procsignal_sigusr1_handler.

So, that's the first crop of things that came to mind. I'll sleep over
it, but so far I like it generally ;)

Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Andres Freund
Дата:
Сообщение: Re: ilist.h is not useful as-is
Следующее
От: Andrew Gierth
Дата:
Сообщение: Re: Review: UNNEST (and other functions) WITH ORDINALITY