Re: BGWorkers, shared memory pointers, and postmaster restart

Поиск
Список
Период
Сортировка
От Robert Haas
Тема Re: BGWorkers, shared memory pointers, and postmaster restart
Дата
Msg-id CA+TgmoZahp2qzKqF_g4jVvjKPgAZ4wMJtG6tKUgQ94PZPd8zhw@mail.gmail.com
обсуждение исходный текст
Ответ на Re: BGWorkers, shared memory pointers, and postmaster restart  (Craig Ringer <craig@2ndquadrant.com>)
Список pgsql-hackers
On Wed, Apr 16, 2014 at 8:46 PM, Craig Ringer <craig@2ndquadrant.com> wrote:
> On 04/17/2014 12:16 AM, Robert Haas wrote:
>> On Wed, Apr 16, 2014 at 7:11 AM, Craig Ringer <craig@2ndquadrant.com> wrote:
>
>>> - A flag like BGW_UNREGISTER_ON_RESTART;
>>
>> I would be OK with this, maybe modulo the name.
>>
>>> - To always unregister dynamic bgws on postmaster shm clear + restart;
>>
>> I don't particularly care for this.  Let's suppose the background
>> worker is a long-running daemon, like a PG equivalent of cron.  In
>> static-background worker land, the admin has to restart the cluster to
>> get this going.  In dynamic-background worker land, he can load it on
>> the fly.  But once he gets it up and running, he wants it to stay up
>> and running, surviving crashes and everything.  That's a big part of
>> the value of having a background worker interface in the first place.
>
> It'd be the job of the extension that provides the bgworker to make sure
> that it gets relaunched on postmaster restart.
>
> I tend to agree though. The problem I'm describing only causes issues
> for extensions that launch dynamic bgworkers during extension startup.
>
> Extensions that launch bgworkers in response to SQL commands and don't
> rely on passing state to the bgworker via shared memory shouldn't have
> to deal with restarting them on postmaster restart.
>
>>
>>> - A way to list bgws, inspect their BackgroundWorker structs and obtain
>>> their handles; or
>>
>> This is certainly a good idea.
>>
>>> - A way to idempotently register a bgw only if it doesn't already exist
>>
>> This is probably a good idea, too.
>
> I think we need _one_ solution for 9.4, and need it soon.
>
> The simplest is probably to flush all dynamic bgworkers. But I think
> you're probably right - that solves the problem discussed here, but is
> likely to make life harder for other use cases.
>
> A last-minute API addition for bgworker listing/inspection might not be
> a great idea - it's too late for it to see much testing and analysis and
> it might introduce bigger API problems than it solves.
>
> Duplicate-free registration might be OK, but there are some questions
> around how we'd handle differing parameters, what should be the
> determinant for uniquenes, whether we should go for idempotency or
> return/raise an error to indicate it already exists, etc. So similar
> issue with doing it at the last minute.
>
> To me, that says "let's add a flag to allow a dynamic bgworker to be
> unregistered on postmaster restart". Seems simple and low risk.
>
> I'll follow up with a proposed patch, then we can spend some quality
> shed time on the flag name ;-)

I think I can live with that.  However, I tend to think that the best
solution here is really "don't put try to pass pointers via the
BackgroundWorker structure, because it doesn't [ expletive ] work."
We've had several instances of that already.  When I added support for
dynamic background workers, I had to add bgw_library_name and
bgw_function_name members to that structure because bgw_main won't
work for extension code unless the library is loaded from
shared_preload_libraries AND either we're not running under
EXEC_BACKEND (i.e. Windows) or the system is kind enough to load the
shared library at the same address in both processes, in which case it
will accidentally fail to fail.  We would have had to give bgw_sighup
and bgw_sigterm the same treatment, but since they weren't really
necessary in the first place we just ripped them out instead.

worker_spi also had a bad case of this disease.  It went to elaborate
lengths to pass a pointer via bgw_main_arg, but the pointer was to
*backend-private memory*, so it was completely broken on EXEC_BACKEND
builds.  I'm not sure whether it actually failed there, or managed to
work just because the background worker backend also ran the _PG_init
hook and managed to accidentally place the same data structure at the
same address. I fixed that as part of introducing the dynamic
background worker facility; now it passes an index instead of a
pointer.

What you're complaining about here is basically another instance of
the same problem.  It's not as bad because the main shared memory
segment never moves or has any memory freed except after a
crash-and-restart cycle, so your suggested plug seems likely to be
adequate.  But it also requires that you can allocate enough space in
shared memory to pass around whatever state you need to carry around,
and space in the main shared memory segment is at a premium; storage
there also can't be freed.  Noah suggested to me a while back that we
might do better to change bgw_main_arg to something like char[64]
rather than Datum, which would allow passing a reasonable-size payload
without having to futz with shared memory.  Then, instead of passing a
pointer, you can pass the name of a shared memory region to look up
and an index into the data structure stored there, or something like
that.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Tom Lane
Дата:
Сообщение: Re: four minor proposals for 9.5
Следующее
От: Andrew Dunstan
Дата:
Сообщение: Re: assertion failure 9.3.4