BGWorkers, shared memory pointers, and postmaster restart

Поиск
Список
Период
Сортировка
От Craig Ringer
Тема BGWorkers, shared memory pointers, and postmaster restart
Дата
Msg-id 534E250F.2060705@2ndquadrant.com
обсуждение исходный текст
Ответы Re: BGWorkers, shared memory pointers, and postmaster restart  (Craig Ringer <craig@2ndquadrant.com>)
Список pgsql-hackers
Hi all

I've been using the dynamic BGWorker support for some recent work, and I
think I've found an issue with how postmaster restarts are handled.

TL;DR: I don't think there's a safe way to use a BGWorker (static or
dynamic) with bgw_restart_time != BGW_NEVER_RESTART and a bgw_main_arg
Datum that points into shared memory, and think we might need a API
change to fix that.



If the postmaster restarts due to a crash, it calls shared memory
startup hooks and sets the found pointer to false, so they're re-inited.
This makes sense as shm is presumed to be corrupt.

Dynamic BGWorkers are killed as a part of restart. However, they're not
unregistered, and are relaunched if there's a bgw_restart_time set.
They're called with the same bgw_main_arg Datum as with the original launch.

If this Datum is a pointer into the shared memory that just got zeroed
and re-inited, exciting things happen.

In my case the worker restart generally happens after shm is re-inited,
and since it gets reinitialized with the same contents the dynamic
bgworker registered during postmaster startup restarts normally. I only
noticed the problem because my shared memory init hook launches
bgworkers once shm is set up, and I was getting two copies of a bgworker
after postmaster restart.

This also affects static bgworkers that use a pointer into shared memory
to pass data on EXEC_BACKEND platforms (Windows).

For dynamic bgworkers, it seems like it'd be OK to just require
extensions to re-register them after a postmaster restart, or add a
BGW_UNREGISTER_ON_POSTMASTER_RESTART flag to let that be controlled so
bgworkers that didn't receive pointers into shm didn't have to deal with
it. So any pointer into shared memory that's being re-initialized would
be discarded when the bgworker was unregistered during postmater
restart. Dynamic bgworkers are new in 9.4, so we still have the freedom
to change behaviour for them.

But that won't fix static bgworkers that have a pointer into shm as an
argument. In non-EXEC_BACKEND platforms we can just pass a pointer to
palloc'd postmaster memory, but that won't work if we have to exec()
after fork(). I don't think it's reasonable to say "well, don't do that"
- if you can only send a single pass-by-value Datum to a bgworker's main
function, their utility is greatly reduced.


I could always set up an exit hook / SIGQUIT handler that tries to
unregister dynamic bgworkers using their BGWorkerHandle s, from the
worker that initially registered them. If the worker that registered
them is the one that crashed and triggered a postmaster restart this
won't do any good, so it's a half-solution at best.

I can't stash the BGWorkerHandle s for the launched workers in shm and
unregister them during postmaster restart either. For one thing, shm is
presumed corrupt. For another, BGWorkerHandle is an incomplete struct
with no exposed size, so I can't copy it into shm anyway.

This seems like a bit of a pickle. Am I missing something obvious?

The only way around it that I can currently see is to have a single
static bgworker with no pointer argument launch and manage all the other
workers required for the extension. It would launch them all with
bgw_restart_time = BGW_NEVER_RESTART and set its self as the
bgw_notify_pid . If they die, it unregisters them then registers a new
one. Effectively, it's doing the work the bgworker code does already,
except that it makes sure the bgworkers don't get relaunched on
postmaster restart. Since it doesn't depend on shm being valid, this
should work, but it's messy and seems like there should be a better way.

Do we need to change how we define BGWorkers to require that a BGWorker
with a Datum that points to shared memory be automatically unregistered
by the postmaster on restart? This would have to apply to static
BGWorkers too, so we'd want to think about adding a flag for it and
maybe even backpatching the flag into 9.3; it'd only affect extensions
that actually used the combination described above.

-- Craig Ringer                   http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Craig Ringer
Дата:
Сообщение: Re: Create function prototype as part of PG_FUNCTION_INFO_V1
Следующее
От: Peter Geoghegan
Дата:
Сообщение: Re: Clock sweep not caching enough B-Tree leaf pages?