Re: [HACKERS] shm_mq_wait_internal gets stuck forever on fast shutdown

Поиск

Список

Период

Сортировка

От	Craig Ringer
Тема	Re: [HACKERS] shm_mq_wait_internal gets stuck forever on fast shutdown
Дата	21 августа 2017 г. 06:47:41
Msg-id	CAMsr+YE_nH+c6r16Pe=-u4aqwxE5D-stS_dd_J6KQ56V2hELkg@mail.gmail.com обсуждение исходный текст
Ответ на	[HACKERS] shm_mq_wait_internal gets stuck forever on fast shutdown (Craig Ringer <craig@2ndquadrant.com>)
Список	pgsql-hackers

Дерево обсуждения

On 21 August 2017 at 10:57, Craig Ringer <craig@2ndquadrant.com> wrote:

Hi all

I've noticed a possible bug / design limitation where shm_mq_wait_internal sleep in a latch wait forever, and the postmaster gets stuck waiting for the bgworker the wait is running in to exit.

This happens when the shm_mq does not have an associated bgworker handle registered because the other end is not known at mq creation time or is a normal backend not a bgworker. So a BGW handle cannot be passed.

shm_mq_wait_internal() will CHECK_FOR_INTERRUPTS() when its latch wait is interrupted by a SIGTERM. But it doesn't actually respond to SIGTERM in any way; it just merrily resets its latch and keeps looping.

It will bail out correctly on SIGQUIT.

If the proc waiting to attach was known at queue creation time and was a bgworker, we'd pass a bgworker handle and the mq would notice it failed to start and stop waiting. There's only a problem if no bgworker handle can be supplied.

The underlying problem is that CHECK_FOR_INTERRUPTS() doesn't care about SIGTERM or have any way to test for it. And we don't have any global management of SIGTERM like we do SIGQUIT so the shm_mq_wait_internal loop can't test for it.

The only ways I can see to fix this are:

* Generalize SIGTERM handling across postgres, so there's a global "got_SIGTERM" global that shm_mq_wait_internal can test to break out of its loop, and every backend's signal handler must set it. Lots of churn.

* In a proc's signal handler, use globals set before entry and after exit from shm_mq operations to detect if we're currently in shm_mq and promote SIGTERM to SIGQUIT by sending a new signal to ourselves. Or set up state so CHECK_FOR_INTERRUPTS() will notice when the handler returns.

* Allow passing of a *bool that tests for SIGTERM, or a function pointer called on each iteration to test whether looping should continue, to be passed to shm_mq_attach. So if you can't supply a bgw handle, you supply that instead. Provide a shm_mq_set_handle equivalent for it too.

Any objections to the last approach?

BTW, you can work around it in extension code for existing versions with something like this in your bgworker:

volatile bool in_shm_mq = false;

void

my_handle_sigterm(SIGNAL_ARGS)

{

...

if (in_shm_mq)

{

* shm_mq can get stuck in shm_mq_wait_internal on SIGTERM; see

* https://www.postgresql.org/message-id/CAMsr+YHmm=01LsuEYR6YdZ8CLGfNK_fgdgi+QXUjF+JeLPvZQg@mail.gmail.com

* To work around this we keep track of whether we're in shmem_mq

* and generate a fake interrupt for CHECK_FOR_INTERRUPTS() to

* process if so.

* The guard around in_shm_mq may not be necessary, but without

* it any SIGTERM will likely cause ereport(FATAL) with

* "terminating connection due to administrator command"

* which isn't ideal.

InterruptPending = true;

ProcDiePending = true;

}

....

}

inline static shm_mq_handle *

myext_shm_mq_attach(shm_mq *mq, dsm_segment *seg, BackgroundWorkerHandle *handle)

{

shm_mq_handle *ret;

in_shm_mq = true;

ret = shm_mq_attach(mq, seg, handle);

in_shm_mq = false;

return ret;

}

/* repeated for shm_mq_receive, shm_mq_send, shm_mq_sendv, shm_mq_wait_for_attach */

You can instead use non-blocking sends instead, and sleep on your own latch, doing the same work as shm_mq_wait_internal yourself.

Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

В списке pgsql-hackers по дате отправления:

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Re: [HACKERS] shm_mq_wait_internal gets stuck forever on fast shutdown