Re: [PATCH] Fix orphaned backend processes on Windows using Job Objects

From	Thomas Munro
Subject	Re: [PATCH] Fix orphaned backend processes on Windows using Job Objects
Date	7 November 05:39:42
Msg-id	CA+hUKGJo6hu6GToiXarBRF+AqhFPnzMTW2Nksm0x-+9m2=dskQ@mail.gmail.com
In reply to	Re: [PATCH] Fix orphaned backend processes on Windows using Job Objects (Bryan Green <dbryan.green@gmail.com>)
List	pgsql-hackers

On Fri, Nov 7, 2025 at 3:13 AM Bryan Green <dbryan.green@gmail.com> wrote:
> The reason to still do this patch and clean up the handle inheritance
> mess is that there are states (suspended state, infinite loop, spinlock
> hold, whatever) that a process can be in that keeps it from processing
> the event.  We don't need to wait on the children to voluntarily exit
> when postmaster crashes.

Agreed on all points.  We'd recently come to the same conclusion on this thread:

https://www.postgresql.org/message-id/flat/B3C69B86-7F82-4111-B97F-0005497BB745%40yandex-team.ru

I think there might arguably be a sort of weak forward progress
guarantee in the existing design and it's been a while since we've had
problem reports AFAIR*: locks were released (which turns out to be
fundamentally unsafe at least while in a critical section as analysed
in that thread, but it does allow progress in blocked backends, so
that they can learn of the postmaster's demise), and no one should
enter WaitEventSetWait() while holding a spinlock, and infinite loops are
against the law, and it's previously been considered acceptable-ish
that a backend might continue to run a long query until completion
before exiting (without supporting auxiliary or worker backends, which
sounds potentially suspect, but at least you can't wait for another
backend without learning of the postmaster's demise assuming the only
possible waits are LWLocks or latches).  But clearly it's not good
enough.
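
To make the cooperative scheme above concrete, here is a minimal POSIX
sketch of a child learning of its parent's death only when it reaches a
wait point.  It assumes the parent holds the write end of a pipe open
for its whole lifetime and never writes to it, so the read end becomes
readable (EOF or hangup) only once the parent is gone.  The names
parent_is_gone and alive_read_fd are hypothetical and this is not
PostgreSQL's actual code, just an illustration of the idea.

```c
#include <poll.h>
#include <stdbool.h>
#include <unistd.h>

/*
 * Hypothetical helper: report whether the parent has exited.
 *
 * alive_read_fd is the read end of a pipe whose write end is held open
 * only by the parent, which never writes to it.  The descriptor can
 * therefore only become readable (EOF) or hung up once every copy of
 * the write end has been closed, i.e. once the parent is gone.
 *
 * The check is purely cooperative: a child that never calls it
 * (suspended, spinning, stuck in a loop) never notices anything.
 */
bool
parent_is_gone(int alive_read_fd, int timeout_ms)
{
    struct pollfd pfd;
    int         rc;

    pfd.fd = alive_read_fd;
    pfd.events = POLLIN;
    pfd.revents = 0;

    rc = poll(&pfd, 1, timeout_ms);
    if (rc <= 0)
        return false;           /* timeout or error: assume still alive */

    /* EOF shows up as POLLIN on some systems and POLLHUP on others. */
    return (pfd.revents & (POLLIN | POLLHUP)) != 0;
}
```

A child would call this from its wait loop and exit when it returns
true.  The weakness is visible in the shape of the code: nothing
happens until the child gets around to calling it.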

The fact that Windows backends are born in suspended state until the
postmaster resumes them is indeed a new and significant hole in that
theory.  Preemptive termination is the only thing that makes sense.

*We used to have places that waited but forgot to handle PM exit, and
I don't recall "manual orphan cleanup needed" reports since we
enforced a central handler.  But see also my earlier note about
systemd potentially hiding problems these days, if using "mixed" mode
to SIGKILL the whole cgroup.
