Re: Parallel query vs smart shutdown and Postmaster death

Поиск
Список
Период
Сортировка
От Thomas Munro
Тема Re: Parallel query vs smart shutdown and Postmaster death
Дата
Msg-id CA+hUKG+7m3hvhf8LBRftnpX6FmYa282XtgFhjw=aLd1RBxi-Bg@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Parallel query vs smart shutdown and Postmaster death  (Arseny Sher <a.sher@postgrespro.ru>)
Ответы Re: Parallel query vs smart shutdown and Postmaster death
Список pgsql-hackers
On Sun, Mar 17, 2019 at 5:53 PM Arseny Sher <a.sher@postgrespro.ru> wrote:
> Thomas Munro <thomas.munro@gmail.com> writes:
>
> > 1.  Why does pmdie()'s SIGTERM case terminate parallel workers
> > immediately?  That breaks aborts running parallel queries, so they
> > don't get to end their work normally.
> > 2.  Why are new parallel workers not allowed to be started while in
> > this state?  That hangs future parallel queries forever, so they don't
> > get to end their work normally.
> > 3.  Suppose we fix the above cases; should we do it for parallel
> > workers only (somehow), or for all bgworkers?  It's hard to say since
> > I don't know what all bgworkers do.
>
> Attached patch fixes 1 and 2. As for the 3, the only other internal
> bgworkers currently are logical replication launcher and workers, which
> arguably should be killed immediately.

Hi Arseny,

Thanks for working on this!  Yes, it seems better to move forwards
rather than backwards, and fix this properly as you have done in this
patch.

Just a thought: instead of the new hand-coded loop you added in
pmdie(), do you think it would make sense to have a new argument
"exclude_class_mask" for SignalSomeChildren()?  If we did that, I
would consider renaming the existing parameter "target" to
"include_type_mask" to make it super clear what's going on.

> > Here's an initial sketch of a
> > patch like that: it gives you "ERROR:  parallel worker failed to
> > initialize" and "HINT:  More details may be available in the server
> > log." if you try to run a parallel query.
>
> Reporting bgworkers that postmaster is never going to start is probably
> worthwhile doing anyway to prevent potential further deadlocks like
> this. However, I think there is a problem in your patch: we might be in
> post PM_RUN states due to FatalError, not because of shutdown. In this
> case, we shouldn't refuse to run bgws in the future. I would also merge
> the check into bgworker_should_start_now.

Hmm, yeah, I haven't tested or double-checked in detail yet but I
think you're probably right about all of that.

-- 
Thomas Munro
https://enterprisedb.com


В списке pgsql-hackers по дате отправления:

Предыдущее
От: "Iwata, Aya"
Дата:
Сообщение: RE: [PATCH] get rid of StdRdOptions, use individual binaryreloptions representation for each relation kind instead
Следующее
От: Michael Paquier
Дата:
Сообщение: Re: Offline enabling/disabling of data checksums