Re: [HACKERS] parallel.c oblivion of worker-startup failures

Поиск
Список
Период
Сортировка
От Robert Haas
Тема Re: [HACKERS] parallel.c oblivion of worker-startup failures
Дата
Msg-id CA+TgmoaiSS91k7OigH7nUXye5EiVnJiTN-jS93FX4U_Axiu9tA@mail.gmail.com
обсуждение исходный текст
Ответ на Re: [HACKERS] parallel.c oblivion of worker-startup failures  (Amit Kapila <amit.kapila16@gmail.com>)
Ответы Re: [HACKERS] parallel.c oblivion of worker-startup failures  (Tom Lane <tgl@sss.pgh.pa.us>)
Re: [HACKERS] parallel.c oblivion of worker-startup failures  (Amit Kapila <amit.kapila16@gmail.com>)
Список pgsql-hackers
On Wed, Dec 6, 2017 at 1:43 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> Looks good to me.

Committed and back-patched to 9.4 where dynamic background workers
were introduced.

>> Barring objections, I'll commit and
>> back-patch this; then we can deal with the other part of this
>> afterward.
>
> Sure, I will rebase on top of the commit unless you have some comments
> on the remaining part.

I'm not in love with that part of the fix; the other parts of that if
statement are just testing variables, and a function call that takes
and releases an LWLock is a lot more expensive.  Furthermore, that
test will often be hit in practice, because we'll often arrive at this
function before workers have actually finished.  On top of that, we'll
typically arrive here having already communicated with the worker in
some way, such as by receiving tuples, which means that in most cases
we already know that the worker was alive at least at some point, and
therefore the extra test isn't necessary.  We only need that test, if
I understand correctly, to cover the failure-to-launch case, which is
presumably very rare.

All that having been said, I don't quite know how to do it better.  It
would be easy enough to modify this function so that if it ever
received a result other then BGWH_NOT_YET_STARTED for a worker, it
wouldn't call this function again for that worker.  However, that's
only mitigating the problem, not really fixing it.  We'll still end up
frequently inquiring - admittedly only once - about the status of a
worker with which we've previously communicated via the tuple queue.
Maybe we just have to live with that problem, but do you (or does
anyone else) have an idea?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


В списке pgsql-hackers по дате отправления:

Предыдущее
От: Amit Kapila
Дата:
Сообщение: Re: Usage of epoch in txid_current
Следующее
От: Andrew Dunstan
Дата:
Сообщение: ALTER TABLE ADD COLUMN fast default