Re: [HACKERS] parallel.c oblivion of worker-startup failures

From Amit Kapila
Subject Re: [HACKERS] parallel.c oblivion of worker-startup failures
Msg-id CAA4eK1JYZeiA5g4ciZtRT3=73gt3O+hgMk3e6dwTBUhaZcDGBA@mail.gmail.com
In response to Re: [HACKERS] parallel.c oblivion of worker-startup failures  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: [HACKERS] parallel.c oblivion of worker-startup failures  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers

On Thu, Dec 14, 2017 at 3:05 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Wed, Dec 13, 2017 at 1:41 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
>
>> This also doesn't appear to be completely safe.  If we add
>> proc_exit(1) after attaching to error queue (say after
>> pq_set_parallel_master) in the worker, then it will lead to *hang* as
>> anyone_alive will still be false and as it will find that the sender
>> is set for the error queue, it won't return any failure.  Now, I think
>> even if we check worker status (which will be stopped) and break after
>> the new error condition, it won't work as it will still return zero
>> rows in the case reported by you above.
>
> Hmm, there might still be a problem there.  I was thinking that once
> the leader attaches to the queue, we can rely on the leader reaching
> "ERROR: lost connection to parallel worker" in HandleParallelMessages.
> However, that might not work because nothing sets
> ParallelMessagePending in that case.  The worker will have detached
> the queue via shm_mq_detach_callback, but the leader will only
> discover the detach if it actually tries to read from that queue.
>

I think it would be much easier to fix this problem if we had some
way to tell whether the worker has stopped gracefully or not.  Do
you think it makes sense to introduce such a state in the background
worker machinery?

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

