Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)

Поиск
Список
Период
Сортировка
От Peter Geoghegan
Тема Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)
Дата
Msg-id CAH2-Wzm-Zfz-1aH3vtz35xJmOr+ihcN0mQ+WQJXQ87BGtR7mPg@mail.gmail.com
обсуждение исходный текст
Ответ на Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)  (Amit Kapila <amit.kapila16@gmail.com>)
Ответы Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)
Список pgsql-hackers
On Wed, Jan 17, 2018 at 5:47 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> I could still reproduce it. I think the way you have fixed it has a
> race condition.  In _bt_parallel_scan_and_sort(), the value of
> brokenhotchain is set after you signal the leader that the worker is
> done (by incrementing workersFinished). Now, the leader is free to
> decide based on the current shared state which can give the wrong
> value.  Similarly, I think the value of havedead and reltuples can
> also be wrong.

> You neither seem to have fixed nor responded to the second problem
> mentioned in my email upthread [1].  To reiterate, the problem is that
> we can't assume that the workers we have launched will always start
> and finish. It is possible that postmaster fails to start the worker
> due to fork failure. In such conditions, tuplesort_leader_wait will
> hang indefinitely because it will wait for the workersFinished count
> to become equal to launched workers (+1, if leader participates) which
> will never happen.  Am I missing something due to which this won't be
> a problem?

I think that both problems (the live _bt_parallel_scan_and_sort() bug,
as well as the general issue with needing to account for parallel
worker fork() failure) are likely solvable by not using
tuplesort_leader_wait(), and instead calling
WaitForParallelWorkersToFinish(). Which you suggested already.

Separately, I will need to monitor that bugfix patch, and check its
progress, to make sure that what I add is comparable to what
ultimately gets committed for parallel query.

-- 
Peter Geoghegan


В списке pgsql-hackers по дате отправления:

Предыдущее
От: Petr Jelinek
Дата:
Сообщение: Re: [PATCH] Logical decoding of TRUNCATE
Следующее
От: Peter Eisentraut
Дата:
Сообщение: Re: [HACKERS] GnuTLS support