Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)

From: Robert Haas
Subject: Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)
Msg-id: CA+TgmoZA0ceYKPwqPunJB+yofaPag9=3d01g9p7Z1q9RAuG2gA@mail.gmail.com
In reply to: Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)  (Peter Geoghegan <pg@bowt.ie>)
Responses: Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)
List: pgsql-hackers
On Mon, Jan 22, 2018 at 10:13 PM, Peter Geoghegan <pg@bowt.ie> wrote:
> _bt_leader_heapscan() can detect when workers exit early, at least in
> the vast majority of cases. It can do this simply by processing
> interrupts and automatically propagating any error -- nothing special
> about that. It can also detect when workers have finished
> successfully, because of course, that's the main reason for its
> existence. What remains, exactly?

As Amit says, what remains is the case where fork() fails or the
worker dies before it reaches the line in ParallelWorkerMain that
reads shm_mq_set_sender(mq, MyProc).  In those cases, no error will be
signaled until you call WaitForParallelWorkersToFinish().  If you wait
prior to that point for a number of workers equal to
nworkers_launched, you will wait forever in those cases.
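
To make the failure mode concrete, here is a minimal sketch in C. It is
a toy model and not PostgreSQL code: ToyWorker and the two toy_
functions are hypothetical names invented for illustration. It only
encodes the counting argument above, namely that a worker which dies
before it initializes never reports completion, so a leader that waits
for nworkers_launched completions can never be satisfied.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical model of one launched worker. "initialized" is false when
 * fork() failed or the worker died before it could report anything.
 */
typedef struct ToyWorker
{
    bool initialized;
} ToyWorker;

/* Count the workers that will eventually report that they finished. */
static int
toy_workers_that_will_finish(const ToyWorker *workers, int nworkers_launched)
{
    int finished = 0;

    assert(nworkers_launched >= 0);
    for (int i = 0; i < nworkers_launched; i++)
    {
        if (workers[i].initialized)
            finished++;
    }
    return finished;
}

/*
 * True if a leader that waits for nworkers_launched completion reports
 * would wait forever: at least one launched worker never reports.
 */
static bool
toy_leader_waits_forever(const ToyWorker *workers, int nworkers_launched)
{
    return toy_workers_that_will_finish(workers, nworkers_launched)
        < nworkers_launched;
}
```

With three launched workers of which one never initializes,
toy_leader_waits_forever() returns true. With all three initialized it
returns false.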

I am going to repeat my previous suggestion that we use a Barrier here.
Given the discussion subsequent to my original proposal, this can be a
lot simpler than what I suggested originally.  Each worker does
BarrierAttach() before beginning to read tuples (exiting if the phase
returned is non-zero) and BarrierArriveAndDetach() when it's done
sorting.  The leader does BarrierAttach() before launching workers and
BarrierArriveAndWait() when it's done sorting.  If we don't do this,
we're going to have to invent some other mechanism to count the
participants that actually initialize successfully, but that seems
like it's just duplicating code.

This proposal has some minor advantages even when no fork() failure or
similar occurs.  If, for example, one or more workers take a long time
to start, the leader doesn't have to wait for them before writing out
the index.  As soon as all the workers that attached to the Barrier
have arrived at the end of phase 0, the leader can build a new tape
set from all of the tapes that exist at that time.  It does not need
to wait for the remaining workers to start up and create empty tapes.
This is only a minor advantage since we probably shouldn't be doing
CREATE INDEX in parallel in the first place if the index build is so
short that this scenario is likely to occur, but we get it basically
for free, so why not?
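
The attach/arrive protocol described above can be sketched with a toy,
single-threaded model in C. This is only an illustration under stated
assumptions: ToyBarrier and every toy_ function are hypothetical and are
not PostgreSQL's Barrier implementation. In particular nothing here
blocks. Where the real leader would sleep in BarrierArriveAndWait(), the
toy leader simply learns whether the phase has already advanced. The
point is the bookkeeping: only participants that actually attached are
counted, and a worker that attaches after phase 0 has ended is told to
exit.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy phase barrier: counts only participants that really attached. */
typedef struct ToyBarrier
{
    int phase;          /* 0 = scan and sort, 1 = leader may merge tapes */
    int participants;   /* currently attached participants */
    int arrived;        /* attached participants waiting at end of phase */
} ToyBarrier;

static void
toy_barrier_init(ToyBarrier *b)
{
    b->phase = 0;
    b->participants = 0;
    b->arrived = 0;
}

/* Advance the phase once every attached participant has arrived. */
static bool
toy_maybe_advance(ToyBarrier *b)
{
    if (b->participants > 0 && b->arrived == b->participants)
    {
        b->arrived = 0;
        b->phase++;
        return true;
    }
    return false;
}

/* Attach and report the current phase (compare BarrierAttach()). */
static int
toy_attach(ToyBarrier *b)
{
    b->participants++;
    return b->phase;
}

/* Done sorting: leave the barrier (compare BarrierArriveAndDetach()). */
static void
toy_arrive_and_detach(ToyBarrier *b)
{
    assert(b->participants > 0);
    b->participants--;
    toy_maybe_advance(b);
}

/* Worker startup: returns false if the worker is too late and must exit. */
static bool
toy_worker_start(ToyBarrier *b)
{
    if (toy_attach(b) != 0)
    {
        toy_arrive_and_detach(b);   /* phase 0 is over: detach and exit */
        return false;
    }
    return true;
}

/*
 * Leader is done sorting (compare BarrierArriveAndWait()). Returns true
 * if the phase advanced immediately; otherwise a real leader would sleep
 * until the last attached worker detaches.
 */
static bool
toy_leader_arrive(ToyBarrier *b)
{
    b->arrived++;
    return toy_maybe_advance(b);
}
```

Suppose three workers are launched but only two ever attach. After the
leader's toy_attach() and two successful toy_worker_start() calls, both
workers call toy_arrive_and_detach(), and the leader's
toy_leader_arrive() then returns true without reference to the third
worker. If that worker starts later, toy_worker_start() returns false
because the phase is already 1.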

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

