Re: [DESIGN] ParallelAppend

From: Kouhei Kaigai
Subject: Re: [DESIGN] ParallelAppend
Date:
Msg-id: 9A28C8860F777E439AA12E8AEA7694F80111F3FC@BPXM15GP.gisp.nec.co.jp
In reply to: Re: [DESIGN] ParallelAppend  (Amit Langote <Langote_Amit_f8@lab.ntt.co.jp>)
Responses: Re: [DESIGN] ParallelAppend  (Amit Langote <Langote_Amit_f8@lab.ntt.co.jp>)
List: pgsql-hackers
> KaiGai-san,
> 
> On 2015-07-27 PM 11:07, Kouhei Kaigai wrote:
> >
> >   Append
> >    --> Funnel
> >         --> PartialSeqScan on rel1 (num_workers = 4)
> >    --> Funnel
> >         --> PartialSeqScan on rel2 (num_workers = 8)
> >    --> SeqScan on rel3
> >
> >  shall be rewritten to
> >   Funnel
> >     --> PartialSeqScan on rel1 (num_workers = 4)
> >     --> PartialSeqScan on rel2 (num_workers = 8)
> >     --> SeqScan on rel3        (num_workers = 1)
> >
> 
> In the rewritten plan, are respective scans (PartialSeq or Seq) on rel1,
> rel2 and rel3 asynchronous w.r.t each other? Or does each one wait for the
> earlier one to finish? I would think the answer is no because then it
> would not be different from the former case, right? Because the original
> premise seems that (partitions) rel1, rel2, rel3 may be on different
> volumes so parallelism across volumes seems like a goal of parallelizing
> Append.
> 
> From my understanding of parallel seqscan patch, each worker's
> PartialSeqScan asks for a block to scan using a shared parallel heap scan
> descriptor that effectively keeps track of division of work among
> PartialSeqScans in terms of blocks. What if we invent a PartialAppend
> which each worker would run in case of a parallelized Append. It would use
> some kind of shared descriptor to pick a relation (Append member) to scan.
> The shared structure could be the list of subplans including the mutex for
> concurrency. It doesn't sound as effective as proposed
> ParallelHeapScanDescData does for PartialSeqScan but any more granular
> might be complicated. For example, consider (current_relation,
> current_block) pair. If there are more workers than subplans/partitions,
> then multiple workers might start working on the same relation after a
> round-robin assignment of relations (but of course, a later worker would
> start scanning from a later block in the same relation). I imagine that
> might help with parallelism across volumes if that's the case.
>
I initially thought ParallelAppend would launch a fixed number of background
workers per sub-plan, according to the costs estimated at planning time.
However, I'm now inclined to have each background worker pick up an
as-yet-uncompleted sub-plan first. (For more details, please see my reply to
Amit Kapila.) This gives a coarser-grained distribution of work among workers.
Once the number of workers grows larger than the number of volumes /
partitions, two or more workers will be assigned to the same PartialSeqScan,
at which point the fine-grained job distribution via the shared parallel heap
scan descriptor takes over.

> MergeAppend
> parallelization might involve a bit more complication but may be feasible
> with a PartialMergeAppend with slightly different kind of coordination
> among workers. What do you think of such an approach?
>
Do we need anything special for ParallelMergeAppend?
If the individual child nodes are designed to return sorted results,
what we have to do seems to me to be the same.

Thanks,
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>

