Re: Parallel Seq Scan

Поиск
Список
Период
Сортировка
От Amit Kapila
Тема Re: Parallel Seq Scan
Дата
Msg-id CAA4eK1KxH4MiD77851MBvRr8LaOBfNtRr-4c4XtdzdTx1FHTog@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Parallel Seq Scan  (David Rowley <dgrowleyml@gmail.com>)
Список pgsql-hackers
On Tue, Apr 21, 2015 at 2:29 PM, David Rowley <dgrowleyml@gmail.com> wrote:
I've also been thinking about how, instead of having to have a special PartialSeqScan node which contains a bunch of code to store tuples in a shared memory queue, could we not have a "TupleBuffer", or "ParallelTupleReader" node, one of which would always be the root node of a plan branch that's handed off to a worker process. This node would just try to keep it's shared tuple store full, and perhaps once it fills it could have a bit of a sleep and be woken up when there's a bit more space on the queue. When no more tuples were available from the node below this, then the worker could exit. (providing there was no rescan required)

I think between the Funnel node and a ParallelTupleReader we could actually parallelise plans that don't even have parallel safe nodes.... Let me explain:

Let's say we have a 4 way join, and the join order must be {a,b}, {c,d} => {a,b,c,d}, Assuming the cost of joining a to b and c to d are around the same, the Parallelizer may notice this and decide to inject a Funnel and then ParallelTupleReader just below the node for c join d and have c join d in parallel. Meanwhile the main worker process could be executing the root node, as normal. This way the main worker wouldn't have to go to the trouble of joining c to d itself as the worker would have done all that hard work.

I know the current patch is still very early in the evolution of PostgreSQL's parallel query, but how would that work with the current method of selecting which parts of the plan to parallelise?

The Funnel node is quite generic and can handle the case as
described by you if we add Funnel on top of join node (c join d).
It currently passes plannedstmt to worker which can contain any
type of plan (though we need to add some more code to make it
work if want to execute any node other than Result or PartialSeqScan
node.)
 
I really think the plan needs to be a complete plan before it can be best analysed on how to divide the workload between workers, and also, it would be quite useful to know how many workers are going to be able to lend a hand in order to know best how to divide the plan up as evenly as possible.


I think there is some advantage of changing an already built plan
to parallel plan based on resources and there is some literature
about the same, but I think we will loose much more by not considering
parallelism during planning time.  If I remember correctly, then some
of the other databases do tackle this problem of shortage of resources
during execution as mentioned by me upthread, but I think for that it
is not necessary to have a Parallel Planner as a separate layer.
I believe it is important to have some way to handle shortage of resources
during execution, but it can be done at later stage.


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Heikki Linnakangas
Дата:
Сообщение: Re: installcheck missing in src/bin/pg_rewind/Makefile
Следующее
От: Simon Riggs
Дата:
Сообщение: Re: Logical Decoding follows timelines