Re: [DESIGN] ParallelAppend

From David Rowley
Subject Re: [DESIGN] ParallelAppend
Date
Msg-id CAKJS1f_85iL7RFnKAbivtwgtqABLdHS_QCzRd+UmKoVU7UR2bQ@mail.gmail.com
In reply to Re: [DESIGN] ParallelAppend  (Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp>)
Responses Re: [DESIGN] ParallelAppend  (Ashutosh Bapat <ashutosh.bapat@enterprisedb.com>)
Re: [DESIGN] ParallelAppend  (Kouhei Kaigai <kaigai@ak.jp.nec.com>)
List pgsql-hackers

On 27 July 2015 at 21:09, Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> wrote:
> Hello, can I ask some questions?
>
> I suppose we can take this as the analog of ParallelSeqScan.  I
> can see not much distinction between Append(ParallelSeqScan) and
> ParallelAppend(SeqScan). What difference is there between them?
>
> If other nodes will have the same functionality as you mention at
> the end of this proposal, it might be better that some part of
> this feature is implemented as a part of the existing executor
> itself, rather than as a dedicated additional node, just as my
> asynchronous fdw execution patch partially does. (Although it
> lacks the planner part and bg worker launching..) If that is the
> case, it might be better that ExecProcNode is modified so that it
> supports both in-process and inter-bgworker cases by a single
> API.
>
> What do you think about this?

I have to say that I really like the thought of us having parallel-enabled execution in Postgres, but I also have to say that I don't think inventing all these special parallel node types is a good idea. Think about everything that we could parallelise...

Perhaps sort, hash join, seqscan, hash, bitmap heap scan, nested loop. I don't want to debate the exact list; perhaps there's more, perhaps less.
Are we really going to duplicate all of the code and add in the parallel stuff as new node types?

My other concern here is that I seldom hear people talk about the planner's architectural inability to make a good choice about how many parallel workers to use. Surely, to calculate costs properly, you need to know the exact number of parallel workers that will be available at execution time, but you need to know this at planning time!? I can't see how this works, apart from being very conservative about parallel workers, which I think is really bad: many databases have busy times in the day and also quiet times. Quiet time is generally when large batch work gets done, and that's when parallelism is likely to be most useful. Remember that queries are not always planned just before they're executed. We could have a PREPAREd query, or we could have better plan caching in the future. And even if we build some intelligence into the planner to choose a good number of workers based on the current server load, what's to say that the server will be under the same load at exec time? If we plan during a quiet time and exec during a busy time, all hell may break loose.
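
To make the plan-time versus exec-time mismatch concrete, here is a minimal sketch of a toy cost model. It is not PostgreSQL's costing code: the function name, the per-process startup charge, the row count and the cost of the competing serial plan are all made-up assumptions, chosen only to show how a plan picked for 8 processes can look very different when only 1 is available.

```python
# Toy cost model, NOT PostgreSQL's planner. All names and numbers are hypothetical.

def parallel_seqscan_cost(rows, cost_per_row, processes, startup_per_process=1000.0):
    """Cost of a scan split evenly across `processes`, plus a per-process startup charge."""
    if processes < 1:
        raise ValueError("need at least one process")
    return startup_per_process * processes + rows * cost_per_row / processes

ROWS = 1_000_000
INDEX_PLAN_COST = 400_000.0  # assumed cost of a competing serial plan

# Planning happens during a quiet period: the planner assumes 8 processes.
planned = parallel_seqscan_cost(ROWS, 1.0, processes=8)
chosen = "parallel seqscan" if planned < INDEX_PLAN_COST else "index scan"

# Execution happens during a busy period: only 1 process is actually available.
actual = parallel_seqscan_cost(ROWS, 1.0, processes=1)

print(chosen, planned, actual)
```

With these assumed numbers the parallel plan wins at planning time, yet at execution time it costs more than the serial alternative the planner rejected.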

I really do think that existing nodes should just be initialized in a parallel mode, and each node type can have a function to state if it supports parallelism or not. 
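
As a rough illustration of that idea, and only as a sketch under assumptions, the following models executor nodes that each report whether they can run in parallel and are initialized in parallel mode only when they can. The class and method names (PlanNode, supports_parallel, init_node) are hypothetical and do not correspond to PostgreSQL's executor API; which node types return True here is also just an assumption for the example.

```python
# Hypothetical sketch; these classes are not PostgreSQL's plan or executor nodes.

class PlanNode:
    """Base class for a toy executor node."""

    def __init__(self, *children):
        self.children = children
        self.parallel = False

    def supports_parallel(self):
        # Node types opt in by overriding this; the default is serial only.
        return False

    def init_node(self, want_parallel):
        # A node runs in parallel mode only if asked to AND if it is able to.
        self.parallel = want_parallel and self.supports_parallel()
        for child in self.children:
            child.init_node(want_parallel)


class SeqScan(PlanNode):
    def supports_parallel(self):
        return True


class HashJoin(PlanNode):
    def supports_parallel(self):
        return True


class Limit(PlanNode):
    pass  # assumed serial-only in this sketch


def parallel_nodes(node):
    """Return the type names of all nodes initialized in parallel mode, top down."""
    names = [type(node).__name__] if node.parallel else []
    for child in node.children:
        names += parallel_nodes(child)
    return names


plan = Limit(HashJoin(SeqScan(), SeqScan()))
plan.init_node(want_parallel=True)
print(parallel_nodes(plan))
```

The point of the sketch is that no separate "parallel" copy of each node type is needed: the same node type is used in both modes and the capability check decides which one applies.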

I'd really like to hear more opinions on the ideas I discussed here: 


This design makes use of the Funnel node that Amit has already made and allows more than 1 node to be executed in parallel at once.

It appears that parallel-enabling the executor node by node is fundamentally locked into just 1 node being executed in parallel, then perhaps a Funnel node gathering up the parallel worker buffers and streaming those back in serial mode. I believe that, by design, this does not permit a whole plan branch to execute in parallel, and I really feel that doing things this way is going to be very hard to undo and improve later. I might be too stupid to figure it out, but how would parallel hash join work if it can't gather tuples from the inner and outer nodes in parallel?
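
The pattern described above can be sketched roughly as follows. This is an assumption-laden toy, not the actual Funnel implementation: Python threads and a `queue.Queue` stand in for background workers and their shared buffers, and the names `worker`, `funnel` and `DONE` are invented for the example. Each worker scans one partition of a single scan, and the funnel drains the shared queue back into one serial stream.

```python
import queue
import threading

# Sentinel each worker pushes when its partition is exhausted (hypothetical name).
DONE = object()


def worker(partition, out):
    """Scan one partition and push its rows into the shared buffer."""
    for row in partition:
        out.put(row)
    out.put(DONE)


def funnel(partitions):
    """Run one worker per partition and yield their rows as a single serial stream."""
    out = queue.Queue()
    threads = [threading.Thread(target=worker, args=(p, out)) for p in partitions]
    for t in threads:
        t.start()
    remaining = len(threads)
    while remaining:
        item = out.get()
        if item is DONE:
            remaining -= 1
        else:
            yield item  # row order across workers is not deterministic
    for t in threads:
        t.join()


rows = sorted(funnel([[1, 2], [3], [4, 5, 6]]))
print(rows)
```

Note that only the one scan underneath the funnel runs in parallel here; everything consuming the funnel's output sees an ordinary serial stream, which is the limitation being questioned.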

Sorry for the rant, but I just feel like we're painting ourselves into a corner by parallel-enabling the executor node by node.
Apologies if I've completely misunderstood things.

Regards 

David Rowley

--
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
