Re: Parallel Seq Scan

From	Amit Kapila
Subject	Re: Parallel Seq Scan
Date	1 July 2015 05:37:32
Msg-id	CAA4eK1+X0ecytADODfDWWOB2=2UBiq0O_2fN_v3M-XndiirMCg@mail.gmail.com
In reply to	Re: Parallel Seq Scan (Jeff Davis <pgsql@j-davis.com>)
Replies	Re: Parallel Seq Scan; Re: Parallel Seq Scan; Re: Parallel Seq Scan
List	pgsql-hackers

On Tue, Jun 30, 2015 at 4:00 AM, Jeff Davis <pgsql@j-davis.com> wrote:
>
> [Jumping in without catching up on entire thread.

No problem.

> Please let me know
> if these questions have already been covered.]
>
> 1. Can you change the name to something like ParallelHeapScan?
> Parallel Sequential is a contradiction. (I know this is bikeshedding
> and I won't protest further if you keep the name.)
>

Which name are you asking to change?

We have two nodes in the patch (Funnel and PartialSeqScan). Funnel is
given a generic name because it can be used in multiple ways (not just
for a plain parallel sequential scan), and the other node is named
PartialSeqScan because it performs part of a sequential scan.
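To make the division of labour between the two nodes concrete, here is a minimal C sketch using POSIX threads. It is not the patch's code: `SharedScanState`, `partial_seq_scan` and `funnel_scan` are hypothetical names, threads stand in for PostgreSQL's background worker processes, and a shared atomic block counter is assumed as the way workers divide up the relation. Each "PartialSeqScan" worker claims blocks until none remain; the "Funnel" waits for the workers and combines their output. (The patch passes tuples back through shared-memory tuple queues; here each worker just returns a count.)

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical shared scan state: workers claim block numbers from a
 * shared counter, so every block is scanned by exactly one worker. */
typedef struct {
    atomic_int next_block;       /* next unclaimed block number */
    int        nblocks;          /* total blocks in the "relation" */
    int        tuples_per_block; /* stand-in for real page contents */
} SharedScanState;

typedef struct {
    SharedScanState *shared;
    long             tuples_seen; /* per-worker result, read by the funnel */
} WorkerState;

/* "PartialSeqScan": scan only the blocks this worker manages to claim. */
static void *partial_seq_scan(void *arg)
{
    WorkerState *w = arg;
    for (;;) {
        int blk = atomic_fetch_add(&w->shared->next_block, 1);
        if (blk >= w->shared->nblocks)
            break;               /* no blocks left to claim */
        /* Stand-in for evaluating quals on every tuple of block blk. */
        w->tuples_seen += w->shared->tuples_per_block;
    }
    return NULL;
}

/* "Funnel": launch workers, wait for them, and combine their results.
 * Returns the total tuple count, or -1 if no worker could be started. */
long funnel_scan(int nworkers, int nblocks, int tuples_per_block)
{
    SharedScanState shared;
    atomic_init(&shared.next_block, 0);
    shared.nblocks = nblocks;
    shared.tuples_per_block = tuples_per_block;

    pthread_t   *threads = malloc(sizeof(pthread_t) * nworkers);
    WorkerState *workers = malloc(sizeof(WorkerState) * nworkers);
    if (threads == NULL || workers == NULL) {
        free(threads);
        free(workers);
        return -1;
    }

    int started = 0;
    for (int i = 0; i < nworkers; i++) {
        workers[i].shared = &shared;
        workers[i].tuples_seen = 0;
        if (pthread_create(&threads[i], NULL, partial_seq_scan, &workers[i]) != 0)
            break;               /* the workers already running still cover every block */
        started++;
    }

    long total = 0;
    for (int i = 0; i < started; i++) {
        pthread_join(threads[i], NULL);
        total += workers[i].tuples_seen;
    }
    free(threads);
    free(workers);
    return started > 0 ? total : -1;
}
```

Called as `funnel_scan(4, 1000, 10)`, the four workers between them visit each of the 1000 blocks exactly once, so the result is 10000 however the blocks happened to be distributed. Claiming one block at a time from a shared counter is just one simple way to balance load in this sketch; it also means that if fewer workers start than requested, the ones that did start still finish the whole scan.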

> 2. Where is the speedup coming from? How much of it is CPU and IO
> overlapping (i.e. not leaving disk or CPU idle while the other is
> working), and how much from the CPU parallelism? I know this is
> difficult to answer rigorously, but it would be nice to have some
> breakdown even if for a specific machine.
>

Yes, you are right. We have done quite a lot of testing with this patch
(using different approaches, on the hardware available) to see how much
difference it makes for I/O and for CPU. With respect to I/O, we found
that it doesn't help much [1], though it does help when the data is
cached, and there are really good benefits in terms of CPU [2].

In terms of completeness, I think we should add some documentation
for this patch. One way is to describe the execution mechanism in
src/backend/access/transam/README.parallel and then explain the
new configuration knobs in the documentation (.sgml files). We could
also have a separate page in the documentation under the Server
Programming section (Parallel Query -> Parallel Scan;
Parallel Scan Examples; ...).

Another thing to think about at this stage is whether we need to
break this patch up and, if so, how to split it into multiple patches
so that the review is easier to complete. I can see it being split
into 2 or 3 patches:

a. Infrastructure for parallel execution, such as some of the code in
execparallel.c, heapam.c, tqueue.c, etc., and all other generic
(non-node-specific) code.

b. Node-specific code (Funnel and PartialSeqScan) for the optimiser
and executor.

c. Documentation

Suggestions?

[1] - http://www.postgresql.org/message-id/CAA4eK1JHCmN2X1LjQ4bOmLApt+btOuid5Vqqk5G6dDFV69iyHg@mail.gmail.com

[2] - Refer to slides 14-15 of the PGCon presentation; I can repost the
data here if required.
https://www.pgcon.org/2015/schedule/events/785.en.html

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
