Re: Parallel Seq Scan

From Robert Haas
Subject Re: Parallel Seq Scan
Date
Msg-id CA+TgmobBZ=0n=JcS28hBxVBaSXeZHBQCnxVzCTUSPMe1zsuGdw@mail.gmail.com
In reply to Re: Parallel Seq Scan  (Amit Kapila <amit.kapila16@gmail.com>)
Responses Re: Parallel Seq Scan  (Amit Kapila <amit.kapila16@gmail.com>)
List pgsql-hackers
On Thu, Jan 8, 2015 at 6:42 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> Are we sure that in such cases we will consume work_mem during
> execution?  In cases of parallel_workers we are sure to an extent
> that if we reserve the workers then we will use it during execution.
> Nonetheless, I have proceeded and integrated the parallel_seq scan
> patch with v0.3 of parallel_mode patch posted by you at below link:
> http://www.postgresql.org/message-id/CA+TgmoYmp_=XcJEhvJZt9P8drBgW-pDpjHxBhZA79+M4o-CZQA@mail.gmail.com

That depends on the costing model.  It makes no sense to do a parallel
sequential scan on a small relation, because the user backend can scan
the whole thing itself faster than the workers can start up.  I
suspect it may also be true that the useful amount of parallelism
increases the larger the relation gets (but maybe not).
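
To make the trade-off concrete, here is a toy cost comparison. It is only a
sketch under assumed numbers: the constants, the function names, and the
assumption that scan cost divides evenly among the participants are all
hypothetical and do not come from the patch or the planner.

```c
#include <assert.h>

/* Hypothetical cost constants; neither value comes from the patch. */
#define WORKER_STARTUP_COST 1000.0  /* assumed one-time cost per worker */
#define PAGE_SCAN_COST      1.0     /* assumed cost to scan one page */

/*
 * Estimated cost of scanning `pages` pages when `workers` workers assist
 * the user backend (workers == 0 means a plain serial scan).  Assumes the
 * scan work splits evenly across the backend and its workers.
 */
static double
scan_cost(double pages, int workers)
{
    assert(pages >= 0 && workers >= 0);
    return workers * WORKER_STARTUP_COST
         + pages * PAGE_SCAN_COST / (workers + 1);
}

/* Return the worker count in [0, max_workers] with the lowest estimate. */
static int
choose_workers(double pages, int max_workers)
{
    int best = 0;

    for (int w = 1; w <= max_workers; w++)
        if (scan_cost(pages, w) < scan_cost(pages, best))
            best = w;
    return best;
}
```

With these made-up constants, a 100-page relation gets zero workers because
startup cost dominates, while larger relations justify more workers.  Whether
real workloads behave that way is exactly the open question.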

> 2. To enable two types of shared memory queues (error queue and
> tuple queue), we need to ensure that we switch to appropriate queue
> during communication of various messages from parallel worker
> to master backend.  There are two ways to do it
>    a.  Save the information about error queue during startup of parallel
>         worker (ParallelMain()) and then during error, set the same (switch
>         to error queue in errstart() and switch back to tuple queue in
>         errfinish() and errstart() in case errstart() doesn't need to
>         propagate error).
>    b.  Do something similar as (a) for tuple queue in printtup or other
>         place if any for non-error messages.
> I think approach (a) is slightly better as compared to approach (b) as
> we need to switch many times for tuple queue (for each tuple) and
> there could be multiple places where we need to do the same.  For now,
> I have used approach (a) in Patch which needs some more work if we
> agree on the same.

I don't think you should be "switching" queues.  The tuples should be
sent to the tuple queue, and errors and notices to the error queue.
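
A rough sketch of that idea: each worker holds both queues at once and the
destination is picked per message, so there is no "current queue" state to
flip.  Everything here is hypothetical (the types, the names, and the plain
in-memory arrays standing in for shared memory queues); it does not use the
real shm_mq API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical message kinds a worker might send to the master backend. */
typedef enum { MSG_TUPLE, MSG_ERROR, MSG_NOTICE } MsgKind;

#define QUEUE_CAPACITY 16

/* Stand-in for a shared memory queue: a fixed-size array of payloads. */
typedef struct
{
    const char *items[QUEUE_CAPACITY];
    size_t      count;
} Queue;

/* A worker keeps both queues attached for its whole lifetime. */
typedef struct
{
    Queue tuple_queue;
    Queue error_queue;
} WorkerChannels;

/* Append a payload; returns 1 on success, 0 if the queue is full. */
static int
queue_push(Queue *q, const char *payload)
{
    if (q->count >= QUEUE_CAPACITY)
        return 0;
    q->items[q->count++] = payload;
    return 1;
}

/*
 * Route by message kind: tuples go to the tuple queue, errors and notices
 * go to the error queue.  No global "active queue" is consulted or changed.
 */
static int
worker_send(WorkerChannels *ch, MsgKind kind, const char *payload)
{
    assert(ch != NULL);
    Queue *dest = (kind == MSG_TUPLE) ? &ch->tuple_queue : &ch->error_queue;
    return queue_push(dest, payload);
}
```

Because the routing decision lives in one place, neither the error path nor
the tuple path needs to save and restore anything.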

> 3. As per current implementation of Parallel_seqscan, it needs to use
> some information from parallel.c which was not exposed, so I have
> exposed the same by moving it to parallel.h.  Information that is required
> is as follows:
> ParallelWorkerNumber, FixedParallelState and shm keys -
>     This is used to decide the blocks that need to be scanned.
>     We might change it in future the way parallel scan/work distribution
>     is done, but I don't see any harm in exposing this information.

Hmm.  I can see why ParallelWorkerNumber might need to be exposed, but
the other stuff seems like it shouldn't be.
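
For illustration only, here is one way a worker could work out its share of
blocks from nothing but its own number plus values handed to it as
arguments.  The contiguous-range split, the struct, and the function name
are assumptions made for this sketch; the patch's actual distribution
scheme may differ.

```c
#include <assert.h>

/* Half-open range of block numbers: [start, end). */
typedef struct
{
    unsigned start;
    unsigned end;
} BlockRange;

/*
 * Split nblocks into nworkers contiguous ranges whose sizes differ by at
 * most one block, and return the range for worker_number (0-based).  The
 * first (nblocks % nworkers) workers each take one extra block.
 */
static BlockRange
worker_block_range(unsigned nblocks, int worker_number, int nworkers)
{
    assert(nworkers > 0 && worker_number >= 0 && worker_number < nworkers);

    unsigned   n = (unsigned) nworkers;
    unsigned   w = (unsigned) worker_number;
    unsigned   base = nblocks / n;
    unsigned   extra = nblocks % n;
    BlockRange r;

    r.start = w * base + (w < extra ? w : extra);
    r.end = r.start + base + (w < extra ? 1 : 0);
    return r;
}
```

Nothing in this function needs FixedParallelState or the shm keys; the
block count and worker count arrive as ordinary parameters.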

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


