Re: Parallel Seq Scan

From	Amit Kapila
Subject	Re: Parallel Seq Scan
Date	7 July 2015, 03:58:06
Msg-id	CAA4eK1KEX4qtoSmWiw4kxpsY5nMa3gSOngAdHzvSYm311FQ4eg@mail.gmail.com
In reply to	Re: Parallel Seq Scan (Jeff Davis <pgsql@j-davis.com>)
Responses	Re: Parallel Seq Scan
List	pgsql-hackers

On Mon, Jul 6, 2015 at 10:54 PM, Jeff Davis <pgsql@j-davis.com> wrote:
>
> On Mon, 2015-07-06 at 10:37 +0530, Amit Kapila wrote:
>
> > Or the other way to look at it could be separate out fields which are
> > required for parallel scan which is done currently by forming a
> > separate structure ParallelHeapScanDescData.
> >
> I was suggesting that you separate out both the normal scan fields and
> the partial scan fields, that way we're sure that rs_nblocks is not
> accessed during a parallel scan.
>

In the patch, rs_nblocks is used in partial scans as well; only the
way it is initialized has changed.

> Or, you could try wrapping the parts of heapam.c that are affected by
> parallelism into new static functions.
>

Sounds sensible to me, but I would like to hear from Robert before
making this change, in case he has a different opinion on this point,
as he originally wrote this part of the patch.

> > The reason why partial scan can't be mixed with sync scan is that in
> > parallel scan, it performs the scan of heap by synchronizing blocks
> > (each parallel worker scans a block and then asks for a next block to
> > scan) among parallel workers. Now if we try to make sync scans work
> > along with it, the synchronization among parallel workers will go for
> > a toss. It might not be impossible to make that work in some way, but
> > not sure if it is important enough for sync scans to work along with
> > parallel scan.
>
> I haven't tested it, but I think it would still be helpful. The block
> accesses are still in order even during a partial scan, so why wouldn't
> it help?
>
> You might be concerned about the reporting of a block location, which
> would become more noisy with increased parallelism. But in my original
> testing, sync scans weren't very sensitive to slight deviations, because
> of caching effects.
>

I am not sure how many blocks of deviation could be considered okay.

In theory, making parallel scan participate in sync scan could lead to
a difference of multiple blocks. Consider the case where 32 or more
workers participate in the scan and each gets one block to scan: it is
possible that the first worker scans the 1st block only after the 32nd
worker has scanned the 32nd block (and the difference could be even
bigger).
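
As a rough illustration of the scenario above, here is a toy model in
Python. It is not PostgreSQL code: the function name, the shared counter,
and the random choice of which worker finishes next are all assumptions
made for the sketch. Blocks are handed out in order, but the order in
which they are actually scanned (and would be reported as the sync scan
position) can trail the newest handed-out block by many blocks.

```python
import random


def simulate_parallel_scan(nblocks, nworkers, seed=0):
    """Toy model (hypothetical, not heapam.c): workers claim blocks in
    order from a shared counter but finish scanning them in random order.

    Returns (completed, max_lag) where `completed` is the order in which
    blocks were scanned and `max_lag` is the largest gap seen between a
    block being reported and the newest block already handed out.
    """
    rng = random.Random(seed)
    next_block = 0   # shared "next block to hand out" counter
    held = []        # block currently held by each busy worker

    # Initial handout: each worker gets one block.
    while len(held) < nworkers and next_block < nblocks:
        held.append(next_block)
        next_block += 1

    completed = []
    max_lag = 0
    while held:
        i = rng.randrange(len(held))   # some worker finishes its block
        block = held[i]
        completed.append(block)
        max_lag = max(max_lag, (next_block - 1) - block)
        if next_block < nblocks:
            held[i] = next_block       # that worker asks for the next block
            next_block += 1
        else:
            held.pop(i)                # nothing left to hand out
    return completed, max_lag
```

With a single worker the lag is always zero. With many workers a slow
worker can hold an early block while the others keep advancing, so in
this model the lag is not bounded by the number of workers.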

> > tqueue.c is mainly designed to pass tuples between parallel workers
> > and currently it is used in Funnel operator to gather the tuples
> > generated by all the parallel workers. I think we can use it for any
> > other operator which needs tuple communication among parallel workers.
>
> Some specifics of the Funnel operator seem to be a part of tqueue, which
> doesn't make sense to me. For instance, reading from the set of queues
> in a round-robin fashion is part of the Funnel algorithm, and doesn't
> seem suitable for a generic tuple communication mechanism (that would
> never allow order-sensitive reading, for example).
>

Okay, this makes sense to me. I think it is better to move the
Funnel-operator-specific parts out of tqueue.c, unless Robert or
anybody else feels otherwise.
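
To make the proposed separation concrete, here is a minimal sketch in
Python, not the actual tqueue.c or Funnel code. The class and function
names are hypothetical, and the queues are pre-filled so that "empty"
means "finished", which would not hold for a real shared-memory queue.
The queue only knows how to hand over the next tuple; the reading policy
(round-robin for a Funnel-like consumer, or an order-preserving merge for
an order-sensitive one) lives in the consumer.

```python
import heapq
from collections import deque


class TupleQueue:
    """Generic tuple channel: no knowledge of how readers combine queues.
    None is used as the 'no more tuples' marker in this toy model."""

    def __init__(self, tuples):
        self._items = deque(tuples)

    def try_receive(self):
        return self._items.popleft() if self._items else None


def funnel_round_robin(queues):
    """Funnel-style policy: take one tuple from each live queue in turn."""
    live = deque(queues)
    while live:
        q = live.popleft()
        tup = q.try_receive()
        if tup is None:
            continue          # queue exhausted: drop it from the rotation
        yield tup
        live.append(q)


def drain(q):
    """Iterate over one queue until it is exhausted."""
    while (tup := q.try_receive()) is not None:
        yield tup


def ordered_merge(queues):
    """Order-sensitive policy: merge queues that are each already sorted."""
    return heapq.merge(*(drain(q) for q in queues))
```

Both consumers use the same `TupleQueue`; only the policy differs, which
is the point of keeping round-robin reading out of the queue itself.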

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
