Re: Parallel Seq Scan

From Amit Kapila
Subject Re: Parallel Seq Scan
Date
Msg-id CAA4eK1KEX4qtoSmWiw4kxpsY5nMa3gSOngAdHzvSYm311FQ4eg@mail.gmail.com
In reply to Re: Parallel Seq Scan  (Jeff Davis <pgsql@j-davis.com>)
Responses Re: Parallel Seq Scan  (Jeff Davis <pgsql@j-davis.com>)
List pgsql-hackers
On Mon, Jul 6, 2015 at 10:54 PM, Jeff Davis <pgsql@j-davis.com> wrote:
>
> On Mon, 2015-07-06 at 10:37 +0530, Amit Kapila wrote:
>
> > Or the other way to look at it could be separate out fields which are
> > required for parallel scan which is done currently by forming a
> > separate structure ParallelHeapScanDescData.
> >
> I was suggesting that you separate out both the normal scan fields and
> the partial scan fields, that way we're sure that rs_nblocks is not
> accessed during a parallel scan.
>

In the patch, rs_nblocks is used in partial scans as well; only the
way it is initialized has changed.

> Or, you could try wrapping the parts of heapam.c that are affected by
> parallelism into new static functions.
>

Sounds sensible to me, but I would like to hear from Robert before
making this change, in case he has a different opinion on this point, as
he originally wrote this part of the patch.
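
A minimal sketch of what wrapping the parallelism-affected parts might
look like, written in Python for brevity rather than in C, and not taken
from the patch. SharedScanState, next_block and scan are hypothetical
names; the only point is that the choice of the next block sits behind
one helper, so the scan loop is the same for serial and parallel scans.

```python
import threading
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class SharedScanState:
    """Hypothetical stand-in for the state shared by parallel workers."""
    nblocks: int
    next_to_hand_out: int = 0
    lock: threading.Lock = field(default_factory=threading.Lock)


def next_block(current: Optional[int], nblocks: int,
               shared: Optional[SharedScanState] = None) -> Optional[int]:
    """Return the next block to scan, or None when the scan is done.

    This is the only place that knows whether the scan is parallel.
    """
    if shared is None:
        # Serial scan: simply advance the local position.
        candidate = 0 if current is None else current + 1
        return candidate if candidate < nblocks else None
    # Parallel scan: claim the next unclaimed block from the shared state.
    with shared.lock:
        if shared.next_to_hand_out >= shared.nblocks:
            return None
        block = shared.next_to_hand_out
        shared.next_to_hand_out += 1
        return block


def scan(nblocks: int, shared: Optional[SharedScanState] = None) -> list:
    """Scan loop that is identical for serial and parallel scans."""
    seen = []
    block = next_block(None, nblocks, shared)
    while block is not None:
        seen.append(block)
        block = next_block(block, nblocks, shared)
    return seen
```

Calling scan(4) walks blocks 0 to 3 locally, while passing a
SharedScanState makes every caller draw from the same counter.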

> > The reason why partial scan can't be mixed with sync scan is that in
> > parallel scan, it performs the scan of heap by synchronizing blocks
> > (each parallel worker scans a block and then asks for a next block to
> > scan) among parallel workers.  Now if we try to make sync scans work
> > along with it, the synchronization among parallel workers will go for
> > a toss.  It might not be impossible to make that work in some way, but
> > not sure if it is important enough for sync scans to work along with
> > parallel scan.
>
> I haven't tested it, but I think it would still be helpful. The block
> accesses are still in order even during a partial scan, so why wouldn't
> it help?
>
> You might be concerned about the reporting of a block location, which
> would become more noisy with increased parallelism. But in my original
> testing, sync scans weren't very sensitive to slight deviations, because
> of caching effects.
>

I am not sure how many blocks of deviation could be considered acceptable.
In theory, making a parallel scan take part in sync scan could lead to a
difference of multiple blocks. Consider the case where 32 or more workers
participate in the scan and each gets one block to scan: it is possible that
the first worker scans the 1st block after the 32nd worker scans the
32nd block (and the difference could be even bigger).
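
The 32-worker case can be put into numbers with a small Python sketch.
It is an illustration only, not code from the patch: hand_out_blocks and
max_backward_jump are hypothetical helpers, blocks are numbered from 0,
and it assumes each worker reports its block as the scan position once
it has scanned it.

```python
def hand_out_blocks(nworkers: int) -> dict:
    """Give each worker one block, in order: worker i gets block i."""
    return {worker: worker for worker in range(nworkers)}


def max_backward_jump(report_order: list) -> int:
    """Largest distance by which a reported position falls behind the
    highest position reported so far."""
    highest = -1
    worst = 0
    for block in report_order:
        if block < highest:
            worst = max(worst, highest - block)
        highest = max(highest, block)
    return worst


nworkers = 32
assignment = hand_out_blocks(nworkers)

# Best case: workers report in the order they were given blocks.
in_order = [assignment[w] for w in range(nworkers)]

# Case described above: the first worker reports after the 32nd worker.
first_worker_last = in_order[1:] + in_order[:1]

print(max_backward_jump(in_order))           # 0
print(max_backward_jump(first_worker_last))  # 31
```

With one block per worker the reported position can fall back by up to
nworkers - 1 blocks; if workers hold more than one outstanding block,
the gap can grow further.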


> > tqueue.c is mainly designed to pass tuples between parallel workers
> > and currently it is used in Funnel operator to gather the tuples
> > generated by all the parallel workers.  I think we can use it for any
> > other operator which needs tuple communication among parallel workers.
>
> Some specifics of the Funnel operator seem to be a part of tqueue, which
> doesn't make sense to me. For instance, reading from the set of queues
> in a round-robin fashion is part of the Funnel algorithm, and doesn't
> seem suitable for a generic tuple communication mechanism (that would
> never allow order-sensitive reading, for example).
>

Okay, this makes sense to me. I think it is better to move the parts
specific to the Funnel operator out of tqueue.c, unless Robert or
anybody else feels otherwise.
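
To illustrate the separation being discussed, here is a rough Python
sketch (not the tqueue.c API; read_round_robin and read_ordered are
hypothetical names). The queues only hold tuples, and the reading policy
lives in the consumer, so a Funnel-like reader and an order-sensitive
reader can share the same queues.

```python
import heapq
from collections import deque


def read_round_robin(queues):
    """Funnel-style policy: take one tuple from each non-empty queue in turn."""
    pending = [q for q in queues if q]
    while pending:
        for q in pending:
            yield q.popleft()
        pending = [q for q in pending if q]


def read_ordered(queues, key=None):
    """Order-sensitive policy: merge queues that are each already sorted."""
    return heapq.merge(*queues, key=key)


def make_queues():
    # Three worker queues, each internally sorted.
    return [deque([1, 2, 3]), deque([4, 5]), deque([6])]


print(list(read_round_robin(make_queues())))  # [1, 4, 6, 2, 5, 3]
print(list(read_ordered(make_queues())))      # [1, 2, 3, 4, 5, 6]
```

The round-robin reader loses the per-queue ordering across workers,
which is why it would not suit an operator that needs sorted output.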


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
