Re: Parallel Seq Scan

From Robert Haas
Subject Re: Parallel Seq Scan
Date
Msg-id CA+TgmoZxTeVEHm6p96YMsZtWr6J9dgGoaC_ZTKnzLLvfBH9QEw@mail.gmail.com
In reply to Re: Parallel Seq Scan  (Stephen Frost <sfrost@snowman.net>)
Responses Re: Parallel Seq Scan  (Stephen Frost <sfrost@snowman.net>)
List pgsql-hackers
On Fri, Jan 9, 2015 at 12:24 PM, Stephen Frost <sfrost@snowman.net> wrote:
> The parameters sound reasonable but I'm a bit worried about the way
> you're describing the implementation.  Specifically this comment:
>
> "Cost of starting up parallel workers with default value as 1000.0
> multiplied by number of workers decided for scan."
>
> That appears to imply that we'll decide on the number of workers, figure
> out the cost, and then consider "parallel" as one path and
> "not-parallel" as another.  [...]
> I'd really like to be able to set the 'max parallel' high and then have
> the optimizer figure out how many workers should actually be spawned for
> a given query.

+1.
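
To make that concrete, here is a toy sketch of letting the cost model pick the worker count instead of fixing it up front. It is not PostgreSQL's planner code. The 1000.0 per-worker startup cost is the default quoted above, and the page and tuple costs are the stock seq_page_cost and cpu_tuple_cost defaults. Everything else is an assumption made for illustration: the function names scan_cost and choose_workers, the idea that I/O cost does not shrink with more workers, and the idea that CPU work is split evenly among the workers plus the master.

```python
# Toy cost model: NOT PostgreSQL's planner, only an illustration.
PARALLEL_STARTUP_COST = 1000.0  # per-worker startup cost, the default quoted above
SEQ_PAGE_COST = 1.0             # PostgreSQL's default seq_page_cost
CPU_TUPLE_COST = 0.01           # PostgreSQL's default cpu_tuple_cost


def scan_cost(pages, tuples, workers):
    """Estimated cost of a sequential scan using `workers` extra processes.

    Assumption: I/O is not sped up by parallelism, while per-tuple CPU work
    is split evenly across the workers plus the master backend.
    """
    io_cost = pages * SEQ_PAGE_COST
    cpu_cost = tuples * CPU_TUPLE_COST / (workers + 1)
    return workers * PARALLEL_STARTUP_COST + io_cost + cpu_cost


def choose_workers(pages, tuples, max_workers):
    """Return the worker count in 0..max_workers with the lowest estimated cost."""
    return min(range(max_workers + 1), key=lambda w: scan_cost(pages, tuples, w))
```

Under these assumptions a small table gets no workers at all, because the startup cost dominates, while a large table gets as many workers as the cap allows until the startup cost outweighs the CPU savings. That is the behavior you'd want from setting 'max parallel' high and letting the optimizer decide.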

> Yeah, we also need to consider the i/o side of this, which will
> definitely be tricky.  There are i/o systems out there which are faster
> than a single CPU and ones where a single CPU can manage multiple i/o
> channels.  There are also cases where the i/o system handles sequential
> access nearly as fast as random and cases where sequential is much
> faster than random.  Where we can get an idea of that distinction is
> with seq_page_cost vs. random_page_cost as folks running on SSDs tend to
> lower random_page_cost from the default to indicate that.

On my MacOS X system, I've already seen cases where my parallel_count
module runs incredibly slowly some of the time.  I believe that this
is because having multiple workers reading the relation block-by-block
at the same time causes the OS to fail to realize that it needs to do
aggressive readahead.  I suspect we're going to need to account for
this somehow.

> Yeah, I agree that's more typical.  Robert's point that the master
> backend should participate is interesting but, as I recall, it was based
> on the idea that the master could finish faster than the worker- but if
> that's the case then we've planned it out wrong from the beginning.

So, if the workers have been started but aren't keeping up, the master
should do nothing until they produce tuples rather than participating?
That doesn't seem right.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


