Re: Parallel Seq Scan

From: Robert Haas
Subject: Re: Parallel Seq Scan
Date:
Msg-id CA+Tgmoaoj8kf6ft9O1E=T3+XCrRoKr4sWBVfoXdzFaDCH+=M+Q@mail.gmail.com
In response to: Re: Parallel Seq Scan  (John Gorman <johngorman2@gmail.com>)
List: pgsql-hackers
On Tue, Jan 13, 2015 at 6:25 AM, John Gorman <johngorman2@gmail.com> wrote:
> One approach that has worked well for me is to break big jobs into much
> smaller bite-size tasks. Each task is small enough to complete quickly.
>
> We add the tasks to a task queue and spawn a generic worker pool which eats
> through the task queue items.
>
> This solves a lot of problems.
>
> - Small to medium jobs can be parallelized efficiently.
> - No need to split big jobs perfectly.
> - We don't get into a situation where we are waiting around for a worker to
> finish chugging through a huge task while the other workers sit idle.
> - Worker memory footprint is tiny so we can afford many of them.
> - Worker pool management is a well known problem.
> - Worker spawn time disappears as a cost factor.
> - The worker pool becomes a shared resource that can be managed and reported
> on and becomes considerably more predictable.
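
For concreteness, the pattern quoted above might look roughly like the sketch below. It is only an illustration of the general idea, not PostgreSQL code: the names run_task_queue, sum_chunk and chunked_sum are hypothetical, and Python threads stand in for whatever worker mechanism a real implementation would use.

```python
import functools
import queue
import threading


def run_task_queue(tasks, num_workers=4):
    """Run zero-argument callables on a fixed pool of worker threads.

    Results are returned in completion order, not submission order.
    """
    task_queue = queue.Queue()
    results = []
    results_lock = threading.Lock()

    def worker():
        while True:
            task = task_queue.get()
            if task is None:  # sentinel: no more work for this worker
                return
            value = task()
            with results_lock:
                results.append(value)

    workers = [threading.Thread(target=worker) for _ in range(num_workers)]
    for w in workers:
        w.start()
    for t in tasks:
        task_queue.put(t)
    for _ in workers:
        task_queue.put(None)  # one sentinel per worker
    for w in workers:
        w.join()
    return results


def sum_chunk(lo, hi):
    """One small task: sum the integers in [lo, hi)."""
    return sum(range(lo, hi))


def chunked_sum(n, chunk_size=1000, num_workers=4):
    """Split one big job into many small tasks and combine the results."""
    tasks = [
        functools.partial(sum_chunk, lo, min(lo + chunk_size, n))
        for lo in range(0, n, chunk_size)
    ]
    return sum(run_task_queue(tasks, num_workers))
```

Because each task is small, a worker that finishes early simply pulls the next item from the queue, so the job does not need to be split perfectly up front.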

I think this is a good idea, but for now I would like to keep our
goals somewhat more modest: let's see if we can get parallel
sequential scan, and only parallel sequential scan, working and
committed.  Ultimately, I think we may need something like what you're
talking about, because if you have a query with three or six or twelve
different parallelizable operations in it, you want the available CPU
resources to switch between those as their respective needs may
dictate.  You certainly don't want to spawn a separate pool of workers
for each scan.

But I think getting that all working in the first version is probably
harder than what we should attempt.  We have a bunch of problems to
solve here just around parallel sequential scan and the parallel mode
infrastructure: heavyweight locking, prefetching, the cost model, and
so on.  Trying to add to that all of the problems that might attend on
a generic task queueing infrastructure fills me with no small amount
of fear.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


