On Sun, Jan 11, 2015 at 6:01 AM, Stephen Frost <sfrost@snowman.net> wrote:
> So, for my 2c, I've long expected us to parallelize at the relation-file
> level for these kinds of operations. This goes back to my other
> thoughts on how we should be thinking about parallelizing inbound data
> for bulk data loads but it seems appropriate to consider it here also.
> One of the issues there is that 1G still feels like an awful lot for a
> minimum work size for each worker and it would mean we don't parallelize
> for relations less than that size.
Yes, I think that's a killer objection.
One approach that has worked well for me is to break big jobs into much smaller, bite-sized tasks, each small enough to complete quickly.
We add the tasks to a task queue and spawn a generic worker pool which eats through the task queue items.
This solves a lot of problems.
- Small to medium jobs can be parallelized efficiently.
- No need to split big jobs perfectly.
- We avoid the situation where the other workers sit idle while one worker chugs through a huge task.
- Worker memory footprint is tiny so we can afford many of them.
- Worker pool management is a well-known problem.
- Worker spawn time disappears as a cost factor.
- The worker pool becomes a shared resource that can be managed and reported on and becomes considerably more predictable.
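For illustration, here is a minimal sketch of the pattern described above: split a big job into small chunks, put them on a queue, and let a fixed pool of workers drain it. This is generic pseudocode-in-Python, not PostgreSQL internals; the names (`run_with_worker_pool`, `process_chunk`, `chunk_size`) are my own.

```python
import queue
import threading

def run_with_worker_pool(items, chunk_size, worker_count, process_chunk):
    """Split `items` into small chunks, enqueue them, and drain the
    queue with a fixed pool of workers. Small chunks keep the workers
    evenly busy regardless of total job size."""
    tasks = queue.Queue()
    for start in range(0, len(items), chunk_size):
        tasks.put(items[start:start + chunk_size])

    results = []
    lock = threading.Lock()

    def worker():
        while True:
            try:
                chunk = tasks.get_nowait()  # workers exit once the queue drains
            except queue.Empty:
                return
            r = process_chunk(chunk)
            with lock:
                results.append(r)

    workers = [threading.Thread(target=worker) for _ in range(worker_count)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return results

# Example: 100 items in 7-item tasks across 4 workers; chunk boundaries
# don't have to divide the job evenly.
partial_sums = run_with_worker_pool(list(range(100)), 7, 4, sum)
```

Note that nothing here depends on splitting the job perfectly: a leftover short chunk just becomes one more small task on the queue.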