Re: Trouble with hashagg spill I/O pattern and costing

From	Jeff Davis
Subject	Re: Trouble with hashagg spill I/O pattern and costing
Date	25 May 2020, 21:36:42
Msg-id	978a6fa0bc00071d59ebcddf15ca72ae5aa7a15d.camel@j-davis.com
In reply to	Re: Trouble with hashagg spill I/O pattern and costing (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
Responses	Re: Trouble with hashagg spill I/O pattern and costing
List	pgsql-hackers

On Mon, 2020-05-25 at 04:10 +0200, Tomas Vondra wrote:
>      algorithm  master  prealloc  tlist  prealloc-tlist
>      --------------------------------------------------
>           hash    1365       437    368             213
>           sort     226       214    224             215
> 
> The sort row simply means "enable_hashagg = off" and AFAIK the patches
> should not have a lot of influence here - the prealloc does, but it's
> fairly negligible.

I also saw a small speedup from the prealloc patch for Sort. I wrote it
off initially, but I'm wondering if there's something going on there.
Perhaps drawing K elements from the minheap at once is better for
caching? If so, that's good news, because it means the prealloc list is
a win-win.

>                           ->  Finalize HashAggregate
>                                 Group Key: lineitem_1.l_partkey
>                                 ->  Gather
>                                       Workers Planned: 2
>                                       ->  Partial HashAggregate
>                                             Group Key: lineitem_1.l_partkey
>                                             ->  Parallel Seq Scan on lineitem lineitem_1
>      (20 rows)

Although each worker here only gets half the tuples, it will get
(approximately) all of the *groups*. This may partly explain why the
planner moves away from this plan when there are more workers: the
number of hashagg batches doesn't go down much with more workers.
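
To illustrate the point above, here is a rough simulation (not
PostgreSQL code). It assumes the group keys are uniformly distributed
and uncorrelated with the order in which tuples are handed to workers;
the function names and the tuple/group counts are made up for the
example. It counts how many distinct groups each worker sees, and
compares that with the usual expectation G * (1 - (1 - 1/G)^(N/W)).

```python
import random


def groups_seen_per_worker(n_tuples, n_groups, n_workers, seed=0):
    """Simulate a parallel scan and count distinct groups per worker.

    Assumption: keys are uniform and independent of tuple position, so
    dealing tuples round-robin is a fair stand-in for how a parallel
    scan spreads the input across workers.
    """
    rng = random.Random(seed)
    seen = [set() for _ in range(n_workers)]
    for i in range(n_tuples):
        key = rng.randrange(n_groups)
        seen[i % n_workers].add(key)
    return [len(s) for s in seen]


def expected_groups_per_worker(n_tuples, n_groups, n_workers):
    """Expected distinct groups in one worker's share of the input."""
    per_worker = n_tuples / n_workers
    return n_groups * (1.0 - (1.0 - 1.0 / n_groups) ** per_worker)


if __name__ == "__main__":
    n_tuples, n_groups = 200_000, 2_000  # hypothetical sizes
    for workers in (1, 2, 4, 8):
        counts = groups_seen_per_worker(n_tuples, n_groups, workers)
        expected = expected_groups_per_worker(n_tuples, n_groups, workers)
        print(workers, n_tuples // workers, min(counts), round(expected))
```

With many tuples per group, the per-worker tuple count shrinks with
the number of workers while the per-worker group count barely moves.
When the number of groups is close to the number of tuples, the same
formula shows the per-worker group count dropping as well.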

It also might be interesting to know the estimate for the number of
groups relative to the size of the table. If those two are close, it
might look to the planner like processing the whole input in each
worker isn't much worse than processing all of the groups in each
worker.

Regards,
    Jeff Davis
