Re: Disk-based hash aggregate's cost model
From | Tomas Vondra
---|---
Subject | Re: Disk-based hash aggregate's cost model
Date |
Msg-id | 20200904190137.scrucxyjkxyc2bmk@development
In response to | Re: Disk-based hash aggregate's cost model (Jeff Davis <pgsql@j-davis.com>)
Responses | Re: Disk-based hash aggregate's cost model
List | pgsql-hackers
On Fri, Sep 04, 2020 at 11:31:36AM -0700, Jeff Davis wrote:
>On Fri, 2020-09-04 at 14:56 +0200, Tomas Vondra wrote:
>> Those charts show that the CP_SMALL_TLIST resulted in smaller temp
>> files (per EXPLAIN ANALYZE the difference is ~25%) and also lower
>> query durations (also in the ~25% range).
>
>I was able to reproduce the problem, thank you.
>
>Only two attributes are needed, so the CP_SMALL_TLIST projected schema
>only needs a single-byte null bitmap.
>
>But if just setting the attributes to NULL rather than projecting them,
>the null bitmap size is based on all 16 attributes, bumping the bitmap
>size to two bytes.
>
>MAXALIGN(23 + 1) = 24
>MAXALIGN(23 + 2) = 32
>
>I think that explains it. It's not ideal, but projection has a cost as
>well, so I don't think we necessarily need to do something here.
>
>If we are motivated to improve this in v14, we could potentially have a
>different schema for spilled tuples, and perform real projection at
>spill time. But I don't know if that's worth the extra complexity.

Thanks for the investigation and explanation.

Wouldn't it be enough to just use a slot with a smaller tuple descriptor?
All we'd need to do is create the descriptor in ExecInitAgg after calling
find_hash_columns, use it for rslot/wslot, and then "map" the attributes
in hashagg_spill_tuple (which already almost does that, so the extra cost
should be ~0) and again when reading the spilled tuples back. So I'm not
quite buying the argument that this would make a measurable difference ...

That being said, I won't insist on fixing this in v13 - at least we know
what the issue is and we can fix it later. The costing seems like a more
serious open item. OTOH I don't think this example is particularly
extreme, and I wouldn't be surprised if we saw even worse examples in
practice - tables tend to be quite wide, and aggregation of just a few
columns seems likely.
regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services