Re: Disk-based hash aggregate's cost model

Поиск

Список

Период

Сортировка

От	Tomas Vondra
Тема	Re: Disk-based hash aggregate's cost model
Дата	1 сентября 2020 г. 09:19:03
Msg-id	20200901091903.ggabq3t3axkc2wwx@development обсуждение исходный текст
Ответ на	Re: Disk-based hash aggregate's cost model (Jeff Davis <pgsql@j-davis.com>)
Ответы	Re: Disk-based hash aggregate's cost model Re: Disk-based hash aggregate's cost model
Список	pgsql-hackers

Дерево обсуждения

On Mon, Aug 31, 2020 at 11:34:34PM -0700, Jeff Davis wrote:
>On Sun, 2020-08-30 at 17:03 +0200, Tomas Vondra wrote:
>> So I'm wondering if the hashagg is not ignoring similar non-I/O costs
>> for the spilling case. In particular, the initial section computing
>> startup_cost seems to ignore that we may need to so some of the stuff
>> repeatedly - for example we'll repeat hash lookups for spilled
>> tuples,
>> and so on.
>
>To fix that, we'd also need to change the cost of in-memory HashAgg,
>right?
>

Why? I don't think we need to change costing of in-memory HashAgg. My
assumption was we'd only tweak startup_cost for cases with spilling by
adding something like (cpu_operator_cost * npartitions * ntuples).

>> The other thing is that sort seems to be doing only about half the
>> physical I/O (as measured by iosnoop) compared to hashagg, even
>> though
>> the estimates of pages / input_bytes are exactly the same. For
>> hashagg
>> the iosnoop shows 5921MB reads and 7185MB writes, while sort only
>> does
>> 2895MB reads and 3655MB writes. Which kinda matches the observed
>> sizes
>> of temp files in the two cases, so the input_bytes for sort seems to
>> be
>> a bit overestimated.
>
>Hmm, interesting.
>

FWIW I suspect some of this difference may be due to logical vs.
physical I/O. iosnoop only tracks physical I/O sent to the device, but
maybe we do much more logical I/O and it simply does not expire from
page cache for the sort. It might behave differently for larger data
set, longer query, ...

>How reasonable is it to be making these kinds of changes to the cost
>model right now? I think your analysis is solid, but I'm worried about
>making more intrusive changes very late in the cycle.
>
>I had originally tried to limit the cost model changes to the new plans
>I am introducing -- that is, HashAgg plans expected to require disk.
>That's why I came up with a rather arbitrary penalty.
>

I don't know. I certainly understand the desire not to change things
this late. OTOH I'm worried that we'll end up receiving a lot of poor
plans post release.


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

В списке pgsql-hackers по дате отправления:

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Re: Disk-based hash aggregate's cost model