Re: Disk-based hash aggregate's cost model
От | Tomas Vondra |
---|---|
Тема | Re: Disk-based hash aggregate's cost model |
Дата | |
Msg-id | 20200901211930.csrkqdxdb5qci7jf@development обсуждение исходный текст |
Ответ на | Re: Disk-based hash aggregate's cost model (Jeff Davis <pgsql@j-davis.com>) |
Ответы |
Re: Disk-based hash aggregate's cost model
|
Список | pgsql-hackers |
On Tue, Sep 01, 2020 at 12:58:59PM -0700, Jeff Davis wrote: >On Tue, 2020-09-01 at 11:19 +0200, Tomas Vondra wrote: >> Why? I don't think we need to change costing of in-memory HashAgg. My >> assumption was we'd only tweak startup_cost for cases with spilling >> by >> adding something like (cpu_operator_cost * npartitions * ntuples). > >The code above (the in-memory case) has a clause: > > startup_cost += (cpu_operator_cost * numGroupCols) * input_tuples; > >which seems to account only for the hash calculation, because it's >multiplying by the number of grouping columns. > >Your calculation would also use cpu_operator_cost, but just for the >lookup. I'm OK with that, but it's a little inconsistent to only count >it for the tuples that spill to disk. > >But why multiply by the number of partitions? Wouldn't it be the depth? >A wide fanout will not increase the number of lookups. > Yeah, I think you're right it should be depth, not number of partitions. FWIW I don't know if this is enough to "fix" the costing, it's just something I noticed while looking at the code. >> FWIW I suspect some of this difference may be due to logical vs. >> physical I/O. iosnoop only tracks physical I/O sent to the device, >> but >> maybe we do much more logical I/O and it simply does not expire from >> page cache for the sort. It might behave differently for larger data >> set, longer query, ... > >That would suggest something like a penalty for HashAgg for being a >worse IO pattern. Or do you have another suggestion? > Possibly, yes. I think it'd be good to measure logical I/O (e.g. by adding some instrumentation to LogicalTapeSet) to see if this hypothesis is actually true. FWIW any thoughts about the different in temp size compared to CP_SMALL_TLIST? >> I don't know. I certainly understand the desire not to change things >> this late. OTOH I'm worried that we'll end up receiving a lot of poor >> plans post release. > >I was reacting mostly to changing the cost of Sort. Do you think >changes to Sort are required or did I misunderstand? > Not sure I'm following. I don't think anyone proposed changing costing for Sort. Or did I miss something? regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
В списке pgsql-hackers по дате отправления: