Re: Disk-based hash aggregate's cost model
От | Tomas Vondra |
---|---|
Тема | Re: Disk-based hash aggregate's cost model |
Дата | |
Msg-id | 20200901091903.ggabq3t3axkc2wwx@development обсуждение исходный текст |
Ответ на | Re: Disk-based hash aggregate's cost model (Jeff Davis <pgsql@j-davis.com>) |
Ответы |
Re: Disk-based hash aggregate's cost model
Re: Disk-based hash aggregate's cost model |
Список | pgsql-hackers |
On Mon, Aug 31, 2020 at 11:34:34PM -0700, Jeff Davis wrote: >On Sun, 2020-08-30 at 17:03 +0200, Tomas Vondra wrote: >> So I'm wondering if the hashagg is not ignoring similar non-I/O costs >> for the spilling case. In particular, the initial section computing >> startup_cost seems to ignore that we may need to so some of the stuff >> repeatedly - for example we'll repeat hash lookups for spilled >> tuples, >> and so on. > >To fix that, we'd also need to change the cost of in-memory HashAgg, >right? > Why? I don't think we need to change costing of in-memory HashAgg. My assumption was we'd only tweak startup_cost for cases with spilling by adding something like (cpu_operator_cost * npartitions * ntuples). >> The other thing is that sort seems to be doing only about half the >> physical I/O (as measured by iosnoop) compared to hashagg, even >> though >> the estimates of pages / input_bytes are exactly the same. For >> hashagg >> the iosnoop shows 5921MB reads and 7185MB writes, while sort only >> does >> 2895MB reads and 3655MB writes. Which kinda matches the observed >> sizes >> of temp files in the two cases, so the input_bytes for sort seems to >> be >> a bit overestimated. > >Hmm, interesting. > FWIW I suspect some of this difference may be due to logical vs. physical I/O. iosnoop only tracks physical I/O sent to the device, but maybe we do much more logical I/O and it simply does not expire from page cache for the sort. It might behave differently for larger data set, longer query, ... >How reasonable is it to be making these kinds of changes to the cost >model right now? I think your analysis is solid, but I'm worried about >making more intrusive changes very late in the cycle. > >I had originally tried to limit the cost model changes to the new plans >I am introducing -- that is, HashAgg plans expected to require disk. >That's why I came up with a rather arbitrary penalty. > I don't know. I certainly understand the desire not to change things this late. OTOH I'm worried that we'll end up receiving a lot of poor plans post release. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
В списке pgsql-hackers по дате отправления: