Re: Default setting for enable_hashagg_disk

Поиск
Список
Период
Сортировка
От Tomas Vondra
Тема Re: Default setting for enable_hashagg_disk
Дата
Msg-id 20200629213319.ldfbi6wiyh6idf6e@development
обсуждение исходный текст
Ответ на Re: Default setting for enable_hashagg_disk  (Bruce Momjian <bruce@momjian.us>)
Список pgsql-hackers
On Mon, Jun 29, 2020 at 01:31:40PM -0400, Bruce Momjian wrote:
>On Mon, Jun 29, 2020 at 10:20:14AM -0700, Peter Geoghegan wrote:
>> I have no reason to believe that the planner is any more or any less
>> likely to conclude that the hash table will fit in memory in v13 as
>> things stand (I don't know if the BufFile issue matters).
>>
>> In general, grouping estimates probably aren't very good compared to
>> join estimates. I imagine that in either v12 or v13 the planner is
>> likely to incorrectly believe that it'll all fit in memory fairly
>> often. v12 was much too permissive about what could happen. But v13 is
>> too conservative.
>
>FYI, we have improved planner statistics estimates for years, which must
>have affected node spill behavior on many node types (except hash_agg),
>and don't remember any complaints about it.
>

I think misestimates for GROUP BY are quite common and very hard to fix.
Firstly, our ndistinct estimator may give pretty bad results depending
e.g. on how is the table correlated.

I've been running some TPC-H benchmarks, and for partsupp.ps_partkey our
estimate was 4338776, when the actual value is 15000000, i.e. ~3.5x
higher. This was with statistics target increased to 1000. I can easily
imagine even worse estimates with lower values.

This ndistinct estimator is used even for extended statistics, so that
can't quite save us. Moreover, the grouping may be on top of a join, in
which case using ndistinct coefficients may not be possible :-(

So I think this is a quite real problem ...


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Tomas Vondra
Дата:
Сообщение: Re: Default setting for enable_hashagg_disk
Следующее
От: Peter Geoghegan
Дата:
Сообщение: Re: Default setting for enable_hashagg_disk