Re: Default setting for enable_hashagg_disk
От | Tomas Vondra |
---|---|
Тема | Re: Default setting for enable_hashagg_disk |
Дата | |
Msg-id | 20200726000509.vtl4p7vzazdr2dif@development обсуждение исходный текст |
Ответ на | Re: Default setting for enable_hashagg_disk (Peter Geoghegan <pg@bowt.ie>) |
Ответы |
Re: Default setting for enable_hashagg_disk
|
Список | pgsql-hackers |
On Sat, Jul 25, 2020 at 10:07:37AM -0700, Peter Geoghegan wrote: >On Sat, Jul 25, 2020 at 9:39 AM Peter Geoghegan <pg@bowt.ie> wrote: >> "Peak Memory Usage: 1605334kB" >> >> Hash agg avoids spilling entirely (so the planner gets it right this >> time around). It even uses notably less memory. > >I guess that this is because the reported memory usage doesn't reflect >the space used for transition state, which is presumably most of the >total -- array_agg() is used in the query. > I'm not sure what you mean by "reported memory usage doesn't reflect the space used for transition state"? Surely it does include that, we've built the memory accounting stuff pretty much exactly to do that. I think it's pretty clear what's happening - in the sorted case there's only a single group getting new values at any moment, so when we decide to spill we'll only add rows to that group and everything else will be spilled to disk. In the unsorted case however we manage to initialize all groups in the hash table, but at that point the groups are tiny an fit into work_mem. As we process more and more data the groups grow, but we can't evict them - at the moment we don't have that capability. So we end up processing everything in memory, but significantly exceeding work_mem. FWIW all my tests are done on the same TPC-H data set clustered by l_shipdate (so probably random with respect to other columns). regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
В списке pgsql-hackers по дате отправления: