Re: Default setting for enable_hashagg_disk
From | Tomas Vondra
---|---
Subject | Re: Default setting for enable_hashagg_disk
Date |
Msg-id | 20200724131852.nul7wihayw4uadp7@development
In reply to | Re: Default setting for enable_hashagg_disk (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
List | pgsql-hackers
On Fri, Jul 24, 2020 at 10:40:47AM +0200, Tomas Vondra wrote:
>On Thu, Jul 23, 2020 at 07:33:45PM -0700, Peter Geoghegan wrote:
>>On Thu, Jul 23, 2020 at 6:22 PM Tomas Vondra
>><tomas.vondra@2ndquadrant.com> wrote:
>>>So let me share some fresh I/O statistics collected on the current code
>>>using iosnoop. I've done the tests on two different machines using the
>>>"aggregate part" of TPC-H Q17, i.e. essentially this:
>>>
>>>    SELECT * FROM (
>>>        SELECT
>>>            l_partkey AS agg_partkey,
>>>            0.2 * avg(l_quantity) AS avg_quantity
>>>        FROM lineitem GROUP BY l_partkey OFFSET 1000000000
>>>    ) part_agg;
>>>
>>>The OFFSET is there just to ensure we don't need to send anything to
>>>the client, etc.
>>
>>Thanks for testing this.
>>
>>>So sort writes ~3.4GB of data, give or take. But hashagg/master writes
>>>almost 6-7GB of data, i.e. almost twice as much. Meanwhile, with the
>>>original CP_SMALL_TLIST we'd write "only" ~5GB of data. That's still
>>>much more than the 3.4GB of data written by sort (which has to spill
>>>everything, while hashagg only spills rows not covered by the groups
>>>that fit into work_mem).
>>
>>What I find when I run your query (with my own TPC-H DB that is
>>smaller than what you used here -- 59,986,052 lineitem tuples) is that
>>the sort required about 7x more memory than the hash agg to do
>>everything in memory: 4,384,711KB for the quicksort vs 630,801KB peak
>>hash agg memory usage. I'd be surprised if the ratio was very
>>different for you -- but can you check?
>>
>
>I can check, but it's not quite clear to me what we are looking for.
>Increase work_mem until there's no need to spill in either case?
>

FWIW the hashagg needs about 4775953kB and the sort 33677586kB. So
yeah, that's about 7x more. I think that's probably built into the
TPC-H data set. It'd be easy to construct cases with much higher/lower
factors.

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
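
For reference, a sketch of how the two peak-memory figures above can be
reproduced with EXPLAIN ANALYZE. The work_mem values are illustrative
guesses (they only need to exceed the respective in-memory requirements
quoted in the mail, ~4.6GB for hashagg and ~32GB for sort), setting
work_mem this high assumes a 64-bit build, and disabling enable_hashagg
is just one convenient way to get the sort-based plan on the same data:

    -- hash aggregation: ~4.6GB needed, so 8GB should avoid spilling
    SET work_mem = '8GB';
    SET enable_hashagg = on;
    EXPLAIN (ANALYZE, COSTS OFF)
    SELECT * FROM (
        SELECT
            l_partkey AS agg_partkey,
            0.2 * avg(l_quantity) AS avg_quantity
        FROM lineitem GROUP BY l_partkey OFFSET 1000000000
    ) part_agg;
    -- the HashAggregate node reports its memory usage (plus disk
    -- usage if it still had to spill)

    -- sort-based aggregation: ~32GB needed, so 64GB should suffice
    SET work_mem = '64GB';
    SET enable_hashagg = off;
    EXPLAIN (ANALYZE, COSTS OFF)
    SELECT * FROM (
        SELECT
            l_partkey AS agg_partkey,
            0.2 * avg(l_quantity) AS avg_quantity
        FROM lineitem GROUP BY l_partkey OFFSET 1000000000
    ) part_agg;
    -- once nothing spills, the Sort node reports
    -- "Sort Method: quicksort  Memory: ...kB"

Increasing work_mem until neither plan spills, as suggested in the
quoted exchange, makes the two reported numbers directly comparable.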