On Sat, Aug 2, 2014 at 1:40 PM, Jeff Davis <pgsql@j-davis.com> wrote:
> This is a prerequisite for memory-bounded HashAgg, which I intend to
> submit for the next CF.
FWIW, I think that's a good project. A large number of these TPC-H
queries used HashAggs when I checked, on a moderate sized sample TPC-H
database:
http://examples.citusdata.com/tpch_queries.html
(I found these queries at random from Googling, but happened to have a
~2GB TPC-H database on my laptop). I attach EXPLAIN ANALYZE ouput for
each, as shown on master. From this admittedly unscientific random
sampling, 5 out of 8 query plans have a hash aggregate node. TPC-H is
a benchmark that Postgres does not tend to do too well on [1], and I
suspect that this has something to do with it; lower work_mem settings
will spook the optimizer into using a group aggregate within
choose_hashed_grouping(). Of course, in order to get the benefit of
your patch, that will need to be adjusted. I think that part is
surprisingly straightforward, though.
[1] https://wiki.postgresql.org/wiki/TPC-H
--
Peter Geoghegan