On Sat, 2020-07-25 at 11:05 -0700, Peter Geoghegan wrote:
> What worries me a bit is the sharp discontinuities when spilling with
> significantly less work_mem than the "optimal" amount. For example,
> with Tomas' TPC-H query (against my smaller TPC-H dataset), I find
> that setting work_mem to 6MB looks like this:
...
> Planned Partitions: 128  Peak Memory Usage: 6161kB  Disk Usage: 2478080kB  HashAgg Batches: 128
...
> Planned Partitions: 128  Peak Memory Usage: 5393kB  Disk Usage: 2482152kB  HashAgg Batches: 11456
...
> My guess is that this is because the recursive hash aggregation
> misbehaves in a self-similar fashion once a certain tipping point
> has been reached.
It looks like it might be fairly easy to use HyperLogLog as an
estimator for the recursive step. That should reduce the
overpartitioning, which I believe is the cause of this discontinuity.
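Roughly what I have in mind (a sketch only -- the spill-state struct
and helper names below are made up for illustration; the HLL calls are
the existing API from src/include/lib/hyperloglog.h): feed each
spilled tuple's group hash into a per-partition HLL, then size the
next recursion level's partition count from the estimate instead of
assuming the worst case:

#include "postgres.h"

#include <math.h>

#include "lib/hyperloglog.h"

/* Hypothetical per-partition spill state; only hll_card matters here. */
typedef struct SpillPartitionSketch
{
	hyperLogLogState hll_card;	/* estimate of distinct groups spilled */
} SpillPartitionSketch;

static void
spill_sketch_init(SpillPartitionSketch *spill)
{
	/* 5-bit bucket width => 32 registers, a few dozen bytes per partition */
	initHyperLogLog(&spill->hll_card, 5);
}

/* Call for every tuple written to this spill partition. */
static void
spill_sketch_add(SpillPartitionSketch *spill, uint32 hashvalue)
{
	addHyperLogLog(&spill->hll_card, hashvalue);
}

/*
 * When recursing into this spill file, choose the partition count from
 * the HLL estimate of distinct groups rather than the worst case, so a
 * partition that holds few groups is not split 128 ways again.
 */
static int
spill_sketch_npartitions(SpillPartitionSketch *spill, double groups_per_batch)
{
	double		ngroups = estimateHyperLogLog(&spill->hll_card);
	int			npartitions = (int) ceil(ngroups / groups_per_batch);

	return Max(npartitions, 1);
}

The added state is tiny at that bucket width, so it shouldn't
meaningfully cut into work_mem.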
It's not clear to me that overpartitioning is a real problem in this
case -- but I think the fact that it's causing confusion is enough
reason to see if we can fix it.
Regards,
Jeff Davis