Обсуждение: Hash aggregates blowing out memory
I've been having problems where a HashAggregate is used because of a bad estimate of the distinct number of elements involved. In the following example the total number of domain IDs is about 2/3 of the number of rows, and it's estimated at about 1/15 of the actual value. This will occasionally cause the generated query to use a HashAggregate, and this runs the backend out of memory - it will use 700 or more meg before failing. The following was run -immediately- after a vacuum. explain analyze select sum(count) as sumc,class,domain_id into temp new_clicks from clicks,countries where date > (current_date - 20) and clicks.country_id=countries.country_id group by domain_id,class; GroupAggregate (cost=1136261.89..1183383.51 rows=191406 width=12) (actual time=138375.935..163794.452 rows=3258152 loops=1) -> Sort (cost=1136261.89..1147922.66 rows=4664311 width=12) (actual time=138374.865..147308.343 rows=4514313 loops=1) Sort Key: clicks.domain_id, countries."class" -> Hash Join (cost=4.72..421864.06 rows=4664311 width=12) (actual time=6837.405..66938.259 rows=4514313 loops=1) Hash Cond: ("outer".country_id = "inner".country_id) -> Seq Scan on clicks (cost=0.00..351894.67 rows=4664311 width=12) (actual time=6836.388..46865.490 rows=4514313 loops=1) Filter: (date > (('now'::text)::date - 20)) -> Hash (cost=4.18..4.18 rows=218 width=8) (actual time=0.946..0.946 rows=0 loops=1) -> Seq Scan on countries (cost=0.00..4.18 rows=218 width=8) (actual time=0.011..0.516 rows=218 loops=1) Total runtime: 175404.738 ms (10 rows) -- Mike Harding <mvh@ix.netcom.com>
Mike Harding <mvh@ix.netcom.com> writes: > I've been having problems where a HashAggregate is used because of a bad > estimate of the distinct number of elements involved. If you're desperate, there's always enable_hashagg. Or reduce sort_mem enough so that even the misestimate looks like it will exceed sort_mem. In the long run it would be nice if HashAgg could spill to disk. We were expecting to see a contribution of code along that line last year (from the CMU/Berkeley database class) but it never showed up. The performance implications might be a bit grim anyway :-( regards, tom lane
Any way to adjust n_distinct to be more accurate? I don't think a 'disk spill' would be that bad, if you could re-sort the hash in place. If nothing else, if it could -fail- when it reaches the lower stratosphere, and re-start, it's faster than getting no result at all... sort of an auto disable of the hashagg. On Fri, 2005-02-25 at 16:55 -0500, Tom Lane wrote: > Mike Harding <mvh@ix.netcom.com> writes: > > I've been having problems where a HashAggregate is used because of a bad > > estimate of the distinct number of elements involved. > > If you're desperate, there's always enable_hashagg. Or reduce sort_mem > enough so that even the misestimate looks like it will exceed sort_mem. > > In the long run it would be nice if HashAgg could spill to disk. We > were expecting to see a contribution of code along that line last year > (from the CMU/Berkeley database class) but it never showed up. The > performance implications might be a bit grim anyway :-( > > regards, tom lane -- Mike Harding <mvh@ix.netcom.com>
Mike Harding <mvh@ix.netcom.com> writes: > Any way to adjust n_distinct to be more accurate? You could try increasing the statistics target for the relevant columns. What does pg_stats show for the "numdistinct" estimates of the columns you are grouping over, and does that have anything to do with reality? regards, tom lane
Mike Harding <mvh@ix.netcom.com> writes: > The following was run -immediately- after a vacuum. You realize "vacuum" doesn't update the statistics, right? You have to do "analyze" or "vacuum analyze" for that. -- greg
Sorry, I should have said 'vacuum analyze verbose'... On Sat, 2005-02-26 at 00:45 -0500, Greg Stark wrote: > Mike Harding <mvh@ix.netcom.com> writes: > > > The following was run -immediately- after a vacuum. > > You realize "vacuum" doesn't update the statistics, right? > You have to do "analyze" or "vacuum analyze" for that. > > -- Mike Harding <mvh@ix.netcom.com>