Discussion: Hash aggregates blowing out memory


Hash aggregates blowing out memory

From: Mike Harding
Date:
I've been having problems where a HashAggregate is used because of a bad
estimate of the distinct number of elements involved.  In the following
example the total number of domain IDs is about 2/3 of the number of
rows, and it's estimated at about 1/15 of the actual value.  This will
occasionally cause the planner to choose a HashAggregate, and this runs the
backend out of memory - it will use 700 MB or more before failing.

The following was run -immediately- after a vacuum.

explain analyze select sum(count) as sumc,class,domain_id into temp
new_clicks from clicks,countries where date > (current_date - 20) and
clicks.country_id=countries.country_id group by domain_id,class;

 GroupAggregate  (cost=1136261.89..1183383.51 rows=191406 width=12) (actual time=138375.935..163794.452 rows=3258152 loops=1)
   ->  Sort  (cost=1136261.89..1147922.66 rows=4664311 width=12) (actual time=138374.865..147308.343 rows=4514313 loops=1)
         Sort Key: clicks.domain_id, countries."class"
         ->  Hash Join  (cost=4.72..421864.06 rows=4664311 width=12) (actual time=6837.405..66938.259 rows=4514313 loops=1)
               Hash Cond: ("outer".country_id = "inner".country_id)
               ->  Seq Scan on clicks  (cost=0.00..351894.67 rows=4664311 width=12) (actual time=6836.388..46865.490 rows=4514313 loops=1)
                     Filter: (date > (('now'::text)::date - 20))
               ->  Hash  (cost=4.18..4.18 rows=218 width=8) (actual time=0.946..0.946 rows=0 loops=1)
                     ->  Seq Scan on countries  (cost=0.00..4.18 rows=218 width=8) (actual time=0.011..0.516 rows=218 loops=1)
 Total runtime: 175404.738 ms
(10 rows)
--
Mike Harding <mvh@ix.netcom.com>


Re: Hash aggregates blowing out memory

From: Tom Lane
Date: Fri, 2005-02-25 16:55 -0500
Mike Harding <mvh@ix.netcom.com> writes:
> I've been having problems where a HashAggregate is used because of a bad
> estimate of the distinct number of elements involved.

If you're desperate, there's always enable_hashagg.  Or reduce sort_mem
enough so that even the misestimate looks like it will exceed sort_mem.
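
For example, either knob can be changed just for the current session before
re-running the query (a sketch; the 8192 kB value is only illustrative):

    set enable_hashagg = off;   -- planner falls back to sort + GroupAggregate
    -- or shrink sort_mem so even the misestimated hash table looks
    -- too big to build in memory (value is in kB):
    set sort_mem = 8192;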

In the long run it would be nice if HashAgg could spill to disk.  We
were expecting to see a contribution of code along that line last year
(from the CMU/Berkeley database class) but it never showed up.  The
performance implications might be a bit grim anyway :-(

            regards, tom lane

Re: Hash aggregates blowing out memory

From: Mike Harding
Date:
Any way to adjust n_distinct to be more accurate?

I don't think a 'disk spill' would be that bad, if you could re-sort the
hash in place.  If nothing else, if the HashAggregate could -fail- once its
memory use climbs into the stratosphere, and re-start without it, that's
still faster than getting no result at all... sort of an auto-disable of
the hashagg.

On Fri, 2005-02-25 at 16:55 -0500, Tom Lane wrote:
> Mike Harding <mvh@ix.netcom.com> writes:
> > I've been having problems where a HashAggregate is used because of a bad
> > estimate of the distinct number of elements involved.
>
> If you're desperate, there's always enable_hashagg.  Or reduce sort_mem
> enough so that even the misestimate looks like it will exceed sort_mem.
>
> In the long run it would be nice if HashAgg could spill to disk.  We
> were expecting to see a contribution of code along that line last year
> (from the CMU/Berkeley database class) but it never showed up.  The
> performance implications might be a bit grim anyway :-(
>
>             regards, tom lane
--
Mike Harding <mvh@ix.netcom.com>


Re: Hash aggregates blowing out memory

From: Tom Lane
Date:
Mike Harding <mvh@ix.netcom.com> writes:
> Any way to adjust n_distinct to be more accurate?

You could try increasing the statistics target for the relevant columns.
What does pg_stats show for the n_distinct estimates of the columns
you are grouping over, and does that have anything to do with reality?
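
A sketch of both steps, using the columns from the original query (the
target of 1000 is just an illustrative value):

    -- compare the estimates with reality
    select tablename, attname, n_distinct
    from pg_stats
    where tablename in ('clicks', 'countries')
      and attname in ('domain_id', 'class');

    select count(distinct domain_id) from clicks;

    -- raise the target for the badly estimated column and re-gather stats
    alter table clicks alter column domain_id set statistics 1000;
    analyze clicks;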

            regards, tom lane

Re: Hash aggregates blowing out memory

From: Greg Stark
Date: Sat, 2005-02-26 00:45 -0500
Mike Harding <mvh@ix.netcom.com> writes:

> The following was run -immediately- after a vacuum.

You realize "vacuum" doesn't update the statistics, right?
You have to do "analyze" or "vacuum analyze" for that.
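
That is, something along these lines (a sketch using the table from the
original query):

    analyze clicks;            -- updates statistics only
    vacuum analyze clicks;     -- reclaims space and updates statistics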


--
greg

Re: Hash aggregates blowing out memory

From: Mike Harding
Date:
Sorry, I should have said 'vacuum analyze verbose'...

On Sat, 2005-02-26 at 00:45 -0500, Greg Stark wrote:
> Mike Harding <mvh@ix.netcom.com> writes:
>
> > The following was run -immediately- after a vacuum.
>
> You realize "vacuum" doesn't update the statistics, right?
> You have to do "analyze" or "vacuum analyze" for that.
>
>
--
Mike Harding <mvh@ix.netcom.com>