Re: hashagg, statistics and excessive memory allocation

From: Jim C. Nasby
Subject: Re: hashagg, statistics and excessive memory allocation
Date:
Msg-id: 20060511211912.GQ99570@pervasive.com
In reply to: hashagg, statistics and excessive memory allocation  (Stefan Kaltenbrunner <stefan@kaltenbrunner.cc>)
List: pgsql-hackers
On Thu, May 11, 2006 at 08:36:25PM +0200, Stefan Kaltenbrunner wrote:
> Hi!
> 
> on IRC somebody complained yesterday that a simple GROUP BY on a 25M-row
> integer table caused his backend to exhaust the 3GB process limit of his
> 32-bit build (on a box with 16GB RAM).
> Some testing showed that the planner was seriously underestimating the
> number of distinct rows in the table (with the default statistics target
> it estimated ~150k rows while there are about 19M distinct values) and
> choosing a hashagg for the aggregate.
> Upping the statistics target to 1000 improves the estimate to about 5M
> rows, which unfortunately is still not enough to cause the planner to
> switch to a groupagg with work_mem set to 256000.
> Some testing seems to indicate that even with perfectly matching stats,
> like the following (8.1.3 here):
> 
> foo=# create table testtable AS select a from generate_series(1,5000000)
> as a;
> SELECT
> foo=# CREATE INDEX test_idx on testtable (a);
> CREATE INDEX
> foo=# ANALYZE ;
> ANALYZE
> foo=# explain select a,count(*) from testtable group by a;
>                                QUERY PLAN
> -------------------------------------------------------------------------
>  HashAggregate  (cost=97014.73..159504.51 rows=4999182 width=4)
>    ->  Seq Scan on testtable  (cost=0.00..72018.82 rows=4999182 width=4)
> (2 rows)
> 
> the query will use about 2.5x of what work_mem is set to. While that is
> partly expected, it seems quite dangerous that even a moderate
> underestimation of the expected result count (say 2x or 4x) can run a
> server out of memory.
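
On the estimate side, a partial mitigation is to raise the per-column
statistics target and re-analyze. A minimal sketch against the testtable/a
example above; it only improves the n_distinct estimate and does not change
how much memory the hash itself eats:

-- raise the per-column statistics target (the 8.1 default is 10) and re-analyze
ALTER TABLE testtable ALTER COLUMN a SET STATISTICS 1000;
ANALYZE testtable;

-- see what the planner now believes about the number of distinct values
SELECT n_distinct FROM pg_stats
 WHERE tablename = 'testtable' AND attname = 'a';

-- re-check the chosen plan and its row estimate
EXPLAIN SELECT a, count(*) FROM testtable GROUP BY a;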

Well, the real problem is that hash operations will happily chew through
all available memory instead of either falling back to something else or
at least going to disk. I thought that was on the TODO to be fixed, but
I don't see it there. I also thought some improvement had been made
there, but maybe it's only in HEAD...?
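
Until then, a blunt session-level workaround (not a fix) is to keep the
planner off the hash plan entirely, sketched here against the same example:

SET enable_hashagg = off;   -- discourage HashAggregate for this session
EXPLAIN SELECT a, count(*) FROM testtable GROUP BY a;
-- expected shape: a GroupAggregate over a Sort (or over the index on a),
-- which stays within work_mem and spills to disk instead
RESET enable_hashagg;
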
-- 
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461

