hashagg, statistisics and excessive memory allocation

Поиск
Список
Период
Сортировка
От Stefan Kaltenbrunner
Тема hashagg, statistisics and excessive memory allocation
Дата
Msg-id 44638429.8000307@kaltenbrunner.cc
обсуждение исходный текст
Ответы Re: hashagg, statistisics and excessive memory allocation  (Josh Berkus <josh@agliodbs.com>)
Re: hashagg, statistisics and excessive memory allocation  ("Jim C. Nasby" <jnasby@pervasive.com>)
Список pgsql-hackers
Hi!

on irc somebody complained yesterday that a simple group by on a 25M
integer row caused his backend to exhaust the 3GB process limit on his
32bit built(one a box with 16GB Ram).
Some testing showed that the planner was seriously underestimating the
number of distinct rows in the table (with the default statistic target
it estimated ~150k rows while there are about 19M distinct values) and
chosing a hashagg for the aggregate.
uping the statistics target to 1000 improves the estimate to about 5M
rows which unfortunably is still not enough to cause the planner to
switch to a groupagg with work_mem set to 256000.
Some testing seems to indicate that even with perfectly matching stats
like(8.1.3 here):

foo=# create table testtable AS select a from generate_series(1,5000000)
as a;
SELECT
foo=# CREATE INDEX test_idx on testtable (a);
CREATE INDEX
foo=# ANALYZE ;
ANALYZE
foo=# explain select a,count(*) from testtable group by a;                              QUERY PLAN
-------------------------------------------------------------------------HashAggregate  (cost=97014.73..159504.51
rows=4999182width=4)  ->  Seq Scan on testtable  (cost=0.00..72018.82 rows=4999182 width=4)
 
(2 rows)

will use about 2,5x of what work_mem is set too, while that is partly
expected it seems quite dangerous that one can even with only moderate
underestimation of the expected resultcount(say 2x or 4x) run a server
out of memory.


Stefan


В списке pgsql-hackers по дате отправления:

Предыдущее
От: Martijn van Oosterhout
Дата:
Сообщение: Re: [PERFORM] Big IN() clauses etc : feature proposal
Следующее
От: Josh Berkus
Дата:
Сообщение: Re: hashagg, statistisics and excessive memory allocation