Re: Performance Issues with count()
From: Martijn van Oosterhout
Subject: Re: Performance Issues with count()
Date:
Msg-id: 20020424095337.A17391@svana.org
In reply to: Re: Performance Issues with count() (Jan Wieck <janwieck@yahoo.com>)
List: pgsql-general
On Tue, Apr 23, 2002 at 07:25:34PM -0400, Jan Wieck wrote:
> In particular, the Perl script uses a hash table with one
> entry for each group. Now what happens if the input data is
> spread out and contains 1G groups? It'll simply blow away
> your script because it runs out of memory. This behaviour is
> unacceptable for a database system, so as you see in the
> Explain output, PostgreSQL sorts and groups the input data in
> temporary files before counting the rows. Due to that, the
> PostgreSQL solution to the problem requires a gazillion of IO
> operations, but it'll work whatever the input data is, given
> that there is enough disk space.

I've always had a little problem with this setup. Sure, if you end up
with a billion groups then you have a problem, but if you are grouping
into only a small number of groups, the sort takes forever compared to
using a hash.

Currently the planner believes that the output of a grouping will be
1/10th of the input, but I think the square root would be a better
estimate. (For a 2-million-row input, 1/10th predicts 200,000 groups;
the square root predicts about 1,400.) Here we regularly summarise
2 million rows into 6 groups, and it's almost faster to do 6 sequential
scans than it is to do the sort/group.

Besides, the running-out-of-memory argument is silly: if sorting can
move out to disk when it gets too big, you can do the same with a hash
table.
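For illustration, the hash approach is just a single pass with one
table entry per group. Here is a minimal sketch in Python rather than
Perl (the names are made up, and this is not PostgreSQL code):

from collections import defaultdict

def hash_group_count(rows, key):
    # One entry per distinct group; memory grows with the number of
    # groups, not with the number of rows.
    counts = defaultdict(int)
    for row in rows:
        counts[key(row)] += 1
    return dict(counts)

# 2 million rows collapsing into 6 groups: the table stays tiny,
# and there is no sort at all.
rows = ({"region": i % 6} for i in range(2_000_000))
print(hash_group_count(rows, key=lambda r: r["region"]))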
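And spilling a hash table when it outgrows memory is not hard either.
One way is to partition the input by hash value into temporary files
and aggregate each partition separately. The sketch below is a hedged
illustration of that idea only; the memory budget, partition count and
file format are invented, and this is not how PostgreSQL works:

import pickle
import tempfile
from collections import defaultdict

def spilling_group_count(rows, key, max_groups=100_000, npart=8):
    counts = defaultdict(int)
    spill = None
    for row in rows:
        k = key(row)
        if spill is None:
            counts[k] += 1
            if len(counts) > max_groups:
                # Too many distinct groups for memory: write what we
                # have, and everything still to come, to temp files
                # partitioned by hash of the group key.
                spill = [tempfile.TemporaryFile() for _ in range(npart)]
                for gk, c in counts.items():
                    pickle.dump((gk, c), spill[hash(gk) % npart])
                counts = None
        else:
            pickle.dump((k, 1), spill[hash(k) % npart])
    if spill is None:
        return dict(counts)  # everything fit; no I/O happened
    # Partitions hold disjoint sets of groups, so each one can be
    # re-aggregated with a small in-memory table. A real system would
    # recurse if a single partition were still too large.
    result = {}
    for f in spill:
        f.seek(0)
        part = defaultdict(int)
        while True:
            try:
                gk, c = pickle.load(f)
            except EOFError:
                break
            part[gk] += c
        f.close()
        result.update(part)
    return result

With 6 groups this never touches the disk; with 1G groups it degrades
to sequential temp-file I/O, which is no worse than what the sort does.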
--
Martijn van Oosterhout <kleptog@svana.org>   http://svana.org/kleptog/
> Canada, Mexico, and Australia form the Axis of Nations That
> Are Actually Quite Nice But Secretly Have Nasty Thoughts About America