On Tue, Apr 23, 2002 at 07:25:34PM -0400, Jan Wieck wrote:
> In particular, the Perl script uses a hash table with one
> entry for each group. Now what happens if the input data is
> spread out and contains 1G groups? It'll simply blow away
> your script because it runs out of memory. This behaviour is
> unacceptable for a database system, so as you see in the
> Explain output, PostgreSQL sorts and groups the input data in
> temporary files before counting the rows. Due to that, the
> PostgreSQL solution to the problem requires a gazillion of IO
> operations, but it'll work whatever the input data is, given
> that there is enough disk space.
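
(For reference, the sort-then-count strategy described above boils down to
roughly the following. A toy Python sketch, sorting in memory rather than in
temporary files, so it only shows the shape of the plan, not what the backend
actually does:)

    # Sort the keys, then count each run of equal values.
    from itertools import groupby

    def sort_group_count(rows):
        counts = {}
        for key, run in groupby(sorted(rows)):
            counts[key] = sum(1 for _ in run)
        return counts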
I've always had a little problem with this setup. Sure, if you end up with 1
billion groups then you have a problem, but if you are aggregating into only a
small number of groups, the sort takes forever compared to using a hash.
Currently the planner believes that the output of a GROUP BY will be 1/10th of
the input rows, but I think the square root would be a better estimate.
Here we regularly summarise 2 million rows into 6 groups, and it's almost
faster to do 6 sequential scans than it is to do the sort/group.
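
To put rough numbers on that (back-of-the-envelope only, not planner code):

    rows = 2_000_000              # rows we summarise here
    tenth_rule = rows // 10       # planner's current guess: 200,000 groups
    sqrt_rule = int(rows ** 0.5)  # square-root guess: about 1,414 groups
    actual = 6                    # what we really get
    # Both overestimate, but the square root is orders of magnitude closer,
    # so a hash-based plan would look far more attractive.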
Besides, the running-out-of-memory argument is silly, because if a sort
spills to disk when it gets too big, you can do the same with a hash
table.
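
Roughly what I have in mind, as a toy sketch (Python purely for illustration;
the memory cap, partition count and file format are all made up here, not
anything the backend does):

    # count(*) per group with a hash table that spills to disk.
    import tempfile
    from collections import Counter

    MAX_GROUPS_IN_MEMORY = 100_000   # arbitrary cap standing in for work_mem
    NUM_PARTITIONS = 16              # fan-out used when we overflow

    def hash_group_count(rows):
        counts = Counter()
        rows = iter(rows)
        for key in rows:
            counts[key] += 1
            if len(counts) > MAX_GROUPS_IN_MEMORY:
                # Too many distinct groups: partition what we have plus the
                # rest of the input out to temp files, then aggregate each
                # (smaller) partition with its own in-memory hash table.
                return _spill_and_aggregate(counts, rows)
        return dict(counts)

    def _spill_and_aggregate(partial, remaining):
        # Assumes string keys without tabs/newlines, for simplicity.
        parts = [tempfile.TemporaryFile(mode="w+") for _ in range(NUM_PARTITIONS)]
        def route(key, n):
            parts[hash(key) % NUM_PARTITIONS].write(f"{key}\t{n}\n")
        for key, n in partial.items():
            route(key, n)
        for key in remaining:
            route(key, 1)
        result = {}
        for f in parts:
            f.seek(0)
            counts = Counter()
            for line in f:
                key, n = line.rstrip("\n").split("\t")
                counts[key] += int(n)
            result.update(counts)
            f.close()
        return result

Each partition only sees roughly 1/16th of the distinct keys, so its hash
table is correspondingly smaller; if a partition is still too big you can
partition it again, the same way a sort recurses over its runs.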
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/
> Canada, Mexico, and Australia form the Axis of Nations That
> Are Actually Quite Nice But Secretly Have Nasty Thoughts About America