Обсуждение: Bug or feature?

Поиск
Список
Период
Сортировка

Bug or feature?

От
Kostyuk Oleg
Дата:
pgSQL 7.2.1, table with ~590000 rows with next stuct:

user_id       integer
url           text
.................... -- other fields

i do VACUUM ANALYZE, and then have this results:


1) ----------
proxy=> EXPLAIN SELECT count(user_id) AS cnt, user_id FROM stats GROUP
BY user_id ORDER BY cnt DESC;
NOTICE:  QUERY PLAN:

Sort  (cost=94399.52..94399.52 rows=58842 width=4)
   ->  Aggregate  (cost=86383.49..89325.61 rows=58842 width=4)
         ->  Group  (cost=86383.49..87854.55 rows=588425 width=4)
               ->  Sort  (cost=86383.49..86383.49 rows=588425 width=4)
                     ->  Seq Scan on stats  (cost=0.00..13873.25
rows=588425 width=4)


2) ----------
proxy=> EXPLAIN SELECT count(url) AS cnt, user_id FROM stats GROUP BY
user_id ORDER BY cnt DESC;
NOTICE:  QUERY PLAN:

Sort  (cost=155022.05..155022.05 rows=58842 width=51)
   ->  Aggregate  (cost=146163.49..149105.61 rows=58842 width=51)
         ->  Group  (cost=146163.49..147634.55 rows=588425 width=51)
               ->  Sort  (cost=146163.49..146163.49 rows=588425 width=51)
                     ->  Seq Scan on stats  (cost=0.00..13873.25
rows=588425 width=51)

-----------end-----------

cost grow by 1.5 times!
yes, i understand - this because url column have big width...
but in count(url) width of column not important! - but used?...
or it's bug of optimizer? or it's not implemented in optimizer?

Thx.

ps : sorry for my bad english :)

--
With best wishes, DIGMA sysadmin
Oleg Kostyuk aka Cub (CUB7-RIPE)
[BSD registered user #BSD050664]


Re: Bug or feature?

От
Stephan Szabo
Дата:
On Mon, 20 May 2002, Kostyuk Oleg wrote:

> 1) ----------
> proxy=> EXPLAIN SELECT count(user_id) AS cnt, user_id FROM stats GROUP
> BY user_id ORDER BY cnt DESC;
> NOTICE:  QUERY PLAN:
>
> Sort  (cost=94399.52..94399.52 rows=58842 width=4)
>    ->  Aggregate  (cost=86383.49..89325.61 rows=58842 width=4)
>          ->  Group  (cost=86383.49..87854.55 rows=588425 width=4)
>                ->  Sort  (cost=86383.49..86383.49 rows=588425 width=4)
>                      ->  Seq Scan on stats  (cost=0.00..13873.25
> rows=588425 width=4)
>
>
> 2) ----------
> proxy=> EXPLAIN SELECT count(url) AS cnt, user_id FROM stats GROUP BY
> user_id ORDER BY cnt DESC;
> NOTICE:  QUERY PLAN:
>
> Sort  (cost=155022.05..155022.05 rows=58842 width=51)
>    ->  Aggregate  (cost=146163.49..149105.61 rows=58842 width=51)
>          ->  Group  (cost=146163.49..147634.55 rows=588425 width=51)
>                ->  Sort  (cost=146163.49..146163.49 rows=588425 width=51)
>                      ->  Seq Scan on stats  (cost=0.00..13873.25
> rows=588425 width=51)
>
> -----------end-----------
>
> cost grow by 1.5 times!

In general, I'm not sure how meaningful a comparison between explain
outputs from different queries are.

> yes, i understand - this because url column have big width...
> but in count(url) width of column not important! - but used?...

Except that it's probably brought along for the sort step.  You
do at least need to know if it's NULL or not, and there's nothing
I can't think of something that would necessarily stop you from redefining
a count aggregate that did something with that value (apart from the
fact it's probably silly).  There are optimizations with certain
aggregates that aren't taken because the aggregate system is pretty
general.