slow count in window query

Поиск
Список
Период
Сортировка
От Pavel Stehule
Тема slow count in window query
Дата
Msg-id 162867790907150318y3dccb77ep1ba6eb94742e72c8@mail.gmail.com
обсуждение исходный текст
Ответы Re: slow count in window query  (Greg Stark <gsstark@mit.edu>)
Список pgsql-hackers
Hello,

I did some test - median via window function - I found probably some
bad optimised code. I found two methods - Celko and Itzik Ben-Gan.
Ben-Gan methoud should to be faster - there is one sort less, but in
practice - it is 2 times slower.

create table x(a integer);
insert into x select (random()*10000)::int from generate_series(1,10000);

Celko method:
postgres=# explain select avg(a)                                 from (select a, row_number() over
(order by a asc) as hi,                                                         row_number()
over (order by a desc) as lo,                                    from x) s                                where hi in
(lo-1,lo+1);                                         QUERY PLAN
 
-------------------------------------------------------------------------------------------------Aggregate
(cost=2144.02..2144.03rows=1 width=4)  ->  Subquery Scan s  (cost=1643.77..2143.77 rows=100 width=4)        Filter:
((s.hi= (s.lo - 1)) OR (s.hi = (s.lo + 1)))        ->  WindowAgg  (cost=1643.77..1943.77 rows=10000 width=4)
 ->  WindowAgg  (cost=1643.77..1818.77 rows=10000 width=4)                    ->  Sort  (cost=1643.77..1668.77
rows=10000width=4)                          Sort Key: x.a                          ->  WindowAgg  (cost=804.39..979.39
 
rows=10000 width=4)                                ->  Sort  (cost=804.39..829.39
rows=10000 width=4)                                      Sort Key: x.a                                      ->  Seq
Scanon x
 
(cost=0.00..140.00 rows=10000 width=4)
(11 rows)

Ben-Gan:

postgres=# explain select avg(a)                                 from (select a, row_number() over
(order by a) as r,                                                       count(*) over () as rc
                    from x ) p                               where r in ((rc+1)/2,(rc+2)/2) ;
        QUERY PLAN
 
-------------------------------------------------------------------------------------Aggregate  (cost=1354.64..1354.65
rows=1width=4)  ->  Subquery Scan p  (cost=804.39..1354.39 rows=100 width=4)        Filter: ((p.r = ((p.rc + 1) / 2))
OR(p.r = ((p.rc + 2) / 2)))        ->  WindowAgg  (cost=804.39..1104.39 rows=10000 width=4)              ->  WindowAgg
(cost=804.39..979.39rows=10000 width=4)                    ->  Sort  (cost=804.39..829.39 rows=10000 width=4)
              Sort Key: x.a                          ->  Seq Scan on x  (cost=0.00..140.00
 
rows=10000 width=4)
(8 rows)

but
postgres=# select avg(a) from (select a, row_number() over (order by
a) as r, count(*) over () as rc from x ) p where r in
((rc+1)/2,(rc+2)/2) ;         avg
-----------------------5027.0000000000000000
(1 row)

Time: 179,310 ms

postgres=# select avg(a) from (select a, row_number() over (order by a
asc) as hi, row_number() over (order by a desc) as lo from x) s where
hi in (lo-1,lo+1);         avg
-----------------------5027.0000000000000000
(1 row)

Time: 78,791 ms

When I checked diff, I found, so the problem is count() function.

count(*) over () is very slow. - maybe so this is standard aggregate?

Regards
Pavel Stehule


В списке pgsql-hackers по дате отправления:

Предыдущее
От: Peter Eisentraut
Дата:
Сообщение: Re: Mysql.whynot or PG vs MySQL comparison table?
Следующее
От: Bernd Helmle
Дата:
Сообщение: Re: [PATCH 3/3] Document geqo_seed variable.