Missing optimization when filters are applied after window functions

Поиск

Список

Период

Сортировка

От	Volker Grabsch
Тема	Missing optimization when filters are applied after window functions
Дата	16 мая 2012 г. 04:50:37
Msg-id	20120516075017.GA31790@vlap обсуждение исходный текст
Ответы	Re: Missing optimization when filters are applied after window functions Re: Missing optimization when filters are applied after window functions
Список	pgsql-hackers

Дерево обсуждения

Dear PostgreSQL hackers,

[ please CC to me as I'm not subscribed to the list ]

I think there is a missing optimization when a filter is
applied after a window function, where the filtered field
is also used for partitioning.

Here a simplified example: Suppose we have a table
that stores 100.000 ids, and 10 random values per id:
   create table test (       id integer,       value double precision,       primary key (id, value)   );
   insert into test (id, value)       select           generate_series(1, 100000) id,           random()       from
     generate_series(1, 10);
 

We are now asking for the rank of each value per id
(it doesn't matter if it's rank() or any other window
function):
   select       id,       value,       rank() over (partition by id order by value)   from       test;

Now suppose we're interested only in the result for
id 23000. If the query above was defined in a view,
this would lead to the following query:
   select       *   from       (           select               id,               value,               rank() over
(partitionby id order by value)           from               test       ) subquery   where       id = 23000;
 

This is very slow! The "explain" command shows the reason:
   Subquery Scan on subquery  (cost=0.00..86556.80 rows=5000 width=20)      Filter: (subquery.id = 23000)       ->
WindowAgg (cost=0.00..74056.80 rows=1000000 width=12)            ->  Index Scan using test_pkey on test
(cost=0.00..56556.80rows=1000000 width=12)
 

The Filter (id = 23000) is applied after WindowAgg,
although it could certainly be applied before!

Of course we could rewrite the above query like this:
   select       id,       value,       rank() over (partition by id order by value)   from       test   where       id
=23000;
 

This would lead to a more sane plan where the filter is
applied before WindowAgg, making use of the primary key:
   WindowAgg  (cost=43.53..43.70 rows=10 width=12)     ->  Sort  (cost=43.53..43.55 rows=10 width=12)           Sort
Key:value           ->  Bitmap Heap Scan on test  (cost=4.53..43.36 rows=10 width=12)                 Recheck Cond: (id
=23000)                 ->  Bitmap Index Scan on test_pkey  (cost=0.00..4.53 rows=10 width=0)
IndexCond: (id = 23000)
 

However, the second query cannot be cleanly split into
a generic view and a specialized part anymore.

In other words, it is currently impossible to use Window
functions in a View without risking serious, unexpected
performance issues, even in simple cases!

I propose the following general optimization: If all window
functions are partitioned by the same first field (here: id),
then any filter on that field should be executed before
WindowAgg. So a query like this:
   select       ...   from       (select           ... over (partition by FIELD, other2, ...)           ... over
(partitionby FIELD, other3, ...)           ... over (partition by FIELD, ...)           ...       )   where
SOME_FILTER(FIELD)

Shoule be transformed into that:
   select       ...   from       (select           ... over (partition by FIELD, other2, ...)           ... over
(partitionby FIELD, other3, ...)           ... over (partition by FIELD, ...)           ...        where
SOME_FILTER(FIELD)      )
 

I verfied that this issue exists with PostgreSQL 9.2 beta 1
as well as the PostgreSQL 9.1 release.

Is there any work on this issue? Or am I wrong and this
optimization should not be implemented for some reason?


Regards,
Volker

-- 
Volker Grabsch
---<<(())>>---

В списке pgsql-hackers по дате отправления:

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Missing optimization when filters are applied after window functions