Обсуждение: window function count(*) and limit

Поиск
Список
Период
Сортировка

window function count(*) and limit

От
Jesper Krogh
Дата:
Hi.

I have been puzzled about the evaluation order when using window
functions and limit.

jk=# select * from testtable; id | value
----+-------  1 |     1  2 |     2  3 |     3  4 |     4  5 |     5  6 |     6  7 |     7  8 |     8  9 |     9 10 |
10
(10 rows)

jk=# select id,count(*) over () from testtable where id < 9 limit 3; id | count
----+-------  1 |     8  2 |     8  3 |     8
(3 rows)


So the first element "id" is definately picked after the "limit 3", whereas
the window function is applied before. I have been digging in the
documentation but I didnt find this case specified out.

This behaviour may be correct, but it hugely surprises me...
I expected it to either count to 3 or blow up and tell me that count(*)
wasn't a window function.

It looks like something about the type of the function where
count(*) is a "agg" and row_number() is a "window". But shouldn't
count(*) exist as a type "window" and behave accordingly?

Same goes on for min() max() and other standard aggregates.

.. postgresql 8.4.4 (but couldn't find anyting mentioned in 8.4.5/9.0 
release notes about this).

Jesper

-- 
Jesper


Re: window function count(*) and limit

От
Tom Lane
Дата:
Jesper Krogh <jesper@krogh.cc> writes:
> I have been puzzled about the evaluation order when using window
> functions and limit.

It's basically FROM -> WHERE -> window functions -> LIMIT.

> I expected it to either count to 3 or blow up and tell me that count(*)
> wasn't a window function.

Any aggregate function can be used as a window function.  It just
aggregates over the whole partition (which in this case is all 8
rows that satisfy the WHERE).

LIMIT is supposed to truncate the output at a specified number of
rows, not change what is in those rows; so ISTM that this evaluation
order is the expected one.  If you want to put a LIMIT in front of
the aggregates/window functions, put it in a sub-select.
        regards, tom lane


Re: window function count(*) and limit

От
Jesper Krogh
Дата:
On 2010-10-23 18:42, Tom Lane wrote:
> Jesper Krogh<jesper@krogh.cc>  writes:
>
>> I have been puzzled about the evaluation order when using window
>> functions and limit.
>>
> It's basically FROM ->  WHERE ->  window functions ->  LIMIT.
>
>> I expected it to either count to 3 or blow up and tell me that count(*)
>> wasn't a window function.
>>
> Any aggregate function can be used as a window function.  It just
> aggregates over the whole partition (which in this case is all 8
> rows that satisfy the WHERE).
>
Thank you for clarifying.

Testing more.. I can see that it does that in all situations, so
it is not that "confusing" anymore. I still think it is hugely
counter intuitive and attached is a documentation patch
that should clarify it a bit.

I would just assume the reverse behaviour would be way closer to
useful for everyone. (say if you want the window function to operate
over the full set, then you wouldn't specify the limit). But that
doesn't help anyone if the SQL-spec specifies it otherwise.

As a sidenote.. the queryplan for some of them seems quite "unoptimal".

# explain select id,last_value(id) over () from testtable order by id
asc limit 3;
                                               QUERY PLAN
-------------------------------------------------------------------------------------------------------
  Limit  (cost=0.00..1.30 rows=3 width=4)
    ->  WindowAgg  (cost=0.00..6117917.93 rows=14165769 width=4)
          ->  Index Scan using testtable_pkey on testtable
(cost=0.00..5940845.82 rows=14165769 width=4)

The output is essentially the 3 smallest ids and the largest one in the
table which
all can be found by both a forward and reverse scan on the primary key
index so above
is absolutely not the cheapest way to find the result.

--
Jesper


Вложения