Обсуждение: Re: [GENERAL] question (or feature-request): over ( partition by ... order by LIMIT N)

Поиск
Список
Период
Сортировка

Re: [GENERAL] question (or feature-request): over ( partition by ... order by LIMIT N)

От
David Fetter
Дата:
On Wed, Mar 24, 2010 at 06:31:59PM +0100, A. Kretschmer wrote:
> Hello @all,
> 
> I know, i can do:
> 
> select * from (select ... row_number() over (...) ...) foo where
> row_number < N
> 
> to limit the rows per group, but the inner select has to retrieve
> the whole set of records and in the outer select most of them
> discarded.

That sounds like the optimizer's falling down on the job.  Would this
be difficult to fix?

Cheers,
David.
-- 
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter      XMPP: david.fetter@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate


Re: Re: [GENERAL] question (or feature-request): over ( partition by ... order by LIMIT N)

От
Robert Haas
Дата:
On Thu, Mar 25, 2010 at 5:17 PM, David Fetter <david@fetter.org> wrote:
> On Wed, Mar 24, 2010 at 06:31:59PM +0100, A. Kretschmer wrote:
>> Hello @all,
>>
>> I know, i can do:
>>
>> select * from (select ... row_number() over (...) ...) foo where
>> row_number < N
>>
>> to limit the rows per group, but the inner select has to retrieve
>> the whole set of records and in the outer select most of them
>> discarded.
>
> That sounds like the optimizer's falling down on the job.  Would this
> be difficult to fix?

I may not be the best person to offer an opinion on this topic, but it
sounds tricky to me.  I think it would need some kind of extremely
specific special-case logic.  The planner would have to recognize
row_number() < n, row_number() <= n, and row_number = n as special
cases indicating that n-1, n, and n records respectively should be
expected to be fetched from the partition.  And you might also worry
about n > row_number(), and n >= row_number().

It might be worth doing because I suspect that is actually going to be
a fairly common type of query, but some thought needs to be given to
how to do it without resorting to abject kludgery.

...Robert


Re: Re: [GENERAL] question (or feature-request): over ( partition by ... order by LIMIT N)

От
Hitoshi Harada
Дата:
2010/3/26 David Fetter <david@fetter.org>:
> On Wed, Mar 24, 2010 at 06:31:59PM +0100, A. Kretschmer wrote:
>> Hello @all,
>>
>> I know, i can do:
>>
>> select * from (select ... row_number() over (...) ...) foo where
>> row_number < N
>>
>> to limit the rows per group, but the inner select has to retrieve
>> the whole set of records and in the outer select most of them
>> discarded.
>
> That sounds like the optimizer's falling down on the job.  Would this
> be difficult to fix?

I believe this isn't the task of window functions. In fact, "over( ...
LIMIT n)" or optimizer hack will fail on multiple window definitions.

To take top N items of each group (I agree this is quite common job),
I'd suggest syntax that is done by extending DISTINCT ON.

SELECT DISTINCT n ON(key1, key2) ...

where "n" means top "n" items on each "key1, key2" group. The current
DISTINCT ON() syntax is equivalent to DISTINCT 1 ON() in this way.
That'll be fairly easy to implement and you aren't be bothered by this
like multiple window definitions. The cons of this is that it can be
applied to only row_number logic. You may want to use rank,
dense_rank, etc. sometimes.

Regards,

--
Hitoshi Harada