Re: Window functions and index usage

Поиск
Список
Период
Сортировка
От Anssi Kääriäinen
Тема Re: Window functions and index usage
Дата
Msg-id 4E8B12F5.20308@thl.fi
обсуждение исходный текст
Ответ на Re: Window functions and index usage  (Robert Klemme <shortcutter@googlemail.com>)
Ответы Re: Window functions and index usage  (Robert Klemme <shortcutter@googlemail.com>)
Список pgsql-performance
On 10/04/2011 04:27 PM, Robert Klemme wrote:
> On Tue, Oct 4, 2011 at 11:39 AM, Anssi Kääriäinen
> <anssi.kaariainen@thl.fi>  wrote:
>> I have the following setup:
>>
>> create table test(id integer, seq integer);
>> insert into test select generate_series(0, 100), generate_series(0, 1000);
>> create unique index test_idx on test(id, seq);
>> analyze test;
>>
>> Now I try to fetch the latest 5 values per id, ordered by seq from the
>> table:
>>
>> select * from (
>> select id, seq, row_number() over (partition by id order by seq)
>>   from test
>>   where id in (1, 2, 3)
>> ) where row_number()<= 5;
> It seems this fetches the *first* 5 values per id - and not the
> latest.  For that you would need to "order by seq desc" in the window
> and probably also in the index.
>
Yeah. Sorry, wrong order. And the last line is wrong, it should be ")
tmp where row_number <= 5;".
>> This does not use the index on id, seq for sorting the data. It uses a
>> bitmap index scan, and sequential scan when issued SET enable_bitmapscan to
>> false. Tested both on git head and 8.4.8. See end of post for plans. It
>> seems it would be possible to fetch the first values per id using the index,
>> or at least skip the sorting.
> Just guessing: since row_number is an analytic function and it can be
> combined with any type of windowing only in "rare" cases do the
> ordering of index columns and the ordering in the window align.  So
> while your particular use case could benefit from this optimization
> the overall judgement might be that it's not worthwhile to make the
> optimizer more complex to cover this case - or I fail to see the more
> general pattern here. :-)
>
I think there are common use cases for this - see end of message for an
example.
>> Is there some other way to run the query so that it would use the index? Is
>> there plans to support the index usage for the above query assuming that it
>> is possible to use the index for that query?
>>
>> The real world use case would be to fetch latest 5 threads per discussion
>> forum in one query. Or fetch 3 latest comments for all patches in given
>> commit fest in single query.
> Is it really that realistic that someone wants the latest n entries
> for *all* threads / patches?  It seems since this can result in a very
> large data set this is probably not the type of query which is done
> often.
The idea is that the dataset isn't that large. And you don't have to
fetch them for all threads / patches. You might fetch them only for
patches in currently viewed commit fest. See
https://commitfest.postgresql.org/action/commitfest_view?id=12 for one
such use. What I have in mind is fetching first all the patches in the
commit fest in one go. Then issue query which would look something like:
  select * from
     (select comment_data, row_number() over (partition by patch_id
order by comment_date desc)
        from patch_comments
      where patch_id in (list of patch_ids fetched in first query)
    ) tmp where row_number <= 3;

Now you have all the data needed for the first column in the above
mentioned page.

I guess the commit fest application is fetching all the comments for the
patches in the commit fest in one query, and then doing the limit in
application code. This is fine because there aren't that many comments
per patch. But if you have dozens of forums and thousands of threads per
forum you can't do that.

This is useful in any situation where you want to show n latest entries
instead of the last entry in a list view. Latest modifications to an
object, latest commits to a file, latest messages to a discussion thread
or latest payments per project. Or 5 most popular videos per category,
10 highest paid employees per department and so on.

  - Anssi Kääriäinen

В списке pgsql-performance по дате отправления:

Предыдущее
От: Robert Klemme
Дата:
Сообщение: Re: Window functions and index usage
Следующее
От: pasman pasmański
Дата:
Сообщение: Re: : PG9.0 - Checkpoint tuning and pg_stat_bgwriter