Window functions and index usage

Поиск
Список
Период
Сортировка
От Anssi Kääriäinen
Тема Window functions and index usage
Дата
Msg-id 4E8AD46F.30003@thl.fi
обсуждение исходный текст
Ответы Re: Window functions and index usage  (Robert Klemme <shortcutter@googlemail.com>)
Re: Window functions and index usage  (Tom Lane <tgl@sss.pgh.pa.us>)
Список pgsql-performance
I have the following setup:

create table test(id integer, seq integer);
insert into test select generate_series(0, 100), generate_series(0, 1000);
create unique index test_idx on test(id, seq);
analyze test;

Now I try to fetch the latest 5 values per id, ordered by seq from the
table:

select * from (
select id, seq, row_number() over (partition by id order by seq)
   from test
  where id in (1, 2, 3)
) where row_number() <= 5;

This does not use the index on id, seq for sorting the data. It uses a
bitmap index scan, and sequential scan when issued SET enable_bitmapscan
to false. Tested both on git head and 8.4.8. See end of post for plans.
It seems it would be possible to fetch the first values per id using the
index, or at least skip the sorting.

If I emulate the behavior I want by using:
  (select id, seq from test where id = 1 order by seq limit 5)
union
  (select id, seq from test where id = 2 order by seq limit 5)
union
  (select id, seq from test where id = 2 order by seq limit 5);
I get two orders of magnitude faster execution.

Is there some other way to run the query so that it would use the index?
Is there plans to support the index usage for the above query assuming
that it is possible to use the index for that query?

The real world use case would be to fetch latest 5 threads per
discussion forum in one query. Or fetch 3 latest comments for all
patches in given commit fest in single query.

  - Anssi Kääriäinen

Normal plan (by the way, note the wildly inaccurate topmost row estimate):

                                                         QUERY PLAN

--------------------------------------------------------------------------------------------------------------------------
Subquery Scan on tmp  (cost=711.65..805.05 rows=958 width=16) (actual
time=10.543..27.469 rows=15 loops=1)
    Filter: (tmp.row_number <= 5)
    ->  WindowAgg  (cost=711.65..769.13 rows=2874 width=8) (actual
time=10.537..23.551 rows=3003 loops=1)
          ->  Sort  (cost=711.65..718.83 rows=2874 width=8) (actual
time=10.528..13.798 rows=3003 loops=1)
                Sort Key: test.id, test.seq
                Sort Method: quicksort  Memory: 182kB
                ->  Bitmap Heap Scan on test  (cost=59.04..546.55
rows=2874 width=8) (actual time=0.580..4.750 rows=3003 loops=1)
                      Recheck Cond: (id = ANY ('{1,2,3}'::integer[]))
                      ->  Bitmap Index Scan on test_idx
(cost=0.00..58.32 rows=2874 width=0) (actual time=0.490..0.490 rows=3003
loops=1)
                            Index Cond: (id = ANY ('{1,2,3}'::integer[]))
  Total runtime: 27.531 ms
(11 rows)

Plan with set enable_bitmapscan set to off:

                                                         QUERY PLAN

--------------------------------------------------------------------------------------------------------------------------
  Subquery Scan on tmp  (cost=2003.23..2096.64 rows=958 width=16)
(actual time=32.907..47.279 rows=15 loops=1)
    Filter: (tmp.row_number <= 5)
    ->  WindowAgg  (cost=2003.23..2060.71 rows=2874 width=8) (actual
time=32.898..44.053 rows=3003 loops=1)
          ->  Sort  (cost=2003.23..2010.42 rows=2874 width=8) (actual
time=32.883..36.067 rows=3003 loops=1)
                Sort Key: test.id, test.seq
                Sort Method: quicksort  Memory: 182kB
                ->  Seq Scan on test  (cost=0.00..1838.14 rows=2874
width=8) (actual time=0.017..26.733 rows=3003 loops=1)
                      Filter: (id = ANY ('{1,2,3}'::integer[]))
  Total runtime: 47.334 ms
(9 rows)

The UNION approach:

                                                               QUERY PLAN

--------------------------------------------------------------------------------------------------------------------------------------
  HashAggregate  (cost=28.47..28.62 rows=15 width=8) (actual
time=0.176..0.194 rows=15 loops=1)
    ->  Append  (cost=0.00..28.40 rows=15 width=8) (actual
time=0.026..0.149 rows=15 loops=1)
          ->  Limit  (cost=0.00..9.42 rows=5 width=8) (actual
time=0.024..0.045 rows=5 loops=1)
                ->  Index Scan using test_idx on test
(cost=0.00..1820.87 rows=967 width=8) (actual time=0.022..0.034 rows=5
loops=1)
                      Index Cond: (id = 1)
          ->  Limit  (cost=0.00..9.42 rows=5 width=8) (actual
time=0.017..0.034 rows=5 loops=1)
                ->  Index Scan using test_idx on test
(cost=0.00..1820.87 rows=967 width=8) (actual time=0.015..0.023 rows=5
loops=1)
                      Index Cond: (id = 2)
          ->  Limit  (cost=0.00..9.42 rows=5 width=8) (actual
time=0.021..0.037 rows=5 loops=1)
                ->  Index Scan using test_idx on test
(cost=0.00..1820.87 rows=967 width=8) (actual time=0.019..0.026 rows=5
loops=1)
                      Index Cond: (id = 3)
  Total runtime: 0.258 ms
(12 rows)





В списке pgsql-performance по дате отправления:

Предыдущее
От: Nowak Michał
Дата:
Сообщение: Re: Query with order by and limit is very slow - wrong index used
Следующее
От: Venkat Balaji
Дата:
Сообщение: : Column Performance in a query