Обсуждение: Efficiently selecting single row from a select with window functions row_number, lag and lead

Поиск
Список
Период
Сортировка

Efficiently selecting single row from a select with window functions row_number, lag and lead

От
Andrew Bailey
Дата:
I would like to do the following:

select id, row_number() over w as rownum, lag(id, 1) over w as prev, lead(id, 1) over w as next from route where id=1350 window w as (order by shortname, id asc rows between 1 preceding and 1 following)  order by shortname, id ;

However this gives the result
1350;1;;

The following query gives the result I am expecting

select * from (select id, row_number() over w as rownum,
lag(id, 1) over w as prev, lead(id, 1) over w as next
from route window w as (order by shortname, id
rows between 1 preceding and 1 following) order by shortname, id) as s where id=1350

1350;3;1815;1813

The explain plan is
"Subquery Scan on s  (cost=0.14..15.29 rows=1 width=32)"
"  Filter: (s.id = 1350)"
"  ->  WindowAgg  (cost=0.14..13.51 rows=143 width=12)"
"        ->  Index Only Scan using route_idx on route  (cost=0.14..10.29 rows=143 width=12)"

as it makes use of the index created as follows

CREATE INDEX route_idx
  ON route
  USING btree
  (shortname COLLATE pg_catalog."default", id);

I believe that the index has all the data that is needed to obtain the results in a single query.
Is it possible to write the query as a single select and if so how?

Thanks in advance

Andrew Bailey
On 1/1/16 9:39 PM, Andrew Bailey wrote:
> select id, row_number() over w as rownum, lag(id, 1) over w as prev,
> lead(id, 1) over w as next from route where id=1350 window w as (order
> by shortname, id asc rows between 1 preceding and 1 following) order by
> shortname, id ;
>
> However this gives the result
> 1350;1;;
>
> The following query gives the result I am expecting
>
> select * from (select id, row_number() over w as rownum,
> lag(id, 1) over w as prev, lead(id, 1) over w as next
> from route window w as (order by shortname, id
> rows between 1 preceding and 1 following) order by shortname, id) as s
> where id=1350
>
> 1350;3;1815;1813
>
> The explain plan is
> "Subquery Scan on s  (cost=0.14..15.29 rows=1 width=32)"
> "  Filter: (s.id <http://s.id> = 1350)"
> "  ->  WindowAgg  (cost=0.14..13.51 rows=143 width=12)"
> "        ->  Index Only Scan using route_idx on route  (cost=0.14..10.29
> rows=143 width=12)"

I'm pretty sure the issue here is that the WHERE clause is limiting your
result set before the window can find what you're looking for.

You could probably switch the WHERE in your original query to a HAVING
and get the same results.

I'm not sure the filter can actually be pushed past the window functions
to get the result you want. That Index Only Scan could still be pulling
every row in the table.

BTW, if you switch the order by to id, shortname then it might be able
to use the index, but of course the results would be different.
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com


Re: Efficiently selecting single row from a select with window functions row_number, lag and lead

От
David Rowley
Дата:
On 2 January 2016 at 16:39, Andrew Bailey <hazlorealidad@gmail.com> wrote:
I would like to do the following:

select id, row_number() over w as rownum, lag(id, 1) over w as prev, lead(id, 1) over w as next from route where id=1350 window w as (order by shortname, id asc rows between 1 preceding and 1 following)  order by shortname, id ;

However this gives the result
1350;1;;


This does not work due to the id=1350 is always applied before the rows make it into the window therefore you only have rows which match id=1350, which is not what you want in this case.
 
The following query gives the result I am expecting

select * from (select id, row_number() over w as rownum,
lag(id, 1) over w as prev, lead(id, 1) over w as next
from route window w as (order by shortname, id
rows between 1 preceding and 1 following) order by shortname, id) as s where id=1350

1350;3;1815;1813


This works because the id=1350 is not pushed down into the subquery which contain the windowing functions, this also means that the entire route table is processed and you may suffer from performance problems if the route table is, or gets big. You'll be able to confirm this by looking at the EXPLAIN output and noticing the lack of filter on the seqscan.
 
The explain plan is
"Subquery Scan on s  (cost=0.14..15.29 rows=1 width=32)"
"  Filter: (s.id = 1350)"
"  ->  WindowAgg  (cost=0.14..13.51 rows=143 width=12)"
"        ->  Index Only Scan using route_idx on route  (cost=0.14..10.29 rows=143 width=12)"

as it makes use of the index created as follows

CREATE INDEX route_idx
  ON route
  USING btree
  (shortname COLLATE pg_catalog."default", id);

I believe that the index has all the data that is needed to obtain the results in a single query.
Is it possible to write the query as a single select and if so how?

why not just write it as: select id, (select max(id) from route where id < 1350) as prev, (select min(id) from route where id > 1350) as next from route where id=2; ?
That should be much more efficient for a larger table as it should avoid the seqscan and allow the index to be used for all 3 numbers.

--
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: Efficiently selecting single row from a select with window functions row_number, lag and lead

От
Vitaly Burovoy
Дата:
On 1/1/16, Andrew Bailey <hazlorealidad@gmail.com> wrote:
> I would like to do the following:
>
> select id, row_number() over w as rownum, lag(id, 1) over w as prev,
> lead(id, 1) over w as next from route where id=1350 window w as (order by
> shortname, id asc rows between 1 preceding and 1 following)  order by
> shortname, id ;
>
> However this gives the result
> 1350;1;;
>
> The following query gives the result I am expecting
>
> select * from (select id, row_number() over w as rownum,
> lag(id, 1) over w as prev, lead(id, 1) over w as next
> from route window w as (order by shortname, id
> rows between 1 preceding and 1 following) order by shortname, id) as s
> where id=1350
>
> 1350;3;1815;1813
>
> The explain plan is
> "Subquery Scan on s  (cost=0.14..15.29 rows=1 width=32)"
> "  Filter: (s.id = 1350)"
> "  ->  WindowAgg  (cost=0.14..13.51 rows=143 width=12)"
> "        ->  Index Only Scan using route_idx on route  (cost=0.14..10.29
> rows=143 width=12)"
>
> as it makes use of the index created as follows
>
> CREATE INDEX route_idx
>   ON route
>   USING btree
>   (shortname COLLATE pg_catalog."default", id);
>
> I believe that the index has all the data that is needed to obtain the
> results in a single query.
> Is it possible to write the query as a single select and if so how?
>
> Thanks in advance
>
> Andrew Bailey
>

It works as expected.

You can read it at
http://www.postgresql.org/docs/current/static/tutorial-window.html
---
The rows considered by a window function are those of the "virtual
table" produced by the query's FROM clause as filtered by its WHERE,
GROUP BY, and HAVING clauses if any. For example, a row removed
because it does not meet the WHERE condition is not seen by any window
function.
---

So your WHERE clause from the first query selects the only row to a
"virtual table", and lead and lag works with the table contains the
only one row and doesn't have any other before and after it.

In the second query subselect selects ALL rows to the "virtual table",
lead and lag fill values and WHERE in the external select gets a
single row filled by subselect.

It is also in the documentation:
"If there is a need to filter or group rows after the window
calculations are performed, you can use a sub-select."

Unfortunately it is impossible to give an access to window function to
rows not selected by a current query.

--
Best regards,
Vitaly Burovoy