Обсуждение: Poor performance when using a window function in a view

Поиск
Список
Период
Сортировка

Poor performance when using a window function in a view

От
Chris Hanks
Дата:
Hi. Sorry if this is repetitive, I tried posting to pgsql-performance first
but I think it got stuck in moderation.

I'm trying to create a view that uses a window function, but it seems that
Postgres is unable to optimize it. Here's a reproduction of my situation
with 9.2.2:

---

drop table if exists values cascade;

create table values (
  fkey1 integer not null,
  fkey2 integer not null,
  fkey3 integer not null,
  value float not null,
  constraint values_pkey primary key (fkey1, fkey2, fkey3)
);

-- This is kind of hacky, but it roughly resembles the size and
distribution of my dataset.
insert into values select distinct on (fkey1, fkey2, fkey3)
  i / 12 + 1 as fkey1,
  i % 4 + 1 as fkey2,
  ceil(random() * 10) as fkey3,
  random() * 2 - 1 as value from generate_series(0, 199999) i;

create or replace view values_view as
select fkey1, fkey3,
  (derived1 / max(derived1) over (partition by fkey1)) as derived1,
  (derived2 / sum(derived1) over (partition by fkey1)) as derived2
from (
  select fkey1, fkey3,
    cast(sum((case when (value > 0.0) then 4 else 1 end)) as double
precision) as derived1,
    sum((case when (value > 0.0) then (value * 4) else (value + 1) end)) as
derived2
  from values
  group by fkey1, fkey3
) as t1;

-- This query requires a sequential scan on values, though all the data it
needs could be found much more efficiently with an index scan.
explain analyze select * from values_view where fkey1 = 1263;

---

Can anyone suggest a way to rewrite this query? Or if postgres isn't
capable of optimizing this right now, is there a workaround of some kind?
This is a view I'd like to be able to join a smaller table against.

Thanks - Chris

Re: Poor performance when using a window function in a view

От
Merlin Moncure
Дата:
On Wed, Feb 27, 2013 at 8:22 PM, Chris Hanks
<christopher.m.hanks@gmail.com> wrote:
> Hi. Sorry if this is repetitive, I tried posting to pgsql-performance first
> but I think it got stuck in moderation.
>
> I'm trying to create a view that uses a window function, but it seems that
> Postgres is unable to optimize it. Here's a reproduction of my situation
> with 9.2.2:
>
> ---
>
> drop table if exists values cascade;
>
> create table values (
>   fkey1 integer not null,
>   fkey2 integer not null,
>   fkey3 integer not null,
>   value float not null,
>   constraint values_pkey primary key (fkey1, fkey2, fkey3)
> );
>
> -- This is kind of hacky, but it roughly resembles the size and distribution
> of my dataset.
> insert into values select distinct on (fkey1, fkey2, fkey3)
>   i / 12 + 1 as fkey1,
>   i % 4 + 1 as fkey2,
>   ceil(random() * 10) as fkey3,
>   random() * 2 - 1 as value from generate_series(0, 199999) i;
>
> create or replace view values_view as
> select fkey1, fkey3,
>   (derived1 / max(derived1) over (partition by fkey1)) as derived1,
>   (derived2 / sum(derived1) over (partition by fkey1)) as derived2
> from (
>   select fkey1, fkey3,
>     cast(sum((case when (value > 0.0) then 4 else 1 end)) as double
> precision) as derived1,
>     sum((case when (value > 0.0) then (value * 4) else (value + 1) end)) as
> derived2
>   from values
>   group by fkey1, fkey3
> ) as t1;
>
> -- This query requires a sequential scan on values, though all the data it
> needs could be found much more efficiently with an index scan.
> explain analyze select * from values_view where fkey1 = 1263;
>
> ---
>
> Can anyone suggest a way to rewrite this query? Or if postgres isn't capable
> of optimizing this right now, is there a workaround of some kind? This is a
> view I'd like to be able to join a smaller table against.

this comes up a lot. only way to expose as a view is to push the query
into a set returning function which you then wrap into a view.
downside is that any query except on fkey1/fkey 2 will have to fully
materialize view.

merlin

Re: Poor performance when using a window function in a view

От
Chris Hanks
Дата:
On Wed, Feb 27, 2013 at 10:18 PM, Merlin Moncure <mmoncure@gmail.com> wrote:

> On Wed, Feb 27, 2013 at 8:22 PM, Chris Hanks
> <christopher.m.hanks@gmail.com> wrote:
> > Hi. Sorry if this is repetitive, I tried posting to pgsql-performance
> first
> > but I think it got stuck in moderation.
> >
> > I'm trying to create a view that uses a window function, but it seems
> that
> > Postgres is unable to optimize it. Here's a reproduction of my situation
> > with 9.2.2:
> >
> > ---
> >
> > drop table if exists values cascade;
> >
> > create table values (
> >   fkey1 integer not null,
> >   fkey2 integer not null,
> >   fkey3 integer not null,
> >   value float not null,
> >   constraint values_pkey primary key (fkey1, fkey2, fkey3)
> > );
> >
> > -- This is kind of hacky, but it roughly resembles the size and
> distribution
> > of my dataset.
> > insert into values select distinct on (fkey1, fkey2, fkey3)
> >   i / 12 + 1 as fkey1,
> >   i % 4 + 1 as fkey2,
> >   ceil(random() * 10) as fkey3,
> >   random() * 2 - 1 as value from generate_series(0, 199999) i;
> >
> > create or replace view values_view as
> > select fkey1, fkey3,
> >   (derived1 / max(derived1) over (partition by fkey1)) as derived1,
> >   (derived2 / sum(derived1) over (partition by fkey1)) as derived2
> > from (
> >   select fkey1, fkey3,
> >     cast(sum((case when (value > 0.0) then 4 else 1 end)) as double
> > precision) as derived1,
> >     sum((case when (value > 0.0) then (value * 4) else (value + 1) end))
> as
> > derived2
> >   from values
> >   group by fkey1, fkey3
> > ) as t1;
> >
> > -- This query requires a sequential scan on values, though all the data
> it
> > needs could be found much more efficiently with an index scan.
> > explain analyze select * from values_view where fkey1 = 1263;
> >
> > ---
> >
> > Can anyone suggest a way to rewrite this query? Or if postgres isn't
> capable
> > of optimizing this right now, is there a workaround of some kind? This
> is a
> > view I'd like to be able to join a smaller table against.
>
> this comes up a lot. only way to expose as a view is to push the query
> into a set returning function which you then wrap into a view.
> downside is that any query except on fkey1/fkey 2 will have to fully
> materialize view.
>
> merlin
>

What would that look like? I've googled around for an example of what
you're talking about, but I'm not finding anything. I think I know how to
write a SQL function that will return a set of rows given a fkey1 value,
but I don't see how I'd turn that into a view...?

Thanks!

Re: Poor performance when using a window function in a view

От
Tom Lane
Дата:
Chris Hanks <christopher.m.hanks@gmail.com> writes:
> create or replace view values_view as
> select fkey1, fkey3,
>   (derived1 / max(derived1) over (partition by fkey1)) as derived1,
>   (derived2 / sum(derived1) over (partition by fkey1)) as derived2
> from (
>   select fkey1, fkey3,
>     cast(sum((case when (value > 0.0) then 4 else 1 end)) as double
> precision) as derived1,
>     sum((case when (value > 0.0) then (value * 4) else (value + 1) end)) as
> derived2
>   from values
>   group by fkey1, fkey3
> ) as t1;

> -- This query requires a sequential scan on values, though all the data it
> needs could be found much more efficiently with an index scan.
> explain analyze select * from values_view where fkey1 = 1263;

To use the outer WHERE clause as an index constraint, postgres would
have to prove that scanning only the rows with fkey1 = 1263 would still
find all the rows that would get examined by the window functions ---
and in this case, it's not only the window functions that make that less
than obvious, but the grouped aggregates in the sub-select below them.
There's not nearly that amount of intelligence in the system about
window functions, as yet.  So you'll have to write out the query
longhand and put the WHERE clause at the lower level, if you want this
optimization to happen.

            regards, tom lane

Re: Poor performance when using a window function in a view

От
Chris Hanks
Дата:
On Thu, Feb 28, 2013 at 6:00 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Chris Hanks <christopher.m.hanks@gmail.com> writes:
> create or replace view values_view as
> select fkey1, fkey3,
>   (derived1 / max(derived1) over (partition by fkey1)) as derived1,
>   (derived2 / sum(derived1) over (partition by fkey1)) as derived2
> from (
>   select fkey1, fkey3,
>     cast(sum((case when (value > 0.0) then 4 else 1 end)) as double
> precision) as derived1,
>     sum((case when (value > 0.0) then (value * 4) else (value + 1) end)) as
> derived2
>   from values
>   group by fkey1, fkey3
> ) as t1;

> -- This query requires a sequential scan on values, though all the data it
> needs could be found much more efficiently with an index scan.
> explain analyze select * from values_view where fkey1 = 1263;

To use the outer WHERE clause as an index constraint, postgres would
have to prove that scanning only the rows with fkey1 = 1263 would still
find all the rows that would get examined by the window functions ---
and in this case, it's not only the window functions that make that less
than obvious, but the grouped aggregates in the sub-select below them.
There's not nearly that amount of intelligence in the system about
window functions, as yet.  So you'll have to write out the query
longhand and put the WHERE clause at the lower level, if you want this
optimization to happen.

                        regards, tom lane

Ok, that makes sense, thanks.

Can anyone point me to an example of wrapping a function in a view, like Merlin suggested? I'm not sure how that would work.

Re: Poor performance when using a window function in a view

От
Raymond O'Donnell
Дата:
On 01/03/2013 00:19, Chris Hanks wrote:
> On Thu, Feb 28, 2013 at 6:00 AM, Tom Lane <tgl@sss.pgh.pa.us
> <mailto:tgl@sss.pgh.pa.us>> wrote:
>
>     Chris Hanks <christopher.m.hanks@gmail.com
>     <mailto:christopher.m.hanks@gmail.com>> writes:
>     > create or replace view values_view as
>     > select fkey1, fkey3,
>     >   (derived1 / max(derived1) over (partition by fkey1)) as derived1,
>     >   (derived2 / sum(derived1) over (partition by fkey1)) as derived2
>     > from (
>     >   select fkey1, fkey3,
>     >     cast(sum((case when (value > 0.0) then 4 else 1 end)) as double
>     > precision) as derived1,
>     >     sum((case when (value > 0.0) then (value * 4) else (value + 1)
>     end)) as
>     > derived2
>     >   from values
>     >   group by fkey1, fkey3
>     > ) as t1;
>
>     > -- This query requires a sequential scan on values, though all the
>     data it
>     > needs could be found much more efficiently with an index scan.
>     > explain analyze select * from values_view where fkey1 = 1263;
>
>     To use the outer WHERE clause as an index constraint, postgres would
>     have to prove that scanning only the rows with fkey1 = 1263 would still
>     find all the rows that would get examined by the window functions ---
>     and in this case, it's not only the window functions that make that less
>     than obvious, but the grouped aggregates in the sub-select below them.
>     There's not nearly that amount of intelligence in the system about
>     window functions, as yet.  So you'll have to write out the query
>     longhand and put the WHERE clause at the lower level, if you want this
>     optimization to happen.
>
>                             regards, tom lane
>
>
> Ok, that makes sense, thanks.
>
> Can anyone point me to an example of wrapping a function in a view, like
> Merlin suggested? I'm not sure how that would work.

Off the top of my head, I'd imagine it's as simple as:

  create view ... as
  select * from my_function(...);

:-)

Ray.


--
Raymond O'Donnell :: Galway :: Ireland
rod@iol.ie


Re: Poor performance when using a window function in a view

От
Chris Hanks
Дата:
On Fri, Mar 1, 2013 at 1:21 AM, Raymond O'Donnell <rod@iol.ie> wrote:
On 01/03/2013 00:19, Chris Hanks wrote:
> On Thu, Feb 28, 2013 at 6:00 AM, Tom Lane <tgl@sss.pgh.pa.us
> <mailto:tgl@sss.pgh.pa.us>> wrote:
>
>     Chris Hanks <christopher.m.hanks@gmail.com
>     <mailto:christopher.m.hanks@gmail.com>> writes:
>     > create or replace view values_view as
>     > select fkey1, fkey3,
>     >   (derived1 / max(derived1) over (partition by fkey1)) as derived1,
>     >   (derived2 / sum(derived1) over (partition by fkey1)) as derived2
>     > from (
>     >   select fkey1, fkey3,
>     >     cast(sum((case when (value > 0.0) then 4 else 1 end)) as double
>     > precision) as derived1,
>     >     sum((case when (value > 0.0) then (value * 4) else (value + 1)
>     end)) as
>     > derived2
>     >   from values
>     >   group by fkey1, fkey3
>     > ) as t1;
>
>     > -- This query requires a sequential scan on values, though all the
>     data it
>     > needs could be found much more efficiently with an index scan.
>     > explain analyze select * from values_view where fkey1 = 1263;
>
>     To use the outer WHERE clause as an index constraint, postgres would
>     have to prove that scanning only the rows with fkey1 = 1263 would still
>     find all the rows that would get examined by the window functions ---
>     and in this case, it's not only the window functions that make that less
>     than obvious, but the grouped aggregates in the sub-select below them.
>     There's not nearly that amount of intelligence in the system about
>     window functions, as yet.  So you'll have to write out the query
>     longhand and put the WHERE clause at the lower level, if you want this
>     optimization to happen.
>
>                             regards, tom lane
>
>
> Ok, that makes sense, thanks.
>
> Can anyone point me to an example of wrapping a function in a view, like
> Merlin suggested? I'm not sure how that would work.

Off the top of my head, I'd imagine it's as simple as:

  create view ... as
  select * from my_function(...);

:-)

Ray.


--
Raymond O'Donnell :: Galway :: Ireland
rod@iol.ie


Sorry, I don't understand. I'm able to make a function that takes an integer and uses it in the subselect as "WHERE fkey1 = arg", and that works as I expect it to and it's plenty fast. But I don't see how to write a view to take advantage of this function - what arguments would go in my_function(...) when I'm declaring the view?

Chris

Re: Poor performance when using a window function in a view

От
Merlin Moncure
Дата:
On Fri, Mar 1, 2013 at 3:59 AM, Chris Hanks
<christopher.m.hanks@gmail.com> wrote:
> On Fri, Mar 1, 2013 at 1:21 AM, Raymond O'Donnell <rod@iol.ie> wrote:
>>
>> On 01/03/2013 00:19, Chris Hanks wrote:
>> > On Thu, Feb 28, 2013 at 6:00 AM, Tom Lane <tgl@sss.pgh.pa.us
>> > <mailto:tgl@sss.pgh.pa.us>> wrote:
>> >
>> >     Chris Hanks <christopher.m.hanks@gmail.com
>> >     <mailto:christopher.m.hanks@gmail.com>> writes:
>> >     > create or replace view values_view as
>> >     > select fkey1, fkey3,
>> >     >   (derived1 / max(derived1) over (partition by fkey1)) as
>> > derived1,
>> >     >   (derived2 / sum(derived1) over (partition by fkey1)) as derived2
>> >     > from (
>> >     >   select fkey1, fkey3,
>> >     >     cast(sum((case when (value > 0.0) then 4 else 1 end)) as
>> > double
>> >     > precision) as derived1,
>> >     >     sum((case when (value > 0.0) then (value * 4) else (value + 1)
>> >     end)) as
>> >     > derived2
>> >     >   from values
>> >     >   group by fkey1, fkey3
>> >     > ) as t1;
>> >
>> >     > -- This query requires a sequential scan on values, though all the
>> >     data it
>> >     > needs could be found much more efficiently with an index scan.
>> >     > explain analyze select * from values_view where fkey1 = 1263;
>> >
>> >     To use the outer WHERE clause as an index constraint, postgres would
>> >     have to prove that scanning only the rows with fkey1 = 1263 would
>> > still
>> >     find all the rows that would get examined by the window functions
>> > ---
>> >     and in this case, it's not only the window functions that make that
>> > less
>> >     than obvious, but the grouped aggregates in the sub-select below
>> > them.
>> >     There's not nearly that amount of intelligence in the system about
>> >     window functions, as yet.  So you'll have to write out the query
>> >     longhand and put the WHERE clause at the lower level, if you want
>> > this
>> >     optimization to happen.
>> >
>> >                             regards, tom lane
>> >
>> >
>> > Ok, that makes sense, thanks.
>> >
>> > Can anyone point me to an example of wrapping a function in a view, like
>> > Merlin suggested? I'm not sure how that would work.
>>
>> Off the top of my head, I'd imagine it's as simple as:
>>
>>   create view ... as
>>   select * from my_function(...);
>>
>> :-)
>>
>> Ray.
>>
>>
>> --
>> Raymond O'Donnell :: Galway :: Ireland
>> rod@iol.ie
>
>
>
> Sorry, I don't understand. I'm able to make a function that takes an integer
> and uses it in the subselect as "WHERE fkey1 = arg", and that works as I
> expect it to and it's plenty fast. But I don't see how to write a view to
> take advantage of this function - what arguments would go in
> my_function(...) when I'm declaring the view?

First let's clearly state the problem - create some test data:
postgres=# create table foo as select v1, v2 from (select
generate_series(1,10) v1) q1, (select generate_series(1,10000) v2) q2;
postgres=# create table bar as select v1 from generate_series(1,10) v1;

foo is data table, bar defines the key around which we have the data.

create an index:
postgres=# create index on foo(v1);

simple lookup is optimized obviously:
postgres=# explain select * from foo where v1 = 7;
                              QUERY PLAN
----------------------------------------------------------------------
 Index Scan using foo_v1_idx on foo  (cost=0.00..4.38 rows=1 width=8)
   Index Cond: (v1 = 50)

get indexed lookup for simple window function:
postgres=# explain select v1, sum(v2) over(partition by v1 order by
v2) from foo where v1 = 7;
                                    QUERY PLAN
----------------------------------------------------------------------------------
 WindowAgg  (cost=4.39..4.41 rows=1 width=8)
   ->  Sort  (cost=4.39..4.39 rows=1 width=8)
         Sort Key: v2
         ->  Index Scan using foo_v1_idx on foo  (cost=0.00..4.38
rows=1 width=8)
               Index Cond: (v1 = 50)

push to subquery (essentially what view does) and we lose the optimization:
postgres=# explain select * from (select v1, sum(v2) over(partition by
v1 order by v2) from foo) q  where v1 = 7;
                                 QUERY PLAN
-----------------------------------------------------------------------------
 Subquery Scan on q  (cost=9747.82..12997.82 rows=1 width=12)
   Filter: (q.v1 = 50)
   ->  WindowAgg  (cost=9747.82..11747.82 rows=100000 width=8)
         ->  Sort  (cost=9747.82..9997.82 rows=100000 width=8)
               Sort Key: foo.v1, foo.v2
               ->  Seq Scan on foo  (cost=0.00..1443.00 rows=100000 width=8)

let's work around it!
postgres=# CREATE OR REPLACE FUNCTION getfoovals(
  v1 INOUT int,
  sumv2 OUT BIGINT) RETURNS SETOF RECORD AS
$$
  select
    v1,
    sum(v2)
    over(partition by v1 order by v2)
  from foo where foo.v1 = getfoovals.v1;
$$ LANGUAGE SQL STABLE;

select * from getfoovals(7);
 v1 |  sumv2
----+----------
  7 |        1
  7 |        3
  7 |        6
  7 |       10
  7 |       15
  7 |       21
<snip>

abstract to view:
postgres=# CREATE OR REPLACE VIEW foovals AS
  select
    v1,
    (getfoovals(v1)).sumv2
  from bar;

postgres=#  select * from foovals where v1 = 7;

Main problem with this technique is awkwardness around using column
lest set returning function if it returns > 1 column...if you'r not
careful you can get extra invocations of function.   you work hack
around this to some degree via:
*) offset 0 hacks
*) return type coersion hacks
*) 9.3 LATERAL feature completely nails it

Can't use WITH because it fences of the optimization.

This (window function optimization fencing) is probably #1 performance
gotcha I hit in everyday coding after LATERAL and lack of better
ability to inline simple SQL functions.

merlin


Re: Poor performance when using a window function in a view

От
Misa Simic
Дата:
Hi Chris,

You don't need to make a a full view - to join it later to "less rows number table")

If you have, function what takes fkey1 as input parameter and returns SET OF (type of your values_view)

i.e.

CREATE OR REPLACE FUNCTION get_filtered_values_view(in_fkey1 integer)
  RETURNS SETOF values_view AS
$BODY$select fkey1, fkey3,
  (derived1 / max(derived1) over (partition by fkey1)) as derived1,
  (derived2 / sum(derived1) over (partition by fkey1)) as derived2
FROM

(
select fkey1, fkey3,
    cast(sum((case when (value > 0.0) then 4 else 1 end)) as double precision) as derived1,
    sum((case when (value > 0.0) then (value * 4) else (value + 1) end)) as derived2
  from values
  group by fkey1, fkey3
  having fkey1 = $1
) t$BODY$
  LANGUAGE sql STABLE
  COST 100
  ROWS 1000;
ALTER FUNCTION get_filtered_values_view(integer)
  OWNER TO postgres;

Then you can make new function what takes values from table you would like join to view:

CREATE OR REPLACE FUNCTION get_filtered_values_view_joined()
  RETURNS SETOF values_view AS
$BODY$
SELECT get_filtered_values_view(
    fkey1
)
FROM 
(SELECT DISTINCT fkey1 FROM smaller_table_for_join) t
$BODY$
  LANGUAGE sql STABLE
  COST 100
  ROWS 1000;
ALTER FUNCTION get_filtered_values_joined()
  OWNER TO postgres;


then you can encapsulate it to final view:
CREATE VIEW final_derived_view AS
SELECT * FROM get_filtered_values_joined()


2013/3/1 Chris Hanks <christopher.m.hanks@gmail.com>
On Fri, Mar 1, 2013 at 1:21 AM, Raymond O'Donnell <rod@iol.ie> wrote:
On 01/03/2013 00:19, Chris Hanks wrote:
> On Thu, Feb 28, 2013 at 6:00 AM, Tom Lane <tgl@sss.pgh.pa.us
> <mailto:tgl@sss.pgh.pa.us>> wrote:
>
>     Chris Hanks <christopher.m.hanks@gmail.com
>     <mailto:christopher.m.hanks@gmail.com>> writes:
>     > create or replace view values_view as
>     > select fkey1, fkey3,
>     >   (derived1 / max(derived1) over (partition by fkey1)) as derived1,
>     >   (derived2 / sum(derived1) over (partition by fkey1)) as derived2
>     > from (
>     >   select fkey1, fkey3,
>     >     cast(sum((case when (value > 0.0) then 4 else 1 end)) as double
>     > precision) as derived1,
>     >     sum((case when (value > 0.0) then (value * 4) else (value + 1)
>     end)) as
>     > derived2
>     >   from values
>     >   group by fkey1, fkey3
>     > ) as t1;
>
>     > -- This query requires a sequential scan on values, though all the
>     data it
>     > needs could be found much more efficiently with an index scan.
>     > explain analyze select * from values_view where fkey1 = 1263;
>
>     To use the outer WHERE clause as an index constraint, postgres would
>     have to prove that scanning only the rows with fkey1 = 1263 would still
>     find all the rows that would get examined by the window functions ---
>     and in this case, it's not only the window functions that make that less
>     than obvious, but the grouped aggregates in the sub-select below them.
>     There's not nearly that amount of intelligence in the system about
>     window functions, as yet.  So you'll have to write out the query
>     longhand and put the WHERE clause at the lower level, if you want this
>     optimization to happen.
>
>                             regards, tom lane
>
>
> Ok, that makes sense, thanks.
>
> Can anyone point me to an example of wrapping a function in a view, like
> Merlin suggested? I'm not sure how that would work.

Off the top of my head, I'd imagine it's as simple as:

  create view ... as
  select * from my_function(...);

:-)

Ray.


--
Raymond O'Donnell :: Galway :: Ireland
rod@iol.ie


Sorry, I don't understand. I'm able to make a function that takes an integer and uses it in the subselect as "WHERE fkey1 = arg", and that works as I expect it to and it's plenty fast. But I don't see how to write a view to take advantage of this function - what arguments would go in my_function(...) when I'm declaring the view?

Chris

Re: Poor performance when using a window function in a view

От
Chris Hanks
Дата:
On Fri, Mar 1, 2013 at 6:37 AM, Merlin Moncure <mmoncure@gmail.com> wrote:
On Fri, Mar 1, 2013 at 3:59 AM, Chris Hanks
<christopher.m.hanks@gmail.com> wrote:
> On Fri, Mar 1, 2013 at 1:21 AM, Raymond O'Donnell <rod@iol.ie> wrote:
>>
>> On 01/03/2013 00:19, Chris Hanks wrote:
>> > On Thu, Feb 28, 2013 at 6:00 AM, Tom Lane <tgl@sss.pgh.pa.us
>> > <mailto:tgl@sss.pgh.pa.us>> wrote:
>> >
>> >     Chris Hanks <christopher.m.hanks@gmail.com
>> >     <mailto:christopher.m.hanks@gmail.com>> writes:
>> >     > create or replace view values_view as
>> >     > select fkey1, fkey3,
>> >     >   (derived1 / max(derived1) over (partition by fkey1)) as
>> > derived1,
>> >     >   (derived2 / sum(derived1) over (partition by fkey1)) as derived2
>> >     > from (
>> >     >   select fkey1, fkey3,
>> >     >     cast(sum((case when (value > 0.0) then 4 else 1 end)) as
>> > double
>> >     > precision) as derived1,
>> >     >     sum((case when (value > 0.0) then (value * 4) else (value + 1)
>> >     end)) as
>> >     > derived2
>> >     >   from values
>> >     >   group by fkey1, fkey3
>> >     > ) as t1;
>> >
>> >     > -- This query requires a sequential scan on values, though all the
>> >     data it
>> >     > needs could be found much more efficiently with an index scan.
>> >     > explain analyze select * from values_view where fkey1 = 1263;
>> >
>> >     To use the outer WHERE clause as an index constraint, postgres would
>> >     have to prove that scanning only the rows with fkey1 = 1263 would
>> > still
>> >     find all the rows that would get examined by the window functions
>> > ---
>> >     and in this case, it's not only the window functions that make that
>> > less
>> >     than obvious, but the grouped aggregates in the sub-select below
>> > them.
>> >     There's not nearly that amount of intelligence in the system about
>> >     window functions, as yet.  So you'll have to write out the query
>> >     longhand and put the WHERE clause at the lower level, if you want
>> > this
>> >     optimization to happen.
>> >
>> >                             regards, tom lane
>> >
>> >
>> > Ok, that makes sense, thanks.
>> >
>> > Can anyone point me to an example of wrapping a function in a view, like
>> > Merlin suggested? I'm not sure how that would work.
>>
>> Off the top of my head, I'd imagine it's as simple as:
>>
>>   create view ... as
>>   select * from my_function(...);
>>
>> :-)
>>
>> Ray.
>>
>>
>> --
>> Raymond O'Donnell :: Galway :: Ireland
>> rod@iol.ie
>
>
>
> Sorry, I don't understand. I'm able to make a function that takes an integer
> and uses it in the subselect as "WHERE fkey1 = arg", and that works as I
> expect it to and it's plenty fast. But I don't see how to write a view to
> take advantage of this function - what arguments would go in
> my_function(...) when I'm declaring the view?

First let's clearly state the problem - create some test data:
postgres=# create table foo as select v1, v2 from (select
generate_series(1,10) v1) q1, (select generate_series(1,10000) v2) q2;
postgres=# create table bar as select v1 from generate_series(1,10) v1;

foo is data table, bar defines the key around which we have the data.

create an index:
postgres=# create index on foo(v1);

simple lookup is optimized obviously:
postgres=# explain select * from foo where v1 = 7;
                              QUERY PLAN
----------------------------------------------------------------------
 Index Scan using foo_v1_idx on foo  (cost=0.00..4.38 rows=1 width=8)
   Index Cond: (v1 = 50)

get indexed lookup for simple window function:
postgres=# explain select v1, sum(v2) over(partition by v1 order by
v2) from foo where v1 = 7;
                                    QUERY PLAN
----------------------------------------------------------------------------------
 WindowAgg  (cost=4.39..4.41 rows=1 width=8)
   ->  Sort  (cost=4.39..4.39 rows=1 width=8)
         Sort Key: v2
         ->  Index Scan using foo_v1_idx on foo  (cost=0.00..4.38
rows=1 width=8)
               Index Cond: (v1 = 50)

push to subquery (essentially what view does) and we lose the optimization:
postgres=# explain select * from (select v1, sum(v2) over(partition by
v1 order by v2) from foo) q  where v1 = 7;
                                 QUERY PLAN
-----------------------------------------------------------------------------
 Subquery Scan on q  (cost=9747.82..12997.82 rows=1 width=12)
   Filter: (q.v1 = 50)
   ->  WindowAgg  (cost=9747.82..11747.82 rows=100000 width=8)
         ->  Sort  (cost=9747.82..9997.82 rows=100000 width=8)
               Sort Key: foo.v1, foo.v2
               ->  Seq Scan on foo  (cost=0.00..1443.00 rows=100000 width=8)

let's work around it!
postgres=# CREATE OR REPLACE FUNCTION getfoovals(
  v1 INOUT int,
  sumv2 OUT BIGINT) RETURNS SETOF RECORD AS
$$
  select
    v1,
    sum(v2)
    over(partition by v1 order by v2)
  from foo where foo.v1 = getfoovals.v1;
$$ LANGUAGE SQL STABLE;

select * from getfoovals(7);
 v1 |  sumv2
----+----------
  7 |        1
  7 |        3
  7 |        6
  7 |       10
  7 |       15
  7 |       21
<snip>

abstract to view:
postgres=# CREATE OR REPLACE VIEW foovals AS
  select
    v1,
    (getfoovals(v1)).sumv2
  from bar;

postgres=#  select * from foovals where v1 = 7;

Main problem with this technique is awkwardness around using column
lest set returning function if it returns > 1 column...if you'r not
careful you can get extra invocations of function.   you work hack
around this to some degree via:
*) offset 0 hacks
*) return type coersion hacks
*) 9.3 LATERAL feature completely nails it

Can't use WITH because it fences of the optimization.

This (window function optimization fencing) is probably #1 performance
gotcha I hit in everyday coding after LATERAL and lack of better
ability to inline simple SQL functions.

merlin




Hi - thanks everyone for your input. I bit the bullet and just made a materialized view for now. I got Merlin's suggestion working, though - I'm going to revisit it when 9.3 comes out and I can use LATERAL.

Thanks again!
Chris