Обсуждение: SELECT LIMIT 1 VIEW Performance Issue

Поиск
Список
Период
Сортировка

SELECT LIMIT 1 VIEW Performance Issue

От
K C Lau
Дата:
Hi All,

I previously posted the following as a sequel to my SELECT DISTINCT
Performance Issue question. We would most appreciate any clue or
suggestions on how to overcome this show-stopping issue. We are using 8.0.3
on Windows.

Is it a known limitation when using a view with SELECT ... LIMIT 1?

Would the forthcoming performance enhancement with MAX help when used
within a view, as in:

create or replace view VCurPlayer as select * from Player a
where a.AtDate = (select Max(b.AtDate) from Player b where a.PlayerID =
b.PlayerID);

select PlayerID,AtDate from VCurPlayer where PlayerID='22220';

Thanks and regards,
KC.

---------

At 19:45 05/06/06, PFC wrote:

>>Previously, we have also tried to use LIMIT 1 instead of DISTINCT, but
>>the performance was no better:
>>select PlayerID,AtDate from Player where PlayerID='22220' order by
>>PlayerID desc, AtDate desc LIMIT 1
>
>         The DISTINCT query will pull out all the rows and keep only one,
> so the
>one with LIMIT should be faster. Can you post explain analyze of the LIMIT
>query ?

Actually the problem with LIMIT 1 query is when we use views with the LIMIT
1 construct. The direct SQL is ok:

esdt=> explain analyze select PlayerID,AtDate from Player where
PlayerID='22220'
  order by PlayerID desc, AtDate desc LIMIT 1;

  Limit  (cost=0.00..1.37 rows=1 width=23) (actual time=0.000..0.000 rows=1
loops=1)
    ->  Index Scan Backward using pk_player on player  (cost=0.00..16074.23
rows=11770 width=23) (actual time=0.000..0.000 rows=1 loops=1)
          Index Cond: ((playerid)::text = '22220'::text)
  Total runtime: 0.000 ms

esdt=> create or replace view VCurPlayer3 as select * from Player a
where AtDate = (select b.AtDate from Player b where a.PlayerID = b.PlayerID
order by b.PlayerID desc, b.AtDate desc LIMIT 1);

esdt=> explain analyze select PlayerID,AtDate,version from VCurPlayer3
where PlayerID='22220';
  Index Scan using pk_player on player a  (cost=0.00..33072.78 rows=59
width=27)
(actual time=235.000..235.000 rows=1 loops=1)
    Index Cond: ((playerid)::text = '22220'::text)
    Filter: ((atdate)::text = ((subplan))::text)
    SubPlan
      ->  Limit  (cost=0.00..1.44 rows=1 width=23) (actual
time=0.117..0.117 rows=1 loops=1743)
            ->  Index Scan Backward using pk_player on player
b  (cost=0.00..14023.67 rows=9727 width=23) (actual time=0.108..0.108
rows=1 loops=1743)
                  Index Cond: (($0)::text = (playerid)::text)
  Total runtime: 235.000 ms

The problem appears to be in the loops=1743 scanning all 1743 data records
for that player.

Regards, KC.



Re: SELECT LIMIT 1 VIEW Performance Issue

От
K C Lau
Дата:
Hi All,

Investigating further on this problem I brought up in June, the following
query with pg 8.0.3 on Windows scans all 1743 data records for a player:

esdt=> explain analyze select PlayerID,AtDate from Player a
  where PlayerID='22220' and AtDate = (select b.AtDate from Player b
  where b.PlayerID = a.PlayerID order by b.PlayerID desc, b.AtDate desc
LIMIT 1);

  Index Scan using pk_player on player a  (cost=0.00..2789.07 rows=9
width=23) (a
ctual time=51.046..51.049 rows=1 loops=1)
    Index Cond: ((playerid)::text = '22220'::text)
    Filter: ((atdate)::text = ((subplan))::text)
    SubPlan
      ->  Limit  (cost=0.00..0.83 rows=1 width=23) (actual
time=0.016..0.017 rows
=1 loops=1743)
            ->  Index Scan Backward using pk_player on player
b  (cost=0.00..970.
53 rows=1166 width=23) (actual time=0.011..0.011 rows=1 loops=1743)
                  Index Cond: ((playerid)::text = ($0)::text)
  Total runtime: 51.133 ms

Using a static value in the subquery produces the desired result below, but
since we use views for our queries (see last part of this email), we cannot
push the static value into the subquery:

esdt=> explain analyze select PlayerID,AtDate from Player a
  where PlayerID='22220' and AtDate = (select b.AtDate from Player b
  where b.PlayerID = '22220' order by b.PlayerID desc, b.AtDate desc LIMIT 1);

  Index Scan using pk_player on player a  (cost=0.75..4.26 rows=1 width=23)
(actu
al time=0.054..0.058 rows=1 loops=1)
    Index Cond: (((playerid)::text = '22220'::text) AND ((atdate)::text =
($0)::t
ext))
    InitPlan
      ->  Limit  (cost=0.00..0.75 rows=1 width=23) (actual
time=0.028..0.029 rows
=1 loops=1)
            ->  Index Scan Backward using pk_player on player
b  (cost=0.00..1323
.05 rows=1756 width=23) (actual time=0.023..0.023 rows=1 loops=1)
                  Index Cond: ((playerid)::text = '22220'::text)
  Total runtime: 0.149 ms

The Player table has a primary key on PlayerID, AtDate. Is there a way to
stop the inner-most index scan looping all 1743 data records for that
player?  Is that a bug or known issue?

BTW, I tried using 8.1 beta2 on Windows and its performance is similar, I
have also tried other variants such as MAX and DISTINCT but with no success.

Any help is most appreciated.

Best regards,
KC.


At 10:46 05/06/15, K C Lau wrote:
>Hi All,
>
>I previously posted the following as a sequel to my SELECT DISTINCT
>Performance Issue question. We would most appreciate any clue or
>suggestions on how to overcome this show-stopping issue. We are using
>8.0.3 on Windows.
>
>Is it a known limitation when using a view with SELECT ... LIMIT 1?
>
>Would the forthcoming performance enhancement with MAX help when used
>within a view, as in:
>
>create or replace view VCurPlayer as select * from Player a
>where a.AtDate = (select Max(b.AtDate) from Player b where a.PlayerID =
>b.PlayerID);
>
>select PlayerID,AtDate from VCurPlayer where PlayerID='22220';
>
>Thanks and regards,
>KC.
>
>---------
>
>Actually the problem with LIMIT 1 query is when we use views with the
>LIMIT 1 construct. The direct SQL is ok:
>
>esdt=> explain analyze select PlayerID,AtDate from Player where
>PlayerID='22220'
>  order by PlayerID desc, AtDate desc LIMIT 1;
>
>  Limit  (cost=0.00..1.37 rows=1 width=23) (actual time=0.000..0.000
> rows=1 loops=1)
>    ->  Index Scan Backward using pk_player on
> player  (cost=0.00..16074.23 rows=11770 width=23) (actual
> time=0.000..0.000 rows=1 loops=1)
>          Index Cond: ((playerid)::text = '22220'::text)
>  Total runtime: 0.000 ms
>
>esdt=> create or replace view VCurPlayer3 as select * from Player a
>where AtDate = (select b.AtDate from Player b where a.PlayerID = b.PlayerID
>order by b.PlayerID desc, b.AtDate desc LIMIT 1);
>
>esdt=> explain analyze select PlayerID,AtDate,version from VCurPlayer3
>where PlayerID='22220';
>  Index Scan using pk_player on player a  (cost=0.00..33072.78 rows=59
> width=27)
>(actual time=235.000..235.000 rows=1 loops=1)
>    Index Cond: ((playerid)::text = '22220'::text)
>    Filter: ((atdate)::text = ((subplan))::text)
>    SubPlan
>      ->  Limit  (cost=0.00..1.44 rows=1 width=23) (actual
> time=0.117..0.117 rows=1 loops=1743)
>            ->  Index Scan Backward using pk_player on player
> b  (cost=0.00..14023.67 rows=9727 width=23) (actual time=0.108..0.108
> rows=1 loops=1743)
>                  Index Cond: (($0)::text = (playerid)::text)
>  Total runtime: 235.000 ms


Re: SELECT LIMIT 1 VIEW Performance Issue

От
Simon Riggs
Дата:
On Thu, 2005-09-22 at 12:21 +0800, K C Lau wrote:

> Investigating further on this problem I brought up in June, the following
> query with pg 8.0.3 on Windows scans all 1743 data records for a player:
>
> esdt=> explain analyze select PlayerID,AtDate from Player a
>   where PlayerID='22220' and AtDate = (select b.AtDate from Player b
>   where b.PlayerID = a.PlayerID order by b.PlayerID desc, b.AtDate desc
> LIMIT 1);
>

>   Total runtime: 51.133 ms
>
> Using a static value in the subquery produces the desired result below, but
> since we use views for our queries (see last part of this email), we cannot
> push the static value into the subquery:
>
> esdt=> explain analyze select PlayerID,AtDate from Player a
>   where PlayerID='22220' and AtDate = (select b.AtDate from Player b
>   where b.PlayerID = '22220' order by b.PlayerID desc, b.AtDate desc LIMIT 1);

>   Total runtime: 0.149 ms
>
> The Player table has a primary key on PlayerID, AtDate. Is there a way to
> stop the inner-most index scan looping all 1743 data records for that
> player?  Is that a bug or known issue?

Currently the planner can't tell whether a subquery is correlated or not
until it has planned the query. So it is unable to push down the
qualification automatically in the way you have achieved manually. The
new min() optimisation doesn't yet work with GROUP BY which is what you
would use to reformulate the query that way, so no luck that way either.

If you don't want to do this in a view, calculate the values for all
players at once and store the values in a summary table for when you
need them.

Best Regards, Simon Riggs



Re: SELECT LIMIT 1 VIEW Performance Issue

От
K C Lau
Дата:
We use similar views as base views throughout our OLTP system to get the
latest time-based record(s). So it is quite impossible to use summary
tables etc. Are there other ways to do it?

The subquery would pinpoint the record(s) with the composite primary key.
Both MS Sql and Oracle do not have such performance problem. So this
problem is effectively stopping us from migrating to PostgreSQL.

Any suggestions would be most appreciated.

Best regards,
KC.

At 16:40 05/09/22, Simon Riggs wrote:
>On Thu, 2005-09-22 at 12:21 +0800, K C Lau wrote:
>
> > Investigating further on this problem I brought up in June, the following
> > query with pg 8.0.3 on Windows scans all 1743 data records for a player:
> >
> > esdt=> explain analyze select PlayerID,AtDate from Player a
> >   where PlayerID='22220' and AtDate = (select b.AtDate from Player b
> >   where b.PlayerID = a.PlayerID order by b.PlayerID desc, b.AtDate desc
> > LIMIT 1);
> >
>
> >   Total runtime: 51.133 ms
> >
> > Using a static value in the subquery produces the desired result below,
> but
> > since we use views for our queries (see last part of this email), we
> cannot
> > push the static value into the subquery:
> >
> > esdt=> explain analyze select PlayerID,AtDate from Player a
> >   where PlayerID='22220' and AtDate = (select b.AtDate from Player b
> >   where b.PlayerID = '22220' order by b.PlayerID desc, b.AtDate desc
> LIMIT 1);
>
> >   Total runtime: 0.149 ms
> >
> > The Player table has a primary key on PlayerID, AtDate. Is there a way to
> > stop the inner-most index scan looping all 1743 data records for that
> > player?  Is that a bug or known issue?
>
>Currently the planner can't tell whether a subquery is correlated or not
>until it has planned the query. So it is unable to push down the
>qualification automatically in the way you have achieved manually. The
>new min() optimisation doesn't yet work with GROUP BY which is what you
>would use to reformulate the query that way, so no luck that way either.
>
>If you don't want to do this in a view, calculate the values for all
>players at once and store the values in a summary table for when you
>need them.
>
>Best Regards, Simon Riggs
>
>
>
>---------------------------(end of broadcast)---------------------------
>TIP 6: explain analyze is your friend


Re: SELECT LIMIT 1 VIEW Performance Issue

От
Simon Riggs
Дата:
On Thu, 2005-09-22 at 18:40 +0800, K C Lau wrote:
> We use similar views as base views throughout our OLTP system to get the
> latest time-based record(s). So it is quite impossible to use summary
> tables etc. Are there other ways to do it?
>
> The subquery would pinpoint the record(s) with the composite primary key.
> Both MS Sql and Oracle do not have such performance problem. So this
> problem is effectively stopping us from migrating to PostgreSQL.
>
> Any suggestions would be most appreciated.

Even if this were fixed for 8.1, which seems unlikely, would you be able
to move to that release immediately?

ISTM you have two choices, in priority, complexity and time/cost order
1) custom mods to your app
2) custom mods to PostgreSQL

Maybe its possible to reconstruct your query with sub-sub-selects so
that you have a correlated query with manually pushed down clauses,
which also references a more constant base view?

Is a 51ms query really such a problem for you?

Best Regards, Simon Riggs



Re: SELECT LIMIT 1 VIEW Performance Issue

От
"Merlin Moncure"
Дата:
> >I previously posted the following as a sequel to my SELECT DISTINCT
> >Performance Issue question. We would most appreciate any clue or
> >suggestions on how to overcome this show-stopping issue. We are using
> >8.0.3 on Windows.
> >
> >Is it a known limitation when using a view with SELECT ... LIMIT 1?
> >
> >Would the forthcoming performance enhancement with MAX help when used
> >within a view, as in:
> >
> >create or replace view VCurPlayer as select * from Player a
> >where a.AtDate = (select Max(b.AtDate) from Player b where a.PlayerID
=
> >b.PlayerID);

Here is a trick I use sometimes with views, etc.  This may or may not be
effective to solve your problem but it's worth a shot.  Create one small
SQL function taking date, etc. and returning the values and define it
immutable.  Now in-query it is treated like a constant.

Another useful application for this feature is when you have nested
views (view 1 queries view 2) and you need to filter records based on
fields from view 2 which are not returned in view 1.  Impossible?

in view 2 add clause where v2.f between  f_min() and f_max(), them being
immutable functions which can grab filter criteria based on inputs or
values from a table.

Merlin

Re: SELECT LIMIT 1 VIEW Performance Issue

От
K C Lau
Дата:
At 20:48 05/09/22, Simon Riggs wrote:
>On Thu, 2005-09-22 at 18:40 +0800, K C Lau wrote:
> > We use similar views as base views throughout our OLTP system to get the
> > latest time-based record(s). So it is quite impossible to use summary
> > tables etc. Are there other ways to do it?
> >
> > The subquery would pinpoint the record(s) with the composite primary key.
> > Both MS Sql and Oracle do not have such performance problem. So this
> > problem is effectively stopping us from migrating to PostgreSQL.
> >
> > Any suggestions would be most appreciated.
>
>Even if this were fixed for 8.1, which seems unlikely, would you be able
>to move to that release immediately?

Yes. In fact when we first developed our system a few years ago, we tested
on MS7.0, Oracle 8 and PG 7.1.1 and we did not hit that problem. When we
try again with PG 8.0, the performance becomes unbearable, but other areas
appear ok and other queries are often faster than MS Sql2k.

>Maybe its possible to reconstruct your query with sub-sub-selects so
>that you have a correlated query with manually pushed down clauses,
>which also references a more constant base view?

We would be most happy to try them if we have some example views or pointers.

>Is a 51ms query really such a problem for you?

Unfortunately yes, as our target performance is in the high hundreds of
transactions per sec. And 51 ms is already the best case for a single
select, with everything cached in memory immediately after the same select
which took 390 ms on a quiet system.

>Best Regards, Simon Riggs

Best regards,
KC.


Re: SELECT LIMIT 1 VIEW Performance Issue

От
K C Lau
Дата:
At 22:37 05/09/22, Merlin Moncure wrote:

> > >create or replace view VCurPlayer as select * from Player a
> > >where a.AtDate = (select Max(b.AtDate) from Player b where a.PlayerID=
> > >b.PlayerID);
>
>Here is a trick I use sometimes with views, etc.  This may or may not be
>effective to solve your problem but it's worth a shot.  Create one small
>SQL function taking date, etc. and returning the values and define it
>immutable.  Now in-query it is treated like a constant.

We don't use functions as a rule, but I would be glad to give it a try.
I would most appreciate if you could define a sample function and rewrite
the VCurPlayer view above. Both PlayerID and AtDate are varchar fields.

>Another useful application for this feature is when you have nested
>views (view 1 queries view 2) and you need to filter records based on
>fields from view 2 which are not returned in view 1.  Impossible?
>
>in view 2 add clause where v2.f between  f_min() and f_max(), them being
>immutable functions which can grab filter criteria based on inputs or
>values from a table.
>
>Merlin

Best regards,
KC.


Re: SELECT LIMIT 1 VIEW Performance Issue

От
Tom Lane
Дата:
K C Lau <kclau60@netvigator.com> writes:
> At 20:48 05/09/22, Simon Riggs wrote:
>> Even if this were fixed for 8.1, which seems unlikely, would you be able
>> to move to that release immediately?

> Yes. In fact when we first developed our system a few years ago, we tested
> on MS7.0, Oracle 8 and PG 7.1.1 and we did not hit that problem.

It's really not credible that PG 7.1 did any better with this than
current sources do.  The subplan mechanism hasn't changed materially
since about 6.5.  It could be that 7.1's performance was simply so
bad across the board that you didn't notice ...

            regards, tom lane

Re: SELECT LIMIT 1 VIEW Performance Issue

От
Simon Riggs
Дата:
On Thu, 2005-09-22 at 22:39 +0800, K C Lau wrote:
> >Is a 51ms query really such a problem for you?
>
> Unfortunately yes, as our target performance is in the high hundreds of
> transactions per sec. And 51 ms is already the best case for a single
> select, with everything cached in memory immediately after the same select
> which took 390 ms on a quiet system.

If the current value is used so often, use two tables - one with a
current view only of the row maintained using UPDATE. Different
performance issues maybe, but at least not correlated subquery ones.

Best Regards, Simon Riggs



Re: SELECT LIMIT 1 VIEW Performance Issue

От
"Merlin Moncure"
Дата:
> >Here is a trick I use sometimes with views, etc.  This may or may not
be
> >effective to solve your problem but it's worth a shot.  Create one
small
> >SQL function taking date, etc. and returning the values and define it
> >immutable.  Now in-query it is treated like a constant.
>
> We don't use functions as a rule, but I would be glad to give it a
try.
> I would most appreciate if you could define a sample function and
rewrite
> the VCurPlayer view above. Both PlayerID and AtDate are varchar
fields.

> esdt=> explain analyze select PlayerID,AtDate from Player a
>   where PlayerID='22220' and AtDate = (select b.AtDate from Player b
>   where b.PlayerID = '22220' order by b.PlayerID desc, b.AtDate desc
LIMIT 1
try:

create function player_max_at_date (varchar) returns date as
$$
    select atdate from player where playerid = $1 order by playerid
desc, AtDate desc limit 1;
$$ language sql immutable;

create view v as select playerid, player_max_at_date(playerid) from
player;
select * from v where playerid = 'x'; --etc

note: this function is not really immutable.  try with both 'immutable'
and 'stable' if performance is same, do stable.

You're welcome in advance, ;)
Merlin



Re: SELECT LIMIT 1 VIEW Performance Issue

От
"Kevin Grittner"
Дата:
Have you tried the "best choice" pattern -- where you select the set of
candidate rows and then exclude those for which a better choice
exists within the set?  I often get better results with this pattern than
with the alternatives.  Transmuting your query to use this patter gives:

select PlayerID,AtDate from Player a where PlayerID='22220'
  and not exists
    (select * from Player b
      where b.PlayerID = a.PlayerID and b.AtDate > a.AtDate);

>>> K C Lau <kclau60@netvigator.com> 09/21/05 11:21 PM >>>

select PlayerID,AtDate from Player a
  where PlayerID='22220' and AtDate = (select b.AtDate from Player b
  where b.PlayerID = a.PlayerID order by b.PlayerID desc, b.AtDate desc
LIMIT 1);


Re: SELECT LIMIT 1 VIEW Performance Issue

От
K C Lau
Дата:
Thank you all for your suggestions. I' tried, with some variations too, but
still no success. The times given are the best of a few repeated tries on
an 8.1 beta 2 db freshly migrated from 8.0.3 on Windows.

For reference, only the following gets the record quickly:

esdt=> explain analyze select PlayerID,AtDate from Player a
  where PlayerID='22220' and AtDate = (select b.AtDate from Player b
  where b.PlayerID = '22220' order by b.PlayerID desc, b.AtDate desc LIMIT 1);

  Index Scan using pk_player on player a  (cost=0.75..4.26 rows=1 width=23)
(actual time=0.054..0.057 rows=1 loops=1)
    Index Cond: (((playerid)::text = '22220'::text) AND ((atdate)::text =
($0)::text))
    InitPlan
      ->  Limit  (cost=0.00..0.75 rows=1 width=23) (actual
time=0.027..0.028 rows=1 loops=1)
            ->  Index Scan Backward using pk_player on player
b  (cost=0.00..1323.05 rows=1756 width=23) (actual time=0.023..0.023 rows=1
loops=1)
                  Index Cond: ((playerid)::text = '22220'::text)
  Total runtime: 0.132 ms

At 02:19 05/09/23, Kevin Grittner wrote:
>Have you tried the "best choice" pattern -- where you select the set of
>candidate rows and then exclude those for which a better choice
>exists within the set?  I often get better results with this pattern than
>with the alternatives.

esdt=> explain analyze select PlayerID,AtDate from Player a where
PlayerID='22220'
and not exists (select * from Player b where b.PlayerID = a.PlayerID and
b.AtDate > a.AtDate);

  Index Scan using pk_player on player a  (cost=0.00..3032.46 rows=878
width=23)
(actual time=35.820..35.823 rows=1 loops=1)
    Index Cond: ((playerid)::text = '22220'::text)
    Filter: (NOT (subplan))
    SubPlan
      ->  Index Scan using pk_player on player b  (cost=0.00..378.68
rows=389 width=776) (actual time=0.013..0.013 rows=1 loops=1743)
            Index Cond: (((playerid)::text = ($0)::text) AND
((atdate)::text > ($1)::text))
  Total runtime: 35.950 ms

Note that it is faster than the LIMIT 1:

esdt=> explain analyze select PlayerID,AtDate from Player a where
PlayerID='22220' and AtDate = (select b.AtDate from Pl
ayer b where b.PlayerID = a.PlayerID order by b.PlayerID desc, b.AtDate
desc LIMIT 1);

  Index Scan using pk_player on player a  (cost=0.00..2789.07 rows=9
width=23) (actual time=41.366..41.371 rows=1 loops=1)
    Index Cond: ((playerid)::text = '22220'::text)
    Filter: ((atdate)::text = ((subplan))::text)
    SubPlan
      ->  Limit  (cost=0.00..0.83 rows=1 width=23) (actual
time=0.013..0.014 rows=1 loops=1743)
            ->  Index Scan Backward using pk_player on player
b  (cost=0.00..970.53 rows=1166 width=23) (actual time=0.008..0.008 rows=1
loops=1743)
                  Index Cond: ((playerid)::text = ($0)::text)
  Total runtime: 41.490 ms

At 02:07 05/09/23, Merlin Moncure wrote:
> > >Here is a trick I use sometimes with views, etc.  This may or may not be
> > >effective to solve your problem but it's worth a shot.  Create one small
> > >SQL function taking date, etc. and returning the values and define it
> > >immutable.  Now in-query it is treated like a constant.

esdt=> create or replace function player_max_atdate (varchar(32)) returns
varchar(32) as $$
esdt$>  select atdate from player where playerid = $1 order by playerid
desc, AtDate desc limit 1;
esdt$> $$ language sql immutable;
CREATE FUNCTION
esdt=> create or replace view VCurPlayer3 as select * from Player where
AtDate = player_max_atdate(PlayerID);
CREATE VIEW
esdt=> explain analyze select PlayerID,AtDate from VCurPlayer3 where
PlayerID='22220';

  Index Scan using pk_player on player  (cost=0.00..1331.83 rows=9
width=23) (actual time=65.434..65.439 rows=1 loops=1)
    Index Cond: ((playerid)::text = '22220'::text)
    Filter: ((atdate)::text = (player_max_atdate(playerid))::text)
  Total runtime: 65.508 ms

While it says loops=1, the time suggests that it is going through all 1743
records for that PlayerID.

I tried to simulate the fast subquery inside the function, but it is taking
almost twice as much time:

esdt=> create or replace function player_max_atdate (varchar(32)) returns
varchar(32) as $$
esdt$>  select atdate from player a where playerid = $1 and AtDate =
(select b.AtDate from Player b
esdt$>  where b.PlayerID = $1 order by b.PlayerID desc, b.AtDate desc LIMIT 1);
esdt$> $$ language sql immutable;
CREATE FUNCTION
esdt=> explain analyze select PlayerID,AtDate from VCurPlayer3 where
PlayerID='22220';

  Index Scan using pk_player on player  (cost=0.00..1331.83 rows=9
width=23) (actual time=119.369..119.373 rows=1 loops=1)
    Index Cond: ((playerid)::text = '22220'::text)
    Filter: ((atdate)::text = (player_max_atdate(playerid))::text)
  Total runtime: 119.441 ms

Adding another LIMIT 1 inside the function makes it even slower:

esdt=> create or replace function player_max_atdate (varchar(32)) returns
varchar(32) as $$
esdt$>  select atdate from player where playerid = $1 and AtDate = (select
b.AtDate from Player b
esdt$>  where b.PlayerID = $1 order by b.PlayerID desc, b.AtDate desc LIMIT 1)
esdt$>  order by PlayerID desc, AtDate desc LIMIT 1;
esdt$> $$ language sql immutable;
CREATE FUNCTION
esdt=> explain analyze select PlayerID,AtDate from VCurPlayer3 where
PlayerID='22220';

  Index Scan using pk_player on player  (cost=0.00..1331.83 rows=9
width=23) (actual time=129.858..129.863 rows=1 loops=1)
    Index Cond: ((playerid)::text = '22220'::text)
    Filter: ((atdate)::text = (player_max_atdate(playerid))::text)
  Total runtime: 129.906 ms

At 00:16 05/09/23, Simon Riggs wrote:
>If the current value is used so often, use two tables - one with a
>current view only of the row maintained using UPDATE. Different
>performance issues maybe, but at least not correlated subquery ones.

Many of our tables have similar construct and it would be a huge task to
duplicate and maintain all these tables throughout the system. We would
prefer a solution with SQL or function at the view or db level, or better
still, a fix, if this problem is considered general enough.

>You're welcome in advance, ;)
>Merlin

Thank you all in advance for any further ideas.
KC.


Re: SELECT LIMIT 1 VIEW Performance Issue

От
Simon Riggs
Дата:
On Thu, 2005-09-22 at 18:40 +0800, K C Lau wrote:
> > > esdt=> explain analyze select PlayerID,AtDate from Player a
> > >   where PlayerID='22220' and AtDate = (select b.AtDate from Player b
> > >   where b.PlayerID = a.PlayerID order by b.PlayerID desc, b.AtDate desc
> > > LIMIT 1);

I think you should try:

select distinct on (PlayerID) PlayerID,AtDate from Player a
where PlayerID='22220' order by PlayerId, AtDate Desc;

Does that work for you?

Best Regards, Simon Riggs


Re: SELECT LIMIT 1 VIEW Performance Issue

От
K C Lau
Дата:
At 19:15 05/09/23, Simon Riggs wrote:
>select distinct on (PlayerID) PlayerID,AtDate from Player a
>where PlayerID='22220' order by PlayerId, AtDate Desc;
>
>Does that work for you?
>
>Best Regards, Simon Riggs

esdt=> explain analyze select distinct on (PlayerID) PlayerID,AtDate from
Player a where PlayerID='22220' order by PlayerId, AtDate Desc;
  Unique  (cost=1417.69..1426.47 rows=2 width=23) (actual
time=31.231..36.609 rows=1 loops=1)
    ->  Sort  (cost=1417.69..1422.08 rows=1756 width=23) (actual
time=31.129..32.473 rows=1743 loops=1)
          Sort Key: playerid, atdate
          ->  Index Scan using pk_player on player a  (cost=0.00..1323.05
rows=1756 width=23) (actual time=0.035..6.575 rows=1743 loops=1)
                Index Cond: ((playerid)::text = '22220'::text)
  Total runtime: 36.943 ms

The sort was eliminated with: order by PlayerId Desc, AtDate Desc:

esdt=> explain analyze select distinct on (PlayerID) PlayerID,AtDate from
Player a where PlayerID='22220' order by PlayerId Desc, AtDate Desc;
  Unique  (cost=0.00..1327.44 rows=2 width=23) (actual time=0.027..8.438
rows=1 loops=1)
    ->  Index Scan Backward using pk_player on player
a  (cost=0.00..1323.05 rows=1756 width=23) (actual time=0.022..4.950
rows=1743 loops=1)
          Index Cond: ((playerid)::text = '22220'::text)
  Total runtime: 8.499 ms

That is the fastest of all queries looping the 1743 rows.
I do get the desired result by adding LIMIT 1:

esdt=> explain analyze select distinct on (PlayerID) PlayerID,AtDate from
Player a where PlayerID='22220' order by PlayerId Desc, AtDate Desc LIMIT 1;

  Limit  (cost=0.00..663.72 rows=1 width=23) (actual time=0.032..0.033
rows=1 loops=1)
    ->  Unique  (cost=0.00..1327.44 rows=2 width=23) (actual
time=0.028..0.028 rows=1 loops=1)
          ->  Index Scan Backward using pk_player on player
a  (cost=0.00..1323.05 rows=1756 width=23) (actual time=0.022..0.022 rows=1
loops=1)
                Index Cond: ((playerid)::text = '22220'::text)
  Total runtime: 0.094 ms

However, when I use that within a function in a view, it is slow again:

esdt=> create or replace function player_max_atdate (varchar(32)) returns
varchar(32) as $$
esdt$>  select distinct on (PlayerID) AtDate from player where PlayerID= $1
order by PlayerID desc, AtDate desc limit 1;
esdt$> $$ language sql immutable;
CREATE FUNCTION
esdt=> create or replace view VCurPlayer3 as select * from Player where
AtDate = player_max_atdate(PlayerID);
CREATE VIEW
esdt=> explain analyze select PlayerID,AtDate from VCurPlayer3 where
PlayerID='22220';

  Index Scan using pk_player on player  (cost=0.00..1331.83 rows=9
width=23) (actual time=76.660..76.664 rows=1 loops=1)
    Index Cond: ((playerid)::text = '22220'::text)
    Filter: ((atdate)::text = (player_max_atdate(playerid))::text)
  Total runtime: 76.716 ms

Why wouldn't the function get the row as quickly as the direct sql does?

Best regards, KC.



Re: SELECT LIMIT 1 VIEW Performance Issue

От
"Merlin Moncure"
Дата:
> At 02:07 05/09/23, Merlin Moncure wrote:
> > > >Here is a trick I use sometimes with views, etc.  This may or may
not
> be
> > > >effective to solve your problem but it's worth a shot.  Create
one
> small
> > > >SQL function taking date, etc. and returning the values and
define it
> > > >immutable.  Now in-query it is treated like a constant.
>
> esdt=> create or replace function player_max_atdate (varchar(32))
returns
> varchar(32) as $$
> esdt$>  select atdate from player where playerid = $1 order by
playerid
> desc, AtDate desc limit 1;
> esdt$> $$ language sql immutable;

Can you time just the execution of this function and compare vs. pure
SQL version?  If the times are different, can you do a exaplain analyze
of a prepared version of above?

prepare test(character varying) as select atdate from player where
playerid = $1 order by playerid desc, AtDate desc limit 1;

explain analyze execute test('22220');

> CREATE FUNCTION
> esdt=> create or replace view VCurPlayer3 as select * from Player
where
> AtDate = player_max_atdate(PlayerID);
> CREATE VIEW

This is wrong, it should have been
create or replace view VCurPlayer3 as select *,
player_max_atdate(PlayerID) as max_date  from Player;

I did a test on a table with 124k records and a two part key, ID & date.
esp# select count(*) from parts_order_file;
count
--------
 124158
(1 row)


esp=# select count(*) from parts_order_file where pr_dealer_no =
'000500';
 count
-------
 27971
(1 row)

created same function, view v, etc.
esp=# explain analyze select * from v where pr_dealer_no = '000500'
limit 1;

QUERY PLAN

------------------------------------------------------------------------
----------------------------
----------------------------------------------------------------
 Limit  (cost=0.00..3.87 rows=1 width=10) (actual time=1.295..1.297
rows=1 loops=1)
   ->  Index Scan using parts_order_file_pr_dealer_no_key on
parts_order_file  (cost=0.00..109369.15
 rows=28226 width=10) (actual time=1.287..1.287 rows=1 loops=1)
         Index Cond: (pr_dealer_no = '000500'::bpchar)
 Total runtime: 1.413 ms
(4 rows)

Something is not jiving here.  However, if the server plan still does
not come out correct, try the following (p.s. why is function returning
varchar(32) and not date?):

create or replace function player_max_atdate (varchar(32)) returns date
as
$$
    DECLARE
       player_record record;
       return date date;
    BEGIN
       for player_record in execute
           'select atdate from player where playerid = \'' || $1 || '\'
order by playerid desc, AtDate desc limit 1;' loop
           return_date = player_record.atdate;
       end loop;

       return return_date;
    END;
$ language plpgsql immutable;

Merlin

Re: SELECT LIMIT 1 VIEW Performance Issue

От
K C Lau
Дата:
Dear Merlin,

At 20:34 05/09/23, Merlin Moncure wrote:
>Can you time just the execution of this function and compare vs. pure
>SQL version?  If the times are different, can you do a exaplain analyze
>of a prepared version of above?

esdt=> prepare test(character varying) as select atdate from player where
esdt-> playerid = $1 order by playerid desc, AtDate desc limit 1;
PREPARE
esdt=> explain analyze execute test('22220');
  Limit  (cost=0.00..0.83 rows=1 width=23) (actual time=0.032..0.033 rows=1
loops=1)
    ->  Index Scan Backward using pk_player on player  (cost=0.00..970.53
rows=1166 width=23) (actual time=0.027..0.027 rows=1 loops=1)
          Index Cond: ((playerid)::text = ($1)::text)
  Total runtime: 0.088 ms

The prepared SQL timing is similar to that of a direct SQL.

> > esdt=> create or replace view VCurPlayer3 as select * from Player where
> > AtDate = player_max_atdate(PlayerID);
>
>This is wrong, it should have been
>create or replace view VCurPlayer3 as select *,
>player_max_atdate(PlayerID) as max_date  from Player;

Your suggestion returns all the records plus a max AtDate column for each
PlayerID.
What I want to get with the view is the record that has the max value of
AtDate for each PlayerID.
The AtDate is a varchar(23) field containing a string date of format
'yyyymmddhh', not the SQL Date field. Sorry if that confused you.

>Something is not jiving here.  However, if the server plan still does
>not come out correct, try the following (p.s. why is function returning
>varchar(32) and not date?):

esdt=> create or replace function player_max_atdate (varchar(32)) returns
varchar(32) as $$
esdt$>     DECLARE
esdt$>        player_record record;
esdt$>        return_date varchar(32);
esdt$>     BEGIN
esdt$>        for player_record in execute
esdt$>            'select atdate from player where playerid = \'' || $1 ||
'\' order by playerid desc, AtDate desc limit 1;' loop
esdt$>            return_date = player_record.atdate;
esdt$>        end loop;
esdt$>        return return_date;
esdt$>     END;
esdt$> $$ language plpgsql immutable;
CREATE FUNCTION
esdt=> create or replace view VCurPlayer3 as select * from Player where
AtDate = player_max_atdate(PlayerID);
CREATE VIEW
esdt=> explain analyze select PlayerID,AtDate from VCurPlayer3 where
PlayerID='22220';

  Index Scan using pk_player on player  (cost=0.00..1331.83 rows=9
width=23) (actual time=849.021..849.025 rows=1 loops=1)
    Index Cond: ((playerid)::text = '22220'::text)
    Filter: ((atdate)::text = (player_max_atdate(playerid))::text)
  Total runtime: 849.078 ms

Your suggested plpgsql function seems to be even slower, with a best time
of 849 ms after several tries. Is that expected?

Thanks again and best regards,
KC.


Re: SELECT LIMIT 1 VIEW Performance Issue

От
Mark Kirkwood
Дата:
K C Lau wrote:
> Thank you all for your suggestions. I' tried, with some variations too,
> but still no success. The times given are the best of a few repeated
> tries on an 8.1 beta 2 db freshly migrated from 8.0.3 on Windows.
>

A small denormalization, where you mark the row with the latest atdate
for each playerid may get you the performance you want.

e.g: (8.1beta1)

ALTER TABLE player ADD islastatdate boolean;

UPDATE player SET islastatdate = true where (playerid,atdate) IN
(SELECT playerid, atdate FROM vcurplayer);

CREATE OR REPLACE VIEW vcurplayer AS
SELECT * FROM player a
WHERE islastatdate;

CREATE INDEX player_id_lastatdate ON player(playerid, islastatdate)
WHERE islastatdate;

ANALYZE player;

Generating some test data produced:

EXPLAIN ANALYZE
SELECT playerid,atdate
FROM vcurplayer
WHERE playerid='22220';

                                                            QUERY PLAN


--------------------------------------------------------------------------------------------------------------------------------
  Index Scan using player_id_lastatdate on player a  (cost=0.00..4.33
rows=1 width=13) (actual time=0.142..0.149 rows=1 loops=1)
    Index Cond: ((playerid = '22220'::text) AND (lastatdate = true))
    Filter: lastatdate
  Total runtime: 0.272 ms
(4 rows)

Whereas with the original view definition:

CREATE OR REPLACE VIEW vcurplayer AS
SELECT * FROM player a
WHERE a.atdate =
( SELECT max(b.atdate) FROM player b
   WHERE a.playerid = b.playerid);

EXPLAIN ANALYZE
SELECT playerid,atdate
FROM vcurplayer
WHERE playerid='22220';

     QUERY PLAN


-----------------------------------------------------------------------------------------------------------------------------------------------------------------
  Index Scan using player_id_date on player a  (cost=0.00..7399.23
rows=11 width=13) (actual time=121.738..121.745 rows=1 loops=1)
    Index Cond: (playerid = '22220'::text)
    Filter: (atdate = (subplan))
    SubPlan
      ->  Result  (cost=1.72..1.73 rows=1 width=0) (actual
time=0.044..0.047 rows=1 loops=2000)
            InitPlan
              ->  Limit  (cost=0.00..1.72 rows=1 width=4) (actual
time=0.028..0.031 rows=1 loops=2000)
                    ->  Index Scan Backward using player_id_date on
player b  (cost=0.00..3787.94 rows=2198 width=4) (actual
time=0.019..0.019 rows=1 loops=2000)
                          Index Cond: ($0 = playerid)
                          Filter: (atdate IS NOT NULL)
  Total runtime: 121.916 ms
(11 rows)

Note that my generated data has too many rows for each playerid, but the
  difference in performance should illustrate the idea.

Cheers

Mark

Re: SELECT LIMIT 1 VIEW Performance Issue

От
K C Lau
Дата:
Dear Mark,

Thank you. That seems like a more manageable alternative if nothing else
works out. It should cover many of the OLTP update transactions. But it
does mean quite a bit of programming changes and adding another index on
all such tables, and it would not cover those cases when we need to get the
latest record before a certain time, for example.

I'm wondering if this performance issue is common enough for other users to
merit a fix in pg, especially as it seems that with MVCC, each of the data
records need to be accessed in addition to scanning the index.

Best regards,
KC.

At 09:40 05/09/24, Mark Kirkwood wrote:
>A small denormalization, where you mark the row with the latest atdate for
>each playerid may get you the performance you want.
>
>e.g: (8.1beta1)
>
>ALTER TABLE player ADD islastatdate boolean;
>
>UPDATE player SET islastatdate = true where (playerid,atdate) IN
>(SELECT playerid, atdate FROM vcurplayer);
>
>CREATE OR REPLACE VIEW vcurplayer AS
>SELECT * FROM player a
>WHERE islastatdate;
>
>CREATE INDEX player_id_lastatdate ON player(playerid, islastatdate)
>WHERE islastatdate;
>
>ANALYZE player;
>
>Generating some test data produced:
>
>EXPLAIN ANALYZE
>SELECT playerid,atdate
>FROM vcurplayer
>WHERE playerid='22220';
>
>                                                            QUERY PLAN

>--------------------------------------------------------------------------------------------------------------------------------
>  Index Scan using player_id_lastatdate on player a  (cost=0.00..4.33
> rows=1 width=13) (actual time=0.142..0.149 rows=1 loops=1)
>    Index Cond: ((playerid = '22220'::text) AND (lastatdate = true))
>    Filter: lastatdate
>  Total runtime: 0.272 ms
>(4 rows)
>
>Whereas with the original view definition:
>
>CREATE OR REPLACE VIEW vcurplayer AS
>SELECT * FROM player a
>WHERE a.atdate =
>( SELECT max(b.atdate) FROM player b
>   WHERE a.playerid = b.playerid);
>
>EXPLAIN ANALYZE
>SELECT playerid,atdate
>FROM vcurplayer
>WHERE playerid='22220';
>     QUERY PLAN

>-----------------------------------------------------------------------------------------------------------------------------------------------------------------
>  Index Scan using player_id_date on player a  (cost=0.00..7399.23 rows=11
> width=13) (actual time=121.738..121.745 rows=1 loops=1)
>    Index Cond: (playerid = '22220'::text)
>    Filter: (atdate = (subplan))
>    SubPlan
>      ->  Result  (cost=1.72..1.73 rows=1 width=0) (actual
> time=0.044..0.047 rows=1 loops=2000)
>            InitPlan
>              ->  Limit  (cost=0.00..1.72 rows=1 width=4) (actual
> time=0.028..0.031 rows=1 loops=2000)
>                    ->  Index Scan Backward using player_id_date on player
> b  (cost=0.00..3787.94 rows=2198 width=4) (actual time=0.019..0.019
> rows=1 loops=2000)
>                          Index Cond: ($0 = playerid)
>                          Filter: (atdate IS NOT NULL)
>  Total runtime: 121.916 ms
>(11 rows)
>
>Note that my generated data has too many rows for each playerid, but
>the  difference in performance should illustrate the idea.
>
>Cheers
>
>Mark


Re: SELECT LIMIT 1 VIEW Performance Issue

От
Mark Kirkwood
Дата:
K C Lau wrote:

> I'm wondering if this performance issue is common enough for other users
> to merit a fix in pg, especially as it seems that with MVCC, each of the
> data records need to be accessed in addition to scanning the index.
>

Yes - there are certainly cases where index only access (or something
similar, like b+tree tables) would be highly desirable.

 From what I have understood from previous discussions, there are
difficulties involved with producing a design that does not cause new
problems...

regards

Mark

Re: SELECT LIMIT 1 VIEW Performance Issue

От
K C Lau
Дата:
At 20:17 05/09/23, K C Lau wrote:
>At 19:15 05/09/23, Simon Riggs wrote:
>>select distinct on (PlayerID) PlayerID,AtDate from Player a
>>where PlayerID='22220' order by PlayerId, AtDate Desc;
>>
>>Does that work for you?
>>
>>Best Regards, Simon Riggs
>
>esdt=> explain analyze select distinct on (PlayerID) PlayerID,AtDate from
>Player a where PlayerID='22220' order by PlayerId Desc, AtDate Desc;
>  Unique  (cost=0.00..1327.44 rows=2 width=23) (actual time=0.027..8.438
> rows=1 loops=1)
>    ->  Index Scan Backward using pk_player on player
> a  (cost=0.00..1323.05 rows=1756 width=23) (actual time=0.022..4.950
> rows=1743 loops=1)
>          Index Cond: ((playerid)::text = '22220'::text)
>  Total runtime: 8.499 ms
>
>That is the fastest of all queries looping the 1743 rows.
>I do get the desired result by adding LIMIT 1:
>
>esdt=> explain analyze select distinct on (PlayerID) PlayerID,AtDate from
>Player a where PlayerID='22220' order by PlayerId Desc, AtDate Desc LIMIT 1;
>
>  Limit  (cost=0.00..663.72 rows=1 width=23) (actual time=0.032..0.033
> rows=1 loops=1)
>    ->  Unique  (cost=0.00..1327.44 rows=2 width=23) (actual
> time=0.028..0.028 rows=1 loops=1)
>          ->  Index Scan Backward using pk_player on player
> a  (cost=0.00..1323.05 rows=1756 width=23) (actual time=0.022..0.022
> rows=1 loops=1)
>                Index Cond: ((playerid)::text = '22220'::text)
>  Total runtime: 0.094 ms
>
>However, when I use that within a function in a view, it is slow again:
>
>esdt=> create or replace function player_max_atdate (varchar(32)) returns
>varchar(32) as $$
>esdt$>  select distinct on (PlayerID) AtDate from player where PlayerID=
>$1 order by PlayerID desc, AtDate desc limit 1;
>esdt$> $$ language sql immutable;
>CREATE FUNCTION
>esdt=> create or replace view VCurPlayer3 as select * from Player where
>AtDate = player_max_atdate(PlayerID);
>CREATE VIEW
>esdt=> explain analyze select PlayerID,AtDate from VCurPlayer3 where
>PlayerID='22220';
>
>  Index Scan using pk_player on player  (cost=0.00..1331.83 rows=9
> width=23) (actual time=76.660..76.664 rows=1 loops=1)
>    Index Cond: ((playerid)::text = '22220'::text)
>    Filter: ((atdate)::text = (player_max_atdate(playerid))::text)
>  Total runtime: 76.716 ms
>
>Why wouldn't the function get the row as quickly as the direct sql does?

Results from the following query suggests that the explain analyze output
above only tells half the story, and that the function is in fact called
1743 times:

esdt=> create or replace view VCurPlayer3 as select distinct on (PlayerID)
* from Player a where OID = (select distinct on (PlayerID) OID from Player
b where b.PlayerID = a.PlayerID and b.AtDate =
player_max_atdate(b.PlayerID) order by PlayerID desc, AtDate desc limit 1)
order by PlayerId Desc, AtDate desc;
CREATE VIEW
esdt=> explain analyze select PlayerID,AtDate from VCurPlayer3 where
PlayerID='22220';
  Subquery Scan vcurplayer3  (cost=0.00..1715846.91 rows=1 width=68)
(actual time=0.640..119.124 rows=1 loops=1)
    ->  Unique  (cost=0.00..1715846.90 rows=1 width=776) (actual
time=0.633..119.112 rows=1 loops=1)
          ->  Index Scan Backward using pk_player on player
a  (cost=0.00..1715846.88 rows=9 width=776) (actual time=0.628..119.104
rows=1 loops=1)
                Index Cond: ((playerid)::text = '22220'::text)
                Filter: (oid = (subplan))
                SubPlan
                  ->  Limit  (cost=0.00..976.38 rows=1 width=27) (actual
time=0.057..0.058 rows=1 loops=1743)
                        ->  Unique  (cost=0.00..976.38 rows=1 width=27)
(actual time=0.052..0.052 rows=1 loops=1743)
                              ->  Index Scan Backward using pk_player on
player b  (cost=0.00..976.36 rows=6 width=27) (actual  time=0.047..0.047
rows=1 loops=1743)
                                    Index Cond: ((playerid)::text = ($0)::text)
                                    Filter: ((atdate)::text =
(player_max_atdate(playerid))::text)
  Total runtime: 119.357 ms

It would also explain the very long time taken by the pl/pgsql function I
posted a bit earlier.

So I guess it all comes back to the basic question:

For the query select distinct on (PlayerID) * from Player a where
PlayerID='22220' order by PlayerId Desc, AtDate Desc;
can the optimizer recognise the fact the query is selecting by the primary
key (PlayerID,AtDate), so it can skip the remaining rows for that PlayerID,
as if LIMIT 1 is implied?

Best regards, KC.



Re: SELECT LIMIT 1 VIEW Performance Issue

От
"Jim C. Nasby"
Дата:
On Fri, Sep 23, 2005 at 08:17:03PM +0800, K C Lau wrote:
> esdt=> create or replace function player_max_atdate (varchar(32)) returns
> varchar(32) as $$
> esdt$>  select distinct on (PlayerID) AtDate from player where PlayerID= $1
> order by PlayerID desc, AtDate desc limit 1;
> esdt$> $$ language sql immutable;
> CREATE FUNCTION

That function is not immutable, it should be defined as stable.

> esdt=> create or replace view VCurPlayer3 as select * from Player where
> AtDate = player_max_atdate(PlayerID);
> CREATE VIEW
> esdt=> explain analyze select PlayerID,AtDate from VCurPlayer3 where
> PlayerID='22220';
>
>  Index Scan using pk_player on player  (cost=0.00..1331.83 rows=9
> width=23) (actual time=76.660..76.664 rows=1 loops=1)
>    Index Cond: ((playerid)::text = '22220'::text)
>    Filter: ((atdate)::text = (player_max_atdate(playerid))::text)
>  Total runtime: 76.716 ms
>
> Why wouldn't the function get the row as quickly as the direct sql does?

PostgreSQL doesn't pre-compile functions, at least not until 8.1 (and
I'm not sure how much those are pre-compiled, though they are
syntax-checked at creation). Do you get the same result time when you
run it a second time? What time do you get from running just the
function versus the SQL in the function?

Also, remember that every layer you add to the cake means more work for
the database. If speed is that highly critical you'll probably want to
not wrap things in functions, and possibly not use views either.

Also, keep in mind that getting below 1ms doesn't automatically mean
you'll be able to scale to 1000TPS. Things will definately change when
you load the system down, so if performance is that critical you should
start testing with the system under load if you're not already.
--
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461

Re: SELECT LIMIT 1 VIEW Performance Issue

От
"Jim C. Nasby"
Дата:
On Fri, Sep 23, 2005 at 04:53:55PM +0800, K C Lau wrote:
> Thank you all for your suggestions. I' tried, with some variations too, but
> still no success. The times given are the best of a few repeated tries on
> an 8.1 beta 2 db freshly migrated from 8.0.3 on Windows.
>
> For reference, only the following gets the record quickly:
>
> esdt=> explain analyze select PlayerID,AtDate from Player a
>  where PlayerID='22220' and AtDate = (select b.AtDate from Player b
>  where b.PlayerID = '22220' order by b.PlayerID desc, b.AtDate desc LIMIT
>  1);
>
>  Index Scan using pk_player on player a  (cost=0.75..4.26 rows=1 width=23)
> (actual time=0.054..0.057 rows=1 loops=1)
>    Index Cond: (((playerid)::text = '22220'::text) AND ((atdate)::text =
> ($0)::text))
>    InitPlan
>      ->  Limit  (cost=0.00..0.75 rows=1 width=23) (actual
> time=0.027..0.028 rows=1 loops=1)
>            ->  Index Scan Backward using pk_player on player
> b  (cost=0.00..1323.05 rows=1756 width=23) (actual time=0.023..0.023 rows=1
> loops=1)
>                  Index Cond: ((playerid)::text = '22220'::text)
>  Total runtime: 0.132 ms

If you're doing that, you should try something like the following:
decibel=# explain analyze select  * from t where ctid=(select ctid from rrs order by rrs_id desc limit 1);
                                                                 QUERY PLAN
                     

---------------------------------------------------------------------------------------------------------------------------------------------
 Tid Scan on t  (cost=0.44..4.45 rows=1 width=42) (actual time=0.750..0.754 rows=1 loops=1)
   Filter: (ctid = $0)
   InitPlan
     ->  Limit  (cost=0.00..0.44 rows=1 width=10) (actual time=0.548..0.549 rows=1 loops=1)
           ->  Index Scan Backward using rrs_rrs__rrs_id on rrs  (cost=0.00..3.08 rows=7 width=10) (actual
time=0.541..0.541rows=1 loops=1) 
 Total runtime: 1.061 ms
(6 rows)

decibel=# select count(*) from t; count
--------
 458752

Note that that's on my nice slow laptop to boot (the count took like 10
seconds).

Just remember that ctid *is not safe outside of a transaction*!! So you can't
do something like

SELECT ctid FROM ...
store that in some variable...
SELECT * FROM table WHERE ctid = variable
--
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461

Re: SELECT LIMIT 1 VIEW Performance Issue

От
"Jim C. Nasby"
Дата:
On Tue, Oct 04, 2005 at 04:15:41PM -0500, Jim C. Nasby wrote:
> >    Index Cond: ((playerid)::text = '22220'::text)

Also, why is playerid a text field? Comparing ints will certainly be
faster...
--
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461

Re: SELECT LIMIT 1 VIEW Performance Issue

От
"Merlin Moncure"
Дата:
[to K C:] sorry, was out on vactation all last week.  I was visualizing
the problem incorrectly anyways...

Jim wrote:
> That function is not immutable, it should be defined as stable.

That is 100% correct: however now and then I declare stable functions as
immutable in some cases because the planner treats them differently with
no side effects...this is a hack of course...see my earlier suggestion
to try both immutable and stable versions.  I can give a pretty good
example of when this can make a big difference.

> PostgreSQL doesn't pre-compile functions, at least not until 8.1 (and
> I'm not sure how much those are pre-compiled, though they are
> syntax-checked at creation). Do you get the same result time when you
> run it a second time? What time do you get from running just the
> function versus the SQL in the function?

plpgsql functions are at least partially compiled (sql functions afaik
are not), in that a internal state is generated following the first
execution.  This is the cause of all those infernal 'invalid table oid'
errors.

> Also, remember that every layer you add to the cake means more work
for
> the database. If speed is that highly critical you'll probably want to
> not wrap things in functions, and possibly not use views either.

The overhead of the function/view is totally inconsequential next to the
planner choosing a suboptimal plan.  The purpose of the function is to
coerce the planner into choosing the correct plan.

> Also, keep in mind that getting below 1ms doesn't automatically mean
> you'll be able to scale to 1000TPS. Things will definately change when
> you load the system down, so if performance is that critical you
should
> start testing with the system under load if you're not already.


Re: SELECT LIMIT 1 VIEW Performance Issue

От
"Merlin Moncure"
Дата:
KC wrote:
>
> So I guess it all comes back to the basic question:
>
> For the query select distinct on (PlayerID) * from Player a where
> PlayerID='22220' order by PlayerId Desc, AtDate Desc;
> can the optimizer recognise the fact the query is selecting by the
primary
> key (PlayerID,AtDate), so it can skip the remaining rows for that
> PlayerID,
> as if LIMIT 1 is implied?
>
> Best regards, KC.

Hi KC, have you tried:
select * from player where playerid = '22220' and atdate < 9999999999
order by platerid desc, atdate desc limit 1;

??
Merlin

Re: SELECT LIMIT 1 VIEW Performance Issue

От
K C Lau
Дата:
Dear Merlin and all,

That direct SQL returns in 0 ms. The problem only appears when a view is used.

What we've done to work around this problem is to modify the table to add a
field DataStatus which is set to 1 for the latest record for each player,
and reset to 0 when it is superceded.

A partial index is then created as:
CREATE INDEX IDX_CurPlayer on Player (PlayerID) where DataStatus = 1;

The VCurPlayer view is changed to:
CREATE or REPLACE VIEW VCurPlayer as select * from Player where DataStatus = 1;
and it now returns in 0 ms.

This is not the best solution, but until (if ever) the original problem is
fixed, we have not found an alternative work around.

The good news is that even with the additional overhead of maintaining an
extra index and the problem of vacuuming, pg 8.0.3 still performs
significantly faster on Windows than MS Sql 2000 in our OLTP application
testing so far.

Thanks to all for your help.

Best regards,
KC.

At 20:14 05/10/12, you wrote:
>KC wrote:
> >
> > So I guess it all comes back to the basic question:
> >
> > For the query select distinct on (PlayerID) * from Player a where
> > PlayerID='22220' order by PlayerId Desc, AtDate Desc;
> > can the optimizer recognise the fact the query is selecting by the
>primary
> > key (PlayerID,AtDate), so it can skip the remaining rows for that
> > PlayerID,
> > as if LIMIT 1 is implied?
> >
> > Best regards, KC.
>
>Hi KC, have you tried:
>select * from player where playerid = '22220' and atdate < 9999999999
>order by platerid desc, atdate desc limit 1;
>
>??
>Merlin