Обсуждение: unexpected stable function behavior

Поиск
Список
Период
Сортировка

unexpected stable function behavior

От
Julius Tuskenis
Дата:
Hello, list

Our company is creating a ticketing system. Of course the performance
issues are very important to us (as to all of you I guess). To increase
speed of some queries stable functions are used, but somehow they don't
act exactly as I expect, so would you please explain what am I doing (or
expecting) wrong...

First of all I have the stable function witch runs fast and I have no
problems with it at all.
CREATE OR REPLACE FUNCTION web_select_extra_price(prm_price_id integer,
prm_event_id integer, prm_cashier_id integer)
   RETURNS numeric AS
'
........ some code here
'
   LANGUAGE plpgsql STABLE
   COST 100;

Now the test:

1) query without using the function
explain analyze
   SELECT thtp_tick_id, price_id,
     price_price,
     price_color
   FROM ticket_price
     JOIN ticket_has_ticket_price ON (price_id = thtp_price_id)
   WHERE price_event_id = 7820 and (current_timestamp >= price_date AND
current_timestamp <= price_date_till)
   ORDER BY price_id;

Result:
"Sort  (cost=132.47..133.77 rows=518 width=25) (actual time=5.125..5.842
rows=4335 loops=1)"
"  Sort Key: ticket_price.price_id"
"  Sort Method:  quicksort  Memory: 433kB"
"  ->  Nested Loop  (cost=0.00..109.12 rows=518 width=25) (actual
time=0.037..3.148 rows=4335 loops=1)"
"        ->  Index Scan using index_price_event_id on ticket_price
(cost=0.00..8.52 rows=2 width=21) (actual time=0.014..0.026 rows=7 loops=1)"
"              Index Cond: (price_event_id = 7820)"
"              Filter: ((now() >= price_date) AND (now() <=
price_date_till))"
"        ->  Index Scan using idx_thtp_price_id on
ticket_has_ticket_price  (cost=0.00..47.06 rows=259 width=8) (actual
time=0.013..0.211 rows=619 loops=7)"
"              Index Cond: (ticket_has_ticket_price.thtp_price_id =
ticket_price.price_id)"
"Total runtime: 6.425 ms"


2) Query using the function
explain analyze
   SELECT thtp_tick_id, price_id,
     price_price, web_select_extra_price(price_id, price_event_id, 1),
     price_color
   FROM ticket_price
     JOIN ticket_has_ticket_price ON (price_id = thtp_price_id)
   WHERE price_event_id = 7820 and (current_timestamp >= price_date AND
current_timestamp <= price_date_till)
   ORDER BY price_id;

Result:
"Sort  (cost=261.97..263.27 rows=518 width=29) (actual
time=704.224..704.927 rows=4335 loops=1)"
"  Sort Key: ticket_price.price_id"
"  Sort Method:  quicksort  Memory: 433kB"
"  ->  Nested Loop  (cost=0.00..238.62 rows=518 width=29) (actual
time=0.272..699.073 rows=4335 loops=1)"
"        ->  Index Scan using index_price_event_id on ticket_price
(cost=0.00..8.52 rows=2 width=25) (actual time=0.011..0.052 rows=7 loops=1)"
"              Index Cond: (price_event_id = 7820)"
"              Filter: ((now() >= price_date) AND (now() <=
price_date_till))"
"        ->  Index Scan using idx_thtp_price_id on
ticket_has_ticket_price  (cost=0.00..47.06 rows=259 width=8) (actual
time=0.017..0.582 rows=619 loops=7)"
"              Index Cond: (ticket_has_ticket_price.thtp_price_id =
ticket_price.price_id)"
"Total runtime: 705.531 ms"


Now what you can think is that executing web_select_extra_price takes
the difference, but
3) As STABLE function should be executed once for every different set of
parameters I do
SELECT web_select_extra_price(price_id, 7820, 1) FROM (

   SELECT distinct price_id
   FROM ticket_price
     JOIN ticket_has_ticket_price ON (price_id = thtp_price_id)
   WHERE price_event_id = 7820 and (current_timestamp >= price_date AND
current_timestamp <= price_date_till)
  ) as qq;

Result:
"Subquery Scan on qq  (cost=110.34..110.88 rows=2 width=4) (actual
time=7.265..8.907 rows=7 loops=1)"
"  ->  HashAggregate  (cost=110.34..110.36 rows=2 width=4) (actual
time=6.866..6.873 rows=7 loops=1)"
"        ->  Nested Loop  (cost=0.00..109.05 rows=517 width=4) (actual
time=0.037..4.643 rows=4335 loops=1)"
"              ->  Index Scan using index_price_event_id on
ticket_price  (cost=0.00..8.52 rows=2 width=4) (actual time=0.014..0.038
rows=7 loops=1)"
"                    Index Cond: (price_event_id = 7820)"
"                    Filter: ((now() >= price_date) AND (now() <=
price_date_till))"
"              ->  Index Scan using idx_thtp_price_id on
ticket_has_ticket_price  (cost=0.00..47.04 rows=258 width=4) (actual
time=0.019..0.336 rows=619 loops=7)"
"                    Index Cond: (ticket_has_ticket_price.thtp_price_id
= ticket_price.price_id)"
"Total runtime: 8.966 ms"


You can see the query has only 7 distinct parameter sets to pass to the
function but...
4)   Explain analyze
   SELECT web_select_extra_price(price_id, 7820, 1)
   FROM ticket_price
     JOIN ticket_has_ticket_price ON (price_id = thtp_price_id)
   WHERE price_event_id = 7820 and (current_timestamp >= price_date AND
current_timestamp <= price_date_till)

Result:
"Nested Loop  (cost=0.00..238.30 rows=517 width=4) (actual
time=0.365..808.537 rows=4335 loops=1)"
"  ->  Index Scan using index_price_event_id on ticket_price
(cost=0.00..8.52 rows=2 width=4) (actual time=0.014..0.040 rows=7 loops=1)"
"        Index Cond: (price_event_id = 7820)"
"        Filter: ((now() >= price_date) AND (now() <= price_date_till))"
"  ->  Index Scan using idx_thtp_price_id on ticket_has_ticket_price
(cost=0.00..47.04 rows=258 width=4) (actual time=0.016..0.655 rows=619
loops=7)"
"        Index Cond: (ticket_has_ticket_price.thtp_price_id =
ticket_price.price_id)"
"Total runtime: 810.143 ms"


So I am totally confused... It seems that selecting 4335 rows is a joke
for Postgresql, but the great job is done then adding one of 7 possible
values to the result set... Please help me understand what I am missing
here?...

Finally the system:
Server
PG: Version string    PostgreSQL 9.0.3 on i486-pc-linux-gnu, compiled by
GCC gcc-4.4.real (Debian 4.4.5-10) 4.4.5, 32-bit

Client
Win XP SP3 with pgAdmin 1.12.2.

--
Julius Tuskenis
Programavimo skyriaus vadovas
UAB nSoft
mob. +37068233050


Re: unexpected stable function behavior

От
Merlin Moncure
Дата:
On Thu, Mar 10, 2011 at 10:26 AM, Julius Tuskenis <julius@nsoft.lt> wrote:
> Hello, list
>
> Our company is creating a ticketing system. Of course the performance issues
> are very important to us (as to all of you I guess). To increase speed of
> some queries stable functions are used, but somehow they don't act exactly
> as I expect, so would you please explain what am I doing (or expecting)
> wrong...
>
> First of all I have the stable function witch runs fast and I have no
> problems with it at all.
> CREATE OR REPLACE FUNCTION web_select_extra_price(prm_price_id integer,
> prm_event_id integer, prm_cashier_id integer)
>  RETURNS numeric AS
> '
> ........ some code here
> '
>  LANGUAGE plpgsql STABLE
>  COST 100;
>
> Now the test:
>
> 1) query without using the function
> explain analyze
>  SELECT thtp_tick_id, price_id,
>    price_price,
>    price_color
>  FROM ticket_price
>    JOIN ticket_has_ticket_price ON (price_id = thtp_price_id)
>  WHERE price_event_id = 7820 and (current_timestamp >= price_date AND
> current_timestamp <= price_date_till)
>  ORDER BY price_id;
>
> Result:
> "Sort  (cost=132.47..133.77 rows=518 width=25) (actual time=5.125..5.842
> rows=4335 loops=1)"
> "  Sort Key: ticket_price.price_id"
> "  Sort Method:  quicksort  Memory: 433kB"
> "  ->  Nested Loop  (cost=0.00..109.12 rows=518 width=25) (actual
> time=0.037..3.148 rows=4335 loops=1)"
> "        ->  Index Scan using index_price_event_id on ticket_price
>  (cost=0.00..8.52 rows=2 width=21) (actual time=0.014..0.026 rows=7
> loops=1)"
> "              Index Cond: (price_event_id = 7820)"
> "              Filter: ((now() >= price_date) AND (now() <=
> price_date_till))"
> "        ->  Index Scan using idx_thtp_price_id on ticket_has_ticket_price
>  (cost=0.00..47.06 rows=259 width=8) (actual time=0.013..0.211 rows=619
> loops=7)"
> "              Index Cond: (ticket_has_ticket_price.thtp_price_id =
> ticket_price.price_id)"
> "Total runtime: 6.425 ms"
>
>
> 2) Query using the function
> explain analyze
>  SELECT thtp_tick_id, price_id,
>    price_price, web_select_extra_price(price_id, price_event_id, 1),
>    price_color
>  FROM ticket_price
>    JOIN ticket_has_ticket_price ON (price_id = thtp_price_id)
>  WHERE price_event_id = 7820 and (current_timestamp >= price_date AND
> current_timestamp <= price_date_till)
>  ORDER BY price_id;
>
> Result:
> "Sort  (cost=261.97..263.27 rows=518 width=29) (actual time=704.224..704.927
> rows=4335 loops=1)"
> "  Sort Key: ticket_price.price_id"
> "  Sort Method:  quicksort  Memory: 433kB"
> "  ->  Nested Loop  (cost=0.00..238.62 rows=518 width=29) (actual
> time=0.272..699.073 rows=4335 loops=1)"
> "        ->  Index Scan using index_price_event_id on ticket_price
>  (cost=0.00..8.52 rows=2 width=25) (actual time=0.011..0.052 rows=7
> loops=1)"
> "              Index Cond: (price_event_id = 7820)"
> "              Filter: ((now() >= price_date) AND (now() <=
> price_date_till))"
> "        ->  Index Scan using idx_thtp_price_id on ticket_has_ticket_price
>  (cost=0.00..47.06 rows=259 width=8) (actual time=0.017..0.582 rows=619
> loops=7)"
> "              Index Cond: (ticket_has_ticket_price.thtp_price_id =
> ticket_price.price_id)"
> "Total runtime: 705.531 ms"
>
>
> Now what you can think is that executing web_select_extra_price takes the
> difference, but
> 3) As STABLE function should be executed once for every different set of
> parameters I do
> SELECT web_select_extra_price(price_id, 7820, 1) FROM (
>
>  SELECT distinct price_id
>  FROM ticket_price
>    JOIN ticket_has_ticket_price ON (price_id = thtp_price_id)
>  WHERE price_event_id = 7820 and (current_timestamp >= price_date AND
> current_timestamp <= price_date_till)
>  ) as qq;
>
> Result:
> "Subquery Scan on qq  (cost=110.34..110.88 rows=2 width=4) (actual
> time=7.265..8.907 rows=7 loops=1)"
> "  ->  HashAggregate  (cost=110.34..110.36 rows=2 width=4) (actual
> time=6.866..6.873 rows=7 loops=1)"
> "        ->  Nested Loop  (cost=0.00..109.05 rows=517 width=4) (actual
> time=0.037..4.643 rows=4335 loops=1)"
> "              ->  Index Scan using index_price_event_id on ticket_price
>  (cost=0.00..8.52 rows=2 width=4) (actual time=0.014..0.038 rows=7 loops=1)"
> "                    Index Cond: (price_event_id = 7820)"
> "                    Filter: ((now() >= price_date) AND (now() <=
> price_date_till))"
> "              ->  Index Scan using idx_thtp_price_id on
> ticket_has_ticket_price  (cost=0.00..47.04 rows=258 width=4) (actual
> time=0.019..0.336 rows=619 loops=7)"
> "                    Index Cond: (ticket_has_ticket_price.thtp_price_id =
> ticket_price.price_id)"
> "Total runtime: 8.966 ms"
>
>
> You can see the query has only 7 distinct parameter sets to pass to the
> function but...
> 4)   Explain analyze
>  SELECT web_select_extra_price(price_id, 7820, 1)
>  FROM ticket_price
>    JOIN ticket_has_ticket_price ON (price_id = thtp_price_id)
>  WHERE price_event_id = 7820 and (current_timestamp >= price_date AND
> current_timestamp <= price_date_till)
>
> Result:
> "Nested Loop  (cost=0.00..238.30 rows=517 width=4) (actual
> time=0.365..808.537 rows=4335 loops=1)"
> "  ->  Index Scan using index_price_event_id on ticket_price
>  (cost=0.00..8.52 rows=2 width=4) (actual time=0.014..0.040 rows=7 loops=1)"
> "        Index Cond: (price_event_id = 7820)"
> "        Filter: ((now() >= price_date) AND (now() <= price_date_till))"
> "  ->  Index Scan using idx_thtp_price_id on ticket_has_ticket_price
>  (cost=0.00..47.04 rows=258 width=4) (actual time=0.016..0.655 rows=619
> loops=7)"
> "        Index Cond: (ticket_has_ticket_price.thtp_price_id =
> ticket_price.price_id)"
> "Total runtime: 810.143 ms"
>
>
> So I am totally confused... It seems that selecting 4335 rows is a joke for
> Postgresql, but the great job is done then adding one of 7 possible values
> to the result set... Please help me understand what I am missing here?...
>
> Finally the system:
> Server
> PG: Version string    PostgreSQL 9.0.3 on i486-pc-linux-gnu, compiled by GCC
> gcc-4.4.real (Debian 4.4.5-10) 4.4.5, 32-bit
>
> Client
> Win XP SP3 with pgAdmin 1.12.2.

This is a huge problem with non trivial functions in the select list.
Pushing the result into and a subquery does NOT guarantee that the
inner result is materialized first.   Try a CTE.

with foo as
(
  select yadda;
)
select func(foo.a), foo.* from foo;

merlin

Re: unexpected stable function behavior

От
Merlin Moncure
Дата:
On Mon, Mar 14, 2011 at 3:46 AM, Julius Tuskenis <julius@nsoft.lt> wrote:
> Hello, Merlin
>
> Thank you for your quick response.
>
> 2011.03.10 23:14, Merlin Moncure rašė:
>
> This is a huge problem with non trivial functions in the select list.
> Pushing the result into and a subquery does NOT guarantee that the
> inner result is materialized first.
>
> From the postgresql documentation about STABLE functions: "This category
> allows the optimizer to optimize multiple calls of the function to a single
> call." I thought that this means that optimizer executes the function only
> for now parameter sets and stores results in some "cache" and use it if the
> parameters are already known. I realize this is very naive approach and most
> probably everything is much more complicated. I would appreciate if someone
> would explain the mechanism (or provide with some useful link).

Just because some optimizations can happen doesn't mean they will
happen or there is even capability to make them happen.  There was
some recent discussion about this very topic here:
http://postgresql.1045698.n5.nabble.com/function-contants-evaluated-for-every-row-td3278945.html.


>    Try a CTE.
>
> with foo as
> (
>   select yadda;
> )
> select func(foo.a), foo.* from foo;
>
> I'm sorry, but I'm totally new to CTE. Would you please show me how should I
> use the stable function and where the parameters should be put to improve
> the behavior of the optimizer for my problem?

WITH results as
(
 SELECT distinct price_id as price_id
 FROM ticket_price
   JOIN ticket_has_ticket_price ON (price_id = thtp_price_id)
 WHERE price_event_id = 7820 and (current_timestamp >= price_date AND
current_timestamp <= price_date_till)
 ) as qq
)
 SELECT web_select_extra_price(price_id, 7820, 1) from results;


Another way to fight this is to play with the cost planner hint
parameter in 'create function', but I prefer the CTE -- it gives
strong guarantees about order of execution which is what you really
want.  CTEs are great btw, I'd start learning them immediately.

IMNSHO, this (uncontrolled number of function executions when run via
field select list) is a common gotcha w/postgres and a FAQ.  Also the
documentation is not very helpful on this point...do you agree CTE is
the right way to advise handling this problem...is it worth further
notation?

merlin

Re: unexpected stable function behavior

От
Andres Freund
Дата:
On Thursday, March 10, 2011 05:26:00 PM Julius Tuskenis wrote:
> 3) As STABLE function should be executed once for every different set of
> parameters
Thats not true. Thats not what any of the volatility information (like STABLE,
IMMUTABLE, VOLATILE) does.

See http://www.postgresql.org/docs/current/interactive/xfunc-volatility.html

It *does* change how often a function is executed though. I.e.

SELECT g.i, some_stable_func(1) FROM generate_series(1, 1000) g(i)

will call some_stable_func only once because it can determine all the
parameters beforehand.


Andres

Re: unexpected stable function behavior

От
Julius Tuskenis
Дата:
Hello, Merlin

Thank you for your quick response.

2011.03.10 23:14, Merlin Moncure rašė:
This is a huge problem with non trivial functions in the select list.
Pushing the result into and a subquery does NOT guarantee that the
inner result is materialized first.
From the postgresql documentation about STABLE functions: "This category allows the optimizer to optimize multiple calls of the function to a single call." I thought that this means that optimizer executes the function only for now parameter sets and stores results in some "cache" and use it if the parameters are already known. I realize this is very naive approach and most probably everything is much more complicated. I would appreciate if someone would explain the mechanism (or provide with some useful link).

   Try a CTE.

with foo as
( select yadda;
)
select func(foo.a), foo.* from foo;
I'm sorry, but I'm totally new to CTE. Would you please show me how should I use the stable function and where the parameters should be put to improve the behavior of the optimizer for my problem?

Thank you in advance
-- 
Julius Tuskenis
Programavimo skyriaus vadovas
UAB nSoft
mob. +37068233050

Re: unexpected stable function behavior

От
Tom Lane
Дата:
Julius Tuskenis <julius@nsoft.lt> writes:
>  From the postgresql documentation about STABLE functions: "This
> category allows the optimizer to optimize multiple calls of the function
> to a single call." I thought that this means that optimizer executes the
> function only for now parameter sets and stores results in some "cache"
> and use it if the parameters are already known.

No, it does not.  That function property *allows* the optimizer to
invoke the function fewer times than would happen in an un-optimized
query.  It does not *require* it to do so.  There is no such cache
mechanism in Postgres, and it's unlikely that there ever will be,
because it probably would be a net performance loss on average.

            regards, tom lane

Re: unexpected stable function behavior

От
Julius Tuskenis
Дата:
Hello,

2011.03.14 15:41, Merlin Moncure rašė:
> WITH results as
> (
>   SELECT distinct price_id as price_id
>   FROM ticket_price
>     JOIN ticket_has_ticket_price ON (price_id = thtp_price_id)
>   WHERE price_event_id = 7820 and (current_timestamp>= price_date AND
> current_timestamp<= price_date_till)
>   ) as qq
> )
>   SELECT web_select_extra_price(price_id, 7820, 1) from results;
>
Thank you Merlin for your help. I have updated my function to use CTE.
Although there was no performance improvement (I had the select with
function using distinct values joined earlyer) it's good to know the
optimizer will not change the way I want the query to be executed. Thank
you once again.

> CTEs are great btw, I'd start learning them immediately.
I am going to do that.
> IMNSHO, this (uncontrolled number of function executions when run via
> field select list) is a common gotcha w/postgres and a FAQ.  Also the
> documentation is not very helpful on this point...
Yes, I totally agree with you. I think sentence like "Although function
is marked as STABLE or IMMUTABLE the optimizer is not obliged to take
advantage of these properties." (sorry for my English).
> do you agree CTE is the right way to advise handling this problem...is it worth further
> notation?
Yes, the CTE worked fine for me. Reading some more on this topic I found
some comments that the optimizer has no possibility to know how many
times the function is to be called in such queries (without actually
executing the query), so there is no way to determine the cost. That
explains why not the optimal plan was chosen to my query.

--
Julius Tuskenis
Programavimo skyriaus vadovas
UAB nSoft
mob. +37068233050


Re: unexpected stable function behavior

От
Julius Tuskenis
Дата:
Thank you, Tom for you answer

2011.03.14 19:17, Tom Lane rašė:
That function property *allows* the optimizer to
invoke the function fewer times than would happen in an un-optimized
query.  It does not *require* it to do so.
Thank you for clearing that for me. I think these 2 sentences in documentation (http://www.postgresql.org/docs/current/interactive/xfunc-volatility.html) would prevent misunderstandings in the future.

-- 
Julius Tuskenis
Programavimo skyriaus vadovas
UAB nSoft
mob. +37068233050