Обсуждение: unexpected stable function behavior
Hello, list Our company is creating a ticketing system. Of course the performance issues are very important to us (as to all of you I guess). To increase speed of some queries stable functions are used, but somehow they don't act exactly as I expect, so would you please explain what am I doing (or expecting) wrong... First of all I have the stable function witch runs fast and I have no problems with it at all. CREATE OR REPLACE FUNCTION web_select_extra_price(prm_price_id integer, prm_event_id integer, prm_cashier_id integer) RETURNS numeric AS ' ........ some code here ' LANGUAGE plpgsql STABLE COST 100; Now the test: 1) query without using the function explain analyze SELECT thtp_tick_id, price_id, price_price, price_color FROM ticket_price JOIN ticket_has_ticket_price ON (price_id = thtp_price_id) WHERE price_event_id = 7820 and (current_timestamp >= price_date AND current_timestamp <= price_date_till) ORDER BY price_id; Result: "Sort (cost=132.47..133.77 rows=518 width=25) (actual time=5.125..5.842 rows=4335 loops=1)" " Sort Key: ticket_price.price_id" " Sort Method: quicksort Memory: 433kB" " -> Nested Loop (cost=0.00..109.12 rows=518 width=25) (actual time=0.037..3.148 rows=4335 loops=1)" " -> Index Scan using index_price_event_id on ticket_price (cost=0.00..8.52 rows=2 width=21) (actual time=0.014..0.026 rows=7 loops=1)" " Index Cond: (price_event_id = 7820)" " Filter: ((now() >= price_date) AND (now() <= price_date_till))" " -> Index Scan using idx_thtp_price_id on ticket_has_ticket_price (cost=0.00..47.06 rows=259 width=8) (actual time=0.013..0.211 rows=619 loops=7)" " Index Cond: (ticket_has_ticket_price.thtp_price_id = ticket_price.price_id)" "Total runtime: 6.425 ms" 2) Query using the function explain analyze SELECT thtp_tick_id, price_id, price_price, web_select_extra_price(price_id, price_event_id, 1), price_color FROM ticket_price JOIN ticket_has_ticket_price ON (price_id = thtp_price_id) WHERE price_event_id = 7820 and (current_timestamp >= price_date AND current_timestamp <= price_date_till) ORDER BY price_id; Result: "Sort (cost=261.97..263.27 rows=518 width=29) (actual time=704.224..704.927 rows=4335 loops=1)" " Sort Key: ticket_price.price_id" " Sort Method: quicksort Memory: 433kB" " -> Nested Loop (cost=0.00..238.62 rows=518 width=29) (actual time=0.272..699.073 rows=4335 loops=1)" " -> Index Scan using index_price_event_id on ticket_price (cost=0.00..8.52 rows=2 width=25) (actual time=0.011..0.052 rows=7 loops=1)" " Index Cond: (price_event_id = 7820)" " Filter: ((now() >= price_date) AND (now() <= price_date_till))" " -> Index Scan using idx_thtp_price_id on ticket_has_ticket_price (cost=0.00..47.06 rows=259 width=8) (actual time=0.017..0.582 rows=619 loops=7)" " Index Cond: (ticket_has_ticket_price.thtp_price_id = ticket_price.price_id)" "Total runtime: 705.531 ms" Now what you can think is that executing web_select_extra_price takes the difference, but 3) As STABLE function should be executed once for every different set of parameters I do SELECT web_select_extra_price(price_id, 7820, 1) FROM ( SELECT distinct price_id FROM ticket_price JOIN ticket_has_ticket_price ON (price_id = thtp_price_id) WHERE price_event_id = 7820 and (current_timestamp >= price_date AND current_timestamp <= price_date_till) ) as qq; Result: "Subquery Scan on qq (cost=110.34..110.88 rows=2 width=4) (actual time=7.265..8.907 rows=7 loops=1)" " -> HashAggregate (cost=110.34..110.36 rows=2 width=4) (actual time=6.866..6.873 rows=7 loops=1)" " -> Nested Loop (cost=0.00..109.05 rows=517 width=4) (actual time=0.037..4.643 rows=4335 loops=1)" " -> Index Scan using index_price_event_id on ticket_price (cost=0.00..8.52 rows=2 width=4) (actual time=0.014..0.038 rows=7 loops=1)" " Index Cond: (price_event_id = 7820)" " Filter: ((now() >= price_date) AND (now() <= price_date_till))" " -> Index Scan using idx_thtp_price_id on ticket_has_ticket_price (cost=0.00..47.04 rows=258 width=4) (actual time=0.019..0.336 rows=619 loops=7)" " Index Cond: (ticket_has_ticket_price.thtp_price_id = ticket_price.price_id)" "Total runtime: 8.966 ms" You can see the query has only 7 distinct parameter sets to pass to the function but... 4) Explain analyze SELECT web_select_extra_price(price_id, 7820, 1) FROM ticket_price JOIN ticket_has_ticket_price ON (price_id = thtp_price_id) WHERE price_event_id = 7820 and (current_timestamp >= price_date AND current_timestamp <= price_date_till) Result: "Nested Loop (cost=0.00..238.30 rows=517 width=4) (actual time=0.365..808.537 rows=4335 loops=1)" " -> Index Scan using index_price_event_id on ticket_price (cost=0.00..8.52 rows=2 width=4) (actual time=0.014..0.040 rows=7 loops=1)" " Index Cond: (price_event_id = 7820)" " Filter: ((now() >= price_date) AND (now() <= price_date_till))" " -> Index Scan using idx_thtp_price_id on ticket_has_ticket_price (cost=0.00..47.04 rows=258 width=4) (actual time=0.016..0.655 rows=619 loops=7)" " Index Cond: (ticket_has_ticket_price.thtp_price_id = ticket_price.price_id)" "Total runtime: 810.143 ms" So I am totally confused... It seems that selecting 4335 rows is a joke for Postgresql, but the great job is done then adding one of 7 possible values to the result set... Please help me understand what I am missing here?... Finally the system: Server PG: Version string PostgreSQL 9.0.3 on i486-pc-linux-gnu, compiled by GCC gcc-4.4.real (Debian 4.4.5-10) 4.4.5, 32-bit Client Win XP SP3 with pgAdmin 1.12.2. -- Julius Tuskenis Programavimo skyriaus vadovas UAB nSoft mob. +37068233050
On Thu, Mar 10, 2011 at 10:26 AM, Julius Tuskenis <julius@nsoft.lt> wrote: > Hello, list > > Our company is creating a ticketing system. Of course the performance issues > are very important to us (as to all of you I guess). To increase speed of > some queries stable functions are used, but somehow they don't act exactly > as I expect, so would you please explain what am I doing (or expecting) > wrong... > > First of all I have the stable function witch runs fast and I have no > problems with it at all. > CREATE OR REPLACE FUNCTION web_select_extra_price(prm_price_id integer, > prm_event_id integer, prm_cashier_id integer) > RETURNS numeric AS > ' > ........ some code here > ' > LANGUAGE plpgsql STABLE > COST 100; > > Now the test: > > 1) query without using the function > explain analyze > SELECT thtp_tick_id, price_id, > price_price, > price_color > FROM ticket_price > JOIN ticket_has_ticket_price ON (price_id = thtp_price_id) > WHERE price_event_id = 7820 and (current_timestamp >= price_date AND > current_timestamp <= price_date_till) > ORDER BY price_id; > > Result: > "Sort (cost=132.47..133.77 rows=518 width=25) (actual time=5.125..5.842 > rows=4335 loops=1)" > " Sort Key: ticket_price.price_id" > " Sort Method: quicksort Memory: 433kB" > " -> Nested Loop (cost=0.00..109.12 rows=518 width=25) (actual > time=0.037..3.148 rows=4335 loops=1)" > " -> Index Scan using index_price_event_id on ticket_price > (cost=0.00..8.52 rows=2 width=21) (actual time=0.014..0.026 rows=7 > loops=1)" > " Index Cond: (price_event_id = 7820)" > " Filter: ((now() >= price_date) AND (now() <= > price_date_till))" > " -> Index Scan using idx_thtp_price_id on ticket_has_ticket_price > (cost=0.00..47.06 rows=259 width=8) (actual time=0.013..0.211 rows=619 > loops=7)" > " Index Cond: (ticket_has_ticket_price.thtp_price_id = > ticket_price.price_id)" > "Total runtime: 6.425 ms" > > > 2) Query using the function > explain analyze > SELECT thtp_tick_id, price_id, > price_price, web_select_extra_price(price_id, price_event_id, 1), > price_color > FROM ticket_price > JOIN ticket_has_ticket_price ON (price_id = thtp_price_id) > WHERE price_event_id = 7820 and (current_timestamp >= price_date AND > current_timestamp <= price_date_till) > ORDER BY price_id; > > Result: > "Sort (cost=261.97..263.27 rows=518 width=29) (actual time=704.224..704.927 > rows=4335 loops=1)" > " Sort Key: ticket_price.price_id" > " Sort Method: quicksort Memory: 433kB" > " -> Nested Loop (cost=0.00..238.62 rows=518 width=29) (actual > time=0.272..699.073 rows=4335 loops=1)" > " -> Index Scan using index_price_event_id on ticket_price > (cost=0.00..8.52 rows=2 width=25) (actual time=0.011..0.052 rows=7 > loops=1)" > " Index Cond: (price_event_id = 7820)" > " Filter: ((now() >= price_date) AND (now() <= > price_date_till))" > " -> Index Scan using idx_thtp_price_id on ticket_has_ticket_price > (cost=0.00..47.06 rows=259 width=8) (actual time=0.017..0.582 rows=619 > loops=7)" > " Index Cond: (ticket_has_ticket_price.thtp_price_id = > ticket_price.price_id)" > "Total runtime: 705.531 ms" > > > Now what you can think is that executing web_select_extra_price takes the > difference, but > 3) As STABLE function should be executed once for every different set of > parameters I do > SELECT web_select_extra_price(price_id, 7820, 1) FROM ( > > SELECT distinct price_id > FROM ticket_price > JOIN ticket_has_ticket_price ON (price_id = thtp_price_id) > WHERE price_event_id = 7820 and (current_timestamp >= price_date AND > current_timestamp <= price_date_till) > ) as qq; > > Result: > "Subquery Scan on qq (cost=110.34..110.88 rows=2 width=4) (actual > time=7.265..8.907 rows=7 loops=1)" > " -> HashAggregate (cost=110.34..110.36 rows=2 width=4) (actual > time=6.866..6.873 rows=7 loops=1)" > " -> Nested Loop (cost=0.00..109.05 rows=517 width=4) (actual > time=0.037..4.643 rows=4335 loops=1)" > " -> Index Scan using index_price_event_id on ticket_price > (cost=0.00..8.52 rows=2 width=4) (actual time=0.014..0.038 rows=7 loops=1)" > " Index Cond: (price_event_id = 7820)" > " Filter: ((now() >= price_date) AND (now() <= > price_date_till))" > " -> Index Scan using idx_thtp_price_id on > ticket_has_ticket_price (cost=0.00..47.04 rows=258 width=4) (actual > time=0.019..0.336 rows=619 loops=7)" > " Index Cond: (ticket_has_ticket_price.thtp_price_id = > ticket_price.price_id)" > "Total runtime: 8.966 ms" > > > You can see the query has only 7 distinct parameter sets to pass to the > function but... > 4) Explain analyze > SELECT web_select_extra_price(price_id, 7820, 1) > FROM ticket_price > JOIN ticket_has_ticket_price ON (price_id = thtp_price_id) > WHERE price_event_id = 7820 and (current_timestamp >= price_date AND > current_timestamp <= price_date_till) > > Result: > "Nested Loop (cost=0.00..238.30 rows=517 width=4) (actual > time=0.365..808.537 rows=4335 loops=1)" > " -> Index Scan using index_price_event_id on ticket_price > (cost=0.00..8.52 rows=2 width=4) (actual time=0.014..0.040 rows=7 loops=1)" > " Index Cond: (price_event_id = 7820)" > " Filter: ((now() >= price_date) AND (now() <= price_date_till))" > " -> Index Scan using idx_thtp_price_id on ticket_has_ticket_price > (cost=0.00..47.04 rows=258 width=4) (actual time=0.016..0.655 rows=619 > loops=7)" > " Index Cond: (ticket_has_ticket_price.thtp_price_id = > ticket_price.price_id)" > "Total runtime: 810.143 ms" > > > So I am totally confused... It seems that selecting 4335 rows is a joke for > Postgresql, but the great job is done then adding one of 7 possible values > to the result set... Please help me understand what I am missing here?... > > Finally the system: > Server > PG: Version string PostgreSQL 9.0.3 on i486-pc-linux-gnu, compiled by GCC > gcc-4.4.real (Debian 4.4.5-10) 4.4.5, 32-bit > > Client > Win XP SP3 with pgAdmin 1.12.2. This is a huge problem with non trivial functions in the select list. Pushing the result into and a subquery does NOT guarantee that the inner result is materialized first. Try a CTE. with foo as ( select yadda; ) select func(foo.a), foo.* from foo; merlin
On Mon, Mar 14, 2011 at 3:46 AM, Julius Tuskenis <julius@nsoft.lt> wrote: > Hello, Merlin > > Thank you for your quick response. > > 2011.03.10 23:14, Merlin Moncure rašė: > > This is a huge problem with non trivial functions in the select list. > Pushing the result into and a subquery does NOT guarantee that the > inner result is materialized first. > > From the postgresql documentation about STABLE functions: "This category > allows the optimizer to optimize multiple calls of the function to a single > call." I thought that this means that optimizer executes the function only > for now parameter sets and stores results in some "cache" and use it if the > parameters are already known. I realize this is very naive approach and most > probably everything is much more complicated. I would appreciate if someone > would explain the mechanism (or provide with some useful link). Just because some optimizations can happen doesn't mean they will happen or there is even capability to make them happen. There was some recent discussion about this very topic here: http://postgresql.1045698.n5.nabble.com/function-contants-evaluated-for-every-row-td3278945.html. > Try a CTE. > > with foo as > ( > select yadda; > ) > select func(foo.a), foo.* from foo; > > I'm sorry, but I'm totally new to CTE. Would you please show me how should I > use the stable function and where the parameters should be put to improve > the behavior of the optimizer for my problem? WITH results as ( SELECT distinct price_id as price_id FROM ticket_price JOIN ticket_has_ticket_price ON (price_id = thtp_price_id) WHERE price_event_id = 7820 and (current_timestamp >= price_date AND current_timestamp <= price_date_till) ) as qq ) SELECT web_select_extra_price(price_id, 7820, 1) from results; Another way to fight this is to play with the cost planner hint parameter in 'create function', but I prefer the CTE -- it gives strong guarantees about order of execution which is what you really want. CTEs are great btw, I'd start learning them immediately. IMNSHO, this (uncontrolled number of function executions when run via field select list) is a common gotcha w/postgres and a FAQ. Also the documentation is not very helpful on this point...do you agree CTE is the right way to advise handling this problem...is it worth further notation? merlin
On Thursday, March 10, 2011 05:26:00 PM Julius Tuskenis wrote: > 3) As STABLE function should be executed once for every different set of > parameters Thats not true. Thats not what any of the volatility information (like STABLE, IMMUTABLE, VOLATILE) does. See http://www.postgresql.org/docs/current/interactive/xfunc-volatility.html It *does* change how often a function is executed though. I.e. SELECT g.i, some_stable_func(1) FROM generate_series(1, 1000) g(i) will call some_stable_func only once because it can determine all the parameters beforehand. Andres
Hello, Merlin
Thank you for your quick response.
2011.03.10 23:14, Merlin Moncure rašė:
Thank you in advance
Thank you for your quick response.
2011.03.10 23:14, Merlin Moncure rašė:
From the postgresql documentation about STABLE functions: "This category allows the optimizer to optimize multiple calls of the function to a single call." I thought that this means that optimizer executes the function only for now parameter sets and stores results in some "cache" and use it if the parameters are already known. I realize this is very naive approach and most probably everything is much more complicated. I would appreciate if someone would explain the mechanism (or provide with some useful link).This is a huge problem with non trivial functions in the select list. Pushing the result into and a subquery does NOT guarantee that the inner result is materialized first.
I'm sorry, but I'm totally new to CTE. Would you please show me how should I use the stable function and where the parameters should be put to improve the behavior of the optimizer for my problem?Try a CTE. with foo as ( select yadda; ) select func(foo.a), foo.* from foo;
Thank you in advance
-- Julius Tuskenis Programavimo skyriaus vadovas UAB nSoft mob. +37068233050
Julius Tuskenis <julius@nsoft.lt> writes: > From the postgresql documentation about STABLE functions: "This > category allows the optimizer to optimize multiple calls of the function > to a single call." I thought that this means that optimizer executes the > function only for now parameter sets and stores results in some "cache" > and use it if the parameters are already known. No, it does not. That function property *allows* the optimizer to invoke the function fewer times than would happen in an un-optimized query. It does not *require* it to do so. There is no such cache mechanism in Postgres, and it's unlikely that there ever will be, because it probably would be a net performance loss on average. regards, tom lane
Hello, 2011.03.14 15:41, Merlin Moncure rašė: > WITH results as > ( > SELECT distinct price_id as price_id > FROM ticket_price > JOIN ticket_has_ticket_price ON (price_id = thtp_price_id) > WHERE price_event_id = 7820 and (current_timestamp>= price_date AND > current_timestamp<= price_date_till) > ) as qq > ) > SELECT web_select_extra_price(price_id, 7820, 1) from results; > Thank you Merlin for your help. I have updated my function to use CTE. Although there was no performance improvement (I had the select with function using distinct values joined earlyer) it's good to know the optimizer will not change the way I want the query to be executed. Thank you once again. > CTEs are great btw, I'd start learning them immediately. I am going to do that. > IMNSHO, this (uncontrolled number of function executions when run via > field select list) is a common gotcha w/postgres and a FAQ. Also the > documentation is not very helpful on this point... Yes, I totally agree with you. I think sentence like "Although function is marked as STABLE or IMMUTABLE the optimizer is not obliged to take advantage of these properties." (sorry for my English). > do you agree CTE is the right way to advise handling this problem...is it worth further > notation? Yes, the CTE worked fine for me. Reading some more on this topic I found some comments that the optimizer has no possibility to know how many times the function is to be called in such queries (without actually executing the query), so there is no way to determine the cost. That explains why not the optimal plan was chosen to my query. -- Julius Tuskenis Programavimo skyriaus vadovas UAB nSoft mob. +37068233050
Thank you, Tom for you answer
2011.03.14 19:17, Tom Lane rašė:
2011.03.14 19:17, Tom Lane rašė:
Thank you for clearing that for me. I think these 2 sentences in documentation (http://www.postgresql.org/docs/current/interactive/xfunc-volatility.html) would prevent misunderstandings in the future.That function property *allows* the optimizer to invoke the function fewer times than would happen in an un-optimized query. It does not *require* it to do so.
-- Julius Tuskenis Programavimo skyriaus vadovas UAB nSoft mob. +37068233050