Обсуждение: Query runs in 335ms; function in 100,239ms : date problem?
I have a function wrapping a (fairly complex) query. The query itself runs in about a 1/3rd of a second. When running the query as a 'RETURN QUERY' function on Postgres 8.4, the function runs in over 100 seconds, about 300 times slower. The function takes 3 input parameters: 2 dates and a boolean. The dates (in_date_from, in_date_to) are used several times in the function. When I replace the two parameters in the body of the query with, for instance date'2011-05-01' and date'2011-08-01', the function operates almost as speedily as the straight query. I would be grateful to know how to work around this date problem. As the function is rather large I have taken the liberty of posting it here: http://campbell-lange.net/media/files/fn_report_pers_leave.sql.html Rory
On 5 Září 2011, 23:07, Rory Campbell-Lange wrote: > I have a function wrapping a (fairly complex) query. > > The query itself runs in about a 1/3rd of a second. When running the > query as a 'RETURN QUERY' function on Postgres 8.4, the function runs in > over 100 seconds, about 300 times slower. > > The function takes 3 input parameters: 2 dates and a boolean. The dates > (in_date_from, in_date_to) are used several times in the function. > > When I replace the two parameters in the body of the query with, for > instance date'2011-05-01' and date'2011-08-01', the function operates > almost as speedily as the straight query. > > I would be grateful to know how to work around this date problem. > > As the function is rather large I have taken the liberty of posting it > here: > http://campbell-lange.net/media/files/fn_report_pers_leave.sql.html Do I understand correctly that you compare a query with literal parameters with a parametrized query wrapped in a plpgsql function? Try to run it as a prepared query - I guess you'll get about the same run time as with the function (i.e. over 100 seconds). The problem with prepared statements is that when planning the query, the parameter values are unknown - so the optimizer does not know selectivity of the conditions etc. and uses "common" values to prepare a safe plan. OTOH the literal parameters allow to optimize the plan according to the actual parameter values. Tomas
On 05/09/11, Tomas Vondra (tv@fuzzy.cz) wrote: > On 5 Zá??í 2011, 23:07, Rory Campbell-Lange wrote: ... > > The query itself runs in about a 1/3rd of a second. When running the > > query as a 'RETURN QUERY' function on Postgres 8.4, the function runs in > > over 100 seconds, about 300 times slower. ... > > As the function is rather large I have taken the liberty of posting it > > here: > > http://campbell-lange.net/media/files/fn_report_pers_leave.sql.html > > Do I understand correctly that you compare a query with literal parameters > with a parametrized query wrapped in a plpgsql function? Yes! Certainly I need to make the function perform more quickly. > Try to run it as a prepared query - I guess you'll get about the same run > time as with the function (i.e. over 100 seconds). The prepared query runs in almost exactly the same time as the function, but thanks for the suggestion. A very useful aspect of it is that I was able to get the EXPLAIN output which I guess gives a fairly good picture of the plan used for the function. The explain output is here: http://campbell-lange.net/media/files/explain.txt.html I'm inexperienced in reading EXPLAIN output, but it looks like the Nested Loop Semi Join at line 72 is running very slowly. > The problem with prepared statements is that when planning the query, the > parameter values are unknown - so the optimizer does not know selectivity > of the conditions etc. and uses "common" values to prepare a safe plan. > OTOH the literal parameters allow to optimize the plan according to the > actual parameter values. Thank you very much for the helpful explanation. Regards Rory
On 05/09/11, Rory Campbell-Lange (rory@campbell-lange.net) wrote: > On 05/09/11, Tomas Vondra (tv@fuzzy.cz) wrote: > > On 5 Zá??í 2011, 23:07, Rory Campbell-Lange wrote: > ... > > > The query itself runs in about a 1/3rd of a second. When running the > > > query as a 'RETURN QUERY' function on Postgres 8.4, the function runs in > > > over 100 seconds, about 300 times slower. ... > > Try to run it as a prepared query - I guess you'll get about the same run > > time as with the function (i.e. over 100 seconds). > > The prepared query runs in almost exactly the same time as the function, > but thanks for the suggestion. A very useful aspect of it is that I was > able to get the EXPLAIN output which I guess gives a fairly good picture > of the plan used for the function. > > The explain output is here: > http://campbell-lange.net/media/files/explain.txt.html > > I'm inexperienced in reading EXPLAIN output, but it looks like the > Nested Loop Semi Join at line 72 is running very slowly. I added in more filtering conditions to the clause at line 72 and the prepared statement dropped in runtime to 24043.902 ms. Unfortunately the function ran slower -- 47957.796 -- but even that is a 50% improvement. Thanks very much for your help. Regards Rory
On 6 Září 2011, 0:27, Rory Campbell-Lange wrote: > On 05/09/11, Tomas Vondra (tv@fuzzy.cz) wrote: >> Do I understand correctly that you compare a query with literal >> parameters >> with a parametrized query wrapped in a plpgsql function? > > Yes! Certainly I need to make the function perform more quickly. > >> Try to run it as a prepared query - I guess you'll get about the same >> run >> time as with the function (i.e. over 100 seconds). > > The prepared query runs in almost exactly the same time as the function, > but thanks for the suggestion. A very useful aspect of it is that I was > able to get the EXPLAIN output which I guess gives a fairly good picture > of the plan used for the function. Well, my point was that the queries wrapped in functions are executed just like prepared statements. And because prepared queries do not use parameter values to optimize the plan, the result may be worse compared to queries with literal parameters. So I was not expecting an improvement, I was merely trying to show the problem. > The explain output is here: > http://campbell-lange.net/media/files/explain.txt.html > > I'm inexperienced in reading EXPLAIN output, but it looks like the > Nested Loop Semi Join at line 72 is running very slowly. I've posted the plan here: http://explain.depesz.com/s/uYX Yes, the nested loop is the problem. One of the problems is that the join condition - can you rewrite this AND r.d_date || '-' || r.n_session || '-' || u.n_id IN (SELECT d_date || '-' || n_session || '-' || n_person FROM leave_association WHERE d_date >= in_date_from AND d_date <= in_date_to ) -- i.e. leave where the person normally works like this AND EXISTS (SELECT 1 FROM leave_association WHERE d_date >= in_date_from AND d_date <= in_date_to AND d_date = r.d_date AND n_session = r.n_session AND n_person = u.n_id ) -- i.e. leave where the person normally works and then do the same for the NOT IN subquery (=> NOT EXISTS). I think it should return the same results, plus it might use indexes on the leave_association. That was not possible because of the join condition. Tomas
Rory Campbell-Lange schrieb: >>> Try to run it as a prepared query - I guess you'll get about the same run >>> time as with the function (i.e. over 100 seconds). >> The prepared query runs in almost exactly the same time as the function, >> but thanks for the suggestion. A very useful aspect of it is that I was >> able to get the EXPLAIN output which I guess gives a fairly good picture >> of the plan used for the function. >> >> The explain output is here: >> http://campbell-lange.net/media/files/explain.txt.html >> >> I'm inexperienced in reading EXPLAIN output, but it looks like the >> Nested Loop Semi Join at line 72 is running very slowly. > > I added in more filtering conditions to the clause at line 72 and the > prepared statement dropped in runtime to 24043.902 ms. Unfortunately the > function ran slower -- 47957.796 -- but even that is a 50% improvement. Also declare your function as STABLE. This can trigger an speed-increase. Greetings, Torsten -- http://www.dddbl.de - ein Datenbank-Layer, der die Arbeit mit 8 verschiedenen Datenbanksystemen abstrahiert, Queries von Applikationen trennt und automatisch die Query-Ergebnisse auswerten kann.
"Tomas Vondra" <tv@fuzzy.cz> writes: > On 6 Září 2011, 0:27, Rory Campbell-Lange wrote: >> The prepared query runs in almost exactly the same time as the function, >> but thanks for the suggestion. A very useful aspect of it is that I was >> able to get the EXPLAIN output which I guess gives a fairly good picture >> of the plan used for the function. > Well, my point was that the queries wrapped in functions are executed just > like prepared statements. And because prepared queries do not use > parameter values to optimize the plan, the result may be worse compared to > queries with literal parameters. So I was not expecting an improvement, I > was merely trying to show the problem. Right. The actual fix is to use EXECUTE so you force a new plan to be generated each time. If you use EXECUTE USING to insert the parameter values, you can avoid most of the notational mess this would otherwise imply, as well as the risk of SQL-injection bugs from failing to quote parameter values safely. I'm hoping that 9.2 will be smart enough to not need such workarounds, but that's where things stand at the moment. regards, tom lane