Обсуждение: How to force subquery scan?
Another (possibly design?) problem of mine. I have a function product_cost(product_id, date), which does simple SELECT call. I declared it as STABLE, hoping that multiple invocations of the same function are replaced with one. Query is something like this: SELECT p.product_id, avg(product_cost(s.product_id, s.date)) as average_cost, sum(product_cost(s.product_id, s.date)* s.amount) as cost_total FROM products p LEFT JOIN sales s ON s.date between '2004-01-01' and '2005-01-01' GROUP BY p.product_id; (For those interested in the intent of the query - imagine there is internal cost associated with a product, which is different in different periods. There is no cost column in sales table, because it might change, for previous periods too). When I ran the query for long periods I observed that my assumption about STABLE was wrong. It did not help to reduce function invocations, as one could think after reading the documentation. It was also confirmed in mailing lists, that STABLE only allows function to be used in index scan, there is no function result cacheing. I was able to reduce function calls to just one per row by using subquery: SELECT p.product_id, avg(s.cost) as average_cost, sum(s.cost * s.amount) as cost_total FROM products p LEFT JOIN (SELECT *, product_cost(product_id, date) as cost FROM sales) s ON s.date between '2004-01-01' and '2005-01-01' GROUP BY p.product_id; But it did work only as long I used LEFT JOIN. When I used regular JOIN, the optimizer happily optimized subquery scan to just table scan and elevated the function call to next query level, where it was executed twice. My question is, is there a trick that would force subquery scan when I want it? Tambet
Tambet Matiisen wrote: > > Another (possibly design?) problem of mine. I have a function > product_cost(product_id, date), which does simple SELECT call. I > declared it as STABLE, hoping that multiple invocations of the same > function are replaced with one. Query is something like this: > > SELECT > p.product_id, > avg(product_cost(s.product_id, s.date)) as average_cost, > sum(product_cost(s.product_id, s.date) * s.amount) as cost_total > FROM products p > LEFT JOIN sales s ON s.date between '2004-01-01' and '2005-01-01' > GROUP BY p.product_id; > > (For those interested in the intent of the query - imagine there is > internal cost associated with a product, which is different in different > periods. There is no cost column in sales table, because it might > change, for previous periods too). > > When I ran the query for long periods I observed that my assumption > about STABLE was wrong. It did not help to reduce function invocations, > as one could think after reading the documentation. It was also > confirmed in mailing lists, that STABLE only allows function to be used > in index scan, there is no function result cacheing. > > I was able to reduce function calls to just one per row by using > subquery: > > SELECT > p.product_id, > avg(s.cost) as average_cost, > sum(s.cost * s.amount) as cost_total > FROM products p > LEFT JOIN (SELECT *, product_cost(product_id, date) as cost FROM sales) > s ON s.date between '2004-01-01' and '2005-01-01' > GROUP BY p.product_id; > > But it did work only as long I used LEFT JOIN. When I used regular JOIN, > the optimizer happily optimized subquery scan to just table scan and > elevated the function call to next query level, where it was executed > twice. My question is, is there a trick that would force subquery scan > when I want it? > > Tambet > Does this do better: SELECT prodid, avg(prodcost) as average_cost, sum(prodcost * salesamount) as cost_total FROM ( SELECT p.product_id as prodid, product_cost(s.product_id, s.date) as prodcost, s.amount as salesamount FROM products p LEFT JOIN sales s ON s.date between '2004-01-01' and '2005-01-01' ) q GROUP BY prodid; Regards, Christoph
... > > Does this do better: > > SELECT > prodid, > avg(prodcost) as average_cost, > sum(prodcost * salesamount) as cost_total > FROM ( > SELECT > p.product_id as prodid, > product_cost(s.product_id, s.date) as prodcost, > s.amount as salesamount > FROM products p > LEFT JOIN sales s ON s.date between '2004-01-01' and '2005-01-01' > ) q > GROUP BY prodid; > Sorry, couldn't test it as my query wasn't actual query. I just made up something for the list. But I made some tests with similar queries and your solution doesn't help. Subquery scan is not forced, as your query can be easily translated into single flat query returning the same results. It seems that subquery scan is only used, when the query can not be translated into single flat query. Which is mostly good, I assume. The problem is, that if SELECT list of subquery contains function call, and the result of this function is used in multiple places in outer query, then the function is invoked multiple times (per row). This can get expensive with slow function and big queries. Tambet
"Tambet Matiisen" <t.matiisen@aprote.ee> writes: > It seems that subquery scan is only used, when the query can not be > translated into single flat query. Which is mostly good, I assume. The planner thinks so anyway ;-) If you're desperate you can put in an optimization fence, for instance LIMIT or OFFSET. SELECT * FROM (SELECT ... OFFSET 0) ss; In principle the planner could figure out that this offset is a no-op, throw it away, and then flatten the query. But it doesn't at the moment, and I doubt we'll teach it to do so in the future. regards, tom lane
> -----Original Message----- > From: Tom Lane [mailto:tgl@sss.pgh.pa.us] > Sent: Wednesday, March 16, 2005 5:42 PM > To: Tambet Matiisen > Cc: ch@rodos.fzk.de; pgsql-sql@postgresql.org > Subject: Re: [SQL] How to force subquery scan? > > > "Tambet Matiisen" <t.matiisen@aprote.ee> writes: > > It seems that subquery scan is only used, when the query can not be > > translated into single flat query. Which is mostly good, I assume. > > The planner thinks so anyway ;-) > > If you're desperate you can put in an optimization fence, for > instance LIMIT or OFFSET. > > SELECT * FROM (SELECT ... OFFSET 0) ss; > > In principle the planner could figure out that this offset is > a no-op, throw it away, and then flatten the query. But it > doesn't at the moment, and I doubt we'll teach it to do so in > the future. > > regards, tom lane > Thanks, that did the trick. It was a bit more strict than expected, because WHERE of the outer query is not optimized into subquery. But considering the semantics of OFFSET that seems reasonable and I can work around it for now. Tambet