Обсуждение: How to force subquery scan?

Поиск
Список
Период
Сортировка

How to force subquery scan?

От
"Tambet Matiisen"
Дата:
Another (possibly design?) problem of mine. I have a function
product_cost(product_id, date), which does simple SELECT call. I
declared it as STABLE, hoping that multiple invocations of the same
function are replaced with one. Query is something like this:

SELECT   p.product_id,   avg(product_cost(s.product_id, s.date)) as average_cost,   sum(product_cost(s.product_id,
s.date)* s.amount) as cost_total 
FROM products p
LEFT JOIN sales s ON s.date between '2004-01-01' and '2005-01-01'
GROUP BY p.product_id;

(For those interested in the intent of the query - imagine there is
internal cost associated with a product, which is different in different
periods. There is no cost column in sales table, because it might
change, for previous periods too).

When I ran the query for long periods I observed that my assumption
about STABLE was wrong. It did not help to reduce function invocations,
as one could think after reading the documentation. It was also
confirmed in mailing lists, that STABLE only allows function to be used
in index scan, there is no function result cacheing.

I was able to reduce function calls to just one per row by using
subquery:

SELECT   p.product_id,   avg(s.cost) as average_cost,   sum(s.cost * s.amount) as cost_total
FROM products p
LEFT JOIN (SELECT *, product_cost(product_id, date) as cost FROM sales)
s ON s.date between '2004-01-01' and '2005-01-01'
GROUP BY p.product_id;

But it did work only as long I used LEFT JOIN. When I used regular JOIN,
the optimizer happily optimized subquery scan to just table scan and
elevated the function call to next query level, where it was executed
twice. My question is, is there a trick that would force subquery scan
when I want it?
 Tambet


Re: How to force subquery scan?

От
Christoph Haller
Дата:
Tambet Matiisen wrote:
> 
> Another (possibly design?) problem of mine. I have a function
> product_cost(product_id, date), which does simple SELECT call. I
> declared it as STABLE, hoping that multiple invocations of the same
> function are replaced with one. Query is something like this:
> 
> SELECT
>     p.product_id,
>     avg(product_cost(s.product_id, s.date)) as average_cost,
>     sum(product_cost(s.product_id, s.date) * s.amount) as cost_total
> FROM products p
> LEFT JOIN sales s ON s.date between '2004-01-01' and '2005-01-01'
> GROUP BY p.product_id;
> 
> (For those interested in the intent of the query - imagine there is
> internal cost associated with a product, which is different in different
> periods. There is no cost column in sales table, because it might
> change, for previous periods too).
> 
> When I ran the query for long periods I observed that my assumption
> about STABLE was wrong. It did not help to reduce function invocations,
> as one could think after reading the documentation. It was also
> confirmed in mailing lists, that STABLE only allows function to be used
> in index scan, there is no function result cacheing.
> 
> I was able to reduce function calls to just one per row by using
> subquery:
> 
> SELECT
>     p.product_id,
>     avg(s.cost) as average_cost,
>     sum(s.cost * s.amount) as cost_total
> FROM products p
> LEFT JOIN (SELECT *, product_cost(product_id, date) as cost FROM sales)
> s ON s.date between '2004-01-01' and '2005-01-01'
> GROUP BY p.product_id;
> 
> But it did work only as long I used LEFT JOIN. When I used regular JOIN,
> the optimizer happily optimized subquery scan to just table scan and
> elevated the function call to next query level, where it was executed
> twice. My question is, is there a trick that would force subquery scan
> when I want it?
> 
>   Tambet
> 

Does this do better: 

SELECT   prodid,   avg(prodcost) as average_cost,   sum(prodcost * salesamount) as cost_total
FROM (
SELECT   p.product_id as prodid,   product_cost(s.product_id, s.date) as prodcost,   s.amount as salesamount
FROM products p
LEFT JOIN sales s ON s.date between '2004-01-01' and '2005-01-01'
) q
GROUP BY prodid;


Regards, Christoph


Re: How to force subquery scan?

От
"Tambet Matiisen"
Дата:
...
>
> Does this do better:
>
> SELECT
>     prodid,
>     avg(prodcost) as average_cost,
>     sum(prodcost * salesamount) as cost_total
> FROM (
> SELECT
>     p.product_id as prodid,
>     product_cost(s.product_id, s.date) as prodcost,
>     s.amount as salesamount
> FROM products p
> LEFT JOIN sales s ON s.date between '2004-01-01' and '2005-01-01'
> ) q
> GROUP BY prodid;
>

Sorry, couldn't test it as my query wasn't actual query. I just made up
something for the list. But I made some tests with similar queries and
your solution doesn't help. Subquery scan is not forced, as your query
can be easily translated into single flat query returning the same
results.

It seems that subquery scan is only used, when the query can not be
translated into single flat query. Which is mostly good, I assume. The
problem is, that if SELECT list of subquery contains function call, and
the result of this function is used in multiple places in outer query,
then the function is invoked multiple times (per row). This can get
expensive with slow function and big queries.
 Tambet


Re: How to force subquery scan?

От
Tom Lane
Дата:
"Tambet Matiisen" <t.matiisen@aprote.ee> writes:
> It seems that subquery scan is only used, when the query can not be
> translated into single flat query. Which is mostly good, I assume.

The planner thinks so anyway ;-)

If you're desperate you can put in an optimization fence, for instance
LIMIT or OFFSET.
SELECT * FROM (SELECT ... OFFSET 0) ss;

In principle the planner could figure out that this offset is a no-op,
throw it away, and then flatten the query.  But it doesn't at the
moment, and I doubt we'll teach it to do so in the future.
        regards, tom lane


Re: How to force subquery scan?

От
"Tambet Matiisen"
Дата:

> -----Original Message-----
> From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
> Sent: Wednesday, March 16, 2005 5:42 PM
> To: Tambet Matiisen
> Cc: ch@rodos.fzk.de; pgsql-sql@postgresql.org
> Subject: Re: [SQL] How to force subquery scan?
>
>
> "Tambet Matiisen" <t.matiisen@aprote.ee> writes:
> > It seems that subquery scan is only used, when the query can not be
> > translated into single flat query. Which is mostly good, I assume.
>
> The planner thinks so anyway ;-)
>
> If you're desperate you can put in an optimization fence, for
> instance LIMIT or OFFSET.
>
>     SELECT * FROM (SELECT ... OFFSET 0) ss;
>
> In principle the planner could figure out that this offset is
> a no-op, throw it away, and then flatten the query.  But it
> doesn't at the moment, and I doubt we'll teach it to do so in
> the future.
>
>             regards, tom lane
>

Thanks, that did the trick. It was a bit more strict than expected,
because WHERE of the outer query is not optimized into subquery. But
considering the semantics of OFFSET that seems reasonable and I can work
around it for now.
 Tambet