Re: why doesn't optimizer can pull up where a > ( ... )

Поиск
Список
Период
Сортировка
От Tomas Vondra
Тема Re: why doesn't optimizer can pull up where a > ( ... )
Дата
Msg-id 20191120172001.gri67zdnwycsobxy@development
обсуждение исходный текст
Ответ на why doesn't optimizer can pull up where a > ( ... )  (Andy Fan <zhihui.fan1213@gmail.com>)
Список pgsql-hackers
On Wed, Nov 20, 2019 at 08:15:19PM +0800, Andy Fan wrote:
>Hi Hackers:
>
> First I found the following queries running bad on pg.
>
> select count(*) from part2 p1 where p_size > 40 and p_retailprice >
>(select avg(p_retailprice) from part2 p2 where p2.p_brand=p1.p_brand);
>
>the plan is
>                                     QUERY PLAN
>------------------------------------------------------------------------------------
> Aggregate  (cost=1899310537.28..1899310537.29 rows=1 width=8)
>   ->  Seq Scan on part2 p1  (cost=0.00..1899310456.00 rows=32513 width=0)
>         Filter: ((p_size > 40) AND (p_retailprice > (SubPlan 1)))
>         SubPlan 1
>           ->  Aggregate  (cost=6331.00..6331.01 rows=1 width=32)
>                 ->  Seq Scan on part2 p2  (cost=0.00..5956.00 rows=150000
>width=4)
>                       Filter: (p_brand = p1.p_brand)
>
>however if we change it to the following format, it runs pretty quick.
>
>select count(*) from part2,
>(select p_brand, avg(p_retailprice) as avg_price from part2 where p_size >
>40 group by p_brand) p2
>where p_retailprice > p2.avg_price
>and p_size > 40
>and part2.p_brand = p2.p_brand;
>
>The above example comes from
>https://community.pivotal.io/s/article/Pivotal-Query-Optimizer-Explained with
>a litter modification.
>
>1.  why pg can't translate the query 1 to query 2.  after some checking
>on pull_up_sublinks_qual_recurse,  I still doesn't get the idea.
>2.  why pg can't do it,  while greenplum can?
>

I don't know the exact place(s) in the optimizer that would need to
consider this optimization, but the primary difference between the two
queries is that the first one is correlated subquery, while the second
one is not.

So I guess our optimizer is not smart enough to recognize this pattern,
and can't do the transformation from

    ... FROM p WHERE x > (SELECT avg(x) FROM q WHERE p.id = q.id) ...

to

    ... FROM p, (SELECT q.id, avg(x) x FROM q) q2 WHERE q2.id = p.id
                                                    AND q2.x < p.x

I.e. we don't have the code to consider this optimization, because no
one considered it interesting enough to submit a patch.


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Julien Rouhaud
Дата:
Сообщение: Re: Planning counters in pg_stat_statements (using pgss_store)
Следующее
От: Tomas Vondra
Дата:
Сообщение: Re: why doesn't optimizer can pull up where a > ( ... )