Обсуждение: Another planner oddity

Поиск
Список
Период
Сортировка

Another planner oddity

От
Philip Warner
Дата:
Another mild planning oddity; this time, the query does not seem to rem,ove
an unreferenced column from the plan. No big deal, but for larger queries
it can significantly increase the cost.

create table g(n text, rn text); 
create table r(n text, p int);
create table t(p int, x int);

-- Basically LOJ t->r->g, and return 'n' from g if found.
create view tv as selectt.p,g.n as gn,x
from t left outer join r on (r.p=t.p)left outer join g on (g.rn = r.n);

explain select (select r.n from r where r.p=tv.p), -- no reference to gn!sum(x)
Fromtv
Group by 1
;

Aggregate  (cost=3378.54..3503.54 rows=2500 width=76) ->  Group  (cost=3378.54..3441.04 rows=25000 width=76)       ->
Sort (cost=3378.54..3378.54 rows=25000 width=76)             ->  Merge Join  (cost=584.18..911.68 rows=25000 width=76)
                ->  Sort  (cost=514.35..514.35 rows=5000 width=44)                         ->  Merge Join
(cost=139.66..207.16rows=5000
 
width=44)                               ->  Sort  (cost=69.83..69.83 rows=1000
width=8)                                     ->  Seq Scan on t  (cost=0.00..20.00
rows=1000 width=8)                               ->  Sort  (cost=69.83..69.83 rows=1000
width=36)                                     ->  Seq Scan on r  (cost=0.00..20.00
rows=1000 width=36)                   ->  Sort  (cost=69.83..69.83 rows=1000 width=32)
!!!!!!                    ->  Seq Scan on g  (cost=0.00..20.00 rows=1000
width=32)                   SubPlan
!?                    ->  Seq Scan on r  (cost=0.00..22.50 rows=5 width=32)


----------------------------------------------------------------
Philip Warner                    |     __---_____
Albatross Consulting Pty. Ltd.   |----/       -  \
(A.B.N. 75 008 659 498)          |          /(@)   ______---_
Tel: (+61) 0500 83 82 81         |                 _________  \
Fax: (+61) 0500 83 82 82         |                 ___________ |
Http://www.rhyme.com.au          |                /           \|                                |    --________--
PGP key available upon request,  |  /
and from pgp5.ai.mit.edu:11371   |/


Re: Another planner oddity

От
Tom Lane
Дата:
Philip Warner <pjw@rhyme.com.au> writes:
> explain select 
>     (select r.n from r where r.p=tv.p), -- no reference to gn!
>     sum(x)
> From
>     tv

What's your point?  We can't omit the join to g, as that would change
the set of returned rows.  (In general, anyway; in this case the
dependency is that multiple matches in g would change sum(x) for
any given r.n.)
        regards, tom lane


Re: Another planner oddity

От
Philip Warner
Дата:
At 10:53 3/11/01 -0500, Tom Lane wrote:
>Philip Warner <pjw@rhyme.com.au> writes:
>> explain select 
>>     (select r.n from r where r.p=tv.p), -- no reference to gn!
>>     sum(x)
>> From
>>     tv
>
>What's your point?  We can't omit the join to g, as that would change
>the set of returned rows.  (In general, anyway; in this case the
>dependency is that multiple matches in g would change sum(x) for
>any given r.n.)

Oops. Left out too much. Make each of the ref'd tables unique (so only one
match for given t.p):

create table g(n text, rn text unique); 
create table r(n text, p int primary key);
create table t(p int, x int);

create view tv as selectt.p,g.n as gn,x
from t left outer join r on (r.p=t.p)left outer join g on (g.rn = r.n);

explain select (select r.n from r where r.p=tv.p), -- no reference to gn!sum(x)
Fromtv
Group by 1
;

Aggregate  (cost=308.49..313.49 rows=100 width=76) ->  Group  (cost=308.49..310.99 rows=1000 width=76)       ->  Sort
(cost=308.49..308.49rows=1000 width=76)             ->  Merge Join  (cost=189.16..258.66 rows=1000 width=76)
      ->  Index Scan using g_rn_key on g  (cost=0.00..52.00
 
rows=1000 width=32)                   ->  Sort  (cost=189.16..189.16 rows=1000 width=44)                         ->
MergeJoin  (cost=69.83..139.33 rows=1000
 
width=44)                               ->  Index Scan using r_pkey on r
(cost=0.00..52.00 rows=1000 width=36)                               ->  Sort  (cost=69.83..69.83 rows=1000
width=8)                                     ->  Seq Scan on t  (cost=0.00..20.00
rows=1000 width=8)                   SubPlan                     ->  Index Scan using r_pkey on r  (cost=0.00..4.82
rows=1 width=32)




----------------------------------------------------------------
Philip Warner                    |     __---_____
Albatross Consulting Pty. Ltd.   |----/       -  \
(A.B.N. 75 008 659 498)          |          /(@)   ______---_
Tel: (+61) 0500 83 82 81         |                 _________  \
Fax: (+61) 0500 83 82 82         |                 ___________ |
Http://www.rhyme.com.au          |                /           \|                                |    --________--
PGP key available upon request,  |  /
and from pgp5.ai.mit.edu:11371   |/


Re: Another planner oddity

От
Tom Lane
Дата:
Philip Warner <pjw@rhyme.com.au> writes:
> At 10:53 3/11/01 -0500, Tom Lane wrote:
>> What's your point?  We can't omit the join to g, as that would change
>> the set of returned rows.

> Oops. Left out too much. Make each of the ref'd tables unique (so only one
> match for given t.p):

Hmm.  That in combination with the LEFT OUTER JOIN might be sufficient
to ensure that the output is the same with or without scanning g ...
but it seems far too fragile and specialized a chain of reasoning to
consider trying to get the planner to duplicate it.

We have to consider not only the potential benefit of any suggested
planner optimization, but also how often it's likely to win and how
many cycles we're likely to waste testing for the condition when it
doesn't hold.  This seems very unpromising.

My thoughts here are probably colored by bad past experience: before
about 6.5, the planner would in fact discard unreferenced relations
from its plan, with the result that it gave wrong answers for
perfectly-reasonable queries like "SELECT count(1) FROM foo".
I won't put back such an optimization without strong guarantees that
it's correct, and that implies a lot of cycles expended to determine
whether the optimization applies.
        regards, tom lane