Обсуждение: omitting redundant join predicate

Поиск
Список
Период
Сортировка

omitting redundant join predicate

От
Ehab Galal
Дата:

I tried the following query

 

explain select *

from t1, t2, t3

where t1.f <= t2.f

  and t2.f <= t3.f

  and t1.f <= t3.f;

 

And that's what I got:

 

Nested Loop  (cost=0.00..3.15 rows=1 width=368)

   Join Filter: (("outer".f <= "inner".f) AND ("inner".f <= "outer".f))

   ->  Nested Loop  (cost=0.00..2.10 rows=1 width=218)

         Join Filter: ("outer".f <= "inner".f)

         ->  Seq Scan on t1  (cost=0.00..1.01 rows=1 width=146)

         ->  Seq Scan on t3  (cost=0.00..1.04 rows=4 width=72)

   ->  Seq Scan on t2  (cost=0.00..1.02 rows=2 width=150)

 

 

I was wondering if there is a way to omit the redundant join predicate.

 

Thanks,

--h



Windows Live Hotmail and Microsoft Office Outlook – together at last. Get it now!

Re: omitting redundant join predicate

От
Tom Lane
Дата:
Ehab Galal <ehabgalal123@hotmail.com> writes:
> explain select * 
> from t1, t2, t3 
> where t1.f <= t2.f
>   and t2.f <= t3.f
>   and t1.f <= t3.f;

> I was wondering if there is a
> way to omit the redundant join predicate.

You're not being very clear here.  Do you mean will you get the same
answer if you omit "t1.f <= t3.f"?  Yes, of course (ignoring possibly
different output ordering).  Do you mean you think the system should
discard it as redundant?  I disagree --- the more join clauses the
better, as a rule.  Do you mean that the EXPLAIN output looks like
the same comparison is being applied twice?  It isn't --- in a more
modern PG release the output looks like this:
                           QUERY PLAN                            
------------------------------------------------------------------Nested Loop  (cost=33.54..81794021.44 rows=362975624
width=12) Join Filter: ((t1.f <= t2.f) AND (t2.f <= t3.f))  ->  Nested Loop  (cost=0.00..124472.40 rows=1526533
width=8)       Join Filter: (t1.f <= t3.f)        ->  Seq Scan on t1  (cost=0.00..31.40 rows=2140 width=4)        ->
SeqScan on t3  (cost=0.00..31.40 rows=2140 width=4)  ->  Materialize  (cost=33.54..54.94 rows=2140 width=4)        ->
SeqScan on t2  (cost=0.00..31.40 rows=2140 width=4)
 
(8 rows)

This is of course the stupidest possible join plan, but it's hard to do
much better --- both hash and merge joins work only on equality
conditions.  You can do a bit better with an index on t2.f:
                             QUERY PLAN                              
----------------------------------------------------------------------Nested Loop  (cost=0.00..13222230.60
rows=362975624width=12)  ->  Nested Loop  (cost=0.00..124472.40 rows=1526533 width=8)        Join Filter: (t1.f <=
t3.f)       ->  Seq Scan on t1  (cost=0.00..31.40 rows=2140 width=4)        ->  Seq Scan on t3  (cost=0.00..31.40
rows=2140width=4)  ->  Index Scan using t2i on t2  (cost=0.00..5.01 rows=238 width=4)        Index Cond: ((t1.f <=
t2.f)AND (t2.f <= t3.f))
 
(7 rows)
        regards, tom lane


Re: omitting redundant join predicate

От
Ehab Galal
Дата:
Sorry for not being clear enough. What i meant is how aware the optimizer is about the transitivity of operators.

I agree that the more join clauses a query gets, the more flexibility the optimizer gets to pick an optimal plan.

what i expected is that the optimizer will use the redundant predicates to create the plan, but the execution plan itself will not execute a redundant predicate.

O! I see, it's my mistake. The example i mentioned was not a good example. I tried the equality and it is working well :)

Nested Loop  (cost=0.00..3.14 rows=1 width=368)
   Join Filter: ("outer".username = "inner".username)
   ->  Nested Loop  (cost=0.00..2.10 rows=1 width=218)
         Join Filter: ("outer".username = "inner".username)
         ->  Seq Scan on t1  (cost=0.00..1.01 rows=1 width=146)
         ->  Seq Scan on t3  (cost=0.00..1.04 rows=4 width=72)
   ->  Seq Scan on t2  (cost=0.00..1.02 rows=2 width=150)

I am using postgresql 8.5.1, I am wondering is there is any patch that i can run to enable it to put the actual table names instead of inner/outer. Should i post this to the hackers mailing list?

Thanks a lot.




> To: ehabgalal123@hotmail.com
> CC: pgsql-sql@postgresql.org
> Subject: Re: [SQL] omitting redundant join predicate
> Date: Sun, 4 Nov 2007 11:35:36 -0500
> From: tgl@sss.pgh.pa.us
>
> Ehab Galal <ehabgalal123@hotmail.com> writes:
> > explain select *
> > from t1, t2, t3
> > where t1.f <= t2.f
> > and t2.f <= t3.f
> > and t1.f <= t3.f;
>
> > I was wondering if there is a
> > way to omit the redundant join predicate.
>
> You're not being very clear here. Do you mean will you get the same
> answer if you omit "t1.f <= t3.f"? Yes, of course (ignoring possibly
> different output ordering). Do you mean you think the system should
> discard it as redundant? I disagree --- the more join clauses the
> better, as a rule. Do you mean that the EXPLAIN output looks like
> the same comparison is being applied twice? It isn't --- in a more
> modern PG release the output looks like this:
>
> QUERY PLAN
> ------------------------------------------------------------------
> Nested Loop (cost=33.54..81794021.44 rows=362975624 width=12)
> Join Filter: ((t1.f <= t2.f) AND (t2.f <= t3.f))
> -> Nested Loop (cost=0.00..124472.40 rows=1526533 width=8)
> Join Filter: (t1.f <= t3.f)
> -> Seq Scan on t1 (cost=0.00..31.40 rows=2140 width=4)
> -> Seq Scan on t3 (cost=0.00..31.40 rows=2140 width=4)
> -> Materialize (cost=33.54..54.94 rows=2140 width=4)
> -> Seq Scan on t2 (cost=0.00..31.40 rows=2140 width=4)
> (8 rows)
>
> This is of course the stupidest possible join plan, but it's hard to do
> much better --- both hash and merge joins work only on equality
> conditions. You can do a bit better with an index on t2.f:
>
> QUERY PLAN
> ----------------------------------------------------------------------
> Nested Loop (cost=0.00..13222230.60 rows=362975624 width=12)
> -> Nested Loop (cost=0.00..124472.40 rows=1526533 width=8)
> Join Filter: (t1.f <= t3.f)
> -> Seq Scan on t1 (cost=0.00..31.40 rows=2140 width=4)
> -> Seq Scan on t3 (cost=0.00..31.40 rows=2140 width=4)
> -> Index Scan using t2i on t2 (cost=0.00..5.01 rows=238 width=4)
> Index Cond: ((t1.f <= t2.f) AND (t2.f <= t3.f))
> (7 rows)
>
> regards, tom lane


Help yourself to FREE treats served up daily at the Messenger Café. Stop by today!

Re: omitting redundant join predicate

От
Tom Lane
Дата:
Ehab Galal <ehabgalal123@hotmail.com> writes:
> what i expected is that the optimizer will use the redundant predicates
> to create the plan, but the execution plan itself will not execute a
> redundant predicate. 

> O! I see, it's my mistake. The example i mentioned was not a good example. I tried the equality and it is working
well:)
 

The planner has a great deal more smarts about equality conditions than
inequality conditions.  There's more you can do with equalities, and
it's more useful for typical queries.

> I am using postgresql 8.5.1, I am wondering is there is any patch that
> i can run to enable it to put the actual table names instead of
> inner/outer.

Update to 8.2.
        regards, tom lane


materialize

От
Ehab Galal
Дата:
Greetings,<br /><br />I was wondering why do we need the Materialize node in the plan below when i explain a query? <br
/><br/><br />1: QUERY PLAN = "Nested Loop  (cost=10.99..24.34 rows=1 width=846)"    (typeid = 25, len = -1, typmod =
-1,byval = f)<br />----<br />1: QUERY PLAN = "      Join Filter: (("inner".cover)::text = ("outer".cover)::text)"  
 (typeid= 25, len = -1, typmod = -1, byval = f)<br />----<br />1: QUERY PLAN = "      ->  Index Scan using
idx_cover_ftn_on c_cover_ftn  (cost=0.00..9.30 rows=2 width=86)"    (typeid = 25, len = -1, typmod = -1, byval = f)<br
/>----<br/>1: QUERY PLAN = "               Index Cond: (username = 'user1'::name)"    (typeid = 25, len = -1, typmod =
-1,byval = f)<br />----<br />1: QUERY PLAN = "      ->  Materialize  (cost=10.99..11.89 rows=90 width=842)"  
 (typeid= 25, len = -1, typmod = -1, byval = f)<br />----<br />1: QUERY PLAN = "               ->  Seq Scan on
books (cost=0.00..10.90 rows=90 width=842)"    (typeid = 25, len = -1, typmod = -1, byval = f)<br />----<br /><br /><br
/>Thanks,<br/>Ehab<br /><br /><hr />Your smile counts. The more smiles you share, the more we donate. <a
href="www.windowslive.com/smile?ocid=TXT_TAGLM_Wave2_oprsmilewlhmtagline"target="_new">Join in!</a> 

Re: materialize

От
Tom Lane
Дата:
Ehab Galal <ehabgalal123@hotmail.com> writes:
> I was wondering why do we need the Materialize node in the plan below when i explain a query? 

It's cheaper to scan a materialized rowset many times than a raw table
--- we don't need to re-access shared memory nor re-check row visibility.
        regards, tom lane