Обсуждение: One question about transformation ANY Sublinks into joins

Поиск
Список
Период
Сортировка

One question about transformation ANY Sublinks into joins

От
"Armor"
Дата:
Hi
    I run a simple SQL with latest PG:
postgres=# explain select * from t1 where id1 in (select id2 from t2 where c1=c2);
                         QUERY PLAN                         
------------------------------------------------------------
 Seq Scan on t1  (cost=0.00..43291.83 rows=1130 width=8)
   Filter: (SubPlan 1)
   SubPlan 1
     ->  Seq Scan on t2  (cost=0.00..38.25 rows=11 width=4)
           Filter: (t1.c1 = c2)
(5 rows)

and the table schema are as following:

postgres=# \d t1
      Table "public.t1"
 Column |  Type   | Modifiers 
--------+---------+-----------
 id1    | integer | 
 c1     | integer | 

postgres=# \d t2
      Table "public.t2"
 Column |  Type   | Modifiers 
--------+---------+-----------
 id2    | integer | 
 c2     | integer | 

     I find PG decide not to pull up this sublink because the whereClauses in this sublink refer to the Vars of parent query, for detail please check the function named convert_ANY_sublink_to_join in src/backend/optimizer/plan/subselect.c. 
     However, for such simple sublink which has no agg, no window function, no limit, may be we can carefully pull up the predicates in whereCluase which refers to the Vars of parent query, then pull up this sublink and produce a query plan as following:
    
postgres=# explain select * from t1 where id1 in (select id2 from t2 where c1=c2);
                               QUERY PLAN                               
------------------------------------------------------------------------
 Hash Join  (cost=49.55..99.23 rows=565 width=8)
   Hash Cond: ((t1.id1 = t2.id2) AND (t1.c1 = t2.c2))
   ->  Seq Scan on t1  (cost=0.00..32.60 rows=2260 width=8)
   ->  Hash  (cost=46.16..46.16 rows=226 width=8)
         ->  HashAggregate  (cost=43.90..46.16 rows=226 width=8)
               Group Key: t2.id2, t2.c2
               ->  Seq Scan on t2  (cost=0.00..32.60 rows=2260 width=8)
   
------------------
Jerry Yu
https://github.com/scarbrofair
 

Re: One question about transformation ANY Sublinks into joins

От
Robert Haas
Дата:
On Sun, Jul 17, 2016 at 5:33 AM, Armor <yupengstone@qq.com> wrote:
> Hi
>     I run a simple SQL with latest PG:
> postgres=# explain select * from t1 where id1 in (select id2 from t2 where
> c1=c2);
>                          QUERY PLAN
> ------------------------------------------------------------
>  Seq Scan on t1  (cost=0.00..43291.83 rows=1130 width=8)
>    Filter: (SubPlan 1)
>    SubPlan 1
>      ->  Seq Scan on t2  (cost=0.00..38.25 rows=11 width=4)
>            Filter: (t1.c1 = c2)
> (5 rows)
>
> and the table schema are as following:
>
> postgres=# \d t1
>       Table "public.t1"
>  Column |  Type   | Modifiers
> --------+---------+-----------
>  id1    | integer |
>  c1     | integer |
>
> postgres=# \d t2
>       Table "public.t2"
>  Column |  Type   | Modifiers
> --------+---------+-----------
>  id2    | integer |
>  c2     | integer |
>
>      I find PG decide not to pull up this sublink because the whereClauses
> in this sublink refer to the Vars of parent query, for detail please check
> the function named convert_ANY_sublink_to_join in
> src/backend/optimizer/plan/subselect.c.
>      However, for such simple sublink which has no agg, no window function,
> no limit, may be we can carefully pull up the predicates in whereCluase
> which refers to the Vars of parent query, then pull up this sublink and
> produce a query plan as following:
>
> postgres=# explain select * from t1 where id1 in (select id2 from t2 where
> c1=c2);
>                                QUERY PLAN
> ------------------------------------------------------------------------
>  Hash Join  (cost=49.55..99.23 rows=565 width=8)
>    Hash Cond: ((t1.id1 = t2.id2) AND (t1.c1 = t2.c2))
>    ->  Seq Scan on t1  (cost=0.00..32.60 rows=2260 width=8)
>    ->  Hash  (cost=46.16..46.16 rows=226 width=8)
>          ->  HashAggregate  (cost=43.90..46.16 rows=226 width=8)
>                Group Key: t2.id2, t2.c2
>                ->  Seq Scan on t2  (cost=0.00..32.60 rows=2260 width=8)

It would need to be a Hash Semi Join rather than a Hash Join, wouldn't it?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: One question about transformation ANY Sublinks into joins

От
Dilip Kumar
Дата:
<div dir="ltr"><div class="gmail_extra"><br /><div class="gmail_quote">On Thu, Jul 21, 2016 at 9:53 PM, Robert Haas
<spandir="ltr"><<a href="mailto:robertmhaas@gmail.com" target="_blank">robertmhaas@gmail.com</a>></span>
wrote:<br/><blockquote class="gmail_quote" style="margin:0px 0px 0px
0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex">Itwould need
tobe a Hash Semi Join rather than a Hash Join, wouldn't it?</blockquote></div><br />I guess, Hash Join will do
here, </div><divclass="gmail_extra">because inner hash node is, on hash aggregate with group key on <span
style="font-size:12.8px">t2.id2,t2.c2</span></div><div class="gmail_extra"><span style="font-size:12.8px">and hash join
conditionis </span><span style="font-size:12.8px">(t1.id1 = t2.id2) AND (t1.c1 = t2.c2).</span></div><div
class="gmail_extra"><spanstyle="font-size:12.8px"><br /></span></div><div class="gmail_extra"><span
style="font-size:12.8px">SoI think these together will make sure that we don't get duplicate tuple for one outer
record.</span></div><divclass="gmail_extra"><br /></div><div class="gmail_extra">-- <br /><div class="gmail_signature"
data-smartmail="gmail_signature"><divdir="ltr"><span style="color:rgb(80,0,80);font-size:12.8px">Regards,</span><br
style="color:rgb(80,0,80);font-size:12.8px"/><span style="color:rgb(80,0,80);font-size:12.8px">Dilip Kumar</span><br
style="color:rgb(80,0,80);font-size:12.8px"/><span style="color:rgb(80,0,80);font-size:12.8px">EnterpriseDB: </span><a
href="http://www.enterprisedb.com/"style="color:rgb(17,85,204);font-size:12.8px"
target="_blank">http://www.enterprisedb.com</a><br/></div></div></div></div> 

Re: One question about transformation ANY Sublinks into joins

От
"Armor"
Дата:
After we pull up this sublink as semi join , when make join rel for semi join, the optimizer will take hash join method into account if a unique path can be created with the RHS, for detail please check make_join_rel in src/backend/optimizer/path/joinrels.c. 
For this case, the cost of  hash join is cheaper than semi join, so you can see the planner chose the hash join rather than semi join.

------------------
Jerry Yu
https://github.com/scarbrofair
 


------------------ Original ------------------
From:  "Robert Haas";<robertmhaas@gmail.com>;
Date:  Fri, Jul 22, 2016 00:23 AM
To:  "Armor"<yupengstone@qq.com>;
Cc:  "pgsql-hackers"<pgsql-hackers@postgresql.org>;
Subject:  Re: [HACKERS] One question about transformation ANY Sublinks into joins

On Sun, Jul 17, 2016 at 5:33 AM, Armor <yupengstone@qq.com> wrote:
> Hi
>     I run a simple SQL with latest PG:
> postgres=# explain select * from t1 where id1 in (select id2 from t2 where
> c1=c2);
>                          QUERY PLAN
> ------------------------------------------------------------
>  Seq Scan on t1  (cost=0.00..43291.83 rows=1130 width=8)
>    Filter: (SubPlan 1)
>    SubPlan 1
>      ->  Seq Scan on t2  (cost=0.00..38.25 rows=11 width=4)
>            Filter: (t1.c1 = c2)
> (5 rows)
>
> and the table schema are as following:
>
> postgres=# \d t1
>       Table "public.t1"
>  Column |  Type   | Modifiers
> --------+---------+-----------
>  id1    | integer |
>  c1     | integer |
>
> postgres=# \d t2
>       Table "public.t2"
>  Column |  Type   | Modifiers
> --------+---------+-----------
>  id2    | integer |
>  c2     | integer |
>
>      I find PG decide not to pull up this sublink because the whereClauses
> in this sublink refer to the Vars of parent query, for detail please check
> the function named convert_ANY_sublink_to_join in
> src/backend/optimizer/plan/subselect.c.
>      However, for such simple sublink which has no agg, no window function,
> no limit, may be we can carefully pull up the predicates in whereCluase
> which refers to the Vars of parent query, then pull up this sublink and
> produce a query plan as following:
>
> postgres=# explain select * from t1 where id1 in (select id2 from t2 where
> c1=c2);
>                                QUERY PLAN
> ------------------------------------------------------------------------
>  Hash Join  (cost=49.55..99.23 rows=565 width=8)
>    Hash Cond: ((t1.id1 = t2.id2) AND (t1.c1 = t2.c2))
>    ->  Seq Scan on t1  (cost=0.00..32.60 rows=2260 width=8)
>    ->  Hash  (cost=46.16..46.16 rows=226 width=8)
>          ->  HashAggregate  (cost=43.90..46.16 rows=226 width=8)
>                Group Key: t2.id2, t2.c2
>                ->  Seq Scan on t2  (cost=0.00..32.60 rows=2260 width=8)

It would need to be a Hash Semi Join rather than a Hash Join, wouldn't it?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company