Обсуждение: One question about transformation ANY Sublinks into joins

Поиск

Список

Период

Сортировка

One question about transformation ANY Sublinks into joins

От

"Armor"

Дата:

17 июля 2016 г., 14:59:57

I run a simple SQL with latest PG：

postgres=# explain select * from t1 where id1 in (select id2 from t2 where c1=c2);

QUERY PLAN

------------------------------------------------------------

Seq Scan on t1 (cost=0.00..43291.83 rows=1130 width=8)

Filter: (SubPlan 1)

SubPlan 1

-> Seq Scan on t2 (cost=0.00..38.25 rows=11 width=4)

Filter: (t1.c1 = c2)

(5 rows)

and the table schema are as following:

postgres=# \d t1

Table "public.t1"

Column | Type | Modifiers

--------+---------+-----------

id1 | integer |

c1 | integer |

postgres=# \d t2

Table "public.t2"

Column | Type | Modifiers

--------+---------+-----------

id2 | integer |

c2 | integer |

I find PG decide not to pull up this sublink because the whereClauses in this sublink refer to the Vars of parent query, for detail please check the function named convert_ANY_sublink_to_join in src/backend/optimizer/plan/subselect.c.

However, for such simple sublink which has no agg, no window function, no limit, may be we can carefully pull up the predicates in whereCluase which refers to the Vars of parent query, then pull up this sublink and produce a query plan as following:

postgres=# explain select * from t1 where id1 in (select id2 from t2 where c1=c2);

QUERY PLAN

------------------------------------------------------------------------

Hash Join (cost=49.55..99.23 rows=565 width=8)

Hash Cond: ((t1.id1 = t2.id2) AND (t1.c1 = t2.c2))

-> Seq Scan on t1 (cost=0.00..32.60 rows=2260 width=8)

-> Hash (cost=46.16..46.16 rows=226 width=8)

-> HashAggregate (cost=43.90..46.16 rows=226 width=8)

Group Key: t2.id2, t2.c2

-> Seq Scan on t2 (cost=0.00..32.60 rows=2260 width=8)

------------------

Jerry Yu

https://github.com/scarbrofair

Re: One question about transformation ANY Sublinks into joins

От

Robert Haas

Дата:

21 июля 2016 г., 19:23:16

On Sun, Jul 17, 2016 at 5:33 AM, Armor <yupengstone@qq.com> wrote:
> Hi
>     I run a simple SQL with latest PG：
> postgres=# explain select * from t1 where id1 in (select id2 from t2 where
> c1=c2);
>                          QUERY PLAN
> ------------------------------------------------------------
>  Seq Scan on t1  (cost=0.00..43291.83 rows=1130 width=8)
>    Filter: (SubPlan 1)
>    SubPlan 1
>      ->  Seq Scan on t2  (cost=0.00..38.25 rows=11 width=4)
>            Filter: (t1.c1 = c2)
> (5 rows)
>
> and the table schema are as following:
>
> postgres=# \d t1
>       Table "public.t1"
>  Column |  Type   | Modifiers
> --------+---------+-----------
>  id1    | integer |
>  c1     | integer |
>
> postgres=# \d t2
>       Table "public.t2"
>  Column |  Type   | Modifiers
> --------+---------+-----------
>  id2    | integer |
>  c2     | integer |
>
>      I find PG decide not to pull up this sublink because the whereClauses
> in this sublink refer to the Vars of parent query, for detail please check
> the function named convert_ANY_sublink_to_join in
> src/backend/optimizer/plan/subselect.c.
>      However, for such simple sublink which has no agg, no window function,
> no limit, may be we can carefully pull up the predicates in whereCluase
> which refers to the Vars of parent query, then pull up this sublink and
> produce a query plan as following:
>
> postgres=# explain select * from t1 where id1 in (select id2 from t2 where
> c1=c2);
>                                QUERY PLAN
> ------------------------------------------------------------------------
>  Hash Join  (cost=49.55..99.23 rows=565 width=8)
>    Hash Cond: ((t1.id1 = t2.id2) AND (t1.c1 = t2.c2))
>    ->  Seq Scan on t1  (cost=0.00..32.60 rows=2260 width=8)
>    ->  Hash  (cost=46.16..46.16 rows=226 width=8)
>          ->  HashAggregate  (cost=43.90..46.16 rows=226 width=8)
>                Group Key: t2.id2, t2.c2
>                ->  Seq Scan on t2  (cost=0.00..32.60 rows=2260 width=8)

It would need to be a Hash Semi Join rather than a Hash Join, wouldn't it?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: One question about transformation ANY Sublinks into joins

От

Dilip Kumar

Дата:

21 июля 2016 г., 20:07:38

<div dir="ltr"><div class="gmail_extra"><br /><div class="gmail_quote">On Thu, Jul 21, 2016 at 9:53 PM, Robert Haas
<spandir="ltr"><<a href="mailto:robertmhaas@gmail.com" target="_blank">robertmhaas@gmail.com</a>></span>
wrote:<br/><blockquote class="gmail_quote" style="margin:0px 0px 0px
0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex">Itwould need
tobe a Hash Semi Join rather than a Hash Join, wouldn't it?</blockquote></div><br />I guess, Hash Join will do
here, </div><divclass="gmail_extra">because inner hash node is, on hash aggregate with group key on <span
style="font-size:12.8px">t2.id2,t2.c2</span></div><div class="gmail_extra"><span style="font-size:12.8px">and hash join
conditionis </span><span style="font-size:12.8px">(t1.id1 = t2.id2) AND (t1.c1 = t2.c2).</span></div><div
class="gmail_extra"><spanstyle="font-size:12.8px"><br /></span></div><div class="gmail_extra"><span
style="font-size:12.8px">SoI think these together will make sure that we don't get duplicate tuple for one outer
record.</span></div><divclass="gmail_extra"><br /></div><div class="gmail_extra">-- <br /><div class="gmail_signature"
data-smartmail="gmail_signature"><divdir="ltr"><span style="color:rgb(80,0,80);font-size:12.8px">Regards,</span><br
style="color:rgb(80,0,80);font-size:12.8px"/><span style="color:rgb(80,0,80);font-size:12.8px">Dilip Kumar</span><br
style="color:rgb(80,0,80);font-size:12.8px"/><span style="color:rgb(80,0,80);font-size:12.8px">EnterpriseDB: </span><a
href="http://www.enterprisedb.com/"style="color:rgb(17,85,204);font-size:12.8px"
target="_blank">http://www.enterprisedb.com</a><br/></div></div></div></div>

Re: One question about transformation ANY Sublinks into joins

От

"Armor"

Дата:

23 июля 2016 г., 19:39:06

After we pull up this sublink as semi join , when make join rel for semi join, the optimizer will take hash join method into account if a unique path can be created with the RHS, for detail please check make_join_rel in src/backend/optimizer/path/joinrels.c.

For this case, the cost of hash join is cheaper than semi join, so you can see the planner chose the hash join rather than semi join.

------------------

Jerry Yu

https://github.com/scarbrofair

------------------ Original ------------------

From: "Robert Haas";<robertmhaas@gmail.com>;

Date: Fri, Jul 22, 2016 00:23 AM

To: "Armor"<yupengstone@qq.com>;

Cc: "pgsql-hackers"<pgsql-hackers@postgresql.org>;

Subject: Re: [HACKERS] One question about transformation ANY Sublinks into joins

On Sun, Jul 17, 2016 at 5:33 AM, Armor <yupengstone@qq.com> wrote:
> Hi
>     I run a simple SQL with latest PG：
> postgres=# explain select * from t1 where id1 in (select id2 from t2 where
> c1=c2);
>                          QUERY PLAN
> ------------------------------------------------------------
> Seq Scan on t1 (cost=0.00..43291.83 rows=1130 width=8)
>    Filter: (SubPlan 1)
>    SubPlan 1
>      -> Seq Scan on t2 (cost=0.00..38.25 rows=11 width=4)
>            Filter: (t1.c1 = c2)
> (5 rows)
>
> and the table schema are as following:
>
> postgres=# \d t1
>       Table "public.t1"
> Column | Type   | Modifiers
> --------+---------+-----------
> id1    | integer |
> c1     | integer |
>
> postgres=# \d t2
>       Table "public.t2"
> Column | Type   | Modifiers
> --------+---------+-----------
> id2    | integer |
> c2     | integer |
>
>      I find PG decide not to pull up this sublink because the whereClauses
> in this sublink refer to the Vars of parent query, for detail please check
> the function named convert_ANY_sublink_to_join in
> src/backend/optimizer/plan/subselect.c.
>      However, for such simple sublink which has no agg, no window function,
> no limit, may be we can carefully pull up the predicates in whereCluase
> which refers to the Vars of parent query, then pull up this sublink and
> produce a query plan as following:
>
> postgres=# explain select * from t1 where id1 in (select id2 from t2 where
> c1=c2);
>                                QUERY PLAN
> ------------------------------------------------------------------------
> Hash Join (cost=49.55..99.23 rows=565 width=8)
>    Hash Cond: ((t1.id1 = t2.id2) AND (t1.c1 = t2.c2))
>    -> Seq Scan on t1 (cost=0.00..32.60 rows=2260 width=8)
>    -> Hash (cost=46.16..46.16 rows=226 width=8)
>          -> HashAggregate (cost=43.90..46.16 rows=226 width=8)
>                Group Key: t2.id2, t2.c2
>                -> Seq Scan on t2 (cost=0.00..32.60 rows=2260 width=8)

It would need to be a Hash Semi Join rather than a Hash Join, wouldn't it?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Обсуждение: One question about transformation ANY Sublinks into joins

One question about transformation ANY Sublinks into joins

Re: One question about transformation ANY Sublinks into joins

Re: One question about transformation ANY Sublinks into joins

Re: One question about transformation ANY Sublinks into joins