Re: Custom/Foreign-Join-APIs (Re: [v9.5] Custom Plan API)

Поиск
Список
Период
Сортировка
От Shigeru HANADA
Тема Re: Custom/Foreign-Join-APIs (Re: [v9.5] Custom Plan API)
Дата
Msg-id 32143998-E0F8-49EB-B9BE-BF1401D1A7D2@gmail.com
обсуждение исходный текст
Ответ на Re: Custom/Foreign-Join-APIs (Re: [v9.5] Custom Plan API)  (Ashutosh Bapat <ashutosh.bapat@enterprisedb.com>)
Ответы Re: Custom/Foreign-Join-APIs (Re: [v9.5] Custom Plan API)  (Ashutosh Bapat <ashutosh.bapat@enterprisedb.com>)
Список pgsql-hackers
Hi Ashutosh,

Thanks for the review.

2015/04/22 19:28、Ashutosh Bapat <ashutosh.bapat@enterprisedb.com> のメール:
> Tests
> -------
> 1.The postgres_fdw test is re/setting enable_mergejoin at various places. The goal of these tests seems to be to test
thesanity of foreign plans generated. So, it might be better to reset enable_mergejoin (and may be all of
enable_hashjoin,enable_nestloop_join etc.) to false at the beginning of the testcase and set them again at the end.
Thatway, we will also make sure that foreign plans are chosen irrespective of future planner changes. 

I have different, rather opposite opinion about it.  I disabled other join types as least as the tests pass, because I
worryoversights come from planner changes.  I hope to eliminate enable_foo from the test script, by improving costing
modelsmarter. 

> 2. In the patch, I see that the inheritance testcases have been deleted from postgres_fdw.sql, is that intentional? I
donot see those being replaced anywhere else. 

It’s accidental removal, I restored the tests about inheritance feature.

> 3. We need one test for each join type (or at least for INNER and LEFT OUTER) where there are unsafe to push
conditionsin ON clause along-with safe-to-push conditions. For INNER join, the join should get pushed down with the
safeconditions and for OUTER join it shouldn't be. Same goes for WHERE clause, in which case the join will be pushed
downbut the unsafe-to-push conditions will be applied locally. 

Currently INNER JOINs with unsafe join conditions are not pushed down, so such test is not in the suit.  As you say, in
theory,INNER JOINs can be pushed down even they have push-down-unsafe join conditions, because such conditions can be
evaluatedno local side against rows retrieved without those conditions. 

> 4. All the tests have ORDER BY, LIMIT in them, so the setref code is being exercised. But, something like aggregates
wouldtest the setref code better. So, we should add at-least one test like select avg(ft1.c1 + ft2.c2) from ft1 join
ft2on (ft1.c1 = ft2.c1). 

Added an aggregate case, and also added an UNION case for Append.

> 5. It will be good to add some test which contain join between few foreign and few local tables to see whether we are
ableto push down the largest possible foreign join tree to the foreign server.  
>





> Code
> -------
> In classifyConditions(), the code is now appending RestrictInfo::clause rather than RestrictInfo itself. But the
callersof classifyConditions() have not changed. Is this change intentional? 

Yes, the purpose of the change is to make appendConditions (former name is appendWhereClause) can handle JOIN ON
clause,list of Expr. 

> The functions which consume the lists produced by this function handle expressions as well RestrictInfo, so you may
nothave noticed it. Because of this change, we might be missing some optimizations e.g. in function
postgresGetForeignPlan()
>  793         if (list_member_ptr(fpinfo->remote_conds, rinfo))
>  794             remote_conds = lappend(remote_conds, rinfo->clause);
>  795         else if (list_member_ptr(fpinfo->local_conds, rinfo))
>  796             local_exprs = lappend(local_exprs, rinfo->clause);
>  797         else if (is_foreign_expr(root, baserel, rinfo->clause))
>  798             remote_conds = lappend(remote_conds, rinfo->clause);
>  799         else
>  800             local_exprs = lappend(local_exprs, rinfo->clause);
> Finding a RestrictInfo in remote_conds avoids another call to is_foreign_expr(). So with this change, I think we are
doingan extra call to is_foreign_expr(). 
>

Hm, it seems better to revert my change and make appendConditions downcast given information into RestrictInfo or Expr
accordingto the node tag. 

> The function get_jointype_name() returns an empty string for unsupported join types. Instead of that it should throw
anerror, if some code path accidentally calls the function with unsupported join type e.g. SEMI_JOIN. 

Agreed, fixed.

> While deparsing the SQL with rowmarks, the placement of FOR UPDATE/SHARE clause in the original query is not being
honored,which means that we will end up locking the rows which are not part of the join result even when the join is
pushedto the foreign server. E.g take the following query (it uses the tables created in postgres_fdw.sql tests) 
> contrib_regression=# explain verbose select * from ft1 join ft2 on (ft1.c1 = ft2.c1) for update of ft1;
>
                                 
>
                                 
>                                              QUERY PLAN
                                 
>
                                 
>
>
-------------------------------------------------------------------------------------------------------------------------------------------------------
>
-------------------------------------------------------------------------------------------------------------------------------------------------------
>
-------------------------------------------------------------------------------------------------------------------------------------------------------
>
-------------------------------------------------------------------------------------------------------------------------------------------------------
> ----------------------------------------------------------------------------------------------------
>  LockRows  (cost=100.00..124.66 rows=822 width=426)
>    Output: ft1.c1, ft1.c2, ft1.c3, ft1.c4, ft1.c5, ft1.c6, ft1.c7, ft1.c8, ft2.c1, ft2.c2, ft2.c3, ft2.c4, ft2.c5,
ft2.c6,ft2.c7, ft2.c8, ft1.*, ft2.* 
>    ->  Foreign Scan  (cost=100.00..116.44 rows=822 width=426)
>          Output: ft1.c1, ft1.c2, ft1.c3, ft1.c4, ft1.c5, ft1.c6, ft1.c7, ft1.c8, ft2.c1, ft2.c2, ft2.c3, ft2.c4,
ft2.c5,ft2.c6, ft2.c7, ft2.c8, ft1.*, 
>  ft2.*
>          Relations: (public.ft1) INNER JOIN (public.ft2)
>          Remote SQL: SELECT l.a1, l.a2, l.a3, l.a4, l.a5, l.a6, l.a7, l.a8, l.a9, r.a1, r.a2, r.a3, r.a4, r.a5, r.a6,
r.a7,r.a8, r.a9 FROM (SELECT l.a 
> 10, l.a11, l.a12, l.a13, l.a14, l.a15, l.a16, l.a17, ROW(l.a10, l.a11, l.a12, l.a13, l.a14, l.a15, l.a16, l.a17) FROM
(SELECT"C 1" a10, c2 a11, c3 a12 
> , c4 a13, c5 a14, c6 a15, c7 a16, c8 a17 FROM "S 1"."T 1" FOR UPDATE) l) l (a1, a2, a3, a4, a5, a6, a7, a8, a9) INNER
JOIN(SELECT r.a9, r.a10, r.a12,  
> r.a13, r.a14, r.a15, r.a16, r.a17, ROW(r.a9, r.a10, r.a12, r.a13, r.a14, r.a15, r.a16, r.a17) FROM (SELECT "C 1" a9,
c2a10, c3 a12, c4 a13, c5 a14, c6 
>  a15, c7 a16, c8 a17 FROM "S 1"."T 1") r) r (a1, a2, a3, a4, a5, a6, a7, a8, a9) ON ((l.a1 = r.a1))
> (6 rows)
> It's expected that only the rows which are part of join result will be locked by FOR UPDATE clause. The query sent to
theforeign server has attached the FOR UPDATE clause to the sub-query for table ft1 ("S 1"."T 1" on foreign server). As
perthe postgresql documentation, "When a locking clause appears in a sub-SELECT, the rows locked are those returned to
theouter query by the sub-query.". So it's going to lock all rows from "S 1"."T 1", rather than only the rows which are
partof join. This is going to increase probability of deadlocks, if the join is between a big table and small table
wherebig table is being used in many queries and the join is going to have only a single row in the result. 
>



> Since there is no is_first argument to appendConditions(), we should remove corresponding line from the function
prologue.
>

Oops, replaced with the description of prefix.


> The name TO_RELATIVE() doesn't convey the full meaning of the macro. May be GET_RELATIVE_ATTNO() or something like
that.

Fixed.

>
> In postgresGetForeignJoinPaths(), while separating the conditions into join quals and other quals,
> 3014     if (IS_OUTER_JOIN(jointype))
> 3015     {
> 3016         extract_actual_join_clauses(joinclauses, &joinclauses, &otherclauses);
> 3017     }
> 3018     else
> 3019     {
> 3020         joinclauses = extract_actual_clauses(joinclauses, false);
> 3021         otherclauses = NIL;
> 3022     }
> we shouldn't differentiate between outer and inner join. For inner join the join quals can be treated as other
clausesand they will be returned as other clauses, which is fine. Also, the following condition 
> 3050     /*
> 3051      * Other condition for the join must be safe to push down.
> 3052      */
> 3053     foreach(lc, otherclauses)
> 3054     {
> 3055         Expr *expr = (Expr *) lfirst(lc);
> 3056
> 3057         if (!is_foreign_expr(root, joinrel, expr))
> 3058         {
> 3059             ereport(DEBUG3, (errmsg("filter contains unsafe conditions")));
> 3060             return;
> 3061         }
> 3062     }
> is unnecessary. I there are filter conditions which are unsafe to push down, they can be applied locally after
obtainingthe join result from the foreign server. The join quals are all needed to be safe to push down, since they
decidewhich rows will contain NULL inner side in an OUTER join.  

I’m not sure that we *shouldn’t* differentiate, but I agree that we *don’t need* to differentiate if we are talking
aboutonly the result of filtering. 

IMO we *should* differentiate inner and outer (or differentiate join conditions and filter conditions) because all
conditionsof typical INNER JOINs go into otherclauses because their is_pushed_down flag is on, so such joins look like
CROSSJOIN + WHERE filter.  In the latest patch EXPLAIN shows the join combinations of a foreign join scan node with
jointype, but your suggestion makes it looks like this: 

fdw=# explain (verbose) select * from pgbench_branches b join pgbench_tellers t on t.bid = b.bid;
  QUERY PLAN


-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------

-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
----------------Foreign Scan  (cost=100.00..101.00 rows=50 width=716)  Output: b.bid, b.bbalance, b.filler, t.tid,
t.bid,t.tbalance, t.filler  Relations: (public.pgbench_branches b) CROSS JOIN (public.pgbench_tellers t)  Remote SQL:
SELECTl.a1, l.a2, l.a3, r.a1, r.a2, r.a3, r.a4 FROM (SELECT l.a9, l.a10, l.a11 FROM (SELECT bid a9, bbalance a10,
fillera11 FROM public.pgbench_branches) l)l (a1, a2, a3) CROSS JOIN (SELECT r.a9, r.a10, r.a11, r.a12 FROM (SELECT tid
a9,bid a10, tbalance a11, filler a12 FROM public.pgbench_tellers) r) r (a1, a2, a3, a4) WHERE 
((l.a1 = r.a2))
(4 rows)

Thoughts?

Regards,
--
Shigeru HANADA
shigeru.hanada@gmail.com







В списке pgsql-hackers по дате отправления:

Предыдущее
От: Dean Rasheed
Дата:
Сообщение: Re: [committers] pgsql: RLS fixes, new hooks, and new test module
Следующее
От: Teodor Sigaev
Дата:
Сообщение: Bug in planner