Re: Custom/Foreign-Join-APIs (Re: [v9.5] Custom Plan API)

Поиск
Список
Период
Сортировка
От Ashutosh Bapat
Тема Re: Custom/Foreign-Join-APIs (Re: [v9.5] Custom Plan API)
Дата
Msg-id CAFjFpRc8GQwLNQago-zmJKdqHDXh+C0XXZCaJ4ihyZJ+dB0_1g@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Custom/Foreign-Join-APIs (Re: [v9.5] Custom Plan API)  (Shigeru HANADA <shigeru.hanada@gmail.com>)
Список pgsql-hackers


On Fri, Apr 24, 2015 at 3:08 PM, Shigeru HANADA <shigeru.hanada@gmail.com> wrote:
Hi Ashutosh,

Thanks for the review.

2015/04/22 19:28、Ashutosh Bapat <ashutosh.bapat@enterprisedb.com> のメール:
> Tests
> -------
> 1.The postgres_fdw test is re/setting enable_mergejoin at various places. The goal of these tests seems to be to test the sanity of foreign plans generated. So, it might be better to reset enable_mergejoin (and may be all of enable_hashjoin, enable_nestloop_join etc.) to false at the beginning of the testcase and set them again at the end. That way, we will also make sure that foreign plans are chosen irrespective of future planner changes.

I have different, rather opposite opinion about it.  I disabled other join types as least as the tests pass, because I worry oversights come from planner changes.  I hope to eliminate enable_foo from the test script, by improving costing model smarter.

Ok, if you can do that, that will be excellent.
 

> 2. In the patch, I see that the inheritance testcases have been deleted from postgres_fdw.sql, is that intentional? I do not see those being replaced anywhere else.

It’s accidental removal, I restored the tests about inheritance feature.

Thanks.
 

> 3. We need one test for each join type (or at least for INNER and LEFT OUTER) where there are unsafe to push conditions in ON clause along-with safe-to-push conditions. For INNER join, the join should get pushed down with the safe conditions and for OUTER join it shouldn't be. Same goes for WHERE clause, in which case the join will be pushed down but the unsafe-to-push conditions will be applied locally.

Currently INNER JOINs with unsafe join conditions are not pushed down, so such test is not in the suit.  As you say, in theory, INNER JOINs can be pushed down even they have push-down-unsafe join conditions, because such conditions can be evaluated no local side against rows retrieved without those conditions.

> 4. All the tests have ORDER BY, LIMIT in them, so the setref code is being exercised. But, something like aggregates would test the setref code better. So, we should add at-least one test like select avg(ft1.c1 + ft2.c2) from ft1 join ft2 on (ft1.c1 = ft2.c1).

Added an aggregate case, and also added an UNION case for Append.

Thanks.
 

> 5. It will be good to add some test which contain join between few foreign and few local tables to see whether we are able to push down the largest possible foreign join tree to the foreign server.
>


Are you planning to do anything on this point?
 


> Code
> -------
> In classifyConditions(), the code is now appending RestrictInfo::clause rather than RestrictInfo itself. But the callers of classifyConditions() have not changed. Is this change intentional?

Yes, the purpose of the change is to make appendConditions (former name is appendWhereClause) can handle JOIN ON clause, list of Expr.

> The functions which consume the lists produced by this function handle expressions as well RestrictInfo, so you may not have noticed it. Because of this change, we might be missing some optimizations e.g. in function postgresGetForeignPlan()
>  793         if (list_member_ptr(fpinfo->remote_conds, rinfo))
>  794             remote_conds = lappend(remote_conds, rinfo->clause);
>  795         else if (list_member_ptr(fpinfo->local_conds, rinfo))
>  796             local_exprs = lappend(local_exprs, rinfo->clause);
>  797         else if (is_foreign_expr(root, baserel, rinfo->clause))
>  798             remote_conds = lappend(remote_conds, rinfo->clause);
>  799         else
>  800             local_exprs = lappend(local_exprs, rinfo->clause);
> Finding a RestrictInfo in remote_conds avoids another call to is_foreign_expr(). So with this change, I think we are doing an extra call to is_foreign_expr().
>

Hm, it seems better to revert my change and make appendConditions downcast given information into RestrictInfo or Expr according to the node tag.

Thanks.
 

> The function get_jointype_name() returns an empty string for unsupported join types. Instead of that it should throw an error, if some code path accidentally calls the function with unsupported join type e.g. SEMI_JOIN.

Agreed, fixed.
Thanks.
 

> While deparsing the SQL with rowmarks, the placement of FOR UPDATE/SHARE clause in the original query is not being honored, which means that we will end up locking the rows which are not part of the join result even when the join is pushed to the foreign server. E.g take the following query (it uses the tables created in postgres_fdw.sql tests)
> contrib_regression=# explain verbose select * from ft1 join ft2 on (ft1.c1 = ft2.c1) for update of ft1;
>
>
>                                              QUERY PLAN
>
>
> -------------------------------------------------------------------------------------------------------------------------------------------------------
> -------------------------------------------------------------------------------------------------------------------------------------------------------
> -------------------------------------------------------------------------------------------------------------------------------------------------------
> -------------------------------------------------------------------------------------------------------------------------------------------------------
> ----------------------------------------------------------------------------------------------------
>  LockRows  (cost=100.00..124.66 rows=822 width=426)
>    Output: ft1.c1, ft1.c2, ft1.c3, ft1.c4, ft1.c5, ft1.c6, ft1.c7, ft1.c8, ft2.c1, ft2.c2, ft2.c3, ft2.c4, ft2.c5, ft2.c6, ft2.c7, ft2.c8, ft1.*, ft2.*
>    ->  Foreign Scan  (cost=100.00..116.44 rows=822 width=426)
>          Output: ft1.c1, ft1.c2, ft1.c3, ft1.c4, ft1.c5, ft1.c6, ft1.c7, ft1.c8, ft2.c1, ft2.c2, ft2.c3, ft2.c4, ft2.c5, ft2.c6, ft2.c7, ft2.c8, ft1.*,
>  ft2.*
>          Relations: (public.ft1) INNER JOIN (public.ft2)
>          Remote SQL: SELECT l.a1, l.a2, l.a3, l.a4, l.a5, l.a6, l.a7, l.a8, l.a9, r.a1, r.a2, r.a3, r.a4, r.a5, r.a6, r.a7, r.a8, r.a9 FROM (SELECT l.a
> 10, l.a11, l.a12, l.a13, l.a14, l.a15, l.a16, l.a17, ROW(l.a10, l.a11, l.a12, l.a13, l.a14, l.a15, l.a16, l.a17) FROM (SELECT "C 1" a10, c2 a11, c3 a12
> , c4 a13, c5 a14, c6 a15, c7 a16, c8 a17 FROM "S 1"."T 1" FOR UPDATE) l) l (a1, a2, a3, a4, a5, a6, a7, a8, a9) INNER JOIN (SELECT r.a9, r.a10, r.a12,
> r.a13, r.a14, r.a15, r.a16, r.a17, ROW(r.a9, r.a10, r.a12, r.a13, r.a14, r.a15, r.a16, r.a17) FROM (SELECT "C 1" a9, c2 a10, c3 a12, c4 a13, c5 a14, c6
>  a15, c7 a16, c8 a17 FROM "S 1"."T 1") r) r (a1, a2, a3, a4, a5, a6, a7, a8, a9) ON ((l.a1 = r.a1))
> (6 rows)
> It's expected that only the rows which are part of join result will be locked by FOR UPDATE clause. The query sent to the foreign server has attached the FOR UPDATE clause to the sub-query for table ft1 ("S 1"."T 1" on foreign server). As per the postgresql documentation, "When a locking clause appears in a sub-SELECT, the rows locked are those returned to the outer query by the sub-query.". So it's going to lock all rows from "S 1"."T 1", rather than only the rows which are part of join. This is going to increase probability of deadlocks, if the join is between a big table and small table where big table is being used in many queries and the join is going to have only a single row in the result.
>


Are you planning to do anything about this point?
 


> Since there is no is_first argument to appendConditions(), we should remove corresponding line from the function prologue.
>

Oops, replaced with the description of prefix.


> The name TO_RELATIVE() doesn't convey the full meaning of the macro. May be GET_RELATIVE_ATTNO() or something like that.

Fixed.

Thanks.
 

>
> In postgresGetForeignJoinPaths(), while separating the conditions into join quals and other quals,
> 3014     if (IS_OUTER_JOIN(jointype))
> 3015     {
> 3016         extract_actual_join_clauses(joinclauses, &joinclauses, &otherclauses);
> 3017     }
> 3018     else
> 3019     {
> 3020         joinclauses = extract_actual_clauses(joinclauses, false);
> 3021         otherclauses = NIL;
> 3022     }
> we shouldn't differentiate between outer and inner join. For inner join the join quals can be treated as other clauses and they will be returned as other clauses, which is fine. Also, the following condition
> 3050     /*
> 3051      * Other condition for the join must be safe to push down.
> 3052      */
> 3053     foreach(lc, otherclauses)
> 3054     {
> 3055         Expr *expr = (Expr *) lfirst(lc);
> 3056
> 3057         if (!is_foreign_expr(root, joinrel, expr))
> 3058         {
> 3059             ereport(DEBUG3, (errmsg("filter contains unsafe conditions")));
> 3060             return;
> 3061         }
> 3062     }
> is unnecessary. I there are filter conditions which are unsafe to push down, they can be applied locally after obtaining the join result from the foreign server. The join quals are all needed to be safe to push down, since they decide which rows will contain NULL inner side in an OUTER join.

I’m not sure that we *shouldn’t* differentiate, but I agree that we *don’t need* to differentiate if we are talking about only the result of filtering.

IMO we *should* differentiate inner and outer (or differentiate join conditions and filter conditions) because all conditions of typical INNER JOINs go into otherclauses because their is_pushed_down flag is on, so such joins look like CROSS JOIN + WHERE filter.  In the latest patch EXPLAIN shows the join combinations of a foreign join scan node with join type, but your suggestion makes it looks like this:

fdw=# explain (verbose) select * from pgbench_branches b join pgbench_tellers t on t.bid = b.bid;

   QUERY PLAN

-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
----------------
 Foreign Scan  (cost=100.00..101.00 rows=50 width=716)
   Output: b.bid, b.bbalance, b.filler, t.tid, t.bid, t.tbalance, t.filler
   Relations: (public.pgbench_branches b) CROSS JOIN (public.pgbench_tellers t)
   Remote SQL: SELECT l.a1, l.a2, l.a3, r.a1, r.a2, r.a3, r.a4 FROM (SELECT l.a9, l.a10, l.a11 FROM (SELECT bid a9, bbalance a10, filler a11 FROM public.pgbench_branches) l)
 l (a1, a2, a3) CROSS JOIN (SELECT r.a9, r.a10, r.a11, r.a12 FROM (SELECT tid a9, bid a10, tbalance a11, filler a12 FROM public.pgbench_tellers) r) r (a1, a2, a3, a4) WHERE
((l.a1 = r.a2))
(4 rows)

Thoughts?

It does hamper readability a bit. But it explicitly shows, how do we want to treat the join. We can leave this to the committers though.
 

Regards,
--
Shigeru HANADA
shigeru.hanada@gmail.com







--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Michael Paquier
Дата:
Сообщение: Re: [COMMITTERS] pgsql: Add transforms feature
Следующее
От: Pavel Stehule
Дата:
Сообщение: Faster setup_param_list() in plpgsql