Re: Join push-down support for foreign tables

Поиск
Список
Период
Сортировка
От Shigeru Hanada
Тема Re: Join push-down support for foreign tables
Дата
Msg-id CAEZqfEcQbb74r0gWefCaS_QMCN6wpPoPibtNxzb+dHxQnGMXVQ@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Join push-down support for foreign tables  (Shigeru Hanada <shigeru.hanada@gmail.com>)
Список pgsql-hackers
Here is the v5 patch of Join push-down support for foreign tables.

Changes since v4:

- Separete remote conditions into ON and WHERE, per Ashutosh.
- Add regression test cases for foreign join.
- Don't skip reversed relation combination in OUTER join cases.

I'm now working on two issues from Kaigai-san and Ashutosu, whole-row
reference handling and use of get_joinrel_parampathinfo().

2015-03-05 22:00 GMT+09:00 Shigeru Hanada <shigeru.hanada@gmail.com>:
> Hi Ashutosh, thanks for the review.
>
> 2015-03-04 19:17 GMT+09:00 Ashutosh Bapat <ashutosh.bapat@enterprisedb.com>:
>> In create_foreignscan_path() we have lines like -
>> 1587     pathnode->path.param_info = get_baserel_parampathinfo(root, rel,
>> 1588
>> required_outer);
>> Now, that the same function is being used for creating foreign scan paths
>> for joins, we should be calling get_joinrel_parampathinfo() on a join rel
>> and get_baserel_parampathinfo() on base rel.
>
> Got it.  Please let me check the difference.
>
>>
>> The patch seems to handle all the restriction clauses in the same way. There
>> are two kinds of restriction clauses - a. join quals (specified using ON
>> clause; optimizer might move them to the other class if that doesn't affect
>> correctness) and b. quals on join relation (specified in the WHERE clause,
>> optimizer might move them to the other class if that doesn't affect
>> correctness). The quals in "a" are applied while the join is being computed
>> whereas those in "b" are applied after the join is computed. For example,
>> postgres=# select * from lt;
>>  val | val2
>> -----+------
>>    1 |    2
>>    1 |    3
>> (2 rows)
>>
>> postgres=# select * from lt2;
>>  val | val2
>> -----+------
>>    1 |    2
>> (1 row)
>>
>> postgres=# select * from lt left join lt2 on (lt.val2 = lt2.val2);
>>  val | val2 | val | val2
>> -----+------+-----+------
>>    1 |    2 |   1 |    2
>>    1 |    3 |     |
>> (2 rows)
>>
>> postgres=# select * from lt left join lt2 on (true) where (lt.val2 =
>> lt2.val2);
>>  val | val2 | val | val2
>> -----+------+-----+------
>>    1 |    2 |   1 |    2
>> (1 row)
>>
>> The difference between these two kinds is evident in case of outer joins,
>> for inner join optimizer puts all of them in class "b". The remote query
>> sent to the foreign server has all those in ON clause. Consider foreign
>> tables ft1 and ft2 pointing to local tables on the same server.
>> postgres=# \d ft1
>>          Foreign table "public.ft1"
>>  Column |  Type   | Modifiers | FDW Options
>> --------+---------+-----------+-------------
>>  val    | integer |           |
>>  val2   | integer |           |
>> Server: loopback
>> FDW Options: (table_name 'lt')
>>
>> postgres=# \d ft2
>>          Foreign table "public.ft2"
>>  Column |  Type   | Modifiers | FDW Options
>> --------+---------+-----------+-------------
>>  val    | integer |           |
>>  val2   | integer |           |
>> Server: loopback
>> FDW Options: (table_name 'lt2')
>>
>> postgres=# explain verbose select * from ft1 left join ft2 on (ft1.val2 =
>> ft2.val2) where ft1.val + ft2.val > ft1.val2 or ft2.val is null;
>>
>> QUERY PLAN
>>
>>
-------------------------------------------------------------------------------------------------------------------------------------------------------
>> ------------------------------------------------------------------------------------
>>  Foreign Scan  (cost=100.00..125.60 rows=2560 width=16)
>>    Output: val, val2, val, val2
>>    Remote SQL: SELECT r.a_0, r.a_1, l.a_0, l.a_1 FROM (SELECT val, val2 FROM
>> public.lt2) l (a_0, a_1) RIGHT JOIN (SELECT val, val2 FROM public.lt) r (a
>> _0, a_1)  ON ((((r.a_0 + l.a_0) > r.a_1) OR (l.a_0 IS NULL))) AND ((r.a_1 =
>> l.a_1))
>> (3 rows)
>>
>> The result is then wrong
>> postgres=# select * from ft1 left join ft2 on (ft1.val2 = ft2.val2) where
>> ft1.val + ft2.val > ft1.val2 or ft2.val is null;
>>  val | val2 | val | val2
>> -----+------+-----+------
>>    1 |    2 |     |
>>    1 |    3 |     |
>> (2 rows)
>>
>> which should match the result obtained by substituting local tables for
>> foreign ones
>> postgres=# select * from lt left join lt2 on (lt.val2 = lt2.val2) where
>> lt.val + lt2.val > lt.val2 or lt2.val is null;
>>  val | val2 | val | val2
>> -----+------+-----+------
>>    1 |    3 |     |
>> (1 row)
>>
>> Once we start distinguishing the two kinds of quals, there is some
>> optimization possible. For pushing down a join it's essential that all the
>> quals in "a" are safe to be pushed down. But a join can be pushed down, even
>> if quals in "a" are not safe to be pushed down. But more clauses one pushed
>> down to foreign server, lesser are the rows fetched from the foreign server.
>> In postgresGetForeignJoinPath, instead of checking all the restriction
>> clauses to be safe to be pushed down, we need to check only those which are
>> join quals (class "a").
>
> The argument restrictlist of GetForeignJoinPaths contains both
> conditions mixed, so I added extract_actual_join_clauses() to separate
> it into two lists, join_quals and other clauses.  This is similar to
> what create_nestloop_plan and siblings do.
>
>
>>
>> Following EXPLAIN output seems to be confusing
>> ft1 and ft2 both are pointing to same lt on a foreign server.
>> postgres=# explain verbose select ft1.val + ft1.val2 from ft1, ft2 where
>> ft1.val + ft1.val2 = ft2.val;
>>
>> QUERY PLAN
>>
>>
-------------------------------------------------------------------------------------------------------------------------------------------------------
>> --------------------------
>>  Foreign Scan  (cost=100.00..132.00 rows=2560 width=8)
>>    Output: (val + val2)
>>    Remote SQL: SELECT r.a_0, r.a_1 FROM (SELECT val, NULL FROM public.lt) l
>> (a_0, a_1) INNER JOIN (SELECT val, val2 FROM public.lt) r (a_0, a_1)  ON ((
>> (r.a_0 + r.a_1) = l.a_0))
>>
>> Output just specified val + val2, it doesn't tell, where those val and val2
>> come from, neither it's evident from the rest of the context.
>>
>
> Actually val and val2 come from public.lt in "r" side, but as you say
> it's too difficult to know that from EXPLAIN output.  Do you have any
> idea to make the "Output" item more readable?
>
> --
> Shigeru HANADA



--
Shigeru HANADA

Вложения

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Stephen Frost
Дата:
Сообщение: Re: Strange assertion using VACOPT_FREEZE in vacuum.c
Следующее
От: Robert Haas
Дата:
Сообщение: Re: Strange assertion using VACOPT_FREEZE in vacuum.c