Обсуждение: Oddity in EXPLAIN for foreign/custom join pushdown plans

Поиск
Список
Период
Сортировка

Oddity in EXPLAIN for foreign/custom join pushdown plans

От
Etsuro Fujita
Дата:
Hi,

I noticed that currently the core doesn't show any information on the
target relations involved in a foreign/custom join in EXPLAIN, by
itself.  Here is an example:

-- join two tables
EXPLAIN (COSTS false, VERBOSE)
SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY
t1.c3, t1.c1 OFFSET 100 LIMIT 10;
                      QUERY PLAN           \


-------------------------------------------------------------------------------------------------------------------------------------------------------\
-----------------------------------
    Limit
      Output: t1.c1, t2.c1, t1.c3
      ->  Foreign Scan
            Output: t1.c1, t2.c1, t1.c3
            Relations: (public.ft1 t1) INNER JOIN (public.ft2 t2)
            Remote SQL: SELECT r1."C 1", r1.c3, r2."C 1" FROM ("S 1"."T
1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r1."C 1" = r2."C 1")))) ORDER BY
r1.c3 ASC N\
ULLS LAST, r1."C 1" ASC NULLS LAST
(6 rows)

postgres_fdw shows the target relations in the Relations line, as shown
above, but I think that the core should show such information
independently of FDWs; in the above example replace "Foreign Scan" with
"Foreign Join on public.ft1 t1, public.ft2 t2".  Please find attached a
patch for that.  Comments welcome!

Best regards,
Etsuro Fujita

Вложения

Re: Oddity in EXPLAIN for foreign/custom join pushdown plans

От
Ashutosh Bapat
Дата:


On Wed, Jul 27, 2016 at 8:50 AM, Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp> wrote:
Hi,

I noticed that currently the core doesn't show any information on the target relations involved in a foreign/custom join in EXPLAIN, by itself.  Here is an example:

-- join two tables
EXPLAIN (COSTS false, VERBOSE)
SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10;
                     QUERY PLAN           \

-------------------------------------------------------------------------------------------------------------------------------------------------------\
-----------------------------------
   Limit
     Output: t1.c1, t2.c1, t1.c3
     ->  Foreign Scan
           Output: t1.c1, t2.c1, t1.c3
           Relations: (public.ft1 t1) INNER JOIN (public.ft2 t2)
           Remote SQL: SELECT r1."C 1", r1.c3, r2."C 1" FROM ("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r1."C 1" = r2."C 1")))) ORDER BY r1.c3 ASC N\
ULLS LAST, r1."C 1" ASC NULLS LAST
(6 rows)

postgres_fdw shows the target relations in the Relations line, as shown above, but I think that the core should show such information independently of FDWs; in the above example replace "Foreign Scan" with "Foreign Join on public.ft1 t1, public.ft2 t2".  Please find attached a patch for that.  Comments welcome!


The patch always prints ForeignJoin when scanrelid <= 0, which would be odd considering that FDWs can now push down post-join operations. We need to device a better way to convey post-join operations. May be something like
Foreign Grouping, aggregation on ...
Foreign Join on ...

But then the question is a foreign scan node can be pushing down many operations together e.g. join, aggregation, sort OR join aggregation and windowing OR join and insert. How would we clearly convey this? May be we say
Foreign Scan
operations: join on ..., aggregation, ...

That wouldn't be so great and might be clumsy for many operations. Any better idea?
--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

Re: Oddity in EXPLAIN for foreign/custom join pushdown plans

От
Kouhei Kaigai
Дата:
> I noticed that currently the core doesn't show any information on the
> target relations involved in a foreign/custom join in EXPLAIN, by
> itself.  Here is an example:
>
> -- join two tables
> EXPLAIN (COSTS false, VERBOSE)
> SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY
> t1.c3, t1.c1 OFFSET 100 LIMIT 10;
>                       QUERY PLAN           \
>
> ----------------------------------------------------------------------------------
> ---------------------------------------------------------------------\
> -----------------------------------
>     Limit
>       Output: t1.c1, t2.c1, t1.c3
>       ->  Foreign Scan
>             Output: t1.c1, t2.c1, t1.c3
>             Relations: (public.ft1 t1) INNER JOIN (public.ft2 t2)
>             Remote SQL: SELECT r1."C 1", r1.c3, r2."C 1" FROM ("S 1"."T
> 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r1."C 1" = r2."C 1")))) ORDER BY
> r1.c3 ASC N\
> ULLS LAST, r1."C 1" ASC NULLS LAST
> (6 rows)
>
> postgres_fdw shows the target relations in the Relations line, as shown
> above, but I think that the core should show such information
> independently of FDWs; in the above example replace "Foreign Scan" with
> "Foreign Join on public.ft1 t1, public.ft2 t2".  Please find attached a
> patch for that.  Comments welcome!
>
This output is, at least, not incorrect.
This ForeignScan actually scan a relation that consists of two joined
tables on the remote side. So, not incorrect, but may not convenient for
better understanding by users who don't have deep internal knowledge.

On the other hands, I cannot be happy with your patch because it concludes
the role of ForeignScan/CustomScan with scanrelid==0 is always join.
However, it does not cover all the case.

When postgres_fdw gets support of remote partial aggregate, we can implement
the feature using ForeignScan with scanrelid==0. Is it a join? No.

Probably, the core cannot know exact purpose of ForeignScan/CustomScan with
scanrelid==0. It can be a remote partial aggregation. It can be an alternative
sort logic by GPU. It can be something others.
So, I think extension needs to inform the core more proper label to show;
including a flag to control whether the relation(s) name shall be attached
next to the ForeignScan/CustomScan line.

I'd like you to suggest to add two fields below:- An alternative label extension wants to show instead of the default
one-A flag to turn on/off printing relation(s) name 

EXPLAIN can print proper information according to these requirements.

Thanks,
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>



Re: Oddity in EXPLAIN for foreign/custom join pushdown plans

От
Etsuro Fujita
Дата:
On 2016/07/27 13:09, Ashutosh Bapat wrote:
> The patch always prints ForeignJoin when scanrelid <= 0, which would be
> odd considering that FDWs can now push down post-join operations. We
> need to device a better way to convey post-join operations. May be
> something like
> Foreign Grouping, aggregation on ...
> Foreign Join on ...

Good point!

The point of the proposed patch is to print "Foreign Join"/"Custom 
Join", to show, in every text/xml/json/yaml format, that there are 
multiple target relations involved in that join and to print those 
relations, following the "Foreign Join"/"Custom Join" words, which I 
think would be more machine-readable.

> But then the question is a foreign scan node can be pushing down many
> operations together e.g. join, aggregation, sort OR join aggregation and
> windowing OR join and insert. How would we clearly convey this? May be
> we say
> Foreign Scan
> operations: join on ..., aggregation, ...
>
> That wouldn't be so great and might be clumsy for many operations. Any
> better idea?

How about: when doing post scan/join operations remotely, print such 
additional operations in EXPLAIN the same way as in the "Relations" or 
"Remote SQL" info printed by postgres_fdw.  Maybe something like this:
  Foreign Scan/Join on target relation(s)    Output: ...    Relations: ...    Additional Operations: grouping,
aggregate,...    Remote SQL: ...
 

That seems not that clumsy to me.

Best regards,
Etsuro Fujita





Re: Oddity in EXPLAIN for foreign/custom join pushdown plans

От
Etsuro Fujita
Дата:
On 2016/07/27 13:51, Kouhei Kaigai wrote:
> This output is, at least, not incorrect.
> This ForeignScan actually scan a relation that consists of two joined
> tables on the remote side. So, not incorrect, but may not convenient for
> better understanding by users who don't have deep internal knowledge.

Hmm.

> On the other hands, I cannot be happy with your patch because it concludes
> the role of ForeignScan/CustomScan with scanrelid==0 is always join.
> However, it does not cover all the case.
>
> When postgres_fdw gets support of remote partial aggregate, we can implement
> the feature using ForeignScan with scanrelid==0. Is it a join? No.

Yeah, I think that that could be implemented in both cases (scanrelid>0  
and scanrelid=0).

> Probably, the core cannot know exact purpose of ForeignScan/CustomScan with
> scanrelid==0. It can be a remote partial aggregation. It can be an alternative
> sort logic by GPU. It can be something others.
> So, I think extension needs to inform the core more proper label to show;
> including a flag to control whether the relation(s) name shall be attached
> next to the ForeignScan/CustomScan line.
>
> I'd like you to suggest to add two fields below:
>  - An alternative label extension wants to show instead of the default one

How about adding something like the "Additional Operations" line as  
proposed in a previous email, instead?  As you know, FDWs/Extensions  
could add such information through ExplainForeignScan/ExplainCustomScan.

>  - A flag to turn on/off printing relation(s) name

ISTM that the relation information should be always printed even in the  
case of scanrelid=0 by the core, because that would be essential for  
analyzing EXPLAIN results.

Best regards,
Etsuro Fujita





Re: Oddity in EXPLAIN for foreign/custom join pushdown plans

От
Kouhei Kaigai
Дата:
> On 2016/07/27 13:51, Kouhei Kaigai wrote:
> > This output is, at least, not incorrect.
> > This ForeignScan actually scan a relation that consists of two joined
> > tables on the remote side. So, not incorrect, but may not convenient for
> > better understanding by users who don't have deep internal knowledge.
>
> Hmm.
>
> > On the other hands, I cannot be happy with your patch because it concludes
> > the role of ForeignScan/CustomScan with scanrelid==0 is always join.
> > However, it does not cover all the case.
> >
> > When postgres_fdw gets support of remote partial aggregate, we can implement
> > the feature using ForeignScan with scanrelid==0. Is it a join? No.
>
> Yeah, I think that that could be implemented in both cases (scanrelid>0
> and scanrelid=0).
>
> > Probably, the core cannot know exact purpose of ForeignScan/CustomScan with
> > scanrelid==0. It can be a remote partial aggregation. It can be an alternative
> > sort logic by GPU. It can be something others.
> > So, I think extension needs to inform the core more proper label to show;
> > including a flag to control whether the relation(s) name shall be attached
> > next to the ForeignScan/CustomScan line.
> >
> > I'd like you to suggest to add two fields below:
> >  - An alternative label extension wants to show instead of the default one
>
> How about adding something like the "Additional Operations" line as
> proposed in a previous email, instead?  As you know, FDWs/Extensions
> could add such information through ExplainForeignScan/ExplainCustomScan.
>
No. What I'm saying is here:


EXPLAIN (COSTS false, VERBOSE)
SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY
t1.c3, t1.c1 OFFSET 100 LIMIT 10;                     QUERY PLAN
-------------------------------------------   Limit     Output: t1.c1, t2.c1, t1.c3     ->  Foreign Scan
^^^^           Output: t1.c1, t2.c1, t1.c3           Relations: (public.ft1 t1) INNER JOIN (public.ft2 t2)
RemoteSQL: SELECT r1."C 1", r1.c3, r2."C 1" FROM ("S 1"."T   
1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r1."C 1" = r2."C 1")))) ORDER BY
r1.c3 ASC N\
ULLS LAST, r1."C 1" ASC NULLS LAST
(6 rows)


Your concern is that EXPLAIN shows ForeignScan node as "Foreign Scan"
although it actually works as "Foreign Join".

My suggestion makes EXPLAIN print "Foreign %s" according to the label
assigned by the extension. Only FDW driver knows how this ForeignScan
node actually performs on, thus, only FDW driver can set meaningful label.

This label may be "Join", may be "Partial Aggregate + Join", or something
others according to the actual job of this node.

> >  - A flag to turn on/off printing relation(s) name
>
> ISTM that the relation information should be always printed even in the
> case of scanrelid=0 by the core, because that would be essential for
> analyzing EXPLAIN results.
>
We have no information which relations are actually scanned by ForeignScan
and CustomScan. Your patch abuses fs_relids and custom_relids, however,
these fields don't indicate the relations actually scan by these nodes.
It just tracks relations processed by this node or its children.

See the example below. This GpuJoin on CustomScan takes three SeqScan nodes
as inner/outer source of relations joins. GpuJoin never scans the tables
actually. It picks up the records generated by underlying SeqScan nodes.


postgres=# explain select id from t0 natural join t1 natural join t2;                               QUERY PLAN
---------------------------------------------------------------------------Custom Scan (GpuJoin)
(cost=12385.67..291245.35rows=9815465 width=4)  GPU Projection: t0.id  Depth 1: GpuHashJoin, HashKeys: (t0.bid)
 JoinQuals: (t0.bid = t2.bid)           Nrows (in/out: 98.15%), KDS-Hash (size: 13.47MB, nbatches: 1)  Depth 2:
GpuHashJoin,HashKeys: (t0.aid)           JoinQuals: (t0.aid = t1.aid)           Nrows (in/out: 100.00%), KDS-Hash
(size:13.47MB, nbatches: 1)  ->  Seq Scan on t0  (cost=0.00..183333.96 rows=9999996 width=12)  ->  Seq Scan on t2
(cost=0.00..1935.00rows=100000 width=4)  ->  Seq Scan on t1  (cost=0.00..1935.00 rows=100000 width=4) 
(11 rows)


If EXPLAIN command attaches something like "on t0,t1,t2" after the GpuJoin
*with no choice*, it is problematic and misleading.
It shall be controllable by the extension which knows what tables are
actually scanned. (Please note that I never says extension makes the string.
It is a job of core explain. I suggest it needs to be controllable.)


Even though I recommended a boolean flag to turn on/off this print, it may
need to be a bitmap to indicate which underlying relations should be printed
by the explain command. Because we can easily assume ForeignScan/CustomScan
node that scans only a part of child tables. For example, it is available to
implement GpuJoin scans the t1 and t2 by itself but takes SeqScan on t0.

> >  - A flag to turn on/off printing relation(s) name

So, my second suggestion shall be adjusted as follows:
- A bitmap to indicate which relation(s) name shall be printed by the core explain command.

Thanks,
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>




Re: Oddity in EXPLAIN for foreign/custom join pushdown plans

От
Etsuro Fujita
Дата:
On 2016/07/28 10:01, Kouhei Kaigai wrote:
> What I'm saying is here:

> EXPLAIN (COSTS false, VERBOSE)
> SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY
> t1.c3, t1.c1 OFFSET 100 LIMIT 10;
>                       QUERY PLAN
> -------------------------------------------
>     Limit
>       Output: t1.c1, t2.c1, t1.c3
>       ->  Foreign Scan
>                   ^^^^
>             Output: t1.c1, t2.c1, t1.c3
>             Relations: (public.ft1 t1) INNER JOIN (public.ft2 t2)
>             Remote SQL: SELECT r1."C 1", r1.c3, r2."C 1" FROM ("S 1"."T
> 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r1."C 1" = r2."C 1")))) ORDER BY
> r1.c3 ASC N\
> ULLS LAST, r1."C 1" ASC NULLS LAST
> (6 rows)

> Your concern is that EXPLAIN shows ForeignScan node as "Foreign Scan"
> although it actually works as "Foreign Join".

That may be so, but my point is that the target relations involved in  
the foreign join (ie, ft1 and ft2) should be printed somewhere in the  
EXPLAIN output by core, as in EXPLAIN for a simple foreign table scan.

> My suggestion makes EXPLAIN print "Foreign %s" according to the label
> assigned by the extension. Only FDW driver knows how this ForeignScan
> node actually performs on, thus, only FDW driver can set meaningful label.
>
> This label may be "Join", may be "Partial Aggregate + Join", or something
> others according to the actual job of this node.

OK, thanks for that.  But sorry, I'm not sure that's a good idea.  One  
reason for that is that FDWs/extentions might give different labels to  
the same upper-planner action, which would lead to confusing EXPLAINs.  
Another is that labeling might be annoying, so some FDWs/extensions may  
not want to do that.

Couldn't we print EXPLAINs in a more unified way?  A simple idea is to  
introduce a broad label, eg, "Foreign Processing", as a unified label;  
if the FDW/extension performs any upper-planner actions remotely, then  
the label "Foreign Processing" will be printed by core, and following  
that, the FDW/extension would print what they want, using  
ExplainForeignScan/ExplainCustomScan.  Probably something like this:
  Foreign Processing    Remote Operations: ...

In the Remote Operations line, the FDW/extension could print any info  
about remote operations, eg, "Scan/Join + Aggregate".  Different data  
sources have different concepts/terms, so it seems reasonable to me that  
there would be different descriptions by different data sources, to the  
same plan actions done remotely.

>>>  - A flag to turn on/off printing relation(s) name
>>
>> ISTM that the relation information should be always printed even in the
>> case of scanrelid=0 by the core, because that would be essential for
>> analyzing EXPLAIN results.

> We have no information which relations are actually scanned by ForeignScan
> and CustomScan. Your patch abuses fs_relids and custom_relids, however,
> these fields don't indicate the relations actually scan by these nodes.
> It just tracks relations processed by this node or its children.

Maybe my explanation was not enough, but I just mean that *joining  
relations* should be printed somewhere in the EXPLAIN output for a  
foreign/custom join as well, by core.

> postgres=# explain select id from t0 natural join t1 natural join t2;
>                                 QUERY PLAN
> ---------------------------------------------------------------------------
>  Custom Scan (GpuJoin)  (cost=12385.67..291245.35 rows=9815465 width=4)
>    GPU Projection: t0.id
>    Depth 1: GpuHashJoin, HashKeys: (t0.bid)
>             JoinQuals: (t0.bid = t2.bid)
>             Nrows (in/out: 98.15%), KDS-Hash (size: 13.47MB, nbatches: 1)
>    Depth 2: GpuHashJoin, HashKeys: (t0.aid)
>             JoinQuals: (t0.aid = t1.aid)
>             Nrows (in/out: 100.00%), KDS-Hash (size: 13.47MB, nbatches: 1)
>    ->  Seq Scan on t0  (cost=0.00..183333.96 rows=9999996 width=12)
>    ->  Seq Scan on t2  (cost=0.00..1935.00 rows=100000 width=4)
>    ->  Seq Scan on t1  (cost=0.00..1935.00 rows=100000 width=4)
> (11 rows)

> If EXPLAIN command attaches something like "on t0,t1,t2" after the GpuJoin
> *with no choice*, it is problematic and misleading.
> It shall be controllable by the extension which knows what tables are
> actually scanned. (Please note that I never says extension makes the string.
> It is a job of core explain. I suggest it needs to be controllable.)

Again, I just mean that the info "on t0,t1,t2" should be printed after  
the GpuJoin label, as its joining relations.

Best regards,
Etsuro Fujita





Re: Oddity in EXPLAIN for foreign/custom join pushdown plans

От
Kouhei Kaigai
Дата:
> On 2016/07/28 10:01, Kouhei Kaigai wrote:
> > What I'm saying is here:
>
> > EXPLAIN (COSTS false, VERBOSE)
> > SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY
> > t1.c3, t1.c1 OFFSET 100 LIMIT 10;
> >                       QUERY PLAN
> > -------------------------------------------
> >     Limit
> >       Output: t1.c1, t2.c1, t1.c3
> >       ->  Foreign Scan
> >                   ^^^^
> >             Output: t1.c1, t2.c1, t1.c3
> >             Relations: (public.ft1 t1) INNER JOIN (public.ft2 t2)
> >             Remote SQL: SELECT r1."C 1", r1.c3, r2."C 1" FROM ("S 1"."T
> > 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r1."C 1" = r2."C 1")))) ORDER BY
> > r1.c3 ASC N\
> > ULLS LAST, r1."C 1" ASC NULLS LAST
> > (6 rows)
>
> > Your concern is that EXPLAIN shows ForeignScan node as "Foreign Scan"
> > although it actually works as "Foreign Join".
>
> That may be so, but my point is that the target relations involved in
> the foreign join (ie, ft1 and ft2) should be printed somewhere in the
> EXPLAIN output by core, as in EXPLAIN for a simple foreign table scan.
>
Why? According to your rule, Hash Join should take "on t0,t1,t2".

postgres=# explain select id from t0 natural join t1 natural join t2;                                QUERY PLAN
-----------------------------------------------------------------------------Hash Join  (cost=6370.00..4560826.24
rows=98784048width=4)  Hash Cond: (t0.aid = t1.aid)  ->  Hash Join  (cost=3185.00..3199360.58 rows=98784048 width=8)
   Hash Cond: (t0.bid = t2.bid)        ->  Seq Scan on t0  (cost=0.00..1833334.80 rows=100000080 width=12)        ->
Hash (cost=1935.00..1935.00 rows=100000 width=4)              ->  Seq Scan on t2  (cost=0.00..1935.00 rows=100000
width=4) ->  Hash  (cost=1935.00..1935.00 rows=100000 width=4)        ->  Seq Scan on t1  (cost=0.00..1935.00
rows=100000width=4) 
(9 rows)

> > My suggestion makes EXPLAIN print "Foreign %s" according to the label
> > assigned by the extension. Only FDW driver knows how this ForeignScan
> > node actually performs on, thus, only FDW driver can set meaningful label.
> >
> > This label may be "Join", may be "Partial Aggregate + Join", or something
> > others according to the actual job of this node.
>
> OK, thanks for that.  But sorry, I'm not sure that's a good idea.  One
> reason for that is that FDWs/extentions might give different labels to
> the same upper-planner action, which would lead to confusing EXPLAINs.
>
If extension put a label hard to understand, it is a problem of the extension.

> Another is that labeling might be annoying, so some FDWs/extensions may
> not want to do that.
>
I'm happy with arbitrary label, not annoying. :-)

> Couldn't we print EXPLAINs in a more unified way?  A simple idea is to
> introduce a broad label, eg, "Foreign Processing", as a unified label;
> if the FDW/extension performs any upper-planner actions remotely, then
> the label "Foreign Processing" will be printed by core, and following
> that, the FDW/extension would print what they want, using
> ExplainForeignScan/ExplainCustomScan.  Probably something like this:
>
>    Foreign Processing
>      Remote Operations: ...
>
> In the Remote Operations line, the FDW/extension could print any info
> about remote operations, eg, "Scan/Join + Aggregate".  Different data
> sources have different concepts/terms, so it seems reasonable to me that
> there would be different descriptions by different data sources, to the
> same plan actions done remotely.
>
"Foreign" implies this node is processed by FDW, but "Procesing" gives us
no extra information; seems to me redundant.
Prior to the new invention, please explain why you don't want to by my
suggestion first? Annoying is a feel of you, but not a logic to persuade
others.

> >>>  - A flag to turn on/off printing relation(s) name
> >>
> >> ISTM that the relation information should be always printed even in the
> >> case of scanrelid=0 by the core, because that would be essential for
> >> analyzing EXPLAIN results.
>
> > We have no information which relations are actually scanned by ForeignScan
> > and CustomScan. Your patch abuses fs_relids and custom_relids, however,
> > these fields don't indicate the relations actually scan by these nodes.
> > It just tracks relations processed by this node or its children.
>
> Maybe my explanation was not enough, but I just mean that *joining
> relations* should be printed somewhere in the EXPLAIN output for a
> foreign/custom join as well, by core.
>
> > postgres=# explain select id from t0 natural join t1 natural join t2;
> >                                 QUERY PLAN
> > ---------------------------------------------------------------------------
> >  Custom Scan (GpuJoin)  (cost=12385.67..291245.35 rows=9815465 width=4)
> >    GPU Projection: t0.id
> >    Depth 1: GpuHashJoin, HashKeys: (t0.bid)
> >             JoinQuals: (t0.bid = t2.bid)
> >             Nrows (in/out: 98.15%), KDS-Hash (size: 13.47MB, nbatches: 1)
> >    Depth 2: GpuHashJoin, HashKeys: (t0.aid)
> >             JoinQuals: (t0.aid = t1.aid)
> >             Nrows (in/out: 100.00%), KDS-Hash (size: 13.47MB, nbatches: 1)
> >    ->  Seq Scan on t0  (cost=0.00..183333.96 rows=9999996 width=12)
> >    ->  Seq Scan on t2  (cost=0.00..1935.00 rows=100000 width=4)
> >    ->  Seq Scan on t1  (cost=0.00..1935.00 rows=100000 width=4)
> > (11 rows)
>
> > If EXPLAIN command attaches something like "on t0,t1,t2" after the GpuJoin
> > *with no choice*, it is problematic and misleading.
> > It shall be controllable by the extension which knows what tables are
> > actually scanned. (Please note that I never says extension makes the string.
> > It is a job of core explain. I suggest it needs to be controllable.)
>
> Again, I just mean that the info "on t0,t1,t2" should be printed after
> the GpuJoin label, as its joining relations.
>
I don't want. Your rule does not match existing EXPLAIN output convention.
Please see an EXPLAIN output that involves only base relations.

My largest concern for you proposition is, ForeignScan/CustomScan node is
enforced to print name of underlying relations, regardless of its actual
behavior. The above GpuJoin never scans tables at least, thus, it mislead
users if we have no choice to print underlying relation names.

Thanks,
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>



Re: Oddity in EXPLAIN for foreign/custom join pushdown plans

От
Etsuro Fujita
Дата:
On 2016/07/28 22:11, Kouhei Kaigai wrote:

I wrote:
>> That may be so, but my point is that the target relations involved in
>> the foreign join (ie, ft1 and ft2) should be printed somewhere in the
>> EXPLAIN output by core, as in EXPLAIN for a simple foreign table scan.

> Why? According to your rule, Hash Join should take "on t0,t1,t2".
>
> postgres=# explain select id from t0 natural join t1 natural join t2;
>                                  QUERY PLAN
> -----------------------------------------------------------------------------
>  Hash Join  (cost=6370.00..4560826.24 rows=98784048 width=4)
>    Hash Cond: (t0.aid = t1.aid)
>    ->  Hash Join  (cost=3185.00..3199360.58 rows=98784048 width=8)
>          Hash Cond: (t0.bid = t2.bid)
>          ->  Seq Scan on t0  (cost=0.00..1833334.80 rows=100000080 width=12)
>          ->  Hash  (cost=1935.00..1935.00 rows=100000 width=4)
>                ->  Seq Scan on t2  (cost=0.00..1935.00 rows=100000 width=4)
>    ->  Hash  (cost=1935.00..1935.00 rows=100000 width=4)
>          ->  Seq Scan on t1  (cost=0.00..1935.00 rows=100000 width=4)
> (9 rows)

I don't think it needs "on t0,t1,t2", because we can see joining  
relations from inner/outer plans in that case.  In a foreign-join case,  
however, we can't see such relations from the EXPLAIN printed *by core*.   postgres_fdw avoids this issue by adding
suchrelations to the EXPLAIN  
 
using ExplainForeignScan as shown in the below example, but since such  
relations are essential, I think that information should be shown by  
core itself.

postgres=# explain select * from (select ft1.a from ft1 left join ft2 on  
ft1.a = ft2.a where ft1.b = 1) ss1(a) full join (select ft3.a from ft3  
left join ft4 on ft3.a = ft4.a where ft3.b = 1) ss2(a) on ss1.a = ss2.a;                           QUERY PLAN
---------------------------------------------------------------- Hash Full Join  (cost=202.06..204.12 rows=1 width=8)
HashCond: (ft1.a = ft3.a)   ->  Foreign Scan  (cost=100.00..102.05 rows=1 width=4)         Relations: (public.ft1) LEFT
JOIN(public.ft2)   ->  Hash  (cost=102.05..102.05 rows=1 width=4)         ->  Foreign Scan  (cost=100.00..102.05 rows=1
width=4)              Relations: (public.ft3) LEFT JOIN (public.ft4)
 
(7 rows)
From the Relations line shown by postgres_fdw, we can see which foreign  
join joins which foreign tables, but if no such lines, we couldn't.

I wrote:
>> Probably something like this:
>>
>>    Foreign Processing
>>      Remote Operations: ...
>>
>> In the Remote Operations line, the FDW/extension could print any info
>> about remote operations, eg, "Scan/Join + Aggregate".

> "Foreign" implies this node is processed by FDW, but "Procesing" gives us
> no extra information; seems to me redundant.

I intentionally chose that word and thought we could leave detailed  
descriptions about remote operations to the FDW/extension; a broader  
word like "Processing" seems to work well because we allow various kinds  
of operations to the remote side, in addition to scans/joins, to be  
performed in that one Foreign Scan node indicated by "Foreign  
Processing", such as aggregation, window functions, distinct, order by,  
row locking, table modification, or combinations of them.

> Prior to the new invention, please explain why you don't want to by my
> suggestion first? Annoying is a feel of you, but not a logic to persuade
> others.

I'm not saying that the idea I proposed is better than your suggestion.   Just brain storming.  I want to know what
optionswe have and the pros  
 
and cons of each approach.

>>> postgres=# explain select id from t0 natural join t1 natural join t2;
>>>                                 QUERY PLAN
>>> ---------------------------------------------------------------------------
>>>  Custom Scan (GpuJoin)  (cost=12385.67..291245.35 rows=9815465 width=4)
>>>    GPU Projection: t0.id
>>>    Depth 1: GpuHashJoin, HashKeys: (t0.bid)
>>>             JoinQuals: (t0.bid = t2.bid)
>>>             Nrows (in/out: 98.15%), KDS-Hash (size: 13.47MB, nbatches: 1)
>>>    Depth 2: GpuHashJoin, HashKeys: (t0.aid)
>>>             JoinQuals: (t0.aid = t1.aid)
>>>             Nrows (in/out: 100.00%), KDS-Hash (size: 13.47MB, nbatches: 1)
>>>    ->  Seq Scan on t0  (cost=0.00..183333.96 rows=9999996 width=12)
>>>    ->  Seq Scan on t2  (cost=0.00..1935.00 rows=100000 width=4)
>>>    ->  Seq Scan on t1  (cost=0.00..1935.00 rows=100000 width=4)
>>> (11 rows)

> My largest concern for you proposition is, ForeignScan/CustomScan node is
> enforced to print name of underlying relations, regardless of its actual
> behavior. The above GpuJoin never scans tables at least, thus, it mislead
> users if we have no choice to print underlying relation names.

OK, I understand we would need special handling for such custom joins.

Best regards,
Etsuro Fujita





Re: Oddity in EXPLAIN for foreign/custom join pushdown plans

От
Ashutosh Bapat
Дата:
<div dir="ltr"><br /><div class="gmail_extra"><br /><div class="gmail_quote"><span class=""></span><br /><span
class=""></span><br /><span class=""></span><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px
#cccsolid;padding-left:1ex"><span class=""> I wrote:<br /><blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px#ccc solid;padding-left:1ex"><blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px#ccc solid;padding-left:1ex"> Probably something like this:<br /><br />    Foreign Processing<br />
    Remote Operations: ...<br /><br /> In the Remote Operations line, the FDW/extension could print any info<br />
aboutremote operations, eg, "Scan/Join + Aggregate".<br /></blockquote></blockquote><br /></span><span
class=""><blockquoteclass="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
"Foreign"implies this node is processed by FDW, but "Procesing" gives us<br /> no extra information; seems to me
redundant.<br/></blockquote><br /></span> I intentionally chose that word and thought we could leave detailed
descriptionsabout remote operations to the FDW/extension; a broader word like "Processing" seems to work well because
weallow various kinds of operations to the remote side, in addition to scans/joins, to be performed in that one Foreign
Scannode indicated by "Foreign Processing", such as aggregation, window functions, distinct, order by, row locking,
tablemodification, or combinations of them.<span class=""><br /></span><br clear="all" /></blockquote></div>"Scan" is a
betterword than "Processing". From plan's perspective it's ultimately a Scan (on the data produced by the foreign
server)and not processing.<br /></div><div class="gmail_extra">-- <br /><div class="gmail_signature"
data-smartmail="gmail_signature"><divdir="ltr">Best Wishes,<br />Ashutosh Bapat<br />EnterpriseDB Corporation<br />The
PostgresDatabase Company<br /></div></div></div></div> 

Re: Oddity in EXPLAIN for foreign/custom join pushdown plans

От
Kouhei Kaigai
Дата:
> I wrote:
> >> That may be so, but my point is that the target relations involved in
> >> the foreign join (ie, ft1 and ft2) should be printed somewhere in the
> >> EXPLAIN output by core, as in EXPLAIN for a simple foreign table scan.
>
> > Why? According to your rule, Hash Join should take "on t0,t1,t2".
> >
> > postgres=# explain select id from t0 natural join t1 natural join t2;
> >                                  QUERY PLAN
> > -----------------------------------------------------------------------------
> >  Hash Join  (cost=6370.00..4560826.24 rows=98784048 width=4)
> >    Hash Cond: (t0.aid = t1.aid)
> >    ->  Hash Join  (cost=3185.00..3199360.58 rows=98784048 width=8)
> >          Hash Cond: (t0.bid = t2.bid)
> >          ->  Seq Scan on t0  (cost=0.00..1833334.80 rows=100000080 width=12)
> >          ->  Hash  (cost=1935.00..1935.00 rows=100000 width=4)
> >                ->  Seq Scan on t2  (cost=0.00..1935.00 rows=100000 width=4)
> >    ->  Hash  (cost=1935.00..1935.00 rows=100000 width=4)
> >          ->  Seq Scan on t1  (cost=0.00..1935.00 rows=100000 width=4)
> > (9 rows)
>
> I don't think it needs "on t0,t1,t2", because we can see joining
> relations from inner/outer plans in that case.  In a foreign-join case,
> however, we can't see such relations from the EXPLAIN printed *by core*.
>   postgres_fdw avoids this issue by adding such relations to the EXPLAIN
> using ExplainForeignScan as shown in the below example, but since such
> relations are essential, I think that information should be shown by
> core itself.
>
I never opposed to print the relation names by the core, however, it has
to be controllable by the extension which provides ForeignScan/CustomScan
because only extension can know how underlying relation shall be scanned
exactly.

> postgres=# explain select * from (select ft1.a from ft1 left join ft2 on
> ft1.a = ft2.a where ft1.b = 1) ss1(a) full join (select ft3.a from ft3
> left join ft4 on ft3.a = ft4.a where ft3.b = 1) ss2(a) on ss1.a = ss2.a;
>                             QUERY PLAN
> ----------------------------------------------------------------
>   Hash Full Join  (cost=202.06..204.12 rows=1 width=8)
>     Hash Cond: (ft1.a = ft3.a)
>     ->  Foreign Scan  (cost=100.00..102.05 rows=1 width=4)
>           Relations: (public.ft1) LEFT JOIN (public.ft2)
>     ->  Hash  (cost=102.05..102.05 rows=1 width=4)
>           ->  Foreign Scan  (cost=100.00..102.05 rows=1 width=4)
>                 Relations: (public.ft3) LEFT JOIN (public.ft4)
> (7 rows)
>
>  From the Relations line shown by postgres_fdw, we can see which foreign
> join joins which foreign tables, but if no such lines, we couldn't.
>
In case of postgres_fdw, if extension can indicate the core EXPLAIN to
print all of its underlying relations, the core EXPLAIN routine will
print name of the relations. Here is no problem.

> >>> postgres=# explain select id from t0 natural join t1 natural join t2;
> >>>                                 QUERY PLAN
> >>> ---------------------------------------------------------------------------
> >>>  Custom Scan (GpuJoin)  (cost=12385.67..291245.35 rows=9815465 width=4)
> >>>    GPU Projection: t0.id
> >>>    Depth 1: GpuHashJoin, HashKeys: (t0.bid)
> >>>             JoinQuals: (t0.bid = t2.bid)
> >>>             Nrows (in/out: 98.15%), KDS-Hash (size: 13.47MB, nbatches: 1)
> >>>    Depth 2: GpuHashJoin, HashKeys: (t0.aid)
> >>>             JoinQuals: (t0.aid = t1.aid)
> >>>             Nrows (in/out: 100.00%), KDS-Hash (size: 13.47MB, nbatches: 1)
> >>>    ->  Seq Scan on t0  (cost=0.00..183333.96 rows=9999996 width=12)
> >>>    ->  Seq Scan on t2  (cost=0.00..1935.00 rows=100000 width=4)
> >>>    ->  Seq Scan on t1  (cost=0.00..1935.00 rows=100000 width=4)
> >>> (11 rows)
>
> > My largest concern for you proposition is, ForeignScan/CustomScan node is
> > enforced to print name of underlying relations, regardless of its actual
> > behavior. The above GpuJoin never scans tables at least, thus, it mislead
> > users if we have no choice to print underlying relation names.
>
> OK, I understand we would need special handling for such custom joins.
>
It is not a case only for CustomScan.

Please assume an enhancement of postgres_fdw that reads a small local table (tbl_1)
and parse them as VALUES() clause within a remote query to execute remote join
with foreign tables (ftbl_2, ftbl_3).
This ForeignScan node has three underlying relations; tbl_1, ftbl_2 and ftbl_3.
Likely, tbl_1 will be scanned by SeqScan, not ForeignScan itself.
In this case, which relations shall be printed around ForeignScan?
Is it possible to choose proper relation names without hint by the extension?
  ^^^^^^^^^^^^ 
Thanks,
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>



Re: Oddity in EXPLAIN for foreign/custom join pushdown plans

От
Etsuro Fujita
Дата:
On 2016/07/29 13:05, Etsuro Fujita wrote:
> In a foreign-join case,
> however, we can't see such relations from the EXPLAIN printed *by core*.
>  postgres_fdw avoids this issue by adding such relations to the EXPLAIN
> using ExplainForeignScan as shown in the below example, but since such
> relations are essential, I think that information should be shown by
> core itself.
>
> postgres=# explain select * from (select ft1.a from ft1 left join ft2 on
> ft1.a = ft2.a where ft1.b = 1) ss1(a) full join (select ft3.a from ft3
> left join ft4 on ft3.a = ft4.a where ft3.b = 1) ss2(a) on ss1.a = ss2.a;
>                            QUERY PLAN
> ----------------------------------------------------------------
>  Hash Full Join  (cost=202.06..204.12 rows=1 width=8)
>    Hash Cond: (ft1.a = ft3.a)
>    ->  Foreign Scan  (cost=100.00..102.05 rows=1 width=4)
>          Relations: (public.ft1) LEFT JOIN (public.ft2)
>    ->  Hash  (cost=102.05..102.05 rows=1 width=4)
>          ->  Foreign Scan  (cost=100.00..102.05 rows=1 width=4)
>                Relations: (public.ft3) LEFT JOIN (public.ft4)
> (7 rows)
>
> From the Relations line shown by postgres_fdw, we can see which foreign
> join joins which foreign tables, but if no such lines, we couldn't.

I thought about the Relations line a bit more and noticed that there are  
cases where the table reference names for joining relations in the  
Relations line are printed incorrectly.  Here is an example:

postgres=# explain verbose select * from (select t1.a, t2.a from ft1 t1,  
ft2 t2 where t1.a = t2.a union select t1.a, t2.a from ft1 t1, ft2 t2  
where t1.a = t2.a) as t(t1a, t2a);                                                     QUERY PLAN
--------------------------------------------------------------------------------------------------------------------
Unique (cost=204.12..204.13 rows=2 width=8)   Output: t1.a, t2.a   ->  Sort  (cost=204.12..204.12 rows=2 width=8)
 Output: t1.a, t2.a         Sort Key: t1.a, t2.a         ->  Append  (cost=100.00..204.11 rows=2 width=8)
-> Foreign Scan  (cost=100.00..102.04 rows=1 width=8)                     Output: t1.a, t2.a
Relations:(public.ft1 t1) INNER JOIN (public.ft2 t2)                     Remote SQL: SELECT r1.a, r2.a FROM (public.t1
r1 
 
INNER JOIN public.t2 r2 ON (((r1.a = r2.a))))               ->  Foreign Scan  (cost=100.00..102.04 rows=1 width=8)
              Output: t1_1.a, t2_1.a                     Relations: (public.ft1 t1) INNER JOIN (public.ft2 t2)
          Remote SQL: SELECT r1.a, r2.a FROM (public.t1 r1  
 
INNER JOIN public.t2 r2 ON (((r1.a = r2.a))))
(14 rows)

The table reference names for ft1 and ft2 in the Relations line for the  
second Foreign Scan should be t1_1 and t2_1 respectively.

Another concern about the Relations line is, that represents just an  
internal representation of a pushed-down join, so that would not  
necessarily match a deparsed query shown in the Remote SQL line.  Here  
is an example, which I found when working on supporting pushing down  
full outer join a lot more, by improving the deparsing logic so that  
postgres_fdw can build a remote query that involves subqueries [1],  
which I'll work on for 10.0:

+ -- full outer join with restrictions on the joining relations
+ EXPLAIN (COSTS false, VERBOSE)
+ SELECT t1.c1, t2.c1 FROM (SELECT c1 FROM ft4 WHERE c1 BETWEEN 50 AND  
60) t1 FULL JOIN (SELECT c1 FROM ft5 WHERE c1 BETWEEN 50 AND 60) t2 ON  
(t1.c1 = t2.c1) ORDER BY t1.c1, t2.c1;
+                                                                  QUERY  
PLAN  

+  

---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
+  Foreign Scan
+    Output: ft4.c1, ft5.c1
+    Relations: (public.ft4) FULL JOIN (public.ft5)
+    Remote SQL: SELECT ss1.c1, ss2.c1 FROM ((SELECT c1 FROM "S 1"."T 3"  
WHERE ((c1 >= 50)) AND ((c1 <= 60))) ss1(c1) FULL JOIN (SELECT c1 FROM  
"S 1"."T 4" WHERE ((c1 >= 50)) AND ((c1 <= 60))) ss2(c1) ON (((ss1.c1 =  
ss2.c1)))) ORDER BY ss1.c1 ASC NULLS LAST, ss2.c1 ASC NULLS LAST
+ (4 rows)

"(public.ft4) FULL JOIN (public.ft5)" in the Relations line does not  
exactly match the deparsed query in the Remote SQL line, which I think  
would be rather confusing for users.  (We may be able to print more  
exact information in the Relations line so as to match the depaserd  
query, but I think that that would make the Relations line redundant.)

Would we really need the Relations line?  If joining relations are  
printed by core like "Foreign Join on public.ft1 t1_1, public.ft2 t2_1"  
as proposed upthread, we can see those relations from that, not the  
Relations line.  Also we can see the join tree structure from the  
deparsed query in the Remote SQL line.  The Relations line seems to be  
not that useful anymore, then.  What do you think about that?

Best regards,
Etsuro Fujita

[1] https://www.postgresql.org/message-id/5710D7E2.7010302%40lab.ntt.co.jp





Re: Oddity in EXPLAIN for foreign/custom join pushdown plans

От
Etsuro Fujita
Дата:
On 2016/07/29 13:28, Ashutosh Bapat wrote:

I wrote:
>             Probably something like this:
>
>                Foreign Processing
>                  Remote Operations: ...
>
>             In the Remote Operations line, the FDW/extension could print
>             any info
>             about remote operations, eg, "Scan/Join + Aggregate".

I wrote:
>     I intentionally chose that word and thought we could leave detailed
>     descriptions about remote operations to the FDW/extension; a broader
>     word like "Processing" seems to work well because we allow various
>     kinds of operations to the remote side, in addition to scans/joins,
>     to be performed in that one Foreign Scan node indicated by "Foreign
>     Processing", such as aggregation, window functions, distinct, order
>     by, row locking, table modification, or combinations of them.

> "Scan" is a better word than "Processing". From plan's perspective it's
> ultimately a Scan (on the data produced by the foreign server) and not
> processing.

Exactly, but one thing I'm concerned about is the table modification 
case; the EXPLAIN output would be something like this:
  Foreign Scan    Remote SQL: INSERT INTO remote_table VALUES ...

That would be strange, so I think a more broader word is better.  I 
don't think "Foreign Processing" is best.  "Foreign Upper" might be much 
better because the corresponding path is created by calling 
GetForeignUpperPaths.

Also for a Foreign Scan representing a foreign join, I think "Foreign 
Join" is better than "Foreign Scan".  Here is an example:
  Foreign Join on foreign_table1, foreign_table2    Remote SQL: SELECT ... FROM remote_table1 INNER JOIN remote_table2

WHERE ...

I think "Foreign Join" better indicates that foreign tables listed after 
that (ie, foreign_table1 and foreign_table2 in the example) are joining 
(or component) tables of this join, than "Foreign Scan".

Best regards,
Etsuro Fujita





Re: Oddity in EXPLAIN for foreign/custom join pushdown plans

От
Ashutosh Bapat
Дата:





I thought about the Relations line a bit more and noticed that there are cases where the table reference names for joining relations in the Relations line are printed incorrectly.  Here is an example:

postgres=# explain verbose select * from (select t1.a, t2.a from ft1 t1, ft2 t2 where t1.a = t2.a union select t1.a, t2.a from ft1 t1, ft2 t2 where t1.a = t2.a) as t(t1a, t2a);
                                                     QUERY PLAN
--------------------------------------------------------------------------------------------------------------------
 Unique  (cost=204.12..204.13 rows=2 width=8)
   Output: t1.a, t2.a
   ->  Sort  (cost=204.12..204.12 rows=2 width=8)
         Output: t1.a, t2.a
         Sort Key: t1.a, t2.a
         ->  Append  (cost=100.00..204.11 rows=2 width=8)
               ->  Foreign Scan  (cost=100.00..102.04 rows=1 width=8)
                     Output: t1.a, t2.a
                     Relations: (public.ft1 t1) INNER JOIN (public.ft2 t2)
                     Remote SQL: SELECT r1.a, r2.a FROM (public.t1 r1 INNER JOIN public.t2 r2 ON (((r1.a = r2.a))))
               ->  Foreign Scan  (cost=100.00..102.04 rows=1 width=8)
                     Output: t1_1.a, t2_1.a
                     Relations: (public.ft1 t1) INNER JOIN (public.ft2 t2)
                     Remote SQL: SELECT r1.a, r2.a FROM (public.t1 r1 INNER JOIN public.t2 r2 ON (((r1.a = r2.a))))
(14 rows)

The table reference names for ft1 and ft2 in the Relations line for the second Foreign Scan should be t1_1 and t2_1 respectively.

Relations line prints the names of foreign tables that are being joined and the type of join. I find t1_1 and t2_1 more confusing since the query that user has provided does not mention t1_1 and t2_1.


Would we really need the Relations line?  If joining relations are printed by core like "Foreign Join on public.ft1 t1_1, public.ft2 t2_1" as proposed upthread, we can see those relations from that, not the Relations line.

The join type is missing in that description.
 
Also we can see the join tree structure from the deparsed query in the Remote SQL line.

The remote SQL has the names of the table on the foreign server. It does not help to identify the local names.

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

Re: Oddity in EXPLAIN for foreign/custom join pushdown plans

От
Etsuro Fujita
Дата:
On 2016/08/01 20:31, Ashutosh Bapat wrote:
>     I thought about the Relations line a bit more and noticed that there
>     are cases where the table reference names for joining relations in
>     the Relations line are printed incorrectly.  Here is an example:
>
>     postgres=# explain verbose select * from (select t1.a, t2.a from ft1
>     t1, ft2 t2 where t1.a = t2.a union select t1.a, t2.a from ft1 t1,
>     ft2 t2 where t1.a = t2.a) as t(t1a, t2a);
>                                                          QUERY PLAN
>
--------------------------------------------------------------------------------------------------------------------
>      Unique  (cost=204.12..204.13 rows=2 width=8)
>        Output: t1.a, t2.a
>        ->  Sort  (cost=204.12..204.12 rows=2 width=8)
>              Output: t1.a, t2.a
>              Sort Key: t1.a, t2.a
>              ->  Append  (cost=100.00..204.11 rows=2 width=8)
>                    ->  Foreign Scan  (cost=100.00..102.04 rows=1 width=8)
>                          Output: t1.a, t2.a
>                          Relations: (public.ft1 t1) INNER JOIN
>     (public.ft2 t2)
>                          Remote SQL: SELECT r1.a, r2.a FROM (public.t1
>     r1 INNER JOIN public.t2 r2 ON (((r1.a = r2.a))))
>                    ->  Foreign Scan  (cost=100.00..102.04 rows=1 width=8)
>                          Output: t1_1.a, t2_1.a
>                          Relations: (public.ft1 t1) INNER JOIN
>     (public.ft2 t2)
>                          Remote SQL: SELECT r1.a, r2.a FROM (public.t1
>     r1 INNER JOIN public.t2 r2 ON (((r1.a = r2.a))))
>     (14 rows)
>
>     The table reference names for ft1 and ft2 in the Relations line for
>     the second Foreign Scan should be t1_1 and t2_1 respectively.

> Relations line prints the names of foreign tables that are being joined
> and the type of join. I find t1_1 and t2_1 more confusing since the
> query that user has provided does not mention t1_1 and t2_1.

Please look at the Output line for the second Foreign Scan in the 
EXPLAIN.  (The reason for that is because postgres_fdw gets table 
reference names directly from rte->eref->aliasname, while EXPLAIN gets 
those through select_rtable_names_for_explain.)

>     Would we really need the Relations line?  If joining relations are
>     printed by core like "Foreign Join on public.ft1 t1_1, public.ft2
>     t2_1" as proposed upthread, we can see those relations from that,
>     not the Relations line.

> The join type is missing in that description.

>     Also we can see the join tree structure from the deparsed query in
>     the Remote SQL line.

> The remote SQL has the names of the table on the foreign server. It does
> not help to identify the local names.

We can see the remote names of the foreign tables from the catalogs, so 
we would easily identify the local names of foreign tables in the remote 
SQL and thus the join type (or join tree structure) from the SQL.

Best regards,
Etsuro Fujita





Re: Oddity in EXPLAIN for foreign/custom join pushdown plans

От
Kouhei Kaigai
Дата:
> -----Original Message-----
> From: Etsuro Fujita [mailto:fujita.etsuro@lab.ntt.co.jp]
> Sent: Monday, August 01, 2016 8:26 PM
> To: Ashutosh Bapat
> Cc: Kaigai Kouhei(海外 浩平); pgsql-hackers@postgresql.org
> Subject: Re: [HACKERS] Oddity in EXPLAIN for foreign/custom join pushdown plans
> 
> On 2016/07/29 13:28, Ashutosh Bapat wrote:
> 
> I wrote:
> >             Probably something like this:
> >
> >                Foreign Processing
> >                  Remote Operations: ...
> >
> >             In the Remote Operations line, the FDW/extension could print
> >             any info
> >             about remote operations, eg, "Scan/Join + Aggregate".
> 
> I wrote:
> >     I intentionally chose that word and thought we could leave detailed
> >     descriptions about remote operations to the FDW/extension; a broader
> >     word like "Processing" seems to work well because we allow various
> >     kinds of operations to the remote side, in addition to scans/joins,
> >     to be performed in that one Foreign Scan node indicated by "Foreign
> >     Processing", such as aggregation, window functions, distinct, order
> >     by, row locking, table modification, or combinations of them.
> 
> > "Scan" is a better word than "Processing". From plan's perspective it's
> > ultimately a Scan (on the data produced by the foreign server) and not
> > processing.
> 
> Exactly, but one thing I'm concerned about is the table modification
> case; the EXPLAIN output would be something like this:
> 
>    Foreign Scan
>      Remote SQL: INSERT INTO remote_table VALUES ...
> 
> That would be strange, so I think a more broader word is better.  I
> don't think "Foreign Processing" is best.  "Foreign Upper" might be much
> better because the corresponding path is created by calling
> GetForeignUpperPaths.
>
Be "Foreign %s", and gives "Insert" label by postgres_fdw; which knows
the ForeignScan node actually insert tuples.
From the standpoint of users, it looks "Foreign Insert".

> Also for a Foreign Scan representing a foreign join, I think "Foreign
> Join" is better than "Foreign Scan".  Here is an example:
> 
>    Foreign Join on foreign_table1, foreign_table2
>      Remote SQL: SELECT ... FROM remote_table1 INNER JOIN remote_table2
> WHERE ...
> 
> I think "Foreign Join" better indicates that foreign tables listed after
> that (ie, foreign_table1 and foreign_table2 in the example) are joining
> (or component) tables of this join, than "Foreign Scan".
>
Postgres_fdw knows it is remote join. It is quite easy to tell the core
this ForeignScan node is "Join". Also, it knows which tables are involved in.
We can add hint information to control relations name to be printed.

Thanks,
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>


Re: Oddity in EXPLAIN for foreign/custom join pushdown plans

От
Etsuro Fujita
Дата:
On 2016/08/01 22:25, Kouhei Kaigai wrote:

I wrote:
>>>     a broader
>>>     word like "Processing" seems to work well because we allow various
>>>     kinds of operations to the remote side, in addition to scans/joins,
>>>     to be performed in that one Foreign Scan node indicated by "Foreign
>>>     Processing", such as aggregation, window functions, distinct, order
>>>     by, row locking, table modification, or combinations of them.
>> On 2016/07/29 13:28, Ashutosh Bapat wrote:
>>> "Scan" is a better word than "Processing". From plan's perspective it's
>>> ultimately a Scan (on the data produced by the foreign server) and not
>>> processing.

I wrote:
>> Exactly, but one thing I'm concerned about is the table modification
>> case; the EXPLAIN output would be something like this:
>>
>>    Foreign Scan
>>      Remote SQL: INSERT INTO remote_table VALUES ...
>>
>> That would be strange, so I think a more broader word is better.  I
>> don't think "Foreign Processing" is best.  "Foreign Upper" might be much
>> better because the corresponding path is created by calling
>> GetForeignUpperPaths.

> Be "Foreign %s", and gives "Insert" label by postgres_fdw; which knows
> the ForeignScan node actually insert tuples.
> From the standpoint of users, it looks "Foreign Insert".

My concern here is EXPLAIN for foreign joins.  I think it's another 
problem how we handle Foreign Scan plan nodes representing 
post-scan/join operations in EXPLAIN, so I'd like to leave that for 
another patch.

I wrote:
>> Also for a Foreign Scan representing a foreign join, I think "Foreign
>> Join" is better than "Foreign Scan".  Here is an example:
>>
>>    Foreign Join on foreign_table1, foreign_table2
>>      Remote SQL: SELECT ... FROM remote_table1 INNER JOIN remote_table2
>> WHERE ...
>>
>> I think "Foreign Join" better indicates that foreign tables listed after
>> that (ie, foreign_table1 and foreign_table2 in the example) are joining
>> (or component) tables of this join, than "Foreign Scan".

> Postgres_fdw knows it is remote join. It is quite easy to tell the core
> this ForeignScan node is "Join". Also, it knows which tables are involved in.

Yeah, we can do that.

> We can add hint information to control relations name to be printed.

For foreign joins, it would make sense to add such a functionality when 
necessary, for example when we extend the pushdown feature so that we 
can do what you proposed upthread.

Best regards,
Etsuro Fujita





Re: Oddity in EXPLAIN for foreign/custom join pushdown plans

От
Kouhei Kaigai
Дата:
> On 2016/08/01 22:25, Kouhei Kaigai wrote:
> 
> I wrote:
> >>>     a broader
> >>>     word like "Processing" seems to work well because we allow various
> >>>     kinds of operations to the remote side, in addition to scans/joins,
> >>>     to be performed in that one Foreign Scan node indicated by "Foreign
> >>>     Processing", such as aggregation, window functions, distinct, order
> >>>     by, row locking, table modification, or combinations of them.
> 
>  >> On 2016/07/29 13:28, Ashutosh Bapat wrote:
> >>> "Scan" is a better word than "Processing". From plan's perspective it's
> >>> ultimately a Scan (on the data produced by the foreign server) and not
> >>> processing.
> 
> I wrote:
> >> Exactly, but one thing I'm concerned about is the table modification
> >> case; the EXPLAIN output would be something like this:
> >>
> >>    Foreign Scan
> >>      Remote SQL: INSERT INTO remote_table VALUES ...
> >>
> >> That would be strange, so I think a more broader word is better.  I
> >> don't think "Foreign Processing" is best.  "Foreign Upper" might be much
> >> better because the corresponding path is created by calling
> >> GetForeignUpperPaths.
> 
> > Be "Foreign %s", and gives "Insert" label by postgres_fdw; which knows
> > the ForeignScan node actually insert tuples.
> > From the standpoint of users, it looks "Foreign Insert".
> 
> My concern here is EXPLAIN for foreign joins.  I think it's another
> problem how we handle Foreign Scan plan nodes representing
> post-scan/join operations in EXPLAIN, so I'd like to leave that for
> another patch.
>
What is the post-scan/join operations? Are you saying EPQ recheck and
alternative local join plan?

Thanks,
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>


Re: Oddity in EXPLAIN for foreign/custom join pushdown plans

От
Etsuro Fujita
Дата:
On 2016/08/02 13:32, Kouhei Kaigai wrote:

I wrote:
>> My concern here is EXPLAIN for foreign joins.  I think it's another
>> problem how we handle Foreign Scan plan nodes representing
>> post-scan/join operations in EXPLAIN, so I'd like to leave that for
>> another patch.

> What is the post-scan/join operations? Are you saying EPQ recheck and
> alternative local join plan?

No.  I mean e.g., aggregation, window functions, sorting, or table 
modification.  In other words, Foreign Scan plan nodes created from 
ForeignPath paths created from GetForeignUpperPaths.

Best regards,
Etsuro Fujita





Re: Oddity in EXPLAIN for foreign/custom join pushdown plans

От
Kouhei Kaigai
Дата:
> -----Original Message-----
> From: Etsuro Fujita [mailto:fujita.etsuro@lab.ntt.co.jp]
> Sent: Tuesday, August 02, 2016 2:45 PM
> To: Kaigai Kouhei(海外 浩平); Ashutosh Bapat
> Cc: pgsql-hackers@postgresql.org
> Subject: Re: [HACKERS] Oddity in EXPLAIN for foreign/custom join pushdown plans
> 
> On 2016/08/02 13:32, Kouhei Kaigai wrote:
> 
> I wrote:
> >> My concern here is EXPLAIN for foreign joins.  I think it's another
> >> problem how we handle Foreign Scan plan nodes representing
> >> post-scan/join operations in EXPLAIN, so I'd like to leave that for
> >> another patch.
> 
> > What is the post-scan/join operations? Are you saying EPQ recheck and
> > alternative local join plan?
> 
> No.  I mean e.g., aggregation, window functions, sorting, or table
> modification.  In other words, Foreign Scan plan nodes created from
> ForeignPath paths created from GetForeignUpperPaths.
>
Why do you need to handle these upper paths individually?
FDW driver knows the remote query contains aggregation, window functions,
sorting, or table modification. It can give proper label.

If remote query contains partial aggregation and relations join, for
example, "Partial Agg + Scan" will be a proper label that shall be
followed by the "Foreign %s".

All you need to do are the two enhancements:
- Add "const char *explain_label" on the ForeignScanState or somewhere extension can set. It gives a label to be
printed.If NULL string is set, EXPLAIN shows "Foreign Scan" as a default.
 
- Add "Bitmapset explain_rels" on the ForeignScanState or somewhere extension can set. It indicates the relations to be
printedafter the "Foreign %s" token. If you want to print all the relations names underlying this ForeignScan node,
justcopy scanrelids bitmap. If NULL bitmap is set, EXPLAIN shows nothing as a default.
 

Please note that the default does not change the existing behavior.

Thanks,
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>


Re: Oddity in EXPLAIN for foreign/custom join pushdown plans

От
Etsuro Fujita
Дата:
On 2016/08/01 20:15, Etsuro Fujita wrote:
> I thought about the Relations line a bit more and noticed that there are
> cases where the table reference names for joining relations in the
> Relations line are printed incorrectly.  Here is an example:
>
> postgres=# explain verbose select * from (select t1.a, t2.a from ft1 t1,
> ft2 t2 where t1.a = t2.a union select t1.a, t2.a from ft1 t1, ft2 t2
> where t1.a = t2.a) as t(t1a, t2a);
>                                                      QUERY PLAN
> --------------------------------------------------------------------------------------------------------------------
>
>  Unique  (cost=204.12..204.13 rows=2 width=8)
>    Output: t1.a, t2.a
>    ->  Sort  (cost=204.12..204.12 rows=2 width=8)
>          Output: t1.a, t2.a
>          Sort Key: t1.a, t2.a
>          ->  Append  (cost=100.00..204.11 rows=2 width=8)
>                ->  Foreign Scan  (cost=100.00..102.04 rows=1 width=8)
>                      Output: t1.a, t2.a
>                      Relations: (public.ft1 t1) INNER JOIN (public.ft2 t2)
>                      Remote SQL: SELECT r1.a, r2.a FROM (public.t1 r1
> INNER JOIN public.t2 r2 ON (((r1.a = r2.a))))
>                ->  Foreign Scan  (cost=100.00..102.04 rows=1 width=8)
>                      Output: t1_1.a, t2_1.a
>                      Relations: (public.ft1 t1) INNER JOIN (public.ft2 t2)
>                      Remote SQL: SELECT r1.a, r2.a FROM (public.t1 r1
> INNER JOIN public.t2 r2 ON (((r1.a = r2.a))))
> (14 rows)
>
> The table reference names for ft1 and ft2 in the Relations line for the
> second Foreign Scan should be t1_1 and t2_1 respectively.
>
> Another concern about the Relations line is, that represents just an
> internal representation of a pushed-down join, so that would not
> necessarily match a deparsed query shown in the Remote SQL line.  Here
> is an example, which I found when working on supporting pushing down
> full outer join a lot more, by improving the deparsing logic so that
> postgres_fdw can build a remote query that involves subqueries [1],
> which I'll work on for 10.0:
>
> + -- full outer join with restrictions on the joining relations
> + EXPLAIN (COSTS false, VERBOSE)
> + SELECT t1.c1, t2.c1 FROM (SELECT c1 FROM ft4 WHERE c1 BETWEEN 50 AND
> 60) t1 FULL JOIN (SELECT c1 FROM ft5 WHERE c1 BETWEEN 50 AND 60) t2 ON
> (t1.c1 = t2.c1) ORDER BY t1.c1, t2.c1;
> +                                                                 QUERY
> PLAN
> +
>
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
> +  Foreign Scan
> +    Output: ft4.c1, ft5.c1
> +    Relations: (public.ft4) FULL JOIN (public.ft5)
> +    Remote SQL: SELECT ss1.c1, ss2.c1 FROM ((SELECT c1 FROM "S 1"."T 3"
> WHERE ((c1 >= 50)) AND ((c1 <= 60))) ss1(c1) FULL JOIN (SELECT c1 FROM
> "S 1"."T 4" WHERE ((c1 >= 50)) AND ((c1 <= 60))) ss2(c1) ON (((ss1.c1 =
> ss2.c1)))) ORDER BY ss1.c1 ASC NULLS LAST, ss2.c1 ASC NULLS LAST
> + (4 rows)
>
> "(public.ft4) FULL JOIN (public.ft5)" in the Relations line does not
> exactly match the deparsed query in the Remote SQL line, which I think
> would be rather confusing for users.  (We may be able to print more
> exact information in the Relations line so as to match the depaserd
> query, but I think that that would make the Relations line redundant.)
>
> Would we really need the Relations line?  If joining relations are
> printed by core like "Foreign Join on public.ft1 t1_1, public.ft2 t2_1"
> as proposed upthread, we can see those relations from that, not the
> Relations line.  Also we can see the join tree structure from the
> deparsed query in the Remote SQL line.  The Relations line seems to be
> not that useful anymore, then.  What do you think about that?

I removed the Relations line.  Here is an updated version of the patch.

* As I said upthread, I left the upper-relation handling for another
patch.  Currently, the patch prints "Foreign Scan" with no relations in
that case.

* I kept custom joins as-is.  We would need discussions about how to
choose relations we print in EXPLAIN, so I'd also like to leave that for
yet another patch.

Best regards,
Etsuro Fujita

Вложения

Re: Oddity in EXPLAIN for foreign/custom join pushdown plans

От
Kouhei Kaigai
Дата:
> -----Original Message-----
> From: Etsuro Fujita [mailto:fujita.etsuro@lab.ntt.co.jp]
> Sent: Tuesday, August 02, 2016 9:36 PM
> To: Kaigai Kouhei(海外 浩平); pgsql-hackers@postgresql.org
> Subject: Re: [HACKERS] Oddity in EXPLAIN for foreign/custom join pushdown plans
>
> On 2016/08/01 20:15, Etsuro Fujita wrote:
> > I thought about the Relations line a bit more and noticed that there are
> > cases where the table reference names for joining relations in the
> > Relations line are printed incorrectly.  Here is an example:
> >
> > postgres=# explain verbose select * from (select t1.a, t2.a from ft1 t1,
> > ft2 t2 where t1.a = t2.a union select t1.a, t2.a from ft1 t1, ft2 t2
> > where t1.a = t2.a) as t(t1a, t2a);
> >                                                      QUERY PLAN
> >
> ----------------------------------------------------------------------------------
> ----------------------------------
> >
> >  Unique  (cost=204.12..204.13 rows=2 width=8)
> >    Output: t1.a, t2.a
> >    ->  Sort  (cost=204.12..204.12 rows=2 width=8)
> >          Output: t1.a, t2.a
> >          Sort Key: t1.a, t2.a
> >          ->  Append  (cost=100.00..204.11 rows=2 width=8)
> >                ->  Foreign Scan  (cost=100.00..102.04 rows=1 width=8)
> >                      Output: t1.a, t2.a
> >                      Relations: (public.ft1 t1) INNER JOIN (public.ft2 t2)
> >                      Remote SQL: SELECT r1.a, r2.a FROM (public.t1 r1
> > INNER JOIN public.t2 r2 ON (((r1.a = r2.a))))
> >                ->  Foreign Scan  (cost=100.00..102.04 rows=1 width=8)
> >                      Output: t1_1.a, t2_1.a
> >                      Relations: (public.ft1 t1) INNER JOIN (public.ft2 t2)
> >                      Remote SQL: SELECT r1.a, r2.a FROM (public.t1 r1
> > INNER JOIN public.t2 r2 ON (((r1.a = r2.a))))
> > (14 rows)
> >
> > The table reference names for ft1 and ft2 in the Relations line for the
> > second Foreign Scan should be t1_1 and t2_1 respectively.
> >
> > Another concern about the Relations line is, that represents just an
> > internal representation of a pushed-down join, so that would not
> > necessarily match a deparsed query shown in the Remote SQL line.  Here
> > is an example, which I found when working on supporting pushing down
> > full outer join a lot more, by improving the deparsing logic so that
> > postgres_fdw can build a remote query that involves subqueries [1],
> > which I'll work on for 10.0:
> >
> > + -- full outer join with restrictions on the joining relations
> > + EXPLAIN (COSTS false, VERBOSE)
> > + SELECT t1.c1, t2.c1 FROM (SELECT c1 FROM ft4 WHERE c1 BETWEEN 50 AND
> > 60) t1 FULL JOIN (SELECT c1 FROM ft5 WHERE c1 BETWEEN 50 AND 60) t2 ON
> > (t1.c1 = t2.c1) ORDER BY t1.c1, t2.c1;
> > +                                                                 QUERY
> > PLAN
> > +
> >
> ----------------------------------------------------------------------------------
> ----------------------------------------------------------------------------------
> ----------------------------------------------------------------------------------
> ---------------------------------
> >
> > +  Foreign Scan
> > +    Output: ft4.c1, ft5.c1
> > +    Relations: (public.ft4) FULL JOIN (public.ft5)
> > +    Remote SQL: SELECT ss1.c1, ss2.c1 FROM ((SELECT c1 FROM "S 1"."T 3"
> > WHERE ((c1 >= 50)) AND ((c1 <= 60))) ss1(c1) FULL JOIN (SELECT c1 FROM
> > "S 1"."T 4" WHERE ((c1 >= 50)) AND ((c1 <= 60))) ss2(c1) ON (((ss1.c1 =
> > ss2.c1)))) ORDER BY ss1.c1 ASC NULLS LAST, ss2.c1 ASC NULLS LAST
> > + (4 rows)
> >
> > "(public.ft4) FULL JOIN (public.ft5)" in the Relations line does not
> > exactly match the deparsed query in the Remote SQL line, which I think
> > would be rather confusing for users.  (We may be able to print more
> > exact information in the Relations line so as to match the depaserd
> > query, but I think that that would make the Relations line redundant.)
> >
> > Would we really need the Relations line?  If joining relations are
> > printed by core like "Foreign Join on public.ft1 t1_1, public.ft2 t2_1"
> > as proposed upthread, we can see those relations from that, not the
> > Relations line.  Also we can see the join tree structure from the
> > deparsed query in the Remote SQL line.  The Relations line seems to be
> > not that useful anymore, then.  What do you think about that?
>
> I removed the Relations line.  Here is an updated version of the patch.
>
> * As I said upthread, I left the upper-relation handling for another
> patch.  Currently, the patch prints "Foreign Scan" with no relations in
> that case.
>
> * I kept custom joins as-is.  We would need discussions about how to
> choose relations we print in EXPLAIN, so I'd also like to leave that for
> yet another patch.
>
Please don't rely on fs_relids bitmap to pick up relations to be printed.
It just hold a set of underlying relations, but it does not mean all of
these relations are actually scanned inside of the ForeignScan.

You didn't answer the following scenario I pointed out in the upthread.

| Please assume an enhancement of postgres_fdw that reads a small local table (tbl_1)
| and parse them as VALUES() clause within a remote query to execute remote join
| with foreign tables (ftbl_2, ftbl_3).
| This ForeignScan node has three underlying relations; tbl_1, ftbl_2 and ftbl_3.
| Likely, tbl_1 will be scanned by SeqScan, not ForeignScan itself.
| In this case, which relations shall be printed around ForeignScan?
| Is it possible to choose proper relation names without hint by the extension?
|                                                ^^^^^^^^^^^^

To care about these FDW usage, you should add an alternative bitmap rather
than fs_relids/custom_relids. My suggestion is, having explain_relids for
the purpose.
Also, the logic to print "Foreign (Scan|Insert|Update|Delete)" is different
from what I suggested. I'm suggesting to allow extension giving a label
to fill up "Foreign %s" format.

Please explain why your choice is better than my proposition.

At least, my proposition is available to apply on both of foreign-scan and
custom-scan, and no need to future maintenance if and when FDW gets support
remote Aggregation, Sort or others.

Thanks,
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>



Re: Oddity in EXPLAIN for foreign/custom join pushdown plans

От
Etsuro Fujita
Дата:
On 2016/08/02 22:02, Kouhei Kaigai wrote:

I wrote:
>> I removed the Relations line.  Here is an updated version of the patch.
>>
>> * As I said upthread, I left the upper-relation handling for another
>> patch.  Currently, the patch prints "Foreign Scan" with no relations in
>> that case.
>>
>> * I kept custom joins as-is.  We would need discussions about how to
>> choose relations we print in EXPLAIN, so I'd also like to leave that for
>> yet another patch.

> Please don't rely on fs_relids bitmap to pick up relations to be printed.
> It just hold a set of underlying relations, but it does not mean all of
> these relations are actually scanned inside of the ForeignScan.

Firstly, I'd like to discuss about what the relations printed after  
"Foreign join" would mean.  I think they would mean the relations  
involved in that foreign join (in other words, the ones that participate  
in that foreign join) rather than the relations to be scanned by a  
Foreign Scan representing that foreign join.  Wouldn't that make sense?

In that sense I think it's reasonable to print all relations specified  
in fs_relids after "Foreign Join".  (And in that sense I was thinking we  
could handle the custom join case the same way as the foreign join case.)

> You didn't answer the following scenario I pointed out in the upthread.
>
> | Please assume an enhancement of postgres_fdw that reads a small local table (tbl_1)
> | and parse them as VALUES() clause within a remote query to execute remote join
> | with foreign tables (ftbl_2, ftbl_3).
> | This ForeignScan node has three underlying relations; tbl_1, ftbl_2 and ftbl_3.
> | Likely, tbl_1 will be scanned by SeqScan, not ForeignScan itself.
> | In this case, which relations shall be printed around ForeignScan?
> | Is it possible to choose proper relation names without hint by the extension?
> |                                                ^^^^^^^^^^^^
>
> To care about these FDW usage, you should add an alternative bitmap rather
> than fs_relids/custom_relids. My suggestion is, having explain_relids for
> the purpose.

We currently don't allow such a join to be performed as a foreign join,  
because we allow a join to be performed so if the relations of the join  
are all foreign tables belonging to the same remote server, as you know.

So, as I said upthread, I think it would make sense to introduce such a  
bitmap when we extend the existing foreign-join pushdown infrastructure  
so that we can do such a thing as proposed by you.  I guess that that  
would need to extend the concept of foreign joins, though.

> Also, the logic to print "Foreign (Scan|Insert|Update|Delete)" is different
> from what I suggested. I'm suggesting to allow extension giving a label
> to fill up "Foreign %s" format.
>
> Please explain why your choice is better than my proposition.

No, I haven't done anything about that yet.  I kept the behavior as-is.

> At least, my proposition is available to apply on both of foreign-scan and
> custom-scan, and no need to future maintenance if and when FDW gets support
> remote Aggregation, Sort or others.

I'd like to discuss this issue separately, maybe in a new thread.

Best regards,
Etsuro Fujita





Re: Oddity in EXPLAIN for foreign/custom join pushdown plans

От
Kouhei Kaigai
Дата:
> -----Original Message-----
> From: Etsuro Fujita [mailto:fujita.etsuro@lab.ntt.co.jp]
> Sent: Thursday, August 04, 2016 4:42 PM
> To: Kaigai Kouhei(海外 浩平); pgsql-hackers@postgresql.org
> Subject: Re: [HACKERS] Oddity in EXPLAIN for foreign/custom join pushdown plans
>
> On 2016/08/02 22:02, Kouhei Kaigai wrote:
>
> I wrote:
> >> I removed the Relations line.  Here is an updated version of the patch.
> >>
> >> * As I said upthread, I left the upper-relation handling for another
> >> patch.  Currently, the patch prints "Foreign Scan" with no relations in
> >> that case.
> >>
> >> * I kept custom joins as-is.  We would need discussions about how to
> >> choose relations we print in EXPLAIN, so I'd also like to leave that for
> >> yet another patch.
>
> > Please don't rely on fs_relids bitmap to pick up relations to be printed.
> > It just hold a set of underlying relations, but it does not mean all of
> > these relations are actually scanned inside of the ForeignScan.
>
> Firstly, I'd like to discuss about what the relations printed after
> "Foreign join" would mean.  I think they would mean the relations
> involved in that foreign join (in other words, the ones that participate
> in that foreign join) rather than the relations to be scanned by a
> Foreign Scan representing that foreign join.  Wouldn't that make sense?
>
> In that sense I think it's reasonable to print all relations specified
> in fs_relids after "Foreign Join".  (And in that sense I was thinking we
> could handle the custom join case the same way as the foreign join case.)
>
> > You didn't answer the following scenario I pointed out in the upthread.
> >
> > | Please assume an enhancement of postgres_fdw that reads a small local table (tbl_1)
> > | and parse them as VALUES() clause within a remote query to execute remote join
> > | with foreign tables (ftbl_2, ftbl_3).
> > | This ForeignScan node has three underlying relations; tbl_1, ftbl_2 and ftbl_3.
> > | Likely, tbl_1 will be scanned by SeqScan, not ForeignScan itself.
> > | In this case, which relations shall be printed around ForeignScan?
> > | Is it possible to choose proper relation names without hint by the extension?
> > |                                                ^^^^^^^^^^^^
> >
> > To care about these FDW usage, you should add an alternative bitmap rather
> > than fs_relids/custom_relids. My suggestion is, having explain_relids for
> > the purpose.
>
> We currently don't allow such a join to be performed as a foreign join,
> because we allow a join to be performed so if the relations of the join
> are all foreign tables belonging to the same remote server, as you know.
>
> So, as I said upthread, I think it would make sense to introduce such a
> bitmap when we extend the existing foreign-join pushdown infrastructure
> so that we can do such a thing as proposed by you.  I guess that that
> would need to extend the concept of foreign joins, though.
>
OK, right now, FDW does not support to take arbitrary child nodes, unlike
CustomScan. However, it implies your proposed infrastructure also has to
be revised once FDW gets enhanced in the near future.

> > Also, the logic to print "Foreign (Scan|Insert|Update|Delete)" is different
> > from what I suggested. I'm suggesting to allow extension giving a label
> > to fill up "Foreign %s" format.
> >
> > Please explain why your choice is better than my proposition.
>
> No, I haven't done anything about that yet.  I kept the behavior as-is.
>
> > At least, my proposition is available to apply on both of foreign-scan and
> > custom-scan, and no need to future maintenance if and when FDW gets support
> > remote Aggregation, Sort or others.
>
> I'd like to discuss this issue separately, maybe in a new thread.
>
Why do you try to re-invent a similar infrastructure twice and separately?

What I proposed perfectly covers what you want to do, and has more benefits.
- A common manner for both of ForeignScan and CustomScan
- Flexibility to control "Foreign XXX" label and relation names to be printed.

Even if it is sufficient for the current usage of FDW, I've been saying your
proposition is not sufficient for CustomScan nowadays, and ForeignScan in the
near future.
It is not an answer to ignore the CustomScan side, because we have to enhanced
the infrastructure of CustomScan side to follow up FDW sooner or later.
However, we will have to apply a different manner on CustomScan side, or overwrite
what you proposed on FDW side, at that time.
It is not a desirable future.

Thanks,
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>




Re: Oddity in EXPLAIN for foreign/custom join pushdown plans

От
Etsuro Fujita
Дата:
On 2016/08/04 18:03, Kouhei Kaigai wrote:

Kaigai-san wrote:
>>> Also, the logic to print "Foreign (Scan|Insert|Update|Delete)" is different
>>> from what I suggested. I'm suggesting to allow extension giving a label
>>> to fill up "Foreign %s" format.
>>>
>>> Please explain why your choice is better than my proposition.

I wrote:
>> No, I haven't done anything about that yet.  I kept the behavior as-is.

>>> At least, my proposition is available to apply on both of foreign-scan and
>>> custom-scan, and no need to future maintenance if and when FDW gets support
>>> remote Aggregation, Sort or others.

>> I'd like to discuss this issue separately, maybe in a new thread.

> Why do you try to re-invent a similar infrastructure twice and separately?

As I said above, I haven't changed the behavior of EXPLAIN for *upper  
relation processing* such as aggregation or sorting in a ForeignScan or  
CustomScan node.

> What I proposed perfectly covers what you want to do, and has more benefits.
> - A common manner for both of ForeignScan and CustomScan
> - Flexibility to control "Foreign XXX" label and relation names to be printed.

That may be so or not, but more importantly, this is more like a user  
interface problem, so each person would have different opinions about that.

> Even if it is sufficient for the current usage of FDW, I've been saying your
> proposition is not sufficient for CustomScan nowadays, and ForeignScan in the
> near future.

Again I haven't done anything about the EXPLAIN for upper relation  
processing in both ForeignScan and CustomScan cases.  I kept the  
behavior as-is, but I don't think the behavior as-is is OK, either.

> It is not an answer to ignore the CustomScan side, because we have to enhanced
> the infrastructure of CustomScan side to follow up FDW sooner or later.
> However, we will have to apply a different manner on CustomScan side, or overwrite
> what you proposed on FDW side, at that time.
> It is not a desirable future.

I agree on that point that it's better to handle both ForeignScan and  
CustomScan cases the same way.

Best regards,
Etsuro Fujita





Re: Oddity in EXPLAIN for foreign/custom join pushdown plans

От
Etsuro Fujita
Дата:
On 2016/08/02 21:35, Etsuro Fujita wrote:
> I removed the Relations line.  Here is an updated version of the patch.

I revised code and comments a bit.  Attached is an updated version of
the patch.

Best regards,
Etsuro Fujita

Вложения

Re: Oddity in EXPLAIN for foreign/custom join pushdown plans

От
Robert Haas
Дата:
On Tue, Jul 26, 2016 at 11:20 PM, Etsuro Fujita
<fujita.etsuro@lab.ntt.co.jp> wrote:
> I noticed that currently the core doesn't show any information on the target
> relations involved in a foreign/custom join in EXPLAIN, by itself.

I think that's a feature, not a bug.

> postgres_fdw shows the target relations in the Relations line, as shown
> above, but I think that the core should show such information independently
> of FDWs; in the above example replace "Foreign Scan" with "Foreign Join on
> public.ft1 t1, public.ft2 t2".

I disagree with that.  Currently, when we say that something is a join
(Merge Join, Hash Join, Nested Loop) we mean that the executor is
performing a join, but that's not the case here.  The executor is
performing a scan.  The remote side, we suppose, is performing a join
for us, but we are not performing a join: we are performing a scan.
So I think the fact that it shows up in the plan as "Foreign Scan" is
exactly right.  We are scanning some foreign thing, and that thing may
internally be doing other stuff, like joins and aggregates, but all
we're doing is scanning it.

Also, I don't really see the point of moving this from postgres_fdw to
core.  If, at some point in time, there are many FDWs that implement
sophisticated pushdown operations and we figure out that they are all
duplicating the code to do the EXPLAIN printout, and they're all
printing basically the same thing, perhaps not in an entirely
consistent way, then we could try to unify all of them into one
implementation in core.  But that's certainly not where we are right
now.  I don't see any harm at all in leaving this under the control of
the FDW, and in fact, I think it's better.  Neither the postgres_fdw
format nor what you want to replace it with are so unambiguously
awesome that some other FDW author might not come up with something
better.

I think we should leave this the way it is.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Oddity in EXPLAIN for foreign/custom join pushdown plans

От
Etsuro Fujita
Дата:
On 2016/08/05 21:47, Robert Haas wrote:
> On Tue, Jul 26, 2016 at 11:20 PM, Etsuro Fujita
> <fujita.etsuro@lab.ntt.co.jp> wrote:
>> I noticed that currently the core doesn't show any information on the target
>> relations involved in a foreign/custom join in EXPLAIN, by itself.

> I think that's a feature, not a bug.

I agree with you.  I'd leave that for 10.0.

>> postgres_fdw shows the target relations in the Relations line, as shown
>> above, but I think that the core should show such information independently
>> of FDWs; in the above example replace "Foreign Scan" with "Foreign Join on
>> public.ft1 t1, public.ft2 t2".

> I disagree with that.  Currently, when we say that something is a join
> (Merge Join, Hash Join, Nested Loop) we mean that the executor is
> performing a join, but that's not the case here.  The executor is
> performing a scan.  The remote side, we suppose, is performing a join
> for us, but we are not performing a join: we are performing a scan.
> So I think the fact that it shows up in the plan as "Foreign Scan" is
> exactly right.  We are scanning some foreign thing, and that thing may
> internally be doing other stuff, like joins and aggregates, but all
> we're doing is scanning it.

Hmm.  One thing I'm concerned about would be the case where direct 
modification is implemented by using GetForeignUpperPaths, not 
PlanDirectModify.  In that case, the way things are now, we would have 
"Foreign Scan" followed by an INSERT/UPDATE/DELETE query, but that seems 
odd to me.

> Also, I don't really see the point of moving this from postgres_fdw to
> core.  If, at some point in time, there are many FDWs that implement
> sophisticated pushdown operations and we figure out that they are all
> duplicating the code to do the EXPLAIN printout, and they're all
> printing basically the same thing, perhaps not in an entirely
> consistent way, then we could try to unify all of them into one
> implementation in core.  But that's certainly not where we are right
> now.  I don't see any harm at all in leaving this under the control of
> the FDW, and in fact, I think it's better.  Neither the postgres_fdw
> format nor what you want to replace it with are so unambiguously
> awesome that some other FDW author might not come up with something
> better.
>
> I think we should leave this the way it is.

One thing we need to do to leave that as is would be to fix a bug that I 
pointed out upthred.  Let me explain about that again.  The EXPLAIN 
command selects relation aliases to be used in printing a query so that 
each selected alias is unique, but postgres_fdw wouldn't consider the 
uniqueness.  Here is an example:

postgres=# explain verbose select * from (select t1.a, t2.a from ft1 t1, 
ft2 t2 where t1.a = t2.a union select t1.a, t2.a from ft1 t1, ft2 t2 
where t1.a = t2.a) as t(t1a, t2a);                                                     QUERY PLAN
--------------------------------------------------------------------------------------------------------------------
Unique (cost=204.12..204.13 rows=2 width=8)   Output: t1.a, t2.a   ->  Sort  (cost=204.12..204.12 rows=2 width=8)
 Output: t1.a, t2.a         Sort Key: t1.a, t2.a         ->  Append  (cost=100.00..204.11 rows=2 width=8)
-> Foreign Scan  (cost=100.00..102.04 rows=1 width=8)                     Output: t1.a, t2.a
Relations:(public.ft1 t1) INNER JOIN (public.ft2 t2)                     Remote SQL: SELECT r1.a, r2.a FROM (public.t1
r1
 
INNER JOIN public.t2 r2 ON (((r1.a = r2.a))))               ->  Foreign Scan  (cost=100.00..102.04 rows=1 width=8)
              Output: t1_1.a, t2_1.a                     Relations: (public.ft1 t1) INNER JOIN (public.ft2 t2)
          Remote SQL: SELECT r1.a, r2.a FROM (public.t1 r1 
 
INNER JOIN public.t2 r2 ON (((r1.a = r2.a))))
(14 rows)

The relation aliases in the Relations line in the second Foreign Scan, 
t1 and t2 for ft1 and ft2, are not unique; they should be t1_1 and t2_1 
(compare the aliases in the Relations line with ones in the Output line 
directly above that, created by core).  The reason for that is because 
postgres_fdw has created the Relations info by using 
rte->eref->aliasname as relation aliases as is at path-creation time. 
Probably it would be a little bit too early for postgers_fdw to do that.  Couldn't postgres_fdw create that info after
queryplanning, for 
 
example, during ExplainForeignScan?

Best regards,
Etsuro Fujita





Re: Oddity in EXPLAIN for foreign/custom join pushdown plans

От
Robert Haas
Дата:
On Mon, Aug 8, 2016 at 12:22 AM, Etsuro Fujita
<fujita.etsuro@lab.ntt.co.jp> wrote:
>>> I noticed that currently the core doesn't show any information on the
>>> target
>>> relations involved in a foreign/custom join in EXPLAIN, by itself.
>> I think that's a feature, not a bug.
> I agree with you.  I'd leave that for 10.0.

I don't want to change it at all, neither in 10 or any later version.

>> I disagree with that.  Currently, when we say that something is a join
>> (Merge Join, Hash Join, Nested Loop) we mean that the executor is
>> performing a join, but that's not the case here.  The executor is
>> performing a scan.  The remote side, we suppose, is performing a join
>> for us, but we are not performing a join: we are performing a scan.
>> So I think the fact that it shows up in the plan as "Foreign Scan" is
>> exactly right.  We are scanning some foreign thing, and that thing may
>> internally be doing other stuff, like joins and aggregates, but all
>> we're doing is scanning it.
>
> Hmm.  One thing I'm concerned about would be the case where direct
> modification is implemented by using GetForeignUpperPaths, not
> PlanDirectModify.  In that case, the way things are now, we would have
> "Foreign Scan" followed by an INSERT/UPDATE/DELETE query, but that seems odd
> to me.

I don't think there's really any problem there.  But if there is,
let's solve it when someone submits that patch, not now.

> One thing we need to do to leave that as is would be to fix a bug that I
> pointed out upthred.  Let me explain about that again.  The EXPLAIN command
> selects relation aliases to be used in printing a query so that each
> selected alias is unique, but postgres_fdw wouldn't consider the uniqueness.
> Here is an example:
>
> postgres=# explain verbose select * from (select t1.a, t2.a from ft1 t1, ft2
> t2 where t1.a = t2.a union select t1.a, t2.a from ft1 t1, ft2 t2 where t1.a
> = t2.a) as t(t1a, t2a);
>                                                      QUERY PLAN
> --------------------------------------------------------------------------------------------------------------------
>  Unique  (cost=204.12..204.13 rows=2 width=8)
>    Output: t1.a, t2.a
>    ->  Sort  (cost=204.12..204.12 rows=2 width=8)
>          Output: t1.a, t2.a
>          Sort Key: t1.a, t2.a
>          ->  Append  (cost=100.00..204.11 rows=2 width=8)
>                ->  Foreign Scan  (cost=100.00..102.04 rows=1 width=8)
>                      Output: t1.a, t2.a
>                      Relations: (public.ft1 t1) INNER JOIN (public.ft2 t2)
>                      Remote SQL: SELECT r1.a, r2.a FROM (public.t1 r1 INNER
> JOIN public.t2 r2 ON (((r1.a = r2.a))))
>                ->  Foreign Scan  (cost=100.00..102.04 rows=1 width=8)
>                      Output: t1_1.a, t2_1.a
>                      Relations: (public.ft1 t1) INNER JOIN (public.ft2 t2)
>                      Remote SQL: SELECT r1.a, r2.a FROM (public.t1 r1 INNER
> JOIN public.t2 r2 ON (((r1.a = r2.a))))
> (14 rows)
>
> The relation aliases in the Relations line in the second Foreign Scan, t1
> and t2 for ft1 and ft2, are not unique; they should be t1_1 and t2_1
> (compare the aliases in the Relations line with ones in the Output line
> directly above that, created by core).  The reason for that is because
> postgres_fdw has created the Relations info by using rte->eref->aliasname as
> relation aliases as is at path-creation time. Probably it would be a little
> bit too early for postgers_fdw to do that.  Couldn't postgres_fdw create
> that info after query planning, for example, during ExplainForeignScan?

Yes, it seems what we are doing now is not consistent with what
happens for plain tables; that should probably be fixed.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Oddity in EXPLAIN for foreign/custom join pushdown plans

От
Etsuro Fujita
Дата:
On 2016/08/10 5:19, Robert Haas wrote:
> On Mon, Aug 8, 2016 at 12:22 AM, Etsuro Fujita
> <fujita.etsuro@lab.ntt.co.jp> wrote:
>> One thing we need to do to leave that as is would be to fix a bug that I
>> pointed out upthred.  Let me explain about that again.  The EXPLAIN command
>> selects relation aliases to be used in printing a query so that each
>> selected alias is unique, but postgres_fdw wouldn't consider the uniqueness.
>> Here is an example:
>>
>> postgres=# explain verbose select * from (select t1.a, t2.a from ft1 t1, ft2
>> t2 where t1.a = t2.a union select t1.a, t2.a from ft1 t1, ft2 t2 where t1.a
>> = t2.a) as t(t1a, t2a);
>>                                                      QUERY PLAN
>>
--------------------------------------------------------------------------------------------------------------------
>>  Unique  (cost=204.12..204.13 rows=2 width=8)
>>    Output: t1.a, t2.a
>>    ->  Sort  (cost=204.12..204.12 rows=2 width=8)
>>          Output: t1.a, t2.a
>>          Sort Key: t1.a, t2.a
>>          ->  Append  (cost=100.00..204.11 rows=2 width=8)
>>                ->  Foreign Scan  (cost=100.00..102.04 rows=1 width=8)
>>                      Output: t1.a, t2.a
>>                      Relations: (public.ft1 t1) INNER JOIN (public.ft2 t2)
>>                      Remote SQL: SELECT r1.a, r2.a FROM (public.t1 r1 INNER
>> JOIN public.t2 r2 ON (((r1.a = r2.a))))
>>                ->  Foreign Scan  (cost=100.00..102.04 rows=1 width=8)
>>                      Output: t1_1.a, t2_1.a
>>                      Relations: (public.ft1 t1) INNER JOIN (public.ft2 t2)
>>                      Remote SQL: SELECT r1.a, r2.a FROM (public.t1 r1 INNER
>> JOIN public.t2 r2 ON (((r1.a = r2.a))))
>> (14 rows)
>>
>> The relation aliases in the Relations line in the second Foreign Scan, t1
>> and t2 for ft1 and ft2, are not unique; they should be t1_1 and t2_1
>> (compare the aliases in the Relations line with ones in the Output line
>> directly above that, created by core).  The reason for that is because
>> postgres_fdw has created the Relations info by using rte->eref->aliasname as
>> relation aliases as is at path-creation time. Probably it would be a little
>> bit too early for postgers_fdw to do that.  Couldn't postgres_fdw create
>> that info after query planning, for example, during ExplainForeignScan?

> Yes, it seems what we are doing now is not consistent with what
> happens for plain tables; that should probably be fixed.

OK, I think we should fix the issue that postgres_fdw produces incorrect 
aliases for joining relations shown in the Relations line in EXPLAIN for 
a join pushdown query like the above) in advance of the 9.6 release, so 
I'll add this to the 9.6 open items.

Best regards,
Etsuro Fujita





Re: Oddity in EXPLAIN for foreign/custom join pushdown plans

От
Robert Haas
Дата:
On Tue, Aug 23, 2016 at 11:18 PM, Etsuro Fujita
<fujita.etsuro@lab.ntt.co.jp> wrote:
>> Yes, it seems what we are doing now is not consistent with what
>> happens for plain tables; that should probably be fixed.
>
> OK, I think we should fix the issue that postgres_fdw produces incorrect
> aliases for joining relations shown in the Relations line in EXPLAIN for a
> join pushdown query like the above) in advance of the 9.6 release, so I'll
> add this to the 9.6 open items.

Isn't it a bit late for that?

I'm not eager to have 9.6 get further delayed while we work on this
issue, and I definitely don't believe that this is a sufficiently
important issue to justify reverting join pushdown in its entirety.
We're talking about a minor detail of the EXPLAIN output that we'd
like to fix, but for which we have no consensus on exactly how to fix
it, and the fix may involve some redesign.  That does not seem like a
good thing to the week before rc1.  Let's just leave this well enough
alone and work on it for v10.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Oddity in EXPLAIN for foreign/custom join pushdown plans

От
Etsuro Fujita
Дата:
On 2016/08/25 1:08, Robert Haas wrote:
> On Tue, Aug 23, 2016 at 11:18 PM, Etsuro Fujita
> <fujita.etsuro@lab.ntt.co.jp> wrote:
>> OK, I think we should fix the issue that postgres_fdw produces incorrect
>> aliases for joining relations shown in the Relations line in EXPLAIN for a
>> join pushdown query like the above) in advance of the 9.6 release, so I'll
>> add this to the 9.6 open items.

> Isn't it a bit late for that?
>
> I'm not eager to have 9.6 get further delayed while we work on this
> issue, and I definitely don't believe that this is a sufficiently
> important issue to justify reverting join pushdown in its entirety.
> We're talking about a minor detail of the EXPLAIN output that we'd
> like to fix, but for which we have no consensus on exactly how to fix
> it, and the fix may involve some redesign.  That does not seem like a
> good thing to the week before rc1.  Let's just leave this well enough
> alone and work on it for v10.

That's fine with me.

You said upthread, "I don't want to change it at all, neither in 10 or 
any later version."  That was the reason why I proposed to fix this in 9.6.

Best regards,
Etsuro Fujita