Re: Custom/Foreign-Join-APIs (Re: [v9.5] Custom Plan API)

Поиск
Список
Период
Сортировка
От Kouhei Kaigai
Тема Re: Custom/Foreign-Join-APIs (Re: [v9.5] Custom Plan API)
Дата
Msg-id 9A28C8860F777E439AA12E8AEA7694F8010D4971@BPXM15GP.gisp.nec.co.jp
обсуждение исходный текст
Ответ на Custom/Foreign-Join-APIs (Re: [v9.5] Custom Plan API)  (Kouhei Kaigai <kaigai@ak.jp.nec.com>)
Ответы Re: Custom/Foreign-Join-APIs (Re: [v9.5] Custom Plan API)  (Robert Haas <robertmhaas@gmail.com>)
Список pgsql-hackers
Hi Robert,

Thanks for your comments.

> A few random cosmetic problems:
> 
> - The hunk in allpaths.c is useless.
> - The first hunk in fdwapi.h contains an extra space before the
> closing parenthesis.
>
OK, it's my oversight.

> And then:
> 
> +       else if (scan->scanrelid == 0 &&
> +                        (IsA(scan, ForeignScan) || IsA(scan, CustomScan)))
> +               varno = INDEX_VAR;
> 
> Suppose scan->scanrelid == 0 but the scan type is something else?  Is
> that legal?  Is varno == 0 the correct outcome in that case?
>
Right now, no other scan type has capability to return a tuples
with flexible type/attributes more than static definition.
I think it is a valid restriction that only foreign/custom-scan
can have scanrelid == 0.

I checked overall code again. One point doubtful was ExecScanFetch().
If estate->es_epqTuple is not NULL, it tries to save a tuple from
a particular scanrelid (larger than zero).
IIUC, es_epqTuple is used only when fetched tuple is updated then
visibility checks are applied on writer operation again.
So, it should work for CPS with underlying actual scan node on base
relations, however, I need code investigation if FDW/CSP replaced
an entire join subtree by an alternative relation scan (like a
materialized view).


> > [ new patch ]
> 
> A little more nitpicking:
> 
> ExecInitForeignScan() and ExecInitCustomScan() could declare
> currentRelation inside the if (scanrelid > 0) block instead of in the
> outer scope.
>
OK,

> I'm not too excited about the addition of GetFdwHandlerForRelation,
> which is a one-line function used in one place.  It seems like we
> don't really need that.
>
OK,

> On Fri, Mar 13, 2015 at 8:02 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > Robert Haas <robertmhaas@gmail.com> writes:
> >> On Fri, Mar 13, 2015 at 2:31 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >>> I don't object to the concept, but I think that is a pretty bad place
> >>> to put the hook call: add_paths_to_joinrel is typically called multiple
> >>> (perhaps *many*) times per joinrel and thus this placement would force
> >>> any user of the hook to do a lot of repetitive work.
> >
> >> Interesting point.  I guess the question is whether a some or all
> >> callers are going to actually *want* a separate call for each
> >> invocation of add_paths_to_joinrel(), or whether they'll be happy to
> >> operate on the otherwise-complete path list.
> >
> > Hmm.  You're right, it's certainly possible that some users would like to
> > operate on each possible pair of input relations, rather than considering
> > the joinrel "as a whole".  Maybe we need two hooks, one like your patch
> > and one like I suggested.
> 
> Let me attempt to summarize subsequent discussion on this thread by
> saying the hook location that you proposed (just before set_cheapest)
> has not elicited any enthusiasm from anyone else.  In a nutshell, the
> problem is that a single callback for a large join problem is just
> fine if there are no special joins involved, but in any other
> scenario, nobody knows how to use a hook at that location for anything
> useful.  To push down a join to the remote server, you've got to
> figure out how to emit an SQL query for it.  To execute it with a
> custom join strategy, you've got to know which of those joins should
> have inner join semantics vs. left join semantics.  A hook/callback in
> make_join_rel() or in add_paths_to_joinrel() makes that relatively
> straightforward. Otherwise, it's not clear what to do, short of
> copy-and-pasting join_search_one_level().  If you have a suggestion,
> I'd like to hear it.
>
Nothing I have. Once I tried to put a hook just after the set_cheapest(),
the largest problem was that we cannot extract a set of left and right
relations from a set of joined relations, like an extraction of apple
and orange from mix juice.

> If not, I'm going to press forward with the idea of putting the
> relevant logic in either add_paths_to_joinrel(), as previously
> proposed, or perhaps up oe level in make_one_rel().  Either way, if
> you don't need to be called multiple times per joinrel, you can stash
> a flag inside whatever you hang off of the joinrel's fdw_private and
> return immediately on every call after the first.  I think that's
> cheap enough that we shouldn't get too stressed about it: for FDWs, we
> only call the hook at all if everything in the joinrel uses the same
> FDW, so it won't get called at all except for joinrels where it's
> likely to win big; for custom joins, multiple calls are quite likely
> to be useful and necessary, and if the hook burns too much CPU time
> for the query performance you get out of it, that's the custom-join
> provider's fault, not ours.  The current patch takes this approach one
> step further and attempts FDW pushdown only once per joinrel.  It does
> that because, while postgres_fdw DOES need the jointype and a valid
> innerrel/outerrel breakdown to figure out what query to generate, it
> does NOT every possible breakdown; rather, the first one is as good as
> any other. But this might not be true for a non-PostgreSQL remote
> database.  So I think it's better to call the hook every time and let
> the hook return without doing anything if it wants.
>
Indeed. Although I and Hanada-san have discussed under an assumption of
remote PostgreSQL and join-pushdown cases, we may have remote RDBMS
that makes query execution plan according to the order of appear in
query.
If FDW driver don't want to call GetForeignJoinPaths() multiple times,
fdw_private of RelOptInfo is a good marker to determine whether it is
the first call or not.
In case when multiple CSP will add paths on join, we may need a facility
to allow multiple extensions to save its own private data.
If we could identify individual CSP by name, it may be an idea to
have a hash-table to track private data of CSP. But I don't think
it is mandatory feature in the 1st version.

> I'm still not totally sure whether make_one_rel() is better than
> add_paths_to_joinrel().  The current patch attempts to split the
> difference by doing FDW pushdown from make_one_rel() and custom joins
> from add_paths_to_joinrel().  I dunno why; if possible, those two
> things should happen in the same place.  Doing it in make_one_rel()
> makes for fewer arguments and fewer repetitive calls, but that's not
> much good if you would have had a use for the extra arguments that
> aren't computed until we get down to add_paths_to_joinrel().  I'm not
> sure whether that's the case or not.  The latest version of the
> postgres_fdw patch doesn't seem to mind not having extra_lateral_rels,
> but I'm wondering if that's busted.
>
As my initial proposition doing, my preference is add_paths_to_joinrel()
for values calculated during this routine (but also increases number
of arguments). Even if make_one_rel() called FDW/CSP, I expect extensions
have to re-generate these values again, by itself. It is not impossible
to implement, not a graceful manner at least.

As long as postgres_fdw checks fdw_private of RelOptInfo, amount of
code adjustment is not so much.
Hanada-san, how about your opinion?

Thanks,
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>


В списке pgsql-hackers по дате отправления:

Предыдущее
От: Etsuro Fujita
Дата:
Сообщение: Re: inherit support for foreign tables
Следующее
От: "David G. Johnston"
Дата:
Сообщение: Re: [BUGS] Failure to coerce unknown type to specific type