Re: Custom/Foreign-Join-APIs (Re: [v9.5] Custom Plan API)
От | Kouhei Kaigai |
---|---|
Тема | Re: Custom/Foreign-Join-APIs (Re: [v9.5] Custom Plan API) |
Дата | |
Msg-id | 9A28C8860F777E439AA12E8AEA7694F8010D40FD@BPXM15GP.gisp.nec.co.jp обсуждение исходный текст |
Ответ на | Re: Custom/Foreign-Join-APIs (Re: [v9.5] Custom Plan API) (Shigeru HANADA <shigeru.hanada@gmail.com>) |
Ответы |
Re: Custom/Foreign-Join-APIs (Re: [v9.5] Custom Plan API)
|
Список | pgsql-hackers |
Hanada-san, > I reviewed the Custom/Foreign join API patch again after writing a patch of join > push-down support for postgres_fdw. > Thanks for your dedicated jobs, my comments are inline below. > Here, please let me summarize the changes in the patch as the result of my review. > > * Add set_join_pathlist_hook_type in add_paths_to_joinrel > This hook is intended to provide a chance to add one or more CustomPaths for an > actual join combination. If the join is reversible, the hook is called for both > A * B and B * A. This is different from FDW API but it seems fine because FDWs > should have chances to process the join in more abstract level than CSPs. > > Parameters are same as hash_inner_and_outer, so they would be enough for hash-like > or nestloop-like methods. I’m not sure whether mergeclause_list is necessary > as a parameter or not. It’s information for merge join which is generated when > enable_mergejoin is on and the join is not FULL OUTER. Does some CSP need it > for processing a join in its own way? Then it must be in parameter list because > select_mergejoin_clauses is static so it’s not accessible from external modules. > I think, a preferable way is to reproduce the mergeclause_list by extension itself, rather than pass it as a hook argument, because it is uncertain whether CSP should follow "enable_mergejoin" parameter even if it implements a logic like merge-join. Of course, it needs to expose select_mergejoin_clauses. It seems to me a straight- forward way. > The timing of the hooking, after considering all built-in path types, seems fine > because some of CSPs might want to use built-in paths as a template or a source. > > One concern is in the document of the hook function. "Implementing Custom Paths” > says: > > > A custom scan provider will be also able to add paths by setting the following > hook, to replace built-in join paths by custom-scan that performs as if a scan > on preliminary joined relations, which us called after the core code has generated > what it believes to be the complete and correct set of access paths for the join. > > I think “replace” would mis-lead readers that CSP can remove or edit existing > built-in paths listed in RelOptInfo#pathlist or linked from cheapest_foo. IIUC > CSP can just add paths for the join relation, and planner choose it if it’s the > cheapest. > I adjusted the documentation stuff as follows: A custom scan provider will be also able to add paths by setting the following hook, to add <literal>CustomPath</> nodes that perform as if built-in join logic doing. It is typically expected to take two input relations then generate a joined output stream, or just scans preliminaty joined relations like materialized-view. This hook is called next to the consideration of core join logics, then planner will choose the best path to run the relations join in the built-in and custom ones. Probably, it can introduce what this hook works correctly. v12 patch updated only this portion. > * Add new FDW API GetForeignJoinPaths in make_join_rel > This FDW API is intended to provide a chance to add ForeignPaths for a join relation. > This is called only once for a join relation, so FDW should consider reversed > combination if it’s meaningful in their own mechanisms. > > Note that this is called only when the join relation was *NOT* found in the > PlannerInfo, to avoid redundant calls. > Yep, it is designed according to the discussion upthreads. It can produce N-way remote join paths even if intermediate join relation is more expensive than local join + two foreign scan. > Parameters seems enough for postgres_fdw to process N-way join on remote side > with pushing down join conditions and remote filters. > You ensured it clearly. > * Treat scanrelid == 0 as pseudo scan > A foreign/custom join is represented by a scan against a pseudo relation, i.e. > result of a join. Usually Scan has valid scanrelid, oid of a relation being > scanned, and many functions assume that it’s always valid. The patch adds another > code paths for scanrelid == 0 as custom/foreign join scans. > Right, > * Pseudo scan target list support > CustomScan and ForeignScan have csp_ps_tlist and fdw_ps_tlist respectively, for > column reference tracking. A scan generated for custom/foreign join would have > column from multiple relations in its target list, i.e. output columns. Ordinary > scans have all valid columns of the relation as output, so references to them > can be resolved easily, but we need an additional mechanism to determine where > a reference in a target list of custom/foreign scan come from. This is very > similar to what IndexOnlyScan does, so we reuse INDEX_VAR as mark of an indirect > reference to another relation’s var. > Right, FDW/CSP driver is responsible to set *_ps_tlist to inform the core planner which columns of relations are referenced, and which attribute represents what columns/relations. It is an interface contract when foreign/custom-scan is chosen instead of the built-in join logic. > For this mechanism, set_plan_refs is changed to fix Vars in ps_tlist of CustomScan > and ForeignScan. For this change, new BitmapSet function bms_shift_members is > added. > > set_deparse_planstate is also changed to pass ps_tlist as namespace for > deparsing. > Yep, it is same as IndexOnlyScan. > These chanes seems reasonable, so I mark this patch as “ready for committers” > to hear committers' thoughts. > Thanks! -- NEC Business Creation Division / PG-Strom Project KaiGai Kohei <kaigai@ak.jp.nec.com>
Вложения
В списке pgsql-hackers по дате отправления: