Re: patch: SQL/MED(FDW) DDL

Поиск
Список
Период
Сортировка
От Shigeru HANADA
Тема Re: patch: SQL/MED(FDW) DDL
Дата
Msg-id 20100927230506.C6A4.6989961C@metrosystems.co.jp
обсуждение исходный текст
Ответ на Re: patch: SQL/MED(FDW) DDL  (SAKAMOTO Masahiko <sakamoto.masahiko@oss.ntt.co.jp>)
Список pgsql-hackers
Hi hackers,

On Mon, 27 Sep 2010 15:50:34 +0900
SAKAMOTO Masahiko <sakamoto.masahiko@oss.ntt.co.jp> wrote:
> Right. In any case, I should clearify what this API could cover
> by this patch and what could not.
> # And also how far I and my collaborator can implement..
> 
> As Itagaki points out, we have two types of FDW implementations in
> progress: flat file wrapper and postgresql wrapper. these items are
> described in wiki:
>   http://wiki.postgresql.org/wiki/SQL/MED
> 
> But it may be hard to get what this fdw routines 'could' provide,
> so I and my collaborator is now summarizing these items to discuss and
> ask for help. Anyway I think these items are needed to discuss further.
>   - API design that is used in executor for a single external table.
>   - simple FDW implements to clearfy what this API could provide.

To discuss how the SQL/MED patches should be, we made summary of
current SQL/MED proposal.

* What the current SQL/MED patches provide

Recent patch fdw_select20100917 consists of some parts: I/F of
FDW, new executor node ForeignScan, connection caching, and FDW
for PostgreSQL.  FDW for flat file is not included.  All of them
are based on DDLs and catalogs which are provided by
fdw_table20100917 patch.

Itagaki has summarized the details of fdw_table20100917 patch
well.  Thanks for review.
http://archives.postgresql.org/pgsql-hackers/2010-09/msg01653.php

The I/F of FDW is defined as struct FdwRoutine, set of API
functions which are called from backend modules.  The APIs are
designed to support only scanning a foreign at once.  Path/Plan
optimizations like JOIN/UNION push-down are out of scope.  Such
optimizations require planner hook as mentioned by Itagaki before. 
In short, our current goal is to implement SeqScan for foreign
tables.

ForeignScan node is an executor node which is like SeqScan node
for local table.  Optimizer generates T_ForeignScan path instead
of T_SeqScan path for a foreign table in set_plain_rel_pathlist()
if the RangeTblEntry was a foreign table.  Index paths and
tidscan paths are never generated for foreign tables.

ForeignScanState and FdwReply are introduced to represent the
status of a foreign scan.  ForeignScanState is a subclass of
ScanState, and FdwReply is a abstract type which is used to pass
FDW-specific data between API calls.

* Details of FDW API

FDWs should implement HANDLER function which returns a pointer to
a FdwRoutine instance which has pointers to actual functions.

struct FdRoutine {
   /*    * ConnectServer() will be called from ExecInitForeignScan()    * if the backend has no connection which can be
usedto    * execute the foreign query for the foreign table.    * FDW should establish a connection between foreign
server   * and return it with casting to pointer to FSConnection    * (abstract connection type).  If the FDW doesn't
needany    * connection, returning NULL is OK.    * The arguments, server and user, can be used to extract    *
connectioninformation.    */   FSConnection* (*ConnectServer)(ForeignServer *server,
UserMapping*user);
 
   /*    * FreeFSConnection() will be called when backend dies or    * DISCARD ALL command was executed.    * FDW
shouldclose connection gracefully and free resources    * if any.    */   void (*FreeFSConnection)(FSConnection
*conn);
   /*    * Open() will be called from ExecInitForeignScan().    * FDW should initialize ForeignScanState, internal
stateof    * a foreign scan, and ready to return tuple in next Iterate()    * call.    * For instance, FDW for
PostgreSQLonly generate SQL from    * ScanState.  And we implemented WHERE clause push-down here.    */   void
(*Open)(ForeignScanState*scanstate);
 
   /*    * Iterate() will be called from ExecForeignScan() when the    * executor requests next tuple.      * For
instance,FDW for PostgreSQL executes foreign query at    * first call and stores all results into TupleStore, and    *
returnseach tuple for each Iterate() call.    */   void (*Iterate)(ForeignScanState *scanstate);
 
   /*    * Close() will be called from ExecEndForeignScan().    * FDW should free resources for FdwReply if any.    */
void (*Close)(ForeignScanState *scanstate);
 
   /*    * ReOpen() will be called from ExecForeignReScan() when the    * foreign scan should be reseted to scan whole
datafrom the    * head again.    * For instance, FDW for PostgreSQL frees current result set    * to make next
Iterate()call to execute foreign query again.    */   void (*ReOpen)(ForeignScanState *scanstate);
 

};

Maybe FdwRoutine should have more APIs to support essential
features.

The startup/total cost of scanning a foreign table are fixed to
10000.0 in current patch.  They are groundless values, they just
say that scanning foreign table costs more than scanning local
table.  The cost should be estimated from statistics in
pg_statistic and pg_class.  This function has not been
implemented yet.
   /*    * UpdateStats() will be called from ANALYZE routine when    * statistics of a foreign table should be updated.
  *    * Some more parameters may be needed.    *    * For instance, FDW for PostgreSQL will retrieve    * pg_statistic
andpg_class from foreign server and store    * them in local catalog.  How to represent overhead of    * foreign query
isstill issue.    */   void (*UpdateStats)(Oid relid);
 

To support path/plan optimization, some APIs should be added, but
we think that simple FDWs can be implemented with APIs above. 

* Next action

To focus on API discussion, we are going to simplify and
re-submit patches.
 * Fix some mistakes pointed out by Itagaki. * Make FDW for PostgreSQL independent contrib module.   Currently it
sharesconnection management codes with   contrib/dblink. * Add simple FDW for flat file (maybe CSV ?) into core to
implementregression tests for FOREIGN TABLE DDLs and foreign   table query.  In the future, the FDW for flat file can
be  integrated with COPY FROM.
 

Any comments/questions are welcome.

Regards,
--
Shigeru Hanada



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Gurjeet Singh
Дата:
Сообщение: Re: Improving prep_buildtree used in VPATH builds
Следующее
От: Robert Haas
Дата:
Сообщение: Re: Large objects.