SQL/MED with simple wrappers
От | Shigeru HANADA |
---|---|
Тема | SQL/MED with simple wrappers |
Дата | |
Msg-id | 20101025204525.F62A.6989961C@metrosystems.co.jp обсуждение исходный текст |
Ответы |
Re: SQL/MED with simple wrappers
(Pavel Stehule <pavel.stehule@gmail.com>)
|
Список | pgsql-hackers |
Here is a revised version of SQL/MED patch which is based on the discussion in CommitFest 2010-09. The original post in the CommitFest is: http://archives.postgresql.org/pgsql-hackers/2010-09/msg00779.php In this message, I'll describe major changes from previous version of patch. Changes from HEAD are described in the wiki. http://wiki.postgresql.org/wiki/SQL/MED === Changes from last patch === 1) Per-column generic options In addition to per-table options, now per-column options are supported. Per-column options are stored in pg_attribute.attgenoptions column in same form as other FDW objects. New information_schema view columns and column_options are also added. 2) FDW Interface Per previous discussion, a function EstimateCosts() was added to FdwRoutine. It's called from cost_foreignscan() to estimate FDW-specific costs of a foreign scan in their own ways. Calling sequence is: 1) EstimateCosts(), from cost_foreignscan(), from 2) ConnectServer(), from ExecInitForeignScan() 3) Open(), from ExecInitForeignScan() 4) Iterate() for each result tuple, from ForeignNext() 5) ReOpen() for each outer loop, if necessary, from ExecForeignReScan() 6) Close(), from ExecEndForeignScan() -) FreeFSConnection() is not called from backend 3) Simplified FDWs(WIP) I also have revised two FDWs, file_fdw and postgresql_fdw, to more simple implementation to focus on discussion about the interface. 3-1) file_fdw This FDW handles flat text files on the server side, like COPY FROM. The available format is TEXT and CSV of COPY command. The codes are almost copied from COPY FROM and have many duplication. It would be better to integrate them, maybe COPY FROM will use file_fdw instead of heap-am. For this FDW, ConnectServer() and FreeFSConnection() have nothing to do because the flat files need no connection. EstimateCosts() should consider costs of: - reading records from the file - parsing records - creating tuples 3-2) postgresql_fdw(in contrib) This FDW handles data in external PostgreSQL servers, like contrib/dblink. Some part of the implementation would be able to be integrated with contrib/dblink. ConnectServer() manages connection cache with identifying connection by the server name, and all cached connections are disposed when the backend dies. EstimateCosts() should consider costs of: - executing query on the remote PostgreSQL - transferring result tuples via libpq - creating tuples from string - storing tuples in tuplestore - retrieve tuples from tuplestore The first one of above would be able to estimated by executing EXPLAIN ANALYZE on the remote side. EstimateCosts() would need to call ConnectServer() by itself, though. === Known issues and feature improvement === 1) Optimize foreign query In current implementation, each foreign table are scanned separately. It would be useful to push ORDER BY, LIMIT and OFFSET down to remote side for applications which uses paging. To implement such kind of optimization, optimizer would need to generate practical numbers of sortkeys of a foreign table, not on a brute force method. Some foreign scans using same foreign server would be able to united into one foreign query. For instance, JOIN, UNION and subqueries. 2) Integrate FDW with COPY FROM COPY FROM can handle CSV, TEXT and BINARY format stream (file or stdin) as data source. It would be worth to change COPY FROM to accept various data source using FDW infrastructure. The syntax would be: COPY table_name FROM SERVER server_name WITH OPTIONS (on-the-fly options); 3) Integrate file_fdw with dblink Currently dblink and postgresql_fdw have connection cache mechanism for each. Can they share connection management? 3) Connection invalidation on catalog change Should the foreign connection be invalid if the generic options of FDW-related objects, such as user mapping or foreign server, have been changed after establishment of the connection? Or ALTER statement should be blocked if the object was used by active connection? 4) List of foreign connections Users (especially DBAs?) might want to see list of foreign connections. Currently postgresql_fdw provides its own connection list via postgresql_fdw_connections view. Common view such as pg_foreign_connections would be needed? If so, function which returns list of active connections would be necessary in FDW API. 5) Routine mapping If a function in local query can be evaluated on the remote side in same semantics, it seems efficient to push the function down to the remote side. But how can we know whether the function can be pushed down or not? For such purpose, SQL/MED standard defines "routine mapping". Should we implement routine mapping? 6) MERGE join support ExecForeignMarkPos() and ExecForeignRestrPos() has been defined, but not implemented. They are used to mark/restore the position of scanning for merge join. New APIs, MarkPos() and RestorePos() should be added to FdwRoutine? It's not clear to me how the planner choose merge join. 7) Using cursor in postgresql_fdw postgresql_fdw fetches all of the result tuples from the foreign server in the first pgIterate() call. It could cause out-of-memory if the result set was huge. If libpq supports protocol-level cursor, postgresql_fdw will be able to divide result set into some sets and lower the usage of memory. Or should we use declare implicit cursor with DECLARE statement? One connection can be used by multiple ForeignScan nodes in a local query alternately. An issue is that cursor requires implicit transaction block. Is it OK to start transaction automatically? 8) Query-time constraints Currently foreign tables can have NOT NULL and/or CHECK constraints. It helps table partitioning with constraint exclusion. It also helps data validation on query-time. But applying constraints at query-time is has not been implemented. Regards, -- Shigeru Hanada
Вложения
В списке pgsql-hackers по дате отправления:
Предыдущее
От: Itagaki TakahiroДата:
Сообщение: Extensible executor nodes for preparation of SQL/MED