Re: Async execution of postgres_fdw.
От | Kyotaro HORIGUCHI |
---|---|
Тема | Re: Async execution of postgres_fdw. |
Дата | |
Msg-id | 20150113.204646.205937470.horiguchi.kyotaro@lab.ntt.co.jp обсуждение исходный текст |
Ответ на | Re: Async execution of postgres_fdw. (Ashutosh Bapat <ashutosh.bapat@enterprisedb.com>) |
Ответы |
Re: Async execution of postgres_fdw.
(Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp>)
Re: Async execution of postgres_fdw. (Robert Haas <robertmhaas@gmail.com>) Re: Async execution of postgres_fdw. (Michael Paquier <michael.paquier@gmail.com>) |
Список | pgsql-hackers |
Hello. This is a version 3 patch. - PgFdwConnState is removed - PgFdwConn is isolated as a separate module. - State transition was simplicated, I think. - Comment about multiple scans on a connection is added. - The issue of PREPARE is not addressed yet. - It is to show how the new style looks, so it is lacking for comments for every PgFdwConn functions. - Rebased to current master. ======= > > I added a comment describing the and logic and meaning of the > > statesjust above the enum declaration. > > > > This needs to be clarified further. But that can wait till we finalise the > approach and the patch. Esp. comment below is confusing > 1487 + * PGFDW_CONN_SYNC_RUNNING is rather an internal state in > 1488 + * fetch_more_data(). It indicates that the function shouldn't send > the next > 1489 + * fetch requst after getting the result. > > I couldn't get the meaning of the second sentence, esp. it's connection > with synchronous-ness. In this version, I removed PgFdwConnState. Now what indicates that async fetch is running or not is the existence of async_fetch. I think the complicated state transition is dissapeard. > > > The if condition if (entry->conn == NULL) in GetConnection(), used to > > track > > > whether there is a PGConn active for the given entry, now it tracks > > whether > > > it has PgFdwConn for the same. > > > > After some soncideration, I decided to make PgFdwConn to be > > handled more similarly to PGconn. This patch has shrunk as a > > result and bacame looks clear. > > > > I think it's still prone to segfaults considering two pointer indirections. PGconn itself already makes two-level indirection, and PgFdwConn has hidden its details mainly using macros. I may misunderstood you, but if you're worried that PgFdwConn.pgconn can be set from anywhere, we would should separate PgFdwConn into another C-module and hide all the details as PGconn does. It is shown as the separte patch. But I feel it a bit overdone because it is not an end-user interface. > > I wrote the outline of the logic in the comment for enum > > PgFdwConnState in postgres_fdw.h. Is it make sense? > > > > The point about two different ForeignScan nodes using the same connection > needs some clarification, I guess. It's not very clear, why would there be > more queries run on the same connection. I know why this happens, but it's > important to mention it somewhere. If it's already mentioned somewhere in > the file, sorry for not paying attention to that. Yeah. It is just what I stumbled on. I changed the comment in fetch_more_data() like below. Does it make sense? | /* | * On the current postgres_fdw implement, multiple PgFdwScanState | * on the same foreign server and mapped user share the same | * connection to the remote server (see GetConnection() in | * connection.c) and inidividual scans on it are separated using | * cursors. Since one connection cannot accept two or more | * asynchronous queries simultaneously, we should stop the async | * fetching if the another scan comes. | */ | | if (PFCgetNscans(conn) > 1) | PFCsetAsyncScan(conn, NULL); > I think there is more chance that there will more than one ForeignScan > nodes trying interact with a foreign server, even after the push-down work. > The current solution doesn't address that. We actually need parallel > querying in two cases > 1. Querying multiple servers in parallel > 2. Querying same server (by two querists) in parallel within the same query > e.g. an un-pushable join. > > We need a solution which is work in both the cases. The first point is realized by this patch with some limitations. The second point is that my old patch did, it made a dedicated connection for individual scans up to some fixed number aside the main connection, then the overflowed scans go to the main connection and they are done in the manner the unpatched postgres_fdw does. I was thinking that the 'some fiexed number' could be set by a parameter of a foreign server but I got a comment that it could fill up the remote server unless reasonable load or/and bandwidth managemant. So I abandoned the multiple-connection solution and decided to do everything on the first connection. It's how the current patch came. > Is it possible to use the parallel query infrastructure being built by > Robert or to do something like parallel seq scan? That will work, not just > for Postgres FDW but all the FDWs. I haven't seen closer to the patch but if my understanding by a glance is correct, the parallel scan devides the target table into multple parts then runs subscans on every part in parallel. It might allocate dedicated processes for every child scan on a partitioned table. But, I think, from the performance view, every scan of multiple foreign scans don't need correnponding local process. But if the parallel scan infrastructure makes the mechanism simpler, using it is a very promising choice. > In case of Prepared statements, ExecInit is called at the end of planning, > without subsequent execution like the case of EXPLAIN. I see that the patch > handles EXPLAIN well, but I didn't see any specific code for PREPARE. I'll look into the case after this, but I'd like to send a revised patch at this point. regards, -- Kyotaro Horiguchi NTT Open Source Software Center
В списке pgsql-hackers по дате отправления: