Re: [BUGS] Replication to Postgres 10 on Windows is broken
From | Andres Freund |
---|---|
Subject | Re: [BUGS] Replication to Postgres 10 on Windows is broken |
Date | |
Msg-id | 20170806171436.ve646fu4bpagdrc2@alap3.anarazel.de |
In reply to | Re: [BUGS] Replication to Postgres 10 on Windows is broken (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses |
Re: [BUGS] Replication to Postgres 10 on Windows is broken
(Tom Lane <tgl@sss.pgh.pa.us>)
|
List | pgsql-bugs |
Hi,

On 2017-08-06 12:29:07 -0400, Tom Lane wrote:
> Yeah. After some digging around I think I see exactly what is happening.
> The error message would be better read as "Socket is not connected *yet*",
> that is, the problem is that we're trying to write data before the
> nonblocking connection request has completed. (This fits with the OP's
> observation that local loopback connections work fine --- they probably
> complete immediately.) PQconnectPoll believes that it just has to wait
> for write-ready when waiting for a connection to complete. When using
> connectDBComplete's wait loop, that reduces to a call to Windows' version
> of select(2), in pqSocketPoll, and according to
>
> https://msdn.microsoft.com/en-us/library/windows/desktop/ms740141(v=vs.85).aspx
>
> "The parameter writefds identifies the sockets that are to be checked for
> writability. If a socket is processing a connect call (nonblocking), a
> socket is writeable if the connection establishment successfully
> completes."
>
> On the other hand, in libpqwalreceiver, we're depending on latch.c's
> implementation, and it uses WSAEventSelect's FD_WRITE event:
>
> https://msdn.microsoft.com/en-us/library/windows/desktop/ms741576(v=vs.85).aspx
>
> If I'm reading that correctly, FD_WRITE is set instantly by the connect
> request, probably even in the nonblock case, and it only gets cleared
> by a failed write request. It looks to me like we would have to
> specifically look for FD_CONNECT, *not* FD_WRITE, to make this work.

Nice digging.

> This is problematic, because the APIs in between don't provide a way
> to report that we're still waiting for connect rather than for
> data-write-ready. Anybody have the stomach for extending PQconnectPoll's
> API with an extra PGRES_POLLING_CONNECTING state?

I'm a bit hesitant to do so at this phase of the release cycle, it'd
kind of force all users to upgrade their code, and I'm sure there's a
couple out-of-tree ones. And not just code explicitly using new versions
of libpq, also users of old versions - several distributions just
install newer libpq versions and rely on it being compatible.

> If not, can we tell in
> WaitEventAdjustWin32 that the socket is still connecting and we must
> substitute FD_CONNECT for FD_WRITE?

I was wondering, for a second, if we should just always use FD_CONNECT
once in every set. But unfortunately there's plenty places that
create/destroy sets at a high enough speed for that to not be a nice
solution.

A third solution would be to, for v10, add a #ifdef WIN32 block to
libpqrcv_connect() that just waits till FD_CONNECT is ready. That has
the disadvantage of not accepting interrupts, but still seems better
than not working at all. That's not much of a real solution, but this
late in the cycle it might be advisable to hold our noses :(

Greetings,

Andres Freund
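
For readers following the thread, here is a minimal sketch of the select(2)-style wait that the quoted analysis attributes to connectDBComplete's loop: start a nonblocking connect, wait for write-ready, then check the socket's pending error. This is not libpq or latch.c code. The function name `connect_nonblocking` and its parameters are hypothetical, and the sketch assumes a POSIX system, where a failed connect also reports write-ready and therefore SO_ERROR has to be checked afterwards.

```c
#include <errno.h>
#include <fcntl.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of libpq): open a TCP socket, start a
 * nonblocking connect to addr, and wait up to timeout_ms for it to finish.
 * Returns 0 and stores the connected socket in *out_fd on success;
 * returns -1 with errno set on failure or timeout.
 */
int
connect_nonblocking(const struct sockaddr *addr, socklen_t addrlen,
                    int timeout_ms, int *out_fd)
{
    int         fd = socket(addr->sa_family, SOCK_STREAM, 0);
    int         flags;
    int         saved;

    if (fd < 0)
        return -1;
    if (fd >= FD_SETSIZE)
    {
        /* select() cannot watch descriptors at or above FD_SETSIZE */
        close(fd);
        errno = EMFILE;
        return -1;
    }

    flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0 || fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0)
        goto fail;

    if (connect(fd, addr, addrlen) < 0)
    {
        fd_set      wfds;
        struct timeval tv;
        int         rc;
        int         err = 0;
        socklen_t   errlen = sizeof(err);

        if (errno != EINPROGRESS)
            goto fail;

        /* Connection is still in progress: wait for write-ready. */
        do
        {
            FD_ZERO(&wfds);
            FD_SET(fd, &wfds);
            /* Simplification: the full timeout restarts after EINTR. */
            tv.tv_sec = timeout_ms / 1000;
            tv.tv_usec = (timeout_ms % 1000) * 1000;
            rc = select(fd + 1, NULL, &wfds, NULL, &tv);
        } while (rc < 0 && errno == EINTR);

        if (rc == 0)
            errno = ETIMEDOUT;
        if (rc <= 0)
            goto fail;

        /* Write-ready alone does not prove success; ask for the result. */
        if (getsockopt(fd, SOL_SOCKET, SO_ERROR, &err, &errlen) < 0)
            goto fail;
        if (err != 0)
        {
            errno = err;
            goto fail;
        }
    }

    *out_fd = fd;
    return 0;

fail:
    saved = errno;
    close(fd);
    errno = saved;
    return -1;
}
```

The point made in the thread is that this "wait for write-ready" convention does not carry over to a WSAEventSelect-based wait on Windows, where FD_WRITE is reported as soon as the connect is requested, so the equivalent wait there would have to look for FD_CONNECT instead.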