Discussion: Windows: Wrong error message at connection termination
Dear hackers,
I recently had a hard time finding the root cause of some weird behavior in the async API of libpq when running client and server on Windows. When the connection aborts with an error - most notably an error during connection setup - it sometimes fails with the wrong error message:
Instead of:
connection to server at "::1", port 5433 failed: FATAL: role "a" does not exist
it fails with:
connection to server at "::1", port 5433 failed: server closed the connection unexpectedly
I found out that the recv() function of the Winsock API has some weird behavior: if the connection receives a TCP RST flag, recv() immediately returns -1, regardless of whether all previously sent data has been retrieved. So when the connection is closed hard, the behavior on the client side is timing dependent: the last packet is either dropped, or delivered to libpq if libpq calls recv() quickly enough.
This behavior is described for closesocket() here:
https://docs.microsoft.com/en-us/windows/win32/api/winsock/nf-winsock-closesocket
This is called a hard or abortive close, because the socket's virtual circuit is reset immediately, and any unsent data is lost. On Windows, any recv call on the remote side of the circuit will fail with WSAECONNRESET.
Unfortunately, every connection is closed hard, with the TCP RST flag, by a PostgreSQL server on Windows. That in turn is due to another Winsock behavior: at process termination, every socket that wasn't closed by the application is closed hard with the RST flag. I didn't find any official documentation of this behavior.
Explicitly closing the socket before process termination leads to a graceful close even on Windows. That is what the attached patch does. I think delivering the correct error message to the user is much more important than closing the process in sync with the socket.
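The graceful case the patch aims for is easy to demonstrate with plain sockets (a small hypothetical loopback sketch in Python, not libpq code; the error text is just an example): after an ordinary close(), the peer receives FIN, and everything queued before the close remains readable, followed by a clean EOF.

```python
import socket
import threading

def server(listener):
    conn, _ = listener.accept()
    conn.sendall(b'FATAL: role "a" does not exist')
    conn.close()  # ordinary close: FIN is sent, queued data stays readable

listener = socket.create_server(("127.0.0.1", 0))
threading.Thread(target=server, args=(listener,)).start()

client = socket.create_connection(("127.0.0.1", listener.getsockname()[1]))
msg = client.recv(1024)   # the final error message arrives intact
eof = client.recv(1024)   # then a clean EOF: b""
client.close()
print(msg, eof)
```

This is the normal TCP shutdown sequence on any platform; the complaint in this thread is that a Windows server process which exits without closing the socket never gets this far.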
Some background: I'm the maintainer of ruby-pg, the PostgreSQL client library for Ruby. The next version of ruby-pg will switch to the async API for connection setup. Using this API changes the timing of socket operations and therefore often leads to the wrong message above. Previous versions used the sync API, which usually doesn't suffer from this issue. The original issue is here: https://github.com/ged/ruby-pg/issues/404
--
Kind Regards
Lars Kanis
Attachments
Lars Kanis <lars@greiz-reinsdorf.de> writes:
> Explicit closing the socket before process termination leads to a graceful close even on Windows. That is done by the attached patch. I think delivering the correct error message to the user is much more important that closing the process in sync with the socket.

Per the comment immediately above this, it's intentional that we don't close the socket.  I'm not really convinced that this is an improvement.

Can we get anywhere by using shutdown(2) instead of close(), ie do a half-close?  I have no idea what Windows thinks the semantics of that are, but it might be worth trying.

			regards, tom lane
On Thu, Nov 18, 2021 at 10:13 AM Lars Kanis <lars@greiz-reinsdorf.de> wrote:
> Unfortunately each connection is closed hard by a Windows PostgreSQL server with TCP flag RST. That in turn is another Winsock API behavior, that is that every socket, that wasn't closed by the application is closed hard with the RST flag at process termination. I didn't find any official documentation about this behavior.

Interesting discovery.  I think you might get the same behaviour from a Unix system if you set SO_LINGER to 0 before you exit[1].  I suppose if a TCP implementation is partially in user space (I have no idea if this is true for Windows, I never use it, but I recall that Winsock was at some point a DLL) and can't handle the existence of any socket state after the process is gone, you might want to nuke everything and tell the peer immediately that you're doing so on exit?

I realise now that the experiments we did a while back to try to understand this across a few different operating systems[2] had missed this subtlety, because that Python script had an explicit close() call, whereas PostgreSQL exits.  It still revealed that the client isn't allowed to read any data after its write failed, which is a known source of error messages being eaten.  What I missed is that the client doesn't just get an RST and enter this no-you-can't-have-the-error-message-I-have-received state in response to data sent by the client (the usual way you expect to get RST), like in that test, but it also does so proactively when the server process exits, as you've explained (in other words, it's not necessary for the client to try to write to reach this error-eating state).

[1] https://stackoverflow.com/questions/3757289/when-is-tcp-option-so-linger-0-required
[2] https://www.postgresql.org/message-id/flat/20190306030706.GA3967%40f01898859afd.ant.amazon.com#32f9f16f9be8da5ee5c3b405d6d1829c
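The SO_LINGER behaviour Thomas mentions can be sketched with plain sockets (a hypothetical Python loopback demo, not PostgreSQL code): setting l_onoff=1, l_linger=0 makes close() reset the connection with RST, which is presumably what Winsock does implicitly at process exit. On Linux the bytes already buffered by the receiver are still delivered before ECONNRESET is reported; per this thread, Winsock discards them instead.

```python
import socket
import struct
import threading
import time

def server(listener):
    conn, _ = listener.accept()
    conn.sendall(b"last words")
    # l_onoff=1, l_linger=0: close() becomes abortive and sends RST
    conn.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER,
                    struct.pack("ii", 1, 0))
    conn.close()

listener = socket.create_server(("127.0.0.1", 0))
threading.Thread(target=server, args=(listener,)).start()

client = socket.create_connection(("127.0.0.1", listener.getsockname()[1]))
time.sleep(0.2)           # let both the data and the RST arrive first
data = client.recv(1024)  # on Linux the buffered data survives the RST
try:
    tail = client.recv(1024)
except ConnectionResetError:
    tail = b"<reset>"     # the reset is reported only after the data
print(data, tail)
```

Run on Windows, the thread's findings suggest the first recv() would already fail with WSAECONNRESET, losing "last words" entirely.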
Thomas Munro <thomas.munro@gmail.com> writes:
> Interesting discovery.  I think you might get the same behaviour from a Unix system if you set SO_LINGER to 0 before you exit[1].  I suppose if a TCP implementation is partially in user space (I have no idea if this is true for Windows, I never use it, but I recall that Winsock was at some point a DLL) and can't handle the existence of any socket state after the process is gone, you might want to nuke everything and tell the peer immediately that you're doing so on exit?

It's definitely plausible that Windows does this because it can't handle retransmits once the sender's state is gone.  However, it seems to me that any such state would be tied to the open socket, not to the sender process as such.  Which would suggest that an early close() as Lars suggests would make things worse not better.

This is all just speculation unfortunately.  (Man, I hate dealing with closed-source software.)

			regards, tom lane
Thomas Munro <thomas.munro@gmail.com> writes:
> I realise now that the experiments we did a while back to try to understand this across a few different operating systems[2] had missed this subtlety, because that Python script had an explicit close() call, whereas PostgreSQL exits.  It still revealed that the client isn't allowed to read any data after its write failed, which is a known source of error messages being eaten.

Yeah.  After re-reading that thread, I'm a bit confused about how to square the results we got then with Lars' report.  The Windows documentation he pointed to does claim that the default behavior if you issue closesocket() is to do a "graceful close in the background", which one would think means allowing sent data to be received.  That's not what we saw.  It's possible that we would get different results if we re-tested with a scenario where the client doesn't attempt to send data after the server-side close; but I'm not sure how much it's worth to improve that case if the other case still fails hard.

In any case, our previous results definitely show that issuing an explicit close() is no panacea.

			regards, tom lane
On 18.11.21 03:04, Tom Lane wrote:
> Thomas Munro <thomas.munro@gmail.com> writes:
>> I realise now that the experiments we did a while back to try to understand this across a few different operating systems[2] had missed this subtlety, because that Python script had an explicit close() call, whereas PostgreSQL exits.  It still revealed that the client isn't allowed to read any data after its write failed, which is a known source of error messages being eaten.
> Yeah.  After re-reading that thread, I'm a bit confused about how to square the results we got then with Lars' report.  The Windows documentation he pointed to does claim that the default behavior if you issue closesocket() is to do a "graceful close in the background", which one would think means allowing sent data to be received.  That's not what we saw.  It's possible that we would get different results if we re-tested with a scenario where the client doesn't attempt to send data after the server-side close; but I'm not sure how much it's worth to improve that case if the other case still fails hard.

From my experimentation the Winsock implementation has the two issues I explained. First, it drops all received but not yet retrieved data as soon as it receives a RST packet. Second, at process termination it always sends a RST packet on every socket that wasn't send-closed, regardless of whether there is any pending data.

Sending data to a socket that was already closed from the other side is only one way to trigger a RST packet; closing a socket with l_linger=0 is another, and process termination is the third. All of them can lead to data loss on the receiver side, presumably because of the RST flag.

An alternative to closesocket() is shutdown(sock, SD_SEND). It doesn't free the socket resource, but it leads to a graceful shutdown. However, the FIN packet is sent when the shutdown() or closesocket() function is called, and that's still shortly before the process terminates.

I did some more testing with different linger options, but it didn't change the behavior substantially. So I didn't find any way to close the socket with a FIN packet at the point in time of the process termination.

The other way around would be to make sure on the client side that the last message is retrieved before the RST packet arrives, so that no data is lost. This works mostly well through the sync API of libpq, but with the async API the trigger for data reception is outside of the scope of libpq, so there's no way to ensure recv() is called quickly enough, after the data was received but before the RST arrives. On a local client+server combination there is only a gap of 0.5 milliseconds or so. I also didn't find a way to retrieve the enqueued data after the RST arrived. Maybe there's a nasty hack to retrieve the data afterwards, but I didn't dig into assembly code and memory layout of Winsock internals.

> In any case, our previous results definitely show that issuing an explicit close() is no panacea.

I don't fully understand the issue with closing the socket before process termination. Sure, it can be valuable information that the corresponding backend process has definitely terminated, at least in the context of regression testing or so. But I think that losing messages from the backend is way more critical than a non-sync process termination. Do I miss something?

--
Regards,
Lars Kanis
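The half-close Lars mentions can also be sketched with plain sockets (a hypothetical Python loopback demo; SD_SEND corresponds to SHUT_WR on POSIX): shutdown() sends FIN for the outgoing direction only, so data queued beforehand is delivered gracefully and the socket can still receive afterwards.

```python
import socket
import threading

results = []

def server(listener):
    conn, _ = listener.accept()
    conn.sendall(b"goodbye")
    conn.shutdown(socket.SHUT_WR)  # half-close: FIN now, socket kept open
    results.append(conn.recv(1024))  # receiving still works after SHUT_WR
    conn.close()

listener = socket.create_server(("127.0.0.1", 0))
t = threading.Thread(target=server, args=(listener,))
t.start()

client = socket.create_connection(("127.0.0.1", listener.getsockname()[1]))
msg = client.recv(1024)   # data sent before the FIN arrives intact
eof = client.recv(1024)   # EOF (b"") marks the server's half-close
client.sendall(b"ack")    # the client->server direction is still open
client.close()
t.join()
print(msg, eof, results)
```

As Lars notes, though, this only moves the FIN to slightly before process exit; it doesn't change the fundamental timing.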
On Mon, Nov 22, 2021 at 8:19 AM Lars Kanis <lars@greiz-reinsdorf.de> wrote:
> The other way around would be to make sure on the client side, that the last message is retrieved before the RST packet arrives, so that no data is lost. This works mostly well through the sync API of libpq, but with the async API the trigger for data reception is outside of the scope of libpq, so that there's no way to ensure recv() is called quick enough, after the data was received but before RST arrives. On a local client+server combination there is only a gap of 0.5 milliseconds or so. I also didn't find a way to retrieve the enqueued data after RST arrived. Maybe there's a nasty hack to retrieve the data afterwards, but I didn't dig into assembly code and memory layout of Winsock internals.

Hmm.  Well, if I understand how this works (and I'm not too familiar with this Windows code so maybe I don't), the postmaster duplicates the socket into the child process (see {write,read}_inheritable_socket()) and then closes its own handle (see ServerLoop()'s call to StreamClose(port->sock)).  What if the postmaster kept the socket open, and then closed its copy after the child exits?  Then, I guess, maybe, Winsock socket state would live on with a non-zero reference count and be able to perform the proper graceful TCP shutdown dance, at least as long as the postmaster itself is up.  Various other ideas: don't do that, but duplicate the socket back into the postmaster before exit, or into some other process, or rewrite PostgreSQL to use threads...
On Mon, Nov 22, 2021 at 9:24 AM Thomas Munro <thomas.munro@gmail.com> wrote:
> Hmm.  Well, if I understand how this works (and I'm not too familiar with this Windows code so maybe I don't), the postmaster duplicates the socket into the child process (see {write,read}_inheritable_socket()) and then closes its own handle (see ServerLoop()'s call to StreamClose(port->sock)).  What if the postmaster kept the socket open, and then closed its copy after the child exits?  Then, I guess, maybe, Winsock socket state would live on with a non-zero reference count and be able to perform the proper graceful TCP shutdown dance, at least as long as the postmaster itself is up.  Various other ideas: don't do that, but duplicate the socket back into the postmaster before exit, or into some other process, or rewrite PostgreSQL to use threads...

Hmm, maybe it's still not enough.  Now that I have coffee, I thought about the well known failure of idle_in_transaction_timeout to report errors on Windows[1].  There'd be no RST on timeout with the above approach, which is good, but the next time you try to send a query, perhaps a race begins: the server's TCP stack receives the query packet and replies with RST (the "normal" kind that is a response to unreceivable data, not the linger=0 kind that is proactively sent), meanwhile the client begins to read, and *probably* reads the already buffered idle-in-transaction-timeout error message, but with unlucky scheduling the RST arrives first and drops the buffered data (unlike on Unix), right?

[1] https://www.postgresql.org/message-id/CAP3o3PdzM0BLmNBELA5wV6YoN_1yYBVdoOvz9kYbOuK-YQGFAw%40mail.gmail.com
Thomas Munro <thomas.munro@gmail.com> writes:
> Hmm.  Well, if I understand how this works (and I'm not too familiar with this Windows code so maybe I don't), the postmaster duplicates the socket into the child process (see {write,read}_inheritable_socket()) and then closes its own handle (see ServerLoop()'s call to StreamClose(port->sock)).  What if the postmaster kept the socket open, and then closed its copy after the child exits?

Ugh :-(.  For starters, we risk running out of FDs in the postmaster, don't we?

I did some tracing just now and convinced myself that socket_close is the first on_proc_exit callback registered in an ordinary backend, and therefore the last action done by proc_exit_prepare.  The only things that happen after that are PROFILE_PID_DIR setup (not relevant in production builds), an elog(DEBUG) call, and any atexit callbacks that third-party code might have registered.  If you're willing to avert your eyes from the question of what atexit callbacks might do, then it'd be okay to do closesocket in socket_close, reasoning that the backend has certainly disconnected itself from shmem and so on, and thus is effectively done even if it is still a live process so far as the kernel is concerned.

So maybe Lars' proposed patch is acceptable after all.  It feels a bit shaky, but when we're sitting atop a piece-of-junk TCP stack, we can't really have the guarantees we'd like.  The main way in which it's shaky is that future rearrangements of the shutdown sequence, or additions of new on_proc_exit callbacks, could create a situation where socket_close is no longer the last interesting action.  We could imagine doing something to make it less likely for that to happen accidentally, but I'm not sure it's worth the trouble.

Essentially this is reverting 268313a95 of 2003-05-29.
The commit message for that fails to cite any mailing-list discussion, but after some digging in the archives I think I did it in response to

https://www.postgresql.org/message-id/flat/009c01c31ce9%24eeaf00f0%24fb02a8c0%40muskrat

where the complaint was that a DB couldn't be dropped because a just-closed connection was still live so far as the server was concerned.  We didn't do anything to make PQclose() synchronous, so the problem is really still there; but the idea was that other client libraries could make session-close synchronous if they wanted.  For that purpose, being out of the ProcArray is really sufficient, and I think it's safe to suppose that socket_close must run after that.

			regards, tom lane
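Tom's ordering argument rests on proc_exit running its on_proc_exit callbacks in reverse registration order, so the first-registered callback (socket_close in an ordinary backend) fires last. A minimal model of that LIFO discipline (a sketch only; apart from socket_close, the callback names here are made up for illustration):

```python
# Model of PostgreSQL's on_proc_exit/proc_exit_prepare ordering:
# callbacks run in reverse registration order (LIFO), so a callback
# registered first runs last during shutdown.
on_proc_exit_list = []

def on_proc_exit(callback):
    on_proc_exit_list.append(callback)

def proc_exit_prepare(trace):
    # walk the registration list backwards, like proc_exit does
    for callback in reversed(on_proc_exit_list):
        callback(trace)

trace = []
on_proc_exit(lambda t: t.append("socket_close"))     # registered first
on_proc_exit(lambda t: t.append("shmem_exit"))       # hypothetical name
on_proc_exit(lambda t: t.append("cancel_timeouts"))  # hypothetical name
proc_exit_prepare(trace)
print(trace)
```

Because socket_close lands at the end of the trace, closing the socket there happens after the backend has already detached from shared memory, which is the property the close-only patch relies on.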
Thomas Munro <thomas.munro@gmail.com> writes:
> Hmm, maybe it's still not enough.  Now that I have coffee, I thought about the well known failure of idle_in_transaction_timeout to report errors on Windows[1].

Yeah, I think that may well be a manifestation of the same problem: once the backend exits, Winsock issues RST which prevents the client from reading the queued data.  We had been analyzing that under the assumption that Windows obeys the TCP RFCs ... but having now been disabused of that optimism, it seems to match up pretty well.

It'd be useful to check if Lars' patch cures that symptom.

			regards, tom lane
On Mon, Nov 22, 2021 at 10:42 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Thomas Munro <thomas.munro@gmail.com> writes:
>> Hmm, maybe it's still not enough.  Now that I have coffee, I thought about the well known failure of idle_in_transaction_timeout to report errors on Windows[1].
>
> Yeah, I think that may well be a manifestation of the same problem: once the backend exits, Winsock issues RST which prevents the client from reading the queued data.  We had been analyzing that under the assumption that Windows obeys the TCP RFCs ... but having now been disabused of that optimism, it seems to match up pretty well.  It'd be useful to check if Lars' patch cures that symptom.

Yeah, it sounds like it might solve at least the server-side problem.  Let's call that weird behaviour #1: RST on process exit.  (I wonder if my keep-the-socket-open-in-another-process thought experiment is theoretically better: a lingering socket should be capable of resending data that hasn't been ack'd yet in FIN-WAIT-1 state after close, which I suspect might not happen if the TCP stack nukes the socket.  If close() avoids the proactive RST but still doesn't really follow the shutdown protocol then it's papering over a crack in the wall, but I'm not planning to argue about that...)

IIUC we'd still have weird behaviour #2 on the client side: TCP stack drops buffered received data on the floor on receipt of RST.  So yeah, it'd be interesting to know if by avoiding/hiding weird behaviour #1, idle_in_transaction_timeout works as desired most of the time by tilting the race in favour of eager clients and favourable scheduling.  If a client sends a new query and then immediately begins to read the response, there's a good chance it'll be able to read the already-buffered error message before the query->RST ping pong...

Which I now understand is exactly what Lars was explaining: that sync APIs (like the psql command shown in that other thread) might have a good chance of winning that race, but for async APIs, the author of the async API has no idea what its client is going to do.
Thomas Munro <thomas.munro@gmail.com> writes:
> On Mon, Nov 22, 2021 at 10:42 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> It'd be useful to check if Lars' patch cures that symptom.
> Yeah, it sounds like it might solve at least the server-side problem.  Let's call that weird behaviour #1: RST on process exit.  (I wonder if my keep-the-socket-open-in-another-process thought experiment is theoretically better: a lingering socket should be capable of resending data that hasn't been ack'd yet in FIN-WAIT-1 state after close, which I suspect might not happen if the TCP stack nukes the socket.  If close() avoids the proactive RST but still doesn't really follow the shutdown protocol then it's papering over a crack in the wall, but I'm not planning to argue about that...)

The language about "graceful shutdown" in the Windows docs at least suggests that they finish out the TCP connection cleanly; failing to retransmit at need would hardly qualify as "graceful".  Of course, Redmond keeps finding ways to fail to meet reasonable expectations.

> IIUC we'd still have weird behaviour #2 on the client side: TCP stack drops buffered received data on the floor on receipt of RST.

Do we know that that actually happens in an arm's-length connection (ie two separate machines)?  I wonder if the data loss is strictly an artifact of a localhost connection.  There'd be a lot more pressure on them to make cross-machine TCP work per spec, one would think.  But in any case, if we can avoid sending RST in this situation, it seems mostly moot for our usage.

			regards, tom lane
On 22.11.21 00:04, Tom Lane wrote:
> Do we know that that actually happens in an arm's-length connection (ie two separate machines)?  I wonder if the data loss is strictly an artifact of a localhost connection.  There'd be a lot more pressure on them to make cross-machine TCP work per spec, one would think.  But in any case, if we can avoid sending RST in this situation, it seems mostly moot for our usage.

Sorry it took some days to get a setup to check this! The result is as expected:

1. Windows client to Linux server works without dropping the error message
2. Linux client to Windows server works without dropping the error message
3. Windows client to remote Windows server drops the error message, depending on the timing of the event loop

In 1. the Linux server doesn't end the connection with a RST packet, so the Windows client enqueues the error message properly and doesn't drop it. In 2. the Linux client doesn't care about the RST packet of the Windows server and properly enqueues and raises the error message. In 3. the combination of the bad RST behavior of client and server leads to data loss. It depends on the network timing: a delay of 0.5 ms in the event loop was enough in a localhost setup as well as in some LAN setup. On the contrary, over a slower WLAN connection a delay of less than 15 ms did not lose data, but higher delays still did.

The idea of running a second process, passing the socket handle to it, observing the parent process and closing the socket when it exits, could work, but I guess it's overly complicated and creates more issues than it solves. Probably the same if the master process handles the socket closing.

So I still think it's best to close the socket as proposed in the patch.

--
Regards,
Lars Kanis
Hello Lars,

27.11.2021 14:39, Lars Kanis wrote:
> So I still think it's best to close the socket as proposed in the patch.

Please see also the previous discussion of the topic:
https://www.postgresql.org/message-id/flat/16678-253e48d34dc0c376%40postgresql.org

Best regards,
Alexander
Alexander Lakhin <exclusion@gmail.com> writes:
> 27.11.2021 14:39, Lars Kanis wrote:
>> So I still think it's best to close the socket as proposed in the patch.
> Please see also the previous discussion of the topic:
> https://www.postgresql.org/message-id/flat/16678-253e48d34dc0c376%40postgresql.org

Hm, yeah, that discussion seems to have slipped through the cracks.  Not sure why it didn't end up in pushing something.

After re-reading that thread and re-studying relevant Windows documentation [1][2], I think the main open question is whether we need to issue shutdown() or not, and if so, whether to use SD_BOTH or just SD_SEND.  I'm inclined to prefer not calling shutdown(), because [1] is self-contradictory as to whether it can block, and [2] is pretty explicit that it's not necessary.

			regards, tom lane

[1] https://docs.microsoft.com/en-us/windows/win32/api/winsock/nf-winsock-shutdown
[2] https://docs.microsoft.com/en-us/windows/win32/winsock/graceful-shutdown-linger-options-and-socket-closure-2
Hello Tom,

29.11.2021 22:16, Tom Lane wrote:
> Hm, yeah, that discussion seems to have slipped through the cracks.  Not sure why it didn't end up in pushing something.
>
> After re-reading that thread and re-studying relevant Windows documentation [1][2], I think the main open question is whether we need to issue shutdown() or not, and if so, whether to use SD_BOTH or just SD_SEND.  I'm inclined to prefer not calling shutdown(), because [1] is self-contradictory as to whether it can block, and [2] is pretty explicit that it's not necessary.
>
> [1] https://docs.microsoft.com/en-us/windows/win32/api/winsock/nf-winsock-shutdown
> [2] https://docs.microsoft.com/en-us/windows/win32/winsock/graceful-shutdown-linger-options-and-socket-closure-2

I've tested the close-only patch with pg_sleep() in pqReadData(), and it works too. So I wonder how to understand "To assure that all data is sent and received on a connected socket before it is closed, an application should use shutdown to close connection before calling closesocket." in [1]. Maybe they mean that shutdown should be used before, but not after, closesocket. Or maybe Windows' behaviour somehow evolved over time. (With the patch I cannot reproduce the FATAL message loss even on Windows 2012 R2.) So without practical evidence of the importance of shutdown(), I'm inclined to the simpler solution too.

As to 268313a95, back in 2003 it was possible to compile the server on Windows only using Cygwin (though you could compile libpq with Visual C, see [3]). So the "#ifdef WIN32" that is proposed now will not affect that scenario anyway.

Best regards,
Alexander

[3] https://git.postgresql.org/gitweb/?p=postgresql.git;a=blob_plain;f=doc/src/sgml/install-win32.sgml;hb=268313a95
Alexander Lakhin <exclusion@gmail.com> writes:
> 29.11.2021 22:16, Tom Lane wrote:
>> After re-reading that thread and re-studying relevant Windows documentation [1][2], I think the main open question is whether we need to issue shutdown() or not, and if so, whether to use SD_BOTH or just SD_SEND.  I'm inclined to prefer not calling shutdown(), because [1] is self-contradictory as to whether it can block, and [2] is pretty explicit that it's not necessary.

> I've tested the close-only patch with pg_sleep() in pqReadData(), and it works too.

Thanks for testing!

> So I wonder how to understand "To assure that all data is sent and received on a connected socket before it is closed, an application should use shutdown to close connection before calling closesocket." in [1].

I suppose their documentation has evolved over time.  This sentence probably predates their explicit acknowledgement in [2] that you don't have to call shutdown().  Maybe, once upon a time with very old versions of Winsock, you did have to do so if you wanted graceful close.

I'll push the close-only change in a little bit.

			regards, tom lane
On 02.12.2021 22:31, Tom Lane wrote:
> I'll push the close-only change in a little bit.

Unexpectedly, this changes the error message:

postgres=# set idle_session_timeout = '1s';
SET
postgres=# select 1;
could not receive data from server: Software caused connection abort (0x00002745/10053)
The connection to the server was lost. Succeeded.
postgres=#

Without shutdown/closesocket it would most likely be:

server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.

When the timeout expires, the server sends the error message and gracefully closes the connection by sending a FIN. Later, psql sends another query to the server, and the server responds with a RST. But now recv() returns WSAECONNABORTED(10053) instead of WSAECONNRESET(10054).

Without shutdown/closesocket, after the timeout expires, the server sends the error message, the client sends an ACK, and the server responds with a RST. Then psql tries to send the next query, but nothing is sent at the TCP level, and the next recv() returns WSAECONNRESET.

IIUIC, in both cases we may or may not recv() the error message from the server depending on how fast the RST arrives from the server.

Should we handle ECONNABORTED similarly to ECONNRESET in pqsecure_raw_read?

--
Sergey Shinderuk
https://postgrespro.com/
On 14.01.2022 13:01, Sergey Shinderuk wrote:
> When the timeout expires, the server sends the error message and gracefully closes the connection by sending a FIN. Later, psql sends another query to the server, and the server responds with a RST. But now recv() returns WSAECONNABORTED(10053) instead of WSAECONNRESET(10054).

On the other hand, I cannot reproduce this behavior with a remote server even if I pause psql just before the recv() call to let the RST win the race. So I get:

postgres=# set idle_session_timeout = '1s';
recv() returned 15 errno 0
SET
recv() returned -1 errno 10035 (WSAEWOULDBLOCK)
postgres=# select 1;
recv() returned 116 errno 0
recv() returned 0 errno 0
recv() returned 0 errno 0
FATAL: terminating connection due to idle-session timeout
server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.

recv() signals EOF like on Unix. Here I connected from a Windows virtual machine to the macOS host, but the Wireshark dump looks the same (there is a RST) as for a localhost connection.

Is this "error-eating" behavior of RST on Windows specific only to localhost connections?

--
Sergey Shinderuk
https://postgrespro.com/
On 14.01.2022 13:01, Sergey Shinderuk wrote:
> Unexpectedly, this changes the error message:
>
> postgres=# set idle_session_timeout = '1s';
> SET
> postgres=# select 1;
> could not receive data from server: Software caused connection abort (0x00002745/10053)

For the record, after more poking I realized that it depends on timing. By injecting delays I can get any of the following from libpq:

* could not receive data from server: Software caused connection abort
* server closed the connection unexpectedly
* no connection to the server

> Should we handle ECONNABORTED similarly to ECONNRESET in pqsecure_raw_read?

So this doesn't make sense anymore. Sorry for the noise.

--
Sergey Shinderuk
https://postgrespro.com/
Sergey Shinderuk <s.shinderuk@postgrespro.ru> writes:
> On 14.01.2022 13:01, Sergey Shinderuk wrote:
>> Unexpectedly, this changes the error message:
> ...
> For the record, after more poking I realized that it depends on timing.  By injecting delays I can get any of the following from libpq:
> * could not receive data from server: Software caused connection abort
> * server closed the connection unexpectedly
> * no connection to the server

Thanks for the follow-up.  At the moment I'm not planning to do anything pending the results of the other thread [1].  It seems likely though that we'll end up reverting this explicit-close behavior in the back branches, as the other changes involved look too invasive for back-patching.

			regards, tom lane

[1] https://www.postgresql.org/message-id/flat/CA%2BhUKG%2BOeoETZQ%3DQw5Ub5h3tmwQhBmDA%3DnuNO3KG%3DzWfUypFAw%40mail.gmail.com