Обсуждение: closesocket behavior in different platforms

Поиск
Список
Период
Сортировка

closesocket behavior in different platforms

От
vignesh C
Дата:
Hi,

In few scenarios the message displayed in psql console is not consistent in windows and linux. The execution results from few scenarios in windows and linux is listed below:

In CentOS
========================================================
After transaction idle timeout
postgres=# SET idle_in_transaction_session_timeout=300;
SET
postgres=# BEGIN;
BEGIN
postgres=# SELECT * FROM pg_class;
FATAL:  terminating connection due to idle-in-transaction timeout
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Succeeded.

After pg_terminate_backend from another session:
postgres=# select * from dual;
FATAL:  terminating connection due to administrator command
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Succeeded.

Similarly in pg_ctl kill TERM and drop database with (force).

In Windows
========================================================
After transaction idle timeout
postgres=# set idle_in_transaction_session_timeout=300;
SET
postgres=# begin;
BEGIN
postgres=# select * from dual
postgres-# ;
server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.
The connection to the server was lost. Attempting reset: Succeeded.

After pg_terminate_backend from another session:
postgres=# select * from dual;
server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.
The connection to the server was lost. Attempting reset: Succeeded.

Similarly in pg_ctl kill TERM and drop database with (force).
There may be some more scenarios which I'm missing.

It is noticed that in all the 4 cases the message "FATAL:  terminating connection due to administrator command" does not appear in windows.

However the following message is present in the server log file:
FATAL:  terminating connection due to administrator command

The reason for this looks like:
When the server closes a connection, it sends the ErrorResponse packet, and then closes the socket and terminates the backend process. If the packet is received before the server closes the connection, the error message is received in both windows and linux. If the packet is not received before the server closes the connection, the error message is not received in case of windows where as in linux it is received.

There have been a couple of discussion earlier also on this [1] & [2], but we could not find any alternate solution.

One of the options that msdn suggests in [3] is to use SO_LINGER option, we had tried this option with no luck in solving. One other thing that we had tried was to sleep for 1 second before closing the socket, this solution works if the client is active, whereas in case of inactive clients it does not solves the problem. One other thought that we had was to simultaneously check the connection from psql, when we are waiting for query input in gets_interactive function or have a separate thread to check the connection status periodically, this might work only in case of psql but will not work for application which uses libpq. Amit had also suggested one solution in [4], where he proposed 'I have also tried calling closesocket() explicitly in our function socket_close which has changed the error message to "could not receive data from server: Software caused connection abort (0x00002745/10053)".'
Or
Should we add some documentation for the above behavior.

Thoughts?

[1] https://www.postgresql.org/message-id/flat/CA%2BhUKGJowQypXSKsjws9A%2BnEQDD0-mExHZqFXtJ09N209rCO5A%40mail.gmail.com#0629f079bc59ecdaa0d6ac9f8f2c18ac
[2] https://www.postgresql.org/message-id/87k1iy44fd.fsf@news-spur.riddles.org.uk
[3] https://docs.microsoft.com/en-us/windows/win32/api/winsock/nf-winsock-closesocket
[4] https://www.postgresql.org/message-id/CAA4eK1%2BGNyjaPK77y%2Beuh5eAgM75pncG1JYZhxYZF%2BSgS6NpjA%40mail.gmail.com

Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com

Re: closesocket behavior in different platforms

От
Amit Kapila
Дата:
On Fri, Dec 6, 2019 at 11:24 AM vignesh C <vignesh21@gmail.com> wrote:
>
> It is noticed that in all the 4 cases the message "FATAL:  terminating connection due to administrator command" does
notappear in windows. 
>
> However the following message is present in the server log file:
> FATAL:  terminating connection due to administrator command
>
> The reason for this looks like:
> When the server closes a connection, it sends the ErrorResponse packet, and then closes the socket and terminates the
backendprocess. If the packet is received before the server closes the connection, the error message is received in
bothwindows and linux. If the packet is not received before the server closes the connection, the error message is not
receivedin case of windows where as in linux it is received. 
>
> There have been a couple of discussion earlier also on this [1] & [2], but we could not find any alternate solution.
>
> One of the options that msdn suggests in [3] is to use SO_LINGER option, we had tried this option with no luck in
solving.One other thing that we had tried was to sleep for 1 second before closing the socket, this solution works if
theclient is active, whereas in case of inactive clients it does not solves the problem. One other thought that we had
wasto simultaneously check the connection from psql, when we are waiting for query input in gets_interactive function
orhave a separate thread to check the connection status periodically, this might work only in case of psql but will not
workfor application which uses libpq. Amit had also suggested one solution in [4], where he proposed 'I have also tried
callingclosesocket() explicitly in our function socket_close which has changed the error message to "could not receive
datafrom server: Software caused connection abort (0x00002745/10053)".' 
>

Based on previous investigation and information in this email, I don't
see anything we can do about this.

> Should we add some documentation for the above behavior.
>

That sounds reasonable to me.  Any proposal for the same?   One idea
could be to add something like "Client Disconnection Problems" after
the "Client Connection Problems" section in docs [1].

Anybody else has any better suggestions on this topic?


[1] - https://www.postgresql.org/docs/devel/server-start.html

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: closesocket behavior in different platforms

От
vignesh C
Дата:
On Tue, Jan 21, 2020 at 11:22 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Fri, Dec 6, 2019 at 11:24 AM vignesh C <vignesh21@gmail.com> wrote:
> >
> > It is noticed that in all the 4 cases the message "FATAL:  terminating connection due to administrator command"
doesnot appear in windows. 
> >
> > However the following message is present in the server log file:
> > FATAL:  terminating connection due to administrator command
> >
> > The reason for this looks like:
> > When the server closes a connection, it sends the ErrorResponse packet, and then closes the socket and terminates
thebackend process. If the packet is received before the server closes the connection, the error message is received in
bothwindows and linux. If the packet is not received before the server closes the connection, the error message is not
receivedin case of windows where as in linux it is received. 
> >
> > There have been a couple of discussion earlier also on this [1] & [2], but we could not find any alternate
solution.
> >
> > One of the options that msdn suggests in [3] is to use SO_LINGER option, we had tried this option with no luck in
solving.One other thing that we had tried was to sleep for 1 second before closing the socket, this solution works if
theclient is active, whereas in case of inactive clients it does not solves the problem. One other thought that we had
wasto simultaneously check the connection from psql, when we are waiting for query input in gets_interactive function
orhave a separate thread to check the connection status periodically, this might work only in case of psql but will not
workfor application which uses libpq. Amit had also suggested one solution in [4], where he proposed 'I have also tried
callingclosesocket() explicitly in our function socket_close which has changed the error message to "could not receive
datafrom server: Software caused connection abort (0x00002745/10053)".' 
> >
>
> Based on previous investigation and information in this email, I don't
> see anything we can do about this.
>
> > Should we add some documentation for the above behavior.
> >
>
> That sounds reasonable to me.  Any proposal for the same?   One idea
> could be to add something like "Client Disconnection Problems" after
> the "Client Connection Problems" section in docs [1].
>

Thanks for your review and suggestion. I have made a patch based on
similar lines. Attached patch has the doc update with the explanation.
Thoughts?

Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com

Вложения

Re: closesocket behavior in different platforms

От
amul sul
Дата:



On Wed, Jan 29, 2020 at 4:34 PM vignesh C <vignesh21@gmail.com> wrote:
On Tue, Jan 21, 2020 at 11:22 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Fri, Dec 6, 2019 at 11:24 AM vignesh C <vignesh21@gmail.com> wrote:
> >
[...] 

Thanks for your review and suggestion. I have made a patch based on
similar lines. Attached patch has the doc update with the explanation.
Thoughts?
 
Hi Vignesh,

I have looked into the patch, realised that some format tagging and
the grammar changes are needed. Commented inline below:

+
+    <para>
+     You will get server closed the connection unexpectedly message while

The error message is usually wrapped inside the <computeroutput> tag.

+     trying to execute sql command on disconnected connection. The message is

Also, s/disconnected connection/the disconnected connection

+     slightly different in windows and non-windows. In non-windows, you will

s/in windows/on <systemitem class="osname">Windows</systemitem>
s/In non-windows/On non-windows

+     see a FATAL message before the error message:

How about : On non-window you'll see a fatal error as below.

+<screen>
+FATAL:  terminating connection due to idle-in-transaction timeout
+server closed the connection unexpectedly
+        This probably means the server terminated abnormally
+        before or while processing the request.
+The connection to the server was lost. Attempting reset: Succeeded.
+</screen>
+     In windows, you might not see the FATAL message:

s/In windows /On <systemitem class="osname">Windows</systemitem>
s/FATAL message/fatal error

+<screen>
+server closed the connection unexpectedly
+        This probably means the server terminated abnormally
+        before or while processing the request.
+The connection to the server was lost. Attempting reset: Succeeded.
+</screen>
+     This message "FATAL:  terminating connection due to idle-in-transaction

Usually <quote> for doubt-quoting is used. Here think, we should remove FATAL
and wrap the error message text inside <computeroutput> tag.

+     timeout" that is sent from server will not be displayed in windows,

How about : that is sent from the server will not be displayed on windows.

+     however it will be present in the log file. The reason for this is, in

s/however/However
s/in/on

+     windows the client cannot receive the message sent by the server when the

s/windows/Windows or <systemitem class="osname">Windows</systemitem>

+     server has closed the client connection. This behavior can be noticed when
+     the client connection has been disconnected because of
+     idle_in_transaction_session_timeout, pg_terminate_backend, pg_ctl kill
+     TERM and drop database with (force).

s/idle_in_transaction_session_timeout/<xref linkend="guc-idle-in-transaction-session-timeout"/>
s/pg_terminate_backend/<function>pg_terminate_backend()</function>
s/pg_ctl kill TERM/<command>DROP DATABASE ... WITH ( FORCE )</command>

Regards,
Amul 

Re: closesocket behavior in different platforms

От
Robert Haas
Дата:
On Wed, Jan 29, 2020 at 6:04 AM vignesh C <vignesh21@gmail.com> wrote:
> Thanks for your review and suggestion. I have made a patch based on
> similar lines. Attached patch has the doc update with the explanation.
> Thoughts?

Documenting this doesn't seem very useful to me. If we could fix the
code, that would be useful, but otherwise I think I'd just do nothing.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: closesocket behavior in different platforms

От
Amit Kapila
Дата:
On Wed, Jan 29, 2020 at 8:29 PM Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Wed, Jan 29, 2020 at 6:04 AM vignesh C <vignesh21@gmail.com> wrote:
> > Thanks for your review and suggestion. I have made a patch based on
> > similar lines. Attached patch has the doc update with the explanation.
> > Thoughts?
>
> Documenting this doesn't seem very useful to me.
>

I thought of documenting it because this has been reported/discussed
multiple times (see some of the links of discussions at the end of the
first email) and every time we need to spend time explaining the same
thing.  However, if we decide not to do that I am fine with it.

> If we could fix the
> code, that would be useful, but otherwise I think I'd just do nothing.
>

Yeah, that is our first choice as well, but there doesn't seem to be a
good solution to it as this is a platform-specific behavior.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: closesocket behavior in different platforms

От
Amit Kapila
Дата:
On Thu, Jan 30, 2020 at 9:04 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Wed, Jan 29, 2020 at 8:29 PM Robert Haas <robertmhaas@gmail.com> wrote:
> >
> > On Wed, Jan 29, 2020 at 6:04 AM vignesh C <vignesh21@gmail.com> wrote:
> > > Thanks for your review and suggestion. I have made a patch based on
> > > similar lines. Attached patch has the doc update with the explanation.
> > > Thoughts?
> >
> > Documenting this doesn't seem very useful to me.
> >
>
> I thought of documenting it because this has been reported/discussed
> multiple times (see some of the links of discussions at the end of the
> first email) and every time we need to spend time explaining the same
> thing.  However, if we decide not to do that I am fine with it.
>

Does anybody else have any opinion on whether it makes sense to
document this behavior?  To summarize for others, the behavior
difference as noted by Vignesh in his patch is:

+
+    <para>
+     You will get server closed the connection unexpectedly message while
+     trying to execute sql command on disconnected connection. The message is
+     slightly different in windows and non-windows. In non-windows, you will
+     see a FATAL message before the error message:
+<screen>
+FATAL:  terminating connection due to idle-in-transaction timeout
+server closed the connection unexpectedly
+        This probably means the server terminated abnormally
+        before or while processing the request.
+The connection to the server was lost. Attempting reset: Succeeded.
+</screen>
+     In windows, you might not see the FATAL message:
+<screen>
+server closed the connection unexpectedly
+        This probably means the server terminated abnormally
+        before or while processing the request.
+The connection to the server was lost. Attempting reset: Succeeded.

We have spent a decent amount of time on this and it is due to windows
API behaving differently.  By documenting, we might avoid the future
effort of explaining this to users.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com