Thread: Keepalive

Keepalive

From: Rui DeSousa
Date:
Hi All,

I have a very long running query that is not being terminated after a keep alive timeout event. The situation is that
the client drops from the network, the server’s TCP/IP stack drops the connection, and the Postgres query continues to
run without a network connection.

The given system is running on Linux and I’m being told this is expected behavior; however, that has not been my
experience. My preferred platform to run Postgres on is FreeBSD, and in cases like this the Postgres session is also
terminated once the TCP/IP connection is dropped by the kernel.

Does anyone know whether Linux and FreeBSD differ in how they handle interrupted connections? I’ve actually used
tcpdrop on FreeBSD to terminate stubborn sessions that were not responding to pg_terminate_backend().

Is this really expected behavior on Linux?

-Rui.
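
A minimal sketch, not part of the original message, of how the situation described above might be inspected from another session. It shows the server-side TCP keepalive settings and lists backends that are still executing a query. The PID passed to pg_terminate_backend() is a hypothetical placeholder.

```sql
-- TCP keepalive settings for the current session; 0 means "use the operating system default".
SHOW tcp_keepalives_idle;
SHOW tcp_keepalives_interval;
SHOW tcp_keepalives_count;

-- List other backends that are still running a query, oldest first.
SELECT pid, client_addr, state, query_start, left(query, 60) AS query
FROM pg_stat_activity
WHERE state = 'active'
  AND pid <> pg_backend_pid()
ORDER BY query_start;

-- Ask a specific backend to terminate (12345 is a hypothetical PID taken from the list above).
SELECT pg_terminate_backend(12345);
```

A backend whose client has vanished can still appear as active here, because pg_stat_activity reports what the backend is doing, not the state of the socket in the kernel.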


Re: Keepalive

From: Tom Lane
Date:
Rui DeSousa <rui.desousa@icloud.com> writes:
> I have a very long running query that is not being terminated after a keep alive timeout event.
> The situation is that the client drops from the network, the server’s TCP/IP stack drops the
> connection, and the Postgres query continues to run without a network connection.
>
> The given system is running on Linux and I’m being told this is expected behavior; however,
> that has not been my experience. My preferred platform to run Postgres on is FreeBSD, and in
> cases like this the Postgres session is also terminated once the TCP/IP connection is dropped
> by the kernel.

Really?

I would expect the query to keep running until the backend tries to
perform some I/O to the client.  How quickly that happens would depend
a great deal on the details of the query, but not on which OS you're
running on.

            regards, tom lane
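
To illustrate the point about client I/O, here is a minimal sketch that is not from the thread. The hypothetical PL/pgSQL function noisy_loop() spins forever but periodically sends a NOTICE to the client. Each NOTICE is an attempt to write to the client socket, so a backend running it would be expected to notice a dead connection once the kernel starts reporting errors on that socket. A loop that never talks to the client gets no such opportunity. The function name and the reporting interval are assumptions chosen for illustration.

```sql
-- Hypothetical helper: a busy loop that performs client I/O from time to time.
create or replace function noisy_loop()
returns void
as $$
declare
  _x bigint := 0;
begin
  loop
    _x := _x + 1;
    -- Every 10 million iterations, send a message to the client.
    -- This is the only point at which the backend touches the client socket.
    if _x % 10000000 = 0 then
      raise notice 'still running: % iterations', _x;
    end if;
  end loop;
end;
$$ language plpgsql;

-- Usage: select noisy_loop();
```

How soon a failed write is reported still depends on the kernel's keepalive and retransmission timers, so this is a way to observe the behavior, not a guarantee of prompt termination.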



Re: Keepalive

From: Laurenz Albe
Date:
On Fri, 2024-06-14 at 11:22 -0400, Rui DeSousa wrote:
> I have a very long running query that is not being terminated after a keep alive timeout event.
> The situation is that the client drops from the network, the server’s TCP/IP stack drops the
> connection, and the Postgres query continues to run without a network connection.
>
> The given system is running on Linux and I’m being told this is expected behavior; however,
> that has not been my experience. My preferred platform to run Postgres on is FreeBSD, and in
> cases like this the Postgres session is also terminated once the TCP/IP connection is dropped
> by the kernel.

That would surprise me.

There is the parameter "client_connection_check_interval" exactly for that.

Yours,
Laurenz Albe
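
For reference, a minimal sketch of enabling the parameter mentioned above server-wide. It assumes PostgreSQL 14 or later (where client_connection_check_interval exists) and a superuser session. The 10-second value is an arbitrary example, not a recommendation from the thread.

```sql
-- The default of 0 disables the check; any positive value enables periodic polling
-- of the client socket while a query is running.
ALTER SYSTEM SET client_connection_check_interval = '10s';

-- Reload the configuration so that the new value takes effect without a restart.
SELECT pg_reload_conf();

-- Confirm the active value.
SHOW client_connection_check_interval;
```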



Re: Keepalive

From: Rui DeSousa
Date:


On Jun 14, 2024, at 11:28 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

> Rui DeSousa <rui.desousa@icloud.com> writes:
>> I have a very long running query that is not being terminated after a keep alive timeout event.
>> The situation is that the client drops from the network, the server’s TCP/IP stack drops the
>> connection, and the Postgres query continues to run without a network connection.
>>
>> The given system is running on Linux and I’m being told this is expected behavior; however,
>> that has not been my experience. My preferred platform to run Postgres on is FreeBSD, and in
>> cases like this the Postgres session is also terminated once the TCP/IP connection is dropped
>> by the kernel.
>
> Really?
>
> I would expect the query to keep running until the backend tries to
> perform some I/O to the client.  How quickly that happens would depend
> a great deal on the details of the query, but not on which OS you're
> running on.
>
> regards, tom lane


I just tried the following spinner() function on FreeBSD, and the keep alive timeout event caused both the network connection and the Postgres session to be torn down, as I would expect. I will try this exact function on the Linux system to see whether I get different results and report back; however, I might not be able to test it until next week.


create or replace function spinner()
returns void
as $$
declare
 _x bigint := 0;
begin
   loop
     _x := _x + 1;
   end loop;
end;
$$ language plpgsql
;



Re: Keepalive

From: Rui DeSousa
Date:

On Jun 14, 2024, at 3:54 PM, Rui DeSousa <rui.desousa@icloud.com> wrote:



> On Jun 14, 2024, at 11:28 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
>> Rui DeSousa <rui.desousa@icloud.com> writes:
>>> I have a very long running query that is not being terminated after a keep alive timeout event.
>>> The situation is that the client drops from the network, the server’s TCP/IP stack drops the
>>> connection, and the Postgres query continues to run without a network connection.
>>>
>>> The given system is running on Linux and I’m being told this is expected behavior; however,
>>> that has not been my experience. My preferred platform to run Postgres on is FreeBSD, and in
>>> cases like this the Postgres session is also terminated once the TCP/IP connection is dropped
>>> by the kernel.
>>
>> Really?
>>
>> I would expect the query to keep running until the backend tries to
>> perform some I/O to the client.  How quickly that happens would depend
>> a great deal on the details of the query, but not on which OS you're
>> running on.
>>
>> regards, tom lane
>
> I just tried the following spinner() function on FreeBSD, and the keep alive timeout event
> caused both the network connection and the Postgres session to be torn down, as I would
> expect. I will try this exact function on the Linux system to see whether I get different
> results and report back; however, I might not be able to test it until next week.
>
> create or replace function spinner()
> returns void
> as $$
> declare
>  _x bigint := 0;
> begin
>    loop
>      _x := _x + 1;
>    end loop;
> end;
> $$ language plpgsql
> ;

Actually, I just tested it on the Linux system. The keep alive event occurred, the kernel state of the connection went to CLOSE_WAIT, and then it was later completely removed from the kernel state; however, my spinner() function is still running with no network connection in the kernel table.

So, keep alive does behave differently between FreeBSD and Linux.  I really do prefer FreeBSD for many reasons.

-Rui.

Re: Keepalive

From: Rui DeSousa
Date:

> On Jun 14, 2024, at 1:47 PM, Laurenz Albe <laurenz.albe@cybertec.at> wrote:
>
> On Fri, 2024-06-14 at 11:22 -0400, Rui DeSousa wrote:
>> I have a very long running query that is not being terminated after a keep alive timeout event.
>> The situation is that the client drops from the network, the server’s TCP/IP stack drops the
>> connection, and the Postgres query continues to run without a network connection.
>>
>> The given system is running on Linux and I’m being told this is expected behavior; however,
>> that has not been my experience. My preferred platform to run Postgres on is FreeBSD, and in
>> cases like this the Postgres session is also terminated once the TCP/IP connection is dropped
>> by the kernel.
>
> That would surprise me.
>
> There is the parameter "client_connection_check_interval" exactly for that.
>
> Yours,
> Laurenz Albe


I retested the spinner() function on Linux with client_connection_check_interval set, and the spinner() function is
now terminated.

Thanks!
Rui.


Re: Keepalive

From: Tom Lane
Date:
Rui DeSousa <rui.desousa@icloud.com> writes:
> Actually, I just tested it on the Linux system. The keep alive event occurred, the kernel state
> of the connection went to CLOSE_WAIT, and then it was later completely removed from the kernel
> state; however, my spinner() function is still running with no network connection in the
> kernel table.

> So, keep alive does behave differently between FreeBSD and Linux.  I really do prefer FreeBSD for many reasons.

The behavior you report for Linux is what I'd expect anywhere.
I tried to replicate your results on a freshly-updated FreeBSD 14.1
installation, and could not.  With a purely stock Postgres
configuration, I see the "spinner" query running indefinitely after
the client is killed --- although the kernel does show the server
process's client connection being in CLOSE_WAIT state.  But if I set
client_connection_check_interval to a positive value then the query
kills itself at the next multiple of that time, again as expected.

So I think there is something non-default about your FreeBSD system.
Maybe you'd previously configured it with nonzero
client_connection_check_interval, and then forgot about that?

The alternative is to suppose that that kernel will kill processes
as soon as they have a connection in CLOSE_WAIT state, which would be
quite evil for many purposes and is certainly not a "preferable"
behavior.

            regards, tom lane
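
A minimal sketch of how the test described above might be reproduced at session level. It assumes the spinner() function from earlier in the thread has been created, that the server is PostgreSQL 14 or later, and that client_connection_check_interval can be changed with SET in an ordinary session. The 5-second value is an arbitrary choice.

```sql
-- Session 1: enable the client connection check for this session only,
-- then start the endless loop.
SET client_connection_check_interval = '5s';
SELECT spinner();

-- Now kill the client process (or drop it from the network).

-- Session 2: watch whether the backend that was running spinner() disappears.
SELECT pid, state, query_start, query
FROM pg_stat_activity
WHERE query LIKE '%spinner()%'
  AND pid <> pg_backend_pid();
```

With the parameter left at its default of 0, the second query would be expected to keep showing the backend as active, matching the stock-configuration behavior reported above.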



Re: Keepalive

From: Rui DeSousa
Date:


On Jun 15, 2024, at 7:25 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

> Rui DeSousa <rui.desousa@icloud.com> writes:
>> Actually, I just tested it on the Linux system. The keep alive event occurred, the kernel state
>> of the connection went to CLOSE_WAIT, and then it was later completely removed from the kernel
>> state; however, my spinner() function is still running with no network connection in the
>> kernel table.
>>
>> So, keep alive does behave differently between FreeBSD and Linux.  I really do prefer FreeBSD for many reasons.
>
> The behavior you report for Linux is what I'd expect anywhere.
> I tried to replicate your results on a freshly-updated FreeBSD 14.1
> installation, and could not.  With a purely stock Postgres
> configuration, I see the "spinner" query running indefinitely after
> the client is killed --- although the kernel does show the server
> process's client connection being in CLOSE_WAIT state.  But if I set
> client_connection_check_interval to a positive value then the query
> kills itself at the next multiple of that time, again as expected.
>
> So I think there is something non-default about your FreeBSD system.
> Maybe you'd previously configured it with nonzero
> client_connection_check_interval, and then forgot about that?
>
> The alternative is to suppose that that kernel will kill processes
> as soon as they have a connection in CLOSE_WAIT state, which would be
> quite evil for many purposes and is certainly not a "preferable"
> behavior.
>
> regards, tom lane



Yes, I see the same behavior. So I tried to figure out why my first test was flawed, and I have determined that Murphy's law was in play. I had an appointment, so I kicked off the query, disconnected the client, went to my appointment, came back, and the query was gone. What I didn’t expect was to lose power for a few minutes while I was out. I just looked at the output of the last command, and it reported that the server had crashed and rebooted. Not knowing why, I also checked the switch it’s connected to, and it too had rebooted at the same time; so it’s safe to say the system crashed due to a power outage. Neither of those is plugged into a UPS, although my firewall is.

I did set up a quick cron job to output the netstat for the connection every minute, and I didn’t notice the four-minute gap when I looked at…


.
.
.
Fri Jun 14 14:02:00 EDT 2024
tcp4       0      0 10.6.3.10.5432         10.6.3.44.51478        ESTABLISHED
Fri Jun 14 14:03:00 EDT 2024
tcp4       0      0 10.6.3.10.5432         10.6.3.44.51478        ESTABLISHED
Fri Jun 14 14:04:00 EDT 2024
tcp4       0      0 10.6.3.10.5432         10.6.3.44.51478        ESTABLISHED
Fri Jun 14 14:08:00 EDT 2024
Fri Jun 14 14:09:00 EDT 2024
Fri Jun 14 14:10:00 EDT 2024
Fri Jun 14 14:11:00 EDT 2024
.
.
.