Обсуждение: keep alive and running query

Поиск
Список
Период
Сортировка

keep alive and running query

От
desmodemone
Дата:
Hello,
           I am testing keep alive on PostgreSQL 9.2 and  9.3 on Centos 6.X .
As I tested and saw until now, the keep alive functions as follow [if I understand correctly and it's not a bug] :
When a connection it's in idle state or in idle in transaction, if the connection with client it's broken  for a number of keep alive, the backend will be terminated.

By the way if this could be ok in an OLTP enviroment, because the average time of a query is << the time of keep alive,  in a DWH enviroment could be a problem.
Imagine your application server, where there is an ETL, will go down for 1 minute and your transactions are still running on the DWH database, that transactions could run for hours before the
keep alive will terminate them, because they are in transaction state and not idle or idle in transaction.

Are some tricks about ? solutions  ?

Kind Regards

Mat

Re: keep alive and running query

От
Albe Laurenz
Дата:
desmodemone wrote:
> As I tested and saw until now, the keep alive functions as follow [if I understand correctly and it's
> not a bug] :
> 
> When a connection it's in idle state or in idle in transaction, if the connection with client it's
> broken  for a number of keep alive, the backend will be terminated.
> 
> 
> By the way if this could be ok in an OLTP enviroment, because the average time of a query is << the
> time of keep alive,  in a DWH enviroment could be a problem.
> 
> Imagine your application server, where there is an ETL, will go down for 1 minute and your
> transactions are still running on the DWH database, that transactions could run for hours before the
> 
> keep alive will terminate them, because they are in transaction state and not idle or idle in
> transaction.

TCP keepalive will also terminate a session that is currently
stuck in a long running SQL query if the client end dies.

I think that your problem is that you mix up different meanings of "idle".

In PostgreSQL, a connection is idle (or idle in transaction) if processing
of the last command is finished and the server is waiting for the next
command from the client.

In TCP, a connection is idle if there is no network traffic.

Yours,
Laurenz Albe

Re: keep alive and running query

От
desmodemone
Дата:



2013/12/16 Albe Laurenz <laurenz.albe@wien.gv.at>
desmodemone wrote:
> As I tested and saw until now, the keep alive functions as follow [if I understand correctly and it's
> not a bug] :
>
> When a connection it's in idle state or in idle in transaction, if the connection with client it's
> broken  for a number of keep alive, the backend will be terminated.
>
>
> By the way if this could be ok in an OLTP enviroment, because the average time of a query is << the
> time of keep alive,  in a DWH enviroment could be a problem.
>
> Imagine your application server, where there is an ETL, will go down for 1 minute and your
> transactions are still running on the DWH database, that transactions could run for hours before the
>
> keep alive will terminate them, because they are in transaction state and not idle or idle in
> transaction.

TCP keepalive will also terminate a session that is currently
stuck in a long running SQL query if the client end dies.

I think that your problem is that you mix up different meanings of "idle".

In PostgreSQL, a connection is idle (or idle in transaction) if processing
of the last command is finished and the server is waiting for the next
command from the client.

In TCP, a connection is idle if there is no network traffic.

Yours,
Laurenz Albe


Hello  Laurenz,
                         so I have a strange behavior on some test servers.
I am using Centos 6.4 and I set up the tcp keep alive kernel parameter very low to see the effects [the postgresql.conf parameter have 0 value so use the OS value] :

net.ipv4.tcp_keepalive_intvl = 2
net.ipv4.tcp_keepalive_probes = 2
net.ipv4.tcp_keepalive_time = 1

then I run a long query from a remote client and I kill that "psql client" . By the way, even even if after long time, the query remains alive,
until it finishes and it returns error because could not return the row to the client.

If I do the same with an update, it's the same except become idle in transaction and only after then it's killed .

So, if " TCP keepalive will also terminate a session that is currently stuck in a long running SQL query if the client end dies."  ,
what is  wrong on the setup ? Could someone try the tcp keep alive or explain why is not working as expected  ?

Have a nice day

Mat

Re: keep alive and running query

От
Albe Laurenz
Дата:
desmodemone wrote:
> 2013/12/16 Albe Laurenz <laurenz.albe@wien.gv.at>
>> desmodemone wrote:
>> As I tested and saw until now, the keep alive functions as follow [if I understand correctly and it's
>> not a bug] :
>>
>> When a connection it's in idle state or in idle in transaction, if the connection with client it's
>> broken  for a number of keep alive, the backend will be terminated.
>>
>>
>> By the way if this could be ok in an OLTP enviroment, because the average time of a query is << the
>> time of keep alive,  in a DWH enviroment could be a problem.
>>
>> Imagine your application server, where there is an ETL, will go down for 1 minute and your
>> transactions are still running on the DWH database, that transactions could run for hours before the
>>
>> keep alive will terminate them, because they are in transaction state and not idle or idle in
>> transaction.

> TCP keepalive will also terminate a session that is currently
> stuck in a long running SQL query if the client end dies.
> 
> I think that your problem is that you mix up different meanings of "idle".
> 
> In PostgreSQL, a connection is idle (or idle in transaction) if processing
> of the last command is finished and the server is waiting for the next
> command from the client.
> 
> In TCP, a connection is idle if there is no network traffic.

>                          so I have a strange behavior on some test servers.
> 
> I am using Centos 6.4 and I set up the tcp keep alive kernel parameter very low to see the effects
> [the postgresql.conf parameter have 0 value so use the OS value] :
> 
> net.ipv4.tcp_keepalive_intvl = 2
> net.ipv4.tcp_keepalive_probes = 2
> net.ipv4.tcp_keepalive_time = 1
> 
> 
> then I run a long query from a remote client and I kill that "psql client" . By the way, even even if
> after long time, the query remains alive,
> 
> until it finishes and it returns error because could not return the row to the client.
> 
> 
> If I do the same with an update, it's the same except become idle in transaction and only after then
> it's killed .
> 
> 
> So, if " TCP keepalive will also terminate a session that is currently stuck in a long running SQL
> query if the client end dies."  ,
> what is  wrong on the setup ? Could someone try the tcp keep alive or explain why is not working as
> expected  ?

That's as expected; the problem is that I was imprecise and misleading.
Not only does "idle" stand for two things, but also "connection".

While TCP keepalive will terminate a TCP connection after a while
if it detects that the remote end does no longer respond, that does
not imply that the PostgreSQL session is immediately terminated.
That only happens when PostgreSQL tries to read or write on the socket
belonging to the dead TCP connection.

So a long running query will happily continue until it is ready to
write to the socket and then detect that the socket is no longer there.
In the case of an idle session, the server keeps trying to read from
the socket and will notice when it goes away.

So in effect setting low keepalive limits will only make TCP connections
get removed quickly, which will have an effect on idle database sessions,
but not on sessions where the server does not try to communicate with
the client.

Yours,
Laurenz Albe

Re: keep alive and running query

От
desmodemone
Дата:



2013/12/16 Albe Laurenz <laurenz.albe@wien.gv.at>
desmodemone wrote:
> 2013/12/16 Albe Laurenz <laurenz.albe@wien.gv.at>
>> desmodemone wrote:
>> As I tested and saw until now, the keep alive functions as follow [if I understand correctly and it's
>> not a bug] :
>>
>> When a connection it's in idle state or in idle in transaction, if the connection with client it's
>> broken  for a number of keep alive, the backend will be terminated.
>>
>>
>> By the way if this could be ok in an OLTP enviroment, because the average time of a query is << the
>> time of keep alive,  in a DWH enviroment could be a problem.
>>
>> Imagine your application server, where there is an ETL, will go down for 1 minute and your
>> transactions are still running on the DWH database, that transactions could run for hours before the
>>
>> keep alive will terminate them, because they are in transaction state and not idle or idle in
>> transaction.

> TCP keepalive will also terminate a session that is currently
> stuck in a long running SQL query if the client end dies.
>
> I think that your problem is that you mix up different meanings of "idle".
>
> In PostgreSQL, a connection is idle (or idle in transaction) if processing
> of the last command is finished and the server is waiting for the next
> command from the client.
>
> In TCP, a connection is idle if there is no network traffic.

>                          so I have a strange behavior on some test servers.
>
> I am using Centos 6.4 and I set up the tcp keep alive kernel parameter very low to see the effects
> [the postgresql.conf parameter have 0 value so use the OS value] :
>
> net.ipv4.tcp_keepalive_intvl = 2
> net.ipv4.tcp_keepalive_probes = 2
> net.ipv4.tcp_keepalive_time = 1
>
>
> then I run a long query from a remote client and I kill that "psql client" . By the way, even even if
> after long time, the query remains alive,
>
> until it finishes and it returns error because could not return the row to the client.
>
>
> If I do the same with an update, it's the same except become idle in transaction and only after then
> it's killed .
>
>
> So, if " TCP keepalive will also terminate a session that is currently stuck in a long running SQL
> query if the client end dies."  ,
> what is  wrong on the setup ? Could someone try the tcp keep alive or explain why is not working as
> expected  ?

That's as expected; the problem is that I was imprecise and misleading.
Not only does "idle" stand for two things, but also "connection".

While TCP keepalive will terminate a TCP connection after a while
if it detects that the remote end does no longer respond, that does
not imply that the PostgreSQL session is immediately terminated.
That only happens when PostgreSQL tries to read or write on the socket
belonging to the dead TCP connection.

So a long running query will happily continue until it is ready to
write to the socket and then detect that the socket is no longer there.
In the case of an idle session, the server keeps trying to read from
the socket and will notice when it goes away.

So in effect setting low keepalive limits will only make TCP connections
get removed quickly, which will have an effect on idle database sessions,
but not on sessions where the server does not try to communicate with
the client.

Yours,
Laurenz Albe

Hi Laurenz,
                          thank you for your answer, I suspect how so and now it's more clear now how it's working.
By the way, the problem, as I show , it's for DWH or similar environment where it's usually to have a long running query (with sort / group by / hash , so before the first rows
is returned, it needs time) and in those cases , if the client will die, the backend will run for a lot of time before the backend will try to write to the socket.
So it's possible to have a heavy load in a database server with a lot of death connections where the backends are still working as described.
I think it's not a minor problem, no?


Kind Regards

Mat

Re: keep alive and running query

От
Albe Laurenz
Дата:
desmodemone wrote:
> By the way, the problem, as I show , it's for DWH or similar environment where it's usually to have a
> long running query (with sort / group by / hash , so before the first rows
> is returned, it needs time) and in those cases , if the client will die, the backend will run for a
> lot of time before the backend will try to write to the socket.
> 
> So it's possible to have a heavy load in a database server with a lot of death connections where the
> backends are still working as described.
> 
> I think it's not a minor problem, no?

No, it isn't.

But if that happens on a routine basis, I would argue that the client
program is at fault.  It should cancel the query and close the session
when the user interrupts or closes it.
Admitted, it cannot do that if the client machine crashes, the network
connection dies or the user kills the program with SIGKILL, but I'd say
that this should not happen on a regular basis.

I don't know how easy it would be to add code for the server to check
the state of the network sockets regularly.

Yours,
Laurenz Albe