Thread: Statement timeout not working on broken connections with active queries

Statement timeout not working on broken connections with active queries

From:
"Brendan O'Shea"
Date:
We have discovered a situation where the statement_timeout is not honored for broken connections.  If a connection is in the process of returning results to the client and the connection is severed (for example, network cable on client is unplugged) then the query continues to run on the server even after the statement_timeout is exceeded.  The connection will eventually close on its own after about 18-19 minutes and the following log lines will be generated in the postgresql log file:
 
2006-12-12 04:03:22 LOG:  could not send data to client: No route to host
2006-12-12 04:03:22 ERROR:  canceling statement due to statement timeout
2006-12-12 04:03:22 LOG:  could not send data to client: Broken pipe
2006-12-12 04:03:22 LOG:  unexpected EOF on client connection
 
Our server setup is:
Linux 2.4
PostgreSQL 8.1.4
 
Our client setup is:
Windows XP
Java 1.5
postgresql-8.1.jdbc2ee.jar
 
This behavior appears to be a bug with the statement_timeout.  I'd like to know if there is a way to get the connection to close once the statement_timeout is exceeded even if the connection to the client has been severed.  I'd also like to know what is causing the connection to close on its own after 18-19 minutes and if this can be adjusted.  Any help here would be greatly appreciated.
 
I tried adjusting the "tcp_keepalives_idle" setting and related settings, but this had no effect on the time it took for the connection to close on its own.
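For reference, these settings live in postgresql.conf; the values below are only illustrative, not the ones we actually tried.  As far as I can tell, keepalive probes are only sent while the socket is otherwise idle, which may be why they make no difference while the backend is blocked in a write() with unacknowledged data outstanding.
 
tcp_keepalives_idle = 60         # seconds of inactivity before the first keepalive probe
tcp_keepalives_interval = 10     # seconds between unanswered probes
tcp_keepalives_count = 3         # unanswered probes before the connection is considered dead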
 
I have also tried cancelling the active query via a call to "select pg_cancel_backend(pid)", but this has no effect.  I then tried killing the connection by running the command "./pg_ctl kill TERM pid", but this also has no effect (I realize 'kill TERM' isn't considered safe yet; I see it's still on the PostgreSQL TODO list).  The connection can be killed with a QUIT signal, but this is not recommended because it causes the database to restart in an unclean way.  I'd prefer that the statement_timeout setting simply cancelled the query and closed the connection without any manual intervention, but does anyone know of a way to manually kill or cancel connections of this sort in a clean manner?
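For reference, the cancel attempt was simply the following, with the pid taken from pg_stat_activity (12345 is a placeholder):
 
-- in our case the query stays listed as active afterwards,
-- just as "./pg_ctl kill TERM <pid>" changes nothing
select pg_cancel_backend(12345);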
 
You can duplicate the problem with other clients besides Java.  For example, you can use pgAdmin III by following these steps:
 
1) execute "set statement_timeout = 15000"
2) run a query that returns a large number of rows and takes more than 15 seconds to retrieve
3) a few seconds after you execute the query, unplug your network cable
4) wait about 10 seconds
5) plug your network cable back in
6) query the pg_stat_activity view and you will see a non-idle connection still running your query (see the query sketch below)
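For step 6, something like the following works (column names are the 8.1-era ones; idle backends report '<IDLE>' as their current query):
 
select procpid, usename, query_start, current_query
from pg_stat_activity
where current_query <> '<IDLE>';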
 

Below is the Java code used to duplicate the error.  You need to sever your network connection once you see the output "set statement_timeout = ...".
 
Thanks, Brendan
 
 

import java.sql.*;

public class TestStatementTimeout {

    private static final String URL = "jdbc:postgresql://hostname/db_name";
    private static final String DB_USER = "user";
    private static final String DB_PASSWORD = "password";
    private static final int STMT_TIMEOUT = 15 * 1000;  // statement_timeout in milliseconds

    public static void main(String[] args) throws Exception {
        String sql = "SELECT * FROM table_with_many_rows";
        try {
            System.out.println("Connecting to " + URL);
            Class.forName("org.postgresql.Driver");
            Connection conn = DriverManager.getConnection(URL, DB_USER, DB_PASSWORD);

            Statement stmt = conn.createStatement(ResultSet.TYPE_SCROLL_INSENSITIVE,
                                                  ResultSet.CONCUR_READ_ONLY);
            // Set the server-side timeout for this session.
            stmt.execute("set statement_timeout = " + STMT_TIMEOUT);
            System.out.println("set statement_timeout = " + STMT_TIMEOUT);

            // Sever the network connection while the rows are being fetched.
            ResultSet rs = stmt.executeQuery(sql);
            System.out.println("executed query");

            while (rs.next()) {
                System.out.print("column 1 = " + rs.getInt(1) + "\r");
            }

            System.out.println("Closing Connection");
            rs.close();
            stmt.close();
            conn.close();
        }
        catch (Exception e) {
            e.printStackTrace();
        }
    }
}
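To run the test, only the JDBC jar listed above needs to be on the classpath; the exact invocation below is just an example, shown with the Windows classpath separator since the client is Windows XP:
 
javac TestStatementTimeout.java
java -cp .;postgresql-8.1.jdbc2ee.jar TestStatementTimeout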

Re: Statement timeout not working on broken connections with active queries

From:
Tom Lane
Date:
"Brendan O'Shea" <brendanoshea@comcast.net> writes:
> We have discovered a situation where the statement_timeout is not
> honored for broken connections.  If a connection is in the process of
> returning results to the client and the connection is severed (for
> example, network cable on client is unplugged) then the query continues
> to run on the server even after the statement_timeout is exceeded.

Well, the backend is blocked on a write() to the socket and cannot abort
that without bollixing the connection completely (i.e., loss of message
synchronization).  So I think waiting until the TCP stack declares
failure is an appropriate response.  If you want faster TCP failure,
see whether your TCP stack allows timeout adjustments.

Note that the query is not "running" in the sense of consuming any
meaningful CPU or I/O resources in this state ...

            regards, tom lane
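A note on the TCP stack adjustment suggested above: on the Linux 2.4 server in question, the relevant knob is most likely the retransmission limit; the value shown below is illustrative only.
 
# number of times Linux retransmits unacknowledged data before giving up
# on the connection; the default of 15 works out to very roughly 13-30
# minutes, which is consistent with the 18-19 minutes reported above
echo 8 > /proc/sys/net/ipv4/tcp_retries2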

Re: Statement timeout not working on broken connections with active queries

From:
Brian Wipf
Date:
On 12-Dec-06, at 4:30 PM, Tom Lane wrote:
> "Brendan O'Shea" <brendanoshea@comcast.net> writes:
>> We have discovered a situation where the statement_timeout is not
>> honored for broken connections.  If a connection is in the process of
>> returning results to the client and the connection is severed (for
>> example, network cable on client is unplugged) then the query continues
>> to run on the server even after the statement_timeout is exceeded.
>
> Well, the backend is blocked on a write() to the socket and cannot abort
> that without bollixing the connection completely (i.e., loss of message
> synchronization).  So I think waiting until the TCP stack declares
> failure is an appropriate response.  If you want faster TCP failure,
> see whether your TCP stack allows timeout adjustments.
>
> Note that the query is not "running" in the sense of consuming any
> meaningful CPU or I/O resources in this state ...
Of course, the query may have locks that block other meaningful
operations. When a hung connection like this occurs on our server, I
have resorted to using gdb to return from the write() method. Is this
an acceptable way to kill the connection on the server side?
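A rough sketch of the gdb approach being described, with a placeholder pid; the exact commands are an assumption rather than something spelled out in the thread, and this is every bit as invasive as it sounds:
 
gdb -p 12345        # attach to the stuck backend
(gdb) return -1     # force the blocked write() frame to return with a failure
(gdb) detach
(gdb) quit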


Re: Statement timeout not working on broken connections with active queries

From:
Tom Lane
Date:
"Brendan O'Shea" <brendanoshea@comcast.net> writes:
> Is there no way to specify a timeout for the write() to the socket or some
> other way to abort?

This is really a question to take up with your TCP stack implementors.
I think it is fundamentally wrong for Postgres to be second-guessing
the network software about whether a network connection is still live.

            regards, tom lane

Re: Statement timeout not working on broken connections with active queries

From:
Martijn van Oosterhout
Date:
On Tue, Dec 12, 2006 at 10:41:19PM -0500, Tom Lane wrote:
> "Brendan O'Shea" <brendanoshea@comcast.net> writes:
> > Is there no way to specify a timeout for the write() to the socket or some
> > other way to abort?
>
> This is really a question to take up with your TCP stack implementors.
> I think it is fundamentally wrong for Postgres to be second-guessing
> the network software about whether a network connection is still live.

It would of course be nice if postgres could honour signals while
writing to the client, but currently that's hard to do.

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
From each according to his ability. To each according to his ability to litigate.


Re: Statement timeout not working on broken connections with active queries

From:
"Brendan O'Shea"
Date:
Brian Wipf writes:

> On 12-Dec-06, at 4:30 PM, Tom Lane wrote:
>> "Brendan O'Shea" <brendanoshea@comcast.net> writes:
>>> We have discovered a situation where the statement_timeout is not
>>> honored for broken connections.  If a connection is in the process of
>>> returning results to the client and the connection is severed (for
>>> example, network cable on client is unplugged) then the query continues
>>> to run on the server even after the statement_timeout is exceeded.
>>
>> Well, the backend is blocked on a write() to the socket and cannot abort
>> that without bollixing the connection completely (i.e., loss of message
>> synchronization).  So I think waiting until the TCP stack declares
>> failure is an appropriate response.  If you want faster TCP failure,
>> see whether your TCP stack allows timeout adjustments.
>>
>> Note that the query is not "running" in the sense of consuming any
>> meaningful CPU or I/O resources in this state ...
> Of course, the query may have locks that block other meaningful
> operations. When a hung connection like this occurs on our server, I have
> resorted to using gdb to return from the write() method. Is this an
> acceptable way to kill the connection on the server side?
>

In our case, locked resources are actually the problem we are running
into.  When the query is hung, the vacuum daemon will not free up unused
space on the table that the query is accessing.  The table happens to be
very heavy on update transactions, so it grows in size from only around 10
MB to sometimes over 100 MB.  We are running into this problem because we
have users running our client application from remote offices that have
poor network connectivity.
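A query like the following (against the 8.1-era catalogs, with 12345 as a placeholder for the pid shown in pg_stat_activity) lists what the hung backend still holds; presumably it is the backend's still-open transaction, as much as its locks, that keeps VACUUM from reclaiming the dead rows:
 
select l.locktype, c.relname, l.mode, l.granted
from pg_locks l
left join pg_class c on c.oid = l.relation
where l.pid = 12345;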

Is there no way to specify a timeout for the write() to the socket or some
other way to abort?  Do we still need to be concerned with message
synchronization if the connection is essentially a lost cause anyhow since
it was severed?  It would be great if this could be made into a configurable
parameter in the postgresql.conf file.

Thanks for the suggestion about the TCP stack, I'll take a look at what can
be done there.

Brendan


Re: Statement timeout not working on broken connections with active queries

From:
"Brendan O'Shea"
Date:
"Tom Lane" <tgl@sss.pgh.pa.us>
> "Brendan O'Shea" <brendanoshea@comcast.net> writes:
>> We have discovered a situation where the statement_timeout is not
>> honored for broken connections.  If a connection is in the process of
>> returning results to the client and the connection is severed (for
>> example, network cable on client is unplugged) then the query continues
>> to run on the server even after the statement_timeout is exceeded.
>
> Well, the backend is blocked on a write() to the socket and cannot abort
> that without bollixing the connection completely (i.e., loss of message
> synchronization).  So I think waiting until the TCP stack declares
> failure is an appropriate response.  If you want faster TCP failure,
> see whether your TCP stack allows timeout adjustments.
>
> Note that the query is not "running" in the sense of consuming any
> meaningful CPU or I/O resources in this state ...
>
> regards, tom lane

I noticed something odd about the query that is still active on the severed
connection.  If the statement_timeout on the query has been exceeded and I send a
cancel signal, the query stays active; it does not respond to
cancel or kill TERM signals.  But if I send the cancel signal before the
statement timeout is exceeded on the severed connection, then the query is
cancelled and the status goes back to 'idle'.  Can this behavior be changed
so that the cancel signal works regardless of whether or not the statement
timeout has been exceeded?  This would be useful so that we can free
up the locks held by the active query that are interfering with the vacuum
daemon.

Thanks,
Brendan