Discussion: Statement timeout not working on broken connections with active queries
We have discovered a situation where statement_timeout is not honored for broken connections. If a connection is in the process of returning results to the client and the connection is severed (for example, the network cable on the client is unplugged), then the query continues to run on the server even after statement_timeout is exceeded. The connection eventually closes on its own after about 18-19 minutes, and the following lines are written to the PostgreSQL log file:
2006-12-12 04:03:22 LOG: could not send data to client: No route to host
2006-12-12 04:03:22 ERROR: canceling statement due to statement timeout
2006-12-12 04:03:22 LOG: could not send data to client: Broken pipe
2006-12-12 04:03:22 LOG: unexpected EOF on client connection
2006-12-12 04:03:22 ERROR: canceling statement due to statement timeout
2006-12-12 04:03:22 LOG: could not send data to client: Broken pipe
2006-12-12 04:03:22 LOG: unexpected EOF on client connection
Our server setup is:
Linux 2.4
Postgresql 8.1.4
Our client setup is:
Windows XP
Java 1.5
postgresql-8.1.jdbc2ee.jar
This behavior appears to be a bug in statement_timeout handling. I'd like to know whether there is a way to get the connection to close once statement_timeout is exceeded, even if the connection to the client has been severed. I'd also like to know what causes the connection to close on its own after 18-19 minutes, and whether that interval can be adjusted. Any help here would be greatly appreciated.
I tried adjusting the "tcp_keepalives_idle" setting and related settings, but this had no effect on the time it took for the connection to close on its own.
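One plausible explanation for the keepalives having no effect: TCP keepalive probes are only sent on connections that are idle at the TCP level, whereas here the backend has unacknowledged data in flight, so the stack is in retransmission rather than keepalive mode. On Linux the retransmission limit is governed by net.ipv4.tcp_retries2, whose default of 15 retries works out to somewhere in the range of roughly 13-30 minutes before the stack declares the connection dead, which would be consistent with the 18-19 minutes observed. A sketch of lowering it (the value 8 is purely illustrative, and the setting is system-wide, affecting every TCP connection on the host, not just PostgreSQL):

```
# /etc/sysctl.conf -- illustrative values, not a recommendation
net.ipv4.tcp_retries2 = 8   # give up on unacknowledged data sooner (Linux default: 15)

# apply without a reboot:
#   sysctl -p
```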
I have also tried cancelling the active query via a call to "select pg_cancel_backend(pid)", but this has no effect. I then tried killing the connection by running the command "./pg_ctl kill TERM pid", but this also has no effect (I realize 'kill TERM' isn't considered safe yet; I see it's still on the pg todo list). The connection can be killed with a QUIT signal, but this is not recommended because it causes the database to restart in an unclean way. I'd prefer that the statement_timeout setting simply cancelled the query and closed the connection without any manual intervention, but does anyone know of a way to manually kill or cancel connections of this sort in a clean manner?
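For reference, a sketch of the two attempts described above (the pid value 12345 is purely illustrative):

```
-- attempted from psql; the backend stayed busy
SELECT pg_cancel_backend(12345);

-- attempted from the shell in the PostgreSQL bin directory; also no effect
./pg_ctl kill TERM 12345
```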
You can duplicate the problem with other clients besides Java. For example, you can use pgAdmin III following these steps:
1) execute "set statement_timeout = 15000"
2) run a query that will return a large number of rows that will take more than 15 seconds to retrieve
3) a few seconds after you execute the query unplug your network cable
4) wait about 10 seconds
5) plug your network cable back in
6) query the pg_stat_activity view and you will see a non-idle connection running your query
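Step 6 can be checked with a query along these lines (column names as of PostgreSQL 8.1, where the backend pid column is procpid; note that current_query is only populated when stats_command_string is enabled):

```
SELECT procpid, usename, query_start, current_query
FROM pg_stat_activity
WHERE current_query <> '<IDLE>';
```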
Below is the Java code used to duplicate the error. You need to sever your network connection once you see the output "set statement_timeout = ...".
Thanks, Brendan
import java.sql.*;

public class TestStatementTimeout {

    private static final String URL = "jdbc:postgresql://hostname/db_name";
    private static final String DB_USER = "user";
    private static final String DB_PASSWORD = "password";
    private static final int STMT_TIMEOUT = 15 * 1000;

    public static void main(String[] args) throws Exception {
        String sql = "SELECT * FROM table_with_many_rows";
        try {
            System.out.println("Connecting to " + URL);
            Class.forName("org.postgresql.Driver");
            Connection conn = DriverManager.getConnection(URL, DB_USER, DB_PASSWORD);
            Statement stmt = conn.createStatement(ResultSet.TYPE_SCROLL_INSENSITIVE,
                                                  ResultSet.CONCUR_READ_ONLY);
            stmt.execute("set statement_timeout = " + STMT_TIMEOUT);
            System.out.println("set statement_timeout = " + STMT_TIMEOUT);
            ResultSet rs = stmt.executeQuery(sql);
            System.out.println("executed query");
            while (rs.next()) {
                System.out.print("column 1 = " + rs.getInt(1) + "\015");
            }
            System.out.println("Closing Connection");
            rs.close();
            stmt.close();
            conn.close();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
"Brendan O'Shea" <brendanoshea@comcast.net> writes: > We have discovered a situation where the statement_timeout is not = > honored for broken connections. If a connection is in the process of = > returning results to the client and the connection is severed (for = > example, network cable on client is unplugged) then the query continues = > to run on the server even after the statement_timeout is exceeded. Well, the backend is blocked on a write() to the socket and cannot abort that without bollixing the connection completely (i.e., loss of message synchronization). So I think waiting until the TCP stack declares failure is an appropriate response. If you want faster TCP failure, see whether your TCP stack allows timeout adjustments. Note that the query is not "running" in the sense of consuming any meaningful CPU or I/O resources in this state ... regards, tom lane
On 12-Dec-06, at 4:30 PM, Tom Lane wrote:
> "Brendan O'Shea" <brendanoshea@comcast.net> writes:
>> We have discovered a situation where the statement_timeout is not
>> honored for broken connections. If a connection is in the process of
>> returning results to the client and the connection is severed (for
>> example, network cable on client is unplugged) then the query
>> continues to run on the server even after the statement_timeout is
>> exceeded.
>
> Well, the backend is blocked on a write() to the socket and cannot
> abort that without bollixing the connection completely (i.e., loss of
> message synchronization). So I think waiting until the TCP stack
> declares failure is an appropriate response. If you want faster TCP
> failure, see whether your TCP stack allows timeout adjustments.
>
> Note that the query is not "running" in the sense of consuming any
> meaningful CPU or I/O resources in this state ...

Of course, the query may have locks that block other meaningful
operations. When a hung connection like this occurs on our server, I have
resorted to using gdb to return from the write() method. Is this an
acceptable way to kill the connection on the server side?
"Brendan O'Shea" <brendanoshea@comcast.net> writes: > Is there no way to specify a timeout for the write() to the socket or some > other way to abort? This is really a question to take up with your TCP stack implementors. I think it is fundamentally wrong for Postgres to be second-guessing the network software about whether a network connection is still live. regards, tom lane
Re: Statement timeout not working on broken connections with active queries
From: Martijn van Oosterhout
On Tue, Dec 12, 2006 at 10:41:19PM -0500, Tom Lane wrote:
> "Brendan O'Shea" <brendanoshea@comcast.net> writes:
> > Is there no way to specify a timeout for the write() to the socket
> > or some other way to abort?
>
> This is really a question to take up with your TCP stack implementors.
> I think it is fundamentally wrong for Postgres to be second-guessing
> the network software about whether a network connection is still live.

It would of course be nice if postgres could honour signals while
writing to the client, but currently that's hard to do.

Have a nice day,
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to litigate.
Re: Statement timeout not working on broken connections with active queries
From: "Brendan O'Shea"
Brian Wipf writes:
> On 12-Dec-06, at 4:30 PM, Tom Lane wrote:
>> "Brendan O'Shea" <brendanoshea@comcast.net> writes:
>>> We have discovered a situation where the statement_timeout is not
>>> honored for broken connections. If a connection is in the process of
>>> returning results to the client and the connection is severed (for
>>> example, network cable on client is unplugged) then the query
>>> continues to run on the server even after the statement_timeout is
>>> exceeded.
>>
>> Well, the backend is blocked on a write() to the socket and cannot
>> abort that without bollixing the connection completely (i.e., loss of
>> message synchronization). So I think waiting until the TCP stack
>> declares failure is an appropriate response. If you want faster TCP
>> failure, see whether your TCP stack allows timeout adjustments.
>>
>> Note that the query is not "running" in the sense of consuming any
>> meaningful CPU or I/O resources in this state ...
>
> Of course, the query may have locks that block other meaningful
> operations. When a hung connection like this occurs on our server, I
> have resorted to using gdb to return from the write() method. Is this
> an acceptable way to kill the connection on the server side?

In our case locked resources are actually the problem we are running
into. When the query is hung, the vacuum daemon will not free up unused
space in the table that the query is accessing. The table happens to be
very heavy on update transactions, so it grows from only around 10 MB to
sometimes over 100 MB. We ran into this problem because we have users
running our client application from remote offices with poor network
connectivity.

Is there no way to specify a timeout for the write() to the socket or
some other way to abort? Do we still need to be concerned with message
synchronization if the connection is essentially a lost cause anyhow,
since it was severed?
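Worth noting: even when the hung backend's locks don't directly block anyone, its open transaction holds back the oldest-transaction horizon, so VACUUM cannot reclaim rows deleted or updated after that transaction began, which matches the table-bloat symptom described above. A sketch of inspecting what the stuck backend still holds (the pid 12345 is illustrative; columns as of PostgreSQL 8.1):

```
-- locks still held by the hung backend
SELECT l.relation::regclass AS relation, l.mode, l.granted
FROM pg_locks l
WHERE l.pid = 12345;
```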
It would be great if this could be made into a configurable parameter in the postgresql.conf file. Thanks for the suggestion about the TCP stack; I'll take a look at what can be done there.

Brendan
Re: Statement timeout not working on broken connections with active queries
From: "Brendan O'Shea"
"Tom Lane" <tgl@sss.pgh.pa.us> > "Brendan O'Shea" <brendanoshea@comcast.net> writes: >> We have discovered a situation where the statement_timeout is not = >> honored for broken connections. If a connection is in the process of = >> returning results to the client and the connection is severed (for = >> example, network cable on client is unplugged) then the query continues = >> to run on the server even after the statement_timeout is exceeded. > > Well, the backend is blocked on a write() to the socket and cannot abort > that without bollixing the connection completely (i.e., loss of message > synchronization). So I think waiting until the TCP stack declares > failure is an appropriate response. If you want faster TCP failure, > see whether your TCP stack allows timeout adjustments. > > Note that the query is not "running" in the sense of consuming any > meaningful CPU or I/O resources in this state ... > > regards, tom lane I noticed something odd about the query that is still active on the severed connection. If the statement_timeout on the query is exceeded and I send a cancel signal then the query still stays active, it does not respond to cancel or kill TERM signals. But if I send the cancel signal before the statement timeout is exceeded on the severed connection then the query is cancelled and the status goes back to 'idle'. Can this behavior be changed so that the cancel signal works regardless of whether or not the statement timeout has been exceeded? This would be useful so that we can free up the locks used by the active query that are interfering with the vacuum daemon. Thanks, Brendan