Обсуждение: Idle in TX / Java process hang's in the vicinity of JDBC
We've got an issue where an ETL process just sort of stalled on us last night. It left a dreaded "Idle in Transaction" in the database. Although fortunatly it didn't appear to have any tables locked. No errors logged in the postgresql logfile. (we're on JDBC driver v8.3-603) We got a stack trace of the ETL process: Name: Thread-957 State: RUNNABLE Total blocked: 0 Total waited: 0 Stack trace: java.net.SocketInputStream.socketRead0(Native Method) java.net.SocketInputStream.read(SocketInputStream.java:129) org.postgresql.core.VisibleBufferedInputStream.readMore(VisibleBufferedInputStream.java:135) org.postgresql.core.VisibleBufferedInputStream.ensureBytes(VisibleBufferedInputStream.java:104) org.postgresql.core.VisibleBufferedInputStream.read(VisibleBufferedInputStream.java:186) org.postgresql.core.PGStream.Receive(PGStream.java:445) org.postgresql.core.PGStream.ReceiveTupleV3(PGStream.java:350) org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:1306) org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:192) - locked org.postgresql.core.v3.QueryExecutorImpl@18493c9 org.postgresql.jdbc2.AbstractJdbc2Statement.execute(AbstractJdbc2Statement.java:451) org.postgresql.jdbc2.AbstractJdbc2Statement.executeWithFlags(AbstractJdbc2Statement.java:336) org.postgresql.jdbc2.AbstractJdbc2Statement.executeQuery(AbstractJdbc2Statement.java:235) talend_reporting_etl.load_class_roster_fact_0_1.Load_Class_Roster_Fact.tPostgresqlInput_1Process(Load_Class_Roster_Fact.java:5507) talend_reporting_etl.load_class_roster_fact_0_1.Load_Class_Roster_Fact.Create_Audit_1_tJava_2Process(Load_Class_Roster_Fact.java:3360) talend_reporting_etl.load_class_roster_fact_0_1.Load_Class_Roster_Fact.Create_Audit_1_tPostgresqlCommit_1Process(Load_Class_Roster_Fact.java:3274) talend_reporting_etl.load_class_roster_fact_0_1.Load_Class_Roster_Fact.Create_Audit_1_tPostgresqlInput_1Process(Load_Class_Roster_Fact.java:2823) talend_reporting_etl.load_class_roster_fact_0_1.Load_Class_Roster_Fact.tPostgresqlConnection_1Process(Load_Class_Roster_Fact.java:819) talend_reporting_etl.load_class_roster_fact_0_1.Load_Class_Roster_Fact.tJava_4Process(Load_Class_Roster_Fact.java:12623) talend_reporting_etl.load_class_roster_fact_0_1.Load_Class_Roster_Fact.runJobInTOS(Load_Class_Roster_Fact.java:14079) talend_reporting_etl.load_class_roster_fact_0_1.Load_Class_Roster_Fact.runJob(Load_Class_Roster_Fact.java:13818) talend_reporting_etl.load_facts_0_1.Load_Facts.tRunJob_2Process(Load_Facts.java:1722) talend_reporting_etl.load_facts_0_1.Load_Facts.tRunJob_1Process(Load_Facts.java:1654) talend_reporting_etl.load_facts_0_1.Load_Facts.tRunJob_6Process(Load_Facts.java:1465) talend_reporting_etl.load_facts_0_1.Load_Facts.tRunJob_5Process(Load_Facts.java:1353) talend_reporting_etl.load_facts_0_1.Load_Facts.tRunJob_4Process(Load_Facts.java:1164) talend_reporting_etl.load_facts_0_1.Load_Facts.tRunJob_3Process(Load_Facts.java:1052) talend_reporting_etl.load_facts_0_1.Load_Facts.tRunJob_8Process(Load_Facts.java:863) talend_reporting_etl.load_facts_0_1.Load_Facts.tRunJob_7Process(Load_Facts.java:674) talend_reporting_etl.load_facts_0_1.Load_Facts$1.run(Load_Facts.java:3722) We were wondering what the: org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:192) - locked org.postgresql.core.v3.QueryExecutorImpl@18493c9 was refering to. Any direction for debugging would be useful. I'm not really that knowledgeable in the Java area and my developers are a bit stumped. Thanks Dave
On Thu, 29 Jul 2010, David Kerr wrote: > We've got an issue where an ETL process just sort of stalled on us last > night. > > It left a dreaded "Idle in Transaction" in the database. Although > fortunatly it didn't appear to have any tables locked. > > Stack trace: > java.net.SocketInputStream.socketRead0(Native Method) > java.net.SocketInputStream.read(SocketInputStream.java:129) > org.postgresql.core.VisibleBufferedInputStream.readMore(VisibleBufferedInputStream.java:135) > org.postgresql.core.VisibleBufferedInputStream.ensureBytes(VisibleBufferedInputStream.java:104) > org.postgresql.core.VisibleBufferedInputStream.read(VisibleBufferedInputStream.java:186) > org.postgresql.core.PGStream.Receive(PGStream.java:445) > org.postgresql.core.PGStream.ReceiveTupleV3(PGStream.java:350) > org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:1306) > org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:192) > - locked org.postgresql.core.v3.QueryExecutorImpl@18493c9 > org.postgresql.jdbc2.AbstractJdbc2Statement.execute(AbstractJdbc2Statement.java:451) > org.postgresql.jdbc2.AbstractJdbc2Statement.executeWithFlags(AbstractJdbc2Statement.java:336) > org.postgresql.jdbc2.AbstractJdbc2Statement.executeQuery(AbstractJdbc2Statement.java:235) The stacktrace indicates that Statement.executeQuery was called and the JDBC driver started receiving the results from the server. The driver is in the middle of reading a particular row and is waiting for more data from the server. From what you're saying the server thinks it is done, but the driver does not, indicating loss of protocol synchronization. That's tough to debug after the fact, without a reproducible case or network traffic dump. If you still have this connection stuck in this state you could try poking around with a debugger to say what values the ReceiveTupleV3 routine has for l_size and already stored in answer. > We were wondering what the: > org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:192) > - locked org.postgresql.core.v3.QueryExecutorImpl@18493c9 > > was refering to. > This is just indicating that this routine is synchronized. This is done to prevent multiple threads from trying to communicate with the server simultaneously and does not indicate a problem. Kris Jurka
- >org.postgresql.jdbc2.AbstractJdbc2Statement.executeQuery(AbstractJdbc2Statement.java:235) - - The stacktrace indicates that Statement.executeQuery was called and the - JDBC driver started receiving the results from the server. The driver is - in the middle of reading a particular row and is waiting for more data - from the server. From what you're saying the server thinks it is done, - but the driver does not, indicating loss of protocol synchronization. - That's tough to debug after the fact, without a reproducible case or - network traffic dump. If you still have this connection stuck in this - state you could try poking around with a debugger to say what values the - ReceiveTupleV3 routine has for l_size and already stored in answer. Thanks, that's helpful! we don't have a debugger configured for that JVM so we might have to try to catch it next time around. Getting protocols out of sync sort of sounds to me like a network glitch of some sort to me. does that sound possible? - >We were wondering what the: - >org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:192) - > - locked org.postgresql.core.v3.QueryExecutorImpl@18493c9 - > - >was refering to. - > - - This is just indicating that this routine is synchronized. This is done - to prevent multiple threads from trying to communicate with the server - simultaneously and does not indicate a problem. Awesome, thanks. Dave
On Thu, 29 Jul 2010, David Kerr wrote: > Getting protocols out of sync sort of sounds to me like a network glitch > of some sort to me. does that sound possible? No. TCP has pretty robust mechanisms for preventing that sort of error. Matthew -- As you approach the airport, you see a sign saying "Beware - low flying airplanes". There's not a lot you can do about that. Take your hat off? -- Michael Flanders
On 7/30/2010 1:44 AM, Matthew Wakeling wrote: > On Thu, 29 Jul 2010, David Kerr wrote: >> Getting protocols out of sync sort of sounds to me like a network glitch >> of some sort to me. does that sound possible? > > No. TCP has pretty robust mechanisms for preventing that sort of error. > > Matthew > Ok, yes that's consistent with what we found yesterday where we were able to reproduce it on system that was being used by other people without problem. we've got a working theory that the program opens up a connection to DB1 and DB2. In DB1 we're running a very long query (15 minutes) and after some sort of timeout(or something) the program hangs leaving the idle in TX in DB2. Thanks Dave
David Kerr wrote: > we've got a working theory that the program opens up a connection to > DB1 and DB2. In DB1 we're running a very long query (15 minutes) and > after some sort of timeout(or something) the program hangs leaving the > idle in TX in DB2. You may want to look at any firewalls you have between the client and the DBs. Often, stateful firewalls will drop state for connections that are idle for extended periods. -O
On Sat, Jul 31, 2010 at 10:28:22AM +1200, Oliver Jowett wrote: - David Kerr wrote: - - > we've got a working theory that the program opens up a connection to - > DB1 and DB2. In DB1 we're running a very long query (15 minutes) and - > after some sort of timeout(or something) the program hangs leaving the - > idle in TX in DB2. - - You may want to look at any firewalls you have between the client and - the DBs. Often, stateful firewalls will drop state for connections that - are idle for extended periods. - - -O - Turns out, they were accessing a 2nd DB on a seprate server that (that the developers neglected to mention) and that process was spinning due to bad statistics on the tables. so this process was just sitting there legitimately waiting. Thanks for all the replies! Dave