Discussion: Thread hangs in VisibleBufferedInputStream.readMore
Hi all,

I have recently upgraded from postgresql 7.4.7 to 8.3.7. Since then, I have experienced in our application a thread that hangs forever. The problem is quite difficult to trace, since the program hangs very irregularly. Sometimes it hangs after 10 minutes already, sometimes only after two days.

After searching the archives, I thought this was maybe connected to batch inserts and batch updates. Since the thread in question is doing a lot of batch updates and inserts, I changed it to do normal updates/inserts. But without success. I have also upgraded to the latest version of the JDBC driver, compiled from CVS, because I saw there was some improvement regarding DescribeStatement messages. No success, either.

The call stack of the hanging thread is as follows:

"MonitoringThread" prio=1 tid=0x00007f40187ac350 nid=0x41e7 runnable [0x0000000045157000..0x0000000045157e20]
        at java.net.SocketInputStream.socketRead0(Native Method)
        at java.net.SocketInputStream.read(SocketInputStream.java:129)
        at org.postgresql.core.VisibleBufferedInputStream.readMore(VisibleBufferedInputStream.java:135)
        at org.postgresql.core.VisibleBufferedInputStream.ensureBytes(VisibleBufferedInputStream.java:104)
        at org.postgresql.core.VisibleBufferedInputStream.read(VisibleBufferedInputStream.java:73)
        at org.postgresql.core.PGStream.ReceiveChar(PGStream.java:259)
        at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:1205)
        at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:194)
        - locked <0x00007f4047a7ab08> (a org.postgresql.core.v3.QueryExecutorImpl)
        at org.postgresql.jdbc2.AbstractJdbc2Statement.execute(AbstractJdbc2Statement.java:479)
        at org.postgresql.jdbc2.AbstractJdbc2Statement.executeWithFlags(AbstractJdbc2Statement.java:367)
        at org.postgresql.jdbc2.AbstractJdbc2Statement.executeQuery(AbstractJdbc2Statement.java:271)

Any hints on how I could go about debugging this issue?

BTW, I am using Java 1.5.0_18-b02 from Sun on Debian Lenny on an AMD64 platform.

Thanks in advance!

Oliver
Oliver Hitz wrote:
> I have recently upgraded from postgresql 7.4.7 to 8.3.7. Since then, I
> have experienced in our application a thread that hangs forever. The
> problem is quite difficult to trace, since the program hangs very
> irregularly. Sometimes it hangs after 10 minutes already, sometimes only
> after two days.
>
> After searching the archives, I thought this was maybe connected to batch
> inserts and batch updates. Since the thread in question is doing a lot of
> batch updates and inserts, I changed it to do normal updates/inserts. But
> without success.

It won't be the batch insert deadlock; that manifests as blocking on write, not on read.

> The call stack of the hanging thread is as follows:
>
> "MonitoringThread" prio=1 tid=0x00007f40187ac350 nid=0x41e7 runnable [0x0000000045157000..0x0000000045157e20]
>         at java.net.SocketInputStream.socketRead0(Native Method)
>         at java.net.SocketInputStream.read(SocketInputStream.java:129)
>         at org.postgresql.core.VisibleBufferedInputStream.readMore(VisibleBufferedInputStream.java:135)

This is "normal" in that the driver is just waiting for more data from the server.

> Any hints on how I could go about debugging this issue?

I'd take a look at the server backend processes to see what they're doing when your application hangs.

-O
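As an illustration of the point above, that a thread sitting in a socket read is simply waiting for bytes that have not arrived, here is a minimal sketch using plain java.net sockets rather than the JDBC driver. The class and method names (SilentPeerDemo, readTimesOut) are made up for this example. It connects to a local peer that accepts the connection but never writes anything, and uses Socket.setSoTimeout so the otherwise indefinite read ends with a SocketTimeoutException. Whether and how the JDBC driver exposes such a timeout is not something this sketch assumes.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

// Hypothetical demo class: a client reads from a peer that never sends data.
final class SilentPeerDemo {

    /**
     * Returns true if the read gave up after timeoutMillis because the peer
     * sent nothing. Without setSoTimeout the read would block indefinitely.
     */
    static boolean readTimesOut(int timeoutMillis) {
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress());
             Socket client = new Socket(server.getInetAddress(), server.getLocalPort());
             Socket silentPeer = server.accept()) {
            // silentPeer is kept open but deliberately never written to.
            client.setSoTimeout(timeoutMillis);
            InputStream in = client.getInputStream();
            try {
                in.read();      // blocks: no data is ever sent
                return false;   // only reached if a byte or EOF arrived
            } catch (SocketTimeoutException expected) {
                return true;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("read timed out: " + readTimesOut(200));
    }
}
```

A thread dump taken while such a read is blocked shows the thread inside the socket read call, which by itself says nothing about why the peer is silent. That is why the next step suggested here is to look at the server side.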
Wow that was a fast reply! Thanks!

On 28 Apr 2009, Oliver Jowett wrote:
> > Any hints on how I could go about debugging this issue?
> I'd take a look at the server backend processes to see what they're
> doing when your application hangs.

Ok, I'm now waiting for the next hang. The application runs for about 8 hours now...

Oliver
On 28 Apr 2009, Oliver Jowett wrote:
> > Any hints on how I could go about debugging this issue?
> I'd take a look at the server backend processes to see what they're
> doing when your application hangs.

The application hung again at the same place. All backend processes were idle and pg_locks contained nothing which could point into the direction of a deadlock.

Anything else that I could try? I will try if loglevel=2 results in some useful output now.

Oliver
On 28 Apr 2009, Oliver Hitz wrote:
> Anything else that I could try? I will try if loglevel=2 results in some
> useful output now.

Ok now here's a query that hangs after it has been sent. Right after "Sync" the program hangs:

19:24:45.594 (172) simple execute, handler=org.postgresql.jdbc2.AbstractJdbc2Statement$StatementResultHandler@d7e2b57, maxRows=0,fetchSize=0, flags=17
19:24:45.594 (172) FE=> Parse(stmt=null,query="SELECT changes.what,changes.whenchanged FROM changes WHERE what=$1 AND whenchanged>$2",oids={1043,0})
19:24:45.594 (172) FE=> Bind(stmt=null,portal=null,$1=<'docsiscmts'>,$2=<'2009-04-28 13:41:32.425000 +02:00:00'>)
19:24:45.594 (172) FE=> Describe(portal=null)
19:24:45.594 (172) FE=> Execute(portal=null,limit=0)
19:24:45.594 (172) FE=> Sync

Nothing special on the backend. From what I can see, all processes are in idle state.

Any idea what could be wrong here?

Thanks in advance,

Oliver
Hello Oliver,

This is a problem with a full receive buffer on the client side blocking the server from writing more results while the client tries to send more commands to the server. Either split your server input into several single statements or increase the receive buffer size of your client.

Daniel Migowski

PS: I'd like this problem to be solved, too, but this would need either a multithreaded driver, or some timeout handling within the driver, both coming with some small performance losses. Maybe an option should be added to the driver here...

Oliver Hitz wrote:
> On 28 Apr 2009, Oliver Hitz wrote:
>
>> Anything else that I could try? I will try if loglevel=2 results in some
>> useful output now.
>
> Ok now here's a query that hangs after it has been sent. Right after
> "Sync" the program hangs:
>
> 19:24:45.594 (172) simple execute, handler=org.postgresql.jdbc2.AbstractJdbc2Statement$StatementResultHandler@d7e2b57, maxRows=0,fetchSize=0, flags=17
> 19:24:45.594 (172) FE=> Parse(stmt=null,query="SELECT changes.what,changes.whenchanged FROM changes WHERE what=$1 AND whenchanged>$2",oids={1043,0})
> 19:24:45.594 (172) FE=> Bind(stmt=null,portal=null,$1=<'docsiscmts'>,$2=<'2009-04-28 13:41:32.425000 +02:00:00'>)
> 19:24:45.594 (172) FE=> Describe(portal=null)
> 19:24:45.594 (172) FE=> Execute(portal=null,limit=0)
> 19:24:45.594 (172) FE=> Sync
>
> Nothing special on the backend. From what I can see, all processes are in
> idle state.
>
> Any idea what could be wrong here?
>
> Thanks in advance,
>
> Oliver
Daniel Migowski wrote:
> This is a problem with a full receive buffer on the client side blocking
> the server from writing more results while the client tries to send more
> commands to server. Either split your server input into several single
> statements or increase the receive buffer size of your client.

No it isn't. Go look at the thread dumps: his application is blocking on read, not on write.

-O
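The read-versus-write distinction can be checked mechanically from a thread dump by looking at the topmost frame of the hung thread. The following is a rough sketch of that check, with made-up names (SocketBlockClassifier, classify). It assumes the frame names used by Sun JVMs of this era: java.net.SocketInputStream.socketRead0, as seen in the posted trace, and java.net.SocketOutputStream.socketWrite0 for the write case. Other JVM versions may name these frames differently, so treat the strings as assumptions.

```java
import java.util.Map;

// Hypothetical helper: classifies a thread by the topmost frame of its stack.
final class SocketBlockClassifier {

    enum State { BLOCKED_ON_READ, BLOCKED_ON_WRITE, OTHER }

    /** frames[0] is the innermost call, as returned by Thread.getStackTrace(). */
    static State classify(StackTraceElement[] frames) {
        if (frames == null || frames.length == 0) {
            return State.OTHER;
        }
        String cls = frames[0].getClassName();
        String method = frames[0].getMethodName();
        // Assumed frame names; see the note above about JVM versions.
        if (cls.equals("java.net.SocketInputStream") && method.startsWith("socketRead")) {
            return State.BLOCKED_ON_READ;
        }
        if (cls.equals("java.net.SocketOutputStream") && method.startsWith("socketWrite")) {
            return State.BLOCKED_ON_WRITE;
        }
        return State.OTHER;
    }

    public static void main(String[] args) {
        // Print a classification for every live thread in this JVM.
        for (Map.Entry<Thread, StackTraceElement[]> e : Thread.getAllStackTraces().entrySet()) {
            System.out.println(e.getKey().getName() + ": " + classify(e.getValue()));
        }
    }
}
```

Applied to the "MonitoringThread" trace posted at the start of the thread, whose top frame is socketRead0, this check reports a read: the client has finished sending and is waiting for the server's response.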
On 07 May 2009, Oliver Jowett wrote:
> No it isn't, go look at the thread dumps, his application is blocking on
> read not on write.

The strange thing is that I didn't see anything at the backend. All connections were idle at that time. If the application was hanging because of a transaction lock, shouldn't I see this in pg_locks or pg_stat_activity?

Anyway, I have changed some transactions and re-arranged some of the code. The application is running for about four days now without a hang, but I'm not sure if the problem has really gone away.

Oliver
Oliver Hitz wrote:
> On 07 May 2009, Oliver Jowett wrote:
>> No it isn't, go look at the thread dumps, his application is blocking on
>> read not on write.
>
> The strange thing is that I didn't see anything at the backend. All
> connections were idle at that time. If the application was hanging
> because of a transaction lock, shouldn't I see this in pg_locks or
> pg_stat_activity?
>
> Anyway, I have changed some transactions and re-arranged some of the
> code. The application is running for about four days now without a hang,
> but I'm not sure if the problem has really gone away.

Yes, it seems quite strange. You would probably need to attach a debugger to the server process corresponding to the hung application thread and see what it was doing, but that might be tricky to arrange.

-O