Thread: Connection hanging on INSERT apparently due to large batch size and 4 CPU cores
From: jgassner@gmail.com
Date:
I am experiencing a problem with the JDBC driver where one or more connections hang soon after application startup. The connections don't hang immediately, but only after performing a few operations (such as SELECTs); we are using a connection pool. The number of operations it takes for a connection to hang varies, but we can typically reproduce the issue within an hour or two. In every case, though, the hang occurs while performing an INSERT into one particular table of ours. I don't know whether it is always the first INSERT on a given connection that hangs, but I have seen that to be the case in many db log traces. The table where the hanging INSERTs occur is the one that receives, by far, the largest number of INSERTs.

We are using a batched INSERT for this table. When we attempt to insert batches of 600 rows, we observe the hang. However, we eventually found that lowering the batch size to 100 avoids it. While it is nice to apparently get around the issue using a lower batch size, we would like to understand the actual problem so we can be sure it won't pop up again. So this is my question: does anyone know why lowering the batch size fixed the problem, and why the problem occurred in the first place? Is there a maximum amount of data that can be inserted in one batch?
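[Editor's note: the workaround described above, flushing in fixed-size sub-batches, can be sketched as follows. The `BatchChunker` helper and the chunk size of 100 are illustrative, not taken from the poster's actual application; in real code each sub-batch would be sent via `PreparedStatement.addBatch()` / `executeBatch()`.]

```java
import java.util.ArrayList;
import java.util.List;

public class BatchChunker {
    // Splits a list of rows into sub-batches of at most maxBatch rows,
    // so that each executeBatch() round trip stays small.
    // Hypothetical helper for illustration.
    static <T> List<List<T>> chunk(List<T> rows, int maxBatch) {
        List<List<T>> out = new ArrayList<>();
        for (int i = 0; i < rows.size(); i += maxBatch) {
            out.add(rows.subList(i, Math.min(i + maxBatch, rows.size())));
        }
        return out;
    }

    public static void main(String[] args) {
        List<Integer> rows = new ArrayList<>();
        for (int i = 0; i < 600; i++) rows.add(i);
        // 600 rows split into 6 sub-batches of 100 each.
        List<List<Integer>> batches = chunk(rows, 100);
        System.out.println(batches.size() + " sub-batches");
        // For each sub-batch: addBatch() per row, then one executeBatch().
    }
}
```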
Here are some observations on the problem. A stack trace of the Java side of the connection shows that it consistently hangs in a socketWrite while binding the values:

Thread-94 [RUNNABLE]
  java.net.SocketOutputStream.socketWrite0(native method)
  java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:92)
  java.net.SocketOutputStream.write(SocketOutputStream.java:136)
  java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
  java.io.BufferedOutputStream.write(BufferedOutputStream.java:109)
  java.io.FilterOutputStream.write(FilterOutputStream.java:80)
  org.postgresql.core.PGStream.SendInteger2(PGStream.java:205)
  org.postgresql.core.v3.QueryExecutorImpl.sendBind(QueryExecutorImpl.java:840)
  org.postgresql.core.v3.QueryExecutorImpl.sendOneQuery(QueryExecutorImpl.java:1053)
  org.postgresql.core.v3.QueryExecutorImpl.sendQuery(QueryExecutorImpl.java:644)
  org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:345)
  org.postgresql.jdbc2.AbstractJdbc2Statement.executeBatch(AbstractJdbc2Statement.java:2674)

Performing an strace on the PostgreSQL server process corresponding to the hung connection shows that it is stuck inside a call to "sendto". Using lsof I can see that the TCP connection is still in the ESTABLISHED state. So it appears that both sides are trying to write to each other at the same time, and the underlying TCP connection is good.

Threading: this is a multi-threaded Java server application. All threads take connections from the same connection pool. Most threads aren't very busy, other than the threads inserting into the particular table where the INSERTs are hanging.
Info on my setup:

JDBC driver: postgresql-8.3-603.jdbc3.jar
PostgreSQL server: postgresql-8.3.3.tar.bz2 (built with --without-readline --disable-rpath --with-openssl on x86-64). Although it is built with SSL, our tests are not using SSL.
JVM: Java(TM) 2 Runtime Environment, Standard Edition (build 1.5.0_15-b04)
Machine: SLES 10 SP2 (x86_64); 4 CPU cores. The problem doesn't seem to happen on machines with only 2 CPU cores.

All ideas are appreciated. Thanks, John
On Mon, 20 Oct 2008, jgassner@gmail.com wrote:

> We are using a batched INSERT to perform the insert operation into
> this table. When we attempt to insert batches of size 600 rows, we
> observe this hang issue. However, we eventually found that lowering
> the batch size to 100 avoided the hang issue. While it is nice to
> apparently get around the issue using a lower batch size, it would be
> nice to understand what the actual problem is so we can be sure it
> won't pop up again. This is my question... Anyone know why lowering
> the batch size fixed the problem and why the problem occurred in the
> first place? Is there a maximum amount of data that can be inserted
> in one batch?

See the comments here starting at line #226:

http://cvs.pgfoundry.org/cgi-bin/cvsweb.cgi/jdbc/pgjdbc/org/postgresql/core/v3/QueryExecutorImpl.java?annotate=1.42

The driver is already breaking your batch into sub-batches of 256 queries. We don't expect to get a large response from an insert statement, but that's apparently not true in your case. Perhaps there is something unusual about your inserts? Do they fail, resulting in large error messages? Do they produce warnings or notices for things like escape_string_warning? If you can reproduce this failure, then turning on additional logging via the loglevel=2 URL parameter will show the exact protocol-level responses coming from the server, and then we should be able to see what's really happening. (You'd want to look at the protocol-level logging for both the success and failure cases, because on the failure side we never get around to processing the results.)

Kris Jurka
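[Editor's note: in the 8.3-era pgjdbc driver, the loglevel=2 setting Kris mentions is an ordinary connection-URL (or Properties) parameter. A minimal sketch of adding it to a URL; the host, database name, and `withLogLevel` helper are placeholders for illustration:]

```java
import java.util.Properties;

public class LogLevelExample {
    // Hypothetical helper: appends loglevel=N to a JDBC URL,
    // using '?' or '&' depending on whether parameters already exist.
    static String withLogLevel(String baseUrl, int level) {
        String sep = baseUrl.contains("?") ? "&" : "?";
        return baseUrl + sep + "loglevel=" + level;
    }

    public static void main(String[] args) {
        // loglevel=2 (DEBUG) makes the old driver print every FE=>/<=BE
        // protocol message, as seen in the logs later in this thread.
        String url = withLogLevel("jdbc:postgresql://dbhost/mydb", 2);
        System.out.println(url);

        // The same setting via Properties (placeholder credentials);
        // actually connecting requires a live server, so it is commented out:
        Properties props = new Properties();
        props.setProperty("user", "myuser");
        props.setProperty("loglevel", "2");
        // Connection conn = DriverManager.getConnection(url, props);
    }
}
```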
On Oct 22, 1:48 pm, bo...@ejurka.com (Kris Jurka) wrote:

> If you can reproduce this failure, then turning on
> additional logging via the loglevel=2 URL parameter will show the exact
> protocol level responses coming from the server and then we should be able
> to see what's really happening.

I was able to reproduce the issue with loglevel=2. I believe I located the connection in the log file that corresponds to the hung connection.
Reviewing the log activity of this connection didn't show any errors that were plain for me to see. I searched for "warning" and didn't get any hits either. I do see that this connection (3) simply stops logging at the message pasted below, which concurs with my finding that it is hung within a bind. All the other connections continue logging, so it was fairly obvious that this was the hung one. Could it be related to the INSERT being done on a table that has 330 columns?

Thanks, John

2008/10/25 15:16:30 | INFO | jvm 1 | 15:16:30.415 (3) FE=> Execute(portal=null,limit=1)
2008/10/25 15:16:30 | INFO | jvm 1 | 15:16:30.415 (3) FE=> Describe(statement=S_489)
2008/10/25 15:16:30 | INFO | jvm 1 | 15:16:30.415 (3) FE=> Bind(stmt=S_489,portal=null,
  $1=<42BE2611-84F5-102B-A965-0022190B8F18>,$2=<2008-10-25 19:03:55.000000 +00:00:00>,
  $3=<101000>,$4=<0>,$5=<0>,$6=<0>,$7=<101000>,$8=<7655123>,$9=<109100>,
  $10-$14=<NULL>,$15=<443>,$16=<16001>,$17-$18=<NULL>,$19=<3>,
  $20=<दकEventGeneratorदक>,$21=<NULL>,$22=<-1062731758>,$23-$24=<NULL>,
  $25=<-1062731759>,$26=<NULL>,$27=<https>,$28=<MyInitUser>,$29=<MyTargetUser>,
  $30=<दकqffदक fik fik >,$31-$36=<NULL>,
  $37=<2008-10-25 19:16:25.650000 +00:00:00>,$38=<2008-10-25 19:16:25.650000 +00:00:00>,
  $39-$40=<NULL>,$41=<0>,$42-$73=<NULL>,$74=<MyInitUserDomain>,$75-$82=<NULL>,
  $83=<MyTargetUserDomain>,$84-$87=<NULL>,$88=<0>,$89=<101000>,$90-$207=<NULL>,
  $208=<0>,$209=<NULL>,$210=<fik >,$211=<NULL>,$212=<InitUserId1>,$213=<NULL>,
  $214=<TargetUserId1>,$215-$221=<NULL>,$222=<-1062731760>,$223=<-1062731761>,
  $224-$225=<NULL>,$226=<0>,$227=<0>,$228-$330=<NULL>)
On Sat, 25 Oct 2008, John wrote:

> Could it be related to the INSERT being done on a table that has 330
> columns?

Yes, that's it (or at least part of it). In the simple case of using the same, known parameter types for each batch entry, the data returned from the server has no dependence on the parameter count. The issue arises when binding a parameter with an unspecified type. In the attached SimpleDeadLockDemo, you can see the driver issuing a repeated Describe(statement=S_1), which results in a ParameterDescription message whose size depends on the number of parameters. With a large parameter count, the backend is going to fill its network buffer with this data and we'll get the deadlock you're seeing.

In this case we've supplied an explicit parameter type at parse time, so describing it isn't going to tell us anything new. Even if it were going to provide us information, we only need to issue the Describe once, not for every batch entry. I'm not sure how complicated a solution might be, but at least I understand what's going on now.

Also attached (BatchDeadLock) is a test case that reliably locks up as you've described.

Kris Jurka
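[Editor's note: the write-write deadlock Kris describes can be illustrated without PostgreSQL at all. If two connected peers both keep writing and neither reads, both OS socket buffers fill and each side's write can no longer make progress, which is exactly the symptom in the stack trace (socketWrite) and strace (sendto) above. A rough sketch using non-blocking channels, where a write returning 0 stands in for the blocking write that would hang; the class name and buffer size are illustrative, and the original SimpleDeadLockDemo/BatchDeadLock attachments are not reproduced here:]

```java
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;

public class DeadlockSketch {
    // Writes until the socket's send path is full. In non-blocking mode
    // write() returns 0 at the point where a blocking write would hang.
    static long writeUntilFull(SocketChannel ch) throws Exception {
        ByteBuffer buf = ByteBuffer.allocate(8192);
        long total = 0;
        while (true) {
            buf.clear();
            int n = ch.write(buf);
            if (n == 0) return total; // buffers full: a blocking write would now stall
            total += n;
        }
    }

    public static void main(String[] args) throws Exception {
        // Loopback socket pair standing in for driver <-> backend.
        ServerSocketChannel server = ServerSocketChannel.open();
        server.bind(new InetSocketAddress("127.0.0.1", 0));
        SocketChannel client = SocketChannel.open(server.getLocalAddress());
        SocketChannel peer = server.accept();
        client.configureBlocking(false);
        peer.configureBlocking(false);

        // Both sides write, neither reads: each stalls with unread data
        // sitting in the other's receive buffer -- the observed deadlock.
        System.out.println("client stalled after " + writeUntilFull(client) + " bytes");
        System.out.println("server stalled after " + writeUntilFull(peer) + " bytes");

        client.close();
        peer.close();
        server.close();
    }
}
```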