Discussion: BUG #3995: pqSocketCheck doesn't return
The following bug has been logged online:

Bug reference:      3995
Logged by:          Kyoko Noro
Email address:      kyouko.noro@hp.com
PostgreSQL version: 8.2.3
Operating system:   Red Hat Enterprise Linux AS release 4
Description:        pqSocketCheck doesn't return
Details:

Hello Support team,

I have a problem; may I ask your advice? Sometimes pqSocketCheck doesn't return and our application hangs. Do you have any bug reports about a problem like this?
"Kyoko Noro" <kyouko.noro@hp.com> writes:
> Sometimes pqSocketCheck doesn't return and our application hangs.

pqSocketCheck isn't accessible from outside libpq, and furthermore it's often called with the *intention* of waiting for something to happen. You need to provide more details of what you are doing and what is happening (vs. what you expected to happen).

regards, tom lane
Hello Tom,

Thank you for your reply.

> pqSocketCheck isn't accessible from outside libpq

Yes, I see. SQL statements in our application sometimes hang. The gdb backtrace looks like this:

    (gdb) bt
    #0  0x900bc8bc in poll ()
    #2  0x0149abd2 in pqSocketCheck ()
    ...

I've been searching for similar cases but haven't found any yet. If you know of any related bug reports, could you let me know?

Best regards,
/kyoko noro

-----Original Message-----
From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
Sent: Friday, February 29, 2008 12:37 PM
To: Noro, Kyouko
Cc: pgsql-bugs@postgresql.org
Subject: Re: [BUGS] BUG #3995: pqSocketCheck doesn't return
Hi,

Having spent some time analyzing the root cause, I believe the problem is that the poll() call is made without a timeout. Suppose connection pooling is enabled, so that the driver manager tries to reuse an existing connection after checking its state with a probe query. After the query has been sent over the DB connection (which is actually a TCP connection), libpq calls poll() on the associated fd for POLLIN and POLLERR events and waits for the query result with no timeout. In addition, TCP keepalive is not enabled on the underlying connection.

Given this data flow, two scenarios are possible:

1. While the query data is being sent over the DB connection (i.e. the underlying TCP connection), the DB goes down and becomes unreachable, so the TCP segment is never acknowledged. In this case the TCP stack retransmits, and eventually the poll() call returns with an error. However, it takes approximately 15 minutes for the TCP stack to report the error to the application and for poll() to return.

2. The DB goes down after the query data has been acknowledged at the TCP level but before the query result has been sent. In this case the local TCP stack reports no error, because the TCP segment has already been acknowledged, and the poll() call can block forever waiting for the query response. In this scenario, an application thread could hang indefinitely.

With regards,
Vivek Gupta
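Both scenarios come down to an unbounded wait on the socket. As a minimal sketch of the two gaps identified above (no timeout on poll(), no TCP keepalive), the C code below uses only POSIX calls; wait_readable_with_timeout() and enable_keepalive() are hypothetical helper names, not part of libpq. An application could, for example, apply the same kind of bounded wait to the descriptor returned by PQsocket() when it issues queries through libpq's asynchronous functions (PQsendQuery(), PQconsumeInput(), PQisBusy()) instead of the blocking PQexec(). The timeout value, and what to do when it expires (e.g. discarding the pooled connection), are assumptions left to the caller.

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical helper: wait until fd is readable, but give up after
 * timeout_ms milliseconds instead of blocking indefinitely.
 * Returns 1 if data, EOF, or an error condition is pending,
 *         0 on timeout,
 *        -1 if poll() itself failed.
 */
int wait_readable_with_timeout(int fd, int timeout_ms)
{
    struct pollfd pfd;
    int rc;

    /* A negative timeout means "wait forever", which is what we want to avoid. */
    assert(timeout_ms >= 0);

    pfd.fd = fd;
    pfd.events = POLLIN;   /* POLLERR and POLLHUP are always reported in revents */
    pfd.revents = 0;

    /* Retry if interrupted by a signal (simplification: the full timeout restarts). */
    do {
        rc = poll(&pfd, 1, timeout_ms);
    } while (rc < 0 && errno == EINTR);

    if (rc < 0)
        return -1;   /* poll() failed */
    if (rc == 0)
        return 0;    /* timed out: the caller decides whether the connection is dead */
    return 1;        /* data, EOF, or an error condition is pending */
}

/* Hypothetical helper: ask the kernel to send TCP keepalive probes on an idle connection. */
int enable_keepalive(int fd)
{
    int on = 1;
    return setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof on);
}
```

Keepalive mainly helps in scenario 2: probes are sent only while the connection is idle, which is the situation once the request has been acknowledged and the client is simply waiting for the reply. In scenario 1 unacknowledged data is outstanding, so the retransmission timeout still governs. With default Linux settings the first probe is sent only after 7200 seconds (2 hours) of idleness; Linux also offers per-socket TCP_KEEPIDLE, TCP_KEEPINTVL and TCP_KEEPCNT options, which this sketch does not set.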