Discussion: BUG #5465: dblink TCP connection hangs blocking translation from being terminated
BUG #5465: dblink TCP connection hangs blocking translation from being terminated
From
"Valentine Gogichashvili"
Date:
The following bug has been logged online:

Bug reference:      5465
Logged by:          Valentine Gogichashvili
Email address:      valgog@gmail.com
PostgreSQL version: 8.2.1
Operating system:   Red Hat 3.4.6-3 (kernel 2.6.9-42.0.3.ELsmp)
Description:        dblink TCP connection hangs blocking translation from being terminated
Details:

Hi all,

we have an issue on our production server. A stored procedure that uses dblink to get some data from a remote database hangs, does not respond to the kill signal, and holds several locks on some tables as well as an advisory lock. So I need this transaction to complete in order to be able to operate the database normally.

At exactly the time the procedure was accessing the remote database, the machine hosting that remote database had a panic attack and rebooted. But the ESTABLISHED connection is still hanging on the production database machine:

$ netstat | grep remote_db_host
tcp 0 0 production_db_host:60248 remote_db_host:postgres ESTABLISHED

$ lsof | grep remote_db_host
postgres 1365 postgres 199u IPv4 23003779784 TCP production_db_host:60248->remote_db_host:postgres (ESTABLISHED)

In the database session list one can see the hanging transaction:

production_db=# select procpid, now() - query_start as running, waiting, substr(current_query,1,120) as current_query from pg_stat_activity where current_query not like '%----STATQ-----%' and current_query != '<IDLE>' order by query_start desc;
 procpid |        running         | waiting |          current_query
---------+------------------------+---------+---------------------------------
    1365 | 2 days 00:17:57.992004 | f       | SELECT * FROM get_remote_data()

It seems that dblink is waiting for the connection to be closed or reset, and this also makes the whole transaction hang without processing kill signals.

Does the dblink TCP connection have any timeout?

How would it be possible to shut down the DB if this session process is not responding to normal kill signals? Will it prevent the database from shutting down normally? My previous experience with issuing immediate stops or killing with -9 was quite catastrophic, and I could not start the DB afterwards. What would you suggest in this case?

With best regards,

-- Valentine Gogichashvili
Re: BUG #5465: dblink TCP connection hangs blocking translation from being terminated
From
Magnus Hagander
Date:
On Wed, May 19, 2010 at 5:10 AM, Valentine Gogichashvili <valgog@gmail.com> wrote:
>
> The following bug has been logged online:
>
> Bug reference:      5465
> Logged by:          Valentine Gogichashvili
> Email address:      valgog@gmail.com
> PostgreSQL version: 8.2.1
> Operating system:   Red Hat 3.4.6-3 (kernel 2.6.9-42.0.3.ELsmp)
> Description:        dblink TCP connection hangs blocking translation from
> being terminated
> Details:
>
> Hi all,
>
> we have an issue on our productive server. A stored procedure, that uses
> dblink to get some data from the remote database hangs not responding to
> kill signal and holds several locks on some tables as well as an advisory
> lock. So I have this transaction to be completed in order to have a
> possibility to operate the database normally.

I believe this is a known issue in dblink, where it's not possible to cancel it when it's waiting in the TCP layer in the kernel. Unfortunately, there is no fix ATM - there was some work towards it for 9.0 at one point, but I think this is actually the first real bug-report on the issue...

> It seems like the dblink is waiting for the connection to be closed or
> reseted and also makes the hole transaction hang not processing kill
> signals.
>
> Does the dblink TCP connection have any timeout?

It does not. But it would detect a connection that goes away, so TCP keepalives should be able to deal with this problem. Once the kernel notices the other end is gone, dblink should notice it and roll back.

> How would it be possible to shutdown the DB in case this session process is
> not responding to normal kill signals? Will it hinder the database from
> shutting down normally? My previous experience with issuing immediate stops
> or killing with -9 had been quite catastrophic and I could not start the DB
> afterwards. What would you suggest in this case?

kill -9 on a client will make the postmaster restart the whole process, so yes, it's a very heavy operation.

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/
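The keepalive suggestion in the reply above can be sketched in code. The following is a minimal illustration in Python, not something posted in the thread: it enables keepalive probes on a plain socket, estimates roughly how long the kernel would take to declare a silent peer dead, and builds a connection string carrying libpq's `keepalives*` parameters. Those libpq parameters exist in PostgreSQL 9.0 and later, so they would presumably not help the 8.2.1 installation in this report. The helper names, the timing values, and the `remote_db` database name are all assumptions chosen for illustration.

```python
import socket


def enable_keepalive(sock, idle=60, interval=10, count=5):
    """Turn on TCP keepalive probes for an already created socket."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # The per-socket tuning options are platform specific (present on Linux),
    # so each one is set only if this Python build exposes it.
    for name, value in (("TCP_KEEPIDLE", idle),
                        ("TCP_KEEPINTVL", interval),
                        ("TCP_KEEPCNT", count)):
        opt = getattr(socket, name, None)
        if opt is not None:
            sock.setsockopt(socket.IPPROTO_TCP, opt, value)


def worst_case_detection_seconds(idle, interval, count):
    """Rough time until the kernel gives up on a peer that went silent."""
    return idle + interval * count


def conninfo_with_keepalives(base, idle=60, interval=10, count=5):
    """Append libpq keepalive parameters (PostgreSQL 9.0+) to a conninfo string."""
    return (f"{base} keepalives=1 keepalives_idle={idle} "
            f"keepalives_interval={interval} keepalives_count={count}")


if __name__ == "__main__":
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    enable_keepalive(s)
    s.close()
    # Hypothetical connection string for a dblink call.
    print(conninfo_with_keepalives("host=remote_db_host dbname=remote_db"))
    print(worst_case_detection_seconds(60, 10, 5))
```

With the usual Linux kernel defaults (7200 s idle, 75 s between probes, 9 probes) the same estimate comes to 7875 seconds, a little over two hours, and it only applies when keepalive is actually enabled on the socket. That is why shorter per-connection values are attractive for links to machines that may vanish without closing their connections.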
Re: BUG #5465: dblink TCP connection hangs blocking translation from being terminated
From
Joseph Conway
Date:
Magnus Hagander wrote:
> On Wed, May 19, 2010 at 5:10 AM, Valentine Gogichashvili
> <valgog@gmail.com> wrote:
>> The following bug has been logged online:
>>
>> Bug reference:      5465
>> Logged by:          Valentine Gogichashvili
>> Email address:      valgog@gmail.com
>> PostgreSQL version: 8.2.1
>> Operating system:   Red Hat 3.4.6-3 (kernel 2.6.9-42.0.3.ELsmp)
>> Description:        dblink TCP connection hangs blocking translation from
>> being terminated
>> Details:
>>
>> Hi all,
>>
>> we have an issue on our productive server. A stored procedure, that uses
>> dblink to get some data from the remote database hangs not responding to
>> kill signal and holds several locks on some tables as well as an advisory
>> lock. So I have this transaction to be completed in order to have a
>> possibility to operate the database normally.
>
> I believe this is a known issue in dblink, where it's not possible to
> cancel it when it's waiting in the TCP layer in the kernel.
> Unfortunately, there is no fix ATM - there was some work towards it
> for 9.0 at one point, but I think this is actually the first real
> bug-report on the issue...

I thought the known issue was only on Windows though...

Note that this is not dblink specific but rather libpq.

>> How would it be possible to shutdown the DB in case this session process is
>> not responding to normal kill signals? Will it hinder the database from
>> shutting down normally? My previous experience with issuing immediate stops
>> or killing with -9 had been quite catastrophic and I could not start the DB
>> afterwards. What would you suggest in this case?
>
> kill -9 on a client will make the postmaster restart the whole
> process, so yes, it's a very heavy operation.

Can you grab the process with gdb and call elog() manually?

Joe
Oh, found a typo in the subject: Transaction, not Translation.
On May 19, 8:41 pm, m...@joeconway.com (Joseph Conway) wrote:
> Can you grab the process with gdb and call elog() manually?
>
> Joe

Unfortunately I could not install gdb on that machine :-( Some dependencies are not installable, and I cannot upgrade that production machine...

-- Valentine