Обсуждение: total number of concurrent connections
Greetings, I've got a postgresql-8.1.10 instance running on a Linux system. Over the weekend it suddenly started rejecting client connection attempts with a "too many connected" error. At the time, we were using the default max_connections of 100. I've doubled this for the time being until we can determine what/why its misbehaving. The real problem that I'm having is that I can't seem to find any obvious way to determine what the current number of connections are to the server. Googling a bit suggests that running "select count(*) from pg_stat_activity;" would provide the answer, however I'm very suspicious of the accuracy of the results. At this very minute, it claims that there are just 3 connections, however I'm fairly confident that there must be a lot more, as we've go about 200 client systems running an in hour app that are continuously hitting the database (every 30-60 seconds), plus a few web apps that are seeing some activity (even over the weekend). So just 3 seems wrong to me, especially when I run 'netstat -a | grep -c postgr' and I get 183 back (granted many are in a TIME_WAIT state, but shouldn't that still count as a valid connection to the database?). For example, here's the current output: select * from pg_stat_activity ; datid | datname | procpid | usesysid | usename | current_query | query_start | backend_start | client_addr | client_port -------+---------+---------+----------+-----------+------------------------------+-------------+-------------------------------+-------------+------------- 16636 | nightly | 29579 | 16386 | lfriedman | <command string not enabled> | | 2009-07-26 13:07:38.139417-07 | | -1 16636 | nightly | 22822 | 10 | postgres | <command string not enabled> | | 2009-07-26 13:00:02.723319-07 | | -1 65770 | minidvs | 22662 | 16386 | lfriedman | <command string not enabled> | | 2009-07-26 09:48:22.958256-07 | | -1 (3 rows) Another thing that makes no sense is why the client_addr is seemingly null (is there some postgresql.conf option that I should be setting or changing?). I'll be very happy to hear that I'm misunderstanding something. Thanks in advance for guidance. -- ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ L. Friedman netllama@gmail.com LlamaLand https://netllama.linux-sxs.org
Lonni J Friedman <netllama@gmail.com> writes: > I've got a postgresql-8.1.10 instance running on a Linux system. Over > the weekend it suddenly started rejecting client connection attempts > with a "too many connected" error. ... > The real problem that I'm having is that I can't seem to find any > obvious way to determine what the current number of connections are to > the server. Googling a bit suggests that running "select count(*) > from pg_stat_activity;" would provide the answer, however I'm very > suspicious of the accuracy of the results. pg_stat_activity should be reasonably trustworthy, modulo the fact that the display might be a fraction of a second out-of-date. > especially when I run 'netstat -a | grep -c postgr' and I get 183 back > (granted many are in a TIME_WAIT state, but shouldn't that still count > as a valid connection to the database?). No, it doesn't. That's a recently-dead connection. It may still be of interest to the TCP stack, but Postgres has forgotten about it. > Another thing that makes no sense is why the client_addr is seemingly > null Expected for a Unix-socket connection. I think you've got a lot of extremely transient connections and you just happened to have a burst of them at the same instant. Aside from the question of not being sure where to set max_connections, this is a pretty bad idea from a performance standpoint. PG backends aren't all that lightweight, so it's not good to fire one up for just a single query, which is what it sounds like your app is doing. Consider using a connection pooler. regards, tom lane
First, thanks for replying! Comments/questions below. On Sun, Jul 26, 2009 at 1:23 PM, Tom Lane<tgl@sss.pgh.pa.us> wrote: > Lonni J Friedman <netllama@gmail.com> writes: >> I've got a postgresql-8.1.10 instance running on a Linux system. Over >> the weekend it suddenly started rejecting client connection attempts >> with a "too many connected" error. ... >> The real problem that I'm having is that I can't seem to find any >> obvious way to determine what the current number of connections are to >> the server. Googling a bit suggests that running "select count(*) >> from pg_stat_activity;" would provide the answer, however I'm very >> suspicious of the accuracy of the results. > > pg_stat_activity should be reasonably trustworthy, modulo the fact that > the display might be a fraction of a second out-of-date. Hrmm, that's not what I'm seeing. I'm finding that connections continue to appear in the table long after I've terminated a remote pqsl connection. I'm talking minutes or even hours. In one case, I rebooted a system that was supposedly connected, and it continued to appear in the table (or rather the IP associated with the system). > >> especially when I run 'netstat -a | grep -c postgr' and I get 183 back >> (granted many are in a TIME_WAIT state, but shouldn't that still count >> as a valid connection to the database?). > > No, it doesn't. That's a recently-dead connection. It may still be of > interest to the TCP stack, but Postgres has forgotten about it. > >> Another thing that makes no sense is why the client_addr is seemingly >> null > > Expected for a Unix-socket connection. > > I think you've got a lot of extremely transient connections and you just > happened to have a burst of them at the same instant. Aside from the > question of not being sure where to set max_connections, this is a > pretty bad idea from a performance standpoint. PG backends aren't all > that lightweight, so it's not good to fire one up for just a single > query, which is what it sounds like your app is doing. Consider using a > connection pooler. Thanks, I'll definitely look into that as time allows.
Lonni J Friedman <netllama@gmail.com> writes: > On Sun, Jul 26, 2009 at 1:23 PM, Tom Lane<tgl@sss.pgh.pa.us> wrote: >> pg_stat_activity should be reasonably trustworthy, modulo the fact that >> the display might be a fraction of a second out-of-date. > Hrmm, that's not what I'm seeing. I'm finding that connections > continue to appear in the table long after I've terminated a remote > pqsl connection. I'm talking minutes or even hours. What are you doing to "terminate" these remote connections? What it sounds like is the connected server process isn't being told about the termination, and so it sits there waiting for input that will never come. We do enable TCP keepalive if available, so unless your server is running a seriously obsolete OS, it will eventually figure out the client is gone --- but that takes order-of-hours with the standard TCP timeout settings. Between that and your unreasonably large number of TIME_WAIT connections, it definitely seems like you've got TCP-level connection reliability problems. TIME_WAIT state should go away pretty fast too if things are working properly at the network level. I wonder whether you have a router that is dropping connections it thinks are idle. Beyond that my TCP expertise does not extend. regards, tom lane
On Mon, Jul 27, 2009 at 4:31 PM, Tom Lane<tgl@sss.pgh.pa.us> wrote: > Lonni J Friedman <netllama@gmail.com> writes: >> On Sun, Jul 26, 2009 at 1:23 PM, Tom Lane<tgl@sss.pgh.pa.us> wrote: >>> pg_stat_activity should be reasonably trustworthy, modulo the fact that >>> the display might be a fraction of a second out-of-date. > >> Hrmm, that's not what I'm seeing. I'm finding that connections >> continue to appear in the table long after I've terminated a remote >> pqsl connection. I'm talking minutes or even hours. > > What are you doing to "terminate" these remote connections? What it > sounds like is the connected server process isn't being told about the > termination, and so it sits there waiting for input that will never > come. We do enable TCP keepalive if available, so unless your server > is running a seriously obsolete OS, it will eventually figure out the > client is gone --- but that takes order-of-hours with the standard TCP > timeout settings. Normally, just quitting from psql, but as part of today's experiment I rebooted the system that the table claimed was still connected. The server is running Linux with a reasonably recent 2.6.x kernel. > > Between that and your unreasonably large number of TIME_WAIT > connections, it definitely seems like you've got TCP-level connection > reliability problems. TIME_WAIT state should go away pretty fast too > if things are working properly at the network level. I wonder whether > you have a router that is dropping connections it thinks are idle. > Beyond that my TCP expertise does not extend. The TIME_WAIT entries do go away fairly quickly, but that's not what I'm looking at now. I'm talking about the content of the pg_stat_activity table.
Lonni J Friedman <netllama@gmail.com> writes: > On Mon, Jul 27, 2009 at 4:31 PM, Tom Lane<tgl@sss.pgh.pa.us> wrote: >> What are you doing to "terminate" these remote connections? What it >> sounds like is the connected server process isn't being told about the >> termination, and so it sits there waiting for input that will never >> come. > Normally, just quitting from psql, but as part of today's experiment I > rebooted the system that the table claimed was still connected. The > server is running Linux with a reasonably recent 2.6.x kernel. Hm, but what's the client-side OS? A reasonable OS should send a connection close notification (TCP RST) when the psql process dies, even if you managed to kill it in a way that prevented psql from closing the connection for itself. However, if that didn't happen for some reason, reboot would not make things better. It would just guarantee that the OS no longer had any memory of the connection either. It still sounds like your problems are fundamentally network-level problems and not Postgres problems... but it's hard to tell from here whether it's client-side software or network infrastructure doing it to you. regards, tom lane
On Mon, Jul 27, 2009 at 7:02 PM, Tom Lane<tgl@sss.pgh.pa.us> wrote: > Lonni J Friedman <netllama@gmail.com> writes: >> On Mon, Jul 27, 2009 at 4:31 PM, Tom Lane<tgl@sss.pgh.pa.us> wrote: >>> What are you doing to "terminate" these remote connections? What it >>> sounds like is the connected server process isn't being told about the >>> termination, and so it sits there waiting for input that will never >>> come. > >> Normally, just quitting from psql, but as part of today's experiment I >> rebooted the system that the table claimed was still connected. The >> server is running Linux with a reasonably recent 2.6.x kernel. > > Hm, but what's the client-side OS? A reasonable OS should send a > connection close notification (TCP RST) when the psql process dies, > even if you managed to kill it in a way that prevented psql from > closing the connection for itself. However, if that didn't happen > for some reason, reboot would not make things better. It would just > guarantee that the OS no longer had any memory of the connection either. The client side is Linux too. > > It still sounds like your problems are fundamentally network-level > problems and not Postgres problems... but it's hard to tell from > here whether it's client-side software or network infrastructure > doing it to you. ok, thanks -- ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ L. Friedman netllama@gmail.com LlamaLand https://netllama.linux-sxs.org