Обсуждение: total number of concurrent connections

Поиск
Список
Период
Сортировка

total number of concurrent connections

От
Lonni J Friedman
Дата:
Greetings,
I've got a postgresql-8.1.10 instance running on a Linux system.  Over
the weekend it suddenly started rejecting client connection attempts
with a "too many connected" error.  At the time, we were using the
default max_connections of 100.  I've doubled this for the time being
until we can determine what/why its misbehaving.

The real problem that I'm having is that I can't seem to find any
obvious way to determine what the current number of connections are to
the server.  Googling a bit suggests that running "select count(*)
from pg_stat_activity;" would provide the answer, however I'm very
suspicious of the accuracy of the results.  At this very minute, it
claims that there are just 3 connections, however I'm fairly confident
that there must be a lot more, as we've go about 200 client systems
running an in hour app that are continuously hitting the database
(every 30-60 seconds), plus a few web apps that are seeing some
activity (even over the weekend).  So just 3 seems wrong to me,
especially when I run 'netstat -a | grep -c postgr' and I get 183 back
(granted many are in a TIME_WAIT state, but shouldn't that still count
as a valid connection to the database?).  For example, here's the
current output:

select * from pg_stat_activity ;
 datid | datname | procpid | usesysid |  usename  |
current_query         | query_start |         backend_start         |
client_addr | client_port

-------+---------+---------+----------+-----------+------------------------------+-------------+-------------------------------+-------------+-------------
 16636 | nightly |   29579 |    16386 | lfriedman | <command string
not enabled> |             | 2009-07-26 13:07:38.139417-07 |
  |          -1
 16636 | nightly |   22822 |       10 | postgres  | <command string
not enabled> |             | 2009-07-26 13:00:02.723319-07 |
  |          -1
 65770 | minidvs |   22662 |    16386 | lfriedman | <command string
not enabled> |             | 2009-07-26 09:48:22.958256-07 |
  |          -1
(3 rows)


Another thing that makes no sense is why the client_addr is seemingly
null (is there some postgresql.conf option that I should be setting or
changing?).

I'll be very happy to hear that I'm misunderstanding something.

Thanks in advance for guidance.

--
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
L. Friedman                                    netllama@gmail.com
LlamaLand                       https://netllama.linux-sxs.org

Re: total number of concurrent connections

От
Tom Lane
Дата:
Lonni J Friedman <netllama@gmail.com> writes:
> I've got a postgresql-8.1.10 instance running on a Linux system.  Over
> the weekend it suddenly started rejecting client connection attempts
> with a "too many connected" error. ...
> The real problem that I'm having is that I can't seem to find any
> obvious way to determine what the current number of connections are to
> the server.  Googling a bit suggests that running "select count(*)
> from pg_stat_activity;" would provide the answer, however I'm very
> suspicious of the accuracy of the results.

pg_stat_activity should be reasonably trustworthy, modulo the fact that
the display might be a fraction of a second out-of-date.

> especially when I run 'netstat -a | grep -c postgr' and I get 183 back
> (granted many are in a TIME_WAIT state, but shouldn't that still count
> as a valid connection to the database?).

No, it doesn't.  That's a recently-dead connection.  It may still be of
interest to the TCP stack, but Postgres has forgotten about it.

> Another thing that makes no sense is why the client_addr is seemingly
> null

Expected for a Unix-socket connection.

I think you've got a lot of extremely transient connections and you just
happened to have a burst of them at the same instant.  Aside from the
question of not being sure where to set max_connections, this is a
pretty bad idea from a performance standpoint.  PG backends aren't all
that lightweight, so it's not good to fire one up for just a single
query, which is what it sounds like your app is doing.  Consider using a
connection pooler.

            regards, tom lane

Re: total number of concurrent connections

От
Lonni J Friedman
Дата:
First, thanks for replying!  Comments/questions below.

On Sun, Jul 26, 2009 at 1:23 PM, Tom Lane<tgl@sss.pgh.pa.us> wrote:
> Lonni J Friedman <netllama@gmail.com> writes:
>> I've got a postgresql-8.1.10 instance running on a Linux system.  Over
>> the weekend it suddenly started rejecting client connection attempts
>> with a "too many connected" error. ...
>> The real problem that I'm having is that I can't seem to find any
>> obvious way to determine what the current number of connections are to
>> the server.  Googling a bit suggests that running "select count(*)
>> from pg_stat_activity;" would provide the answer, however I'm very
>> suspicious of the accuracy of the results.
>
> pg_stat_activity should be reasonably trustworthy, modulo the fact that
> the display might be a fraction of a second out-of-date.

Hrmm, that's not what I'm seeing.  I'm finding that connections
continue to appear in the table long after I've terminated a remote
pqsl connection.  I'm talking minutes or even hours.  In one case, I
rebooted a system that was supposedly connected, and it continued to
appear in the table (or rather the IP associated with the system).

>
>> especially when I run 'netstat -a | grep -c postgr' and I get 183 back
>> (granted many are in a TIME_WAIT state, but shouldn't that still count
>> as a valid connection to the database?).
>
> No, it doesn't.  That's a recently-dead connection.  It may still be of
> interest to the TCP stack, but Postgres has forgotten about it.
>
>> Another thing that makes no sense is why the client_addr is seemingly
>> null
>
> Expected for a Unix-socket connection.
>
> I think you've got a lot of extremely transient connections and you just
> happened to have a burst of them at the same instant.  Aside from the
> question of not being sure where to set max_connections, this is a
> pretty bad idea from a performance standpoint.  PG backends aren't all
> that lightweight, so it's not good to fire one up for just a single
> query, which is what it sounds like your app is doing.  Consider using a
> connection pooler.

Thanks, I'll definitely look into that as time allows.

Re: total number of concurrent connections

От
Tom Lane
Дата:
Lonni J Friedman <netllama@gmail.com> writes:
> On Sun, Jul 26, 2009 at 1:23 PM, Tom Lane<tgl@sss.pgh.pa.us> wrote:
>> pg_stat_activity should be reasonably trustworthy, modulo the fact that
>> the display might be a fraction of a second out-of-date.

> Hrmm, that's not what I'm seeing.  I'm finding that connections
> continue to appear in the table long after I've terminated a remote
> pqsl connection.  I'm talking minutes or even hours.

What are you doing to "terminate" these remote connections?  What it
sounds like is the connected server process isn't being told about the
termination, and so it sits there waiting for input that will never
come.  We do enable TCP keepalive if available, so unless your server
is running a seriously obsolete OS, it will eventually figure out the
client is gone --- but that takes order-of-hours with the standard TCP
timeout settings.

Between that and your unreasonably large number of TIME_WAIT
connections, it definitely seems like you've got TCP-level connection
reliability problems.  TIME_WAIT state should go away pretty fast too
if things are working properly at the network level.  I wonder whether
you have a router that is dropping connections it thinks are idle.
Beyond that my TCP expertise does not extend.

            regards, tom lane

Re: total number of concurrent connections

От
Lonni J Friedman
Дата:
On Mon, Jul 27, 2009 at 4:31 PM, Tom Lane<tgl@sss.pgh.pa.us> wrote:
> Lonni J Friedman <netllama@gmail.com> writes:
>> On Sun, Jul 26, 2009 at 1:23 PM, Tom Lane<tgl@sss.pgh.pa.us> wrote:
>>> pg_stat_activity should be reasonably trustworthy, modulo the fact that
>>> the display might be a fraction of a second out-of-date.
>
>> Hrmm, that's not what I'm seeing.  I'm finding that connections
>> continue to appear in the table long after I've terminated a remote
>> pqsl connection.  I'm talking minutes or even hours.
>
> What are you doing to "terminate" these remote connections?  What it
> sounds like is the connected server process isn't being told about the
> termination, and so it sits there waiting for input that will never
> come.  We do enable TCP keepalive if available, so unless your server
> is running a seriously obsolete OS, it will eventually figure out the
> client is gone --- but that takes order-of-hours with the standard TCP
> timeout settings.

Normally, just quitting from psql, but as part of today's experiment I
rebooted the system that the table claimed was still connected.  The
server is running Linux with a reasonably recent 2.6.x kernel.

>
> Between that and your unreasonably large number of TIME_WAIT
> connections, it definitely seems like you've got TCP-level connection
> reliability problems.  TIME_WAIT state should go away pretty fast too
> if things are working properly at the network level.  I wonder whether
> you have a router that is dropping connections it thinks are idle.
> Beyond that my TCP expertise does not extend.

The TIME_WAIT entries do go away fairly quickly, but that's not what
I'm looking at now.  I'm talking about the content of the
pg_stat_activity table.

Re: total number of concurrent connections

От
Tom Lane
Дата:
Lonni J Friedman <netllama@gmail.com> writes:
> On Mon, Jul 27, 2009 at 4:31 PM, Tom Lane<tgl@sss.pgh.pa.us> wrote:
>> What are you doing to "terminate" these remote connections?  What it
>> sounds like is the connected server process isn't being told about the
>> termination, and so it sits there waiting for input that will never
>> come.

> Normally, just quitting from psql, but as part of today's experiment I
> rebooted the system that the table claimed was still connected.  The
> server is running Linux with a reasonably recent 2.6.x kernel.

Hm, but what's the client-side OS?  A reasonable OS should send a
connection close notification (TCP RST) when the psql process dies,
even if you managed to kill it in a way that prevented psql from
closing the connection for itself.  However, if that didn't happen
for some reason, reboot would not make things better.  It would just
guarantee that the OS no longer had any memory of the connection either.

It still sounds like your problems are fundamentally network-level
problems and not Postgres problems... but it's hard to tell from
here whether it's client-side software or network infrastructure
doing it to you.

            regards, tom lane

Re: total number of concurrent connections

От
Lonni J Friedman
Дата:
On Mon, Jul 27, 2009 at 7:02 PM, Tom Lane<tgl@sss.pgh.pa.us> wrote:
> Lonni J Friedman <netllama@gmail.com> writes:
>> On Mon, Jul 27, 2009 at 4:31 PM, Tom Lane<tgl@sss.pgh.pa.us> wrote:
>>> What are you doing to "terminate" these remote connections?  What it
>>> sounds like is the connected server process isn't being told about the
>>> termination, and so it sits there waiting for input that will never
>>> come.
>
>> Normally, just quitting from psql, but as part of today's experiment I
>> rebooted the system that the table claimed was still connected.  The
>> server is running Linux with a reasonably recent 2.6.x kernel.
>
> Hm, but what's the client-side OS?  A reasonable OS should send a
> connection close notification (TCP RST) when the psql process dies,
> even if you managed to kill it in a way that prevented psql from
> closing the connection for itself.  However, if that didn't happen
> for some reason, reboot would not make things better.  It would just
> guarantee that the OS no longer had any memory of the connection either.

The client side is Linux too.

>
> It still sounds like your problems are fundamentally network-level
> problems and not Postgres problems... but it's hard to tell from
> here whether it's client-side software or network infrastructure
> doing it to you.

ok, thanks



--
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
L. Friedman                                    netllama@gmail.com
LlamaLand                       https://netllama.linux-sxs.org