Обсуждение: Need help in performance tuning.

От:
Harpreet singh Wadhwa
Дата:

Hi,


I want to fine tune my postgresql to increase number of connects it
can handle in a minutes time.
Decrease the response time per request etc.
The exact case will be to handle around 100 concurrent requests.

Can any one please help me in this.
Any hardware suggestions are also welcomed.


Regards
Harpreet

От:
"Kevin Grittner"
Дата:

Harpreet singh Wadhwa <> wrote:

> I want to fine tune my postgresql to increase number of connects
> it can handle in a minutes time.
> Decrease the response time per request etc.
> The exact case will be to handle around 100 concurrent requests.

I have found that connection pooling is crucial.

The "concurrent requests" phrase worries me a bit -- you should be
focusing more on "concurrent connections" and perhaps "requests per
second".  With most hardware, you will get faster response time and
better overall throughput by funneling 100 connections through a
connection pool which limits the number of concurrent requests to
just enough to keep all your hardware resources busy, queuing any
requests beyond that for submission when a pending request
completes.

> Any hardware suggestions are also welcomed.

If you don't have the hardware yet, you'd need to provide a bit more
information to get advice on what hardware you need.

-Kevin


От:
Craig Ringer
Дата:

On 9/07/2010 3:20 AM, Harpreet singh Wadhwa wrote:
> Hi,
>
>
> I want to fine tune my postgresql to increase number of connects it
> can handle in a minutes time.
> Decrease the response time per request etc.
> The exact case will be to handle around 100 concurrent requests.

If you're not using a connection pool, start using one.

Do you really need 100 *active* working query threads at one time?
Because if you do, you're going to need a scary-big disk subsystem and a
lot of processors.

Most people actually only need a few queries executing simultaneously,
they just need lots of connections to the database open concurrently
and/or lots of queries queued up for processing. For that purpose, a
connection pool is ideal.

You will get BETTER performance from postgresql with FEWER connections
to the "real" database that're all doing active work. If you need lots
and lots of connections you should use a connection pool to save the
main database the overhead of managing that.

--
Craig Ringer

От:
Samuel Gendler
Дата:

On Thu, Jul 8, 2010 at 8:11 PM, Craig Ringer
<> wrote:
> If you're not using a connection pool, start using one.
>
> Do you really need 100 *active* working query threads at one time? Because
> if you do, you're going to need a scary-big disk subsystem and a lot of
> processors.

I see this issue and subsequent advice cross this list awfully
frequently.  Is there in architectural reason why postgres itself
cannot pool incoming connections in order to eliminate the requirement
for an external pool?

От:
Tom Lane
Дата:

Samuel Gendler <> writes:
> On Thu, Jul 8, 2010 at 8:11 PM, Craig Ringer
> <> wrote:
>> If you're not using a connection pool, start using one.

> I see this issue and subsequent advice cross this list awfully
> frequently.  Is there in architectural reason why postgres itself
> cannot pool incoming connections in order to eliminate the requirement
> for an external pool?

Perhaps not, but there's no obvious benefit either.  Since there's
More Than One Way To Do It, it seems more practical to keep that as a
separate problem that can be solved by a choice of add-on packages.

            regards, tom lane

От:
Craig Ringer
Дата:

On 09/07/10 12:42, Tom Lane wrote:
> Samuel Gendler <> writes:
>> On Thu, Jul 8, 2010 at 8:11 PM, Craig Ringer
>> <> wrote:
>>> If you're not using a connection pool, start using one.
>
>> I see this issue and subsequent advice cross this list awfully
>> frequently.  Is there in architectural reason why postgres itself
>> cannot pool incoming connections in order to eliminate the requirement
>> for an external pool?
>
> Perhaps not, but there's no obvious benefit either.  Since there's
> More Than One Way To Do It, it seems more practical to keep that as a
> separate problem that can be solved by a choice of add-on packages.

Admittedly I'm relatively ignorant of the details, but I increasingly
think PostgreSQL will need a different big architectural change in the
coming years, as the typical performance characteristics of machines
change:

It'll need to separate "running queries" from "running processes", or
start threading backends, so that one way or the other a single query
can benefit from the capabilities of multiple CPUs. The same separation,
or a move to async I/O, might be needed to get one query to concurrently
read multiple partitions of a table, or otherwise get maximum benefit
from high-capacity I/O subsystems when running just a few big, expensive
queries.

Otherwise I'm wondering if PostgreSQL will begin really suffering in
performance on workloads where queries are big and expensive but there
are relatively few of them running at a time.

My point? *if* I'm not full of hot air and there's some truth to my
blather above, any change like that might be accompanied by a move to
separate query execution state from connection state, so that idle
connections have a much lower resource cost.

OK, that's my hand-waving for the day done.

--
Craig Ringer

От:
Harpreet singh Wadhwa
Дата:

Thanx you all for the replies.
I got a gist on where should I head towards
like I should rely a bit on postgres for performance and rest on my
tomcat and application.
And will try connection pooling on postgres part.

And if I come back for any query (related to this topic) then this
time it will be more precise (with real time data of my testing). ;-)

Regards
haps

On Fri, Jul 9, 2010 at 1:22 PM, Craig Ringer
<> wrote:
> On 09/07/10 12:42, Tom Lane wrote:
>> Samuel Gendler <> writes:
>>> On Thu, Jul 8, 2010 at 8:11 PM, Craig Ringer
>>> <> wrote:
>>>> If you're not using a connection pool, start using one.
>>
>>> I see this issue and subsequent advice cross this list awfully
>>> frequently.  Is there in architectural reason why postgres itself
>>> cannot pool incoming connections in order to eliminate the requirement
>>> for an external pool?
>>
>> Perhaps not, but there's no obvious benefit either.  Since there's
>> More Than One Way To Do It, it seems more practical to keep that as a
>> separate problem that can be solved by a choice of add-on packages.
>
> Admittedly I'm relatively ignorant of the details, but I increasingly
> think PostgreSQL will need a different big architectural change in the
> coming years, as the typical performance characteristics of machines
> change:
>
> It'll need to separate "running queries" from "running processes", or
> start threading backends, so that one way or the other a single query
> can benefit from the capabilities of multiple CPUs. The same separation,
> or a move to async I/O, might be needed to get one query to concurrently
> read multiple partitions of a table, or otherwise get maximum benefit
> from high-capacity I/O subsystems when running just a few big, expensive
> queries.
>
> Otherwise I'm wondering if PostgreSQL will begin really suffering in
> performance on workloads where queries are big and expensive but there
> are relatively few of them running at a time.
>
> My point? *if* I'm not full of hot air and there's some truth to my
> blather above, any change like that might be accompanied by a move to
> separate query execution state from connection state, so that idle
> connections have a much lower resource cost.
>
> OK, that's my hand-waving for the day done.
>
> --
> Craig Ringer
>
> --
> Sent via pgsql-performance mailing list ()
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-performance
>

От:
Craig Ringer
Дата:

> Otherwise I'm wondering if PostgreSQL will begin really suffering in
> performance on workloads where queries are big and expensive but there
> are relatively few of them running at a time.

Oh, I should note at this point that I'm *not* whining that "someone"
should volunteer to do this, or that "the postgresql project" should
just "make it happen".

I'm fully aware that Pg is a volunteer project and that even if these
speculations were in a vaguely reasonable direction, that doesn't mean
anyone has the time/skills/knowledge/interest to undertake such major
architectural change. I certainly know I have zero right to ask/expect
anyone to - I'm very, very grateful to all those who already spend time
helping out and enhancing Pg. With particular props to Tom Lane for
patience on the -general list and heroic bug-fixing persistence.

Sorry for the rely-to-self, I just realized my post could've been taken
as a whine about Pg's architecture and some kind of demand that someone
do something about it. That couldn't be further from my intent.

--
Craig Ringer

От:
Brad Nicholson
Дата:

On Fri, 2010-07-09 at 00:42 -0400, Tom Lane wrote:
> Samuel Gendler <> writes:
> > On Thu, Jul 8, 2010 at 8:11 PM, Craig Ringer
> > <> wrote:
> >> If you're not using a connection pool, start using one.
>
> > I see this issue and subsequent advice cross this list awfully
> > frequently.  Is there in architectural reason why postgres itself
> > cannot pool incoming connections in order to eliminate the requirement
> > for an external pool?
>
> Perhaps not, but there's no obvious benefit either.  Since there's
> More Than One Way To Do It, it seems more practical to keep that as a
> separate problem that can be solved by a choice of add-on packages.

This sounds similar to the approach to taken with Replication for years
before being moved into core.

Just like replication, pooling has different approaches.  I do think
that in both cases, having a solution that works, easily, out of the
"box" will meet the needs of most users.

There is also the issue of perception/adoption here as well.  One of my
colleagues mentioned that at PG East that he repeatedly heard people
talking (negatively) about the over reliance on add-on packages to deal
with core DB functionality.

--
Brad Nicholson  416-673-4106
Database Administrator, Afilias Canada Corp.



От:
Harpreet singh Wadhwa
Дата:

Thanx you all for the replies.
I got a gist on where should I head towards
like I should rely a bit on postgres for performance and rest on my
tomcat and application.
And will try connection pooling on postgres part.

And if I come back for any query (related to this topic) then this
time it will be more precise (with real time data of my testing). ;-)

Regards
haps

On Fri, Jul 9, 2010 at 1:22 PM, Craig Ringer
<> wrote:
> On 09/07/10 12:42, Tom Lane wrote:
>> Samuel Gendler <> writes:
>>> On Thu, Jul 8, 2010 at 8:11 PM, Craig Ringer
>>> <> wrote:
>>>> If you're not using a connection pool, start using one.
>>
>>> I see this issue and subsequent advice cross this list awfully
>>> frequently.  Is there in architectural reason why postgres itself
>>> cannot pool incoming connections in order to eliminate the requirement
>>> for an external pool?
>>
>> Perhaps not, but there's no obvious benefit either.  Since there's
>> More Than One Way To Do It, it seems more practical to keep that as a
>> separate problem that can be solved by a choice of add-on packages.
>
> Admittedly I'm relatively ignorant of the details, but I increasingly
> think PostgreSQL will need a different big architectural change in the
> coming years, as the typical performance characteristics of machines
> change:
>
> It'll need to separate "running queries" from "running processes", or
> start threading backends, so that one way or the other a single query
> can benefit from the capabilities of multiple CPUs. The same separation,
> or a move to async I/O, might be needed to get one query to concurrently
> read multiple partitions of a table, or otherwise get maximum benefit
> from high-capacity I/O subsystems when running just a few big, expensive
> queries.
>
> Otherwise I'm wondering if PostgreSQL will begin really suffering in
> performance on workloads where queries are big and expensive but there
> are relatively few of them running at a time.
>
> My point? *if* I'm not full of hot air and there's some truth to my
> blather above, any change like that might be accompanied by a move to
> separate query execution state from connection state, so that idle
> connections have a much lower resource cost.
>
> OK, that's my hand-waving for the day done.
>
> --
> Craig Ringer
>
> --
> Sent via pgsql-performance mailing list ()
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-performance
>

От:
"Kevin Grittner"
Дата:

Brad Nicholson <> wrote:

> Just like replication, pooling has different approaches.  I do
> think that in both cases, having a solution that works, easily,
> out of the "box" will meet the needs of most users.

Any thoughts on the "minimalist" solution I suggested a couple weeks
ago?:

http://archives.postgresql.org/pgsql-hackers/2010-06/msg01385.php
http://archives.postgresql.org/pgsql-hackers/2010-06/msg01387.php

So far, there has been no comment by anyone....

-Kevin

От:
Matthew Wakeling
Дата:

On Fri, 9 Jul 2010, Kevin Grittner wrote:
> Any thoughts on the "minimalist" solution I suggested a couple weeks
> ago?:
>
> http://archives.postgresql.org/pgsql-hackers/2010-06/msg01385.php
> http://archives.postgresql.org/pgsql-hackers/2010-06/msg01387.php
>
> So far, there has been no comment by anyone....

Interesting idea. As far as I can see, you are suggesting solving the too
many connections problem by allowing lots of connections, but only
allowing a certain number to do anything at a time?

A proper connection pool provides the following advantages over this:

1. Pool can be on a separate machine or machines, spreading load.
2. Pool has a lightweight footprint per connection, whereas Postgres
    doesn't.
3. A large amount of the overhead is sometimes connection setup, which
    this would not solve. A pool has cheap setup.
4. This could cause Postgres backends to be holding onto large amounts of
    memory while being prevented from doing anything, which is a bad use of
    resources.
5. A fair amount of the overhead is caused by context-switching between
    backends. The more backends, the less useful any CPU caches.
6. There are some internal workings of Postgres that involve keeping all
    the backends informed about something going on. The more backends, the
    greater this overhead is. (This was pretty bad with the sinval queue
    overflowing a while back, but a bit better now. It still causes some
    overhead).
7. That lock would have a metric *($!-load of contention.

Matthew

--
 Unfortunately, university regulations probably prohibit me from eating
 small children in front of the lecture class.
                                        -- Computer Science Lecturer

От:
"Kevin Grittner"
Дата:

In case there's any doubt, the questions below aren't rhetorical.

Matthew Wakeling <> wrote:

> Interesting idea. As far as I can see, you are suggesting solving
> the too many connections problem by allowing lots of connections,
> but only allowing a certain number to do anything at a time?

Right.

> A proper connection pool provides the following advantages over
> this:
>
> 1. Pool can be on a separate machine or machines, spreading load.

Sure, but how would you do that with a built-in implementation?

> 2. Pool has a lightweight footprint per connection, whereas
>    Postgres doesn't.

I haven't compared footprint of, say, a pgpool connection on the
database server to that of an idle PostgreSQL connection.  Do you
have any numbers?

> 3. A large amount of the overhead is sometimes connection setup,
>    which this would not solve. A pool has cheap setup.

This would probably be most useful where the client held a
connection for a long time, not for the "login for each database
transaction" approach.  I'm curious how often you think application
software uses that approach.

> 4. This could cause Postgres backends to be holding onto large
>    amounts of memory while being prevented from doing anything,
>    which is a bad use of resources.

Isn't this point 2 again?  If not, what are you getting at?  Again,
do you have numbers for the comparison, assuming the connection
pooler is running on the database server?

> 5. A fair amount of the overhead is caused by context-switching
>    between backends. The more backends, the less useful any CPU
>    caches.

Would this be true while a backend was blocked?  Would this not be
true for a connection pool client-side connection?

> 6. There are some internal workings of Postgres that involve
>    keeping all the backends informed about something going on. The
>    more backends, the greater this overhead is. (This was pretty
>    bad with the sinval queue overflowing a while back, but a bit
>    better now. It still causes some overhead).

Hmmm...  I hadn't thought about that.  Again, any numbers (e.g.,
profile information) on this?

> 7. That lock would have a metric *($!-load of contention.

Here I doubt you.  It would be held for such short periods that I
suspect that collisions would be relatively infrequent compared to
some of the other locks we use.  As noted in the email, it may
actually normally be an "increment and test" within an existing
locked block.  Also, assuming that any "built in" connection pool
would run on the database server, why would you think the contention
for this would be worse than for whatever is monitoring connection
count in the pooler?

-Kevin

От:
"Jorge Montero"
Дата:

If your app is running under Tomcat, connection pooling is extremely easy to set up from there: It has connection pooling mechanisms built in. Request your db connections using said mechanisms, instead of doing it manually, make a couple of changes to server.xml, and the problem goes away. Hundreds, if not thousands of concurrent users might end up running with less than 10 connections.
 


>>> Harpreet singh Wadhwa <> 7/9/2010 3:55 AM >>>
Thanx you all for the replies.
I got a gist on where should I head towards
like I should rely a bit on postgres for performance and rest on my
tomcat and application.
And will try connection pooling on postgres part.

And if I come back for any query (related to this topic) then this
time it will be more precise (with real time data of my testing). ;-)

Regards
haps

On Fri, Jul 9, 2010 at 1:22 PM, Craig Ringer
<> wrote:
> On 09/07/10 12:42, Tom Lane wrote:
>> Samuel Gendler <> writes:
>>> On Thu, Jul 8, 2010 at 8:11 PM, Craig Ringer
>>> <> wrote:
>>>> If you're not using a connection pool, start using one.
>>
>>> I see this issue and subsequent advice cross this list awfully
>>> frequently.  Is there in architectural reason why postgres itself
>>> cannot pool incoming connections in order to eliminate the requirement
>>> for an external pool?
>>
>> Perhaps not, but there's no obvious benefit either.  Since there's
>> More Than One Way To Do It, it seems more practical to keep that as a
>> separate problem that can be solved by a choice of add-on packages.
>
> Admittedly I'm relatively ignorant of the details, but I increasingly
> think PostgreSQL will need a different big architectural change in the
> coming years, as the typical performance characteristics of machines
> change:
>
> It'll need to separate "running queries" from "running processes", or
> start threading backends, so that one way or the other a single query
> can benefit from the capabilities of multiple CPUs. The same separation,
> or a move to async I/O, might be needed to get one query to concurrently
> read multiple partitions of a table, or otherwise get maximum benefit
> from high-capacity I/O subsystems when running just a few big, expensive
> queries.
>
> Otherwise I'm wondering if PostgreSQL will begin really suffering in
> performance on workloads where queries are big and expensive but there
> are relatively few of them running at a time.
>
> My point? *if* I'm not full of hot air and there's some truth to my
> blather above, any change like that might be accompanied by a move to
> separate query execution state from connection state, so that idle
> connections have a much lower resource cost.
>
> OK, that's my hand-waving for the day done.
>
> --
> Craig Ringer
>
> --
> Sent via pgsql-performance mailing list ()
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-performance
>

--
Sent via pgsql-performance mailing list ()
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance
От:
Matthew Wakeling
Дата:

On Fri, 9 Jul 2010, Kevin Grittner wrote:
>> Interesting idea. As far as I can see, you are suggesting solving
>> the too many connections problem by allowing lots of connections,
>> but only allowing a certain number to do anything at a time?
>
> Right.

I think in some situations, this arrangement would be an advantage.
However, I do not think it will suit the majority of situations, and could
reduce the performance when the user doesn't need the functionality,
either because they have a pool already, or they don't have many
connections.

No, I don't have any numbers.

>> 1. Pool can be on a separate machine or machines, spreading load.
>
> Sure, but how would you do that with a built-in implementation?

That's my point exactly. If you have an external pool solution, you can
put it somewhere else - maybe on multiple somewhere elses.

>> 3. A large amount of the overhead is sometimes connection setup,
>>    which this would not solve. A pool has cheap setup.
>
> This would probably be most useful where the client held a
> connection for a long time, not for the "login for each database
> transaction" approach.  I'm curious how often you think application
> software uses that approach.

What you say is true. I don't know how often that is, but it seems to be
those times that people come crying to the mailing list.

>> 4. This could cause Postgres backends to be holding onto large
>>    amounts of memory while being prevented from doing anything,
>>    which is a bad use of resources.
>
> Isn't this point 2 again?

Kind of. Yes. Point 2 was simple overhead. This point was that the backend
may have done a load of query-related allocation, and then been stopped.

>> 7. That lock would have a metric *($!-load of contention.
>
> Here I doubt you.  It would be held for such short periods that I
> suspect that collisions would be relatively infrequent compared to
> some of the other locks we use.  As noted in the email, it may
> actually normally be an "increment and test" within an existing
> locked block.

Fair enough. It may be much less of a problem than I had previously
thought.

Matthew

--
 Change is inevitable, except from vending machines.

От:
"Kevin Grittner"
Дата:

Matthew Wakeling <> wrote:
> On Fri, 9 Jul 2010, Kevin Grittner wrote:
>>> Interesting idea. As far as I can see, you are suggesting
>>> solving the too many connections problem by allowing lots of
>>> connections, but only allowing a certain number to do anything
>>> at a time?
>>
>> Right.
>
> I think in some situations, this arrangement would be an
> advantage.  However, I do not think it will suit the majority of
> situations, and could reduce the performance when the user doesn't
> need the functionality, either because they have a pool already,
> or they don't have many connections.

Oh, totally agreed, except that I think we can have essentially nil
impact if they don't exceed a configured limit.  In my experience,
pooling is more effective the closer you put it to the client.  I
suppose the strongest argument that could be made against building
in some sort of pooling is that it doesn't encourage people to look
for client-side solutions.  However, we seem to get a lot of posts
from people who don't do this, are not able to easily manage it, and
who would benefit from even a simple solution like this.

-Kevin

От:
"Jorge Montero"
Дата:

If anything was built in the database to handle such connections, I'd recommend a big, bold warning, recommending the use of client-side pooling if available. For something like, say, a web-server, pooling connections to the database provides a massive performance advantage regardless of how good the database is at handling way more active queries than the hardware can handle: The assignment of a connection to a thread tends to be at least an order of magnitude cheaper than establishing a new connection for each new thread, and destroying it when it dies. This is especially true if the client architecture relies in relatively short lived threads.
 
While there are a few cases where pooling is counter productive, this only happens in relatively few scenarios. This is why every java application server out there wil strongly recommend using its own facilities to connect to a database: The performance is almost always better, and it provides less headaches to the DBAs.
 
Now, if remote clients are accessing your database directly, setting up a pool inbetween might not be as straightforward or give you the same gains across the board, and that might be the only case where letting the db do its own pooling makes sense.

>>> "Kevin Grittner" <> 7/9/2010 12:52 PM >>>
Matthew Wakeling <> wrote:
> On Fri, 9 Jul 2010, Kevin Grittner wrote:
>>> Interesting idea. As far as I can see, you are suggesting
>>> solving the too many connections problem by allowing lots of
>>> connections, but only allowing a certain number to do anything
>>> at a time?
>>
>> Right.
>
> I think in some situations, this arrangement would be an
> advantage.  However, I do not think it will suit the majority of
> situations, and could reduce the performance when the user doesn't
> need the functionality, either because they have a pool already,
> or they don't have many connections.

Oh, totally agreed, except that I think we can have essentially nil
impact if they don't exceed a configured limit.  In my experience,
pooling is more effective the closer you put it to the client.  I
suppose the strongest argument that could be made against building
in some sort of pooling is that it doesn't encourage people to look
for client-side solutions.  However, we seem to get a lot of posts
from people who don't do this, are not able to easily manage it, and
who would benefit from even a simple solution like this.

-Kevin

--
Sent via pgsql-performance mailing list ()
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance
От:
"Kevin Grittner"
Дата:

"Jorge Montero" <> wrote:

> If anything was built in the database to handle such connections,
> I'd recommend a big, bold warning, recommending the use of client-
> side pooling if available.

+1

-Kevin

От:
Robert Haas
Дата:

On Fri, Jul 9, 2010 at 12:42 AM, Tom Lane <> wrote:
> Samuel Gendler <> writes:
>> On Thu, Jul 8, 2010 at 8:11 PM, Craig Ringer
>> <> wrote:
>>> If you're not using a connection pool, start using one.
>
>> I see this issue and subsequent advice cross this list awfully
>> frequently.  Is there in architectural reason why postgres itself
>> cannot pool incoming connections in order to eliminate the requirement
>> for an external pool?
>
> Perhaps not, but there's no obvious benefit either.  Since there's
> More Than One Way To Do It, it seems more practical to keep that as a
> separate problem that can be solved by a choice of add-on packages.

I'm not buying it.  A separate connection pooler increases overhead
and management complexity, and, I believe, limits our ability to
implement optimizations like parallel query execution.  I'm glad there
are good ones available, but the fact that they're absolutely
necessary for good performance in some environments is not a feature.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

От:
Greg Smith
Дата:

Matthew Wakeling wrote:
> If you have an external pool solution, you can put it somewhere else -
> maybe on multiple somewhere elses.

This is the key point to observe:  if you're at the point where you have
so many connections that you need a pool, the last place you want to put
that is on the overloaded database server itself.  Therefore, it must be
an external piece of software to be effective, rather than being part of
the server itself.  Database servers are relatively expensive computing
hardware due to size/quantity/quality of disks required.  You can throw
a pooler (or poolers) on any cheap 1U server.  This is why a built-in
pooler, while interesting, is not particularly functional for how people
normally scale up real-world deployments.

--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
   www.2ndQuadrant.us


От:
"Kevin Grittner"
Дата:

Greg Smith <> wrote:

> if you're at the point where you have so many connections that you
> need a pool, the last place you want to put that is on the
> overloaded database server itself.  Therefore, it must be an
> external piece of software to be effective, rather than being part
> of the server itself.

It *is* the last place you want to put it, but putting it there can
be much better than not putting it *anywhere*, which is what we've
often seen.

-Kevin

От:
Mark Kirkwood
Дата:

On 10/07/10 00:56, Brad Nicholson wrote:
> On Fri, 2010-07-09 at 00:42 -0400, Tom Lane wrote:
>
>>
>> Perhaps not, but there's no obvious benefit either.  Since there's
>> More Than One Way To Do It, it seems more practical to keep that as a
>> separate problem that can be solved by a choice of add-on packages.
>>
> This sounds similar to the approach to taken with Replication for years
> before being moved into core.
>
> Just like replication, pooling has different approaches.  I do think
> that in both cases, having a solution that works, easily, out of the
> "box" will meet the needs of most users.
>
> There is also the issue of perception/adoption here as well.  One of my
> colleagues mentioned that at PG East that he repeatedly heard people
> talking (negatively) about the over reliance on add-on packages to deal
> with core DB functionality.
>
>

It would be interesting to know more about what they thought an 'over
reliance' was and which packages they meant.

While clearly in the case of replication something needed to be done to
make it better and easier, it is not obvious that the situation with
connection pools is analogous. For instance we make extensive use of
PgBouncer, and it seems to do the job fine and is ridiculously easy to
install and setup. So would having (something like) this in core be an
improvement? Clearly if the 'in core' product is better then it is
desirable... similarly if the packaged product is better... well let's
have that then!

I've certainly observed a 'fear of package installation' on the part of
some folk, which is often a hangover from the 'Big IT shop' mentality
where it requires blood signatures and child sacrifice to get anything
new installed.

regards

Mark

P.s Also note that Database Vendors like pooling integrated in the core
of *their* product because it is another thing to charge a license for.
Unfortunately this can also become an entrenched mentality of 'must be
in core' on the part of consultants etc!

От:
Josh Berkus
Дата:

> It *is* the last place you want to put it, but putting it there can
> be much better than not putting it *anywhere*, which is what we've
> often seen.

Well, what you proposed is an admission control mechanism, which is
*different* from a connection pool, although the two overlap.  A
connection pool solves 4 problems when it's working:

a) limiting the number of database server processes
b) limiting the number of active concurrent queries
c) reducing response times for allocating a new connection
d) allowing management of connection routes to the database
(redirection, failover, etc.)

What you were proposing is only (b).  While (b) alone is better than
nothing, it only solves some kinds of problems.  Database backend
processes are *not* free, and in general when I see users with "too many
connections" failures they are not because of too many concurrent
queries, but rather because of too many idle connections (I've seen up
to 1800 on a server).  Simply adding (b) for crappy applications would
make the problem worse, not better, because of the large number of
pending queries which the developer would fail to deal with, or monitor.

So while adding (b) to core alone would be very useful for some users,
ironically it's generally for the more advanced users which are not the
ones we're trying to help on this thread.

--
                                  -- Josh Berkus
                                     PostgreSQL Experts Inc.
                                     http://www.pgexperts.com

От:
Craig Ringer
Дата:

On 10/07/2010 9:25 AM, Josh Berkus wrote:
>
>> It *is* the last place you want to put it, but putting it there can
>> be much better than not putting it *anywhere*, which is what we've
>> often seen.
>
> Well, what you proposed is an admission control mechanism, which is
> *different* from a connection pool, although the two overlap.  A
> connection pool solves 4 problems when it's working:
>
> a) limiting the number of database server processes
> b) limiting the number of active concurrent queries
> c) reducing response times for allocating a new connection
> d) allowing management of connection routes to the database
> (redirection, failover, etc.)

I agree with you: for most Pg users (a) is really, really important. As
you know, in PostgreSQL each connection maintains not only general
connection state (GUC settings, etc) and if in a transaction,
transaction state, but also a query executor (full backend). That gets
nasty not only in memory use, but in impact on active query performance,
as all those query executors have to participate in global signalling
for lock management etc.

So an in-server pool that solved (b) but not (a) would IMO not be
particularly useful for the majority of users.

That said, I don't think it follows that (a) cannot be solved in-core.
How much architectural change would be required to do it efficiently
enough, though...

--
Craig Ringer

От:
Robert Haas
Дата:

On Fri, Jul 9, 2010 at 11:33 PM, Craig Ringer
<> wrote:
> On 10/07/2010 9:25 AM, Josh Berkus wrote:
>>
>>> It *is* the last place you want to put it, but putting it there can
>>> be much better than not putting it *anywhere*, which is what we've
>>> often seen.
>>
>> Well, what you proposed is an admission control mechanism, which is
>> *different* from a connection pool, although the two overlap.  A
>> connection pool solves 4 problems when it's working:
>>
>> a) limiting the number of database server processes
>> b) limiting the number of active concurrent queries
>> c) reducing response times for allocating a new connection
>> d) allowing management of connection routes to the database
>> (redirection, failover, etc.)
>
> I agree with you: for most Pg users (a) is really, really important. As you
> know, in PostgreSQL each connection maintains not only general connection
> state (GUC settings, etc) and if in a transaction, transaction state, but
> also a query executor (full backend). That gets nasty not only in memory
> use, but in impact on active query performance, as all those query executors
> have to participate in global signalling for lock management etc.
>
> So an in-server pool that solved (b) but not (a) would IMO not be
> particularly useful for the majority of users.
>
> That said, I don't think it follows that (a) cannot be solved in-core. How
> much architectural change would be required to do it efficiently enough,
> though...

Right, let's not confuse Kevin's argument that we should have
connection pooling in core with advocacy for any particular patch or
feature suggestion that he may have offered on some other thread.  A
very simple in-core connection pooler might look something like this:
when a session terminates, the backend doesn't exit.  Instead, it
waits for the postmaster to reassign it to a new connection, which the
postmaster does in preference to starting new backends when possible.
But if a backend doesn't get assigned a new connection within a
certain period of time, then it goes ahead and exits anyway.

You might argue that this is not really a connection pooler at all
because there's no admission control, but the point is you're avoiding
the overhead of creating and destroying backends unnecessarily.  Of
course, I'm also relying on the unsubstantiated assumption that it's
possible to pass a socket connection between processes.

Another approach to the performance problem is to try to find ways of
reducing the overhead associated with having a large number of
backends in the system.  That's not a connection pooler either, but it
might reduce the need for one.

Still another approach is admission control based on transactions,
backends, queries, memory usage, I/O, or what have you.

None of these things are mutually exclusive.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

От:
Samuel Gendler
Дата:


Sent from my iPhone

On Jul 9, 2010, at 18:25, Josh Berkus <> wrote:

>
> So while adding (b) to core alone would be very useful for some users,
> ironically it's generally for the more advanced users which are not
> the
> ones we're trying to help on this thread.

It would seem from evidence presented on this thread that the more
appropriate conversation would maybe be with package maintainers, to
perhaps get them to include a connection pool or provide a package
that comes with a pool preconfigured and installed, along with
improving existing documentation so that it encourages the use of a
pool as a first class installation choice since it seems to be
something of a first class problem for a lot of novice users.

Just to give some background on my perspective - my prior familiarity
with a connection pool was entirely on the client side, where I've
been using them for years go keep resource consumption down on the
client.. But it never occurred to me to consider one on the other end
of those connections, despite the fact that I usually have a cluster
of app hosts all talking to the same db. I assumed low connection
count was desirable, but not mandatory, since surely the db server
limited its own resource consumption, much the way a well written
client app will. I basically assumed that the postgres devs used the
same logic I did when I pooled my connections at the client side in
order to minimize resource consumption there. I've got no truck with
the reasons presented against doing so, since they make perfectly good
sense to me.

However, I suspect there are lots of engineers like myself - folks
working without the benefit of a dedicated dba or a dba who is new to
the postgres platform - who make naive assumptions that aren't
immediately or obviously corrected by the docs (I may be sticking my
foot in my mouth here. I haven't read the standard docs in a very long
time). With this issue in particular, the fix is fairly trivial and
brings other benefits as well. But it sucks to discover it only after
you've started to get errors on a running app, no matter how easy the
fix.

So probably this is really only a bug in communication and can be
fixed there. That's great. Easier to fix bugs are hard to find. I have
yet to contribute to postgres development, so I guess, if no one
objects, I'll see what I can do about improving the documentation of
this issue, both in the official docs and just making sure it gets
better billing in other sources of postgres documentation. But you'll
have to bear with me, as I do have a more-than-full-time other job,
and no experience with the pg developer community other than a couple
of weeks on the mailing lists. But I do like to contribute to projects
I use. It always winds up making me a more proficient user.

(for the record, if I wasn't limited to my phone at the moment I would
actually check the state of existing documentation before sending
this, so if I'm talking out of my ass on the lack of documentation,
please go easy on me. I mean no offense)

--sam


>

От:
Jesper Krogh
Дата:

On 2010-07-10 00:59, Greg Smith wrote:
> Matthew Wakeling wrote:
>> If you have an external pool solution, you can put it somewhere
>> else - maybe on multiple somewhere elses.
>
> This is the key point to observe:  if you're at the point where you
> have so many connections that you need a pool, the last place you
> want to put that is on the overloaded database server itself.
> Therefore, it must be an external piece of software to be effective,
> rather than being part of the server itself.  Database servers are
> relatively expensive computing hardware due to size/quantity/quality
> of disks required.  You can throw a pooler (or poolers) on any cheap
> 1U server.  This is why a built-in pooler, while interesting, is not
> particularly functional for how people normally scale up real-world
> deployments.

That may be totally correct for the 10% of the userbase
that are in a squeezed situation, but for the 90% that isn't (or isn't aware
of being there), the build-in would be a direct benefit. For the 20%
living near the "edge" it may be the difference between "just working" and
extra hassle.

I think it is a fair assumption that the majority of PG's users solves
the problems without an connection pooler, and the question
is if it is beneficial to let them scale better without doing anything?

I have also provided a case where Kevin proposal "might" be a
benefit but a connection pooler cannot solve it:
 
http://archives.postgresql.org/pgsql-hackers/2010-06/msg01438.php
(at least as I see it, but I'm fully aware that there is stuff I dont know of)

I dont think a build-in connection-poller (or similiar) would in any
way limit the actions and abillities of an external one?

* Both numbers wildly guessed..
--
Jesper



От:
Greg Smith
Дата:

Jesper Krogh wrote:
> I dont think a build-in connection-poller (or similiar) would in any
> way limit the actions and abillities of an external one?

Two problems to recognize.  First is that building something in has the
potential to significantly limit use and therefore advancement of work
on external pools, because of the "let's use the built in one instead of
installing something extra" mentality.  I'd rather have a great external
project (which is what we have with pgBouncer) than a mediocre built-in
one that becomes the preferred way just by nature of being in the core.
If work on a core pooler was started right now, it would be years before
that reached feature/performance parity, and during that time its
existence would be a net loss compared to the current status quo for
many who used it.

The second problem is the limited amount of resources to work on
improvements to PostgreSQL.  If you want to improve the reach of
PostgreSQL, I consider projects like materialized views and easy
built-in partitioning to be orders of magnitude more useful things to
work on than the marginal benefit of merging the features of the
external pool software inside the database.  I consider the whole topic
a bit of a distraction compared to instead working on *any* of the
highly rated ideas at http://postgresql.uservoice.com/forums/21853-general

As a random technical note, I would recommend that anyone who is
thinking about a pooler in core take a look at how pgBouncer uses
libevent to respond to requests, a design model inspired by that of
memcached.  I haven't looked at it deeply yet, but my gut guess is that
this proven successful model would be hard to graft on top of the
existing PostgreSQL process design, and doing anything but that is
unlikely to perform as well.

--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
   www.2ndQuadrant.us


От:
Josh Berkus
Дата:

> Right, let's not confuse Kevin's argument that we should have
> connection pooling in core with advocacy for any particular patch or
> feature suggestion that he may have offered on some other thread.  A
> very simple in-core connection pooler might look something like this:
> when a session terminates, the backend doesn't exit.  Instead, it
> waits for the postmaster to reassign it to a new connection, which the
> postmaster does in preference to starting new backends when possible.
> But if a backend doesn't get assigned a new connection within a
> certain period of time, then it goes ahead and exits anyway.

This would, in my opinion, be an excellent option for PostgreSQL and
would save a LOT of newbie pain.  Going back to my list, it would help
with both problems (a) and (c).  It wouldn't be as good as pgbouncer,
but it would be "good enough" for a lot of users.

HOWEVER, there is the issue that such a mechanism isn't "free".  There
are issue with sharing backends around GUCs, user configuration,
security, and prepared plans -- all issues which presently cause people
difficulty with pgbouncer.  I think the costs are worth it, but we'd
have to make some attempt to tackle those issues as well.  And, of
course, we'd need to let DBAs turn the pooling off.

I'd envision parameters:

pool_connections true/false
pool_connection_timeout 60s

>  I'm also relying on the unsubstantiated assumption that it's
> possible to pass a socket connection between processes.

Doesn't pgpool do this?

> Still another approach is admission control based on transactions,
> backends, queries, memory usage, I/O, or what have you.

That's a different problem, and on its own doesn't help newbies.  It's
complimetary to pooling, though, so would be nice to have.

--
                                   -- Josh Berkus
                                      PostgreSQL Experts Inc.
                                      http://www.pgexperts.com

От:
Tom Lane
Дата:

Josh Berkus <> writes:
>> I'm also relying on the unsubstantiated assumption that it's
>> possible to pass a socket connection between processes.

> Doesn't pgpool do this?

No, and in fact that's exactly why the proposed implementation isn't
ever going to be in core: it's not possible to do it portably.
(And no, I'm not interested in hearing about how you can do it on
platforms X, Y, and/or Z.)

I agree with the comments to the effect that this is really a packaging
and documentation problem.  There is no need for us to re-invent the
existing solutions, but there is a need for making sure that they are
readily available and people know when to use them.

            regards, tom lane

От:
"Pierre C"
Дата:

> Two problems to recognize.  First is that building something in has the
> potential to significantly limit use and therefore advancement of work
> on external pools, because of the "let's use the built in one instead of
> installing something extra" mentality.  I'd rather have a great external
> project (which is what we have with pgBouncer) than a mediocre built-in
> one that becomes the preferred way just by nature of being in the core.

I would prefer having supplier A build a great product that seamlessly
interfaces with supplier B's great product, rather than having supplier M$
buy A, develop a half-working brain-dead version of B into A and market it
as the new hot stuff, sinking B in the process. Anyway, orthogonal feature
sets (like database and pooler) implemented in separate applications fit
the open source development model quite well I think. Merge everything in,
you get PHP.

От:
Matthew Wakeling
Дата:

On Sat, 10 Jul 2010, Tom Lane wrote:
>> Doesn't pgpool do this?
>
> No, and in fact that's exactly why the proposed implementation isn't
> ever going to be in core: it's not possible to do it portably.

I'm surprised. Doesn't apache httpd do this? Does it have to do a whole
load of non-portable stuff? It seems to work on a whole load of platforms.

Matthew

--
 I would like to think that in this day and age people would know better than
 to open executables in an e-mail. I'd also like to be able to flap my arms
 and fly to the moon.                                    -- Tim Mullen

От:
Craig Ringer
Дата:

On 12/07/10 17:45, Matthew Wakeling wrote:
>
> I'm surprised. Doesn't apache httpd do this? Does it have to do a whole
> load of non-portable stuff? It seems to work on a whole load of platforms.

A lot of what Apache HTTPd does is handled via the Apache Portable
Runtime (APR). It contains a lot of per-platform handlers for various
functionality.

http://apr.apache.org/docs/apr/1.4/modules.html

I don't know if the socket passing is provided as part of APR or is part
of Apache HTTPd its self, but I wouldn't be at all surprised if it was
in APR.

Personally I'm now swayed by arguments presented here that trying to
push pooling into core isn't really desirable, and that better
packaging/bundling of existing solutions would be better.

Perhaps documenting the pluses/minuses of the current pooling options
and providing clear recommendations on which to use for different use
cases would help, since half the trouble is users not knowing they need
a pool or being confused as to which to select.

This discussion reminds me a bit of Hibernate's built-in client-side
connection pool. It has one, but it's a unloved stepchild that even the
Hibernate devs suggest should be avoided in favour of a couple of
external 3rd party options.

A built-in pool seems like a great idea, but there are multiple existing
ones because they solve different problems in different ways. Unless a
built-in one could solve ALL those needs, or be so vastly simpler (due
to code re-use, easier configuration, etc) that it's worth building one
that won't fit everyone's needs, then it's best to stick to the existing
external options.

So rather than asking "should core have a connection pool" perhaps
what's needed is to ask "what can an in-core pool do that an external
pool cannot do?"

Admission control / resource limit features would be great to have in
core, and can't really be done fully in external modules ... but could
be designed in ways that would allow external poolers to add
functionality on top. Josh Berkus has made some good points on why this
isn't as easy as it looks, though:


http://it.toolbox.com/blogs/database-soup/admission-control-and-its-discontents-39895

--
Craig Ringer

От:
"Kevin Grittner"
Дата:

Craig Ringer <> wrote:

> So rather than asking "should core have a connection pool" perhaps
> what's needed is to ask "what can an in-core pool do that an
> external pool cannot do?"

(1)  It can prevent the most pessimal performance problems resulting
from lack of an external connection pool (or a badly configured one)
by setting a single GUC.  Configuration tools could suggest a good
value during initial setup.

(2)  It can be used without installing and configuring a more
sophisticated and complex product.

(3)  It might reduce latency because it avoids having to receive,
parse, and resend data in both directions -- eliminating one "hop".
I know the performance benefit would usually accrue to the external
connection pooler, but there might be some circumstances where a
built-in pool could win.

(4)  It's one more checkbox which can be ticked off on some RFPs.

That said, I fully agree that if we can include good documentation
on the external poolers and we can get packagers to include poolers
in their distribution, that gets us a much bigger benefit.  A
built-in solution would only be worthwhile if it was simple enough
and lightweight enough not to be a burden on execution time or
maintenance.  Maybe that's too big an if.

-Kevin

От:
Hannu Krosing
Дата:

On Mon, 2010-07-12 at 18:58 +0800, Craig Ringer wrote:
> On 12/07/10 17:45, Matthew Wakeling wrote:
> >
> > I'm surprised. Doesn't apache httpd do this? Does it have to do a whole
> > load of non-portable stuff? It seems to work on a whole load of platforms.
>
> A lot of what Apache HTTPd does is handled via the Apache Portable
> Runtime (APR). It contains a lot of per-platform handlers for various
> functionality.
>
> http://apr.apache.org/docs/apr/1.4/modules.html
>
> I don't know if the socket passing is provided as part of APR or is part
> of Apache HTTPd its self, but I wouldn't be at all surprised if it was
> in APR.
>
> Personally I'm now swayed by arguments presented here that trying to
> push pooling into core isn't really desirable, and that better
> packaging/bundling of existing solutions would be better.

"better packaging/bundling of existing solutions" is good in it's own
right,weather there will eventually be some support for pooling in core
or not.

> Perhaps documenting the pluses/minuses of the current pooling options
> and providing clear recommendations on which to use for different use
> cases would help, since half the trouble is users not knowing they need
> a pool or being confused as to which to select.
>
> This discussion reminds me a bit of Hibernate's built-in client-side
> connection pool. It has one, but it's a unloved stepchild that even the
> Hibernate devs suggest should be avoided in favour of a couple of
> external 3rd party options.

Yes, pooling _is_ often better handled as a (set of) separate options,
just because of the reason that here one size does definitely not fit
all;

And efficient in-core pooler probably will look very much like pgbouncer
running in a separate thread spawned by postmaster anyway.

Let's hope there will be some support in core for having user defined
helper processes soon(ish), so tweaking pgbouncer to run as one will be
reasonably easy :)

> A built-in pool seems like a great idea, but there are multiple existing
> ones because they solve different problems in different ways. Unless a
> built-in one could solve ALL those needs, or be so vastly simpler (due
> to code re-use, easier configuration, etc) that it's worth building one
> that won't fit everyone's needs, then it's best to stick to the existing
> external options.
>
> So rather than asking "should core have a connection pool" perhaps
> what's needed is to ask "what can an in-core pool do that an external
> pool cannot do?"

Probably nothing. OTOH there are some things that an external pool can
do that a built-in one can't, like running on a separate host and
pooling more than 32000 client connections there.

Cascaded pooling seems also impossible with built-in pooling

> Admission control / resource limit features would be great to have in
> core, and can't really be done fully in external modules ... but could
> be designed in ways that would allow external poolers to add
> functionality on top. Josh Berkus has made some good points on why this
> isn't as easy as it looks, though:
>
>
> http://it.toolbox.com/blogs/database-soup/admission-control-and-its-discontents-39895
>
> --
> Craig Ringer
>


--
Hannu Krosing   http://www.2ndQuadrant.com
PostgreSQL Scalability and Availability
   Services, Consulting and Training



От:
Bruce Momjian
Дата:

Craig Ringer wrote:
> It'll need to separate "running queries" from "running processes", or
> start threading backends, so that one way or the other a single query
> can benefit from the capabilities of multiple CPUs. The same separation,
> or a move to async I/O, might be needed to get one query to concurrently
> read multiple partitions of a table, or otherwise get maximum benefit
> from high-capacity I/O subsystems when running just a few big, expensive
> queries.
>
> Otherwise I'm wondering if PostgreSQL will begin really suffering in
> performance on workloads where queries are big and expensive but there
> are relatively few of them running at a time.

Agreed.  We certainly are going to have to go in that direction someday.
We have TODO items for these.

--
  Bruce Momjian  <>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + None of us is going to be here forever. +

От:
Dimitri Fontaine
Дата:

Tom Lane <> writes:
> I agree with the comments to the effect that this is really a packaging
> and documentation problem.  There is no need for us to re-invent the
> existing solutions, but there is a need for making sure that they are
> readily available and people know when to use them.

On this topic, I think we're getting back to the idea of having non-core
daemon helpers that should get "supervised" the way postmaster already
does with backends wrt starting and stoping them at the right time.

So a supervisor daemon with a supervisor API that would have to support
autovacuum as a use case, then things like pgagent, PGQ and pgbouncer,
would be very welcome.

What about starting a new thread about that? Or you already know you
won't want to push the extensibility of PostgreSQL there?

Regards,
--
dim

От:
Magnus Hagander
Дата:

On Tue, Jul 13, 2010 at 16:42, Dimitri Fontaine <> wrote:
> Tom Lane <> writes:
>> I agree with the comments to the effect that this is really a packaging
>> and documentation problem.  There is no need for us to re-invent the
>> existing solutions, but there is a need for making sure that they are
>> readily available and people know when to use them.
>
> On this topic, I think we're getting back to the idea of having non-core
> daemon helpers that should get "supervised" the way postmaster already
> does with backends wrt starting and stoping them at the right time.
>
> So a supervisor daemon with a supervisor API that would have to support
> autovacuum as a use case, then things like pgagent, PGQ and pgbouncer,
> would be very welcome.
>
> What about starting a new thread about that? Or you already know you
> won't want to push the extensibility of PostgreSQL there?

+1 on this idea in general, if we can think up a good API - this seems
very useful to me, and you have some good examples there of cases
where it'd definitely be a help.

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

От:
Hannu Krosing
Дата:

On Fri, 2010-07-09 at 00:42 -0400, Tom Lane wrote:
> Samuel Gendler <> writes:
> > On Thu, Jul 8, 2010 at 8:11 PM, Craig Ringer
> > <> wrote:
> >> If you're not using a connection pool, start using one.
>
> > I see this issue and subsequent advice cross this list awfully
> > frequently.  Is there in architectural reason why postgres itself
> > cannot pool incoming connections in order to eliminate the requirement
> > for an external pool?
>
> Perhaps not, but there's no obvious benefit either.  Since there's
> More Than One Way To Do It, it seems more practical to keep that as a
> separate problem that can be solved by a choice of add-on packages.

One example where you need a separate connection pool is pooling really
large number of connections, which you may want to do on another host
than the database itself is running.

For example pgbouncer had to add option to use incoming unix sockets,
because they run into the IP socket port number limit (a little above
31k, or more exactly 63k/2.
And unix sockets can be used only on local host .

--
Hannu Krosing   http://www.2ndQuadrant.com
PostgreSQL Scalability and Availability
   Services, Consulting and Training



От:
Scott Marlowe
Дата:

On Thu, Jul 8, 2010 at 11:48 PM, Hannu Krosing <> wrote:
> One example where you need a separate connection pool is pooling really
> large number of connections, which you may want to do on another host
> than the database itself is running.

Definitely.  Often it's best placed on the individual webservers that
are making requests, each running its own pool.

От:
"Kevin Grittner"
Дата:

Scott Marlowe <> wrote:
> Hannu Krosing <> wrote:
>> One example where you need a separate connection pool is pooling
>> really large number of connections, which you may want to do on
>> another host than the database itself is running.
>
> Definitely.  Often it's best placed on the individual webservers
> that are making requests, each running its own pool.

Each running its own pool?  You've just made a case for an
admissions policy based on active database transactions or active
queries (or both) on the server having a benefit when used with this
pooling arrangement.  This collection of pools can't know when the
CPUs have enough to keep them busy and adding more will degrade
performance.

-Kevin

От:
Hannu Krosing
Дата:

On Wed, 2010-07-14 at 08:58 -0500, Kevin Grittner wrote:
> Scott Marlowe <> wrote:
> > Hannu Krosing <> wrote:
> >> One example where you need a separate connection pool is pooling
> >> really large number of connections, which you may want to do on
> >> another host than the database itself is running.
> >
> > Definitely.  Often it's best placed on the individual webservers
> > that are making requests, each running its own pool.
>
> Each running its own pool?  You've just made a case for an
> admissions policy based on active database transactions or active
> queries (or both) on the server having a benefit when used with this
> pooling arrangement.  This collection of pools can't know when the
> CPUs have enough to keep them busy and adding more will degrade
> performance.

I guess this setup is for OLTP load (read "lots of short transactions
with low timeout limits"), where you can just open 2-5 connections per
CPU for mostly-in-memory database, maybe a little more when disk
accesses are involved. If you have more, then they just wait a few
milliseconds, if you have less, you don't have anything else to run
anyway.


--
Hannu Krosing   http://www.2ndQuadrant.com
PostgreSQL Scalability and Availability
   Services, Consulting and Training



От:
Rajesh Kumar Mallah
Дата:

Hi,

Sorry, if posting here was not proper instead of starting new thread
(I am really not sure if its bad thing to do)

I would like to share my recent experience on implementation of 
client side pooling using  pgbouncer. By client side i mean that
the the pgbouncer process in not on same machine as postgresql server.
In first trial pgbouncer and postgresql were in same machine & phbouncer
was connecting to postgresql using unix domain sockets. But i shifted it
laters owing to high CPU usage > 50%. ( using top)
Now i have shifted pgbouncer into a virtual machine (openvz container)
in the application server hardware and all my applications on other virtual machines
(web applications) connect to pgbouncer on this virtual machine.

I tested the setup with pgbench in two scenarios

1. connecting to DB server directly
2. connecting to DB via pgbouncer

the no of clients was 10 ( -c 10)  carrying out 10000 transactions each (-t 10000) .
pgbench db was initilised with scaling  factor -s 100.

since client count was less there was no queuing of requests in pgbouncer
i would prefer to say  it was in 'passthrough' mode.

the result was that

1. direct ~ 2000 tps
2. via pgbouncer ~ 1200 tps

----------------------------------------------------------------------------------------------------------------------------------------------
Experience on deploying to production environment with real world load/usage pattern
----------------------------------------------------------------------------------------------------------------------------------------------

Pgbouncer was put in same machine as postgresql connecting via unix domain
to server and tcp sockets with clients.

1. There was drastic reduction in CPU loads  from  30 to 10 ldavg
2. There were no clients waiting, pool size was 150 and number of active
    connections was 100-120.
3. Application performance was worse (inspite of 0 clients waiting )


I am still waiting to see what is the effect of shifting out pgbounce from dbserver
to appserver, but with pgbench results i am not very hopeful. I am curious why
inspite of 0 clients waiting pgbounce introduces a drop in tps.

Warm Regds
Rajesh Kumar Mallah.
CTO - tradeindia.com.



Keywords: pgbouncer performance










On Mon, Jul 12, 2010 at 6:11 PM, Kevin Grittner <> wrote:
Craig Ringer <> wrote:

> So rather than asking "should core have a connection pool" perhaps
> what's needed is to ask "what can an in-core pool do that an
> external pool cannot do?"

(1)  It can prevent the most pessimal performance problems resulting
from lack of an external connection pool (or a badly configured one)
by setting a single GUC.  Configuration tools could suggest a good
value during initial setup.

(2)  It can be used without installing and configuring a more
sophisticated and complex product.

(3)  It might reduce latency because it avoids having to receive,
parse, and resend data in both directions -- eliminating one "hop".
I know the performance benefit would usually accrue to the external
connection pooler, but there might be some circumstances where a
built-in pool could win.

(4)  It's one more checkbox which can be ticked off on some RFPs.

That said, I fully agree that if we can include good documentation
on the external poolers and we can get packagers to include poolers
in their distribution, that gets us a much bigger benefit.  A
built-in solution would only be worthwhile if it was simple enough
and lightweight enough not to be a burden on execution time or
maintenance.  Maybe that's too big an if.

-Kevin

--
Sent via pgsql-performance mailing list ()
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

От:
Rajesh Kumar Mallah
Дата:

note: my postgresql server & pgbouncer were not in virtualised environment
in the first setup. Only application server has many openvz containers.
От:
Rajesh Kumar Mallah
Дата:

Nice suggestion to try ,
I will put pgbouncer on raw hardware and run pgbench from same hardware.

regds
rajesh kumar mallah.



Why in VM (openvz container) ?

Did you also try it in the same OS as your appserver ?

Perhaps even connecting from appserver via unix seckets ?

> and all my applications on other virtual machines
 
От:
Greg Smith
Дата:

Rajesh Kumar Mallah wrote:
> the no of clients was 10 ( -c 10)  carrying out 10000 transactions
> each (-t 10000) .
> pgbench db was initilised with scaling  factor -s 100.
>
> since client count was less there was no queuing of requests in pgbouncer
> i would prefer to say  it was in 'passthrough' mode.

Of course pgbouncer is going decrease performance in this situation.
You've added a whole layer to things that all traffic has to pass
through, without a setup that gains any benefit from the connection
pooling.  Try making the client count 1000 instead if you want a useful
test.

--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
   www.2ndQuadrant.us


От:
Rajesh Kumar Mallah
Дата:

ok ,
now the question is , is it possible to dig out from from postgresql database
server if connection pooling is needed ? In our case eg i have kept
max_connections = 300  if i reduce below 250 i get error "max connection reached....."
on connecting to db directly,  if i put pgbouncer i get less performance
(even if no clients waiting)

without pooling the dbserver CPU usage increases but performance of apps
is also become good.

Regds
Rajesh Kumar Mallah.

On Sun, Jul 18, 2010 at 10:55 PM, Greg Smith <> wrote:
Rajesh Kumar Mallah wrote:
the no of clients was 10 ( -c 10)  carrying out 10000 transactions each (-t 10000) .
pgbench db was initilised with scaling  factor -s 100.

since client count was less there was no queuing of requests in pgbouncer
i would prefer to say  it was in 'passthrough' mode.

Of course pgbouncer is going decrease performance in this situation.  You've added a whole layer to things that all traffic has to pass through, without a setup that gains any benefit from the connection pooling.  Try making the client count 1000 instead if you want a useful test.

--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
  www.2ndQuadrant.us


От:
Rajesh Kumar Mallah
Дата:



On Sun, Jul 18, 2010 at 10:55 PM, Greg Smith <> wrote:
Rajesh Kumar Mallah wrote:
the no of clients was 10 ( -c 10)  carrying out 10000 transactions each (-t 10000) .
pgbench db was initilised with scaling  factor -s 100.

since client count was less there was no queuing of requests in pgbouncer
i would prefer to say  it was in 'passthrough' mode.

Of course pgbouncer is going decrease performance in this situation.  You've added a whole layer to things that all traffic has to pass through, without a setup that gains any benefit from the connection pooling.  Try making the client count 1000 instead if you want a useful test.

Dear Greg,

my  max_client is 300 shall i test  with client count 250 ?
if so what should be the scaling factor while initializing
the pgbench db?


--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
  www.2ndQuadrant.us


От:
Scott Carey
Дата:

On Jul 9, 2010, at 8:33 PM, Craig Ringer wrote:

> On 10/07/2010 9:25 AM, Josh Berkus wrote:
>>
>>> It *is* the last place you want to put it, but putting it there can
>>> be much better than not putting it *anywhere*, which is what we've
>>> often seen.
>>
>> Well, what you proposed is an admission control mechanism, which is
>> *different* from a connection pool, although the two overlap.  A
>> connection pool solves 4 problems when it's working:
>>
>> a) limiting the number of database server processes
>> b) limiting the number of active concurrent queries
>> c) reducing response times for allocating a new connection
>> d) allowing management of connection routes to the database
>> (redirection, failover, etc.)
>
> I agree with you: for most Pg users (a) is really, really important. As
> you know, in PostgreSQL each connection maintains not only general
> connection state (GUC settings, etc) and if in a transaction,
> transaction state, but also a query executor (full backend). That gets
> nasty not only in memory use, but in impact on active query performance,
> as all those query executors have to participate in global signalling
> for lock management etc.
>
> So an in-server pool that solved (b) but not (a) would IMO not be
> particularly useful for the majority of users.
>
> That said, I don't think it follows that (a) cannot be solved in-core.
> How much architectural change would be required to do it efficiently
> enough, though...
>

a, b, and c can all be handled in core.  But that would be a radical re-architecture to do it right.  Postgres assumes
thatthe client connection, authentication, and query processing all happen in one place in one process on one thread.
Mostserver software built and designed today avoids that model in order to decouple its critical resources from the #
ofclient connections.  Most server software designed today tries to control its resources and not let the behavior of
clientsdictate resource usage. 

Even Apache HTTPD is undergoing a radical re-design so that it can handle more connections and more easily decouple
connectionsfrom concurrent processing to keep up with competitors. 

I'm not saying that Postgres core should change -- again thats a radical re-architecture.  But it should be recognized
thatit is not like most other server applications -- it can't control its resources very well and needs help to do so.
Fromusing a connection pool to manually setting work_mem differently for different clients or workloads, resource
managementis not what it does well.  It does a LOT of things very very well, just not that. 


> --
> Craig Ringer
>
> --
> Sent via pgsql-performance mailing list ()
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-performance


От:
Hannu Krosing
Дата:

On Sun, 2010-07-18 at 21:48 +0530, Rajesh Kumar Mallah wrote:
> Hi,
>
> Sorry, if posting here was not proper instead of starting new thread
> (I am really not sure if its bad thing to do)
>
> I would like to share my recent experience on implementation of
> client side pooling using  pgbouncer. By client side i mean that
> the the pgbouncer process in not on same machine as postgresql server.
> In first trial pgbouncer and postgresql were in same machine &
> phbouncer
> was connecting to postgresql using unix domain sockets. But i shifted
> it
> laters owing to high CPU usage > 50%. ( using top)
> Now i have shifted pgbouncer into a virtual machine (openvz container)
> in the application server hardware

Why in VM (openvz container) ?

Did you also try it in the same OS as your appserver ?

Perhaps even connecting from appserver via unix seckets ?

> and all my applications on other virtual machines
> (web applications) connect to pgbouncer on this virtual machine.
>
> I tested the setup with pgbench in two scenarios
>
> 1. connecting to DB server directly
> 2. connecting to DB via pgbouncer
>
> the no of clients was 10 ( -c 10)  carrying out 10000 transactions
> each (-t 10000) .
> pgbench db was initilised with scaling  factor -s 100.
>
> since client count was less there was no queuing of requests in
> pgbouncer
> i would prefer to say  it was in 'passthrough' mode.
>
> the result was that
>
> 1. direct ~ 2000 tps
> 2. via pgbouncer ~ 1200 tps

Are you sure you are not measuring how much sunning pgbouncer slows down
pgbench directly, by competing for CPU resources and not by adding
latency to requests ?


>
----------------------------------------------------------------------------------------------------------------------------------------------
> Experience on deploying to production environment with real world
> load/usage pattern
>
----------------------------------------------------------------------------------------------------------------------------------------------
>
> Pgbouncer was put in same machine as postgresql connecting via unix
> domain
> to server and tcp sockets with clients.
>
> 1. There was drastic reduction in CPU loads  from  30 to 10 ldavg
> 2. There were no clients waiting, pool size was 150 and number of
> active
>     connections was 100-120.
> 3. Application performance was worse (inspite of 0 clients waiting )
>
>
> I am still waiting to see what is the effect of shifting out pgbounce
> from dbserver
> to appserver, but with pgbench results i am not very hopeful. I am
> curious why inspite of 0 clients waiting pgbounce introduces a drop in
> tps.

If you have less clients than pgbouncer connections, you can't have any
clients waiting in pgbouncer, as each of them is allocated it's own
connection right away.

What you were measuring was

1. pgbench and pgbouncer competeing for the same CPU
2. overhead from 2 hops to db (app-proxy-db) instead of 1 (app-db)

> Warm Regds
> Rajesh Kumar Mallah.
> CTO - tradeindia.com.
>
>
>
> Keywords: pgbouncer performance
>
>
>
>
>
>
>
>
>
>
> On Mon, Jul 12, 2010 at 6:11 PM, Kevin Grittner
> <> wrote:
>         Craig Ringer <> wrote:
>
>
>         > So rather than asking "should core have a connection pool"
>         perhaps
>         > what's needed is to ask "what can an in-core pool do that an
>         > external pool cannot do?"
>
>
>         (1)  It can prevent the most pessimal performance problems
>         resulting
>         from lack of an external connection pool (or a badly
>         configured one)
>         by setting a single GUC.  Configuration tools could suggest a
>         good
>         value during initial setup.
>
>         (2)  It can be used without installing and configuring a more
>         sophisticated and complex product.
>
>         (3)  It might reduce latency because it avoids having to
>         receive,
>         parse, and resend data in both directions -- eliminating one
>         "hop".
>         I know the performance benefit would usually accrue to the
>         external
>         connection pooler, but there might be some circumstances where
>         a
>         built-in pool could win.
>
>         (4)  It's one more checkbox which can be ticked off on some
>         RFPs.
>
>         That said, I fully agree that if we can include good
>         documentation
>         on the external poolers and we can get packagers to include
>         poolers
>         in their distribution, that gets us a much bigger benefit.  A
>         built-in solution would only be worthwhile if it was simple
>         enough
>         and lightweight enough not to be a burden on execution time or
>         maintenance.  Maybe that's too big an if.
>
>         -Kevin
>
>
>         --
>         Sent via pgsql-performance mailing list
>         ()
>         To make changes to your subscription:
>         http://www.postgresql.org/mailpref/pgsql-performance
>
>



От:
Rajesh Kumar Mallah
Дата:

Looks like ,

pgbench cannot be used for testing with pgbouncer if number of
pgbench clients exceeds pool_size + reserve_pool_size of pgbouncer.
pgbench keeps waiting doing nothing. I am using pgbench  of postgresql 8.1.
Are there changes to pgbench in this aspect ?

regds
Rajesh Kumar Mallah.

On Sun, Jul 18, 2010 at 11:38 PM, Rajesh Kumar Mallah <> wrote:


On Sun, Jul 18, 2010 at 10:55 PM, Greg Smith <> wrote:
Rajesh Kumar Mallah wrote:
the no of clients was 10 ( -c 10)  carrying out 10000 transactions each (-t 10000) .
pgbench db was initilised with scaling  factor -s 100.

since client count was less there was no queuing of requests in pgbouncer
i would prefer to say  it was in 'passthrough' mode.

Of course pgbouncer is going decrease performance in this situation.  You've added a whole layer to things that all traffic has to pass through, without a setup that gains any benefit from the connection pooling.  Try making the client count 1000 instead if you want a useful test.

Dear Greg,

my  max_client is 300 shall i test  with client count 250 ?
if so what should be the scaling factor while initializing
the pgbench db?


--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
  www.2ndQuadrant.us



От:
Tatsuo Ishii
Дата:

> pgbench cannot be used for testing with pgbouncer if number of
> pgbench clients exceeds pool_size + reserve_pool_size of pgbouncer.
> pgbench keeps waiting doing nothing. I am using pgbench  of postgresql 8.1.
> Are there changes to pgbench in this aspect ?

Pgbench won't start actual transaction until all connections to
PostgreSQL (in this case pgbounser I guess) successfully
established. IMO You sholud try other benchmark tools.

BTW, I think you should use -C option with pgbench for this kind of
testing. -C establishes connection for each transaction, which is
pretty much similar to the real world application which do not use
connection pooling. You will be supprised how PostgreSQL connection
overhead is large.
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp

От:
Rajesh Kumar Mallah
Дата:

 Thanks for the thought but it (-C) does not work .


BTW, I think you should use -C option with pgbench for this kind of
testing. -C establishes connection for each transaction, which is
pretty much similar to the real world application which do not use
connection pooling. You will be supprised how PostgreSQL connection
overhead is large.
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp

От:
Tatsuo Ishii
Дата:

From: Rajesh Kumar Mallah <>
Subject: Re: [PERFORM] Pooling in Core WAS: Need help in performance tuning.
Date: Mon, 19 Jul 2010 08:06:09 +0530
Message-ID: <>

>  Thanks for the thought but it (-C) does not work .

Still you need:

pgbench's -c <= (pool_size + reserve_pool_size)
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp

От:
Robert Haas
Дата:

On Mon, Jul 12, 2010 at 6:58 AM, Craig Ringer
<> wrote:
> On 12/07/10 17:45, Matthew Wakeling wrote:
>>
>> I'm surprised. Doesn't apache httpd do this? Does it have to do a whole
>> load of non-portable stuff? It seems to work on a whole load of platforms.
>
> A lot of what Apache HTTPd does is handled via the Apache Portable
> Runtime (APR). It contains a lot of per-platform handlers for various
> functionality.

Apache just has all of the worker processes call accept() on the
socket, and whichever one the OS hands it off to gets the job.  The
problem is harder for us because a backend can't switch identities
once it's been assigned to a database.  I haven't heard an adequate
explanation of why that couldn't be changed, though.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

От:
Robert Haas
Дата:

On Mon, Jul 12, 2010 at 6:58 AM, Craig Ringer
<> wrote:
> So rather than asking "should core have a connection pool" perhaps
> what's needed is to ask "what can an in-core pool do that an external
> pool cannot do?"

Avoid sending every connection through an extra hop.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

От:
"Joshua D. Drake"
Дата:

On Thu, 2010-07-22 at 14:36 -0400, Robert Haas wrote:
> On Mon, Jul 12, 2010 at 6:58 AM, Craig Ringer
> <> wrote:
> > So rather than asking "should core have a connection pool" perhaps
> > what's needed is to ask "what can an in-core pool do that an external
> > pool cannot do?"
>
> Avoid sending every connection through an extra hop.

Let's extend this shall we:

Avoid adding yet another network hop
Remove of a point of failure
Reduction of administrative overhead
Integration into our core authentication mechanisms
Greater flexibility in connection control

And, having connection pooling in core does not eliminate the use of an
external pool where it makes since.

JD

--
PostgreSQL.org Major Contributor
Command Prompt, Inc: http://www.commandprompt.com/ - 509.416.6579
Consulting, Training, Support, Custom Development, Engineering
http://twitter.com/cmdpromptinc | http://identi.ca/commandprompt
От:
Andres Freund
Дата:

On Thu, Jul 22, 2010 at 02:33:43PM -0400, Robert Haas wrote:
> On Mon, Jul 12, 2010 at 6:58 AM, Craig Ringer
> <> wrote:
> > On 12/07/10 17:45, Matthew Wakeling wrote:
> >>
> >> I'm surprised. Doesn't apache httpd do this? Does it have to do a whole
> >> load of non-portable stuff? It seems to work on a whole load of platforms.
> >
> > A lot of what Apache HTTPd does is handled via the Apache Portable
> > Runtime (APR). It contains a lot of per-platform handlers for various
> > functionality.
>
> Apache just has all of the worker processes call accept() on the
> socket, and whichever one the OS hands it off to gets the job.
As an inconsequential detail - afaik they keep the os from doing that
by protecting it with a mutex for various reasons (speed - as some
implementations wake up and see theres nothing to do, multiple
sockets, fairness)

> The problem is harder for us because a backend can't switch identities
> once it's been assigned to a database.  I haven't heard an adequate
> explanation of why that couldn't be changed, though.
Possibly it might decrease the performance significantly enough by
reducing the cache locality (syscache, prepared plans)?

Andres

От:
Hannu Krosing
Дата:

On Thu, 2010-07-22 at 14:36 -0400, Robert Haas wrote:
> On Mon, Jul 12, 2010 at 6:58 AM, Craig Ringer
> <> wrote:
> > So rather than asking "should core have a connection pool" perhaps
> > what's needed is to ask "what can an in-core pool do that an external
> > pool cannot do?"
>
> Avoid sending every connection through an extra hop.

not really. in-core != magically-in-right-backend-process


there will still be "an extra hop",only it will be local, between pooler
and backend process.

similar to what currently happens with pgbouncer when you deploy it on
same server and use unix sockets

>
> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise Postgres Company
>



От:
Hannu Krosing
Дата:

On Thu, 2010-07-22 at 12:15 -0700, Joshua D. Drake wrote:
> On Thu, 2010-07-22 at 14:36 -0400, Robert Haas wrote:
> > On Mon, Jul 12, 2010 at 6:58 AM, Craig Ringer
> > <> wrote:
> > > So rather than asking "should core have a connection pool" perhaps
> > > what's needed is to ask "what can an in-core pool do that an external
> > > pool cannot do?"
> >
> > Avoid sending every connection through an extra hop.
>
> Let's extend this shall we:
>
> Avoid adding yet another network hop

postgreSQL is multi-process, so you either have a separate "pooler
process" or need to put pooler functionality in postmaster, bothw ways
you still have a two-hop scenario for connect. you may be able to pass
the socket to child process and also keep it, but doing this for both
client and db sides seems really convoluted.

Or is there a prortable way to pass sockets back and forth between
parent and child processes ?

If so, then pgbouncer could use it as well.

> Remove of a point of failure

rather move the point of failure from external pooler to internal
pooler ;)

> Reduction of administrative overhead

Possibly. But once you start actually using it, you still need to
configure and monitor it and do other administrator-y tasks.

> Integration into our core authentication mechanisms

True, although for example having SSL on client side connection will be
so slow that it hides any performance gains from pooling, at least for
short-lived connections.

> Greater flexibility in connection control

Yes, poolers can be much more flexible than default postgresql. See for
example pgbouncers PAUSE , RECONFIGURE and RESUME commands

> And, having connection pooling in core does not eliminate the use of an
> external pool where it makes since.

Probably the easiest way to achieve "pooling in core" would be adding an
option to start pgbouncer under postmaster control.

You probably can't get much leaner than pgbouncer.

> --
> PostgreSQL.org Major Contributor
> Command Prompt, Inc: http://www.commandprompt.com/ - 509.416.6579
> Consulting, Training, Support, Custom Development, Engineering
> http://twitter.com/cmdpromptinc | http://identi.ca/commandprompt
>
>
>


--
Hannu Krosing   http://www.2ndQuadrant.com
PostgreSQL Scalability and Availability
   Services, Consulting and Training



От:
Scott Carey
Дата:

On Jul 22, 2010, at 11:36 AM, Robert Haas wrote:

> On Mon, Jul 12, 2010 at 6:58 AM, Craig Ringer
> <> wrote:
>> So rather than asking "should core have a connection pool" perhaps
>> what's needed is to ask "what can an in-core pool do that an external
>> pool cannot do?"
>
> Avoid sending every connection through an extra hop.
>

Dynamically adjust settings based on resource usage of the DB.

> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise Postgres Company
>
> --
> Sent via pgsql-performance mailing list ()
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-performance


От:
Robert Haas
Дата:

On Thu, Jul 22, 2010 at 3:15 PM, Hannu Krosing <> wrote:
> On Thu, 2010-07-22 at 14:36 -0400, Robert Haas wrote:
>> On Mon, Jul 12, 2010 at 6:58 AM, Craig Ringer
>> <> wrote:
>> > So rather than asking "should core have a connection pool" perhaps
>> > what's needed is to ask "what can an in-core pool do that an external
>> > pool cannot do?"
>>
>> Avoid sending every connection through an extra hop.
>
> not really. in-core != magically-in-right-backend-process

Well, how about if we arrange it so it IS in the right backend
process?  I don't believe magic is required.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

От:
Robert Haas
Дата:

On Thu, Jul 22, 2010 at 5:29 PM, Andres Freund <> wrote:
>> The problem is harder for us because a backend can't switch identities
>> once it's been assigned to a database.  I haven't heard an adequate
>> explanation of why that couldn't be changed, though.
> Possibly it might decrease the performance significantly enough by
> reducing the cache locality (syscache, prepared plans)?

Those things are backend-local.  The worst case scenario is you've got
to flush them all when you reinitialize, in which case you still save
the overhead of creating a new process.  The best case scenario is
that you can keep some of them around, in which case, great.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

От:
"Joshua D. Drake"
Дата:

On Thu, 2010-07-22 at 14:36 -0400, Robert Haas wrote:
> On Mon, Jul 12, 2010 at 6:58 AM, Craig Ringer
> <> wrote:
> > So rather than asking "should core have a connection pool" perhaps
> > what's needed is to ask "what can an in-core pool do that an external
> > pool cannot do?"
>
> Avoid sending every connection through an extra hop.

Let's extend this shall we:

Avoid adding yet another network hop
Remove of a point of failure
Reduction of administrative overhead
Integration into our core authentication mechanisms
Greater flexibility in connection control

And, having connection pooling in core does not eliminate the use of an
external pool where it makes since.

JD

--
PostgreSQL.org Major Contributor
Command Prompt, Inc: http://www.commandprompt.com/ - 509.416.6579
Consulting, Training, Support, Custom Development, Engineering
http://twitter.com/cmdpromptinc | http://identi.ca/commandprompt



От:
Joshua Tolley
Дата:

On Thu, Jul 22, 2010 at 02:44:04PM -0700, Scott Carey wrote:
> On Jul 22, 2010, at 11:36 AM, Robert Haas wrote:
> > On Mon, Jul 12, 2010 at 6:58 AM, Craig Ringer
> > <> wrote:
> >> So rather than asking "should core have a connection pool" perhaps
> >> what's needed is to ask "what can an in-core pool do that an external
> >> pool cannot do?"
> >
> > Avoid sending every connection through an extra hop.
> >
>
> Dynamically adjust settings based on resource usage of the DB.
>

Relatively minor, but it would be convenient to avoid having to query
$external_pooler to determine the client_addr of an incoming connection.

--
Joshua Tolley / eggyknap
End Point Corporation
http://www.endpoint.com

От:
"Joshua D. Drake"
Дата:

On Thu, 2010-07-22 at 20:56 +0100, Hannu Krosing wrote:
>
> > Let's extend this shall we:
> >
> > Avoid adding yet another network hop
>
> postgreSQL is multi-process, so you either have a separate "pooler
> process" or need to put pooler functionality in postmaster, bothw ways
> you still have a two-hop scenario for connect. you may be able to pass
> the socket to child process and also keep it, but doing this for both
> client and db sides seems really convoluted.

Which means, right now there is three hops. Reducing one is good.

> Or is there a prortable way to pass sockets back and forth between
> parent and child processes ?
>
> If so, then pgbouncer could use it as well.
>
> > Remove of a point of failure
>
> rather move the point of failure from external pooler to internal
> pooler ;)

Yes but at that point, it doesn't matter.

>
> > Reduction of administrative overhead
>
> Possibly. But once you start actually using it, you still need to
> configure and monitor it and do other administrator-y tasks.

Yes, but it is inclusive.

>
> > Integration into our core authentication mechanisms
>
> True, although for example having SSL on client side connection will be
> so slow that it hides any performance gains from pooling, at least for
> short-lived connections.

Yes, but right now you can't use *any* pooler with LDAP for example. We
could if pooling was in core. Your SSL argument doesn't really work
because its true with or without pooling.

> > Greater flexibility in connection control
>
> Yes, poolers can be much more flexible than default postgresql. See for
> example pgbouncers PAUSE , RECONFIGURE and RESUME commands

:D

>
> > And, having connection pooling in core does not eliminate the use of an
> > external pool where it makes since.
>
> Probably the easiest way to achieve "pooling in core" would be adding an
> option to start pgbouncer under postmaster control.

Yeah but that won't happen. Also I think we may have a libevent
dependency that we have to work out.

>
> You probably can't get much leaner than pgbouncer.

Oh don't get me wrong. I love pgbouncer. It is my recommended pooler but
even it has limitations (such as auth).

Sincerely,

Joshua D. Drake

--
PostgreSQL.org Major Contributor
Command Prompt, Inc: http://www.commandprompt.com/ - 509.416.6579
Consulting, Training, Support, Custom Development, Engineering
http://twitter.com/cmdpromptinc | http://identi.ca/commandprompt
От:
Hannu Krosing
Дата:

On Thu, 2010-07-22 at 20:57 -0400, Robert Haas wrote:
> On Thu, Jul 22, 2010 at 3:15 PM, Hannu Krosing <> wrote:
> > On Thu, 2010-07-22 at 14:36 -0400, Robert Haas wrote:
> >> On Mon, Jul 12, 2010 at 6:58 AM, Craig Ringer
> >> <> wrote:
> >> > So rather than asking "should core have a connection pool" perhaps
> >> > what's needed is to ask "what can an in-core pool do that an external
> >> > pool cannot do?"
> >>
> >> Avoid sending every connection through an extra hop.
> >
> > not really. in-core != magically-in-right-backend-process
>
> Well, how about if we arrange it so it IS in the right backend
> process?  I don't believe magic is required.

Do you have any design in mind, how you can make it so ?

---------------
Hannu




От:
Robert Haas
Дата:

On Fri, Jul 23, 2010 at 11:58 AM, Hannu Krosing <> wrote:
> On Thu, 2010-07-22 at 20:57 -0400, Robert Haas wrote:
>> On Thu, Jul 22, 2010 at 3:15 PM, Hannu Krosing <> wrote:
>> > On Thu, 2010-07-22 at 14:36 -0400, Robert Haas wrote:
>> >> On Mon, Jul 12, 2010 at 6:58 AM, Craig Ringer
>> >> <> wrote:
>> >> > So rather than asking "should core have a connection pool" perhaps
>> >> > what's needed is to ask "what can an in-core pool do that an external
>> >> > pool cannot do?"
>> >>
>> >> Avoid sending every connection through an extra hop.
>> >
>> > not really. in-core != magically-in-right-backend-process
>>
>> Well, how about if we arrange it so it IS in the right backend
>> process?  I don't believe magic is required.
>
> Do you have any design in mind, how you can make it so ?

Well, if we could change the backends so that they could fully
reinitialize themselves (disconnect from a database to which they are
bound, etc.), I don't see why we couldn't use the Apache approach.
There's a danger of memory leaks but that's why Apache has
MaxRequestsPerChild, and it works pretty darn well.  Of course,
passing file descriptors would be even nicer (you could pass the
connection off to a child that was already bound to the correct
database, perhaps) but has pointed out more than once, that's not
portable.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

От:
"Joshua D. Drake"
Дата:

On Thu, 2010-07-22 at 20:56 +0100, Hannu Krosing wrote:
>
> > Let's extend this shall we:
> >
> > Avoid adding yet another network hop
>
> postgreSQL is multi-process, so you either have a separate "pooler
> process" or need to put pooler functionality in postmaster, bothw ways
> you still have a two-hop scenario for connect. you may be able to pass
> the socket to child process and also keep it, but doing this for both
> client and db sides seems really convoluted.

Which means, right now there is three hops. Reducing one is good.

> Or is there a prortable way to pass sockets back and forth between
> parent and child processes ?
>
> If so, then pgbouncer could use it as well.
>
> > Remove of a point of failure
>
> rather move the point of failure from external pooler to internal
> pooler ;)

Yes but at that point, it doesn't matter.

>
> > Reduction of administrative overhead
>
> Possibly. But once you start actually using it, you still need to
> configure and monitor it and do other administrator-y tasks.

Yes, but it is inclusive.

>
> > Integration into our core authentication mechanisms
>
> True, although for example having SSL on client side connection will be
> so slow that it hides any performance gains from pooling, at least for
> short-lived connections.

Yes, but right now you can't use *any* pooler with LDAP for example. We
could if pooling was in core. Your SSL argument doesn't really work
because its true with or without pooling.

> > Greater flexibility in connection control
>
> Yes, poolers can be much more flexible than default postgresql. See for
> example pgbouncers PAUSE , RECONFIGURE and RESUME commands

:D

>
> > And, having connection pooling in core does not eliminate the use of an
> > external pool where it makes since.
>
> Probably the easiest way to achieve "pooling in core" would be adding an
> option to start pgbouncer under postmaster control.

Yeah but that won't happen. Also I think we may have a libevent
dependency that we have to work out.

>
> You probably can't get much leaner than pgbouncer.

Oh don't get me wrong. I love pgbouncer. It is my recommended pooler but
even it has limitations (such as auth).

Sincerely,

Joshua D. Drake

--
PostgreSQL.org Major Contributor
Command Prompt, Inc: http://www.commandprompt.com/ - 509.416.6579
Consulting, Training, Support, Custom Development, Engineering
http://twitter.com/cmdpromptinc | http://identi.ca/commandprompt


От:
Andres Freund
Дата:

On Fri, Jul 23, 2010 at 01:28:53PM -0400, Robert Haas wrote:
> On Fri, Jul 23, 2010 at 11:58 AM, Hannu Krosing <> wrote:
> > On Thu, 2010-07-22 at 20:57 -0400, Robert Haas wrote:
> >> On Thu, Jul 22, 2010 at 3:15 PM, Hannu Krosing <> wrote:
> >> > On Thu, 2010-07-22 at 14:36 -0400, Robert Haas wrote:
> >> >> On Mon, Jul 12, 2010 at 6:58 AM, Craig Ringer
> >> >> <> wrote:
> >> >> > So rather than asking "should core have a connection pool" perhaps
> >> >> > what's needed is to ask "what can an in-core pool do that an external
> >> >> > pool cannot do?"
> >> >>
> >> >> Avoid sending every connection through an extra hop.
> >> >
> >> > not really. in-core != magically-in-right-backend-process
> >>
> >> Well, how about if we arrange it so it IS in the right backend
> >> process?  I don't believe magic is required.
> >
> > Do you have any design in mind, how you can make it so ?
>
> Well, if we could change the backends so that they could fully
> reinitialize themselves (disconnect from a database to which they are
> bound, etc.), I don't see why we couldn't use the Apache approach.
> There's a danger of memory leaks but that's why Apache has
> MaxRequestsPerChild, and it works pretty darn well.  Of course,
> passing file descriptors would be even nicer (you could pass the
> connection off to a child that was already bound to the correct
> database, perhaps) but has pointed out more than once, that's not
> portable.
Its not *that bad* though. To my knowledge its 2 or 3 implementations that
one would need to implement to support most if not all platforms.

- sendmsg/cmsg/SCM_RIGHTS based implementation (most if not all *nixes
  including solaris, linux, (free|open|net)bsd, OSX, AIX, HPUX, others)
- WSADuplicateSocket (windows)
- if needed: STREAMS based stuff (I_SENDFD) (at least solaris,  hpux, aix, tru64,
  irix, unixware allow this)


Note that I am still not convinced that its a good idea...

Andres

От:
Greg Smith
Дата:

Joshua Tolley wrote:
> Relatively minor, but it would be convenient to avoid having to query
> $external_pooler to determine the client_addr of an incoming connection.
>

You suggest this as a minor concern, but I consider it to be one of the
most compelling arguments in favor of in-core pooling.  A constant pain
with external poolers is the need to then combine two sources of data in
order to track connections fully, which is something that everyone runs
into eventually and finds annoying.  It's one of the few things that
doesn't go away no matter how much fiddling you do with pgBouncer, it's
always getting in the way a bit.  And it seems to seriously bother
systems administrators and developers, not just the DBAs.

--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
   www.2ndQuadrant.us


От:
Andres Freund
Дата:

On Sat, Jul 24, 2010 at 01:23:08AM -0400, Greg Smith wrote:
> Joshua Tolley wrote:
> >Relatively minor, but it would be convenient to avoid having to query
> >$external_pooler to determine the client_addr of an incoming connection.
>
> You suggest this as a minor concern, but I consider it to be one of
> the most compelling arguments in favor of in-core pooling.  A
> constant pain with external poolers is the need to then combine two
> sources of data in order to track connections fully, which is
> something that everyone runs into eventually and finds annoying.
> It's one of the few things that doesn't go away no matter how much
> fiddling you do with pgBouncer, it's always getting in the way a
> bit.  And it seems to seriously bother systems administrators and
> developers, not just the DBAs.
But you have to admit that this problem won't vanish as people will
continue to use poolers on other machines for resource reasons.
So providing a capability to do something sensible here seems to be
useful independent of in-core pooling.

Andres

От:
Craig Ringer
Дата:

On 24/07/10 01:28, Robert Haas wrote:

> Well, if we could change the backends so that they could fully
> reinitialize themselves (disconnect from a database to which they are
> bound, etc.), I don't see why we couldn't use the Apache approach.

This would offer the bonus on the side that it'd be more practical to
implement database changes for a connection, akin to MySQL's "USE".
Inefficient, sure, but possible.

I don't care about that current limitation very much. I think anyone
changing databases all the time probably has the wrong design and should
be using schema. I'm sure there are times it'd be good to be able to
switch databases on one connection, though.


My question with all this remains: is it worth the effort when external
poolers already solve the problem. Can PostgreSQL offer tools and
interfaces to permit external poolers to solve the problems they have,
rather than trying to bring them in-core? For example, with auth, can
the Pg server offer features to help poolers implement passthrough
authentication against the real Pg server?

Perhaps Pg could expose auth features over SQL, permitting appropriately
privileged users to verify credentials with SQL-level calls. Poolers
could pass supplied user credentials through to the real Pg server for
verification. For bonus points, an SQL interface could be provided that
lets the super-priveleged auth managing connection be used to change the
login role of another running backend/connection, so the pooler could
hand out connections with different login user ids without having to
maintain a pool per user id.

( That'd probably also permit implementation of a "CHANGE USER" command,
where the client changed login roles on the fly by passing the
credentials of the new role. That'd be *awesome* for application server
connection pools. )

--
Craig Ringer

От:
Craig Ringer
Дата:

On 24/07/10 13:23, Greg Smith wrote:
> Joshua Tolley wrote:
>> Relatively minor, but it would be convenient to avoid having to query
>> $external_pooler to determine the client_addr of an incoming connection.
>>
>
> You suggest this as a minor concern, but I consider it to be one of the
> most compelling arguments in favor of in-core pooling.  A constant pain
> with external poolers is the need to then combine two sources of data in
> order to track connections fully, which is something that everyone runs
> into eventually and finds annoying.  It's one of the few things that
> doesn't go away no matter how much fiddling you do with pgBouncer, it's
> always getting in the way a bit.  And it seems to seriously bother
> systems administrators and developers, not just the DBAs.


Putting a pooler in core won't inherently fix this, and won't remove the
need to solve it for cases where the pooler can't be on the same machine.

9.0 has application_name to let apps identify themselves. Perhaps a
"pooled_client_ip", to be set by a pooler rather than the app, could be
added to address this problem in a way that can be used by all poolers
new and existing, not just any new in-core pooling system.

If a privileged set of pooler functions is was considered, as per my
other recent mail, the pooler could use a management connection to set
the client ip before handing the connection to the client, so the client
couldn't change pooled_client_ip its self by accident or through malice.
But even without that, it'd be awfully handy.

--
Craig Ringer

От:
Dimitri Fontaine
Дата:

Craig Ringer <> writes:
> 9.0 has application_name to let apps identify themselves. Perhaps a
> "pooled_client_ip", to be set by a pooler rather than the app, could be
> added to address this problem in a way that can be used by all poolers
> new and existing, not just any new in-core pooling system.

X-Forwarded-For ?

--
dim

От:
Hannu Krosing
Дата:

On Fri, 2010-07-23 at 09:52 -0700, Joshua D. Drake wrote:
> On Thu, 2010-07-22 at 20:56 +0100, Hannu Krosing wrote:
> >
> > > Let's extend this shall we:
> > >
> > > Avoid adding yet another network hop
> >
> > postgreSQL is multi-process, so you either have a separate "pooler
> > process" or need to put pooler functionality in postmaster, bothw ways
> > you still have a two-hop scenario for connect. you may be able to pass
> > the socket to child process and also keep it, but doing this for both
> > client and db sides seems really convoluted.
>
> Which means, right now there is three hops. Reducing one is good.

No, it is still two, as postmaster passes the socket to spwaned child
postgresql process after login.

the process is as follows

Client --connects--> postmaster --spawns--> postgreSQL server process

then socket is passed to be used directly so the use is


Client --talks-to---> postgreSQL server process

when using spooler it becomes


Client --connects-to--> Spooler --passes-requests-to-->  postgreSQL

I see no way to have spooler select the postgreSQL process, pass the
client connection in a way that taks directly to postgrSQL server
process AND be able to get the server connection back once the client is
finishe with either the request, transaction or connection (depending on
pooling mode).



>
> > Or is there a prortable way to pass sockets back and forth between
> > parent and child processes ?
> >
> > If so, then pgbouncer could use it as well.
> >
> > > Remove of a point of failure
> >
> > rather move the point of failure from external pooler to internal
> > pooler ;)
>
> Yes but at that point, it doesn't matter.
>
> >
> > > Reduction of administrative overhead
> >
> > Possibly. But once you start actually using it, you still need to
> > configure and monitor it and do other administrator-y tasks.
>
> Yes, but it is inclusive.
>
> >
> > > Integration into our core authentication mechanisms
> >
> > True, although for example having SSL on client side connection will be
> > so slow that it hides any performance gains from pooling, at least for
> > short-lived connections.
>
> Yes, but right now you can't use *any* pooler with LDAP for example. We
> could if pooling was in core. Your SSL argument doesn't really work
> because its true with or without pooling.

As main slowdown in SSL is connection setup, so you can get the network
security and pooling speedup if you run pool on client side and make the
pooler-server connection over SSL.


> > > Greater flexibility in connection control
> >
> > Yes, poolers can be much more flexible than default postgresql. See for
> > example pgbouncers PAUSE , RECONFIGURE and RESUME commands
>
> :D
>
> >
> > > And, having connection pooling in core does not eliminate the use of an
> > > external pool where it makes since.
> >
> > Probably the easiest way to achieve "pooling in core" would be adding an
> > option to start pgbouncer under postmaster control.
>
> Yeah but that won't happen.

I guess it could happen as part of opening up the "postgresql controlled
process" part to be configurable and able to run third party stuff.

Another thing to run under postmaster control would be pgqd .

> Also I think we may have a libevent
> dependency that we have to work out.
>
> >
> > You probably can't get much leaner than pgbouncer.
>
> Oh don't get me wrong. I love pgbouncer. It is my recommended pooler but
> even it has limitations (such as auth).

As pgbouncer is single-threaded and the main goal has been performance
there is not much enthusiasm about having _any_ auth method included
which cant be completed in a few cpu cycles. It may be possible to add
threads to wait for LDAP/Kerberos/... response or do SSL handshakes, but
i have not seen any interest from Marko to do it himself.

Maybe there is a way to modularise the auth part of postmaster in a way
that could be used from third party products through some nice API which
postmaster-controlled pgbouncer can start using.


--
Hannu Krosing   http://www.2ndQuadrant.com
PostgreSQL Scalability and Availability
   Services, Consulting and Training



От:
Hannu Krosing
Дата:

On Sat, 2010-07-24 at 14:36 +0800, Craig Ringer wrote:
> On 24/07/10 13:23, Greg Smith wrote:
> > Joshua Tolley wrote:
> >> Relatively minor, but it would be convenient to avoid having to query
> >> $external_pooler to determine the client_addr of an incoming connection.
> >>
> >
> > You suggest this as a minor concern, but I consider it to be one of the
> > most compelling arguments in favor of in-core pooling.  A constant pain
> > with external poolers is the need to then combine two sources of data in
> > order to track connections fully, which is something that everyone runs
> > into eventually and finds annoying.  It's one of the few things that
> > doesn't go away no matter how much fiddling you do with pgBouncer, it's
> > always getting in the way a bit.  And it seems to seriously bother
> > systems administrators and developers, not just the DBAs.
>
>
> Putting a pooler in core won't inherently fix this, and won't remove the
> need to solve it for cases where the pooler can't be on the same machine.
>
> 9.0 has application_name to let apps identify themselves. Perhaps a
> "pooled_client_ip", to be set by a pooler rather than the app, could be
> added to address this problem in a way that can be used by all poolers
> new and existing, not just any new in-core pooling system.
>
> If a privileged set of pooler functions is was considered, as per my
> other recent mail, the pooler could use a management connection to set
> the client ip before handing the connection to the client, so the client
> couldn't change pooled_client_ip its self by accident or through malice.
> But even without that, it'd be awfully handy.

Or maybe we can add some command extensions to the protocol for passing
extra info, so that instead of sending just the "(run_query:query)"
command over socket we could send both the extra info and execute
"(set_params:(proxy_client_ip:a.b.c.d)(proxy_client_post:n)(something
else))(run_query:query)" in one packet (for performance) and have these
things be available in logging and pg_stat_activity

I see no need to try to somehow restrict these if you can always be sure
that they are set by the direct client. proxy can decide to pass some of
these from the real client but it would be a decision made by proxy, not
mandated by some proxying rules.






--
Hannu Krosing   http://www.2ndQuadrant.com
PostgreSQL Scalability and Availability
   Services, Consulting and Training



От:
Robert Haas
Дата:

On Sat, Jul 24, 2010 at 2:23 AM, Craig Ringer
<> wrote:
> On 24/07/10 01:28, Robert Haas wrote:
>
>> Well, if we could change the backends so that they could fully
>> reinitialize themselves (disconnect from a database to which they are
>> bound, etc.), I don't see why we couldn't use the Apache approach.
>
> This would offer the bonus on the side that it'd be more practical to
> implement database changes for a connection, akin to MySQL's "USE".
> Inefficient, sure, but possible.

Yep.

> I don't care about that current limitation very much. I think anyone
> changing databases all the time probably has the wrong design and should
> be using schema. I'm sure there are times it'd be good to be able to
> switch databases on one connection, though.

I pretty much agree with this.  I think this is merely slightly nice
on its own, but I think it might be a building-block to other things
that we might want to do down the road.  Markus Wanner's Postgres-R
replication uses worker processes; autovacuum does as well; and then
there's parallel query.  I can't help thinking that not needing to
fork a new backend every time you want to connect to a new database
has got to be useful.

> My question with all this remains: is it worth the effort when external
> poolers already solve the problem.

Whether it's worth the effort is something anyone who is thinking
about working on this will have to decide for themselves.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

От:
Tom Lane
Дата:

Robert Haas <> writes:
> On Thu, Jul 22, 2010 at 5:29 PM, Andres Freund <> wrote:
>>> The problem is harder for us because a backend can't switch identities
>>> once it's been assigned to a database. �I haven't heard an adequate
>>> explanation of why that couldn't be changed, though.

>> Possibly it might decrease the performance significantly enough by
>> reducing the cache locality (syscache, prepared plans)?

> Those things are backend-local.  The worst case scenario is you've got
> to flush them all when you reinitialize, in which case you still save
> the overhead of creating a new process.

"Flushing them all" is not zero-cost; it's not too hard to believe that
it could actually be slower than forking a clean new backend.

What's much worse, it's not zero-bug.  We've got little bitty caches
all over the backend, including (no doubt) some caching behavior in
third-party code that wouldn't get the word about whatever API you
invented to deal with this.

            regards, tom lane

От:
Robert Haas
Дата:

On Tue, Jul 27, 2010 at 4:40 PM, Tom Lane <> wrote:
> Robert Haas <> writes:
>> On Thu, Jul 22, 2010 at 5:29 PM, Andres Freund <> wrote:
>>>> The problem is harder for us because a backend can't switch identities
>>>> once it's been assigned to a database.  I haven't heard an adequate
>>>> explanation of why that couldn't be changed, though.
>
>>> Possibly it might decrease the performance significantly enough by
>>> reducing the cache locality (syscache, prepared plans)?
>
>> Those things are backend-local.  The worst case scenario is you've got
>> to flush them all when you reinitialize, in which case you still save
>> the overhead of creating a new process.
>
> "Flushing them all" is not zero-cost; it's not too hard to believe that
> it could actually be slower than forking a clean new backend.

I'm not so sure I believe that.  Is a sinval overrun slower than
forking a clean new backend?  Is DISCARD ALL slower that forking a
clean new backend?  How much white space is there between either of
those and what would be needed here?  I guess it could be slower, but
I wouldn't want to assume that without evidence.

> What's much worse, it's not zero-bug.  We've got little bitty caches
> all over the backend, including (no doubt) some caching behavior in
> third-party code that wouldn't get the word about whatever API you
> invented to deal with this.

I think this is probably the biggest issue with the whole idea, and I
agree there would be some pain involved.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

От:
Tom Lane
Дата:

Robert Haas <> writes:
> On Tue, Jul 27, 2010 at 4:40 PM, Tom Lane <> wrote:
>> "Flushing them all" is not zero-cost; it's not too hard to believe that
>> it could actually be slower than forking a clean new backend.

> I'm not so sure I believe that.

I'm not asserting it's true, just suggesting it's entirely possible.
Other than the fork() cost itself and whatever authentication activity
there might be, practically all the startup cost of a new backend can be
seen as cache-populating activities.  You'd have to redo all of that,
*plus* pay the costs of getting rid of the previous cache entries.
Maybe the latter costs less than a fork(), or maybe not.  fork() is
pretty cheap on modern Unixen.

>> What's much worse, it's not zero-bug.

> I think this is probably the biggest issue with the whole idea, and I
> agree there would be some pain involved.

Yeah, if it weren't for that I'd say "sure let's try it".  But I'm
afraid we'd be introducing significant headaches in return for a gain
that's quite speculative.

            regards, tom lane

От:
Robert Haas
Дата:

On Tue, Jul 27, 2010 at 9:56 PM, Tom Lane <> wrote:
> Robert Haas <> writes:
>> On Tue, Jul 27, 2010 at 4:40 PM, Tom Lane <> wrote:
>>> "Flushing them all" is not zero-cost; it's not too hard to believe that
>>> it could actually be slower than forking a clean new backend.
>
>> I'm not so sure I believe that.
>
> I'm not asserting it's true, just suggesting it's entirely possible.
> Other than the fork() cost itself and whatever authentication activity
> there might be, practically all the startup cost of a new backend can be
> seen as cache-populating activities.  You'd have to redo all of that,
> *plus* pay the costs of getting rid of the previous cache entries.
> Maybe the latter costs less than a fork(), or maybe not.  fork() is
> pretty cheap on modern Unixen.
>
>>> What's much worse, it's not zero-bug.
>
>> I think this is probably the biggest issue with the whole idea, and I
>> agree there would be some pain involved.
>
> Yeah, if it weren't for that I'd say "sure let's try it".  But I'm
> afraid we'd be introducing significant headaches in return for a gain
> that's quite speculative.

I agree that the gain is minimal of itself; after all, how often do
you need to switch databases, and what's the big deal if the
postmaster has to fork a new backend?  Where I see it as a potentially
big win is when it comes to things like parallel query.  I can't help
thinking that's going to be a lot less efficient if you're forever
forking new backends.  Perhaps the point here is that you'd actually
sort of like to NOT flush all those caches unless it turns out that
you're switching databases - many installations probably operate with
essentially one big database, so chances are good that even if you
distributed connections / parallel queries to backends round-robin,
you'd potentially save quite a bit of overhead.  Of course, for the
guy who has TWO big databases, you might hope to be a little smarter,
but that's another problem altogether.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

От:
Tom Lane
Дата:

Robert Haas <> writes:
> On Tue, Jul 27, 2010 at 9:56 PM, Tom Lane <> wrote:
>> Other than the fork() cost itself and whatever authentication activity
>> there might be, practically all the startup cost of a new backend can be
>> seen as cache-populating activities. �You'd have to redo all of that,
>> *plus* pay the costs of getting rid of the previous cache entries.
>> Maybe the latter costs less than a fork(), or maybe not. �fork() is
>> pretty cheap on modern Unixen.

> I agree that the gain is minimal of itself; after all, how often do
> you need to switch databases, and what's the big deal if the
> postmaster has to fork a new backend?  Where I see it as a potentially
> big win is when it comes to things like parallel query.  I can't help
> thinking that's going to be a lot less efficient if you're forever
> forking new backends.

Color me unconvinced.  To do parallel queries with pre-existing worker
processes, you'd need to hook up with a worker that was (at least) in
your own database, and then somehow feed it the query plan that it needs
to execute.  I'm thinking fork() could easily be cheaper.  But obviously
this is all speculation (... and Windows is going to be a whole 'nother
story in any case ...)

            regards, tom lane

От:
Craig Ringer
Дата:

On 28/07/10 04:40, Tom Lane wrote:
> Robert Haas <> writes:
>> On Thu, Jul 22, 2010 at 5:29 PM, Andres Freund <> wrote:
>>>> The problem is harder for us because a backend can't switch identities
>>>> once it's been assigned to a database.  I haven't heard an adequate
>>>> explanation of why that couldn't be changed, though.
>
>>> Possibly it might decrease the performance significantly enough by
>>> reducing the cache locality (syscache, prepared plans)?
>
>> Those things are backend-local.  The worst case scenario is you've got
>> to flush them all when you reinitialize, in which case you still save
>> the overhead of creating a new process.
>
> "Flushing them all" is not zero-cost; it's not too hard to believe that
> it could actually be slower than forking a clean new backend.
>
> What's much worse, it's not zero-bug.  We've got little bitty caches
> all over the backend, including (no doubt) some caching behavior in
> third-party code that wouldn't get the word about whatever API you
> invented to deal with this.

In addition to caches, there may be some places where memory is just
expected to leak. Since it's a one-off allocation, nobody really cares;
why bother cleaning it up when it's quicker to just let the OS do it
when the backend terminates?

Being able to reset a backend for re-use would require that per-session
memory use be as neatly managed as per-query and per-transaction memory,
with no leaked stragglers left lying around.

Such cleanup (and management) has its own costs. Plus, you have a
potentially increasingly fragmented memory map to deal with the longer
the backend lives. Overall, there are plenty of advantages to letting
the OS clean it up.

... however, if the requirement is introduced that a given backend may
only be re-used for connections to the same database, lots of things get
simpler. You have to be able to change the current user (which would be
a bonus anyway), reset GUCs, etc, but how much else is there to do?

That way you can maintain per-database pools of idle workers (apache
prefork style) with ageing-out of backends that're unused. Wouldn't this
do the vast majority of what most pools are needed for anyway? And
wouldn't it potentially save quite a bit of load by avoiding having
backends constantly switching databases, flushing caches, etc?

--
Craig Ringer

Tech-related writing: http://soapyfrogs.blogspot.com/

От:
Vitalii Tymchyshyn
Дата:

28.07.10 04:56, Tom Lane написав(ла):
>
> I'm not asserting it's true, just suggesting it's entirely possible.
> Other than the fork() cost itself and whatever authentication activity
> there might be, practically all the startup cost of a new backend can be
> seen as cache-populating activities.  You'd have to redo all of that,
> *plus* pay the costs of getting rid of the previous cache entries.
> Maybe the latter costs less than a fork(), or maybe not.  fork() is
> pretty cheap on modern Unixen.
>
>
Actually as for me, the problem is that one can't raise number of
database connections high without overloading CPU/memory/disk, so
external pooling is needed. If postgresql had something like
max_active_queries setting that limit number of connections that are not
in IDLE [in transaction] state, one could raise max connections high
(and I don't think idle process itself has much overhead) and limit
max_active_queries to get maximum performance and won't use external
pooling. Of course this won't help if the numbers are really high, but
could work out the most common cases.

Best regards, Vitalii Tymchyshyn

От:
Josh Berkus
Дата:

On 7/27/10 6:56 PM, Tom Lane wrote:
> Yeah, if it weren't for that I'd say "sure let's try it".  But I'm
> afraid we'd be introducing significant headaches in return for a gain
> that's quite speculative.

Well, the *gain* isn't speculative.  For example, I am once again
dealing with the issue that PG backend processes on Solaris never give
up their RAM, resulting in pathological swapping situations if you have
many idle connections.  This requires me to install pgpool, which is
overkill (since it has load balancing, replication, and more) just to
make sure that connections get recycled so that I don't have 300 idle
connections eating up 8GB of RAM.

Relative to switching databases, I'd tend to say that, like pgbouncer
and pgpool, we don't need to support that.  Each user/database combo can
have their own "pool".  While not ideal, this would be good enough for
90% of users.

--
                                  -- Josh Berkus
                                     PostgreSQL Experts Inc.
                                     http://www.pgexperts.com

От:
Robert Haas
Дата:

On Wed, Jul 28, 2010 at 3:44 PM, Josh Berkus <> wrote:
> On 7/27/10 6:56 PM, Tom Lane wrote:
>> Yeah, if it weren't for that I'd say "sure let's try it".  But I'm
>> afraid we'd be introducing significant headaches in return for a gain
>> that's quite speculative.
>
> Well, the *gain* isn't speculative.  For example, I am once again
> dealing with the issue that PG backend processes on Solaris never give
> up their RAM, resulting in pathological swapping situations if you have
> many idle connections.  This requires me to install pgpool, which is
> overkill (since it has load balancing, replication, and more) just to
> make sure that connections get recycled so that I don't have 300 idle
> connections eating up 8GB of RAM.
>
> Relative to switching databases, I'd tend to say that, like pgbouncer
> and pgpool, we don't need to support that.  Each user/database combo can
> have their own "pool".  While not ideal, this would be good enough for
> 90% of users.

However, if we don't support that, we can't do any sort of pooling-ish
thing without the ability to pass file descriptors between processes;
and Tom seems fairly convinced there's no portable way to do that.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

От:
Tom Lane
Дата:

Robert Haas <> writes:
> However, if we don't support that, we can't do any sort of pooling-ish
> thing without the ability to pass file descriptors between processes;
> and Tom seems fairly convinced there's no portable way to do that.

Well, what it would come down to is: are we prepared to not support
pooling on platforms without such a capability?  It's certainly possible
to do it on many modern platforms, but I don't believe we can make it
happen everywhere.  Generally we've tried to avoid having major features
that don't work everywhere ...

            regards, tom lane

От:
Andres Freund
Дата:

On Wed, Jul 28, 2010 at 04:10:08PM -0400, Tom Lane wrote:
> Robert Haas <> writes:
> > However, if we don't support that, we can't do any sort of pooling-ish
> > thing without the ability to pass file descriptors between processes;
> > and Tom seems fairly convinced there's no portable way to do that.
>
> Well, what it would come down to is: are we prepared to not support
> pooling on platforms without such a capability?  It's certainly possible
> to do it on many modern platforms, but I don't believe we can make it
> happen everywhere.  Generally we've tried to avoid having major features
> that don't work everywhere ...
Which platforms do you have in mind here? All of the platforms I found
documented to be supported seem to support at least one of SCM_RIGHTS,
WSADuplicateSocket or STREAMS/FD_INSERT.
Most if not all beside windows support SCM_RIGHTS. The ones I am
dubious about support FD_INSERT...

Andres

От:
Robert Haas
Дата:

On Wed, Jul 28, 2010 at 4:10 PM, Tom Lane <> wrote:
> Robert Haas <> writes:
>> However, if we don't support that, we can't do any sort of pooling-ish
>> thing without the ability to pass file descriptors between processes;
>> and Tom seems fairly convinced there's no portable way to do that.
>
> Well, what it would come down to is: are we prepared to not support
> pooling on platforms without such a capability?  It's certainly possible
> to do it on many modern platforms, but I don't believe we can make it
> happen everywhere.  Generally we've tried to avoid having major features
> that don't work everywhere ...

I suppose it depends on the magnitude of the benefit.  And how many
platforms aren't covered.  And how much code is required.  In short,
until someone writes a patch, who knows?  I think the core question we
should be thinking about is what would be the cleanest method of
resetting a backend - either for the same database or for a different
one, whichever seems easier.  And by cleanest, I mean least likely to
introduce bugs.  If we can get to the point where we have something to
play around with, even if it's kind of kludgey or doesn't quite work,
it'll give us some idea of whether further effort is worthwhile and
how it should be directed.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

От:
Josh Berkus
Дата:

> introduce bugs.  If we can get to the point where we have something to
> play around with, even if it's kind of kludgey or doesn't quite work,
> it'll give us some idea of whether further effort is worthwhile and
> how it should be directed.

Should I put this on the TODO list,  then, in hopes that someone steps
forward?

--
                                   -- Josh Berkus
                                      PostgreSQL Experts Inc.
                                      http://www.pgexperts.com