Discussion: Connection Pooling, a year later


Connection Pooling, a year later

From
Michael Owens
Date:
I was browsing through the TODO list and stumbled upon the connection pooling 
item discussed over a year ago. Is it safe to assume that since it's still 
listed in the TODO it's still a desirable feature? Although I don't claim to 
know jack about the postgresql source (although I am trying to learn my way 
around), it does seem like the initial idea proposed by Alfred Perlstein 
could offer considerable improvements with respect to postgresql's ability 
to handle larger numbers of connections. Passing file descriptors between 
processes aside, is it feasible to at least get the backend to where it 
could theoretically handle this --- multiple, simultaneous clients?

Even if descriptor passing were not the best (most portable) way to get 
information from the client/postmaster to the backend, there still might be 
another way by which data from the client socket can be transmitted to a 
backend, thereby allowing the data from multiple clients to be sent to a 
single backend. For instance, the postmaster could accept() and also 
multiplex (select()/poll(), or kqueue on BSD) across all file descriptors, 
sending and receiving data to/from the clients. After calling select(), it 
assigns/maps the client socket descriptor to one of the backends, to which 
it is connected via a domain socket. All subsequent data coming from that 
client descriptor will be passed directly to that backend via the domain 
socket. Information sent from the backend to the postmaster through that 
socket includes the descriptor number of the client for whom that 
information is intended. The postmaster would multiplex across descriptors 
to clients and backends alike, effectively connecting the M clients to N 
backends, without having to pass descriptors.
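
A minimal sketch of that relay loop (illustrative only --- this is not 
PostgreSQL source; the frame header, the round-robin assignment, and the 
buffer handling are all assumptions):

/*
 * Postmaster-as-proxy sketch: accept() clients, select() across every
 * descriptor, and relay traffic to a fixed set of backends over
 * Unix-domain sockets.  Each chunk is prefixed with the client's
 * descriptor number so M clients can share N backends without passing
 * descriptors.  Assumes all fds < MAX_FDS; error handling and partial
 * reads/writes are glossed over.
 */
#include <sys/select.h>
#include <sys/socket.h>
#include <unistd.h>
#include <string.h>

#define MAX_FDS 256

struct frame { int client; int len; };   /* header on the domain socket */

static int client_backend[MAX_FDS];      /* client fd -> backend fd, or -1 */

void proxy_loop(int listen_fd, const int backend_fd[], int nbackends)
{
    fd_set rfds;
    char   buf[8192];
    int    next = 0, fd, i;

    memset(client_backend, -1, sizeof(client_backend));

    for (;;)
    {
        FD_ZERO(&rfds);
        FD_SET(listen_fd, &rfds);
        for (i = 0; i < nbackends; i++)
            FD_SET(backend_fd[i], &rfds);
        for (fd = 0; fd < MAX_FDS; fd++)
            if (client_backend[fd] >= 0)
                FD_SET(fd, &rfds);

        if (select(MAX_FDS, &rfds, NULL, NULL, NULL) < 0)
            continue;                    /* EINTR and friends */

        if (FD_ISSET(listen_fd, &rfds))
        {
            int c = accept(listen_fd, NULL, NULL);

            if (c >= 0 && c < MAX_FDS)
            {
                client_backend[c] = backend_fd[next];   /* round robin */
                next = (next + 1) % nbackends;
            }
        }

        /* client -> backend: tag each chunk with the client's fd */
        for (fd = 0; fd < MAX_FDS; fd++)
        {
            struct frame f;

            if (client_backend[fd] < 0 || !FD_ISSET(fd, &rfds))
                continue;
            f.client = fd;
            f.len = (int) read(fd, buf, sizeof(buf));
            if (f.len <= 0)              /* client hung up */
            {
                close(fd);
                client_backend[fd] = -1;
                continue;
            }
            write(client_backend[fd], &f, sizeof(f));
            write(client_backend[fd], buf, f.len);
        }

        /* backend -> client: strip the tag, route to the right client */
        for (i = 0; i < nbackends; i++)
        {
            struct frame f;

            if (!FD_ISSET(backend_fd[i], &rfds))
                continue;
            if (read(backend_fd[i], &f, sizeof(f)) == (ssize_t) sizeof(f) &&
                f.len > 0 && f.len <= (int) sizeof(buf))
            {
                read(backend_fd[i], buf, f.len);
                write(f.client, buf, f.len);
            }
        }
    }
}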

I doubt that what I am going to say in this paragraph is new to any of you, 
so don't think I'm being preachy, or that I think this is a novel idea. But 
allowing multiple backends to process multiple clients, if only considered 
from a mathematical standpoint, seems like a way to increase (perhaps even 
dramatically) the maximum number of connections postgresql can service while 
maintaining a fixed number of backends, and therefore a known performance 
level. The central issue is that the database server reaches marginal 
returns with the number of active backend processes, beyond which the 
overhead from shared resource contention, context switching, etc., degrades 
performance. This ideal number, varying from machine to machine, can of 
course be controlled by setting the maximum number of connections. However, 
once that number is reached, other clients are locked out, and the real 
question becomes whether or not the server is really being used to its full 
potential. Most likely it is not --- there is still a significant amount of 
idle time in each backend. So while the maximum number of connections has 
been reached, overall utilization is less than what the machine could 
deliver, all the while locking other clients out. Then the pendulum can 
swing the other way: you set the maximum number of connections well beyond 
the ideal number (creating a buffer of sorts) to allow for this scenario, so 
those blocked clients are let in. The problem is that this also opens 
another, worst-case scenario: what if all of the maximum connections are 
active, sending the number of active backends well beyond the ideal limit?

By having the postmaster map multiple clients to a fixed number of backends, 
you achieve the happy medium: You never exceed the ideal number of active 
backends, and at the same time you are not limited to only accepting a fixed 
number of connections. Accepting connections can now be based on load 
(however you wish to define it), not number.  You now make decisions based on 
utilization.

If it were shown that even half of a backend's life consisted of idle time, 
leasing out that idle time to another active connection would potentially 
double the average number of simultaneous requests without (theoretically) 
incurring any significant degradation in performance.

I have worked on code that uses this model, and would be glad to 
adapt/contribute it to postgresql. Currently it passes file descriptors 
from a listening process to a queue process using sendmsg()/recvmsg(). I 
have no trouble with it on Linux and BSD, but I don't pretend to know 
anything about portability. However, even if this is an issue, I would be 
willing to adapt the model mentioned above. Currently, the design I have is 
to pass in complete transactions, so that the N-to-M mapping can be 
achieved as an application server. The downside is that state cannot be 
maintained within the database (backend), since backends are shared between 
clients on the application server. Every request the server makes to 
postgresql must be a standalone transaction.
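
For reference, the descriptor passing itself is the standard SCM_RIGHTS 
technique; a sketch of it (not the poster's code) looks like this:

/*
 * Descriptor passing over a Unix-domain socket, as between a listening
 * process and a queue process.  send_fd() ships an open descriptor;
 * recv_fd() receives a new descriptor referring to the same open file.
 * Error handling abbreviated.
 */
#include <sys/socket.h>
#include <sys/uio.h>
#include <string.h>

int send_fd(int sock, int fd)
{
    char            byte = 0;
    struct iovec    iov = { &byte, 1 };   /* must carry >= 1 data byte */
    struct msghdr   msg;
    struct cmsghdr *cmsg;
    union { char buf[CMSG_SPACE(sizeof(int))]; struct cmsghdr align; } ctl;

    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctl.buf;
    msg.msg_controllen = sizeof(ctl.buf);

    cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;         /* "this message carries fds" */
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

    return sendmsg(sock, &msg, 0) < 0 ? -1 : 0;
}

int recv_fd(int sock)
{
    char            byte;
    struct iovec    iov = { &byte, 1 };
    struct msghdr   msg;
    struct cmsghdr *cmsg;
    int             fd = -1;
    union { char buf[CMSG_SPACE(sizeof(int))]; struct cmsghdr align; } ctl;

    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctl.buf;
    msg.msg_controllen = sizeof(ctl.buf);

    if (recvmsg(sock, &msg, 0) <= 0)
        return -1;
    cmsg = CMSG_FIRSTHDR(&msg);
    if (cmsg && cmsg->cmsg_type == SCM_RIGHTS)
        memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
    return fd;                            /* valid in *this* process */
}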

So I guess my question is whether or not there is still interest in this, and 
whether there are still great difficulties in making the necessary changes to 
the backend so that it could handle multiple clients. If it does seem 
possible, and there is interest, then I would be willing to take a stab at it 
with the code I have developed.


Re: Connection Pooling, a year later

From
Tom Lane
Date:
Michael Owens <owensmk@earthlink.net> writes:
> Even if the descriptor passing were not the best (most portable) way to get 
> information from the client - postmaster to the backend, there still might be
> another way by which data from the client socket can be transmitted to a 
> backend, thereby allowing the data from multiple clients to be sent to a 
> single backend. For instance, the postmaster could accept() and also 
> multiplex (select()/poll(), or use kqueue on BSD) across all file 
> descriptors, sending and receiving data to/from the clients.

That would turn the postmaster into a performance bottleneck, since it
would have to do work for every client interaction.

Another objection is that the postmaster would need to have two open
file descriptors (the original client connection and a pipe to the
backend) for every active connection.  On systems with a fairly low
max-files-per-process setting (~ 50 is not uncommon) that would put
a serious crimp in the number of connections that could be supported.

On the whole it strikes me as more practical to do connection pooling
on the client side...
        regards, tom lane


Re: Connection Pooling, a year later

From
mlw
Date:
I don't get the deal with connection pooling.

Sure, there are some efficiencies in reducing the number of back-end postgres
processes, but at the cost of what I see as a huge complication.

Having experimented with Oracle's connection pooling, and watching either it 
or PHP (Apache) crash because of a bug in the query state tracking, I figured 
I'd buy some more RAM, forget about the process memory, and call myself lucky.

If you have a web server and use (in PHP) pg_pConnect, you will get a
postgresql process for each http process on your web servers.

Besides memory, are there any real costs associated with having a good number 
of idle PostgreSQL processes sitting around? 

Tom, Bruce?


Re: Connection Pooling, a year later

From
Bruce Momjian
Date:
> I don't get the deal with connection pooling.
> 
> Sure, there are some efficiencies in reducing the number of
> back-end postgres processes, but at what I see as a huge
> complication.
> 
> Having experimented with Oracle's connection pooling, and watching
> either it or PHP(Apache) crash because of a bug in the query
> state tracking, I figured I'd buy some more RAM and forget about
> the process memory and call myself lucky.
> 
> If you have a web server and use (in PHP) pg_pConnect, you will
> get a postgresql process for each http process on your web
> servers.
> 
> Besides memory, are there any real costs associated with having
> a good number of idle PostgreSQL processes sitting around?

I think it is the startup cost that most people want to avoid, and ours
is higher than most db's that use threads; at least I think so.

It would just be nice to have it done internally rather than have all
the clients do it, iff it can be done cleanly.

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026


Re: Connection Pooling, a year later

From
mlw
Date:
Bruce Momjian wrote:
> 
> > I don't get the deal with connection pooling.
> >
> > Sure, there are some efficiencies in reducing the number of
> > back-end postgres processes, but at what I see as a huge
> > complication.
> >
> > Having experimented with Oracle's connection pooling, and watching
> > either it or PHP(Apache) crash because of a bug in the query
> > state tracking, I figured I'd buy some more RAM and forget about
> > the process memory and call myself lucky.
> >
> > If you have a web server and use (in PHP) pg_pConnect, you will
> > get a postgresql process for each http process on your web
> > servers.
> >
> > Besides memory, are there any real costs associated with having
> > a good number of idle PostgreSQL processes sitting around?
> 
> I think it is the startup cost that most people want to avoid, and ours
> is higher than most db's that use threads; at least I think so.
> 
> It would just be nice to have it done internally rather than have all
> the clients do it, iff it can be done cleanly.

You can usually avoid most (all?) of the startup cost by using persistent
connections with PHP.

My concern is (and do you know?): besides the memory used by idle postgres 
processes, are there any performance reasons why connection pooling over a 
smaller number of processes would perform better than a larger number of 
idle persistent processes?

Unless it does, I would say that connection pooling is pointless.


Re: Connection Pooling, a year later

From
Bruce Momjian
Date:
> > I think it is the startup cost that most people want to avoid, and ours
> > is higher than most db's that use threads; at least I think so.
> > 
> > It would just be nice to have it done internally rather than have all
> > the clients do it, iff it can be done cleanly.
> 
> You can usually avoid most (all?) of the startup cost by using persistent
> connections with PHP.

Yes, that is assuming you are using PHP.  If you are using something
else, you need connection pooling in there too.  All those client interfaces
reimplementing connection pooling seems like a waste to me.

> My concern is, and do you know, besides the memory used by idle postgres
> processes, are there any performance reasons why connection pooling a fewer
> number of processes, would perform better than a larger number of idle
> persistent processes?
> 
> Unless it does, I would say that connection pooling is pointless.

No, idle backends take minimal resources.

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026


Re: Connection Pooling, a year later

From
Bruce Momjian
Date:
> If implemented, surely the best place to put it would be in libpq?  You
> could always add a function to libpq to create a 'pooled' connection,
> rather than a normal connection.  Basically then the PHP guys would just use
> that instead of their own pg_connect function.  I guess it would mean that
> lots of people who use the pgsql client wouldn't have to rewrite their own
> connection sharing code.
> 
> However, where would you put all the options for the pool?  Like max
> processes, min processes, etc.
> 
> I have learnt that half the problem with connection pooling is transactions
> that fail to be rolled back...

The trick for that is to call COMMIT before you pass the backend to a
new person.  Now, if you want to abort a left-over transaction, you can
do an ABORT but that is going to show up in the server logs because an
ABORT without a transaction causes an error message.

We also have RESET ALL for connection pooling use.

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026


Re: Connection Pooling, a year later

From
"Christopher Kings-Lynne"
Date:
> If you have a web server and use (in PHP) pg_pConnect, you will get a
> postgresql process for each http process on your web servers.
>
> Besides memory, are there any real costs associated with having a good
> number of idle PostgreSQL processes sitting around?

If implemented, surely the best place to put it would be in libpq?  You
could always add a function to libpq to create a 'pooled' connection,
rather than a normal connection.  Basically then the PHP guys would just use
that instead of their own pg_connect function.  I guess it would mean that
lots of people who use the pgsql client wouldn't have to rewrite their own
connection sharing code.

However, where would you put all the options for the pool?  Like max
processes, min processes, etc.

I have learnt that half the problem with connection pooling is transactions
that fail to be rolled back...
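
A sketch of what such a libpq addition might look like (entirely 
hypothetical --- nothing like PQpooledConnect exists in libpq; the names, 
the PoolOptions struct, and the fixed-size table are invented for 
illustration):

/*
 * Hypothetical libpq-level pooling, showing where the pool options could
 * live.  A real version would need locking, liveness checks, and the
 * state-reset issues discussed elsewhere in this thread.
 */
#include <libpq-fe.h>
#include <string.h>

typedef struct PoolOptions { int min_conns; int max_conns; } PoolOptions;

typedef struct PoolSlot {
    char    conninfo[256];
    PGconn *conn;
    int     in_use;
} PoolSlot;

#define POOL_SIZE 32
static PoolSlot pool[POOL_SIZE];

/* Invented API: hand back a pooled connection, opening one if needed. */
PGconn *PQpooledConnect(const char *conninfo, const PoolOptions *opts)
{
    int i, free_slot = -1;
    int limit = opts->max_conns < POOL_SIZE ? opts->max_conns : POOL_SIZE;

    for (i = 0; i < limit; i++)
    {
        if (!pool[i].in_use && pool[i].conn != NULL &&
            strcmp(pool[i].conninfo, conninfo) == 0)
        {
            pool[i].in_use = 1;          /* reuse an idle connection */
            return pool[i].conn;
        }
        if (pool[i].conn == NULL && free_slot < 0)
            free_slot = i;
    }
    if (free_slot < 0)
        return NULL;                     /* pool exhausted; caller waits */

    pool[free_slot].conn = PQconnectdb(conninfo);
    if (PQstatus(pool[free_slot].conn) != CONNECTION_OK)
    {
        PQfinish(pool[free_slot].conn);
        pool[free_slot].conn = NULL;
        return NULL;
    }
    strncpy(pool[free_slot].conninfo, conninfo,
            sizeof(pool[free_slot].conninfo) - 1);
    pool[free_slot].in_use = 1;
    return pool[free_slot].conn;
}

/* Invented API: release back to the pool instead of PQfinish(). */
void PQpooledRelease(PGconn *conn)
{
    int i;

    for (i = 0; i < POOL_SIZE; i++)
        if (pool[i].conn == conn)
            pool[i].in_use = 0;
}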

Chris



Re: Connection Pooling, a year later

From
"Mark Pritchard"
Date:
> I think it is the startup cost that most people want to avoid, and ours
> is higher than most db's that use threads; at least I think so.
>
> It would just be nice to have it done internally rather than have all
> the clients do it, iff it can be done cleanly.

I'd add that client side connection pooling isn't effective in some cases
anyway - one application we work with has 4 physical application servers
running around 6 applications. Each of the applications was written by a
different vendor, and thus a pool size of five gives you 120 open
connections.

From another message, implementing it in libpq doesn't solve the problem
for JDBC connectivity either.

My knowledge of the PostgreSQL internals is rather limited, but could you
not kick off a number of backends and use the already existing block of
shared memory to grab and process requests?

Cheers,

Mark Pritchard



Re: Connection Pooling, a year later

From
Lincoln Yeoh
Date:
At 10:57 PM 12/17/01 -0500, Bruce Momjian wrote:
>Yes, that is assuming you are using PHP.  If you are using something
>else, you need connection pooling in there too.  All those client interfaces
>reimplementing connection pooling seems like a waste to me.

But trying to connect and reconnect to an RDBMS 100 times a sec sounds
broken (plus authentication etc).

I personally think the fix for that should be at the client side. At worst
it should be in an intermediate application (listener). Otherwise it's like
trying to turn a db server into a webserver, quite a bit of work there.

>> My concern is, and do you know, besides the memory used by idle postgres
>> processes, are there any performance reasons why connection pooling a fewer
>> number of processes, would perform better than a larger number of idle
>> persistent processes?
>> 
>> Unless it does, I would say that connection pooling is pointless.
>
>No, idle backends take minimal resources.

I personally will be happy with a large number of backends then. It's 
probably more deterministic having everything fully loaded to the max. 

Cheerio,
Link.



Re: Connection Pooling, a year later

From
Lincoln Yeoh
Date:
At 11:49 PM 12/17/01 -0500, Bruce Momjian wrote:
>new person.  Now, if you want to abort a left-over transaction, you can
>do an ABORT but that is going to show up in the server logs because an
>ABORT without a transaction causes an error message.

I do a lot of rollbacks typically. Would that cause errors?

I prefer doing rollbacks to commits when in doubt.

Regards,
Link.




Re: Connection Pooling, a year later

From
mlw
Date:
Bruce Momjian wrote:
> 
> > If implemented, surely the best place to put it would be in libpq?  You
> > could always add a function to libpq to create a 'pooled' connection,
> > rather than a normal connection.  Basically then the PHP guys would just use
> > that instead of their own pg_connect function.  I guess it would mean that
> > lots of people who use the pgsql client wouldn't have to rewrite their own
> > connection sharing code.
> >
> > However, where would you put all the options for the pool?  Like max
> > processes, min processes, etc.
> >
> > I have learnt that half the problem with connection pooling is transactions
> > that fail to be rolled back...
> 
> The trick for that is to call COMMIT before you pass the backend to a
> new person.  Now, if you want to abort a left-over transaction, you can
> do an ABORT but that is going to show up in the server logs because an
> ABORT without a transaction causes an error message.
> 
> We also have RESET ALL for connection pooling use.

The problem with connection pooling, and it can be a very difficult problem, 
is the state of the connection. What we saw with the Oracle connection 
pooling was that if a SQL query took too long, and/or the PHP front end 
timed out or crashed (the XML library does this sometimes), the Oracle 
connection was left in a strange state. Sometimes the connection stayed 
active with respect to the pooling software, but was brain dead. The Apache 
process that was lucky enough to get that pooled connection either errored 
out or hung.

There were a large number of virtually untraceable problems related to the
previous query and the previous client's behavior.

I know I am being alarmist, but my experience with connection pooling left a
bad taste in my mouth. I can see a persistent connection used per process, 
but pooling n processes over x < n connections is problematic. The pooling 
software has to be able to detect and act upon the real "unexpected" status 
of the back-end, not just what it thinks it is.

Most high-performance software already has a notion of persistent 
connections, which has been debugged and tuned. If there is no real benefit 
to reducing the number of back-end processes, I think connection pooling 
will be more problematic than productive.


Re: Connection Pooling, a year later

From
Bruce Momjian
Date:
> At 11:49 PM 12/17/01 -0500, Bruce Momjian wrote:
> >new person.  Now, if you want to abort a left-over transaction, you can
> >do an ABORT but that is going to show up in the server logs because an
> >ABORT without a transaction causes an error message.
> 
> I do a lot of rollbacks typically. Would that cause errors?
> 
> I prefer doing rollbacks to commits when in doubt.

No problem, it is just that rollbacks when you are not in a transaction
cause a log error message.

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026


Re: Connection Pooling, a year later

From
Tom Lane
Date:
Bruce Momjian <pgman@candle.pha.pa.us> writes:
> No problem, it is just that rollbacks when you are not in a transaction
> cause a log error message.

I don't see any difference in the behavior: you get a notice either way.

regression=# commit;
NOTICE:  COMMIT: no transaction in progress
COMMIT
regression=# rollback;
NOTICE:  ROLLBACK: no transaction in progress
ROLLBACK
regression=#

My recommendation would generally be to do a ROLLBACK not a COMMIT, on
the grounds that if the previous user failed to complete his transaction
you probably want to abort it, not assume that it's safe to commit.

However, this safety-first approach might be unworkable if you have a
large body of existing code that all assumes it needn't issue COMMIT
explicitly.
        regards, tom lane


Re: Connection Pooling, a year later

From
Bruce Momjian
Date:
> Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > No problem, it is just that rollbacks when you are not in a transaction
> > cause a log error message.
> 
> I don't see any difference in the behavior: you get a notice either way.
> 
> regression=# commit;
> NOTICE:  COMMIT: no transaction in progress
> COMMIT
> regression=# rollback;
> NOTICE:  ROLLBACK: no transaction in progress
> ROLLBACK
> regression=#
> 
> My recommendation would generally be to do a ROLLBACK not a COMMIT, on
> the grounds that if the previous user failed to complete his transaction
> you probably want to abort it, not assume that it's safe to commit.
> 
> However, this safety-first approach might be unworkable if you have a
> large body of existing code that all assumes it needn't issue COMMIT
> explicitly.

Sorry, I should have said do a "BEGIN;COMMIT;".  That only generates an
error message if a transaction was left open, and it commits the
left-open transaction.

We can add a SILENT keyword to COMMIT/ROLLBACK if people really want it.

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026


Re: Connection Pooling, a year later

From
Don Baccus
Date:
Bruce Momjian wrote:


> It would just be nice to have it done internally rather than have all
> the clients do it, iff it can be done cleanly.

Serious client applications that need it already do it.  Firing up a 
connection to Oracle or most other DBs isn't that lightweight a deal 
either; this isn't useful only for PG.

Personally I'd just view it as getting in the way, but then I use a 
webserver that has provided connection pooling for client threads for the 
last seven years ...

I agree with Tom that the client seems to be the best place to do this.

Among other things it isn't that difficult.  If you know how to fire up 
one connection, you know how to fire up N of them and adding logic to 
pool them afterwards is easy enough.
-- 
Don Baccus
Portland, OR
http://donb.photo.net, http://birdnotes.net, http://openacs.org



Re: Connection Pooling, a year later

From
Don Baccus
Date:
Bruce Momjian wrote:


> Yes, that is assuming you are using PHP.  If you are using something
> else, you need connection pooling in there too.  All those client interfaces
> reimplementing connection pooling seems like a waste to me.


Effective pooling's pretty specific to your environment, though, so any 
general mechanism would have to provide a wide-ranging suite of 
parameters governing the number to pool, how long each handle should 
live, what to do if a handle's released by a client while in the midst 
of a transaction (AOLserver rolls back the transaction; other clients 
might want to do something else, e.g. fire a callback or the like), etc.

I think it would be fairly complex, and no improvement for those 
high-throughput applications already written with client-side pooling.

And those are the only applications that need it.

-- 
Don Baccus
Portland, OR
http://donb.photo.net, http://birdnotes.net, http://openacs.org



Re: Connection Pooling, a year later

From
Don Baccus
Date:
Mark Pritchard wrote:

>>I think it is the startup cost that most people want to avoid, and ours
>>is higher than most db's that use threads; at least I think so.
>>
>>It would just be nice to have it done internally rather than have all
>>the clients do it, iff it can be done cleanly.
>>
> 
> I'd add that client side connection pooling isn't effective in some cases
> anyway - one application we work with has 4 physical application servers
> running around 6 applications. Each of the applications was written by a
> different vendor, and thus a pool size of five gives you 120 open
> connections.

Tuning a central pooling mechanism to run well in this kind of situation 
isn't going to be a trivial task, either.  The next thing you'll want is 
some way to prioritize the various clients so your more serious 
applications have a better chance of getting a connection from the pool.

Or you'll want to set up subpools so they don't compete with each other, 
in effect replicating what's done now, but adding more complexity to the 
central service.

-- 
Don Baccus
Portland, OR
http://donb.photo.net, http://birdnotes.net, http://openacs.org



Re: Connection Pooling, a year later

From
Don Baccus
Date:
Bruce Momjian wrote:


> 
> The trick for that is to call COMMIT before you pass the backend to a
> new person.


The failure to COMMIT is a programmer error - ROLLBACK's much safer.  At 
least that's what we decided in the AOLserver community, and that's what 
the drivers for Oracle and PG (the two I maintain) implement.

> Now, if you want to abort a left-over transaction, you can
> do an ABORT but that is going to show up in the server logs because an
> ABORT without a transaction causes an error message.


The connection pooling mechanism needs to track the transaction state 
and only ROLLBACK a handle that's not in autocommit state or is in the 
midst of a BEGIN/END transaction (again, Oracle vs. PG).
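
One way a driver can do that tracking, sketched with libpq (the handle 
type and function names are invented; it assumes all SQL funnels through 
one exec routine, as it does behind an abstraction layer):

/*
 * Client-side transaction-state tracking: flag the handle on BEGIN,
 * clear it on COMMIT/ROLLBACK/END, and on release roll back only when a
 * transaction was actually left open -- so nothing ever shows up in the
 * server log for well-behaved clients.
 */
#include <libpq-fe.h>
#include <strings.h>            /* strncasecmp() */

typedef struct PooledHandle {
    PGconn *conn;
    int     in_transaction;
} PooledHandle;

PGresult *handle_exec(PooledHandle *h, const char *sql)
{
    /* crude keyword sniffing; a real driver would hook its own
     * begin/commit entry points instead */
    if (strncasecmp(sql, "BEGIN", 5) == 0)
        h->in_transaction = 1;
    else if (strncasecmp(sql, "COMMIT", 6) == 0 ||
             strncasecmp(sql, "ROLLBACK", 8) == 0 ||
             strncasecmp(sql, "END", 3) == 0)
        h->in_transaction = 0;
    return PQexec(h->conn, sql);
}

void handle_release(PooledHandle *h)
{
    if (h->in_transaction)      /* previous user left a transaction open */
    {
        PQclear(PQexec(h->conn, "ROLLBACK"));
        h->in_transaction = 0;
    }
}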

-- 
Don Baccus
Portland, OR
http://donb.photo.net, http://birdnotes.net, http://openacs.org



Re: Connection Pooling, a year later

From
Oleg Bartunov
Date:
Will schema support resolve this discussion?
If I understand correctly, the initial argument for connection pooling 
was the restriction on the number of persistent connections. It's true 
in current postgresql that if one wants to keep connections to several 
databases open for performance reasons, the total number of connections 
is doubled, tripled, and so on. But if I understand correctly, schema 
support will eventually do away with this problem, because we could 
keep only one pool of connections to the *one* database.
Oleg

On Tue, 18 Dec 2001, Don Baccus wrote:

> Bruce Momjian wrote:
>
>
> > Yes, that is assuming you are using PHP.  If you are using something
> > else, you need connection pooling in there too.  All those client interfaces
> > reimplementing connection pooling seems like a waste to me.
>
>
> Effective pooling's pretty specific to your environment, though, so any
> general mechanism would have to provide a wide-ranging suite of
> parameters governing the number to pool, how long each handle should
> live, what to do if a handle's released by a client while in the midst
> of a transaction (AOLserver rolls back the transaction, other clients
> might want to do something else, i.e. fire a callback or the like), etc etc.
>
> I think it would be fairly complex and for those high-throughput
> applications already written with client-side pooling no improvement.
>
> And those are the only applications that need it.
>
>
Regards,    Oleg
_____________________________________________________________
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83



Re: Connection Pooling, a year later

From
Bruce Momjian
Date:
> Bruce Momjian wrote:
> 
> 
> > 
> > The trick for that is to call COMMIT before you pass the backend to a
> > new person.
> 
> 
> The failure to COMMIT is a programmer error - ROLLBACK's much safer.  At 
>   least that's what we decided in the AOLserver community, and that's 
> what the drivers for Oracle and PG (the two I maintain) implement.


Then you can issue a "BEGIN;ROLLBACK;" when you pass the session to the
next user, and "RESET ALL;" of course.
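
In libpq terms, the handoff would amount to something like this (a 
sketch; the function name is invented):

/*
 * Unconditional reset between pool users.  The BEGIN draws a notice
 * only if the previous client left a transaction open; the ROLLBACK
 * then aborts either that leftover transaction or the empty one just
 * started, and RESET ALL clears any SET the previous client made.
 */
#include <libpq-fe.h>

void reset_for_next_client(PGconn *conn)
{
    PQclear(PQexec(conn, "BEGIN"));
    PQclear(PQexec(conn, "ROLLBACK"));
    PQclear(PQexec(conn, "RESET ALL"));
}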

> > Now, if you want to abort a left-over transaction, you can
> > do an ABORT but that is going to show up in the server logs because an
> > ABORT without a transaction causes an error message.
> 
> 
> The connection pooling mechanism needs to track the transaction state 
> and only ROLLBACK a handle that's not in autocommit state or in the 
> midst of a BEGIN/END transaction (again, Oracle vs. PG)..

Seems like a lot of work to keep track of transaction state in the
client;  seems easier to just unconditionally issue the begin;rollback.

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026


Re: Connection Pooling, a year later

From
Don Baccus
Date:
Bruce Momjian wrote:


> Seems like a lot of work to keep track of transaction state in the
> client;  seems easier to just unconditionally issue the begin;rollback.


Well, in the Oracle and PG drivers for AOLserver it wasn't, but then 
again application code in that environment doesn't call libpq directly 
but through an abstraction layer that works with all DBs (the layer 
does, the query strings obviously sometimes don't!).  The db primitives 
then call an RDBMS-specific driver, which can call thread-safe RDBMS 
client libraries directly or call an external driver (possibly the 
external ODBC driver) for RDBMSs without a thread-safe client library.

So we can track things easily.  Along with other things, for instance 
retrying queries in one backend after another backend has bombed out and 
given the nice little message saying "another backend has closed, please 
retry your query".  Luckily it was pretty easy to kill PG 6.5 so I could 
test and debug this feature...

I suspect that major applications that support multiple RDBMS's take a 
somewhat similar approach.  In the context of providing an abstract 
database API for one's client code, adding persistent connection pooling 
seems pretty minor.

-- 
Don Baccus
Portland, OR
http://donb.photo.net, http://birdnotes.net, http://openacs.org



Re: Connection Pooling, a year later

From
Bruce Momjian
Date:
> I suspect that major applications that support multiple RDBMS's take a 
> somewhat similar approach.  In the context of providing an abstract 
> database API for one's client code, adding persistent connection pooling 
> seems pretty minor.

Yes, with that abstraction layer, it is quite easy.

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026


Re: Connection Pooling, a year later

From
Andrew McMillan
Date:
On Tue, 2001-12-18 at 13:46, Michael Owens wrote:
> 
> By having the postmaster map multiple clients to a fixed number of backends, 
> you achieve the happy medium: You never exceed the ideal number of active 
> backends, and at the same time you are not limited to only accepting a fixed 
> number of connections. Accepting connections can now be based on load 
> (however you wish to define it), not number.  You now make decisions based on 
> utilization.
> 
> If it were shown that even half of a backend's life consisted of idle time, 
> leasing out that idle time to another active connection would potentially 
> double the average number of simultaneous requests without (theoretically) 
> incurring any significant degradation in performance.
> 

Have you looked at the client-side connection pooling solutions out
there?

DBBalancer ( http://dbbalancer.sourceforge.net/ ) tries to sit very
transparently between your application and PostgreSQL, letting you
implement connection pooling with almost no application changes.

There was another one I came across too, but that one requires you to
make more wide-reaching changes to the application.

In my applications I have found DBBalancer to be roughly the same level
of performance as PHP persistent connections, but a lot fewer
connections are needed in the pool because they are only needed when
Apache is delivering dynamic content - not the associated static
stylesheets and images.

Regards,                Andrew.
-- 
--------------------------------------------------------------------
Andrew @ Catalyst .Net.NZ Ltd, PO Box 11-053, Manners St, Wellington
WEB: http://catalyst.net.nz/        PHYS: Level 2, 150-154 Willis St
DDI: +64(4)916-7201    MOB: +64(21)635-694    OFFICE: +64(4)499-2267
Are you enrolled at http://schoolreunions.co.nz/ yet?



Re: Connection Pooling, a year later

From
Michael Owens
Date:
As long as each client's call is composed of a standalone transaction, there 
is no problem with external connection pools. But what about when a client's 
transaction spans two or more calls, as with SELECT FOR UPDATE? Then pooling 
is not safe: it offers no assurance of what may be interjected into an open 
transaction between calls. For example, each of these is a separate call to 
a shared connection:

Client A:  BEGIN WORK; SELECT last_name from customer for update where <X>;

Client B:  BEGIN WORK; SELECT street from customer for update where <Y>;

Client A:  update customer set lastname=<modified value> where <X>; COMMIT 
WORK;


Now, isn't Client B's write lock gone with Client A's commit? Yet Client A's 
lock is still hanging around. While Client B's commit will close it, Client B 
has lost the assurance of its lock, defeating the purpose of SELECT FOR 
UPDATE.

If this is correct, then external connection pools limit what you can do 
with the database to a single call. Any transaction spanning more than one 
call is unsafe, because it is not isolated from other clients sharing the 
same connection.
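
Condensed into one program, with both "clients" deliberately sharing a 
single connection (a libpq sketch; the dbname and the id values stand in 
for the <X>/<Y> placeholders above):

/*
 * Demonstrates the interleaving hazard: B's BEGIN inside A's open
 * transaction is a no-op (with a notice), so B's row lock joins A's
 * transaction -- and A's COMMIT releases it.
 */
#include <libpq-fe.h>

int main(void)
{
    PGconn *shared = PQconnectdb("dbname=test");    /* assumed database */

    if (PQstatus(shared) != CONNECTION_OK)
        return 1;

    /* Client A */
    PQclear(PQexec(shared, "BEGIN WORK"));
    PQclear(PQexec(shared,
        "SELECT last_name FROM customer WHERE id = 1 FOR UPDATE"));

    /* Client B -- believes it has started its own transaction */
    PQclear(PQexec(shared, "BEGIN WORK"));
    PQclear(PQexec(shared,
        "SELECT street FROM customer WHERE id = 2 FOR UPDATE"));

    /* Client A finishes: this single COMMIT ends the only transaction
     * on the connection, releasing B's FOR UPDATE lock as well */
    PQclear(PQexec(shared,
        "UPDATE customer SET last_name = 'Smith' WHERE id = 1"));
    PQclear(PQexec(shared, "COMMIT WORK"));

    /* B still thinks it holds its lock, but it is gone. */
    PQfinish(shared);
    return 0;
}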



On Tuesday 18 December 2001 11:30 pm, Andrew McMillan wrote:
> On Tue, 2001-12-18 at 13:46, Michael Owens wrote:
> > By having the postmaster map multiple clients to a fixed number of
> > backends, you achieve the happy medium: You never exceed the ideal number
> > of active backends, and at the same time you are not limited to only
> > accepting a fixed number of connections. Accepting connections can now be
> > based on load (however you wish to define it), not number.  You now make
> > decisions based on utilization.
> >
> > If it were shown that even half of a backend's life consisted of idle
> > time, leasing out that idle time to another active connection would
> > potentially double the average number of simultaneous requests without
> > (theoretically) incurring any significant degradation in performance.
>
> Have you looked at the client-side connection pooling solutions out
> there?
>
> DBBalancer ( http://dbbalancer.sourceforge.net/ ) tries to sit very
> transparently between your application and PostgreSQL, letting you
> implement connection pooling with almost no application changes.
>
> There was another one I came across too, but that one requires you to
> make more wide-reaching changes to the application.
>
> In my applications I have found DBBalancer to be roughly the same level
> of performance as PHP persistent connections, but a lot fewer
> connections are needed in the pool because they are only needed when
> Apache is delivering dynamic content - not the associated static
> stylesheets and images.
>
> Regards,
>                     Andrew.


Re: Connection Pooling, a year later

From
Don Baccus
Date:
Michael Owens wrote:

> As long as each client's call is composed of a standalone transaction, there 
> is no problem with external connection pools. But what about when a client's 
> transactions spans two or more calls, such as SELECT FOR UPDATE? Then pooling 
> is not safe: it offers no assurance of what may be interjected into an open 
> transaction between calls. For example, each is a separate call to a shared 
> connection:
> 
> Client A:  BEGIN WORK; SELECT last_name from customer for update where <X>;
> 
> Client B:  BEGIN WORK; SELECT street from customer for update where <Y>;
> 
> Client A:  update customer set lastname=<modified value> where <X>; COMMIT 
> WORK;
> 
> 
> Now, isn't Client B's write lock gone with Client A's commit? Yet Client A's 
> lock is still hanging around. While Client B's commit will close it, Client B 
> has lost the assurance of its lock, defeating the purpose of SELECT FOR 
> UPDATE.
> 
> If this is correct, then external connection pools limit what you can do with 
> the database to a single call. Any transaction spanning more than one call is 
> unsafe, because it is not isolated from other clients sharing the same 
> connection.


The general idea is that you grab a handle and hold onto it until you're 
done.  This makes the above scenario impossible.

Forgetting to commit or rollback before relinquishing the handle is 
another scenario that can lead to problems, but that's already been 
discussed in detail.

-- 
Don Baccus
Portland, OR
http://donb.photo.net, http://birdnotes.net, http://openacs.org



Re: Connection Pooling, a year later

From
Michael Owens
Date:
On Wednesday 19 December 2001 01:04 pm, Don Baccus wrote:


> The general idea is that you grab a handle and hold onto it until you're
> done.  This makes the above scenario impossible.
>
> Forgetting to commit or rollback before relinquishing the handle is
> another scenario that can lead to problems but that's already been
> discussed in detail.

But then the shared connection is unshared, sitting idle while the client 
works in between calls, thus introducing idle time among a fixed number of 
connections. The server is doing less than it could.

I agree that this connection pool has improved things in eliminating backend 
startup time. But idle time still exists for the clients performing multiple 
calls, proportional to the product of the number of multiple-call clients 
and the number of calls they make, plus the idle time between them.

However, this probably only ever happens on updates. Inserts and selects can 
be done in one call. And, I suppose, updates comprise only a small fraction 
of the requests sent to the database. Even then, you can probably eliminate 
some multiple calls by using things such as procedures.
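
For example, the two-call SELECT FOR UPDATE pattern from earlier collapses 
into a single self-contained call (a libpq sketch, same invented customer 
table as above):

/*
 * One statement runs as one implicit transaction, so no lock is held
 * across calls and the connection can go back to the pool immediately.
 */
#include <libpq-fe.h>

void single_call_update(PGconn *conn)
{
    PQclear(PQexec(conn,
        "UPDATE customer SET last_name = 'Smith' WHERE id = 1"));
}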

Factoring all that in, you can probably do as well by optimizing your 
particular database/application as by writing new pooling code.

I relent. Thanks for your thoughts.


Re: Connection Pooling, a year later

From
Andrew McMillan
Date:
On Thu, 2001-12-20 at 07:22, Michael Owens wrote:
> As long as each client's call is composed of a standalone transaction, there 
> is no problem with external connection pools. But what about when a client's 
> transactions spans two or more calls, such as SELECT FOR UPDATE? Then pooling 
> is not safe: it offers no assurance of what may be interjected into an open 
> transaction between calls. For example, each is a separate call to a shared 
> connection:
> 
> Client A:  BEGIN WORK; SELECT last_name from customer for update where <X>;
> 
> Client B:  BEGIN WORK; SELECT street from customer for update where <Y>;
> 
> Client A:  update customer set lastname=<modified value> where <X>; COMMIT 
> WORK;
> 
> 
> Now, isn't Client B's write lock gone with Client A's commit? Yet Client A's 
> lock is still hanging around. While Client B's commit will close it, Client B 
> has lost the assurance of its lock, defeating the purpose of SELECT FOR 
> UPDATE.
> 
> If this is correct, then external connection pools limit what you can do with 
> the database to a single call. Any transaction spanning more than one call is 
> unsafe, because it is not isolated from other clients sharing the same 
> connection.

Oh, I see.  You are absolutely correct that client-side pooling wouldn't
work in that situation of course.

As an application developer, however, nobody has forced me into such a 
corner yet.  Long-running transactions are something I avoid like the 
plague.

Cheers,                Andrew.
-- 
--------------------------------------------------------------------
Andrew @ Catalyst .Net.NZ Ltd, PO Box 11-053, Manners St, Wellington
WEB: http://catalyst.net.nz/        PHYS: Level 2, 150-154 Willis St
DDI: +64(4)916-7201    MOB: +64(21)635-694    OFFICE: +64(4)499-2267
Are you enrolled at http://schoolreunions.co.nz/ yet?