Thread: Connection Pooling, a year later
I was browsing through the TODO list and stumbled upon the connection pooling item discussed over a year ago. Is it safe to assume that since it's still listed in the TODO, it's still a desirable feature? Although I don't claim to know jack about the postgresql source (I am trying to learn my way around it), it does seem like the initial idea proposed by Alfred Perlstein could offer considerable improvements in postgresql's ability to handle larger numbers of connections.

Passing file descriptors between processes aside, is it feasible to at least get the backend to where it could theoretically handle this --- multiple, simultaneous clients? Even if descriptor passing is not the best (most portable) way to get information from the client (via the postmaster) to the backend, there might be another way to transmit data from the client socket to a backend, thereby allowing data from multiple clients to be sent to a single backend. For instance, the postmaster could accept() and also multiplex (select()/poll(), or kqueue on BSD) across all file descriptors, sending and receiving data to/from the clients. After calling select(), it assigns/maps the client socket descriptor to one of the backends, to which it is connected via a domain socket. All subsequent data coming from that client descriptor is passed directly to that backend via the domain socket. Information sent from the backend to the postmaster through that socket includes the descriptor number of the client for whom that information is intended. The postmaster would multiplex across descriptors to clients and backends alike, effectively connecting the M clients to N backends, without having to pass descriptors. (A rough sketch of this forwarding loop appears below.)

I doubt that what I am going to say in this paragraph is new to any of you, so don't think I'm being preachy, or that I think this is a novel idea. But allowing multiple backends to process multiple clients, if only considered from a mathematical standpoint, seems like a way to increase (perhaps even dramatically) the maximum number of connections postgresql can service while maintaining a fixed number of backends, and therefore a known performance level.

The central issue is that the database server reaches marginal returns with the number of active backend processes, beyond which the overhead from shared resource contention, context switching, etc., degrades performance. This ideal number, varying from machine to machine, can of course be controlled by setting the max number of connections. However, once that number is reached, other clients are locked out, and the real question becomes whether or not the server is really being used to its full potential. Most likely it is not --- there is still a significant amount of idle time in each backend. So while the maximum number of connections has been reached, overall utilization is less than what the machine could deliver, all the while locking other clients out. Then the pendulum can swing the other way: you set the max number of connections well beyond the ideal number (creating a buffer of sorts) so those blocked clients are let in. The problem is that this opens an even worse scenario: what if all of the maximum connections are active, sending the number of active backends well beyond the ideal limit?
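Here is that rough sketch of the forwarding loop, in C. It is purely illustrative, not PostgreSQL code: listen_fd, pick_backend(), and the route[] table are assumptions of the sketch, and the reverse direction (backend to postmaster to client, where each message would carry the client's descriptor number) is omitted, as is partial-write handling.

    /*
     * Illustrative only.  The postmaster accept()s clients and relays
     * their bytes to pre-forked backends over UNIX domain sockets, so
     * no descriptors ever change hands.  route[] maps each client fd
     * to the backend socket it was assigned on accept().
     */
    #include <sys/select.h>
    #include <sys/socket.h>
    #include <unistd.h>

    #define MAXFD FD_SETSIZE

    extern int listen_fd;          /* socket()/bind()/listen() done elsewhere */
    extern int pick_backend(void); /* returns a backend's domain socket (assumed) */

    static int route[MAXFD];       /* client fd -> backend fd */

    void multiplex_loop(void)
    {
        fd_set active, readable;
        char   buf[8192];

        FD_ZERO(&active);
        FD_SET(listen_fd, &active);

        for (;;)
        {
            readable = active;
            if (select(MAXFD, &readable, NULL, NULL, NULL) < 0)
                continue;                       /* EINTR etc. */

            for (int fd = 0; fd < MAXFD; fd++)
            {
                if (!FD_ISSET(fd, &readable))
                    continue;

                if (fd == listen_fd)            /* new client */
                {
                    int client = accept(listen_fd, NULL, NULL);

                    if (client >= 0 && client < MAXFD)
                    {
                        route[client] = pick_backend();
                        FD_SET(client, &active);
                    }
                }
                else                            /* data from an existing client */
                {
                    ssize_t n = read(fd, buf, sizeof(buf));

                    if (n <= 0)                 /* client disconnected */
                    {
                        close(fd);
                        FD_CLR(fd, &active);
                    }
                    else
                        write(route[fd], buf, n);  /* relay to its backend */
                }
            }
        }
    }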
By having the postmaster map multiple clients to a fixed number of backends, you achieve the happy medium: you never exceed the ideal number of active backends, and at the same time you are not limited to accepting only a fixed number of connections. Accepting connections can now be based on load (however you wish to define it), not number. You now make decisions based on utilization.

If it were shown that even half of a backend's life consisted of idle time, leasing out that idle time to another active connection would potentially double the average number of simultaneous requests without (theoretically) incurring any significant degradation in performance.

I have worked on code that uses this model, and would be glad to adapt/contribute it to postgresql. Currently it passes file descriptors from a listening process to a queue process using sendmsg()/recvmsg(). I have no trouble with it on Linux and BSD, but I don't pretend to know anything about portability. However, even if this is an issue, I would be willing to adapt the model mentioned above.

Currently, the design I have is to pass in complete transactions, so that the N-to-M mapping can be achieved as an application server. The downside is that state cannot be maintained within the database (backend), as backends are shared between clients on the application server. Every request the server makes to postgresql must be a standalone transaction.

So I guess my question is whether or not there is still interest in this, and whether there are still great difficulties in making the necessary changes to the backend so that it could handle multiple clients. If it does seem possible, and there is interest, then I would be willing to take a stab at it with the code I have developed.
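For reference, the sendmsg()/recvmsg() descriptor passing mentioned above looks roughly like this on Linux and the BSDs. send_fd() and chan are illustrative names; the receiving side mirrors this with recvmsg() and CMSG_DATA().

    /*
     * Pass an open file descriptor to another process over a UNIX
     * domain socket (the SCM_RIGHTS mechanism).  This is the piece
     * whose portability is in question.
     */
    #include <string.h>
    #include <sys/socket.h>
    #include <sys/uio.h>

    int send_fd(int chan, int fd_to_pass)
    {
        struct msghdr   msg = {0};
        struct cmsghdr *cmsg;
        char            dummy = 'F';          /* must send at least one data byte */
        struct iovec    iov = { &dummy, 1 };
        union                                 /* ensures control buffer alignment */
        {
            char           buf[CMSG_SPACE(sizeof(int))];
            struct cmsghdr align;
        } u;

        msg.msg_iov = &iov;
        msg.msg_iovlen = 1;
        msg.msg_control = u.buf;
        msg.msg_controllen = sizeof(u.buf);

        cmsg = CMSG_FIRSTHDR(&msg);
        cmsg->cmsg_level = SOL_SOCKET;
        cmsg->cmsg_type = SCM_RIGHTS;         /* kernel duplicates the fd */
        cmsg->cmsg_len = CMSG_LEN(sizeof(int));
        memcpy(CMSG_DATA(cmsg), &fd_to_pass, sizeof(int));

        return sendmsg(chan, &msg, 0) < 0 ? -1 : 0;
    }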
Michael Owens <owensmk@earthlink.net> writes:
> Even if descriptor passing is not the best (most portable) way to get
> information from the client (via the postmaster) to the backend, there
> might be another way to transmit data from the client socket to a
> backend, thereby allowing data from multiple clients to be sent to a
> single backend. For instance, the postmaster could accept() and also
> multiplex (select()/poll(), or kqueue on BSD) across all file
> descriptors, sending and receiving data to/from the clients.

That would turn the postmaster into a performance bottleneck, since it would have to do work for every client interaction.

Another objection is that the postmaster would need to have two open file descriptors (the original client connection and a pipe to the backend) for every active connection. On systems with a fairly low max-files-per-process setting (~ 50 is not uncommon) that would put a serious crimp in the number of connections that could be supported.

On the whole it strikes me as more practical to do connection pooling on the client side...

			regards, tom lane
I don't get the deal with connection pooling.

Sure, there are some efficiencies in reducing the number of back-end postgres processes, but at the cost of what I see as a huge complication.

Having experimented with Oracle's connection pooling, and watching either it or PHP (Apache) crash because of a bug in the query state tracking, I figured I'd buy some more RAM, forget about the process memory, and call myself lucky.

If you have a web server and use (in PHP) pg_pConnect, you will get a postgresql process for each http process on your web servers.

Besides memory, are there any real costs associated with having a good number of idle PostgreSQL processes sitting around? Tom, Bruce?
> I don't get the deal with connection pooling.
>
> Sure, there are some efficiencies in reducing the number of
> back-end postgres processes, but at the cost of what I see as
> a huge complication.
>
> Having experimented with Oracle's connection pooling, and watching
> either it or PHP (Apache) crash because of a bug in the query
> state tracking, I figured I'd buy some more RAM, forget about
> the process memory, and call myself lucky.
>
> If you have a web server and use (in PHP) pg_pConnect, you will
> get a postgresql process for each http process on your web
> servers.
>
> Besides memory, are there any real costs associated with having
> a good number of idle PostgreSQL processes sitting around?

I think it is the startup cost that most people want to avoid, and ours is higher than most DBs that use threads; at least I think so.

It would just be nice to have it done internally rather than have all the clients do it, iff it can be done cleanly.

--
Bruce Momjian                      |  http://candle.pha.pa.us
pgman@candle.pha.pa.us             |  (610) 853-3000
+  If your life is a hard drive,   |  830 Blythe Avenue
+  Christ can be your backup.      |  Drexel Hill, Pennsylvania 19026
Bruce Momjian wrote:
> > I don't get the deal with connection pooling.
> >
> > Sure, there are some efficiencies in reducing the number of
> > back-end postgres processes, but at the cost of what I see as
> > a huge complication.
> >
> > Having experimented with Oracle's connection pooling, and watching
> > either it or PHP (Apache) crash because of a bug in the query
> > state tracking, I figured I'd buy some more RAM, forget about
> > the process memory, and call myself lucky.
> >
> > If you have a web server and use (in PHP) pg_pConnect, you will
> > get a postgresql process for each http process on your web
> > servers.
> >
> > Besides memory, are there any real costs associated with having
> > a good number of idle PostgreSQL processes sitting around?
>
> I think it is the startup cost that most people want to avoid, and ours
> is higher than most DBs that use threads; at least I think so.
>
> It would just be nice to have it done internally rather than have all
> the clients do it, iff it can be done cleanly.

You can usually avoid most (all?) of the startup cost by using persistent connections with PHP.

My concern is, and do you know: besides the memory used by idle postgres processes, are there any performance reasons why connection pooling a smaller number of processes would perform better than a larger number of idle persistent processes?

Unless it does, I would say that connection pooling is pointless.
> > I think it is the startup cost that most people want to avoid, and ours
> > is higher than most DBs that use threads; at least I think so.
> >
> > It would just be nice to have it done internally rather than have all
> > the clients do it, iff it can be done cleanly.
>
> You can usually avoid most (all?) of the startup cost by using persistent
> connections with PHP.

Yes, that is assuming you are using PHP. If you are using something else, you need connection pooling in there too. All those client interfaces reimplementing connection pooling seems like a waste to me.

> My concern is, and do you know: besides the memory used by idle postgres
> processes, are there any performance reasons why connection pooling a
> smaller number of processes would perform better than a larger number
> of idle persistent processes?
>
> Unless it does, I would say that connection pooling is pointless.

No, idle backends take minimal resources.

--
Bruce Momjian                      |  http://candle.pha.pa.us
pgman@candle.pha.pa.us             |  (610) 853-3000
+  If your life is a hard drive,   |  830 Blythe Avenue
+  Christ can be your backup.      |  Drexel Hill, Pennsylvania 19026
> If implemented, surely the best place to put it would be in libpq? You
> could always add a function to libpq to create a 'pooled' connection,
> rather than a normal connection. Basically then the PHP guys would just
> use that instead of their own pg_connect function. I guess it would mean
> that lots of people who use the pgsql client wouldn't have to rewrite
> their own connection sharing code.
>
> However, where would you put all the options for the pool? Like max
> processes, min processes, etc.
>
> I have learnt that half the problem with connection pooling is
> transactions that fail to be rolled back...

The trick for that is to call COMMIT before you pass the backend to a new person. Now, if you want to abort a left-over transaction, you can do an ABORT, but that is going to show up in the server logs because an ABORT without a transaction causes an error message.

We also have RESET ALL for connection pooling use.

--
Bruce Momjian                      |  http://candle.pha.pa.us
pgman@candle.pha.pa.us             |  (610) 853-3000
+  If your life is a hard drive,   |  830 Blythe Avenue
+  Christ can be your backup.      |  Drexel Hill, Pennsylvania 19026
> If you have a web server and use (in PHP) pg_pConnect, you will get a
> postgresql process for each http process on your web servers.
>
> Besides memory, are there any real costs associated with having a
> good number of idle PostgreSQL processes sitting around?

If implemented, surely the best place to put it would be in libpq? You could always add a function to libpq to create a 'pooled' connection, rather than a normal connection. Basically then the PHP guys would just use that instead of their own pg_connect function. I guess it would mean that lots of people who use the pgsql client wouldn't have to rewrite their own connection sharing code.

However, where would you put all the options for the pool? Like max processes, min processes, etc.

I have learnt that half the problem with connection pooling is transactions that fail to be rolled back...

Chris
> I think it is the startup cost that most people want to avoid, and ours
> is higher than most DBs that use threads; at least I think so.
>
> It would just be nice to have it done internally rather than have all
> the clients do it, iff it can be done cleanly.

I'd add that client-side connection pooling isn't effective in some cases anyway - one application we work with has 4 physical application servers running around 6 applications. Each of the applications was written by a different vendor, and thus a pool size of five gives you 120 open connections. From another message, implementing it in libpq doesn't solve for JDBC connectivity either.

My knowledge of the PostgreSQL internals is rather limited, but could you not kick off a number of backends and use the already existing block of shared memory to grab and process requests?

Cheers,

Mark Pritchard
At 10:57 PM 12/17/01 -0500, Bruce Momjian wrote:
>Yes, that is assuming you are using PHP. If you are using something
>else, you need connection pooling in there too. All those client
>interfaces reimplementing connection pooling seems like a waste to me.

But trying to connect and reconnect to an RDBMS 100 times a sec sounds broken (plus authentication etc). I personally think the fix for that should be at the client side. At worst it should be in an intermediate application (listener). Otherwise it's like trying to turn a db server into a webserver; quite a bit of work there.

>> My concern is, and do you know: besides the memory used by idle postgres
>> processes, are there any performance reasons why connection pooling a
>> smaller number of processes would perform better than a larger number
>> of idle persistent processes?
>>
>> Unless it does, I would say that connection pooling is pointless.
>
>No, idle backends take minimal resources.

I personally will be happy with a large number of backends, then. Probably more deterministic having everything fully loaded to the max.

Cheerio, Link.
At 11:49 PM 12/17/01 -0500, Bruce Momjian wrote:
>new person. Now, if you want to abort a left-over transaction, you can
>do an ABORT but that is going to show up in the server logs because an
>ABORT without a transaction causes an error message.

I do a lot of rollbacks typically. Would that cause errors?

I prefer doing rollbacks to commits when in doubt.

Regards, Link.
Bruce Momjian wrote:
> > If implemented, surely the best place to put it would be in libpq? You
> > could always add a function to libpq to create a 'pooled' connection,
> > rather than a normal connection. Basically then the PHP guys would just
> > use that instead of their own pg_connect function. I guess it would mean
> > that lots of people who use the pgsql client wouldn't have to rewrite
> > their own connection sharing code.
> >
> > However, where would you put all the options for the pool? Like max
> > processes, min processes, etc.
> >
> > I have learnt that half the problem with connection pooling is
> > transactions that fail to be rolled back...
>
> The trick for that is to call COMMIT before you pass the backend to a
> new person. Now, if you want to abort a left-over transaction, you can
> do an ABORT, but that is going to show up in the server logs because an
> ABORT without a transaction causes an error message.
>
> We also have RESET ALL for connection pooling use.

The problem with connection pooling, and it can be a very difficult problem, is the state of the connection. What we saw with the Oracle connection pooling was that if a SQL query took too long, and/or the PHP front end timed out or crashed (the XML library does this sometimes), the Oracle connection was left in a strange state. Sometimes the connection stayed active with respect to the pooling software, but brain dead. The apache process which was lucky enough to get that pooled connection either errored or hung. There were a large number of virtually untraceable problems related to the previous query and the previous client's behavior.

I know I am being alarmist, but my experience with connection pooling left a bad taste in my mouth. I can see a persistent connection used per process, but pooling "n" processes over "x < n" connections is problematic. The pooling software has to be able to detect and act upon the real "unexpected" status of the back-end, not just what it thinks it is.

Most high performance software already has a notion of persistent connection, which has been debugged and tuned. If there is no real benefit to reducing the number of back-end processes, I think connection pooling is something that will be more problematic than productive.
> At 11:49 PM 12/17/01 -0500, Bruce Momjian wrote:
> >new person. Now, if you want to abort a left-over transaction, you can
> >do an ABORT but that is going to show up in the server logs because an
> >ABORT without a transaction causes an error message.
>
> I do a lot of rollbacks typically. Would that cause errors?
>
> I prefer doing rollbacks to commits when in doubt.

No problem, it is just that rollbacks when you are not in a transaction cause a log error message.

--
Bruce Momjian                      |  http://candle.pha.pa.us
pgman@candle.pha.pa.us             |  (610) 853-3000
+  If your life is a hard drive,   |  830 Blythe Avenue
+  Christ can be your backup.      |  Drexel Hill, Pennsylvania 19026
Bruce Momjian <pgman@candle.pha.pa.us> writes:
> No problem, it is just that rollbacks when you are not in a transaction
> cause a log error message.

I don't see any difference in the behavior: you get a notice either way.

regression=# commit;
NOTICE:  COMMIT: no transaction in progress
COMMIT
regression=# rollback;
NOTICE:  ROLLBACK: no transaction in progress
ROLLBACK
regression=#

My recommendation would generally be to do a ROLLBACK not a COMMIT, on the grounds that if the previous user failed to complete his transaction you probably want to abort it, not assume that it's safe to commit.

However, this safety-first approach might be unworkable if you have a large body of existing code that all assumes it needn't issue COMMIT explicitly.

			regards, tom lane
> Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > No problem, it is just that rollbacks when you are not in a transaction
> > cause a log error message.
>
> I don't see any difference in the behavior: you get a notice either way.
>
> regression=# commit;
> NOTICE:  COMMIT: no transaction in progress
> COMMIT
> regression=# rollback;
> NOTICE:  ROLLBACK: no transaction in progress
> ROLLBACK
> regression=#
>
> My recommendation would generally be to do a ROLLBACK not a COMMIT, on
> the grounds that if the previous user failed to complete his transaction
> you probably want to abort it, not assume that it's safe to commit.
>
> However, this safety-first approach might be unworkable if you have a
> large body of existing code that all assumes it needn't issue COMMIT
> explicitly.

Sorry, I should have said do a "BEGIN;COMMIT;". That only generates an error message if a transaction was left open, and it commits the left-open transaction.

We can add a SILENT keyword to COMMIT/ROLLBACK if people really want it.

--
Bruce Momjian                      |  http://candle.pha.pa.us
pgman@candle.pha.pa.us             |  (610) 853-3000
+  If your life is a hard drive,   |  830 Blythe Avenue
+  Christ can be your backup.      |  Drexel Hill, Pennsylvania 19026
Bruce Momjian wrote:
> It would just be nice to have it done internally rather than have all
> the clients do it, iff it can be done cleanly.

Serious client applications that need it already do it. Firing up a connection to Oracle or most other DBs isn't that lightweight a deal either, so client-side pooling is useful for more than just PG. Personally I'd just view a built-in mechanism as getting in the way, but then I use a webserver that's provided connection pooling for client threads for the last seven years ...

I agree with Tom that the client seems to be the best place to do this. Among other things it isn't that difficult. If you know how to fire up one connection, you know how to fire up N of them, and adding logic to pool them afterwards is easy enough.

--
Don Baccus
Portland, OR
http://donb.photo.net, http://birdnotes.net, http://openacs.org
Bruce Momjian wrote:
> Yes, that is assuming you are using PHP. If you are using something
> else, you need connection pooling in there too. All those client
> interfaces reimplementing connection pooling seems like a waste to me.

Effective pooling's pretty specific to your environment, though, so any general mechanism would have to provide a wide-ranging suite of parameters governing the number to pool, how long each handle should live, what to do if a handle's released by a client while in the midst of a transaction (AOLserver rolls back the transaction; other clients might want to do something else, e.g. fire a callback or the like), etc., etc.

I think it would be fairly complex, and for those high-throughput applications already written with client-side pooling, no improvement.

And those are the only applications that need it.

--
Don Baccus
Portland, OR
http://donb.photo.net, http://birdnotes.net, http://openacs.org
Mark Pritchard wrote:
>>I think it is the startup cost that most people want to avoid, and ours
>>is higher than most DBs that use threads; at least I think so.
>>
>>It would just be nice to have it done internally rather than have all
>>the clients do it, iff it can be done cleanly.
>
> I'd add that client-side connection pooling isn't effective in some cases
> anyway - one application we work with has 4 physical application servers
> running around 6 applications. Each of the applications was written by a
> different vendor, and thus a pool size of five gives you 120 open
> connections.

Tuning a central pooling mechanism to run well in this kind of situation isn't going to be a trivial task, either. The next thing you'll want is some way to prioritize the various clients so your more serious applications have a better chance of getting a pooled connection. Or you'll want to set up subpools so they don't compete with each other, in effect replicating what's done now, but adding more complexity to the central service.

--
Don Baccus
Portland, OR
http://donb.photo.net, http://birdnotes.net, http://openacs.org
Bruce Momjian wrote:
> The trick for that is to call COMMIT before you pass the backend to a
> new person.

The failure to COMMIT is a programmer error - ROLLBACK's much safer. At least that's what we decided in the AOLserver community, and that's what the drivers for Oracle and PG (the two I maintain) implement.

> Now, if you want to abort a left-over transaction, you can
> do an ABORT but that is going to show up in the server logs because an
> ABORT without a transaction causes an error message.

The connection pooling mechanism needs to track the transaction state and only ROLLBACK a handle that's not in autocommit state or that's in the midst of a BEGIN/END transaction (again, Oracle vs. PG).

--
Don Baccus
Portland, OR
http://donb.photo.net, http://birdnotes.net, http://openacs.org
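An aside, sketching the state tracking Don describes against libpq: more recent libpq exposes the backend's transaction state directly via PQtransactionStatus(), which did not exist when these drivers were written, so treat this as illustrative of the idea rather than of the 2001 code.

    #include <libpq-fe.h>

    /* Roll back only if the previous user actually left a transaction
     * open (or left one in a failed state). */
    void rollback_if_needed(PGconn *conn)
    {
        PGTransactionStatusType s = PQtransactionStatus(conn);

        if (s == PQTRANS_INTRANS || s == PQTRANS_INERROR)
            PQclear(PQexec(conn, "ROLLBACK"));
    }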
Will schema support resolve this discussion? If I understand correctly, the initial argument for connection pooling was the restriction on the number of persistent connections. It's true in current postgresql that if one wants to keep connections to several databases open for performance reasons, the total number of connections will be doubled, tripled, and so on. But if I understand correctly, schema support will eventually do away with this problem, because we could keep only one pool of connections to the *one* database.

Oleg

On Tue, 18 Dec 2001, Don Baccus wrote:
> Bruce Momjian wrote:
> > Yes, that is assuming you are using PHP. If you are using something
> > else, you need connection pooling in there too. All those client
> > interfaces reimplementing connection pooling seems like a waste to me.
>
> Effective pooling's pretty specific to your environment, though, so any
> general mechanism would have to provide a wide-ranging suite of
> parameters governing the number to pool, how long each handle should
> live, what to do if a handle's released by a client while in the midst
> of a transaction (AOLserver rolls back the transaction; other clients
> might want to do something else, e.g. fire a callback or the like),
> etc., etc.
>
> I think it would be fairly complex, and for those high-throughput
> applications already written with client-side pooling, no improvement.
>
> And those are the only applications that need it.

Regards, Oleg
_____________________________________________________________
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83
> Bruce Momjian wrote:
> >
> > The trick for that is to call COMMIT before you pass the backend to a
> > new person.
>
> The failure to COMMIT is a programmer error - ROLLBACK's much safer. At
> least that's what we decided in the AOLserver community, and that's
> what the drivers for Oracle and PG (the two I maintain) implement.

Then you can issue a "BEGIN;ROLLBACK;" when you pass the session to the next user, and "RESET ALL;" of course.

> > Now, if you want to abort a left-over transaction, you can
> > do an ABORT but that is going to show up in the server logs because an
> > ABORT without a transaction causes an error message.
>
> The connection pooling mechanism needs to track the transaction state
> and only ROLLBACK a handle that's not in autocommit state or that's in
> the midst of a BEGIN/END transaction (again, Oracle vs. PG).

Seems like a lot of work to keep track of transaction state in the client; seems easier to just unconditionally issue the begin;rollback.

--
Bruce Momjian                      |  http://candle.pha.pa.us
pgman@candle.pha.pa.us             |  (610) 853-3000
+  If your life is a hard drive,   |  830 Blythe Avenue
+  Christ can be your backup.      |  Drexel Hill, Pennsylvania 19026
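In libpq terms, the unconditional version would look something like this sketch. pool_release() and the free-list handling are hypothetical; PQexec()/PQclear() are ordinary libpq calls.

    #include <libpq-fe.h>

    /* Release path of a hypothetical pool: discard any left-over
     * transaction and clear session state before the connection is
     * handed to the next user. */
    void pool_release(PGconn *conn)
    {
        /* If the previous user left a transaction open, BEGIN is a no-op
         * (logging a notice) and ROLLBACK aborts it; otherwise the pair
         * is just an empty transaction. */
        PQclear(PQexec(conn, "BEGIN"));
        PQclear(PQexec(conn, "ROLLBACK"));

        /* Clear any SET options the previous user left behind. */
        PQclear(PQexec(conn, "RESET ALL"));

        /* ... return conn to the free list ... */
    }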
Bruce Momjian wrote:
> Seems like a lot of work to keep track of transaction state in the
> client; seems easier to just unconditionally issue the begin;rollback.

Well, in the Oracle and PG drivers for AOLserver it wasn't, but then again applications code in that environment doesn't call libpq directly but through an abstraction layer that works with all DBs (the layer does; the query strings obviously sometimes don't!). The db primitives then call an RDBMS-specific driver, which can call thread-safe RDBMS client libraries directly or call an external driver (possibly the external ODBC driver) for RDBMSs without a thread-safe client library. So we can track things easily.

Along with other things, for instance retrying queries in one backend after another backend has bombed out and given the nice little message saying "another backend has closed, please retry your query". Luckily it was pretty easy to kill PG 6.5 so I could test and debug this feature...

I suspect that major applications that support multiple RDBMSs take a somewhat similar approach. In the context of providing an abstract database API for one's client code, adding persistent connection pooling seems pretty minor.

--
Don Baccus
Portland, OR
http://donb.photo.net, http://birdnotes.net, http://openacs.org
> I suspect that major applications that support multiple RDBMSs take a
> somewhat similar approach. In the context of providing an abstract
> database API for one's client code, adding persistent connection pooling
> seems pretty minor.

Yes, with that abstraction layer, it is quite easy.

--
Bruce Momjian                      |  http://candle.pha.pa.us
pgman@candle.pha.pa.us             |  (610) 853-3000
+  If your life is a hard drive,   |  830 Blythe Avenue
+  Christ can be your backup.      |  Drexel Hill, Pennsylvania 19026
On Tue, 2001-12-18 at 13:46, Michael Owens wrote:
> By having the postmaster map multiple clients to a fixed number of
> backends, you achieve the happy medium: you never exceed the ideal
> number of active backends, and at the same time you are not limited to
> accepting only a fixed number of connections. Accepting connections can
> now be based on load (however you wish to define it), not number. You
> now make decisions based on utilization.
>
> If it were shown that even half of a backend's life consisted of idle
> time, leasing out that idle time to another active connection would
> potentially double the average number of simultaneous requests without
> (theoretically) incurring any significant degradation in performance.

Have you looked at the client-side connection pooling solutions out there?

DBBalancer ( http://dbbalancer.sourceforge.net/ ) tries to sit very transparently between your application and PostgreSQL, letting you implement connection pooling with almost no application changes.

There was another one I came across too, but that one requires you to make more wide-reaching changes to the application.

In my applications I have found DBBalancer to give roughly the same level of performance as PHP persistent connections, but a lot fewer connections are needed in the pool, because they are only needed when Apache is delivering dynamic content - not the associated static stylesheets and images.

Regards,
Andrew.
--
--------------------------------------------------------------------
Andrew @ Catalyst .Net.NZ Ltd, PO Box 11-053, Manners St, Wellington
WEB: http://catalyst.net.nz/ PHYS: Level 2, 150-154 Willis St
DDI: +64(4)916-7201 MOB: +64(21)635-694 OFFICE: +64(4)499-2267
Are you enrolled at http://schoolreunions.co.nz/ yet?
As long as each client's call is composed of a standalone transaction, there is no problem with external connection pools. But what about when a client's transaction spans two or more calls, such as SELECT FOR UPDATE? Then pooling is not safe: it offers no assurance of what may be interjected into an open transaction between calls. For example, suppose each of the following is a separate call on a shared connection:

Client A: BEGIN WORK; SELECT last_name FROM customer WHERE <X> FOR UPDATE;

Client B: BEGIN WORK; SELECT street FROM customer WHERE <Y> FOR UPDATE;

Client A: UPDATE customer SET last_name = <modified value> WHERE <X>; COMMIT WORK;

Now, isn't Client B's write lock gone with Client A's commit? Yet Client A's lock is still hanging around. While Client B's commit will close it, Client B has lost the assurance of its lock, defeating the purpose of SELECT FOR UPDATE.

If this is correct, then external connection pools limit what you can do with the database to a single call. Any transaction spanning more than one call is unsafe, because it is not isolated from other clients sharing the same connection.

On Tuesday 18 December 2001 11:30 pm, Andrew McMillan wrote:
> On Tue, 2001-12-18 at 13:46, Michael Owens wrote:
> > By having the postmaster map multiple clients to a fixed number of
> > backends, you achieve the happy medium: you never exceed the ideal
> > number of active backends, and at the same time you are not limited to
> > accepting only a fixed number of connections. Accepting connections can
> > now be based on load (however you wish to define it), not number. You
> > now make decisions based on utilization.
> >
> > If it were shown that even half of a backend's life consisted of idle
> > time, leasing out that idle time to another active connection would
> > potentially double the average number of simultaneous requests without
> > (theoretically) incurring any significant degradation in performance.
>
> Have you looked at the client-side connection pooling solutions out
> there?
>
> DBBalancer ( http://dbbalancer.sourceforge.net/ ) tries to sit very
> transparently between your application and PostgreSQL, letting you
> implement connection pooling with almost no application changes.
>
> There was another one I came across too, but that one requires you to
> make more wide-reaching changes to the application.
>
> In my applications I have found DBBalancer to give roughly the same
> level of performance as PHP persistent connections, but a lot fewer
> connections are needed in the pool, because they are only needed when
> Apache is delivering dynamic content - not the associated static
> stylesheets and images.
>
> Regards,
> Andrew.
Michael Owens wrote:
> As long as each client's call is composed of a standalone transaction,
> there is no problem with external connection pools. But what about when
> a client's transaction spans two or more calls, such as SELECT FOR
> UPDATE? Then pooling is not safe: it offers no assurance of what may be
> interjected into an open transaction between calls. For example, suppose
> each of the following is a separate call on a shared connection:
>
> Client A: BEGIN WORK; SELECT last_name FROM customer WHERE <X> FOR UPDATE;
>
> Client B: BEGIN WORK; SELECT street FROM customer WHERE <Y> FOR UPDATE;
>
> Client A: UPDATE customer SET last_name = <modified value> WHERE <X>;
> COMMIT WORK;
>
> Now, isn't Client B's write lock gone with Client A's commit? Yet Client
> A's lock is still hanging around. While Client B's commit will close it,
> Client B has lost the assurance of its lock, defeating the purpose of
> SELECT FOR UPDATE.
>
> If this is correct, then external connection pools limit what you can do
> with the database to a single call. Any transaction spanning more than
> one call is unsafe, because it is not isolated from other clients
> sharing the same connection.

The general idea is that you grab a handle and hold onto it until you're done. This makes the above scenario impossible.

Forgetting to commit or rollback before relinquishing the handle is another scenario that can lead to problems, but that's already been discussed in detail.

--
Don Baccus
Portland, OR
http://donb.photo.net, http://birdnotes.net, http://openacs.org
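In code, the grab-and-hold pattern looks something like this sketch; pool_acquire() and pool_release() are hypothetical names for a client-side pool's checkout and checkin calls.

    #include <libpq-fe.h>

    extern PGconn *pool_acquire(void);     /* hypothetical pool API */
    extern void    pool_release(PGconn *);

    void rename_customer(void)
    {
        PGconn *conn = pool_acquire();     /* exclusively ours from here on */

        PQclear(PQexec(conn, "BEGIN"));
        /* No other client can issue statements on this connection, so the
         * row lock below cannot be released by someone else's COMMIT. */
        PQclear(PQexec(conn, "SELECT last_name FROM customer "
                             "WHERE id = 42 FOR UPDATE"));
        PQclear(PQexec(conn, "UPDATE customer SET last_name = 'Smith' "
                             "WHERE id = 42"));
        PQclear(PQexec(conn, "COMMIT"));

        pool_release(conn);                /* only now reusable by others */
    }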
On Wednesday 19 December 2001 01:04 pm, Don Baccus wrote:
> The general idea is that you grab a handle and hold onto it until you're
> done. This makes the above scenario impossible.
>
> Forgetting to commit or rollback before relinquishing the handle is
> another scenario that can lead to problems, but that's already been
> discussed in detail.

But then the shared connection is unshared, sitting idle while the client works between calls, thus introducing idle time among a fixed number of connections. The server is doing less than it could. I agree that this connection pool has improved things by eliminating backend startup time. But idle time still exists for clients performing multiple calls, growing with the number of multiple-call clients and the number of calls each makes, plus the idle time between them.

However, this probably only ever happens on update. Inserts and selects can be done in one call. And, I suppose, updates comprise only a small fraction of the requests sent to the database. Even then, you can probably eliminate some multiple calls by using things such as procedures. Factoring all that in, you can probably do as well by optimizing your particular database/application as by writing new code. I relent. Thanks for your thoughts.
On Thu, 2001-12-20 at 07:22, Michael Owens wrote:
> As long as each client's call is composed of a standalone transaction,
> there is no problem with external connection pools. But what about when
> a client's transaction spans two or more calls, such as SELECT FOR
> UPDATE? Then pooling is not safe: it offers no assurance of what may be
> interjected into an open transaction between calls. For example, suppose
> each of the following is a separate call on a shared connection:
>
> Client A: BEGIN WORK; SELECT last_name FROM customer WHERE <X> FOR UPDATE;
>
> Client B: BEGIN WORK; SELECT street FROM customer WHERE <Y> FOR UPDATE;
>
> Client A: UPDATE customer SET last_name = <modified value> WHERE <X>;
> COMMIT WORK;
>
> Now, isn't Client B's write lock gone with Client A's commit? Yet Client
> A's lock is still hanging around. While Client B's commit will close it,
> Client B has lost the assurance of its lock, defeating the purpose of
> SELECT FOR UPDATE.
>
> If this is correct, then external connection pools limit what you can do
> with the database to a single call. Any transaction spanning more than
> one call is unsafe, because it is not isolated from other clients
> sharing the same connection.

Oh, I see. You are absolutely correct that client-side pooling wouldn't work in that situation, of course. As an application developer nobody has forced me into such a corner yet, however. Long-running transactions are something I avoid like the plague.

Cheers,
Andrew.
--
--------------------------------------------------------------------
Andrew @ Catalyst .Net.NZ Ltd, PO Box 11-053, Manners St, Wellington
WEB: http://catalyst.net.nz/ PHYS: Level 2, 150-154 Willis St
DDI: +64(4)916-7201 MOB: +64(21)635-694 OFFICE: +64(4)499-2267
Are you enrolled at http://schoolreunions.co.nz/ yet?