Discussion: Connection pooling - Number of connections
Hi all,

Brett Wooldridge, the creator of HikariCP [1] - a high performance Java connection pool - is contemplating the idea of changing the way pooling is done in HikariCP and having a fixed-size pool of connections always open.

No maxPoolSize, no minIdle, no minPoolSize, just a poolSize parameter which sets the size of the pool. At application startup, all the connections are opened and maintained by the pool throughout the life of the application.

The basic idea is that if you decide your application might need 100 connections at a time, you set poolSize to 100 and HikariCP keeps 100 connections open.

I recall very old posts on this list where people were talking about code paths sensitive to the number of connections open (or even max_connections), saying that it wasn't such a good idea to keep connections open if they were not really needed.

As a lot of scalability work has been done since that (very old) time, I was wondering if this is still the rule of thumb, or if Brett's idea of completely simplifying connection management is the way to go.

It seems that at least one other pool implementation is going this way, so I thought it might be a good idea to get the opinion of the database side of things. That way, it will be easier to make a well informed decision.

Thanks in advance for your comments/advice.

-- Guillaume

[1] https://github.com/brettwooldridge/HikariCP
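The fixed-pool design described above can be sketched in a few lines of Java. This is only an illustration of the idea (hypothetical class name, plain objects standing in for JDBC connections); a real pool like HikariCP does far more, e.g. connection validation and leak detection.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

// Minimal sketch of a fixed-size pool: every "connection" is created at
// startup and merely recycled afterwards; there is no grow/shrink logic.
public class FixedPool<T> {
    private final BlockingQueue<T> idle;

    public FixedPool(int poolSize, Supplier<T> connectionFactory) {
        idle = new ArrayBlockingQueue<>(poolSize);
        for (int i = 0; i < poolSize; i++) {
            idle.add(connectionFactory.get()); // open everything up front
        }
    }

    // Borrow a connection, waiting up to timeoutMs if none is idle.
    public T acquire(long timeoutMs) {
        try {
            T conn = idle.poll(timeoutMs, TimeUnit.MILLISECONDS);
            if (conn == null) {
                throw new IllegalStateException("connectionTimeout reached");
            }
            return conn;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
    }

    // Hand the connection back; the pool never closes it.
    public void release(T conn) {
        idle.add(conn);
    }

    public int idleCount() {
        return idle.size();
    }
}
```

Note that acquire/release reduce to a single queue operation each; the simplification Brett describes is precisely that no resize decision ever has to be made on this hot path.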
Guillaume Smet wrote
> Brett Wooldridge, the creator of HikariCP [1] - a high performance
> Java connection pool - is contemplating the idea to change the way
> pooling is done in HikariCP and have a fixed-size pool of connections
> always open.
>
> No maxPoolSize, no minIdle, no minPoolSize, juste a poolSize parameter
> which sets the size of the pool. At application startup, all the
> connections are opened and maintained by the pool throughout the life
> of the application.
>
> The basic idea is that if you decide that your application might need
> 100 connections at time, you set poolSize to 100 and HikariCP
> maintains 100 connections open.
>
> I recall very old posts on this list where people were talking about
> code paths sensitive to the number of connections open (or even
> max_connections) and that it wasn't such a good idea to keep
> connections open if they were not really needed.
>
> As a lot of scalability work has been done since this (very old) time,
> I was wondering if it was still the rule of thumb or if the idea of
> Brett to completely simplify the connection management is the way to
> go.
>
> It seems that at least another pool implementation is going this way
> so I thought it might be a good idea to have the opinion of the
> database side of things. This way, it will be easier to take a well
> informed decision.

The developer, not the pool implementer, is ultimately going to decide which trade-offs to incur. Having a connection open, even if idle, consumes resources and costs performance, however minimal. Pool management costs cycles as well, so if one does not need pool management then getting rid of it is probably worthwhile.

The question is whether you want to meet only the needs of this specific user or whether you want to provide flexibility. If existing pool-management implementations are reasonably well implemented and efficient, then focusing effort on a potentially under-served use case definitely has merit.
Consider this train of thought: no matter how large the pool, if you are constantly keeping, say, 90% of the connections actively working, then having, on average, 10% of the connections sitting idle is probably not going to be noticeable on the server, and the reduction in the overhead of managing a pool is typically a net positive. Now, I have no clue what percentage is actually accurate, or under what conditions and pool sizes it may vary, but that is a calculation that someone deciding between managed and un-managed pools would need to make.

David J.

--
View this message in context: http://postgresql.1045698.n5.nabble.com/Connection-pooling-Number-of-connections-tp5797025p5797030.html
Sent from the PostgreSQL - performance mailing list archive at Nabble.com.
On Fri, Mar 21, 2014 at 4:49 PM, David Johnston <polobo@yahoo.com> wrote:
> Consider this train-of-thought: no matter how large the pool size if you
> are constantly keeping, say, 90% of the connections actively working then
> having, on average, 10% of the connections sitting idle is probably not
> going to be noticeable on the server and the reduction in overhead of
> managing a pool is typically a net positive. Now, I had no clue what
> percentage is actually true, or under what conditions and pool sizes it may
> vary, but that is a calculation that someone deciding on between managed and
> un-managed pools would need to make.

Sure.

The big question is whether it is suited for general purpose use, or whether having 100 connections open when only 10 are necessary at the time causes any unnecessary contention/spinlock issues/performance overhead/whatever...

-- Guillaume
Guillaume Smet <guillaume.smet@gmail.com> writes:
> On Fri, Mar 21, 2014 at 4:49 PM, David Johnston <polobo@yahoo.com> wrote:
>> Consider this train-of-thought: no matter how large the pool size if you
>> are constantly keeping, say, 90% of the connections actively working then
>> having, on average, 10% of the connections sitting idle is probably not
>> going to be noticeable on the server and the reduction in overhead of
>> managing a pool is typically a net positive. Now, I had no clue what
>> percentage is actually true, or under what conditions and pool sizes it may
>> vary, but that is a calculation that someone deciding on between managed and
>> un-managed pools would need to make.

> Sure.
> The big question is if it is suited for general purpose or if having
> 100 connections open when 10 only are necessary at the time is causing
> any unnecessary contention/spinlock issues/performance
> overhead/whatever...

It will cost you, in ProcArray scans for example. But lots-of-idle-connections is exactly what a pooler is supposed to prevent. If you have a server that can handle say 10 active queries, you should have a pool size of 10, not 100. (If you have a server that can actually handle 100 active queries, I'd like to have your IT budget.)

The proposed design sounds fairly reasonable to me, as long as users are clear on how to set the pool size --- and in particular that bigger is not better. Clueless users could definitely shoot themselves in the foot, though.

			regards, tom lane
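Tom's rule (size the pool to what the server can actually run in parallel, not to peak client demand) is often reduced to a hardware heuristic such as cores plus spindles. The helper below is my own back-of-the-envelope sketch of that idea, not a formula anyone in the thread prescribed exactly:

```java
// Rough sizing heuristic: a pool should be bounded by the hardware's
// real concurrency (CPU cores plus disk spindles), not by the number
// of application threads that might want a connection.
public class PoolSizing {
    public static int recommendedPoolSize(int cpuCores, int diskSpindles) {
        return cpuCores + diskSpindles;
    }

    public static void main(String[] args) {
        // An 8-core box with 2 spindles suggests a pool of about 10,
        // matching the "server that can handle say 10 active queries"
        // example above, regardless of how many clients exist.
        System.out.println(recommendedPoolSize(8, 2)); // prints 10
    }
}
```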
Growing from minPoolSize toward maxPoolSize means creating connections at the worst possible moment: exactly when the client application desperately needs to complete an important query/transaction, and cannot hold on to the data it has collected until a connection becomes available.

So connection creation is the costliest of all the pool-management tasks, and keeping connections ready in advance is the best option.

A single poolSize parameter works well when the application owner knows the optimal number to set, after analysing application performance against the history of previous settings and the improvements made along the way. Server sizing always shows up in this sort of analysis.
On Fri, Mar 21, 2014 at 9:47 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Guillaume Smet <guillaume.smet@gmail.com> writes:
>> On Fri, Mar 21, 2014 at 4:49 PM, David Johnston <polobo@yahoo.com> wrote:
>>> Consider this train-of-thought: no matter how large the pool size if you
>>> are constantly keeping, say, 90% of the connections actively working then
>>> having, on average, 10% of the connections sitting idle is probably not
>>> going to be noticeable on the server and the reduction in overhead of
>>> managing a pool is typically a net positive. Now, I had no clue what
>>> percentage is actually true, or under what conditions and pool sizes it may
>>> vary, but that is a calculation that someone deciding on between managed and
>>> un-managed pools would need to make.
>> Sure.
>> The big question is if it is suited for general purpose or if having
>> 100 connections open when 10 only are necessary at the time is causing
>> any unnecessary contention/spinlock issues/performance
>> overhead/whatever...
> It will cost you, in ProcArray scans for example. But lots-of-idle-
> connections is exactly what a pooler is supposed to prevent. If you have
> a server that can handle say 10 active queries, you should have a pool
> size of 10, not 100. (If you have a server that can actually handle
> 100 active queries, I'd like to have your IT budget.)
> The proposed design sounds fairly reasonable to me, as long as users are
> clear on how to set the pool size --- and in particular that bigger is
> not better. Clueless users could definitely shoot themselves in the
> foot, though.
> regards, tom lane
> --
> Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-performance
Hi Tom,

On Fri, Mar 21, 2014 at 5:17 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> It will cost you, in ProcArray scans for example. But lots-of-idle-
> connections is exactly what a pooler is supposed to prevent. If you have
> a server that can handle say 10 active queries, you should have a pool
> size of 10, not 100. (If you have a server that can actually handle
> 100 active queries, I'd like to have your IT budget.)
>
> The proposed design sounds fairly reasonable to me, as long as users are
> clear on how to set the pool size --- and in particular that bigger is
> not better. Clueless users could definitely shoot themselves in the
> foot, though.

Yeah, well. My understanding of what happens in the field is that people usually set the pool size limit quite high because they don't want to experience connection starvation if there is a temporary slowdown of their application/database.

Is the overhead of having 100 connections open actually noticeable, or is it something that's better avoided but not so bad if you do have them?

Thanks.

-- Guillaume
Hi Sethu,

On Fri, Mar 21, 2014 at 6:51 PM, Sethu Prasad <sethuprasad.in@gmail.com> wrote:
> So here the connection creation action is the costliest among all the other
> management tasks. so keeping the connections ready is the best option.

That's why you often have a minIdle parameter, which allows the pool to create idle connections in advance.

> poolSize parameter is very good in the sense when the application owner know
> what is the optimal number to put, after having application performance
> analysed with the history of previous settings and the improvements made on
> it. server sizing always shows up in this sort of analysis.

That assumes you actually do this work. In my experience, most "not so demanding" apps are put into production without this sort of detailed analysis. You do it for your critical high-throughput applications, not for the others.

That said, interesting discussion. Not exactly what I expected.

-- Guillaume
Sethu Prasad wrote
> Reaching the maxPoolSize from the minPoolSize means creating the
> connections at the crucial moment where the client application is in the
> desperate need of completing an important query/transaction which the
> primary responsibility since it cannot hold the data collected.

One query is slowed down a little in the unusual situation where not enough pooled connections are available. To fix that, you want to slow down the entire server all of the time? Really? And even if this is sometimes the best option, your assertion is unqualified, so do you really think this is best for everyone, always?

I think it is good to give developers options, but if your situation is 10/100 then a fixed 100-connection pool is probably not the best configuration. The question I'd ask is: if you are developing a new driver, what problem and use case are you trying to accommodate?

For the general case a resizing pool is probably the best bet. It will usually stabilize at typical volume, so that an optimum number of connections is maintained while still allowing for some expansion in times of excess demand. A fixed-size pool is something an experienced user would decide they need based on their usage patterns and a need to eke out every last bit of performance in extreme situations, while trading only a little bit of performance when the connections are not maxed out.

David J.
On Fri, Mar 21, 2014 at 3:36 PM, Guillaume Smet <guillaume.smet@gmail.com> wrote:
> On Fri, Mar 21, 2014 at 5:17 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> It will cost you, in ProcArray scans for example. But lots-of-idle-
>> connections is exactly what a pooler is supposed to prevent. If you have
>> a server that can handle say 10 active queries, you should have a pool
>> size of 10, not 100. (If you have a server that can actually handle
>> 100 active queries, I'd like to have your IT budget.)
>>
>> The proposed design sounds fairly reasonable to me, as long as users are
>> clear on how to set the pool size --- and in particular that bigger is
>> not better. Clueless users could definitely shoot themselves in the
>> foot, though.
>
> Yeah, well.
>
> My understanding of what happened on the field is that people usually
> set the pool size limit quite high because they don't want to
> experience connection starvation even if there is a temporary slowdown
> of their application/database.

My experience is that small transaction-mode connection pools used to serve very quick queries can sometimes fail to fully use the hardware if the connections aren't set to autocommit mode on the client side, because the network round trips hold onto the server slot for a sizable portion of the lifecycle. So, my recommendation: use protocol-level autocommit for read-only queries and cores+spindles workers; that will use your hardware fully.

On Fri, Mar 21, 2014 at 3:41 PM, David Johnston <polobo@yahoo.com> wrote:
>> Reaching the maxPoolSize from the minPoolSize means creating the
>> connections at the crucial moment where the client application is in the
>> desperate need of completing an important query/transaction which the
>> primary responsibility since it cannot hold the data collected.
>
> One query is slowed down a little in the unusual situation where not enough
> pooled connections are available. To fix that you want to slow down the
> entire server all of the time? Really? And even if this is sometimes the
> best option your assertion is unqualified so do you really think this is
> best for everyone, always?

Setting deadlocks aside for a moment, I don't think a variable connection pool makes any sense. The only reason for connection starvation with a properly sized pool is hardware overload, and when you have hardware overload you really don't want to throw more load at it; you want to let it cool down. The solution, if you have many heavy, OLAP-style queries that block the rest, is to have two pools and size both so that you don't overload the server (or at least overload it in a controlled way). Send the OLAP queries to one pool and the OLTP queries to the other, and you guarantee a smooth flow, even if you cannot guarantee both 100% hardware utilization and maximum throughput.

Now, if we consider deadlocks, you might have connection starvation because all of the connections are waiting on an operation that is deadlocked in application code (the deadlock cannot occur at the DB level if you use transaction mode, but you can have open transactions waiting on a mutex that is itself waiting for a connection). I've experienced application-level deadlocks like these, and the solution for me has always been a reasonable timeout plus a reserve pool of connections: when connections have been waited on for more than X seconds (whatever one considers abnormal given the known load characteristics), grow the pool to free up resources and try to dislodge the deadlocked application threads. Regular overload sometimes triggers the reserve pool too, so you cannot be too generous about the number of connections you keep in reserve.
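The reserve-pool escape hatch described above might look roughly like this. This is a sketch of the mechanism only (class name and thresholds are mine, not any real pool's API), using semaphore permits to stand in for connections:

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

// Sketch of a "reserve pool": normally only baseSize permits exist.
// If an acquire waits longer than the abnormal-wait threshold, up to
// reserveSize extra permits are unlocked, one per abnormal wait, to
// try to dislodge an application-level deadlock. The reserve is
// finite, so overload cannot grow the pool without bound.
public class ReservePool {
    private final Semaphore permits;
    private final Semaphore reserve;
    private final long abnormalWaitMs;

    public ReservePool(int baseSize, int reserveSize, long abnormalWaitMs) {
        this.permits = new Semaphore(baseSize, true);
        this.reserve = new Semaphore(reserveSize);
        this.abnormalWaitMs = abnormalWaitMs;
    }

    public void acquire() {
        try {
            if (permits.tryAcquire(abnormalWaitMs, TimeUnit.MILLISECONDS)) {
                return; // normal case: got a connection within the threshold
            }
            // Abnormal wait: move one permit from the reserve into the
            // main pool (if any remain), then keep waiting.
            if (reserve.tryAcquire()) {
                permits.release();
            }
            permits.acquire();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
    }

    public void release() {
        permits.release();
    }

    public int availablePermits() {
        return permits.availablePermits();
    }
}
```

Note that once reserve permits have been released into the main pool they stay there; a production implementation would also want to shrink back and to alert, since dipping into the reserve is a symptom worth investigating.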
Hi, Brett Wooldridge here, one of the principals of HikariCP. I thought I'd wade into the conversation pool a little myself if you guys don't mind.

Speaking to David's point...

>> Reaching the maxPoolSize from the minPoolSize means creating the
>> connections at the crucial moment where the client application is in the
>> desperate need of completing an important query/transaction which the
>> primary responsibility since it cannot hold the data collected.

This was one of the reasons I was proposing the fixed pool design. In my experience, even in pools that maintain a minimum number of idle connections, responding to spike demands is problematic. If you have a pool with, say, 30 max connections and a 10 minimum-idle goal, a sudden demand for 20 connections means the pool can satisfy 10 instantly but is then left to [try to] establish 10 more before the application's connectionTimeout (read: acquisition timeout from the pool) is reached. That in turn generates a spike of demand on the database, slowing down not only the connection establishments themselves but also the completion of transactions that might otherwise return connections to the pool.

As I think Tom noted in a slide deck I read somewhere, there is a "knee" in the performance curve beyond which additional connections cause a drop in TPS. While users think it is a good idea to have 10 idle connections but a maxPoolSize of 100, the reality is they can retire/reuse connections faster with a much smaller maxPoolSize. And I didn't see a pool of a few dozen connections actually impacting performance much when half of them are idle and half are executing transactions (i.e. the idle ones don't impact the overall performance much).

Finally, one of my contentions was: either your database server has resources or it doesn't. Either it has enough memory and processing power for N connections or it doesn't. If the pool is set below, near, or at that capacity, what is the purpose of releasing connections? Yes, it frees up memory, but that memory is not really available for other use, given that at any instant the maximum capacity of the pool may be demanded. Releasing resources only to try to reallocate them during a demand peak seems counter-productive.

-Brett
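Brett's spike example reduces to simple arithmetic (the numbers come from his message; the class and method names are mine, for illustration only):

```java
public class SpikeMath {
    // How many connections must be opened on the spot when a demand
    // spike exceeds the idle headroom a min-idle pool keeps around.
    public static int connectionsToEstablish(int minIdle, int spikeDemand) {
        return Math.max(0, spikeDemand - minIdle);
    }

    public static void main(String[] args) {
        // 10 idle connections, sudden demand for 20: 10 are served
        // instantly, 10 must be established while the DB is under load.
        System.out.println(connectionsToEstablish(10, 20)); // prints 10
    }
}
```

A fixed pool sized at the expected peak makes this number zero by construction, which is the crux of Brett's argument.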