Discussion: Performance and Clustering
hi,
Today is my first day looking at PostgreSQL
I am looking to migrate an MS SQL DB to PostgreSQL :) :)
My customer requires that the DBMS support 4000 simultaneous requests
Also, the system to be deployed may be a cluster with 12 microprocessors
From what I have read, PostgreSQL has really good performance and reliability, but I would like to get some numbers; I'm not sure whether some of this data is available somewhere in the wiki.
I am currently in the research/exploration phase of the project.
I am looking at PostgreSQL and MySQL.
In the future this will be a showcase success story, and we plan to publish it once we complete the project.
I will be really happy if you can point me in the right direction so I can get strong data to make me choose PostgreSQL :)
Thanks,
--
Ing. Jaime Rodríguez Quesada, Mag.
Liberux S.A.
http://www.liberux.com
Jaime Rodriguez wrote:
> hi,
> Today is my first day looking at PostgreSQL
> I am looking to migrate a MS SQL DB to PostgreSQL :) :)
> My customer requires that DBMS shall support 4000 simultaneous requests

That's a lot of connections and processes. 4000 concurrent queries will generate a massive I/O workload; what sort of storage system are you planning on using? 4000 sockets from clients making simultaneous queries will generate a massive network workload; what sort of network are you using?

> Also the system to be deploy maybe a cluster, with 12 microprocessors

A single server with 12 CPU cores: no problem. With 12 separate servers, you're going to have to resolve some very sticky issues with transactional integrity of updates and conflict resolution.
On Wed, Apr 28, 2010 at 8:08 PM, Jaime Rodriguez <jaime.rodriguez@liberux.com> wrote:
> hi,
> Today is my first day looking at PostgreSQL
> I am looking to migrate a MS SQL DB to PostgreSQL :) :)
> My customer requires that DBMS shall support 4000 simultaneous requests

Does that requirement come from someone's fantasy, or are there numbers supporting it? If the app is correctly written, connections will be taken and released as needed, and you can then use a connection pooler.

--
Regards,
Jaime Casanova
PostgreSQL support and training
Systems consulting and development
Guayaquil - Ecuador
Cel. +59387171157
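A connection pooler sits between the application and PostgreSQL so that thousands of client requests share a much smaller set of real backend connections. As a minimal sketch of the idea only (not any particular pooler such as pgbouncer; the `make_conn` factory and the pool size are invented for illustration):

```python
import queue

class ConnectionPool:
    """Hand out a small, fixed set of reusable connections to many callers."""

    def __init__(self, make_conn, size):
        self._pool = queue.Queue()
        for _ in range(size):
            self._pool.put(make_conn())

    def acquire(self, timeout=None):
        # Blocks until one of the pooled connections becomes free.
        return self._pool.get(timeout=timeout)

    def release(self, conn):
        # Hand the connection to the next waiting caller instead of closing it.
        self._pool.put(conn)

# Demo with stand-in "connections"; real code would open database sessions.
pool = ConnectionPool(make_conn=lambda: object(), size=20)
conn = pool.acquire()
# ... run a query on conn, then hand it back ...
pool.release(conn)
```

With a pooler in front, thousands of application-side requests can be serviced by a few dozen backends, which is what makes a 4000-request figure tractable.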
Jaime Rodriguez wrote:
> My customer requires that DBMS shall support 4000 simultaneous requests
> Also the system to be deploy maybe a cluster, with 12 microprocessors

In order to support 4000 true simultaneous requests, you'd need 4000 processor cores available. What you probably mean here is that you expect 4000 simultaneous database connections instead. The number of connections open and the number actually expected to be doing work at any time are very different quantities, and that ratio is a critical number you'll need to determine before you can estimate anything here.

Generally a single PostgreSQL server can handle in the range of 100-1000 open connections at a time, depending on OS and hardware specs. The number of active queries running at any one time will be closer to the number of cores in the server.

If most connections are read-only, there are a few ways to design a cluster of systems to support the sort of design needed to scale up to where you're aiming. Getting more than one node you can write to in a cluster is much harder.

--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
greg@2ndQuadrant.com   www.2ndQuadrant.us
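The gap Greg describes between open connections and active queries can be put into rough numbers. A back-of-envelope sketch (the 5% duty cycle is an invented figure, not a measurement):

```python
# Open sockets held by clients vs. queries actually executing.
open_connections = 4000
duty_cycle = 0.05  # assumed fraction of time a connection is mid-query

active_queries = open_connections * duty_cycle
print(f"~{active_queries:.0f} queries in flight on average")  # ~200

# It is this active count, not the open count, that should be
# matched against the number of cores in the server.
cores = 12
print(f"{active_queries / cores:.1f} active queries per core")  # 16.7
```

Determining the real duty cycle for the workload is exactly the "critical number" the estimate hinges on.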
On 29/04/2010 10:04 AM, Greg Smith wrote:
> Jaime Rodriguez wrote:
>> My customer requires that DBMS shall support 4000 simultaneous requests
>> Also the system to be deploy maybe a cluster, with 12 microprocessors
> [snip]
> If most connections are read-only, there are a few ways to design a
> cluster of systems to support the sort of design needed to scale up to
> where you're aiming at. Getting more than one node you can write to in a
> cluster is much harder.

If most of the connections are read-only then, in addition to using a connection pooler and/or a read-slave cluster, you can look into getting the customer to use memcached as a mid-layer. They should see a huge performance boost if they're prepared to do the work.

--
Craig Ringer
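The memcached mid-layer Craig mentions is the classic cache-aside pattern: check the cache first, fall through to the database only on a miss, then populate the cache for later readers. A sketch with a plain dict standing in for a memcached client, and a hypothetical `query_db` in place of the real SQL:

```python
cache = {}  # stand-in for a memcached client's get/set interface

def query_db(key):
    # Placeholder for the real (expensive) database query.
    return f"row-for-{key}"

def get_cached(key):
    """Cache-aside read: try the cache, fall back to the database."""
    value = cache.get(key)
    if value is None:
        value = query_db(key)  # miss: do the expensive work once...
        cache[key] = value     # ...and remember the result
    return value

get_cached("user:42")  # first call hits the database
get_cached("user:42")  # second call is served from the cache
```

The "work" Craig refers to is mostly invalidation: every write path has to delete or update the cached copies it affects.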
On 29 Apr 2010, at 3:08, Jaime Rodriguez wrote:
> hi,
> Today is my first day looking at PostgreSQL
> I am looking to migrate a MS SQL DB to PostgreSQL :) :)
> My customer requires that DBMS shall support 4000 simultaneous requests
> Also the system to be deploy maybe a cluster, with 12 microprocessors
>
> From what I have read, PostgreSQL has really good performance and reliability but I would like to get some numbers, not sure if somewhere in the wiki some of this data is available.

Are you looking at PostgreSQL on Windows or on a UNIX or UNIX-based OS?

The reason I'm asking is that Postgres doesn't perform at its best on Windows, and I seriously wonder whether the OS would be able to handle a load like that at all (can Windows handle 4000 open sockets, for example?). Other database solutions on Windows will probably have similar issues, so this is not a reason to base your choice of database on - but it is IMHO something that you should look into.

OTOH, changing both the database and the OS is a big change. For example, most UNIXes by default use a case-sensitive filesystem, whereas Windows does not. That said, for both you'll certainly have to make lots of changes in your application, so combining the two and doing that only once may be preferable. If you're thinking of going that way I'd suggest FreeBSD or Solaris, but Linux is a popular choice (as is Windows, for that matter).

Alban Hertroys
--
Screwing up is an excellent way to attach something to the ceiling.
Alban Hertroys wrote:
> The reason I'm asking is that Postgres doesn't perform at its best on Windows and I seriously wonder whether the OS would be able to handle a load like that at all (can Windows handle 4000 open sockets for example?).

You have to go out of your way to even get >125 connections going on Windows; see the very last entry at http://wiki.postgresql.org/wiki/Running_%26_Installing_PostgreSQL_On_Native_Windows

--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
greg@2ndQuadrant.com   www.2ndQuadrant.us
On 4/29/10 12:42 PM, Greg Smith wrote:
> Alban Hertroys wrote:
>> The reason I'm asking is that Postgres doesn't perform at its best on
>> Windows and I seriously wonder whether the OS would be able to handle
>> a load like that at all (can Windows handle 4000 open sockets for
>> example?).
>
> You have to go out of your way to even get >125 connections going on
> Windows; see the very last entry at
> http://wiki.postgresql.org/wiki/Running_%26_Installing_PostgreSQL_On_Native_Windows

I design socket component suites for developers. On Windows, with a few registry tweaks, you are able to have over 50,000 live, hot sockets. On Linux 2.6 and later, I have yet to hit a serious limit. Performance-wise, your main concern will be poor memory paging, so make sure you have too much RAM, and nothing else running.

O.
On 4/29/2010 11:49 AM, Ozz Nixon wrote:
> On 4/29/10 12:42 PM, Greg Smith wrote:
>> You have to go out of your way to even get >125 connections going on
>> Windows; see the very last entry at
>> http://wiki.postgresql.org/wiki/Running_%26_Installing_PostgreSQL_On_Native_Windows
>
> I design socket component suites for developers, on windows, with few
> registry tweaks, you are able to have over 50,000 live, hot sockets.

I don't think it's that easy. 50,000 sockets open, sure, but what's the performance? The programming model has everything to do with that, and Windows select() won't support that many sockets with any sort of performance. For Windows you have to convert to using non-blocking sockets w/messages. (And I've never seen the PG code, but I'll bet it's not using non-blocking sockets and the Windows message queue, so 50k sockets using select() on Windows will not be usable.)

That being said, I'm not a Windows socket component developer, so it's mostly guessing. But saying "it can" and saying "it's usable" are two different things, and that depends on the code, not the registry settings.

-Andy
> I don't think it's that easy. 50,000 sockets open, sure, but what's the
> performance? The programming model has everything to do with that,
> and Windows select() won't support that many sockets with any sort of
> performance. For Windows you have to convert to using non-blocking
> sockets w/messages.
>
> That being said, I'm not a Windows socket component developer, so it's
> mostly guessing. But saying "it can" and saying "it's usable" are two
> different things, and that depends on the code, not the registry
> settings.

Actually that is incorrect. You can use synchronous non-blocking sockets. Asynchronous is a nightmare due to the overhead of pushing and handling messages... the busier the kernel, the slower your application. Synchronous non-blocking will show a small degradation in performance every 5,000 sockets (meaning 10,000 streams is minimally slower than 5,000 - but enough to denote degradation).

Systems running my product and designs:
- AOL's proxy server system
- Some of the UK's largest ISPs
- AT&T fiber monitoring framework
- HBO video streaming to satellite
- Hart, a front-end for TransUnion, Equifax and Experian
- OFAC Query (a B-Tree query service, processing over 100,000 requests a second) (*)

* WAN latency is a running variable in their stats, but they average 100,000+ a second during peak hours. [1 master, 2 fail-over load-balanced servers.]

Most people run into the "2048+/-" thread limitation until they learn how to properly manage stack allocation per thread. I have been designing commercial enterprise socket solutions for over 15 years and sell an SDK that no product has yet to touch, and I compete with ALL the big boys (and they all know who I am). :-) ... The limitations in performance are factors of poor (modern sloppiness) variable allocation, memory management, buffering techniques, etc.
I got out of actively promoting DXSock (my socket suite) when I found I could capitalize more on my time and my product... so since 2000 I sell my knowledge.

Factors which also come into play are the built-in overhead the operating system incurs as a "network client/server" while it has active connections. These connections also suffer from the poor default settings Microsoft picked (the FIN_WAIT/2 issue, which is another registry tweak). Once you learn which services a dedicated Windows server will not need and rip out all of the excess "network client" junk (and this is well documented all over the net), you can produce very robust Windows servers. (Of course there are much better solutions for production servers than Windows - but people still drink the Microsoft "blue" Kool-Aid.)

* People who document the registry tweaks needed: http://infocenter.sybase.com/help/index.jsp?topic=/com.sybase.dc38421_1500/html/ntconfig/X26667.htm

;-) O.
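The synchronous non-blocking model being debated here can be illustrated with Python's selectors module, which wraps whatever readiness mechanism the OS provides (select, epoll, kqueue, ...). This is only a sketch of the programming model, not a claim about any particular server's internals:

```python
import selectors
import socket

# A connected socket pair stands in for a client and a server endpoint.
left, right = socket.socketpair()
left.setblocking(False)
right.setblocking(False)

sel = selectors.DefaultSelector()
sel.register(right, selectors.EVENT_READ)

left.sendall(b"ping")

# Instead of blocking in recv(), wait until the OS reports data is ready.
for key, _events in sel.select(timeout=1.0):
    data = key.fileobj.recv(1024)

sel.unregister(right)
left.close()
right.close()
```

One thread can multiplex many registered sockets this way; the per-socket cost is bookkeeping rather than a blocked thread, which is what makes high socket counts feasible at all.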
This whole sockets conversation has wandered way off topic. PostgreSQL runs into high-connection scaling issues due to memory limitations (on Windows in particular, as noted in the FAQ entry I suggested), shared resource contention, and general per-connection overhead long before socket issues matter.

--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
greg@2ndQuadrant.com   www.2ndQuadrant.us
On Wed, Apr 28, 2010 at 7:08 PM, Jaime Rodriguez <jaime.rodriguez@liberux.com> wrote:
> hi,
> Today is my first day looking at PostgreSQL
> I am looking to migrate a MS SQL DB to PostgreSQL :) :)
> My customer requires that DBMS shall support 4000 simultaneous requests
> Also the system to be deploy maybe a cluster, with 12 microprocessors

I'm gonna jump in here and say that if you 400 REQUESTS running at the same time, you're gonna want a REALLY big machine.

I admin a setup where two db servers handle ~200 simultaneous requests: almost all very short millisecond-long requests, a few around 100 milliseconds, and a very very few running for seconds. With 8 2.1 GHz Opteron cores, 32 gigs of RAM, and 14x15k drives, those machines run with a load factor in the range of 10 to 15. CPUs are maxed at that range of load, and IO is 70 to 80% utilized according to iostat -x. Wait % is generally one core max. Some of that load is fixed on the master, but a lot can be handled by slaves.

Your load, if you really are having 4000 simultaneous connections, is likely going to need 20 times the load handling I need. Given that the newer 12-core AMDs are somewhat faster, you could probably get away with two or three such machines. If you were to use 96-core machines (8Px12core) with as many disks as you could throw at them (40 to 100), then you're in the ballpark for a set of machines to process 4,000 simultaneous requests, assuming a mostly-read (80% or so) setup. We're talking a large % of a full-sized rack to hold all the drives and cores you'd need.

But this brings up a lot of questions about partitioning your dataset if you can, things like that. Do all of these 4,000 simultaneous requests need to update the same exact data set? Or are they mostly read-only reporting users? Can you use memcached to handle part of the load? Usage patterns inform a great deal how to size a system to handle that much load.
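Scott's sizing argument is simple proportional scaling from his measured reference point. As a sketch (all inputs are his rough figures, and the output is only an order-of-magnitude guess that ignores faster CPUs and the read/write mix):

```python
# Reference setup: two servers, each with 8 Opteron cores, handling
# ~200 simultaneous requests between them.
ref_requests = 200
ref_cores_per_server = 8
ref_servers = 2

target_requests = 4000
scale = target_requests / ref_requests
print(f"{scale:.0f}x the reference workload")  # 20x

# Naive first-order core estimate for the target load.
cores_needed = ref_cores_per_server * ref_servers * scale
print(f"~{cores_needed:.0f} cores as a starting guess")  # ~320
```

A handful of 96-core machines, as Scott suggests, lands in the same ballpark once you credit newer CPUs with being somewhat faster per core.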
On Thu, Apr 29, 2010 at 1:41 PM, Scott Marlowe <scott.marlowe@gmail.com> wrote:
> On Wed, Apr 28, 2010 at 7:08 PM, Jaime Rodriguez
> <jaime.rodriguez@liberux.com> wrote:
>> hi,
>> Today is my first day looking at PostgreSQL
>> I am looking to migrate a MS SQL DB to PostgreSQL :) :)
>> My customer requires that DBMS shall support 4000 simultaneous requests
>> Also the system to be deploy maybe a cluster, with 12 microprocessors
>
> I'm gonna jump in here and say that if you 400 REQUESTS running at the
> same time, you're gonna want a REALLY big machine.

I hate my keyboard... I meant to say:

.. if you really need 4000 requests running at the same time...
Thanks a lot for all your responses
I am impressed, really impressed. I never thought I could get this many responses in such a short time. Wonderful support :)
Thanks a lot :) :)
I don't have details yet; I'll get them really soon. But all your input is really valuable. I have much more information now. I'll continue my research. I'll spend a lot of time reading the wiki :P :)
I agree, 4k requests seems too much, and crazy. I hope that my contact was wrong and it's only 400, which looks manageable.
Once again, thanks a lot; I have a lot of information. I really appreciate your valuable time.
Thanks :)
--
Ing. Jaime Rodríguez Quesada, Mag
Liberux S.A.
http://www.liberux.com
On Thu, Apr 29, 2010 at 1:41 PM, Scott Marlowe <scott.marlowe@gmail.com> wrote:
I hate my keyboard... I meant to say:
> On Wed, Apr 28, 2010 at 7:08 PM, Jaime Rodriguez
> <jaime.rodriguez@liberux.com> wrote:
>> hi,
>> Today is my first day looking at PostgreSQL
>> I am looking to migrate a MS SQL DB to PostgreSQL :) :)
>> My customer requires that DBMS shall support 4000 simultaneous requests
>> Also the system to be deploy maybe a cluster, with 12 microprocessors
>
> I'm gonna jump in here and say that if you 400 REQUESTS running at the
> same time, you're gonna want a REALLY big machine.
.. if you really need 4000 requests running at the same time...
--
Ing. Jaime Rodríguez Quesada, Mag
Liberux S.A.
http://www.liberux.com