Thread: Performance and Clustering

Performance and Clustering

From
Jaime Rodriguez
Date:
hi,
Today is my first day looking at PostgreSQL
I am looking to migrate a MS SQL DB to PostgreSQL :) :)
My customer requires that DBMS shall support 4000 simultaneous requests
Also, the system to be deployed may be a cluster, with 12 microprocessors

From what I have read, PostgreSQL has really good performance and reliability, but I would like to get some numbers. I am not sure whether some of this data is available somewhere in the wiki.

I am currently in the research/exploration phase of the project.
I am looking at PostgreSQL and MySQL.
In the future this will be a showcase success story, and we plan to publish it once we complete the project.
I would be really happy if you could point me in the right direction so I can get strong data to help me choose PostgreSQL :)

Thanks,
--
Ing. Jaime Rodríguez Quesada, Mag.
Liberux S.A.
http://www.liberux.com

Re: Performance and Clustering

From
John R Pierce
Date:
Jaime Rodriguez wrote:
> hi,
> Today is my first day looking at PostgreSQL
> I am looking to migrate a MS SQL DB to PostgreSQL :) :)
> My customer requires that DBMS shall support 4000 simultaneous requests

That's a lot of connections and processes.
4000 concurrent queries will generate a massive I/O workload. What
sort of storage system are you planning on using?
4000 sockets from clients making simultaneous queries will generate a
massive network workload. What sort of network are you using?


> Also the system to be deploy maybe a cluster, with 12 microprocessors

A single server with 12 CPU cores: no problem.

12 separate servers: you're going to have to resolve some very sticky
issues with transactional integrity of updates and conflict resolution.




Re: Performance and Clustering

From
Jaime Casanova
Date:
On Wed, Apr 28, 2010 at 8:08 PM, Jaime Rodriguez
<jaime.rodriguez@liberux.com> wrote:
> hi,
> Today is my first day looking at PostgreSQL
> I am looking to migrate a MS SQL DB to PostgreSQL :) :)
> My customer requires that DBMS shall support 4000 simultaneous requests

And does that requirement come from someone's imagination, or are there
numbers supporting it?
If the app is correctly written, connections will be taken and
released as needed, and then you can use a connection pooler.


--
Regards,
Jaime Casanova
PostgreSQL support and training
Systems consulting and development
Guayaquil - Ecuador
Cel. +59387171157
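
A minimal sketch of the connection-pooler idea from the message above, assuming connections are borrowed only for the duration of a query and released right away. `FakeConnection`, `ConnectionPool`, and `simulate` are hypothetical names for illustration; no real database or pooler product is involved, and the one-millisecond query time is made up.

```python
import queue
import threading
import time


class FakeConnection:
    """Stand-in for a real database connection (hypothetical; no database is contacted)."""

    def execute(self, sql):
        time.sleep(0.001)  # pretend the query takes about a millisecond
        return "ok"


class ConnectionPool:
    """Hands out a fixed set of connections; callers block until one is free."""

    def __init__(self, size):
        self._idle = queue.Queue()
        for _ in range(size):
            self._idle.put(FakeConnection())
        self._lock = threading.Lock()
        self.in_use = 0
        self.peak_in_use = 0

    def run(self, sql):
        conn = self._idle.get()  # blocks while every connection is busy
        with self._lock:
            self.in_use += 1
            self.peak_in_use = max(self.peak_in_use, self.in_use)
        try:
            return conn.execute(sql)
        finally:
            with self._lock:
                self.in_use -= 1
            self._idle.put(conn)  # release as soon as the query is done


def simulate(clients, pool_size):
    """Run one query per client thread and report the peak number of busy connections."""
    pool = ConnectionPool(pool_size)
    threads = [
        threading.Thread(target=pool.run, args=("SELECT 1",))
        for _ in range(clients)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return pool.peak_in_use
```

Calling `simulate(clients=400, pool_size=12)` lets 400 client threads share 12 connections; the returned peak never exceeds the pool size, which is the point of putting a pooler between the application and the database.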

Re: Performance and Clustering

From
Greg Smith
Date:
Jaime Rodriguez wrote:
> My customer requires that DBMS shall support 4000 simultaneous requests
> Also the system to be deploy maybe a cluster, with 12 microprocessors

In order to support 4000 true simultaneous requests, you'd need 4000
processor cores available.  What you probably mean here is that you
expect 4000 simultaneous database connections instead.  The number of
connections open and the number actually expected to be doing work at
any time are very different quantities, and that ratio is a critical
number you'll need to determine before you can estimate something here.

Generally a single PostgreSQL server can handle in the range of 100-1000
open connections at a time, depending on OS and hardware specs.  The
number of active queries running at any one time will be closer to the
number of cores in the server.

If most connections are read-only, there are a few ways to design a
cluster of systems to support the sort of design needed to scale up to
where you're aiming at.  Getting more than one node you can write to in
a cluster is much harder.

--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
greg@2ndQuadrant.com   www.2ndQuadrant.us
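
A rough way to estimate the ratio between open connections and connections actually doing work is Little's law from queueing theory (average number in the system = arrival rate x average time in the system). The thread does not mention it; it is used here only as a sketch. The function names and the example numbers are assumptions, not measurements.

```python
def active_queries(requests_per_second, mean_query_seconds):
    """Little's law: average concurrency = arrival rate * average time per request."""
    return requests_per_second * mean_query_seconds


def busy_ratio(open_connections, requests_per_second, mean_query_seconds):
    """Fraction of open connections running a query at a typical instant."""
    return active_queries(requests_per_second, mean_query_seconds) / open_connections
```

For example, if 4000 open connections together issue 4000 queries per second and each query takes 5 ms, `active_queries(4000, 0.005)` comes out at about 20 queries in flight on average, far fewer than the number of open connections. Long-running queries push that figure up quickly, so the average query time needs to be measured rather than guessed.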


Re: Performance and Clustering

From
Craig Ringer
Date:
On 29/04/2010 10:04 AM, Greg Smith wrote:
> Jaime Rodriguez wrote:
>> My customer requires that DBMS shall support 4000 simultaneous requests
>> Also the system to be deploy maybe a cluster, with 12 microprocessors
> [snip]
> If most connections are read-only, there are a few ways to design a
> cluster of systems to support the sort of design needed to scale up to
> where you're aiming at. Getting more than one node you can write to in a
> cluster is much harder.

If most of the connections are read-only then, in addition to using a
connection pooler and/or read slave cluster, you can look into getting
the customer to use memcached as a middle layer. They should see a huge
performance boost if they're prepared to do the work.

--
Craig Ringer
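
The middle-layer idea can be sketched as a cache-aside lookup: read from the cache first and query the database only on a miss. In this sketch a plain dict stands in for memcached and a callable stands in for the SQL query; `CacheAside` and its methods are hypothetical names, not a memcached client API.

```python
class CacheAside:
    """Cache-aside lookup: try the cache first, fall back to the database on a miss."""

    def __init__(self, load_from_db):
        self._load_from_db = load_from_db  # callable standing in for a SQL query
        self._cache = {}                   # dict standing in for memcached
        self.db_reads = 0                  # how many lookups reached the database

    def get(self, key):
        if key in self._cache:
            return self._cache[key]
        self.db_reads += 1
        value = self._load_from_db(key)
        self._cache[key] = value
        return value

    def invalidate(self, key):
        """Call after a write so the next read fetches fresh data."""
        self._cache.pop(key, None)
```

Repeated reads of the same key reach the database once; the "work" mentioned above is mostly in deciding when to call `invalidate` (or expire entries) so that readers do not see stale data.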

Re: Performance and Clustering

From
Alban Hertroys
Date:
On 29 Apr 2010, at 3:08, Jaime Rodriguez wrote:

> hi,
> Today is my first day looking at PostgreSQL
> I am looking to migrate a MS SQL DB to PostgreSQL :) :)
> My customer requires that DBMS shall support 4000 simultaneous requests
> Also the system to be deploy maybe a cluster, with 12 microprocessors
>
> From what I have read, PostgreSQL has really good performance and reliability but I would like to get some numbers, not sure if somewhere in the wiki some of this data is available.

Are you looking at PostgreSQL on Windows or on a UNIX or UNIX-based OS?

The reason I'm asking is that Postgres doesn't perform at its best on Windows and I seriously wonder whether the OS
would be able to handle a load like that at all (can Windows handle 4000 open sockets for example?). Other database
solutions on Windows will probably have similar issues, so this is not a reason to base your choice of database on - it
is IMHO something that you should look into.

OTOH, changing both the database and the OS is a big change. For example, most UNIX-es by default use a case-sensitive
filesystem, whereas Windows does not.
That said, for both you'll certainly have to make lots of changes in your application, so combining the two and doing that
only once may be preferable. If you're thinking of going that way I'd suggest FreeBSD or Solaris, but Linux is a popular
choice (as is Windows, for that matter).

Alban Hertroys

--
Screwing up is an excellent way to attach something to the ceiling.





Re: Performance and Clustering

From
Greg Smith
Date:
Alban Hertroys wrote:
> The reason I'm asking is that Postgres doesn't perform at its best on Windows and I seriously wonder whether the OS would be able to handle a load like that at all (can Windows handle 4000 open sockets for example?).

You have to go out of your way to even get >125 connections going on
Windows; see the very last entry at

http://wiki.postgresql.org/wiki/Running_%26_Installing_PostgreSQL_On_Native_Windows

--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
greg@2ndQuadrant.com   www.2ndQuadrant.us


Re: Performance and Clustering

From
Ozz Nixon
Date:
On 4/29/10 12:42 PM, Greg Smith wrote:
> Alban Hertroys wrote:
>> The reason I'm asking is that Postgres doesn't perform at its best on
>> Windows and I seriously wonder whether the OS would be able to handle
>> a load like that at all (can Windows handle 4000 open sockets for
>> example?).
>
> You have to go out of your way to even get >125 connections going on
> Windows; see the very last entry at
>
> http://wiki.postgresql.org/wiki/Running_%26_Installing_PostgreSQL_On_Native_Windows
>
>
I design socket component suites for developers. On Windows, with a few
registry tweaks, you are able to have over 50,000 live, hot sockets. On
Linux 2.6 and later, I have yet to hit a serious limit.

Performance-wise, your main concern will be poor memory paging, so make sure
you have more than enough RAM and nothing else running.

O.

Re: Performance and Clustering

From
Andy Colson
Date:
On 4/29/2010 11:49 AM, Ozz Nixon wrote:
> On 4/29/10 12:42 PM, Greg Smith wrote:
>> Alban Hertroys wrote:
>>> The reason I'm asking is that Postgres doesn't perform at its best on
>>> Windows and I seriously wonder whether the OS would be able to handle
>>> a load like that at all (can Windows handle 4000 open sockets for
>>> example?).
>>
>> You have to go out of your way to even get >125 connections going on
>> Windows; see the very last entry at
>>
>> http://wiki.postgresql.org/wiki/Running_%26_Installing_PostgreSQL_On_Native_Windows
>>
>>
> I design socket component suites for developers, on windows, with few
> registry tweaks, you are able to have over 50,000 live, hot sockets.

I don't think it's that easy.  50,000 sockets open, sure, but what's the
performance?  The programming model has everything to do with that, and
Windows select() won't support that many sockets with any sort of
performance.  For Windows you have to convert to using non-blocking
sockets w/messages.  (And I've never seen the PG code, but I'll bet it's
not using non-blocking sockets & Windows msg q, so 50k sockets using
select() on Windows will not be usable.)

That being said, I'm not a Windows socket component developer, so it's
mostly guessing.  But saying "it can" and saying "it's usable" are two
different things, and that depends on the code, not the registry settings.

-Andy


Re: Performance and Clustering

From
Ozz Nixon
Date:
> I dont think its that easy.  50,000 sockets open, sure, but whats the
> performance?  The programming model has everything to do with that,
> and windows select() wont support that many sockets with any sort of
> performance.  For windows you have to convert to using non-blocking
> sockets w/messages.  (and I've never see the PG code, but I'll bet
> it's not using non-blocking sockets & windows msg q, so 50k sockets
> using select() on windows will not be usable).
>
> That being said, I'm not a windows socket component developer, so its
> mostly guessing.  But saying "it can" and saying "its usable" are two
> different things, and that depends on the code, not the registry
> settings.
Actually, that is incorrect. You can use synchronous non-blocking
sockets. Asynchronous is a nightmare due to the overhead of pushing and
handling messages... the busier the kernel, the slower your application.
Synchronous non-blocking sockets show a small degradation in performance every
5,000 sockets. (Meaning 10,000 streams is minimally slower than 5,000,
but enough to denote degradation.)

Systems running my product and designs:

     AOL's Proxy Server System
     Some of the UK's largest ISPs
     AT&T Fiber Monitoring Framework
     HBO Video Streaming to Satellite
     Hart, a Front-End for TransUnion, Equifax and Experian
     OFAC Query (B-Tree Query Service, processing over 100,000 requests a second) (*)

* WAN latency is a running variable in their stats, but they average
100,000+ a second during peak hours. [1 master, 2 fail-over
load-balanced servers].

Most people run into the "2048+/-" thread limitation until they learn
how to properly manage stack allocation per thread. I have been
designing commercial enterprise socket solutions for over 15 years and
sell an SDK that no product has yet touched, and I compete with ALL the
big boys (and they all know who I am). :-) ... The limitations in
performance are factors of poor (modern sloppiness) variable allocation,
memory management, buffering techniques, etc. I got out of actively
promoting DXSock (my socket suite) when I found I could capitalize more
on my time and my product... so since 2000, I sell my knowledge.

Factors which also come into play are the built-in overhead of the
operating system when it is a "Network Client/Server" and has active
connections. These connections also incur the poor default settings
Microsoft picked (the FIN_WAIT_2 issue, which is another registry tweak).
Once you learn what services a "Dedicated Windows Server" will not need,
rip out all of the excess "Network Client" junk (and this is well
documented all over the net) - you can produce very robust Windows
servers. (Of course there are much better solutions for production
servers than Windows - but people still drink the Microsoft "blue"
Kool-Aid.)

* People who document the registry tweaks needed:
http://infocenter.sybase.com/help/index.jsp?topic=/com.sybase.dc38421_1500/html/ntconfig/X26667.htm

;-)

O.



Re: Performance and Clustering

From
Greg Smith
Date:
This whole sockets conversation has wandered way off topic.  PostgreSQL
runs into high-connection scaling issues due to memory limitations (on
Windows in particular, as noted in the FAQ entry I suggested), shared
resource contention, and general per-connection overhead long before
socket issues matter.

--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
greg@2ndQuadrant.com   www.2ndQuadrant.us


Re: Performance and Clustering

From
Scott Marlowe
Date:
On Wed, Apr 28, 2010 at 7:08 PM, Jaime Rodriguez
<jaime.rodriguez@liberux.com> wrote:
> hi,
> Today is my first day looking at PostgreSQL
> I am looking to migrate a MS SQL DB to PostgreSQL :) :)
> My customer requires that DBMS shall support 4000 simultaneous requests
> Also the system to be deploy maybe a cluster, with 12 microprocessors

I'm gonna jump in here and say that if you 400 REQUESTS running at the
same time, you're gonna want a REALLY big machine.

I admin a setup where two db servers handle ~200 simultaneous
requests, almost all being very short millisecond long requests, and a
few being 100 milliseconds, and a very very few running for seconds.

With 8 2.1 GHz Opteron cores, 32 Gigs of ram, and 14x15k drives those
machines run with a load factor in the range of 10 to 15.  CPUs are
maxed at that range of load, and IO is 70 to 80% utilized according to
iostat -x.  Wait % is generally one core max.  Some of that load is
fixed on the master, but a lot can be handled by slaves.

Your load, if you really are having 4000 simultaneous connections, is
likely going to need 20 times the load handling I need.  Given the
newer 12 core AMDs are somewhat faster, you could probably get away
with two or three of these machines.  If you were to use 96 core
machines (8Px12core) with as many disks as you could throw at them (40
to 100) then you're in the ballpark for a set of machines to process
4,000 simultaneous requests, assuming a mostly read (80% or so) setup.
 We're talking a large % of a full sized rack to hold all the drives
and cores you'd need.

But this brings up a lot of questions about partitioning your dataset
if you can, things like that.  Do all of these 4,000 simultaneous
requests need to update the same exact data set?  Or are they read
mostly reporting users? Can you use memcached to handle part of the
load?  Usage patterns inform a great deal about how to size a system to
handle that much load.
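
The "20 times" figure above can be reproduced with naive linear scaling. This is only a back-of-the-envelope sketch: it assumes cores needed grow in proportion to simultaneous requests, reads the setup above as two servers with 8 cores each (16 cores total), and ignores disks, faster newer cores, and contention. `scale_estimate` is a hypothetical helper.

```python
import math


def scale_estimate(current_requests, current_cores, target_requests, cores_per_machine):
    """Naive linear scaling: cores needed grow in proportion to simultaneous requests."""
    factor = target_requests / current_requests
    cores_needed = math.ceil(current_cores * factor)
    machines = math.ceil(cores_needed / cores_per_machine)
    return factor, cores_needed, machines
```

`scale_estimate(200, 16, 4000, 96)` gives a factor of 20, 320 cores, and 4 machines of 96 cores each. Faster cores would lower the machine count and a write-heavy workload would raise it, which is why the usage-pattern questions above matter more than the arithmetic.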

Re: Performance and Clustering

From
Scott Marlowe
Date:
On Thu, Apr 29, 2010 at 1:41 PM, Scott Marlowe <scott.marlowe@gmail.com> wrote:
> On Wed, Apr 28, 2010 at 7:08 PM, Jaime Rodriguez
> <jaime.rodriguez@liberux.com> wrote:
>> hi,
>> Today is my first day looking at PostgreSQL
>> I am looking to migrate a MS SQL DB to PostgreSQL :) :)
>> My customer requires that DBMS shall support 4000 simultaneous requests
>> Also the system to be deploy maybe a cluster, with 12 microprocessors
>
> I'm gonna jump in here and say that if you 400 REQUESTS running at the
> same time, you're gonna want a REALLY big machine.

I hate my keyboard... I meant to say:

.. if you really need 4000 requests running at the same time...

Re: Performance and Clustering

From
Jaime Rodriguez
Date:
Thanks a lot for all your responses

I am impressed, really impressed. I never thought I could get this many responses in such a short time. Wonderful support :)
Thanks a lot :) :)

I don't have details yet; I'll get them really soon. But all your input is really valuable. I have much more information now. I'll continue my research. I'll spend a lot of time reading the wiki :P :)

I agree, 4k requests seems to be toooo much and crazy. I hope that my contact was wrong and it's only 400, which looks to be manageable.

Once again, thanks a lot, I have a lot of information. I really appreciate your valuable time.

Thanks :)


On Thu, Apr 29, 2010 at 1:41 PM, Scott Marlowe <scott.marlowe@gmail.com> wrote:
> On Thu, Apr 29, 2010 at 1:41 PM, Scott Marlowe <scott.marlowe@gmail.com> wrote:
>> On Wed, Apr 28, 2010 at 7:08 PM, Jaime Rodriguez
>> <jaime.rodriguez@liberux.com> wrote:
>>> hi,
>>> Today is my first day looking at PostgreSQL
>>> I am looking to migrate a MS SQL DB to PostgreSQL :) :)
>>> My customer requires that DBMS shall support 4000 simultaneous requests
>>> Also the system to be deploy maybe a cluster, with 12 microprocessors
>>
>> I'm gonna jump in here and say that if you 400 REQUESTS running at the
>> same time, you're gonna want a REALLY big machine.
>
> I hate my keyboard... I meant to say:
>
> .. if you really need 4000 requests running at the same time...



--
Ing. Jaime Rodríguez Quesada, Mag
Liberux S.A.
http://www.liberux.com