Re: Diminishing bandwidth performance with multiple quad core X5355s
From | jlmarin |
---|---|
Subject | Re: Diminishing bandwidth performance with multiple quad core X5355s |
Date | |
Msg-id | 1179093615.331448.161490@e65g2000hsc.googlegroups.com |
Responses | Re: Diminishing bandwidth performance with multiple quad core X5355s (Arjen van der Meijden <acmmailing@tweakers.net>) |
List | pgsql-performance |
On May 5, 9:44 am, CharlesBlackstone <charlesblacksto...@hotmail.com> wrote:
> I think a lot of people are aware that an Opteron system has less
> bandwidth restrictions with a lot of processors, but that woodcrests
> don't have as good a memory controller and fall behind opterons after
> 4 cores or so. I'm asking how severe this is. Heavy number crunching of
> huge data sets in RAM is a bandwidth intensive operation. So, I'm
> asking how badly woodcrests are impacted above 4 cores, for example, 8
> cores vs 4 cores, on bandwidth performance. I didn't think this was
> that vague, is there anything else I can tell you that will make the
> question less difficult to answer?

Your question is difficult to answer because you'd first need to know (at least approximately) the ratio of FLOPS to memory accesses, and the pattern of those accesses. It all boils down to that. If your program can keep the CPU busy for long stretches without needing to access the memory bus, then it will definitely benefit from more CPUs/cores. If, on the other hand, your program very frequently needs to load from or store to main RAM (i.e. it takes cache misses), then you will have contention on the memory bus and your per-CPU performance will degrade.

You ask "how badly" your app will degrade; well, the proper way to model and predict that would be to use the hardware performance counters (OProfile under Linux, cputrack on Solaris, etc.), which would give you an idea of the rate of instructions versus everything else (loads/stores to RAM, retired FLOPS, cache misses, TLB misses, etc.). But of course the best way is to measure your program on the real thing.

I wanted to post this even if it's a bit late in the thread because right now I have exactly this kind of problem. We're trying to figure out whether a dual quad-core (Xeon) will be better (cost/benefit-wise) than a 4-way dual-core Opteron, for *our* program.
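To make the FLOPS-versus-memory-access argument above a bit more concrete, here is a rough back-of-the-envelope sketch. It assumes a very simplified model that is not from any vendor documentation: all cores share one memory bus equally, and each core is limited either by its own peak rate or by its share of the bus. The function name and the machine numbers (10 GFLOP/s per core, 10 GB/s bus) are hypothetical, chosen only for illustration.

```python
def per_core_gflops(cores, peak_gflops_per_core, bus_gb_s, flops_per_byte):
    """Estimated sustained GFLOP/s per core when `cores` share one memory bus.

    Simplified model (an assumption, not a measurement): the bus bandwidth
    is split evenly, and a core runs at the lower of its two ceilings.
    """
    # Compute-bound ceiling: a core can do no better than its own peak.
    compute_limit = peak_gflops_per_core
    # Memory-bound ceiling: each byte fetched over this core's share of the
    # bus supports `flops_per_byte` floating-point operations.
    memory_limit = (bus_gb_s / cores) * flops_per_byte
    return min(compute_limit, memory_limit)


if __name__ == "__main__":
    # Hypothetical machine: 10 GFLOP/s per core, 10 GB/s shared bus.
    for intensity in (0.25, 8.0):  # flops per byte moved to/from RAM
        for cores in (1, 2, 4, 8):
            per_core = per_core_gflops(cores, 10.0, 10.0, intensity)
            print(f"{intensity:5.2f} flops/byte, {cores} cores: "
                  f"{per_core:6.2f} GFLOP/s per core, "
                  f"{per_core * cores:6.2f} GFLOP/s total")
```

Under these made-up numbers, the 0.25 flops/byte program stays at 2.5 GFLOP/s in total no matter how many cores you add, while the 8 flops/byte program keeps every core at its 10 GFLOP/s peak up to 8 cores. Real hardware is messier (caches, prefetchers, NUMA), which is why measuring on the real thing still wins.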
SPEC CPU 2006 can give you some pretty good insights on this: go to the advanced query option and list all available results, but filter by "number of total cores" equal to 8. Go straight to the int_rate and fp_rate figures, and you'll be able to see how 4-way dual-core Opterons compare to dual quad-core Xeons. At least on the SPEC CPU 2006 suite, that is, whose programs have quite large working sets, although they may not be as RAM-bottlenecked as your particular program.

As you say, Opterons definitely have a much better memory system. But then a 4-way mobo is WAY more expensive than a dual-socket one...

And btw, if you want to benchmark just memory bandwidth/latency performance, STREAM (http://www.cs.virginia.edu/stream/) is the way to go.

Cheers,
JL
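For a very rough first look at copy bandwidth before running STREAM proper, something like the sketch below may help. It is not STREAM: it is a single-threaded stand-in for STREAM's Copy kernel only, run inside the Python interpreter, and the function name and default sizes are hypothetical. It counts both the bytes read and the bytes written, as the Copy kernel does.

```python
import time


def copy_bandwidth_gb_s(n_bytes=64 * 1024 * 1024, repeats=5):
    """Best observed bulk-copy bandwidth in GB/s over `repeats` runs.

    The buffers should be much larger than the last-level cache,
    otherwise this measures cache bandwidth rather than RAM bandwidth.
    """
    src = bytearray(n_bytes)
    dst = bytearray(n_bytes)
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        dst[:] = src  # one bulk copy: reads n_bytes and writes n_bytes
        elapsed = time.perf_counter() - start
        best = min(best, elapsed)
    best = max(best, 1e-9)  # guard against a zero reading from the timer
    # Count read plus write traffic, i.e. 2 * n_bytes per copy.
    return 2 * n_bytes / best / 1e9


if __name__ == "__main__":
    print(f"copy bandwidth: {copy_bandwidth_gb_s():.2f} GB/s")
```

Treat the number as a sanity check only. To see how bandwidth scales from 4 to 8 cores you need one copy of the kernel per core running at the same time, which is exactly what the real STREAM build does.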