Thread: Performance on new 64bit server compared to my 32bit desktop

Performance on new 64bit server compared to my 32bit desktop

From
Philippe Rimbault
Date:
Hi,

I'm having a strange performance result on a new database server
compared to my simple desktop.

The configuration of the new server:
     - OS: GNU/Linux Debian Etch x86_64
     - kernel: Linux 2.6.26-2-vserver-amd64 #1 SMP Sun Jun 20 20:40:33 UTC 2010 x86_64 GNU/Linux
         (tests are on the "real server", not on a vserver)
     - CPU: 2 x Six-Core AMD Opteron(tm) Processor 2427 @ 2.20GHz
     - RAM: 32 GB
The configuration of my desktop PC:
     - OS: GNU/Linux Debian Testing i686
     - kernel: Linux 2.6.32-5-686 #1 SMP Tue Jun 1 04:59:47 UTC 2010 i686 GNU/Linux
     - CPU: Intel(R) Core(TM)2 Duo CPU     E7500  @ 2.93GHz
     - RAM: 2 GB

On each configuration, I've compiled PostgreSQL 8.4.4 (a simple
./configure && make && make install).

On each configuration, I've restored a small database (the compressed
dump is 33 MB); here is the output of "\d+":
 Schema |            Name            |   Type   |    Owner    |    Size    | Description
--------+----------------------------+----------+-------------+------------+-------------
 public | article                    | table    | indexwsprem | 77 MB      |
 public | article_id_seq             | sequence | indexwsprem | 8192 bytes |
 public | evt                        | table    | indexwsprem | 8192 bytes |
 public | evt_article                | table    | indexwsprem | 17 MB      |
 public | evt_article_id_seq         | sequence | indexwsprem | 8192 bytes |
 public | evt_id_seq                 | sequence | indexwsprem | 8192 bytes |
 public | firm                       | table    | indexwsprem | 1728 kB    |
 public | firm_article               | table    | indexwsprem | 17 MB      |
 public | firm_article_id_seq        | sequence | indexwsprem | 8192 bytes |
 public | firm_id_seq                | sequence | indexwsprem | 8192 bytes |
 public | publication                | table    | indexwsprem | 64 kB      |
 public | publication_article        | table    | indexwsprem | 0 bytes    |
 public | publication_article_id_seq | sequence | indexwsprem | 8192 bytes |
 public | publication_id_seq         | sequence | indexwsprem | 8192 bytes |
(14 rows)

On each configuration, postgresql.conf is the same and has not been
modified (the default shared_buffers seems sufficient for my simple tests).

I've enabled timing in psql, and here are the results of several
"simple" queries (each executed twice so that the second run uses the cache):
1- select count(*) from firm;
     server x64 :  48661 (1 row) Time: 14,412 ms
     desk i686  :  48661 (1 row) Time: 4,845 ms

2- select * from pg_settings;
     server x64 :  Time: 3,898 ms
     desk i686  :  Time: 1,517 ms

3- I've run "time pgbench -c 50" :
     server x64 :
         starting vacuum...end.
         transaction type: TPC-B (sort of)
         scaling factor: 1
         query mode: simple
         number of clients: 50
         number of transactions per client: 10
         number of transactions actually processed: 500/500
         tps = 523.034437 (including connections establishing)
         tps = 663.511008 (excluding connections establishing)

         real    0m0.984s
         user    0m0.088s
         sys     0m0.096s
     desk i686 :
         starting vacuum...end.
         transaction type: TPC-B (sort of)
         scaling factor: 1
         query mode: simple
         number of clients: 50
         number of transactions per client: 10
         number of transactions actually processed: 500/500
         tps = 781.986778 (including connections establishing)
         tps = 862.809792 (excluding connections establishing)

         real    0m0.656s
         user    0m0.028s
         sys     0m0.052s


Do you think it's a 32bit/64bit difference ?

Re: Performance on new 64bit server compared to my 32bit desktop

From
Scott Marlowe
Date:
On Thu, Aug 19, 2010 at 2:07 AM, Philippe Rimbault <primbault@edd.fr> wrote:
> Hi,
>
> I'm having a strange performance result on a new database server compared to
> my simple desktop.
>
> The configuration of the new server :
>    - OS : GNU/Linux Debian Etch x86_64
>    - kernel : Linux 2.6.26-2-vserver-amd64 #1 SMP Sun Jun 20 20:40:33 UTC
> 2010 x86_64 GNU/Linux
>        (tests are on the "real server", not on a vserver)
>    - CPU : 2 x Six-Core AMD Opteron(tm) Processor 2427 @ 2.20GHz
>    - RAM : 32 Go
> The configuration of my desktop pc :
>    - OS : GNU/Linux Debian Testing i686
>    - kernel : Linux 2.6.32-5-686 #1 SMP Tue Jun 1 04:59:47 UTC 2010 i686
> GNU/Linux
>    - CPU : Intel(R) Core(TM)2 Duo CPU     E7500  @ 2.93GHz
>    - RAM : 2 Go

PERFORMANCE STUFF DELETED FOR BREVITY

> Do you think it's a 32bit/64bit difference ?

No, it's likely that your desktop has much faster CPU cores than your
server, and it has drives that may or may not be obeying fsync
commands.  Your server, OTOH, has more cores, so it's likely to do
better under a real load.  And assuming it has more disks on a better
controller it will also do better under heavier loads.

So how are the disks setup anyway?

Re: Performance on new 64bit server compared to my 32bit desktop

From
Philippe Rimbault
Date:
On 19/08/2010 11:51, Scott Marlowe wrote:
> On Thu, Aug 19, 2010 at 2:07 AM, Philippe Rimbault<primbault@edd.fr>  wrote:
>
>> Hi,
>>
>> I'm having a strange performance result on a new database server compared to
>> my simple desktop.
>>
>> The configuration of the new server :
>>     - OS : GNU/Linux Debian Etch x86_64
>>     - kernel : Linux 2.6.26-2-vserver-amd64 #1 SMP Sun Jun 20 20:40:33 UTC
>> 2010 x86_64 GNU/Linux
>>         (tests are on the "real server", not on a vserver)
>>     - CPU : 2 x Six-Core AMD Opteron(tm) Processor 2427 @ 2.20GHz
>>     - RAM : 32 Go
>> The configuration of my desktop pc :
>>     - OS : GNU/Linux Debian Testing i686
>>     - kernel : Linux 2.6.32-5-686 #1 SMP Tue Jun 1 04:59:47 UTC 2010 i686
>> GNU/Linux
>>     - CPU : Intel(R) Core(TM)2 Duo CPU     E7500  @ 2.93GHz
>>     - RAM : 2 Go
>>
> PERFORMANCE STUFF DELETED FOR BREVITY
>
>
>> Do you think it's a 32bit/64bit difference ?
>>
> No, it's likely that your desktop has much faster CPU cores than your
> server, and it has drives that may or may not be obeying fsync
> commands.  Your server, OTOH, has more cores, so it's likely to do
> better under a real load.  And assuming it has more disks on a better
> controller it will also do better under heavier loads.
>
> So how are the disks setup anyway?
>
Thanks for your reply !

The server uses an HP Smart Array P410 with a RAID 5 array of SATA 133 disks.
My desktop only uses one SATA 133 disk.
I thought that my simple queries didn't use the disk, only memory.
I've launched a new pgbench run with many more clients and transactions:

Server :
     postgres$ pgbench -c 400 -t 100
     starting vacuum...end.
     transaction type: TPC-B (sort of)
     scaling factor: 1
     query mode: simple
     number of clients: 400
     number of transactions per client: 100
     number of transactions actually processed: 40000/40000
     tps = 115.054386 (including connections establishing)
     tps = 115.617186 (excluding connections establishing)

     real    5m47.706s
     user    0m27.054s
     sys     0m59.804s

Desktop :
     postgres$ time pgbench -c 400 -t 100
     starting vacuum...end.
     transaction type: TPC-B (sort of)
     scaling factor: 1
     query mode: simple
     number of clients: 400
     number of transactions per client: 100
     number of transactions actually processed: 40000/40000
     tps = 299.456785 (including connections establishing)
     tps = 300.590503 (excluding connections establishing)

     real    2m13.604s
     user    0m5.304s
     sys     0m13.469s





Re: Performance on new 64bit server compared to my 32bit desktop

From
Philippe Rimbault
Date:
On 19/08/2010 12:23, Philippe Rimbault wrote:
> On 19/08/2010 11:51, Scott Marlowe wrote:
>> On Thu, Aug 19, 2010 at 2:07 AM, Philippe Rimbault<primbault@edd.fr>
>> wrote:
>>> Hi,
>>>
>>> I'm having a strange performance result on a new database server
>>> compared to
>>> my simple desktop.
>>>
>>> The configuration of the new server :
>>>     - OS : GNU/Linux Debian Etch x86_64
>>>     - kernel : Linux 2.6.26-2-vserver-amd64 #1 SMP Sun Jun 20
>>> 20:40:33 UTC
>>> 2010 x86_64 GNU/Linux
>>>         (tests are on the "real server", not on a vserver)
>>>     - CPU : 2 x Six-Core AMD Opteron(tm) Processor 2427 @ 2.20GHz
>>>     - RAM : 32 Go
>>> The configuration of my desktop pc :
>>>     - OS : GNU/Linux Debian Testing i686
>>>     - kernel : Linux 2.6.32-5-686 #1 SMP Tue Jun 1 04:59:47 UTC 2010
>>> i686
>>> GNU/Linux
>>>     - CPU : Intel(R) Core(TM)2 Duo CPU     E7500  @ 2.93GHz
>>>     - RAM : 2 Go
>> PERFORMANCE STUFF DELETED FOR BREVITY
>>
>>> Do you think it's a 32bit/64bit difference ?
>> No, it's likely that your desktop has much faster CPU cores than your
>> server, and it has drives that may or may not be obeying fsync
>> commands.  Your server, OTOH, has more cores, so it's likely to do
>> better under a real load.  And assuming it has more disks on a better
>> controller it will also do better under heavier loads.
>>
>> So how are the disks setup anyway?
> Thanks for your reply !
>
> The server use a HP Smart Array P410 with a Raid 5 array on Sata 133
> disk.
> My desktop only use one Sata 133 disk.
> I was thinking that my simples queries didn't use disk but only memory.
> I've launch a new pgbench with much more client and transactions :
>
> Server :
>     postgres$ pgbench -c 400 -t 100
>     starting vacuum...end.
>     transaction type: TPC-B (sort of)
>     scaling factor: 1
>     query mode: simple
>     number of clients: 400
>     number of transactions per client: 100
>     number of transactions actually processed: 40000/40000
>     tps = 115.054386 (including connections establishing)
>     tps = 115.617186 (excluding connections establishing)
>
>     real    5m47.706s
>     user    0m27.054s
>     sys     0m59.804s
>
> Desktop :
>     postgres$ time pgbench -c 400 -t 100
>     starting vacuum...end.
>     transaction type: TPC-B (sort of)
>     scaling factor: 1
>     query mode: simple
>     number of clients: 400
>     number of transactions per client: 100
>     number of transactions actually processed: 40000/40000
>     tps = 299.456785 (including connections establishing)
>     tps = 300.590503 (excluding connections establishing)
>
>     real    2m13.604s
>     user    0m5.304s
>     sys     0m13.469s
>
>
>
>
>
I've re-initialized pgbench with -s 400 and now the server performs
(much) better than the desktop.
So ... my desktop CPU is faster when I only run small queries, but the
server handles heavier loads better.
I was just surprised by the difference on my small database.

Thx

Re: Performance on new 64bit server compared to my 32bit desktop

From
Scott Marlowe
Date:
On Thu, Aug 19, 2010 at 4:23 AM, Philippe Rimbault <primbault@edd.fr> wrote:
>> So how are the disks setup anyway?
>>
>
> Thanks for your reply !
>
> The server use a HP Smart Array P410 with a Raid 5 array on Sata 133 disk.

If you can change that to RAID-10 do so now.  RAID-5 is notoriously
slow for database use, unless you're only gonna do reporting type
queries with few updates.

> My desktop only use one Sata 133 disk.
> I was thinking that my simples queries didn't use disk but only memory.

No, but pgbench has to write to the disk.

> I've launch a new pgbench with much more client and transactions :
>
> Server :
>    postgres$ pgbench -c 400 -t 100

-c 400 is HUGE.  (And as you mentioned in your later email, you need
to initialize with -i -s 400 for -c 400 to make sense.)  Try values in
the 4 to 40 range and the server should REALLY outshine your desktop as
you pass 12 or 16 or so.
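
A minimal sketch of such a sweep, assuming pgbench is on the PATH and the target database has already been initialized at a suitable scale. The client counts, the 60-second duration, the function name and the DRY_RUN switch are illustrative choices, not anything from this thread. By default it only prints the commands it would run.

```shell
#!/bin/sh
# Sweep pgbench client counts to see where throughput peaks.
# DRY_RUN=1 (the default here) only prints the commands; set DRY_RUN=0 to run them.
DRY_RUN="${DRY_RUN:-1}"

sweep_clients() {
    # $1 = duration in seconds for each run, remaining args = client counts
    duration="$1"
    shift
    for clients in "$@"; do
        cmd="pgbench -c $clients -T $duration"
        if [ "$DRY_RUN" = "1" ]; then
            echo "$cmd"
        else
            echo "== $cmd =="
            # Keep only the throughput lines from pgbench's report.
            $cmd | grep '^tps'
        fi
    done
}

sweep_clients 60 4 8 12 16 24 40
```

Running it with DRY_RUN=0 on both machines gives one tps pair per client count, which makes the crossover point between desktop and server easy to spot.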

Re: Performance on new 64bit server compared to my 32bit desktop

From
Greg Smith
Date:
Philippe Rimbault wrote:
> I've run "time pgbench -c 50" :
>     server x64 :
>         starting vacuum...end.
>         transaction type: TPC-B (sort of)
>         scaling factor: 1
>         query mode: simple
>         number of clients: 50
>         number of transactions per client: 10
>         number of transactions actually processed: 500/500
>         tps = 523.034437 (including connections establishing)
>         tps = 663.511008 (excluding connections establishing)
>

As mentioned already, most of the difference you're seeing is simply
that your desktop system has faster individual processor cores in it, so
jobs where only a single core is being used are going to be faster on it.

The above isn't going to work very well either because the database
scale is too small, and you're not running the test for very long.  The
things the bigger server is better at, you're not testing.

Since your smaller system has 2GB of RAM and the larger one 32GB, try
this instead:

pgbench -i -s 2000
pgbench -c 24 -T 60 -S
pgbench -c 24 -T 300

That will create a much larger database, run some simple SELECT-only
tests on it, and then run a write intensive one.  Expect to see the
server system crush the results of the desktop here.  Note that this
will take quite a while to run--the pgbench initialization step in
particular is going to take a good fraction of an hour or more, and then
the actual tests will run for 6 minutes after that.  You can run more
tests after that without doing the initialization step again, but if you
run a lot of the write-heavy tests eventually performance will start to
degrade.
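
To see why a scale of 2000 separates the two machines: each unit of pgbench scale adds 100,000 rows to the accounts table. Assuming a rule of thumb of roughly 15 MB of database per scale unit (an assumption here, not a figure measured in this thread), the rough arithmetic is:

```shell
#!/bin/sh
# Rough sizing for a pgbench scale factor.
# ASSUMPTION: ~15 MB of database per scale unit (rule of thumb, not measured here).
MB_PER_SCALE=15

pgbench_size() {
    # $1 = scale factor; prints "<account rows> <approximate size in MB>"
    scale="$1"
    echo "$((scale * 100000)) $((scale * MB_PER_SCALE))"
}

pgbench_size 2000   # 200000000 rows, about 30000 MB
```

Under that assumption the database comes to about 30 GB: far more than the desktop's 2 GB of RAM, but close to what the 32 GB server can cache.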

--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
greg@2ndQuadrant.com   www.2ndQuadrant.us


Re: Performance on new 64bit server compared to my 32bit desktop

From
Greg Smith
Date:
Greg Smith wrote:
> Since your smaller system has 2GB of RAM and the larger one 32GB, try
> this instead:
>
> pgbench -i -s 2000
> pgbench -c 24 -T 60 -S
> pgbench -c 24 -T 300

Oh, and to at least give a somewhat more normal postgresql.conf I'd
recommend you at least make the following two changes before doing the
above:

shared_buffers=256MB
checkpoint_segments=32

Those are the two parameters the pgbench test is most sensitive to, so
setting to higher values will give more realistic results.

--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
greg@2ndQuadrant.com   www.2ndQuadrant.us


Re: Performance on new 64bit server compared to my 32bit desktop

From
Scott Carey
Date:
On Aug 19, 2010, at 11:25 AM, Greg Smith wrote:

> Philippe Rimbault wrote:
>> I've run "time pgbench -c 50" :
>>    server x64 :
>>        starting vacuum...end.
>>        transaction type: TPC-B (sort of)
>>        scaling factor: 1
>>        query mode: simple
>>        number of clients: 50
>>        number of transactions per client: 10
>>        number of transactions actually processed: 500/500
>>        tps = 523.034437 (including connections establishing)
>>        tps = 663.511008 (excluding connections establishing)
>>
>
> As mentioned already, most of the difference you're seeing is simply
> that your desktop system has faster individual processor cores in it, so
> jobs where only a single core are being used are going to be faster on it.
>

But the select count(*) query, cached in RAM, is 3x faster on one system than the other.  The CPUs aren't 3x different
performance-wise.  Something else may be wrong here.

An individual Core2 Duo 2.93GHz should be at most 50% faster than a 2.2GHz Opteron for such a query, unless there are
some compile options that are set wrong.  I would check the compile options.
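
The "3x" figure can be checked against the psql timings in the first message (psql printed them with a decimal comma). A quick sketch using only those reported numbers; the helper name is made up for illustration:

```shell
#!/bin/sh
# Ratio of server time to desktop time, from the psql timings quoted in the thread.
ratio() {
    # $1 = server ms, $2 = desktop ms; prints the ratio with two decimals
    LC_ALL=C awk -v s="$1" -v d="$2" 'BEGIN { printf "%.2f\n", s / d }'
}

ratio 14.412 4.845   # select count(*) from firm   -> 2.97
ratio 3.898 1.517    # select * from pg_settings   -> 2.57
```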



Re: Performance on new 64bit server compared to my 32bit desktop

From
Greg Smith
Date:
Scott Carey wrote:
> But the select count(*) query, cached in RAM is 3x faster in one system than the other.  The CPUs aren't 3x different
> performance-wise.  Something else may be wrong here.
>
> An individual Core2 Duo 2.93Ghz should be at most 50% faster than a 2.2Ghz Opteron for such a query.   Unless there
> are some compile options that are set wrong.   I would check the compile options.
>

Sure, it might be.  But I've seen RAM on an Intel chip like the E7500
here (DDR3-1066 or better, around 10GB/s possible) run almost 3X as fast
as what you'll find paired with an Opteron 2427 (DDR2-800, closer to
3.5GB/s).  Throw in the clock differences and there you go.

I've been wandering around for years warning that the older Opterons on
DDR2 running a single PostgreSQL process are dog slow compared to the
same thing on Intel.  So that alone might actually be enough to account
for the difference.  Ultimately the multi-processor stuff is what's more
important to most apps, though, which is why I was hinting to properly
run that instead.

--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
greg@2ndQuadrant.com   www.2ndQuadrant.us


Re: Performance on new 64bit server compared to my 32bit desktop

From
Jose Ildefonso Camargo Tolosa
Date:
Hi!

On Fri, Aug 27, 2010 at 12:55 PM, Greg Smith <greg@2ndquadrant.com> wrote:
> Scott Carey wrote:
>>
>> But the select count(*) query, cached in RAM is 3x faster in one system
>> than the other.  The CPUs aren't 3x different performance wise.  Something
>> else may be wrong here.
>>
>> An individual Core2 Duo 2.93Ghz should be at most 50% faster than a 2.2Ghz
>> Opteron for such a query.   Unless there are some compile options that are
>> set wrong.   I would check the compile options.
>>
>
> Sure, it might be.  But I've seen RAM on an Intel chip like the E7500 here
> (DDR3-1066 or better, around 10GB/s possible) run almost 3X as fast as what
> you'll find paired with an Opteron 2427 (DDR2-800, closer to 3.5GB/s).
>  Throw in the clock differences and there you go.

Precisely! CPU core clock is not all that matters, especially when it
comes to working with large datasets.  Core clock only makes a
difference with relatively small code (i.e., code that fits in the CPU
cache) working on a relatively small dataset (i.e., one that *also*
fits in the CPU cache): for example, a series calculation of pi or a
simple prime number generator.  When it comes to large amounts of data
or code, RAM starts to play a vital role, and not just "raw" RAM speed
but latency too (a combination of both).  Some people just go for the
"fastest" RAM around without paying attention to latency numbers; you
need the fastest RAM with the lowest latency.

Also, nowadays, Intel has better performance than AMD, at least when
comparing Athlon 64 vs Core2.  I'm still saving to get a Phenom II
system in order to benchmark them and see how it goes (does anyone
have one of these for testing?).

>
> I've been wandering around for years warning that the older Opterons on DDR2
> running a single PostgreSQL process are dog slow compared to the same thing
> on Intel.  So that alone might actually be enough to account for the
> difference.  Ultimately the multi-processor stuff is what's more important
> to most apps, though, which is why I was hinting to properly run that
> instead.
>
Re: Performance on new 64bit server compared to my 32bit desktop

From
Greg Smith
Date:
Jose Ildefonso Camargo Tolosa wrote:
> Also, nowadays, Intel has better performance than AMD, at least when
> comparing Athlon 64 vs Core2, I'm still saving to get a Phenom II
> system in order to benchmark them and see how it goes (does anyone
> have one of these for testing?).
>

Things even out again when you reach the large server line from AMD that
uses DDR3 RAM; they've finally solved this problem there.  Scott
Marlowe has been helping me out with some tests of a new system he's got
running the AMD Opteron 6172, using the STREAM memory benchmark.  Intro
to that and some sample numbers at
http://www.advancedclustering.com/company-blog/stream-benchmarking.html

He's been seeing >75GB/s of aggregate memory bandwidth out of that
monster--using gcc, so even at a disadvantage compared to the Intel one
used for that report.  If you're only using one or two cores, Intel still
seems to have a lead; I am still working out whether that's true in every
situation.

I haven't had a chance to test any of the Phenom II processors yet.  From
what I know of their design, I expect them to still have the same
fundamental design issues that kept all AMD processors from scaling very
well, memory speed wise, the last few years.  You might be able to dig a
system using one of them out of the list at
http://www.cs.virginia.edu/stream/peecee/Bandwidth.html ; I didn't
notice anything obvious that featured one.

--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
greg@2ndQuadrant.com   www.2ndQuadrant.us


Re: Performance on new 64bit server compared to my 32bit desktop

From
Greg Smith
Date:
Greg Smith wrote:
> He's been seeing >75GB/s of aggregate memory bandwidth out of that
> monster--using gcc, so even at a disadvantage compared to the Intel
> one used for that report.

On second read this was confusing.  The best STREAM results come from
using the Intel compiler on Linux.  The ones I've been doing and that
Scott has been running use regular gcc instead.  So when the new AMD
system is clearing 75GB/s in the little test set I'm trying to get
automated, that's actually a conservative figure, given that a compiler
swap is almost guaranteed to boost results a bit too.

--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
greg@2ndQuadrant.com   www.2ndQuadrant.us


Re: Performance on new 64bit server compared to my 32bit desktop

From
Yeb Havinga
Date:
Jose Ildefonso Camargo Tolosa wrote:
> Also, nowadays, Intel has better performance than AMD, at least when
> comparing Athlon 64 vs Core2, I'm still saving to get a Phenom II
> system in order to benchmark them and see how it goes (does anyone
> have one of these for testing?).
root@p:~/ff/www.cs.virginia.edu/stream/FTP/Code# cat /proc/cpuinfo
processor       : 0
vendor_id       : AuthenticAMD
cpu family      : 16
model           : 4
model name      : AMD Phenom(tm) II X4 940 Processor
stepping        : 2
cpu MHz         : 3000.000
cache size      : 512 KB
physical id     : 0
siblings        : 4
core id         : 0
cpu cores       : 4
apicid          : 0
initial apicid  : 0
fpu             : yes
fpu_exception   : yes
cpuid level     : 5
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm 3dnowext 3dnow constant_tsc rep_good nonstop_tsc extd_apicid pni monitor cx16 popcnt lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt
bogomips        : 6020.46
TLB size        : 1024 4K pages
clflush size    : 64
cache_alignment : 64
address sizes   : 48 bits physical, 48 bits virtual
power management: ts ttp tm stc 100mhzsteps hwpstate


stream compiled with -O3

root@p:~/ff/www.cs.virginia.edu/stream/FTP/Code# ./a.out
-------------------------------------------------------------
STREAM version $Revision: 5.9 $
-------------------------------------------------------------
This system uses 8 bytes per DOUBLE PRECISION word.
-------------------------------------------------------------
Array size = 2000000, Offset = 0
Total memory required = 45.8 MB.
Each test is run 10 times, but only
the *best* time for each is used.
-------------------------------------------------------------
Printing one line per active thread....
-------------------------------------------------------------
Your clock granularity/precision appears to be 1 microseconds.
Each test below will take on the order of 5031 microseconds.
   (= 5031 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
-------------------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
-------------------------------------------------------------
Function      Rate (MB/s)   Avg time     Min time     Max time
Copy:        5056.0434       0.0064       0.0063       0.0064
Scale:       4950.4916       0.0065       0.0065       0.0065
Add:         5322.0173       0.0091       0.0090       0.0091
Triad:       5395.1815       0.0089       0.0089       0.0089
-------------------------------------------------------------
Solution Validates
-------------------------------------------------------------

two parallel
root@p:~/ff/www.cs.virginia.edu/stream/FTP/Code# ./a.out & ./a.out

-------------------------------------------------------------
Function      Rate (MB/s)   Avg time     Min time     Max time
Copy:        2984.2741       0.0108       0.0107       0.0108
Scale:       2945.8261       0.0109       0.0109       0.0110
Add:         3282.4631       0.0147       0.0146       0.0149
Triad:       3321.2893       0.0146       0.0145       0.0148
-------------------------------------------------------------
Function      Rate (MB/s)   Avg time     Min time     Max time
Copy:        2981.4898       0.0108       0.0107       0.0108
Scale:       2943.3067       0.0109       0.0109       0.0109
Add:         3283.8552       0.0147       0.0146       0.0149
Triad:       3313.9634       0.0147       0.0145       0.0148


four parallel
root@p:~/ff/www.cs.virginia.edu/stream/FTP/Code# ./a.out & ./a.out & ./a.out & ./a.out

-------------------------------------------------------------
Function      Rate (MB/s)   Avg time     Min time     Max time
Copy:        1567.4880       0.0208       0.0204       0.0210
Scale:       1525.3401       0.0211       0.0210       0.0213
Add:         1739.7735       0.0279       0.0276       0.0282
Triad:       1763.4858       0.0274       0.0272       0.0276
-------------------------------------------------------------
Function      Rate (MB/s)   Avg time     Min time     Max time
Copy:        1559.0759       0.0208       0.0205       0.0210
Scale:       1536.2520       0.0211       0.0208       0.0212
Add:         1740.4503       0.0279       0.0276       0.0283
Triad:       1758.4951       0.0276       0.0273       0.0279
-------------------------------------------------------------
Function      Rate (MB/s)   Avg time     Min time     Max time
Copy:        1552.7271       0.0208       0.0206       0.0210
Scale:       1527.5275       0.0211       0.0209       0.0212
Add:         1737.9263       0.0279       0.0276       0.0282
Triad:       1757.3439       0.0276       0.0273       0.0278
-------------------------------------------------------------
Function      Rate (MB/s)   Avg time     Min time     Max time
Copy:        1515.5912       0.0213       0.0211       0.0214
Scale:       1544.7033       0.0210       0.0207       0.0212
Add:         1754.4495       0.0278       0.0274       0.0281
Triad:       1856.3659       0.0279       0.0259       0.0284
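
Summing the Triad rates of the concurrent runs gives the aggregate bandwidth at each level of parallelism. A small sketch using the figures above; the helper name is made up for illustration:

```shell
#!/bin/sh
# Sum per-process STREAM Triad rates (MB/s) to get the aggregate bandwidth.
sum_rates() {
    # Arguments: one Triad rate per concurrent process.
    LC_ALL=C awk 'BEGIN {
        total = 0
        for (i = 1; i < ARGC; i++) total += ARGV[i]
        printf "%.1f\n", total
    }' "$@"
}

sum_rates 5395.1815                                  # one process   -> 5395.2
sum_rates 3321.2893 3313.9634                        # two parallel  -> 6635.3
sum_rates 1763.4858 1758.4951 1757.3439 1856.3659    # four parallel -> 7135.7
```

On these numbers the aggregate only grows from about 5.4 GB/s to about 7.1 GB/s when going from one to four processes, so the bandwidth available to each process drops sharply.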



Re: Performance on new 64bit server compared to my 32bit desktop

From
Scott Carey
Date:
On Aug 27, 2010, at 10:25 AM, Greg Smith wrote:

> Scott Carey wrote:
>> But the select count(*) query, cached in RAM is 3x faster in one system than the other.  The CPUs aren't 3x
>> different performance wise.  Something else may be wrong here.
>>
>> An individual Core2 Duo 2.93Ghz should be at most 50% faster than a 2.2Ghz Opteron for such a query.   Unless there
>> are some compile options that are set wrong.   I would check the compile options.
>>
>
> Sure, it might be.  But I've seen RAM on an Intel chip like the E7500
> here (DDR3-1066 or better, around 10GB/s possible) run almost 3X as fast
> as what you'll find paired with an Opteron 2427 (DDR2-800, closer to
> 3.5GB/s).  Throw in the clock differences and there you go.

The 2427 should do 12.8 GB/sec theoretical peak (dual-channel 800MHz DDR2) per processor socket (so 2x that if
multithreaded and using 2 sockets).

A Nehalem will do ~2x that (triple channel, 1066Mhz) and is also significantly faster clock for clock.
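
These theoretical peaks follow from channels x bus width x transfer rate. A sketch of the arithmetic, assuming the usual 64-bit (8-byte) memory channel and treating the quoted "800Mhz" and "1066Mhz" figures as millions of transfers per second; the function name is hypothetical:

```shell
#!/bin/sh
# Theoretical peak memory bandwidth in MB/s: channels * 8 bytes * MT/s.
# ASSUMPTION: each memory channel is 64 bits (8 bytes) wide.
peak_mb_s() {
    # $1 = number of channels, $2 = transfer rate in MT/s
    echo "$(( $1 * 8 * $2 ))"
}

peak_mb_s 2 800    # dual channel at 800:    12800 MB/s = 12.8 GB/s
peak_mb_s 3 1066   # triple channel at 1066: 25584 MB/s, about 2x
```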

But a Core2 based Xeon on Socket 775 at 1066MHz FSB?  Nah... its theoretical peak bandwidth is 33% more and real world
no more than 40% more.

Latency and other factors might add up too.  3x just does not make sense here.

Nehalem would be another story, but Core2 was only slightly faster than Opterons of this generation and did not scale
as well with more sockets.


>
> I've been wandering around for years warning that the older Opterons on
> DDR2 running a single PostgreSQL process are dog slow compared to the
> same thing on Intel.

This isn't an older Opteron; it's the 6-core, 6MB L3 cache "Istanbul".  It's not the newer stuff either.  The E7500 is
basically the end-of-line Core2 before Nehalem-based processors took over.

> So that alone might actually be enough to account
> for the difference.  Ultimately the multi-processor stuff is what's more
> important to most apps, though, which is why I was hinting to properly
> run that instead.
>
>


Re: Performance on new 64bit server compared to my 32bit desktop

From
Scott Marlowe
Date:
On Mon, Aug 30, 2010 at 1:58 AM, Yeb Havinga <yebhavinga@gmail.com> wrote:
> four parallel
> root@p:~/ff/www.cs.virginia.edu/stream/FTP/Code# ./a.out & ./a.out & ./a.out
> & ./a.out

You know you can just do "stream 4" to get 4 parallel streams right?

Re: Performance on new 64bit server compared to my 32bit desktop

From
Yeb Havinga
Date:
Scott Marlowe wrote:
> On Mon, Aug 30, 2010 at 1:58 AM, Yeb Havinga <yebhavinga@gmail.com> wrote:
>
>> four parallel
>> root@p:~/ff/www.cs.virginia.edu/stream/FTP/Code# ./a.out & ./a.out & ./a.out
>> & ./a.out
>>
>
> You know you can just do "stream 4" to get 4 parallel streams right?
>
Which version is that? The stream.c source contains no argc/argv usage,
though Code/Versions/Experimental has a script called Parallel_jobs that
spawns n processes.

-- Yeb


Re: Performance on new 64bit server compared to my 32bit desktop

From
Greg Smith
Date:
Scott Carey wrote:
> The 2427 should do 12.8 GB/sec theoretical peak (dual channel 800Mhz DDR2) per processor socket (so 2x that if
> multithreaded and 2 Sockets).
>
> A Nehalem will do ~2x that (triple channel, 1066Mhz) and is also significantly faster clock for clock.
>
> But a Core2 based Xeon on Socket 775 at 1066Mhz FSB?  Nah... its theoretical peak bandwidth is 33% more and real
> world no more than 40% more.
> The E7500 is basically the end of line Core2 before Nehalem based processors took over.
>

Ah...from its use of DDR3, I thought that the E7500 was a low-end
Nehalem.  Now I see that you're right, that it's actually a high-end
Wolfdale.  So that does significantly decrease the margin between the
two I'd expect.  I agree with your figures, and that this may be back to
looking a little fishy.

The other thing I normally check is whether one of the two systems has
more aggressive power management turned on.  The easiest way to tell on
Linux is to look at /proc/cpuinfo and see if the displayed processor
speed is much lower than the actual one.  Many systems default to
something pretty conservative here, and don't return up to full speed
nearly fast enough for some benchmark tests.
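
A sketch of that check, assuming a Linux system whose cpufreq driver exposes /sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_max_freq (in kHz). The 90% threshold and the function name are arbitrary choices for illustration.

```shell
#!/bin/sh
# Compare the clock reported in a cpuinfo file with the CPU's maximum clock.
# Prints "throttled" if the first core runs below 90% of the maximum.
check_clock() {
    # $1 = path to a cpuinfo-style file, $2 = maximum frequency in MHz
    LC_ALL=C awk -F: -v max="$2" '
        /^cpu MHz/ {
            cur = $2 + 0
            state = (cur < 0.9 * max) ? "throttled" : "full speed"
            print state
            exit
        }' "$1"
}

MAX_FILE=/sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_max_freq
if [ -r "$MAX_FILE" ] && [ -r /proc/cpuinfo ]; then
    # cpuinfo_max_freq is in kHz; convert it to MHz.
    check_clock /proc/cpuinfo "$(( $(cat "$MAX_FILE") / 1000 ))"
fi
```

Running it once while idle and once during a benchmark shows whether the clock actually ramps up under load.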


> This isn't an older Opteron, its 6 core, 6MB L3 cache "Istanbul".  Its not the newer stuff either.
>

Everything before Magny Cours is now an older Opteron from my
perspective.  They've caught up with Intel again with the release of
those.  Everything from AMD that's come out since Intel Nehalem
products started shipping in quantity (early 2009) has been marginal
until the new M-C, and their early quad-core stuff was pretty
terrible too.  So I'm lumping AMD's Budapest, Shanghai, and
Istanbul product lines all into a giant "slow compared to Intel during
the same period" bin in my head.  Fine for databases with lots of
clients, not so good at executing single queries quickly.

--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
greg@2ndQuadrant.com   www.2ndQuadrant.us


Re: Performance on new 64bit server compared to my 32bit desktop

From
Greg Smith
Date:
Yeb Havinga wrote:
> model name      : AMD Phenom(tm) II X4 940 Processor @ 3.00GHz
> cpu cores         : 4
> stream compiled with -O3
> Function      Rate (MB/s)   Avg time     Min time     Max time
> Triad:       5395.1815       0.0089       0.0089       0.0089

For comparison's sake, here's an only moderately expensive desktop Intel CPU
using DDR3-1600:

model name    : Intel(R) Core(TM) i7 CPU         860  @ 2.80GHz
cpu cores    : 4
siblings    : 8
Number of Threads requested = 4
Function      Rate (MB/s)   Avg time     Min time     Max time
Triad:      13666.0986       0.0108       0.0107       0.0108

That's 8 hyper-threaded logical cores on 4 physical ones.  They work well
for improving CPU-heavy tasks, but memory throughput maxes out at 4 threads
total.

I'm not sure if Yeb's stream was compiled to use OpenMP correctly though,
because I'm not seeing "Number of Threads" in his results.  Here's what
works for me:

  gcc -O3 -fopenmp stream.c -o stream

And then you can set:

export OMP_NUM_THREADS=4

Or whatever you want in order to control the number of threads it uses
inside.  Here's the way scaling works on my processor:

Number of Threads requested = 1
Function      Rate (MB/s)   Avg time     Min time     Max time
Triad:       9806.2648       0.0150       0.0149       0.0151

Number of Threads requested = 2
Function      Rate (MB/s)   Avg time     Min time     Max time
Triad:      12495.2113       0.0117       0.0117       0.0118

Number of Threads requested = 3
Function      Rate (MB/s)   Avg time     Min time     Max time
Triad:      13388.7187       0.0111       0.0109       0.0126

Number of Threads requested = 4
Function      Rate (MB/s)   Avg time     Min time     Max time
Triad:      13695.6611       0.0107       0.0107       0.0108

Number of Threads requested = 5
Function      Rate (MB/s)   Avg time     Min time     Max time
Triad:      12651.7200       0.0116       0.0116       0.0116

Number of Threads requested = 6
Function      Rate (MB/s)   Avg time     Min time     Max time
Triad:      12804.7192       0.0115       0.0114       0.0117

Number of Threads requested = 7
Function      Rate (MB/s)   Avg time     Min time     Max time
Triad:      12670.2525       0.0116       0.0116       0.0117

Number of Threads requested = 8
Function      Rate (MB/s)   Avg time     Min time     Max time
Triad:      12468.5739       0.0119       0.0117       0.0131
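
A sweep like the one above can be scripted.  The Python sketch below
assumes a ./stream binary built with -fopenmp as shown earlier, and that
its report contains a "Triad:" line in the format printed here.  The
helper names are invented for this example:

```python
import os
import re
import subprocess

TRIAD = re.compile(r"^Triad:\s+([\d.]+)", re.MULTILINE)


def triad_rate(stream_output: str) -> float:
    """Extract the Triad rate in MB/s from STREAM's report."""
    match = TRIAD.search(stream_output)
    if match is None:
        raise ValueError("no Triad line in STREAM output")
    return float(match.group(1))


def sweep(binary: str = "./stream", max_threads: int = 8) -> dict:
    """Run the binary once per thread count and collect the Triad rates."""
    rates = {}
    for n in range(1, max_threads + 1):
        # OMP_NUM_THREADS is read by the OpenMP runtime at program start.
        env = dict(os.environ, OMP_NUM_THREADS=str(n))
        result = subprocess.run([binary], env=env, capture_output=True,
                                text=True, check=True)
        rates[n] = triad_rate(result.stdout)
    return rates


# Parsing one of the result lines shown above:
print(triad_rate("Triad:      13695.6611       0.0107       0.0107       0.0108"))
```

Calling sweep() returns a dict mapping thread count to MB/s, which is
easy to paste into a spreadsheet or plot.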

--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
greg@2ndQuadrant.com   www.2ndQuadrant.us


Re: Performance on new 64bit server compared to my 32bit desktop

From
Jose Ildefonso Camargo Tolosa
Date:
Hi!

Thank you all for this great amount of information!

What memory/motherboard (i.e., chipset) is installed on the Phenom II one?

It looks like it peaks at ~6.2GB/s with 4 threads.

Also, what kernel is on it? (uname -a would be nice).

Now, this looks like sustained memory speed. What about random memory
access (where latency plays an important role):
http://icl.cs.utk.edu/projectsfiles/hpcc/RandomAccess/

I don't have any of these systems to test, but it would be interesting
to get the random access benchmarks too. What do you think? Will the
result be the same?

Once again, thanks!

Sincerely,

Ildefonso Camargo

Re: Performance on new 64bit server compared to my 32bit desktop

From
Clemens Eisserer
Date:
Hi,

>> This isn't an older Opteron, its 6 core, 6MB L3 cache "Istanbul".  Its not
>> the newer stuff either.
>
> Everything before Magny Cours is now an older Opteron from my perspective.

The 6-cores are identical to Magny Cours (except that Magny Cours has
two of those beasts in one package).

- Clemens

Re: Performance on new 64bit server compared to my 32bit desktop

From
Greg Smith
Date:
Clemens Eisserer wrote:
> Hi,
>
>>> This isn't an older Opteron, its 6 core, 6MB L3 cache "Istanbul".  Its not
>>> the newer stuff either.
>>
>> Everything before Magny Cours is now an older Opteron from my perspective.
>
> The 6-cores are identical to Magny Cours (except that Magny Cours has
> two of those beasts in one package).

In some ways, but not in regards to memory issues.  http://www.anandtech.com/show/2978/amd-s-12-core-magny-cours-opteron-6174-vs-intel-s-6-core-xeon/2 has a good intro.  While the inside is like two 6-core models stuck together, the external memory interface was completely reworked.

Original report here involved Opteron 2427, correctly identified as being from the 6-core "Istanbul" architecture.  All Istanbul processors use DDR2 and are quite slow at memory access compared to similar Intel Nehalem systems.  The "Magny-Cours" architecture is available in 8 and 12 core variants, and the memory controller has been completely redesigned to take advantage of many banks of DDR3 at the same time; it is far faster than two of the older 6 cores working together.

http://en.wikipedia.org/wiki/List_of_AMD_Opteron_microprocessors has a good summary of the models; it's confusing.  A quick chart comparing the three generations demonstrates what I just said, using the same STREAM benchmark that a few results posted here have already used:

http://www.anandtech.com/show/2978/amd-s-12-core-magny-cours-opteron-6174-vs-intel-s-6-core-xeon/5

Istanbul Opteron 2435 in this case, 21GB/s.  The two Nehalem Intel Xeons, >31GB/s.  New Magny, 49GB/s.

-- 
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
greg@2ndQuadrant.com   www.2ndQuadrant.us

Re: Performance on new 64bit server compared to my 32bit desktop

From
Jose Ildefonso Camargo Tolosa
Date:
Hi!

Thanks for the review link!

Ildefonso.

On Mon, Aug 30, 2010 at 6:01 PM, Greg Smith <greg@2ndquadrant.com> wrote:
> Clemens Eisserer wrote:
>
> Hi,
>
>
>
> This isn't an older Opteron, its 6 core, 6MB L3 cache "Istanbul".  Its not
> the newer stuff either.
>
>
> Everything before Magny Cours is now an older Opteron from my perspective.
>
>
> The 6-cores are identical to Magny Cours (except that Magny Cours has
> two of those beasts in one package).
>
>
> In some ways, but not in regards to memory issues.
> http://www.anandtech.com/show/2978/amd-s-12-core-magny-cours-opteron-6174-vs-intel-s-6-core-xeon/2
> has a good intro.  While the inside is like two 6-core models stuck
> together, the external memory interface was completely reworked.
>
> Original report here involved Opteron 2427, correctly identified as being
> from the 6-core "Istanbul" architecture.  All Istanbul processors use DDR2
> and are quite slow at memory access compared to similar Intel Nehalem
> systems.  The "Magny-Cours" architecture is available in 8 and 12 core
> variants, and the memory controller has been completely redesigned to take
> advantage of many banks of DDR3 at the same time; it is far faster than two
> of the older 6 cores working together.
>
> http://en.wikipedia.org/wiki/List_of_AMD_Opteron_microprocessors has a good
> summary of the models; it's confusing.  Quick chart showing the three
> generations compared demonstrates what I just said above using the same
> STREAM benchmarking that a few results have popped out here using already:
>
> http://www.anandtech.com/show/2978/amd-s-12-core-magny-cours-opteron-6174-vs-intel-s-6-core-xeon/5
>
> Istanbul Opteron 2435 in this case, 21GB/s.  The two Nehalem Intel Xeons,
> >31GB/s.  New Magny, 49GB/s.
>
> --
> Greg Smith  2ndQuadrant US  Baltimore, MD
> PostgreSQL Training, Services and Support
> greg@2ndQuadrant.com   www.2ndQuadrant.us
>

Re: Performance on new 64bit server compared to my 32bit desktop

From
Yeb Havinga
Date:
Greg Smith wrote:
> Yeb Havinga wrote:
>> model name      : AMD Phenom(tm) II X4 940 Processor @ 3.00GHz
>> cpu cores         : 4
>> stream compiled with -O3
>> Function      Rate (MB/s)   Avg time     Min time     Max time
>> Triad:       5395.1815       0.0089       0.0089       0.0089
> I'm not sure if Yeb's stream was compiled to use OpenMP correctly though,
> because I'm not seeing "Number of Threads" in his results.  Here's
> what works for me:
>
>  gcc -O3 -fopenmp stream.c -o stream
>
> And then you can set:
>
> export OMP_NUM_THREADS=4
Then I get the following. The rather weird dip at 5 threads is
consistent over multiple tries:

Number of Threads requested = 1
Function      Rate (MB/s)   Avg time     Min time     Max time
Triad:       5378.7495       0.0089       0.0089       0.0090

Number of Threads requested = 2
Function      Rate (MB/s)   Avg time     Min time     Max time
Triad:       6596.1140       0.0073       0.0073       0.0073

Number of Threads requested = 3
Function      Rate (MB/s)   Avg time     Min time     Max time
Triad:       7033.9806       0.0069       0.0068       0.0069

Number of Threads requested = 4
Function      Rate (MB/s)   Avg time     Min time     Max time
Triad:       7007.2950       0.0069       0.0069       0.0069

Number of Threads requested = 5
Function      Rate (MB/s)   Avg time     Min time     Max time
Triad:       6553.8133       0.0074       0.0073       0.0074

Number of Threads requested = 6
Function      Rate (MB/s)   Avg time     Min time     Max time
Triad:       6803.6427       0.0071       0.0071       0.0071

Number of Threads requested = 7
Function      Rate (MB/s)   Avg time     Min time     Max time
Triad:       6895.6909       0.0070       0.0070       0.0071

Number of Threads requested = 8
Function      Rate (MB/s)   Avg time     Min time     Max time
Triad:       6931.3018       0.0069       0.0069       0.0070

Other info: DDR2 800MHz ECC memory
MB: 790FX chipset (Asus m4a78-e)
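
To put a number on that dip, the Triad rates can be compared against the
best one.  A small Python sketch using only the figures posted in this
message:

```python
# Triad rates in MB/s, copied from the results above (Phenom II X4 940).
triad = {1: 5378.7495, 2: 6596.1140, 3: 7033.9806, 4: 7007.2950,
         5: 6553.8133, 6: 6803.6427, 7: 6895.6909, 8: 6931.3018}

best_threads = max(triad, key=triad.get)
best = triad[best_threads]

for n, rate in sorted(triad.items()):
    # Negative percentages show how far each run falls short of the best one.
    print(f"{n} threads: {rate:9.1f} MB/s  {100 * (rate / best - 1):+6.1f}% vs best")
```

On these figures the 5-thread run comes out roughly 7% below the 3-thread
peak, and the runs with 6 to 8 threads recover most of that.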

regards,
Yeb Havinga


Re: Performance on new 64bit server compared to my 32bit desktop

From
Greg Smith
Date:
Yeb Havinga wrote:
> The rather weird dip at 5 threads is consistent over multiple tries

I've seen that twice on 4 core systems now.  The spot where there's just
one more thread than cores seems to be the worst case for cache
thrashing on a lot of these servers.

How much total RAM is in this server?  Are all the slots filled?  Just
filling in a spreadsheet I have here with sample configs of various
hardware.

Yeb's results look right to me now.  That's what an AMD Phenom II X4 940
@ 3.00GHz should look like.  It's a little faster, memory-wise, than my
older Intel Q6600 @ 2.4GHz.  So they've finally caught up with that
generation of Intel's stuff.  But my current desktop quad-core i860 with
hyperthreading is nearly twice as fast in terms of memory access at
every thread size.  That's why I own one of them instead of a Phenom II X4.

--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
greg@2ndQuadrant.com   www.2ndQuadrant.us


Re: Performance on new 64bit server compared to my 32bit desktop

From
Jose Ildefonso Camargo Tolosa
Date:
Hi!

On Tue, Aug 31, 2010 at 8:11 AM, Yeb Havinga <yebhavinga@gmail.com> wrote:
> Greg Smith wrote:
>>
>> Yeb Havinga wrote:
>>>
>>> model name      : AMD Phenom(tm) II X4 940 Processor @ 3.00GHz
>>> cpu cores         : 4
>>> stream compiled with -O3
>>> Function      Rate (MB/s)   Avg time     Min time     Max time
>>> Triad:       5395.1815       0.0089       0.0089       0.0089
>>
>> I'm not sure if Yeb's stream was compiled to use OpenMP correctly though,
>> because I'm not seeing "Number of Threads" in his results.  Here's what
>> works for me:
>>
>>  gcc -O3 -fopenmp stream.c -o stream
>>
>> And then you can set:
>>
>> export OMP_NUM_THREADS=4
>
> Then I get the following. The rather weird dip at 5 threads is consistent
> over multiple tries:
>
> Number of Threads requested = 1
> Function      Rate (MB/s)   Avg time     Min time     Max time
> Triad:       5378.7495       0.0089       0.0089       0.0090
>
> Number of Threads requested = 2
> Function      Rate (MB/s)   Avg time     Min time     Max time
> Triad:       6596.1140       0.0073       0.0073       0.0073
>
> Number of Threads requested = 3
> Function      Rate (MB/s)   Avg time     Min time     Max time
> Triad:       7033.9806       0.0069       0.0068       0.0069
>
> Number of Threads requested = 4
> Function      Rate (MB/s)   Avg time     Min time     Max time
> Triad:       7007.2950       0.0069       0.0069       0.0069
>
> Number of Threads requested = 5
> Function      Rate (MB/s)   Avg time     Min time     Max time
> Triad:       6553.8133       0.0074       0.0073       0.0074
>
> Number of Threads requested = 6
> Function      Rate (MB/s)   Avg time     Min time     Max time
> Triad:       6803.6427       0.0071       0.0071       0.0071
>
> Number of Threads requested = 7
> Function      Rate (MB/s)   Avg time     Min time     Max time
> Triad:       6895.6909       0.0070       0.0070       0.0071
>
> Number of Threads requested = 8
> Function      Rate (MB/s)   Avg time     Min time     Max time
> Triad:       6931.3018       0.0069       0.0069       0.0070
>
> Other info: DDR2 800MHz ECC memory

Ok, this could explain the huge difference.  I was planning on getting a
GigaByte GA-890GPA-UD3H, with a Phenom II X6 and this RAM: Crucial
CT2KIT25664BA1339, Crucial BL2KIT25664FN1608, or something better I
find when I get enough money (depending on my budget at the moment).

> MB: 790FX chipset (Asus m4a78-e)
>
> regards,
> Yeb Havinga
>
>

Thanks for the extra info!

Ildefonso.

Re: Performance on new 64bit server compared to my 32bit desktop

From
Yeb Havinga
Date:
Jose Ildefonso Camargo Tolosa wrote:
> Ok, this could explain the huge difference.  I was planning on getting a
> GigaByte GA-890GPA-UD3H, with a Phenom II X6 and this RAM: Crucial
> CT2KIT25664BA1339, Crucial BL2KIT25664FN1608, or something better I
> find when I get enough money (depending on my budget at the moment).
>
Why not pair an 8-core Magny-Cours ($280 at Newegg
http://www.newegg.com/Product/Product.aspx?Item=N82E16819105266) with a
Supermicro ATX board
http://www.supermicro.com/Aplus/motherboard/Opteron6100/SR56x0/H8SGL-F.cfm
($264 at Newegg
http://www.newegg.com/Product/Product.aspx?Item=N82E16813182230&Tpk=H8SGL-F)
and some memory?

regards,
Yeb Havinga


Re: Performance on new 64bit server compared to my 32bit desktop

From
Jose Ildefonso Camargo Tolosa
Date:
Hi!

On Tue, Aug 31, 2010 at 11:13 AM, Greg Smith <greg@2ndquadrant.com> wrote:
> Yeb Havinga wrote:
>>
>> The rather weird dip at 5 threads is consistent over multiple tries
>
> I've seen that twice on 4 core systems now.  The spot where there's just one
> more thread than cores seems to be the worst case for cache thrashing on a
> lot of these servers.
>
> How much total RAM is in this server?  Are all the slots filled?  Just
> filling in a spreadsheet I have here with sample configs of various
> hardware.
>
> Yeb's results look right to me now.  That's what an AMD Phenom II X4 940 @
> 3.00GHz should look like.  It's a little faster, memory-wise, than my older
> Intel Q6600 @ 2.4GHz.  So they've finally caught up with that generation of
> Intel's stuff.  But my current desktop quad-core i860 with hyperthreading is
> nearly twice as fast in terms of memory access at every thread size.  That's
> why I own one of them instead of a Phenom II X4.

your i860? http://en.wikipedia.org/wiki/Intel_i860  wow!. :D

Now, seriously: what memory (brand/model) does the Q6600 and your
newer desktop have?

I'm just too curious: the last time I was able to run benchmarks myself
was with a Core 2 Duo and an Athlon 64 X2. Back then, the Core 2 Duo beat
the Athlon at almost everything.

Nowadays, it looks like AMD is playing the "more cores for the money"
game, but I think that sooner or later they will catch up again, and
when that happens Intel will just get another ET chip and put it on the
market, and so on! :D

This is a game where the winners are: us!

Re: Performance on new 64bit server compared to my 32bit desktop

From
Scott Marlowe
Date:
Note that in that graph, the odd dips are happening every 8 cores on a
system with four 12-core processors.  I don't know why; I would expect it
to be every 6 or something.

Re: Performance on new 64bit server compared to my 32bit desktop

From
Scott Marlowe
Date:
And, I have zone reclaim set to off because it makes the linux kernel
on large cpu machines make pathologically unsound decisions during
large file transfers.
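
The setting referred to here is presumably the vm.zone_reclaim_mode
sysctl, which Linux exposes as /proc/sys/vm/zone_reclaim_mode (0 means
off).  A short Python sketch for checking it; the function name is
made up, and the file will be missing on non-Linux systems:

```python
from pathlib import Path

ZONE_RECLAIM = Path("/proc/sys/vm/zone_reclaim_mode")


def zone_reclaim_enabled(path: Path = ZONE_RECLAIM):
    """Return True or False for the current setting, or None if it cannot be read."""
    try:
        # The value is a bitmask; any non-zero value enables some form of reclaim.
        return int(path.read_text().strip()) != 0
    except OSError:
        return None


labels = {True: "on", False: "off", None: "not available"}
print("zone reclaim:", labels[zone_reclaim_enabled()])
```

Turning it off is done as root with "sysctl -w vm.zone_reclaim_mode=0".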

Re: Performance on new 64bit server compared to my 32bit desktop

From
Greg Smith
Date:
Jose Ildefonso Camargo Tolosa wrote:
> your i860? http://en.wikipedia.org/wiki/Intel_i860  wow!. :D
>

That's supposed to be i7-860:
http://en.wikipedia.org/wiki/List_of_Intel_Core_i7_microprocessors

It was a whole $199, so not an expensive processor.

> Now, seriously: what memory (brand/model) does the Q6600 and your
> newer desktop have?
>

Q6600 is running Corsair DDR2-800 (5-5-5-18):
http://www.newegg.com/Product/Product.aspx?Item=N82E16820145176

i7-860 has Corsair DDR3-1600 C8 (8-8-8-24):
http://www.newegg.com/Product/Product.aspx?Item=N82E16820145265

Both systems have 4 2GB modules in them for 8GB total.

I've been happy with both the performance of the Corsair stuff and
how their heat spreader design keeps my grubby fingers off the sensitive
parts of the chips.  This is all desktop memory though; the registered
and ECC stuff for servers tends to be a bit slower, but for good reasons.

> I'm just too curious: the last time I was able to run benchmarks myself
> was with a Core 2 Duo and an Athlon 64 X2. Back then, the Core 2 Duo beat
> the Athlon at almost everything.
>

Yes.  The point I've made a couple of times here already is that Intel
pulled ahead around the Core 2 time, and AMD has been anywhere from a
little to way behind ever since.  And in the last 18 months that's
mainly been related to the memory controller design, not the CPUs
themselves.  That held until these new Magny Cours designs, where AMD
finally caught back up, particularly on big servers with lots of banks
of RAM.

--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
greg@2ndQuadrant.com   www.2ndQuadrant.us


Re: Performance on new 64bit server compared to my 32bit desktop

From
Yeb Havinga
Date:
Scott Marlowe wrote:
> On Tue, Aug 31, 2010 at 6:41 AM, Yeb Havinga <yebhavinga@gmail.com> wrote:
>
>>> export OMP_NUM_THREADS=4
>>>
>> Then I get the following. The rather weird dip at 5 threads is consistent
>> over multiple tries:
>>
>>
>
> I get similar dips on my server.  Especially as you make the stream
> test write a large enough chunk of data to outrun its caches.
>
> See attached png.
>
Interesting graph, especially since the overall impression is a roughly
linear increase in memory bandwidth as more cores become active.

Just curious, what is the 8-core cpu?

-- Yeb


Re: Performance on new 64bit server compared to my 32bit desktop

From
Scott Marlowe
Date:
On Tue, Aug 31, 2010 at 12:55 PM, Yeb Havinga <yebhavinga@gmail.com> wrote:
> Scott Marlowe wrote:
>>
>> On Tue, Aug 31, 2010 at 6:41 AM, Yeb Havinga <yebhavinga@gmail.com> wrote:
>>
>>>>
>>>> export OMP_NUM_THREADS=4
>>>>
>>>
>>> Then I get the following. The rather weird dip at 5 threads is consistent
>>> over multiple tries:
>>>
>>>
>>
>> I get similar dips on my server.  Especially as you make the stream
>> test write a large enough chunk of data to outrun its caches.
>>
>> See attached png.
>>
>
> Interesting graph, especially since the overall feeling is a linear like
> increase in memory bandwidth when more cores are active.
>
> Just curious, what is the 8-core cpu?

8 core = dual 2352 cpus (2x4) 2.1 GHz
12 core = dual 2427 cpus (2x6) 2.2 GHz
48 core = quad 6127 cpus (4x12) 2.1 GHz

--
To understand recursion, one must first understand recursion.