Thread: CPUs for new databases

CPUs for new databases

From
Christian Elmerot
Date:
Hello,

What is the general view of CPUs nowadays when it comes to
PostgreSQL performance? Which CPU is the better choice with regard to
RAM access times, stream speed, cache synchronization, etc.? Which is the
better CPU given the limitation of using AMD64 (x86-64)?

We're getting ready to replace our (now) aging db servers with
brand-new ones with a higher core count. The old ones are 4-socket dual-core
Opteron 8218s with 48GB RAM. Right now the disk subsystem is not the
limiting factor, so we're aiming for a higher core count as well as
faster and more RAM. We're also moving into the territory of version 9.0
with streaming replication to be able to offload at least a part of the
read-only queries to the slave database. The connection count on the
database usually lies in the region of ~2500 connections and the
database is small enough that it can be kept entirely in RAM (dump is
about 2.5GB).

Regards,
Christian Elmerot

Re: CPUs for new databases

From
"Kevin Grittner"
Date:
Christian Elmerot <ce@one.com> wrote:

> What is the general view of performance CPU's nowadays when it
> comes to PostgreSQL performance? Which CPU is the better choice,
> in regards to RAM access-times, stream speed, cache
> synchronization etc. Which is the better CPU given the limitation
> of using AMD64 (x86-64)?

You might want to review recent posts by Greg Smith on this.  One
such thread starts here:

http://archives.postgresql.org/pgsql-performance/2010-09/msg00120.php

> We're getting ready to replace our (now) aging db servers with
> some brand new with higher core count. The old ones are 4-socket
> dual-core Opteron 8218's with 48GB RAM. Right now the disk-subsystem
> is not the limiting factor so we're aiming for higher core-count
> and as well as faster and more RAM. We're also moving into the
> territory of version 9.0 with streaming replication to be able to
> offload at least a part of the read-only queries to the slave
> database. The connection count on the database usually lies in the
> region of ~2500 connections and the database is small enough that
> it can be kept entirely in RAM (dump is about 2,5GB).

You really should try connection pooling.  Even though many people
find it counterintuitive, it is likely to improve both throughput
and response time significantly.  See any of the many previous
threads on the topic for reasons.
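
To make the idea concrete, here is a minimal sketch (not PostgreSQL-specific code) of what a pool does: a small, fixed number of server connections is shared by a much larger number of clients, which queue for a free one. `FakeConnection`, `Pool` and `simulate` are hypothetical names for illustration; a real deployment would normally use an external pooler such as pgbouncer or pgpool-II rather than hand-rolled code.

```python
import queue
import threading


class FakeConnection:
    """Stand-in for a real database connection (hypothetical)."""

    def __init__(self, ident):
        self.ident = ident
        self.queries_run = 0

    def execute(self, sql):
        self.queries_run += 1
        return f"conn {self.ident} ran: {sql}"


class Pool:
    """Fixed-size pool: callers block until a connection is free."""

    def __init__(self, size):
        self._free = queue.Queue()
        self.connections = [FakeConnection(i) for i in range(size)]
        for conn in self.connections:
            self._free.put(conn)

    def run(self, sql):
        conn = self._free.get()      # wait for a free connection
        try:
            return conn.execute(sql)
        finally:
            self._free.put(conn)     # always hand it back


def simulate(clients=2500, pool_size=32):
    """Run one query per client through a small shared pool."""
    pool = Pool(pool_size)
    threads = [threading.Thread(target=pool.run, args=("SELECT 1",))
               for _ in range(clients)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return pool


if __name__ == "__main__":
    pool = simulate()
    print(len(pool.connections), "server connections served",
          sum(c.queries_run for c in pool.connections), "client queries")
```

The server only ever sees `pool_size` backends, no matter how many clients are attached, which is the property that tends to help when most of the ~2500 connections are idle at any moment.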

-Kevin

Re: CPUs for new databases

From
Scott Marlowe
Date:
On Tue, Oct 26, 2010 at 6:55 AM, Christian Elmerot <ce@one.com> wrote:
> Hello,
>
> What is the general view of performance CPU's nowadays when it comes to
> PostgreSQL performance? Which CPU is the better choice, in regards to RAM
> access-times, stream speed, cache synchronization etc. Which is the better
> CPU given the limitation of using AMD64 (x86-64)?

For faster but fewer individual cores the Intels are pretty good.  For
way more cores, each being pretty fast and having enough memory
bandwidth to use all those cores, the AMDs are very impressive.  The
Magny Cours AMDs are probably the best 4 socket cpus made.

> We're getting ready to replace our (now) aging db servers with some brand
> new with higher core count. The old ones are 4-socket dual-core Opteron
> 8218's with 48GB RAM.

A single AMD 12 core Magny Cours or Intel Nehalem 8 core cpu would be
twice as fast or more than the old machine.

> The connection count on the database usually lies in
> the region of ~2500 connections and the database is small enough that it can
> be kept entirely in RAM (dump is about 2,5GB).

As another poster mentioned, you should really look at connection pooling.

Re: CPUs for new databases

From
Christian Elmerot
Date:
On 2010-10-26 16:27, Kevin Grittner wrote:
> Christian Elmerot<ce@one.com>  wrote:
>
>> What is the general view of performance CPU's nowadays when it
>> comes to PostgreSQL performance? Which CPU is the better choice,
>> in regards to RAM access-times, stream speed, cache
>> synchronization etc. Which is the better CPU given the limitation
>> of using AMD64 (x86-64)?
>
> You might want to review recent posts by Greg Smith on this.  One
> such thread starts here:
>
> http://archives.postgresql.org/pgsql-performance/2010-09/msg00120.php

I've read those posts before and they are interesting, but only part of
the puzzle.


>
>> We're getting ready to replace our (now) aging db servers with
>> some brand new with higher core count. The old ones are 4-socket
>> dual-core Opteron 8218's with 48GB RAM. Right now the disk-subsystem
>> is not the limiting factor so we're aiming for higher core-count
>> and as well as faster and more RAM. We're also moving into the
>> territory of version 9.0 with streaming replication to be able to
>> offload at least a part of the read-only queries to the slave
>> database. The connection count on the database usually lies in the
>> region of ~2500 connections and the database is small enough that
>> it can be kept entirely in RAM (dump is about 2,5GB).
>
> You really should try connection pooling.  Even though many people
> find it counterintuitive, it is likely to improve both throughput
> and response time significantly.  See any of the many previous
> threads on the topic for reasons.

I believe you are right, as this is actually something we're looking into:
we're making read-only queries pass through a dedicated set of
lookup-hosts, and having writes that are not time critical pass
through another set of hosts.

Regards,
Christian Elmerot

Re: CPUs for new databases

From
Josh Berkus
Date:
On 10/26/10 7:50 AM, Scott Marlowe wrote:
> For faster but fewer individual cores the Intels are pretty good.  For
> way more cores, each being pretty fast and having enough memory
> bandwidth to use all those cores, the AMDs are very impressive.  The
> Magny Cours AMDs are probably the best 4 socket cpus made.

In a general workload, fewer faster cores are better.  We do not scale
perfectly across cores.  The only case where that's not true is
maintaining lots of idle connections, and that's really better dealt
with in software.

So I've been buying Intel for the last couple years.  Would love to see
some testing on database workloads.

--
                                  -- Josh Berkus
                                     PostgreSQL Experts Inc.
                                     http://www.pgexperts.com

Re: CPUs for new databases

From
James Cloos
Date:
>>>>> "JB" == Josh Berkus <josh@agliodbs.com> writes:

JB> In a general workload, fewer faster cores are better.  We do not scale
JB> perfectly across cores.  The only case where that's not true is
JB> maintaining lots of idle connections, and that's really better dealt
JB> with in software.

I've found that RAM speed is the most limiting factor I've run into for
those cases where the db fits in RAM.  The less efficient lookups run
just as fast when the CPU is in powersaving mode as in performance mode, which
implies that the cores are mostly waiting on RAM (cache or main).
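
For anyone repeating this comparison on Linux, it helps to record which cpufreq governor is active before timing a query. A minimal sketch, assuming the usual sysfs cpufreq layout; `read_governor` is a hypothetical helper name, and the file can be absent on systems without cpufreq support, in which case the function returns None.

```python
from pathlib import Path

# Standard sysfs location for cpu0 on Linux systems with cpufreq enabled.
DEFAULT_PATH = "/sys/devices/system/cpu/cpu0/cpufreq/scaling_governor"


def read_governor(path=DEFAULT_PATH):
    """Return the active governor name, or None if it cannot be read."""
    try:
        return Path(path).read_text().strip()
    except OSError:
        return None


if __name__ == "__main__":
    print("cpu0 governor:", read_governor())
```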

I suspect cache size and ram speed will be the most important factors
until the point where disk i/o speed and capacity take over.

I'm sure some db applications run computationally expensive queries on
the server, but most queries seem light on computation and heavy on
gathering and comparing.

It can help to use recent versions of gcc with -march=native.  And
recent versions of glibc offer improved string ops on recent hardware.

-JimC
--
James Cloos <cloos@jhcloos.com>         OpenPGP: 1024D/ED7DAEA6

Re: CPUs for new databases

From
Ivan Voras
Date:
On 10/27/10 01:45, James Cloos wrote:
>>>>>> "JB" == Josh Berkus<josh@agliodbs.com>  writes:
>
> JB>  In a general workload, fewer faster cores are better.  We do not scale
> JB>  perfectly across cores.  The only case where that's not true is
> JB>  maintaining lots of idle connections, and that's really better dealt
> JB>  with in software.
>
> I've found that ram speed is the most limiting factor I've run into for
> those cases where the db fits in RAM.  The less efficient lookups run
> just as fast when the CPU is in powersving mode as in performance, which
> implies that the cores are mostly waiting on RAM (cache or main).
>
> I suspect cache size and ram speed will be the most important factors
> until the point where disk i/o speed and capacity take over.

FWIW, yes - once the IO is fast enough or not necessary (e.g. the
read-mostly database fits in RAM), RAM bandwidth *is* the next
bottleneck and it really, really can be observed in actual loads. Buying
a QPI-based CPU instead of the cheaper DMI-based ones (if talking about
Intel chips), and faster memory modules (DDR3-1333+) really makes a
difference in this case.
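
For a rough sense of scale: the theoretical peak of one DDR3 channel is its transfer rate times the 8-byte (64-bit) bus width. A minimal sketch of that arithmetic; `ddr3_peak_mb_s` is a hypothetical helper, and the channel count is a parameter to fill in for a given platform, not a figure from this thread.

```python
def ddr3_peak_mb_s(mega_transfers_per_s, channels=1, bus_bytes=8):
    """Theoretical peak bandwidth in MB/s (1 MB = 10**6 bytes)."""
    return mega_transfers_per_s * bus_bytes * channels


if __name__ == "__main__":
    # DDR3-1333 is nominally 1333 MT/s per channel.
    print("one channel:   ", ddr3_peak_mb_s(1333), "MB/s")
    print("three channels:", ddr3_peak_mb_s(1333, channels=3), "MB/s")
```

Measured numbers (for example from STREAM) land well below this ceiling, so it is only useful as an upper bound when comparing platforms.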

(QPI and DMI are basically the evolution of the front side bus; AMD has had
HT (HyperTransport) for years now. Wikipedia of course has more information
for the interested.)


Re: CPUs for new databases

From
Scott Marlowe
Date:
On Tue, Oct 26, 2010 at 6:18 PM, Ivan Voras <ivoras@freebsd.org> wrote:
> FWIW, yes - once the IO is fast enough or not necessary (e.g. the
> read-mostly database fits in RAM), RAM bandwidth *is* the next bottleneck
> and it really, really can be observed in actual loads. Buying a QPI-based
> CPU instead of the cheaper DMI-based ones (if talking about Intel chips),
> and faster memory modules (DDR3-1333+) really makes a difference in this
> case.
>
> (QPI and DMI are basically the evolution the front side bus; AMD had HT -
> HyperTransport for years now. Wikipedia of course has more information for
> the interested.)

Note that HyperTransport speeds differ greatly from one AMD chipset to
the next.  The newest ones, currently Magny Cours, are VERY fast with
1333MHz memory in 64 banks on my 4 cpu x 12 core machine.  And it does
scale with each thread I throw at it, right up through 48.  Note that
those CPUs have 12Megs L3 cache, which makes a big difference if a lot
can fit in cache, but even if it can't the speed to main memory is very
good.  There was an earlier thread with Greg and me in it where we
posted the memory bandwidth numbers for that machine and it was insane
how much data all 48 cores could pump into / out of memory at the same
time.

Re: CPUs for new databases

From
Yeb Havinga
Date:
Scott Marlowe wrote:
> There was an earlier thread with
> Greg and I in it where we posted the memory bandwidth numbers for that
> machine and it was insane how much data all 48 cores could pump into /
> out of memory at the same time.
>
Yeah, it was insane. Building an economical 'that generation opteron'
database server has been on my wishlist since that thread. My current
favorite is the 8-core 6128 opteron, for $275,- at newegg
http://www.newegg.com/Product/Product.aspx?Item=N82E16819105266

Ah might as well drop the whole config on my wishlist as well:

2 times that 8 core processor
Supermicro H8DGU-F motherboard - 16 dimm slots, dual socket, dual Intel
ethernet and additional ethernet for IPMI.
2 times KVR1333D3D4R9SK8/32G memory - 4GB dimms seem to be at the GB/$
sweet spot at the moment for DDR3
1 time OCZ Vertex 2 Pro 100GB (there was a thread about this sandforce
disk as well: a SSD with supercap that acts as battery backup)
maybe another one or two spindled 2.5" drives for archive/backup.
Supermicro 113TQ-563UB chassis

At the time I looked this up, I could buy it for just over €3000,-

regards
Yeb Havinga

PS: I'm in no way involved with either of the manufacturers, nor one of
their fanboys. I'm just interested, like the OP, what is good
hardware/config for a PG related server.


Re: CPUs for new databases

From
Scott Marlowe
Date:
On Wed, Oct 27, 2010 at 1:37 AM, Yeb Havinga <yebhavinga@gmail.com> wrote:
> Scott Marlowe wrote:
>>
>> There was an earlier thread with
>> Greg and I in it where we posted the memory bandwidth numbers for that
>> machine and it was insane how much data all 48 cores could pump into /
>> out of memory at the same time.
>>
>
> Yeah, it was insane. Building a economical 'that generation opteron'
> database server has been on my wishlist since that thread, my current
> favorite is the 8-core 6128 opteron, for $275,- at newegg
> http://www.newegg.com/Product/Product.aspx?Item=N82E16819105266
>
> Ah might as well drop the whole config on my wishlist as well:
>
> 2 times that 8 core processor
> Supermicro H8DGU-F motherboard - 16 dimm slots, dual socket, dual Intel
> ethernet and additional ethernet for IPMI.
> 2 times KVR1333D3D4R9SK8/32G memory - 4GB dimms seem to be at the GB/$ sweet
> spot at the moment for DDR3
> 1 time OCZ Vertex 2 Pro 100GB (there was a thread about this sandforce disk
> as well: a SSD with supercap that acts as battery backup)
> maybe another one or two spindled 2.5" drives for archive/backup.
> Supermicro 113TQ-563UB chassis
>
> At the time I looked this up, I could buy it for just over €3000,-

It's important to remember that when you start needing this many cores,
you're often talking about a machine that has to run dozens of
concurrent requests, and consequently about how many spindles (SSD or
HD) it takes to sustain a certain throughput rate.

If you're looking at that many cores make sure you can put enough SSDs
and / or HDs underneath it to keep up.  Just being able to go from 4
to 8 drives can extend the life of a db server by years.  Supermicro
makes some nice 2U enclosures that hold either 8 or 16  2.5" drives.

> PS: I'm in no way involved with either of the manufacturers, nor one of
> their fanboys. I'm just interested, like the OP, what is good
> hardware/config for a PG related server.

Me either really.  Both times I bought db servers were right after AMD
had taken a lead in SMP.  Got a fair number of intel cpu machines in
the farm that work great, but not as database servers.  But I am keen
on the 8 core AMDs to come down.  Those things have crazy good memory
bandwidth and you can actually use all 16 cores in a server.  I've got
a previous intermediate AMD with the old 6 core cpus, and that thing
can't run more than 8 processes before it starts slowing down.

I don't know your projected data usage needs, but if they are at all
on a positive slope, consider the machines with 8 drive bays at least,
even if you only need 2 or 4 drives now.  Those chassis let you extend
the IO of a db server at will to 2 to 4 times its original setup
pretty easily.

--
To understand recursion, one must first understand recursion.

Re: CPUs for new databases

From
Scott Marlowe
Date:
One last note.  Our vendor at the time we ordered our quad 12 core
machines could only provide that mobo in a 1U chassis.  Consequently
we bought all external arrays for that machine.  Since you're looking
at a dual 8 core machine, you should be able to get a mobo like that
in almost any chassis you want.

Re: CPUs for new databases

From
Josh Berkus
Date:
On 10/26/10 6:14 PM, Scott Marlowe wrote:
>   There was an earlier thread with
> Greg and I in it where we posted the memory bandwidth numbers for that
> machine and it was insane how much data all 48 cores could pump into /
> out of memory at the same time.

Well, the next step then is to do some database server benchmarking.

My experience has been that PostgreSQL scales poorly past 30 cores, or
even at lower levels depending on the workload.  So it would be
interesting to see if the memory bandwidth on the AMDs makes up for our
scaling issues.

--
                                  -- Josh Berkus
                                     PostgreSQL Experts Inc.
                                     http://www.pgexperts.com

Re: CPUs for new databases

From
Scott Marlowe
Date:
On Wed, Oct 27, 2010 at 12:03 PM, Josh Berkus <josh@agliodbs.com> wrote:
> On 10/26/10 6:14 PM, Scott Marlowe wrote:
>>   There was an earlier thread with
>> Greg and I in it where we posted the memory bandwidth numbers for that
>> machine and it was insane how much data all 48 cores could pump into /
>> out of memory at the same time.
>
> Well, the next step then is to do some database server benchmarking.
>
> My experience has been that PostgreSQL scales poorly past 30 cores, or
> even at lower levels depending on the workload.  So it would be
> interesting to see if the memory bandwidth on the AMDs makes up for our
> scaling issues.

Which OSes have you tested it on?  And what hardware?  For smaller
operations, like pgbench, where a large amount of what you're working
on fits in cache, I get near linear scaling right up to 48 cores.
Overall performance increases until about 50 threads, then drops off to
about 60 to 70% of peak for the next hundred or so threads I add on.
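
A scaling curve like this can be collected by sweeping the pgbench client count. The sketch below only builds the command lines, it does not run them. It assumes a pgbench new enough to have `-j` (worker threads) and `-T` (run time in seconds), and a database called `bench` (a hypothetical name) already initialised with `pgbench -i`. The select-only flag `-S` is an assumption suited to an in-cache read test, not necessarily what was run here. Older pgbench releases require the client count to be a multiple of the thread count, which is what `worker_threads` takes care of.

```python
def worker_threads(clients, max_threads):
    """Largest divisor of `clients` that does not exceed `max_threads`."""
    for t in range(min(clients, max_threads), 0, -1):
        if clients % t == 0:
            return t


def pgbench_commands(client_counts, seconds=60, max_threads=8, dbname="bench"):
    """Build one pgbench argv list per client count in the sweep."""
    commands = []
    for clients in client_counts:
        threads = worker_threads(clients, max_threads)
        commands.append([
            "pgbench", "-S",           # select-only workload
            "-c", str(clients),        # concurrent client connections
            "-j", str(threads),        # pgbench worker threads
            "-T", str(seconds),        # run time in seconds
            dbname,
        ])
    return commands


if __name__ == "__main__":
    for cmd in pgbench_commands([1, 12, 24, 48, 50, 100, 150]):
        print(" ".join(cmd))
```

Each printed line can be pasted into a shell or passed to `subprocess.run`; the tps figure pgbench reports for each run gives one point on the curve.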

Re: CPUs for new databases

From
Scott Marlowe
Date:
On Wed, Oct 27, 2010 at 12:28 PM, Scott Marlowe <scott.marlowe@gmail.com> wrote:
> On Wed, Oct 27, 2010 at 12:03 PM, Josh Berkus <josh@agliodbs.com> wrote:
>> On 10/26/10 6:14 PM, Scott Marlowe wrote:
>>>   There was an earlier thread with
>>> Greg and I in it where we posted the memory bandwidth numbers for that
>>> machine and it was insane how much data all 48 cores could pump into /
>>> out of memory at the same time.
>>
>> Well, the next step then is to do some database server benchmarking.
>>
>> My experience has been that PostgreSQL scales poorly past 30 cores, or
>> even at lower levels depending on the workload.  So it would be
>> interesting to see if the memory bandwidth on the AMDs makes up for our
>> scaling issues.
>
> Which OSes have you tested it on?  And what hardware?  For smaller
> operations, like pgbench, where a large amount of what you're working
> on fits in cache, I get near linear scaling right up to 48 cores.
> Overall performance increases til about 50 threads, then drops off to
> about 60 to 70% peak for the next hundred or so threads I add on.

And that's with 8.3.latest on ubuntu 10.04 with latest updates on HW RAID.

Re: CPUs for new databases

From
Greg Smith
Date:
Ivan Voras wrote:
> FWIW, yes - once the IO is fast enough or not necessary (e.g. the
> read-mostly database fits in RAM), RAM bandwidth *is* the next
> bottleneck and it really, really can be observed in actual loads.

This is exactly what I've concluded, after many rounds of correlating
memory speed tests with pgbench tests against in-RAM databases.  And
it's the reason why I've written the stream-scaling utility and been
collecting test results from as many systems as possible.  That seemed
to get dismissed upthread as not being the answer the poster was looking
for, but I think you have to get a handle on that part before the rest
of the trivia involved even matters.

I have a bunch more results that have been flowing in that I need to
publish there soon.  Note that there is a bug in stream-scaling where
sufficiently large systems can hit a compiler problem where it reports
"relocation truncated to fit: R_X86_64_PC32 against `.bss'".  I have two
of those reports and am working on resolving them.

--
Greg Smith   2ndQuadrant US    greg@2ndQuadrant.com   Baltimore, MD
PostgreSQL Training, Services and Support        www.2ndQuadrant.us
"PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books


Re: CPUs for new databases

From
"Christian Elmerot @ One.com"
Date:
On 2010-10-27 21:58, Greg Smith wrote:
> Ivan Voras wrote:
>> FWIW, yes - once the IO is fast enough or not necessary (e.g. the
>> read-mostly database fits in RAM), RAM bandwidth *is* the next
>> bottleneck and it really, really can be observed in actual loads.
>
> This is exactly what I've concluded, after many rounds of correlating
> memory speed tests with pgbench tests against in-RAM databases.  And
> it's the reason why I've written the stream-scaling utility and been
> collecting test results from as many systems as possible.  That seemed
> to get dismissed upthread as not being the answer the poster was
> looking for, but I think you have to get a handle on that part before
> the rest of the trivia involved even matters.
>
> I have a bunch more results that have been flowing in that I need to
> publish there soon.  Note that there is a bug in stream-scaling where
> sufficiently large systems can hit a compiler problem where it reports
> "relocation truncated to fit: R_X86_64_PC32 against `.bss'".  I have
> two of those reports and am working on resolving.
>

Just to chime in after the new systems were purchased and installed. We
ended up buying a 4x Opteron 6168 (12-core Magny-Cours, 12MB cache @
1.9GHz) with 128GB of 1333MHz DDR3 RAM. That's an insane 48 cores. That is
perhaps slightly beyond the scaling horizon for Postgres at the moment,
but we're confident that scaling will improve over the lifetime of
these servers.

Using the stream-scaling test we see some very impressive numbers:

The highest result comes at 32 threads:

Number of Threads requested = 32
Function      Rate (MB/s)   Avg time     Min time     Max time
Triad:      81013.5506       0.0378       0.0377       0.0379

The pattern is quite clear in that any multiple of 4 (the number of
physical CPU packages) gets a higher value, but thinking about how the
memory is connected and utilized, this makes perfect sense.
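
As a sanity check, the headline figures in the output can be reproduced with a little arithmetic. This is a sketch under stated assumptions: STREAM's Triad touches three arrays of 8-byte doubles per pass, reports rates in units of 10^6 bytes and memory in units of 2^20 bytes. The function names are made up for illustration. The cache total only matches if one of the two 64K L1 entries is counted per core, which is inferred from the numbers rather than stated in the output.

```python
ARRAY_ELEMENTS = 127214312   # "Array size" reported in the output
BYTES_PER_WORD = 8           # "8 bytes per DOUBLE PRECISION word"
ARRAYS = 3                   # Triad uses three arrays
CORES = 48
SOCKETS = 4


def total_cache_bytes(cores=CORES, per_core_kib=(64, 512, 5118)):
    """Sum of per-core cache sizes (one L1 entry, L2, L3) in bytes."""
    return cores * sum(per_core_kib) * 1024


def memory_required_mib(elements=ARRAY_ELEMENTS):
    """Size of the three arrays in units of 2**20 bytes."""
    return ARRAYS * BYTES_PER_WORD * elements / 2**20


def triad_rate_mb_s(seconds, elements=ARRAY_ELEMENTS):
    """Triad rate in units of 10**6 bytes per second for one pass."""
    return ARRAYS * BYTES_PER_WORD * elements / 1e6 / seconds


if __name__ == "__main__":
    print("cache total:", total_cache_bytes(), "bytes")
    print("memory required: %.1f MB" % memory_required_mib())
    # 0.0377 s is the rounded minimum time from the 32-thread result.
    print("triad at 0.0377 s: %.0f MB/s" % triad_rate_mb_s(0.0377))
    print("per socket: %.0f MB/s" % (81013.5506 / SOCKETS))
```

The Triad figure computed from the rounded 0.0377 s lands within a fraction of a percent of the reported 81013.5506 MB/s, and dividing the reported rate by the four packages gives roughly 20 GB/s per socket.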

Full output below

Regards,
Christian Elmerot, One.com




=== CPU cache information ===
CPU /sys/devices/system/cpu/cpu0 Level 1 Cache: 64K (Data)
CPU /sys/devices/system/cpu/cpu0 Level 1 Cache: 64K (Instruction)
CPU /sys/devices/system/cpu/cpu0 Level 2 Cache: 512K (Unified)
CPU /sys/devices/system/cpu/cpu0 Level 3 Cache: 5118K (Unified)
CPU /sys/devices/system/cpu/cpu1 Level 1 Cache: 64K (Data)
CPU /sys/devices/system/cpu/cpu1 Level 1 Cache: 64K (Instruction)
CPU /sys/devices/system/cpu/cpu1 Level 2 Cache: 512K (Unified)
CPU /sys/devices/system/cpu/cpu1 Level 3 Cache: 5118K (Unified)
CPU /sys/devices/system/cpu/cpu10 Level 1 Cache: 64K (Data)
CPU /sys/devices/system/cpu/cpu10 Level 1 Cache: 64K (Instruction)
CPU /sys/devices/system/cpu/cpu10 Level 2 Cache: 512K (Unified)
CPU /sys/devices/system/cpu/cpu10 Level 3 Cache: 5118K (Unified)
CPU /sys/devices/system/cpu/cpu11 Level 1 Cache: 64K (Data)
CPU /sys/devices/system/cpu/cpu11 Level 1 Cache: 64K (Instruction)
CPU /sys/devices/system/cpu/cpu11 Level 2 Cache: 512K (Unified)
CPU /sys/devices/system/cpu/cpu11 Level 3 Cache: 5118K (Unified)
CPU /sys/devices/system/cpu/cpu12 Level 1 Cache: 64K (Data)
CPU /sys/devices/system/cpu/cpu12 Level 1 Cache: 64K (Instruction)
CPU /sys/devices/system/cpu/cpu12 Level 2 Cache: 512K (Unified)
CPU /sys/devices/system/cpu/cpu12 Level 3 Cache: 5118K (Unified)
CPU /sys/devices/system/cpu/cpu13 Level 1 Cache: 64K (Data)
CPU /sys/devices/system/cpu/cpu13 Level 1 Cache: 64K (Instruction)
CPU /sys/devices/system/cpu/cpu13 Level 2 Cache: 512K (Unified)
CPU /sys/devices/system/cpu/cpu13 Level 3 Cache: 5118K (Unified)
CPU /sys/devices/system/cpu/cpu14 Level 1 Cache: 64K (Data)
CPU /sys/devices/system/cpu/cpu14 Level 1 Cache: 64K (Instruction)
CPU /sys/devices/system/cpu/cpu14 Level 2 Cache: 512K (Unified)
CPU /sys/devices/system/cpu/cpu14 Level 3 Cache: 5118K (Unified)
CPU /sys/devices/system/cpu/cpu15 Level 1 Cache: 64K (Data)
CPU /sys/devices/system/cpu/cpu15 Level 1 Cache: 64K (Instruction)
CPU /sys/devices/system/cpu/cpu15 Level 2 Cache: 512K (Unified)
CPU /sys/devices/system/cpu/cpu15 Level 3 Cache: 5118K (Unified)
CPU /sys/devices/system/cpu/cpu16 Level 1 Cache: 64K (Data)
CPU /sys/devices/system/cpu/cpu16 Level 1 Cache: 64K (Instruction)
CPU /sys/devices/system/cpu/cpu16 Level 2 Cache: 512K (Unified)
CPU /sys/devices/system/cpu/cpu16 Level 3 Cache: 5118K (Unified)
CPU /sys/devices/system/cpu/cpu17 Level 1 Cache: 64K (Data)
CPU /sys/devices/system/cpu/cpu17 Level 1 Cache: 64K (Instruction)
CPU /sys/devices/system/cpu/cpu17 Level 2 Cache: 512K (Unified)
CPU /sys/devices/system/cpu/cpu17 Level 3 Cache: 5118K (Unified)
CPU /sys/devices/system/cpu/cpu18 Level 1 Cache: 64K (Data)
CPU /sys/devices/system/cpu/cpu18 Level 1 Cache: 64K (Instruction)
CPU /sys/devices/system/cpu/cpu18 Level 2 Cache: 512K (Unified)
CPU /sys/devices/system/cpu/cpu18 Level 3 Cache: 5118K (Unified)
CPU /sys/devices/system/cpu/cpu19 Level 1 Cache: 64K (Data)
CPU /sys/devices/system/cpu/cpu19 Level 1 Cache: 64K (Instruction)
CPU /sys/devices/system/cpu/cpu19 Level 2 Cache: 512K (Unified)
CPU /sys/devices/system/cpu/cpu19 Level 3 Cache: 5118K (Unified)
CPU /sys/devices/system/cpu/cpu2 Level 1 Cache: 64K (Data)
CPU /sys/devices/system/cpu/cpu2 Level 1 Cache: 64K (Instruction)
CPU /sys/devices/system/cpu/cpu2 Level 2 Cache: 512K (Unified)
CPU /sys/devices/system/cpu/cpu2 Level 3 Cache: 5118K (Unified)
CPU /sys/devices/system/cpu/cpu20 Level 1 Cache: 64K (Data)
CPU /sys/devices/system/cpu/cpu20 Level 1 Cache: 64K (Instruction)
CPU /sys/devices/system/cpu/cpu20 Level 2 Cache: 512K (Unified)
CPU /sys/devices/system/cpu/cpu20 Level 3 Cache: 5118K (Unified)
CPU /sys/devices/system/cpu/cpu21 Level 1 Cache: 64K (Data)
CPU /sys/devices/system/cpu/cpu21 Level 1 Cache: 64K (Instruction)
CPU /sys/devices/system/cpu/cpu21 Level 2 Cache: 512K (Unified)
CPU /sys/devices/system/cpu/cpu21 Level 3 Cache: 5118K (Unified)
CPU /sys/devices/system/cpu/cpu22 Level 1 Cache: 64K (Data)
CPU /sys/devices/system/cpu/cpu22 Level 1 Cache: 64K (Instruction)
CPU /sys/devices/system/cpu/cpu22 Level 2 Cache: 512K (Unified)
CPU /sys/devices/system/cpu/cpu22 Level 3 Cache: 5118K (Unified)
CPU /sys/devices/system/cpu/cpu23 Level 1 Cache: 64K (Data)
CPU /sys/devices/system/cpu/cpu23 Level 1 Cache: 64K (Instruction)
CPU /sys/devices/system/cpu/cpu23 Level 2 Cache: 512K (Unified)
CPU /sys/devices/system/cpu/cpu23 Level 3 Cache: 5118K (Unified)
CPU /sys/devices/system/cpu/cpu24 Level 1 Cache: 64K (Data)
CPU /sys/devices/system/cpu/cpu24 Level 1 Cache: 64K (Instruction)
CPU /sys/devices/system/cpu/cpu24 Level 2 Cache: 512K (Unified)
CPU /sys/devices/system/cpu/cpu24 Level 3 Cache: 5118K (Unified)
CPU /sys/devices/system/cpu/cpu25 Level 1 Cache: 64K (Data)
CPU /sys/devices/system/cpu/cpu25 Level 1 Cache: 64K (Instruction)
CPU /sys/devices/system/cpu/cpu25 Level 2 Cache: 512K (Unified)
CPU /sys/devices/system/cpu/cpu25 Level 3 Cache: 5118K (Unified)
CPU /sys/devices/system/cpu/cpu26 Level 1 Cache: 64K (Data)
CPU /sys/devices/system/cpu/cpu26 Level 1 Cache: 64K (Instruction)
CPU /sys/devices/system/cpu/cpu26 Level 2 Cache: 512K (Unified)
CPU /sys/devices/system/cpu/cpu26 Level 3 Cache: 5118K (Unified)
CPU /sys/devices/system/cpu/cpu27 Level 1 Cache: 64K (Data)
CPU /sys/devices/system/cpu/cpu27 Level 1 Cache: 64K (Instruction)
CPU /sys/devices/system/cpu/cpu27 Level 2 Cache: 512K (Unified)
CPU /sys/devices/system/cpu/cpu27 Level 3 Cache: 5118K (Unified)
CPU /sys/devices/system/cpu/cpu28 Level 1 Cache: 64K (Data)
CPU /sys/devices/system/cpu/cpu28 Level 1 Cache: 64K (Instruction)
CPU /sys/devices/system/cpu/cpu28 Level 2 Cache: 512K (Unified)
CPU /sys/devices/system/cpu/cpu28 Level 3 Cache: 5118K (Unified)
CPU /sys/devices/system/cpu/cpu29 Level 1 Cache: 64K (Data)
CPU /sys/devices/system/cpu/cpu29 Level 1 Cache: 64K (Instruction)
CPU /sys/devices/system/cpu/cpu29 Level 2 Cache: 512K (Unified)
CPU /sys/devices/system/cpu/cpu29 Level 3 Cache: 5118K (Unified)
CPU /sys/devices/system/cpu/cpu3 Level 1 Cache: 64K (Data)
CPU /sys/devices/system/cpu/cpu3 Level 1 Cache: 64K (Instruction)
CPU /sys/devices/system/cpu/cpu3 Level 2 Cache: 512K (Unified)
CPU /sys/devices/system/cpu/cpu3 Level 3 Cache: 5118K (Unified)
CPU /sys/devices/system/cpu/cpu30 Level 1 Cache: 64K (Data)
CPU /sys/devices/system/cpu/cpu30 Level 1 Cache: 64K (Instruction)
CPU /sys/devices/system/cpu/cpu30 Level 2 Cache: 512K (Unified)
CPU /sys/devices/system/cpu/cpu30 Level 3 Cache: 5118K (Unified)
CPU /sys/devices/system/cpu/cpu31 Level 1 Cache: 64K (Data)
CPU /sys/devices/system/cpu/cpu31 Level 1 Cache: 64K (Instruction)
CPU /sys/devices/system/cpu/cpu31 Level 2 Cache: 512K (Unified)
CPU /sys/devices/system/cpu/cpu31 Level 3 Cache: 5118K (Unified)
CPU /sys/devices/system/cpu/cpu32 Level 1 Cache: 64K (Data)
CPU /sys/devices/system/cpu/cpu32 Level 1 Cache: 64K (Instruction)
CPU /sys/devices/system/cpu/cpu32 Level 2 Cache: 512K (Unified)
CPU /sys/devices/system/cpu/cpu32 Level 3 Cache: 5118K (Unified)
CPU /sys/devices/system/cpu/cpu33 Level 1 Cache: 64K (Data)
CPU /sys/devices/system/cpu/cpu33 Level 1 Cache: 64K (Instruction)
CPU /sys/devices/system/cpu/cpu33 Level 2 Cache: 512K (Unified)
CPU /sys/devices/system/cpu/cpu33 Level 3 Cache: 5118K (Unified)
CPU /sys/devices/system/cpu/cpu34 Level 1 Cache: 64K (Data)
CPU /sys/devices/system/cpu/cpu34 Level 1 Cache: 64K (Instruction)
CPU /sys/devices/system/cpu/cpu34 Level 2 Cache: 512K (Unified)
CPU /sys/devices/system/cpu/cpu34 Level 3 Cache: 5118K (Unified)
CPU /sys/devices/system/cpu/cpu35 Level 1 Cache: 64K (Data)
CPU /sys/devices/system/cpu/cpu35 Level 1 Cache: 64K (Instruction)
CPU /sys/devices/system/cpu/cpu35 Level 2 Cache: 512K (Unified)
CPU /sys/devices/system/cpu/cpu35 Level 3 Cache: 5118K (Unified)
CPU /sys/devices/system/cpu/cpu36 Level 1 Cache: 64K (Data)
CPU /sys/devices/system/cpu/cpu36 Level 1 Cache: 64K (Instruction)
CPU /sys/devices/system/cpu/cpu36 Level 2 Cache: 512K (Unified)
CPU /sys/devices/system/cpu/cpu36 Level 3 Cache: 5118K (Unified)
CPU /sys/devices/system/cpu/cpu37 Level 1 Cache: 64K (Data)
CPU /sys/devices/system/cpu/cpu37 Level 1 Cache: 64K (Instruction)
CPU /sys/devices/system/cpu/cpu37 Level 2 Cache: 512K (Unified)
CPU /sys/devices/system/cpu/cpu37 Level 3 Cache: 5118K (Unified)
CPU /sys/devices/system/cpu/cpu38 Level 1 Cache: 64K (Data)
CPU /sys/devices/system/cpu/cpu38 Level 1 Cache: 64K (Instruction)
CPU /sys/devices/system/cpu/cpu38 Level 2 Cache: 512K (Unified)
CPU /sys/devices/system/cpu/cpu38 Level 3 Cache: 5118K (Unified)
CPU /sys/devices/system/cpu/cpu39 Level 1 Cache: 64K (Data)
CPU /sys/devices/system/cpu/cpu39 Level 1 Cache: 64K (Instruction)
CPU /sys/devices/system/cpu/cpu39 Level 2 Cache: 512K (Unified)
CPU /sys/devices/system/cpu/cpu39 Level 3 Cache: 5118K (Unified)
CPU /sys/devices/system/cpu/cpu4 Level 1 Cache: 64K (Data)
CPU /sys/devices/system/cpu/cpu4 Level 1 Cache: 64K (Instruction)
CPU /sys/devices/system/cpu/cpu4 Level 2 Cache: 512K (Unified)
CPU /sys/devices/system/cpu/cpu4 Level 3 Cache: 5118K (Unified)
CPU /sys/devices/system/cpu/cpu40 Level 1 Cache: 64K (Data)
CPU /sys/devices/system/cpu/cpu40 Level 1 Cache: 64K (Instruction)
CPU /sys/devices/system/cpu/cpu40 Level 2 Cache: 512K (Unified)
CPU /sys/devices/system/cpu/cpu40 Level 3 Cache: 5118K (Unified)
CPU /sys/devices/system/cpu/cpu41 Level 1 Cache: 64K (Data)
CPU /sys/devices/system/cpu/cpu41 Level 1 Cache: 64K (Instruction)
CPU /sys/devices/system/cpu/cpu41 Level 2 Cache: 512K (Unified)
CPU /sys/devices/system/cpu/cpu41 Level 3 Cache: 5118K (Unified)
CPU /sys/devices/system/cpu/cpu42 Level 1 Cache: 64K (Data)
CPU /sys/devices/system/cpu/cpu42 Level 1 Cache: 64K (Instruction)
CPU /sys/devices/system/cpu/cpu42 Level 2 Cache: 512K (Unified)
CPU /sys/devices/system/cpu/cpu42 Level 3 Cache: 5118K (Unified)
CPU /sys/devices/system/cpu/cpu43 Level 1 Cache: 64K (Data)
CPU /sys/devices/system/cpu/cpu43 Level 1 Cache: 64K (Instruction)
CPU /sys/devices/system/cpu/cpu43 Level 2 Cache: 512K (Unified)
CPU /sys/devices/system/cpu/cpu43 Level 3 Cache: 5118K (Unified)
CPU /sys/devices/system/cpu/cpu44 Level 1 Cache: 64K (Data)
CPU /sys/devices/system/cpu/cpu44 Level 1 Cache: 64K (Instruction)
CPU /sys/devices/system/cpu/cpu44 Level 2 Cache: 512K (Unified)
CPU /sys/devices/system/cpu/cpu44 Level 3 Cache: 5118K (Unified)
CPU /sys/devices/system/cpu/cpu45 Level 1 Cache: 64K (Data)
CPU /sys/devices/system/cpu/cpu45 Level 1 Cache: 64K (Instruction)
CPU /sys/devices/system/cpu/cpu45 Level 2 Cache: 512K (Unified)
CPU /sys/devices/system/cpu/cpu45 Level 3 Cache: 5118K (Unified)
CPU /sys/devices/system/cpu/cpu46 Level 1 Cache: 64K (Data)
CPU /sys/devices/system/cpu/cpu46 Level 1 Cache: 64K (Instruction)
CPU /sys/devices/system/cpu/cpu46 Level 2 Cache: 512K (Unified)
CPU /sys/devices/system/cpu/cpu46 Level 3 Cache: 5118K (Unified)
CPU /sys/devices/system/cpu/cpu47 Level 1 Cache: 64K (Data)
CPU /sys/devices/system/cpu/cpu47 Level 1 Cache: 64K (Instruction)
CPU /sys/devices/system/cpu/cpu47 Level 2 Cache: 512K (Unified)
CPU /sys/devices/system/cpu/cpu47 Level 3 Cache: 5118K (Unified)
CPU /sys/devices/system/cpu/cpu5 Level 1 Cache: 64K (Data)
CPU /sys/devices/system/cpu/cpu5 Level 1 Cache: 64K (Instruction)
CPU /sys/devices/system/cpu/cpu5 Level 2 Cache: 512K (Unified)
CPU /sys/devices/system/cpu/cpu5 Level 3 Cache: 5118K (Unified)
CPU /sys/devices/system/cpu/cpu6 Level 1 Cache: 64K (Data)
CPU /sys/devices/system/cpu/cpu6 Level 1 Cache: 64K (Instruction)
CPU /sys/devices/system/cpu/cpu6 Level 2 Cache: 512K (Unified)
CPU /sys/devices/system/cpu/cpu6 Level 3 Cache: 5118K (Unified)
CPU /sys/devices/system/cpu/cpu7 Level 1 Cache: 64K (Data)
CPU /sys/devices/system/cpu/cpu7 Level 1 Cache: 64K (Instruction)
CPU /sys/devices/system/cpu/cpu7 Level 2 Cache: 512K (Unified)
CPU /sys/devices/system/cpu/cpu7 Level 3 Cache: 5118K (Unified)
CPU /sys/devices/system/cpu/cpu8 Level 1 Cache: 64K (Data)
CPU /sys/devices/system/cpu/cpu8 Level 1 Cache: 64K (Instruction)
CPU /sys/devices/system/cpu/cpu8 Level 2 Cache: 512K (Unified)
CPU /sys/devices/system/cpu/cpu8 Level 3 Cache: 5118K (Unified)
CPU /sys/devices/system/cpu/cpu9 Level 1 Cache: 64K (Data)
CPU /sys/devices/system/cpu/cpu9 Level 1 Cache: 64K (Instruction)
CPU /sys/devices/system/cpu/cpu9 Level 2 Cache: 512K (Unified)
CPU /sys/devices/system/cpu/cpu9 Level 3 Cache: 5118K (Unified)
Total CPU system cache: 279871488 bytes
Computed minimum array elements needed: 127214312
Minimum array elements used: 127214312

=== Check and build stream ===

=== Testing up to 48 cores ===

-------------------------------------------------------------
STREAM version $Revision: 5.9 $
-------------------------------------------------------------
This system uses 8 bytes per DOUBLE PRECISION word.
-------------------------------------------------------------
Array size = 127214312, Offset = 0
Total memory required = 2911.7 MB.
Each test is run 10 times, but only
the *best* time for each is used.
-------------------------------------------------------------
Number of Threads requested = 1
-------------------------------------------------------------
Printing one line per active thread....
-------------------------------------------------------------
Your clock granularity/precision appears to be 1 microseconds.
Each test below will take on the order of 254582 microseconds.
    (= 254582 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
-------------------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
-------------------------------------------------------------
Function      Rate (MB/s)   Avg time     Min time     Max time
Copy:        6057.6445       0.3418       0.3360       0.3750
Scale:       6028.3481       0.3442       0.3376       0.3786
Add:         6304.5394       0.4900       0.4843       0.5142
Triad:       6236.0693       0.4968       0.4896       0.5219
-------------------------------------------------------------
Solution Validates
-------------------------------------------------------------

Number of Threads requested = 2
Function      Rate (MB/s)   Avg time     Min time     Max time
Triad:      12471.5252       0.2448       0.2448       0.2449

Number of Threads requested = 3
Function      Rate (MB/s)   Avg time     Min time     Max time
Triad:      15952.3092       0.1914       0.1914       0.1915

Number of Threads requested = 4
Function      Rate (MB/s)   Avg time     Min time     Max time
Triad:      24935.8135       0.1225       0.1224       0.1225

Number of Threads requested = 5
Function      Rate (MB/s)   Avg time     Min time     Max time
Triad:      26223.8995       0.1165       0.1164       0.1166

Number of Threads requested = 6
Function      Rate (MB/s)   Avg time     Min time     Max time
Triad:      36886.6048       0.0828       0.0828       0.0828

Number of Threads requested = 7
Function      Rate (MB/s)   Avg time     Min time     Max time
Triad:      36930.7515       0.0827       0.0827       0.0828

Number of Threads requested = 8
Function      Rate (MB/s)   Avg time     Min time     Max time
Triad:      38068.1227       0.0826       0.0802       0.0833

Number of Threads requested = 9
Function      Rate (MB/s)   Avg time     Min time     Max time
Triad:      21442.7286       0.1506       0.1424       0.1639

Number of Threads requested = 10
Function      Rate (MB/s)   Avg time     Min time     Max time
Triad:      22577.0833       0.1356       0.1352       0.1359

Number of Threads requested = 11
Function      Rate (MB/s)   Avg time     Min time     Max time
Triad:      23312.3289       0.1311       0.1310       0.1311

Number of Threads requested = 12
Function      Rate (MB/s)   Avg time     Min time     Max time
Triad:      40323.1058       0.0760       0.0757       0.0763

Number of Threads requested = 13
Function      Rate (MB/s)   Avg time     Min time     Max time
Triad:      47004.6724       0.0652       0.0650       0.0654

Number of Threads requested = 14
Function      Rate (MB/s)   Avg time     Min time     Max time
Triad:      44424.2111       0.0687       0.0687       0.0688

Number of Threads requested = 15
Function      Rate (MB/s)   Avg time     Min time     Max time
Triad:      52259.2348       0.0585       0.0584       0.0587

Number of Threads requested = 16
Function      Rate (MB/s)   Avg time     Min time     Max time
Triad:      64229.4556       0.0476       0.0475       0.0477

Number of Threads requested = 17
Function      Rate (MB/s)   Avg time     Min time     Max time
Triad:      34654.6042       0.0969       0.0881       0.0989

Number of Threads requested = 18
Function      Rate (MB/s)   Avg time     Min time     Max time
Triad:      43236.4397       0.0846       0.0706       0.0985

Number of Threads requested = 19
Function      Rate (MB/s)   Avg time     Min time     Max time
Triad:      40173.4578       0.0783       0.0760       0.0799

Number of Threads requested = 20
Function      Rate (MB/s)   Avg time     Min time     Max time
Triad:      52418.1724       0.0585       0.0582       0.0587

Number of Threads requested = 21
Function      Rate (MB/s)   Avg time     Min time     Max time
Triad:      59309.7805       0.0517       0.0515       0.0518

Number of Threads requested = 22
Function      Rate (MB/s)   Avg time     Min time     Max time
Triad:      55953.3174       0.0547       0.0546       0.0548

Number of Threads requested = 23
Function      Rate (MB/s)   Avg time     Min time     Max time
Triad:      69792.5266       0.0439       0.0437       0.0443

Number of Threads requested = 24
Function      Rate (MB/s)   Avg time     Min time     Max time
Triad:      78001.9366       0.0393       0.0391       0.0393

Number of Threads requested = 25
Function      Rate (MB/s)   Avg time     Min time     Max time
Triad:      56661.3804       0.0670       0.0539       0.0740

Number of Threads requested = 26
Function      Rate (MB/s)   Avg time     Min time     Max time
Triad:      51899.3931       0.0624       0.0588       0.0674

Number of Threads requested = 27
Function      Rate (MB/s)   Avg time     Min time     Max time
Triad:      48560.3902       0.0681       0.0629       0.0704

Number of Threads requested = 28
Function      Rate (MB/s)   Avg time     Min time     Max time
Triad:      63773.3287       0.0485       0.0479       0.0498

Number of Threads requested = 29
Function      Rate (MB/s)   Avg time     Min time     Max time
Triad:      67561.1570       0.0456       0.0452       0.0457

Number of Threads requested = 30
Function      Rate (MB/s)   Avg time     Min time     Max time
Triad:      59568.8426       0.0514       0.0513       0.0515

Number of Threads requested = 31
Function      Rate (MB/s)   Avg time     Min time     Max time
Triad:      54612.0337       0.0565       0.0559       0.0567

Number of Threads requested = 32
Function      Rate (MB/s)   Avg time     Min time     Max time
Triad:      81013.5506       0.0378       0.0377       0.0379

Number of Threads requested = 33
Function      Rate (MB/s)   Avg time     Min time     Max time
Triad:      58938.5382       0.0570       0.0518       0.0594

Number of Threads requested = 34
Function      Rate (MB/s)   Avg time     Min time     Max time
Triad:      58142.9574       0.0555       0.0525       0.0591

Number of Threads requested = 35
Function      Rate (MB/s)   Avg time     Min time     Max time
Triad:      52356.8789       0.0590       0.0583       0.0594

Number of Threads requested = 36
Function      Rate (MB/s)   Avg time     Min time     Max time
Triad:      64303.6362       0.0481       0.0475       0.0485

Number of Threads requested = 37
Function      Rate (MB/s)   Avg time     Min time     Max time
Triad:      63251.3840       0.0483       0.0483       0.0484

Number of Threads requested = 38
Function      Rate (MB/s)   Avg time     Min time     Max time
Triad:      74401.3522       0.0411       0.0410       0.0412

Number of Threads requested = 39
Function      Rate (MB/s)   Avg time     Min time     Max time
Triad:      77623.2130       0.0394       0.0393       0.0394

Number of Threads requested = 40
Function      Rate (MB/s)   Avg time     Min time     Max time
Triad:      80152.0442       0.0383       0.0381       0.0384

Number of Threads requested = 41
Function      Rate (MB/s)   Avg time     Min time     Max time
Triad:      68952.6217       0.0443       0.0443       0.0443

Number of Threads requested = 42
Function      Rate (MB/s)   Avg time     Min time     Max time
Triad:      69971.7614       0.0437       0.0436       0.0437

Number of Threads requested = 43
Function      Rate (MB/s)   Avg time     Min time     Max time
Triad:      71488.5304       0.0428       0.0427       0.0430

Number of Threads requested = 44
Function      Rate (MB/s)   Avg time     Min time     Max time
Triad:      72992.9602       0.0419       0.0418       0.0420

Number of Threads requested = 45
Function      Rate (MB/s)   Avg time     Min time     Max time
Triad:      75000.9485       0.0408       0.0407       0.0409

Number of Threads requested = 46
Function      Rate (MB/s)   Avg time     Min time     Max time
Triad:      76208.7407       0.0401       0.0401       0.0402

Number of Threads requested = 47
Function      Rate (MB/s)   Avg time     Min time     Max time
Triad:      77969.6418       0.0392       0.0392       0.0393

Number of Threads requested = 48
Function      Rate (MB/s)   Avg time     Min time     Max time
Triad:      79731.3522       0.0384       0.0383       0.0385
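
As a cross-check on the figures above, the "Total memory required" line and the single-thread rates can be reproduced from the array size and the printed Min time. This is only a minimal sketch. It assumes STREAM's usual accounting: three arrays in total, two arrays touched per element for Copy and Scale, three for Add and Triad, and rates in units of 10^6 bytes per second. The function names are illustrative and are not part of STREAM or stream-scaling.

```python
BYTES_PER_WORD = 8          # "This system uses 8 bytes per DOUBLE PRECISION word."
ARRAY_ELEMENTS = 127214312  # "Array size = 127214312"

# Arrays read or written per element by each kernel (assumed STREAM accounting).
ARRAYS_TOUCHED = {"Copy": 2, "Scale": 2, "Add": 3, "Triad": 3}


def total_memory_mib(elements, bytes_per_word=BYTES_PER_WORD):
    """Memory for the three STREAM arrays, in units of 2**20 bytes."""
    return 3 * elements * bytes_per_word / 2**20


def rate_mb_per_s(kernel, elements, min_time_s, bytes_per_word=BYTES_PER_WORD):
    """Bandwidth in 10**6 bytes/s, based on the best (minimum) time."""
    moved = ARRAYS_TOUCHED[kernel] * elements * bytes_per_word
    return moved / min_time_s / 1e6


if __name__ == "__main__":
    print(f"Total memory: {total_memory_mib(ARRAY_ELEMENTS):.1f} MB")
    # Min times taken from the single-thread table above.
    for kernel, min_time in [("Copy", 0.3360), ("Scale", 0.3376),
                             ("Add", 0.4843), ("Triad", 0.4896)]:
        rate = rate_mb_per_s(kernel, ARRAY_ELEMENTS, min_time)
        print(f"{kernel}: {rate:.1f} MB/s")
```

Because the Min time is printed with only four decimals, the recomputed rates should match the reported ones closely but not digit for digit.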



Re: CPUs for new databases

From
Greg Smith
Date:
Christian Elmerot @ One.com wrote:
> Highest results comes at 32 threads:
> Number of Threads requested = 32
> Function      Rate (MB/s)   Avg time     Min time     Max time
> Triad:      81013.5506       0.0378       0.0377       0.0379

There is some run-to-run variation in the results of this test, and
accordingly some margin for error in each individual result.  I wouldn't
consider the difference between the speed at 32 threads (81) and 48
(79.7) to be statistically significant, and based on the overall shape
of the results curve that 32 result looks suspicious.  I would bet that
if you run the test multiple times, you'd sometimes see the 48 core one
run faster than the 32.
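
To put a number on that, here is a rough sketch of the comparison. It computes how far apart the two reported Triad rates are, and includes a hypothetical helper for checking whether two sets of repeated runs overlap. The helper names are made up for illustration, and no repeated-run data exists in this thread, so only the single-run gap is computed.

```python
def relative_gap(rate_a, rate_b):
    """Fractional difference between two rates, relative to the larger one."""
    return abs(rate_a - rate_b) / max(rate_a, rate_b)


def ranges_overlap(runs_a, runs_b):
    """True if the min..max ranges of two sets of repeated runs overlap.

    A crude check, not a statistical test: overlapping ranges suggest the
    two configurations cannot be told apart from these runs alone.
    """
    return min(runs_a) <= max(runs_b) and min(runs_b) <= max(runs_a)


if __name__ == "__main__":
    # Triad rates (MB/s) reported above for 32 and 48 threads.
    gap = relative_gap(81013.5506, 79731.3522)
    print(f"32 vs 48 threads: {gap:.1%} apart")
```

The gap comes out at roughly 1.6%. Whether that is within the noise can only be settled by collecting several runs per thread count and feeding them to something like `ranges_overlap`.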

> The pattern is quite clear in that any multiple of 4 (the number of
> physical CPU packages) get a higher value but thinking about how the
> memory is connected and utilized this makes perfect sense.

In addition to the memory issues, there's also thread CPU scheduling
involved here.  Ideally the benchmark would pin each thread to a single
core and keep it there for the runtime of the test, but it's not there
yet.  I suspect one source of variation at odd numbers of threads
involves processes that bounce between CPUs more than in the more even
cases.
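
For illustration, pinning on Linux could look roughly like the sketch below. It is not what the benchmark does today. It uses Python's `os.sched_setaffinity`, which is only available on platforms that support it (Linux in practice), and the helper names are hypothetical.

```python
import os


def cpus_for_workers(nworkers):
    """Assign each worker one CPU from the currently allowed set, round-robin."""
    allowed = sorted(os.sched_getaffinity(0))
    return [allowed[i % len(allowed)] for i in range(nworkers)]


def pin_to_cpu(cpu):
    """Pin the calling thread to a single CPU and return the previous mask.

    pid 0 means "the caller", so each worker would call this once at startup
    and then stay on that core for the rest of the run.
    """
    previous = os.sched_getaffinity(0)
    os.sched_setaffinity(0, {cpu})
    return previous


if __name__ == "__main__":
    plan = cpus_for_workers(4)
    print("worker -> cpu:", dict(enumerate(plan)))
    old_mask = pin_to_cpu(plan[0])
    print("now restricted to:", os.sched_getaffinity(0))
    os.sched_setaffinity(0, old_mask)  # restore the original mask
```

A round-robin plan over the allowed CPUs is the simplest choice. A placement that is aware of sockets and memory nodes would need the topology information that the script already reads from /sys.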

--
Greg Smith   2ndQuadrant US    greg@2ndQuadrant.com   Baltimore, MD
PostgreSQL Training, Services and Support        www.2ndQuadrant.us
"PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books


Re: CPUs for new databases

From
Scott Carey
Date:
On Nov 26, 2010, at 2:30 PM, Greg Smith wrote:

>
> In addition to the memory issues, there's also thread CPU scheduling
> involved here.  Ideally the benchmark would pin each thread to a single
> core and keep it there for the runtime of the test, but it's not there
> yet.  I suspect one source of variation at odd numbers of threads
> involves processes that bounce between CPUs more than in the more even
> cases.
>

Depends on what you're interested in.

Postgres doesn't pin threads to processors.  Postgres doesn't use
threads.  A STREAM benchmark that used multiple processes, with half
SysV shared and half in-process memory access, would be better.  How
the OS schedules the processes and memory access is critical.  One
server might score higher on an optimized 'pin the processes' STREAM
test, but be slower in the real world for Postgres because it's not
testing anything that Postgres can do.
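
A very rough sketch of what such a test might look like follows; the assumptions matter more than the code. It uses POSIX shared memory via Python's `multiprocessing.shared_memory` rather than SysV shared memory, runs only a Copy-style kernel, and does not synchronize the start of the workers, so it is an illustration of the idea and not a replacement for STREAM. The function names are hypothetical.

```python
import time
from multiprocessing import Pool, shared_memory


def copy_bandwidth(shm_name, nbytes, repeats=5):
    """Copy from a shared segment and from private memory into a private buffer.

    Returns the best rate for each source in 10**6 bytes/s, counting
    bytes read plus bytes written, as STREAM's Copy kernel does.
    """
    shm = shared_memory.SharedMemory(name=shm_name)  # attach, do not create
    try:
        private_src = bytearray(nbytes)
        private_dst = bytearray(nbytes)
        best = {}
        with shm.buf[:nbytes] as shared_src:
            for label, src in (("shared", shared_src), ("private", private_src)):
                times = []
                for _ in range(repeats):
                    start = time.perf_counter()
                    private_dst[:] = src  # bulk copy of the whole buffer
                    times.append(time.perf_counter() - start)
                best[label] = 2 * nbytes / min(times) / 1e6
        return best
    finally:
        shm.close()


def run(nprocs, nbytes=64 * 1024 * 1024):
    """Run the copy test in nprocs unpinned processes; the OS places them."""
    shm = shared_memory.SharedMemory(create=True, size=nbytes)
    try:
        with Pool(nprocs) as pool:
            return pool.starmap(copy_bandwidth, [(shm.name, nbytes)] * nprocs)
    finally:
        shm.close()
        shm.unlink()
```

Calling `run(4)` from a script's `if __name__ == "__main__":` block would return one result dictionary per process. The processes are left unpinned on purpose, so the result reflects how the OS schedules them.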

