Discussion: Opteron vs. Xeon performance differences
Forgive me if this has been beaten into the ground, but my team and I couldn’t find many conclusive studies or posts on this issue. To make a long story short: we’re seeing Xeons run 50% slower than Opterons, even when the Xeon has twice as much cache and a slight clock speed advantage.
The full story: we have an older production server with 2G of RAM and 2.4GHz Opterons w/ 1M of cache. The database is not large, only around 7M or 8M rows altogether, 2.5G on disk. Most queries are reads, probably in a 10:1 proportion to writes.

In the process of upgrading this server to a pair of DRBD-mirrored (more on this below) servers, we discovered that the new servers were actually slower than the older one. The newer servers have 4G of RAM and 3.0GHz Xeons with 2M of cache. And not just a little slower: queries (simple, complex, and disgusting recursive stored procedures) routinely take 50-100% longer than they did on the older server.

After many troubleshooting steps (downgrading the kernel to that of the older machine, verifying version parity, copying the binary from the older server, building a 32-bit binary on the new servers, running the entire database out of a ramdisk, and of course much tweaking of postgresql.conf) and seeing virtually no benefit from any of them, I finally took the leap: I pulled the disks and threw them in a newer Opteron chassis (2.8GHz, 1M cache). And whaddya know? It’s got a 20% speed edge on the older Opteron, and blows away the performance of the newer Xeons.
One of my guys did some testing and it appears that LWLockAcquire and LWLockRelease are the culprits, but we’re not entirely confident in our conclusion. Any thoughts on why this might be so different between the two architectures? We’re a hosting provider, so we’ve got some spare equipment to work with, and I’m going to request that we keep these two boxes up for a week or so. Are there any other tests you guys can suggest that would help get to the bottom of this? I figure not everyone has access to as much gear as we do, so it might be a good opportunity to do some A/B testing of a production database on identical OS/server installs running on different hardware. I’m content to just say “Well, we use Opterons then!”, but I imagine that if we could help bring equal performance to Xeon users, it would be worth the effort of volunteering. To be clear, I have two machines sitting on the network ready for tweaking: one is a Xeon, the other an Opteron. Neither is in production, and both can be fully mangled in the interest of figuring this out.
Speaking of being a hosting provider, I may as well take a moment to point out that we are working with DRBD for mirroring and have found it works beautifully with PG (MySQL as well). Also, while our “Managed Database Service” product is geared around MySQL, Oracle, and MSSQL, we’re pretty familiar with PG and would be happy to talk to anyone about hosting needs they may have.
Thanks for listening, and again please let me know if there is further testing we can do to help get to the bottom of this Opteron/Xeon performance discrepancy.
Bart Grantham
VP of R&D
Logicworks, Inc.
www.logicworks.net
On Thu, Oct 9, 2008 at 3:34 PM, Bart Grantham <bg@logicworks.net> wrote:
> Forgive me if this has been beaten into the ground, but my team and I
> couldn't find much conclusive study or posts on this issue. To make a long
> story short: we're experiencing Xeons as 50% slower than Opterons, even when
> the Xeon has twice as much cache and a slight clock speed advantage.

I'm not sure what causes this issue either, although I suspect it's the inter-CPU / CPU-to-memory communication speeds that make the difference. It seems that as the number of CPUs increases, the Opteron's lead over the Xeon grows.
How do you manage the WAL on both servers? Is the kernel version the same on both? Do they run the same services? Have you run any tests with only PostgreSQL running on both servers? If the problem is inter-CPU communication, I know you can specify the processors you want dedicated to one process.

2008/10/10 Scott Marlowe <scott.marlowe@gmail.com>:
> I'm not sure what causes this issue either, although I suspect it's
> the inter-CPU / CPU to memory communication speeds that make the
> difference. It seems that as the number of CPUs increase, the opteron
> lead increases over the xeon.
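Dedicating specific processors to a process, as suggested above, is CPU affinity. On Linux the usual shell tool is `taskset -c`; Python exposes the same system call as `os.sched_setaffinity`. The following is a minimal sketch, assuming Linux and Python 3. The helper name `pin_to_cpus` is hypothetical, and which CPU numbers belong to which socket varies by machine (the `physical id` field in `/proc/cpuinfo` shows the mapping).

```python
import os

def pin_to_cpus(pid, cpus):
    """Restrict process `pid` (0 means the calling process) to the given CPU numbers.

    Linux-only sketch: os.sched_setaffinity/os.sched_getaffinity are not
    available on every platform.
    """
    allowed = os.sched_getaffinity(0)      # CPUs the calling process may currently use
    requested = set(cpus) & allowed        # conservatively drop CPUs outside that set
    if not requested:
        raise ValueError(f"none of {sorted(cpus)} are usable; allowed: {sorted(allowed)}")
    os.sched_setaffinity(pid, requested)
    return os.sched_getaffinity(pid)       # report the mask the kernel actually applied
```

A child created with fork() inherits its parent's affinity mask. One possible experiment for the inter-CPU theory would therefore be to start the postmaster already pinned to the CPUs of a single socket, so that every backend it forks stays there, and then check whether the Xeon's query times change.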
Bart Grantham wrote:
> Forgive me if this has been beaten into the ground, but my team and I
> couldn’t find much conclusive study or posts on this issue. To make a
> long story short: we’re experiencing Xeons as 50% slower than Opterons,
> even when the Xeon has twice as much cache and a slight clock speed
> advantage.

Simple question: do you know that the plans are the same? And I don't think you said conclusively that it's the same version of PGSQL on both servers?
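One way to answer "are the plans the same?" is to run `EXPLAIN` (or `EXPLAIN ANALYZE`) for the same query on both servers, save the text output, and diff it. The sketch below assumes PostgreSQL's default text EXPLAIN format. It strips the `(cost=...)` and `(actual ...)` annotations so that small differences in estimates or timings don't obscure whether the plan shape matches. The function names and the `opteron`/`xeon` labels are illustrative only.

```python
import difflib
import re

# Matches "(cost=...)" and "(actual time=...)" annotations in text-format EXPLAIN output.
_ANNOTATION = re.compile(r"\s*\((?:cost|actual)[^)]*\)")

def plan_shape(plan_text):
    """Plan lines with cost/timing annotations and blank lines removed."""
    return [_ANNOTATION.sub("", line).rstrip()
            for line in plan_text.splitlines() if line.strip()]

def plan_diff(plan_a, plan_b):
    """Unified-diff lines between two plans' shapes; an empty list means same shape."""
    return list(difflib.unified_diff(plan_shape(plan_a), plan_shape(plan_b),
                                     fromfile="opteron", tofile="xeon", lineterm=""))
```

Comparing only the shape is a deliberate choice. If the shapes match but the row estimates differ a lot, the statistics on the two servers may differ, and that is worth checking separately.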
On Thu, 9 Oct 2008, Bart Grantham wrote:
> The full story: we have an older production server with 2G of RAM,
> 2.4GHz Opterons w/ 1M of cache...The newer servers have 4G of RAM,
> 3.0GHz Xeons with 2M of cache.

Model numbers, please? I can probably guess for the Opterons, but there are a lot of different implementations lumped under the Xeon brand name. Have you compared how fast the RAM is in the two systems? We were just talking about a similar unexpected performance difference yesterday on another list: http://archives.postgresql.org/pgsql-performance/2008-10/msg00051.php

I'd be curious what memtest86+ and the simple hdparm -T benchmark say about the two servers. If those numbers correlate with the performance difference you're seeing, the PostgreSQL code might have nothing to do with it. I've seen a 60% performance difference just between the best and worst RAM I tried on a single motherboard recently.

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD
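`hdparm -T` times reads from the Linux buffer cache, so it mostly reflects memory bandwidth. In the same spirit, a rough comparison that never touches the disks is to time repeated copies of a large in-memory buffer on each box. This is a minimal sketch, not a substitute for memtest86+ or a dedicated tool such as STREAM. `copy_bandwidth_mb_s` and its default sizes are arbitrary choices, and interpreter overhead means only the relative numbers between the two machines are meaningful.

```python
import time

def copy_bandwidth_mb_s(size_mb=256, seconds=2.0):
    """Copy a size_mb buffer repeatedly for about `seconds`; return MB copied per second.

    Every byte counted is both read and written, so the real memory traffic
    is roughly twice the reported figure.
    """
    size = size_mb * 1024 * 1024
    src = bytearray(b"\x5a") * size   # non-zero source buffer
    dst = bytearray(size)             # same length, so slice assignment never resizes
    dst[:] = src                      # warm-up copy so every page is touched before timing
    copied = 0
    start = time.perf_counter()
    while True:
        dst[:] = src                  # bulk copy between two equal-length bytearrays
        copied += size
        elapsed = time.perf_counter() - start
        if elapsed >= seconds:
            break
    mb = copied / (1024 * 1024)
    # Output is printed in the same shape as hdparm's timing lines for easy comparison.
    print(f"{mb:.0f} MB in {elapsed:.2f} seconds = {mb / elapsed:.2f} MB/sec")
    return mb / elapsed
```

Run it with the same Python build on both machines. If the Xeon trails by about the same margin as the query times, that supports the idea that PostgreSQL itself is not the cause.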
Bart Grantham wrote:
> a long story short: we're experiencing Xeons as 50% slower than
> Opterons, even when the Xeon has twice as much cache and a slight
> clock speed advantage.

> tests I finally took the final leap: just pull the disks and throw
> them in a newer Opteron chassis (2.8GHz, 1M cache). And whaddya
> know? It's got a 20% speed edge on the older Opteron, and blows away
> the performance of the newer Xeons.

But is the difference in CPU or disk?

Do the two machines get a similar disk transfer rate?

With the same RAID card and disks in both machines (as opposed to on-board controllers), do they get the same MB/sec?

--
Shane Ambler
pgSQL (at) Sheeky (dot) Biz

Get Sheeky @ http://Sheeky.Biz
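The disk transfer-rate question can be checked with `hdparm -t` (lowercase, which times device reads without prior caching, unlike `-T`), or with a quick sequential-read timing like the sketch below. The sketch assumes you point it at a file larger than the machine's RAM (over 2G or 4G here) or at one that hasn't been read recently. Otherwise the page cache serves the data and you end up measuring memory, not disk. `sequential_read_mb_s` is a hypothetical helper, not part of any tool.

```python
import os
import time

def sequential_read_mb_s(path, block_size=1024 * 1024):
    """Read `path` start to finish in block_size chunks; return (MB read, MB/sec)."""
    total = 0
    start = time.perf_counter()
    with open(path, "rb", buffering=0) as f:   # unbuffered at the Python level
        while True:
            chunk = f.read(block_size)
            if not chunk:                      # empty bytes means end of file
                break
            total += len(chunk)
    elapsed = time.perf_counter() - start
    mb = total / (1024 * 1024)
    return mb, (mb / elapsed if elapsed > 0 else float("inf"))
```

Running it against the same large file on both machines (for example, a copy of one of the database's data files) gives a like-for-like MB/sec figure to compare with the controller and disk setup.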
When I asked about the WAL, I meant whether the WAL is on a separate drive. You need to run a more CPU-intensive benchmark before drawing a conclusion. Write a query that takes more than 8 seconds; then you can really see whether there is a difference. I also think you didn't use the same image of the old server on the new one, so it could be a kernel configuration issue. Have you tested the hardware itself, separately from PostgreSQL? If the hardware gives you better numbers, then PostgreSQL has the problem.

2008/10/10 Shane Ambler <pgsql@sheeky.biz>:
> But is the difference in cpu or disk?
>
> Do the two machines get a similar disk transfer rate?
>
> Same raid card and disks in both machines, do they get the same MB/Sec?
> (as opposed to on-board controllers)
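The suggestion above is to run a CPU-bound test that doesn't involve PostgreSQL. One minimal way to do that is to time a fixed workload with Python's `timeit` on both servers. This is a hedged sketch. The workload expression is arbitrary and mostly exercises integer arithmetic in the interpreter rather than memory bandwidth. The comparison is only fair if both machines run the same Python build.

```python
import timeit

# Arbitrary fixed integer workload; any deterministic CPU-bound expression would do.
WORKLOAD = "sum(i * i % 7 for i in range(200_000))"

def cpu_benchmark(repeat=5, number=3):
    """Best-of-`repeat` time in seconds to evaluate WORKLOAD `number` times.

    Taking the minimum of several runs reduces noise from other processes.
    """
    return min(timeit.repeat(WORKLOAD, repeat=repeat, number=number))
```

Raising `number` makes each timing run longer, which matches the advice to use a test lasting several seconds rather than a very short one.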
On Fri, 10 Oct 2008, Bart Grantham wrote:
> The Opterons are 2220 SE's, the Xeons are 5450's I think (family "15", model "6").
>
> Xeon - 3056 MB in 2.00 seconds = 1527.85 MB/sec
> Opteron - 4944 MB in 2.00 seconds = 2472.50 MB/sec

There's something wrong with that Xeon system. That number should be about twice as high, and your Xeon should be smoking those Opterons by 25% or so on benchmarks. My Q6600 system at home has a slower bus and clock speed than your Xeon, but hits 3891MB/s on cached hdparm even with the slowest of the RAM I have here. Now that I got the first round right, can I make a double-or-nothing bet that your Xeon system is either a) not running your RAM in dual-channel mode or b) getting throttled by power management?

> Should I cross post to pgsql-performance? Or are most of the people on
> that list here, too?

That would have been a better place to start, but don't bother switching now; there's a lot of overlap. Cross-posting between the lists here is bad, partly because replies from people who belong to only one of the two lists end up bugging the list admins. One of these days I'm going to summarize the main lore on this topic into a Wiki article anyway, which will pull the good stuff out of here regardless of the originating list.

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD
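The quoted hdparm figures work out to a ratio of about 1.62, meaning the Opteron's cached reads are roughly 62% faster. That is comparable to the 50-100% query slowdown reported at the start of the thread. The sketch below parses hdparm timing lines. For the power-management theory, it also extracts the current `cpu MHz` values from `/proc/cpuinfo`-style text so they can be compared with the rated 3.0GHz. Both helpers are hypothetical. An idle core running below its rated speed may simply mean frequency scaling is doing its job, so take readings while the machine is under load.

```python
import re

# hdparm -T timing lines look like "3056 MB in  2.00 seconds = 1527.85 MB/sec".
HDPARM_LINE = re.compile(r"(\d+)\s+MB\s+in\s+([\d.]+)\s+seconds\s*=\s*([\d.]+)\s+MB/sec")

def hdparm_rate(line):
    """Return the MB/sec figure from one hdparm timing line."""
    match = HDPARM_LINE.search(line)
    if match is None:
        raise ValueError(f"not an hdparm timing line: {line!r}")
    return float(match.group(3))

def cpu_mhz(cpuinfo_text):
    """Return every 'cpu MHz' value found in /proc/cpuinfo-style text."""
    return [float(v) for v in
            re.findall(r"^cpu MHz\s*:\s*([\d.]+)", cpuinfo_text, re.MULTILINE)]

xeon = hdparm_rate("Xeon - 3056 MB in 2.00 seconds = 1527.85 MB/sec")
opteron = hdparm_rate("Opteron - 4944 MB in 2.00 seconds = 2472.50 MB/sec")
print(f"Opteron/Xeon cached-read ratio: {opteron / xeon:.2f}")  # prints 1.62
```

On a live Linux box, `cpu_mhz(open("/proc/cpuinfo").read())` returns one value per logical CPU. Readings stuck well below the rated speed during a benchmark would point toward the throttling explanation.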