Discussion: Performance problems on 4/8way Opteron (dualcore) HP DL585
Hi,

does anybody have experience with this machine (4x 875 dual core Opteron CPUs)? We run RHEL 3.0, 32bit, and under high load it is a drag. We mostly run memory-demanding queries. Context switches are pretty much around 20,000 on average, with no cs spikes when we run many processes in parallel. Actually we only see two processes in running state! When there are only a few processes running, context switches go much higher. At the moment we are much slower than with a 4way XEON box (DL580). We are running 8.0.3 compiled with -mathlon flags.

Regards, Dirk
Dirk,

> does anybody have experience with this machine (4x 875 dual core Opteron
> CPUs)?

Nope. I suspect that you may be the first person to report in on dual-cores. There may be special compile issues with dual-cores that we've not yet encountered.

> We run RHEL 3.0, 32bit and under high load it is a drag. We
> mostly run memory demanding queries. Context switches are pretty much
> around 20,000 on the average, no cs spikes when we run many processes in
> parallel. Actually we only see two processes in running state! When
> there are only a few processes running context switches go much higher.
> At the moment we are much slower than with a 4way XEON box (DL580).

Um, that was a bit incoherent. Are you seeing a CS storm or aren't you?

--
--Josh

Josh Berkus
Aglio Database Solutions
San Francisco
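To put a number on the "CS storm or not" question, one option is to sample the cumulative `ctxt` counter in `/proc/stat` twice and divide by the interval. The following is only a minimal sketch: the function names and the two snapshots are hypothetical, made up for illustration, and `vmstat` reports the same figure in its `cs` column.

```python
def read_ctxt(stat_text):
    """Return the cumulative context-switch count from /proc/stat-style text."""
    for line in stat_text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[0] == "ctxt":
            return int(parts[1])
    raise ValueError("no 'ctxt' line found")


def ctxt_rate(before_text, after_text, seconds):
    """Context switches per second between two snapshots taken `seconds` apart."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return (read_ctxt(after_text) - read_ctxt(before_text)) / seconds


# Hypothetical snapshots taken five seconds apart (illustrative values only).
BEFORE = "cpu  100 0 50 1000\nctxt 1000000\nbtime 1122600000\n"
AFTER = "cpu  150 0 80 1400\nctxt 1100000\nbtime 1122600000\n"

print(ctxt_rate(BEFORE, AFTER, 5))
```

On a live machine the two snapshots would come from reading `/proc/stat` before and after a `time.sleep()`; comparing the rate under light and heavy load shows whether context switches really drop as more backends run.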
On Fri, 2005-07-29 at 10:46 -0700, Josh Berkus wrote:
> Dirk,
>
> > does anybody have experience with this machine (4x 875 dual core Opteron
> > CPUs)?

I'm using dual 275s without problems.

> Nope. I suspect that you may be the first person to report in on
> dual-cores. There may be special compile issues with dual-cores that
> we've not yet encountered.

Doubtful. However you could see improvements using recent Linux kernel code. There have been some patches for optimizing scheduling and memory allocations.

However, if you are running this machine in 32-bit mode, why did you bother paying $14,000 for your CPUs? You will get FAR better performance in 64-bit mode. 64-bit mode will give you 30-50% better performance on PostgreSQL loads, in my experience. Also, if I remember correctly, the 32-bit x86 kernel doesn't understand Opteron NUMA topology, so you may be seeing poor memory allocation decisions.

-jwb

> > We run RHEL 3.0, 32bit and under high load it is a drag. We
> > mostly run memory demanding queries. Context switches are pretty much
> > around 20,000 on the average, no cs spikes when we run many processes in
> > parallel. Actually we only see two processes in running state! When
> > there are only a few processes running context switches go much higher.
> > At the moment we are much slower than with a 4way XEON box (DL580).
>
> Um, that was a bit incoherent. Are you seeing a CS storm or aren't you?
On 7/29/05 10:46 AM, "Josh Berkus" <josh@agliodbs.com> wrote:
>> does anybody have experience with this machine (4x 875 dual core Opteron
>> CPUs)?
>
> Nope. I suspect that you may be the first person to report in on
> dual-cores. There may be special compile issues with dual-cores that
> we've not yet encountered.

There was recently a discussion of similar types of problems on a couple of the supercomputing lists, regarding surprisingly substandard performance from large dual-core Opteron installations.

The problem as I remember it boiled down to the Linux kernel handling memory/process management very badly on large dual core systems -- pathological NUMA behavior. However, this problem has apparently been fixed in Linux v2.6.12+, and using the more recent kernel on large dual core systems generated *massive* performance improvements on these systems for the individuals with this issue. Using the patched kernel, one gets the performance most people were expecting.

The v2.6.12+ kernels are a bit new, but they contain a very important performance patch for systems like the one above. It would definitely be worth testing if possible.

J. Andrew Rogers
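A quick way to see which side of the v2.6.12 line a given box falls on is to compare its kernel release string against that threshold. This is a rough sketch only: the helper names are hypothetical, the 2.6.12 cut-off is simply the version mentioned in this thread, and a vendor kernel may carry backported patches (or lack them) regardless of its version number, so the result is a hint rather than proof.

```python
import re


def parse_kernel_version(release):
    """Return the leading numeric part of a kernel release string as a 3-tuple."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if match is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    # A missing third component (e.g. "2.6") is treated as 0.
    return tuple(int(part) if part is not None else 0 for part in match.groups())


def at_least(release, minimum=(2, 6, 12)):
    """True if the release is at or past `minimum` (default taken from the thread)."""
    return parse_kernel_version(release) >= minimum


# Release strings below are examples only.
for release in ("2.4.21-32.ELsmp", "2.6.11-1.1369_FC4", "2.6.12.3"):
    print(release, at_least(release))
```

On the machine itself, `platform.release()` from the standard library returns the string to pass in.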
I've been running 2x265's on FC4 64-bit (2.6.11-1+) and it's been running perfectly. With NUMA enabled, it runs incrementally faster than with NUMA off. Performance is definitely better than the 2x244s they replaced -- how much faster, I can't measure, since I don't have the transaction volume to compare to previous benchmarks. I do see more consistently low response times, though; I can also run Apache on the server for faster HTML generation times, and top seems to show in general twice as much CPU power idle on average (25% per 265 core versus 50% per 244).

I haven't investigated the 2.6.12+ kernel updates yet -- I probably will do our development servers first to give it a test.

> The problem as I remember it boiled down to the Linux kernel handling
> memory/process management very badly on large dual core systems --
> pathological NUMA behavior. However, this problem has apparently been fixed
> in Linux v2.6.12+, and using the more recent kernel on large dual core
> systems generated *massive* performance improvements on these systems for
> the individuals with this issue. Using the patched kernel, one gets the
> performance most people were expecting.
Hi Jeff,

which box are you running precisely, and which OS/kernel?

We need to run 32bit because we need failover to a 32-bit XEON system (DL580). If this does not work out, we probably need to switch to 64 bit (dump/restore) and run another 64bit failover box too.

Regards, Dirk

Jeffrey W. Baker wrote:
> On Fri, 2005-07-29 at 10:46 -0700, Josh Berkus wrote:
>
>>Dirk,
>>
>>>does anybody have experience with this machine (4x 875 dual core Opteron
>>>CPUs)?
>
> I'm using dual 275s without problems.
>
>>Nope. I suspect that you may be the first person to report in on
>>dual-cores. There may be special compile issues with dual-cores that
>>we've not yet encountered.
>
> Doubtful. However you could see improvements using recent Linux kernel
> code. There have been some patches for optimizing scheduling and memory
> allocations.
>
> However, if you are running this machine in 32-bit mode, why did you
> bother paying $14,000 for your CPUs? You will get FAR better
> performance in 64-bit mode. 64-bit mode will give you 30-50% better
> performance on PostgreSQL loads, in my experience. Also, if I remember
> correctly, the 32-bit x86 kernel doesn't understand Opteron NUMA
> topology, so you may be seeing poor memory allocation decisions.
>
> -jwb
>
>>>We run RHEL 3.0, 32bit and under high load it is a drag. We
>>>mostly run memory demanding queries. Context switches are pretty much
>>>around 20,000 on the average, no cs spikes when we run many processes in
>>>parallel. Actually we only see two processes in running state! When
>>>there are only a few processes running context switches go much higher.
>>>At the moment we are much slower than with a 4way XEON box (DL580).
>>
>>Um, that was a bit incoherent. Are you seeing a CS storm or aren't you?
Does anybody know whether Red Hat already supports this patch in an enterprise version?

Regards, Dirk

J. Andrew Rogers wrote:
> On 7/29/05 10:46 AM, "Josh Berkus" <josh@agliodbs.com> wrote:
>
>>>does anybody have experience with this machine (4x 875 dual core Opteron
>>>CPUs)?
>>
>>Nope. I suspect that you may be the first person to report in on
>>dual-cores. There may be special compile issues with dual-cores that
>>we've not yet encountered.
>
> There was recently a discussion of similar types of problems on a couple of
> the supercomputing lists, regarding surprisingly substandard performance
> from large dual-core opteron installations.
>
> The problem as I remember it boiled down to the Linux kernel handling
> memory/process management very badly on large dual core systems --
> pathological NUMA behavior. However, this problem has apparently been fixed
> in Linux v2.6.12+, and using the more recent kernel on large dual core
> systems generated *massive* performance improvements on these systems for
> the individuals with this issue. Using the patched kernel, one gets the
> performance most people were expecting.
>
> The v2.6.12+ kernels are a bit new, but they contain a very important
> performance patch for systems like the one above. It would definitely be
> worth testing if possible.
>
> J. Andrew Rogers
On 7/30/05 12:57 AM, "William Yu" <wyu@talisys.com> wrote:
> I haven't investigated the 2.6.12+ kernel updates yet -- I probably will
> do our development servers first to give it a test.

The kernel updates make the NUMA code dual-core aware, which apparently makes a big difference in some cases but not in others. It makes some sense, since multi-processor multi-core machines will have two different types of non-locality instead of just one that need to be managed. Prior to the v2.6.12 patches, a dual-core dual-proc machine was viewed as a quad-proc machine.

The closest thing to a supported v2.6.12 kernel that I know of is FC4, which is not really supported in the enterprise sense of course.

J. Andrew Rogers
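The difference between "four processors" and "two sockets with two cores each" can be made visible by grouping logical CPUs by their physical package. The sketch below does this from `/proc/cpuinfo`-style text. It assumes the kernel reports a `physical id` field for each processor entry, which not every kernel or architecture does; the function name and the sample text are hypothetical and only illustrate the grouping, not what any particular kernel's scheduler does with it.

```python
from collections import defaultdict


def cores_per_package(cpuinfo_text):
    """Map each 'physical id' (socket) to the sorted logical CPU numbers on it."""
    packages = defaultdict(list)
    # Processor entries in /proc/cpuinfo are separated by blank lines.
    for block in cpuinfo_text.strip().split("\n\n"):
        fields = {}
        for line in block.splitlines():
            key, sep, value = line.partition(":")
            if sep:
                fields[key.strip()] = value.strip()
        # Entries without a 'physical id' are skipped (assumption: field present).
        if "processor" in fields and "physical id" in fields:
            packages[int(fields["physical id"])].append(int(fields["processor"]))
    return {pkg: sorted(cpus) for pkg, cpus in packages.items()}


# Hypothetical excerpt for a dual-socket, dual-core machine.
SAMPLE = """processor : 0
physical id : 0

processor : 1
physical id : 0

processor : 2
physical id : 1

processor : 3
physical id : 1
"""

print(cores_per_package(SAMPLE))
```

Two logical CPUs in the same package share a memory controller, while CPUs in different packages do not, which is the second kind of non-locality described above.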
A 4xDC would be far more sensitive to poor NUMA code than 2xDC so I'm not surprised I don't see performance issues on our 2xDC w/ < 2.6.12.

J. Andrew Rogers wrote:
> On 7/30/05 12:57 AM, "William Yu" <wyu@talisys.com> wrote:
>
>>I haven't investigated the 2.6.12+ kernel updates yet -- I probably will
>>do our development servers first to give it a test.
>
> The kernel updates make the NUMA code dual-core aware, which apparently
> makes a big difference in some cases but not in others. It makes some
> sense, since multi-processor multi-core machines will have two different
> types of non-locality instead of just one that need to be managed. Prior to
> the v2.6.12 patches, a dual-core dual-proc machine was viewed as a quad-proc
> machine.
>
> The closest thing to a supported v2.6.12 kernel that I know of is FC4, which
> is not really supported in the enterprise sense of course.
>
> J. Andrew Rogers