Strange behavior: pgbench and new Linux kernels
From:    Greg Smith
Subject: Strange behavior: pgbench and new Linux kernels
Msg-id:  Pine.GSO.4.64.0804170230180.26917@westnet.com
Replies: Re: Strange behavior: pgbench and new Linux kernels
         (Matthew <matthew@flymine.org>)
         Re: Strange behavior: pgbench and new Linux kernels
         ("Jeffrey Baker" <jwbaker@gmail.com>)
List:    pgsql-performance
This week I finished building and installing OSes on some new hardware at home. I have a pretty standard validation routine I go through to make sure PostgreSQL performance is good on any new system I work with. This time around I found a really strange behavior that seems related to changes in Linux. I don't expect any help here, but if someone wanted to replicate my tests I'd be curious to see whether that can be done. I'm telling the story mostly because I think it's an interesting tale of hardware and software validation paranoia, but there's a serious warning here as well for Linux PostgreSQL users.

The motherboard is fairly new, and I couldn't get CentOS 5.1, which ships with kernel 2.6.18, to install with the default settings; I had to drop back to "legacy IDE" mode to install. But that ran everything in old-school IDE mode, with no DMA or anything: "hdparm -Tt" showed a whopping 3MB/s on reads. I pulled down the latest Linux kernel (at the time--only a few hours and I'm already behind), 2.6.24-4, and compiled that with the right modules included. Now I'm getting 70MB/s on simple reads.

Everything looked fine from there until I got to the pgbench select-only tests running PG 8.2.7 (I do 8.2 then 8.3 separately because the checkpoint behavior on write-heavy stuff is so different and I want to see both results). Here's the regular thing I do to see how fast pgbench executes against data in memory (but bigger than the CPU's cache):

-Set shared_buffers=256MB, start the server
-dropdb pgbench (if it's already there)
-createdb pgbench
-pgbench -i -s 10 pgbench (makes about a 160MB database)
-pgbench -S -c <2*cores> -t 10000 pgbench

Since the database was just written out, the whole thing will still be in the shared_buffers cache, so this should execute really fast. This was an Intel quad-core system, I used -c 8, and that got me around 25K transactions/second. Curious to see how high I could push this, I started stepping up the number of clients.
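The routine above can be collected into a short script. This is only a minimal sketch under the assumptions already stated (a local server running with shared_buffers=256MB, pgbench and the client tools on the PATH); it defaults to RUN=echo so it just prints the command sequence for inspection, and you set RUN= (empty) to actually execute it:

```shell
#!/bin/sh
# Sketch of the select-only validation routine described above.
# By default RUN=echo only prints the commands (a dry run); set RUN=
# to run them for real against a server already started with
# shared_buffers=256MB.
RUN=${RUN:-echo}

SCALE=10      # pgbench -i -s 10 => roughly a 160MB database
CLIENTS=8     # 2 * cores on this quad-core box
TRANS=10000

$RUN dropdb pgbench                   # harmless error if it doesn't exist
$RUN createdb pgbench
$RUN pgbench -i -s $SCALE pgbench     # initialize the test tables
$RUN pgbench -S -c $CLIENTS -t $TRANS pgbench   # select-only test run
```

Running it as-is prints the four commands; stepping up the client count then just means re-running the last line with a different -c.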
That's where the weird thing happened. Just by going to 12 clients instead of 8, I dropped to 8.5K TPS, about 1/3 of what I get from 8 clients. It was like that on every test run. With 10 clients it's about 50/50: sometimes I get 25K, sometimes 8.5K. The only thing it seemed to correlate with is that vmstat on the 25K runs showed ~60K context switches/second, while the 8.5K ones had ~44K.

Since I've never seen this before, I went back to my old benchmark system with a dual-core AMD processor. That started with CentOS 4 and kernel 2.6.9, but I happened to install kernel 2.6.24-3 on there to get better support for my Areca card (it goes bonkers regularly on x64 2.6.9). I never did a thorough performance test of the new kernel, though. Sure enough, the same behavior was there, except without a flip-flop point, just a sharp decline. Check this out:

-bash-3.00$ pgbench -S -c 8 -t 10000 pgbench | grep excluding
tps = 15787.684067 (excluding connections establishing)
tps = 15551.963484 (excluding connections establishing)
tps = 14904.218043 (excluding connections establishing)
tps = 15330.519289 (excluding connections establishing)
tps = 15606.683484 (excluding connections establishing)

-bash-3.00$ pgbench -S -c 12 -t 10000 pgbench | grep excluding
tps = 7593.572749 (excluding connections establishing)
tps = 7870.053868 (excluding connections establishing)
tps = 7714.047956 (excluding connections establishing)

Results are consistent, right? Summarizing that and extending out, here's what the median TPS numbers look like with 3 tests at each client load:

-c4:  16621 (increased -t to 20000 here)
-c8:  15551 (all these with t=10000)
-c9:  13269
-c10: 10832
-c11:  8993
-c12:  7714
-c16:  7311
-c32:  7141 (cut -t to 5000 here)

Now, somewhere around here I started thinking about CPU cache coherency, played with forcing tasks to particular CPUs, and tried the deadline I/O scheduler instead of the default CFQ, but nothing made a difference. Want to guess what did? An earlier kernel.
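Taking the median of several runs like this is easy to automate. Here's a small helper of my own devising (not something pgbench provides) that reads pgbench output on stdin, pulls out the "excluding connections establishing" tps lines, and prints the median rounded to a whole number; the example feeds it the three -c12 results quoted above:

```shell
#!/bin/sh
# median_tps: read pgbench output on stdin, print the median of the
# "tps = N (excluding connections establishing)" figures as a whole
# number. Homegrown helper, not part of pgbench itself.
median_tps() {
    grep 'excluding' | awk '{print $3}' | sort -n |
    awk '{ v[NR] = $0 }
         END {
             if (NR % 2) m = v[(NR + 1) / 2];            # odd count
             else        m = (v[NR / 2] + v[NR / 2 + 1]) / 2;  # even count
             printf "%d\n", m
         }'
}

# Example with the three -c12 results shown above; prints 7714.
printf '%s\n' \
  'tps = 7593.572749 (excluding connections establishing)' \
  'tps = 7870.053868 (excluding connections establishing)' \
  'tps = 7714.047956 (excluding connections establishing)' | median_tps
```

In practice you'd pipe a loop of pgbench runs straight into it, e.g. `for i in 1 2 3; do pgbench -S -c 12 -t 10000 pgbench; done | median_tps`.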
These results are the same test as above, on the same hardware; the only difference is that I used the standard CentOS 4 2.6.9-67.0.4 kernel instead of 2.6.24-3:

-c4:  18388
-c8:  15760
-c9:  15814 (one result of 12623)
-c12: 14339 (one result of 11105)
-c16: 14148
-c32: 13647 (one result of 10062)

We get the usual bit of pgbench flakiness, but the earlier kernel is faster in every case, degrades only slowly as clients increase, and is almost twice as fast here in a typical high-client-load case.

So in the case of this simple benchmark, I see an enormous performance regression in the newest Linux kernel compared to a much older one. I need to do some version bisection to nail it down for sure, but my guess is that the change to the Completely Fair Scheduler in 2.6.23 is to blame. The recent FreeBSD 7.0 PostgreSQL benchmarks at http://people.freebsd.org/~kris/scaling/7.0%20and%20beyond.pdf showed an equally brutal performance drop going from 2.6.22 to 2.6.23 (see page 16) at around the same client load on a read-only test. My initial guess is that I'm getting nailed by a similar issue here.

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD
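As a quick arithmetic check on that "almost twice as fast" claim, compare the -c32 medians from the two kernel tables above (13647 TPS on 2.6.9-67.0.4 vs 7141 TPS on 2.6.24-3):

```shell
#!/bin/sh
# Ratio of the -c32 median TPS on the old 2.6.9 kernel (13647) to the
# new 2.6.24-3 kernel (7141), figures taken from the tables above.
awk 'BEGIN { printf "%.2f\n", 13647 / 7141 }'
```

That prints 1.91, so "almost twice as fast" holds up at the 32-client load.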