Discussion: dump time increase by 1h with new kernel
[I got no response on -general for a few days so I'm trying here]

When we upgraded from linux-2.6.24 to linux-2.6.27, our pg_dump duration increased by 20% from 5 hours to 6. My first attempt at resolution was to boot with elevator=deadline. However that's actually the default IO scheduler in both kernels.

The two dmesg's are at:
https://www.norchemlab.com/tmp/linux-2.6.24-22.45-server
https://www.norchemlab.com/tmp/linux-2.6.27-14.41-server

The database partition is: xfs / lvm / aic79xx / scsi.

Booting back into the .24 kernel brings the pg_dump back down to 5 hours (for daily 20GB output compressed by pg_dump -Fc).

Does anyone know what might be different which could cause such a drastic change?

Thanks,
Justin
Justin Pryzby <justinp@norchemlab.com> writes:
> When we upgraded from linux-2.6.24 to linux-2.6.27, our pg_dump
> duration increased by 20% from 5 hours to 6.

Wouldn't be the first time the kernel guys broke something :-(
I think a complaint to your kernel supplier is in order.

In a coincidence, the first item in the changelog for this week's Fedora kernel update is:

* Fri Sep 25 2009 Chuck Ebbert <cebbert@redhat.com> 2.6.30.8-64
- Fix serious CFQ performance regression.

This is surely not the exact same issue you are seeing, but it does illustrate that performance regressions in the kernel aren't unheard-of.

regards, tom lane
On Fri, Oct 2, 2009 at 7:48 PM, Justin Pryzby <justinp@norchemlab.com> wrote:
> [I got no response on -general for a few days so I'm trying here]
>
> When we upgraded from linux-2.6.24 to linux-2.6.27, our pg_dump
> duration increased by 20% from 5 hours to 6. My first attempt at
> resolution was to boot with elevator=deadline. However that's
> actually the default IO scheduler in both kernels.

To add to what Tom said, when you post this to something like kernel hackers, it would really help if you could test the two other kernels between these two to tell them exactly which one(s) cause the regression(s). That and how you compiled them or where they came from otherwise (fc, Ubuntu dev, yada)
On Fri, 2 Oct 2009, Justin Pryzby wrote:
> When we upgraded from linux-2.6.24 to linux-2.6.27, our pg_dump
> duration increased by 20% from 5 hours to 6.

Why 2.6.27 of all versions? It's one of the versions I skipped altogether as looking like a mess; after CFS broke everything in 2.6.23, I went right from 2.6.22 to 2.6.28 before I found things usable again. The first thing you're going to hear if you try to report this in kernel land is "is it still slow on 2.6.[last stable|head]?".

If you can try both kernel versions, the other thing you really should do is collect data from "vmstat 1" during the pg_dump period. It would help narrow what area is slower.

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD
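One way to follow the "vmstat 1" suggestion is to wrap the dump in a small helper that logs one sample per second for exactly as long as the dump runs. This is only a sketch: the function name with_vmstat and the VMSTAT_CMD override are hypothetical, not something from this thread.

```shell
#!/bin/sh
# Hypothetical helper (not from the thread): run a command while a
# background sampler logs to a file, then stop the sampler.
# Usage: with_vmstat LOGFILE COMMAND [ARGS...]
with_vmstat() {
    log=$1
    shift
    # Default sampler is "vmstat 1" (one sample per second).
    # VMSTAT_CMD is an assumed override hook, handy for testing.
    ${VMSTAT_CMD:-vmstat 1} >"$log" 2>&1 &
    sampler_pid=$!
    rc=0
    "$@" || rc=$?
    # Stop the sampler; ignore errors if it has already exited.
    kill "$sampler_pid" 2>/dev/null || true
    wait "$sampler_pid" 2>/dev/null || true
    return "$rc"
}
```

Run once under each kernel, for example `with_vmstat vmstat-2.6.27.log pg_dump -Fc -f /path/to/dump mydb` (the log name, dump path and database name are placeholders), and the two logs can then be compared side by side.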
Hi Everyone

On Fri, Oct 02, 2009 at 12:58:12PM -0700, Justin Pryzby wrote:
> When we upgraded from linux-2.6.24 to linux-2.6.27, our pg_dump
> duration increased by 20% from 5 hours to 6. My first attempt at

On Sat, Oct 03, 2009 at 11:31:11PM -0600, Scott Marlowe wrote:
> between these two to tell them exactly which one(s) causes the
> regression(s). That and how you compiled them or where they came from

These are both ubuntu kernels, and there's none in-between available from their repository for testing. I could compile myself, but it would have to include all the ubuntu patches (apparmor in particular)..

On Thu, Oct 08, 2009 at 01:14:52AM -0400, Greg Smith wrote:
> report this in kernel land is "is it still slow on 2.6.[last
> stable|head]?".

I could try *newer* kernels, but not from newer ubuntu releases. This machine is running ubuntu 8.04 with select packages from 8.10. However it's running postgres 8.2, which isn't included with either of those releases.. I tried dumping with 8.3 pg_dump, which had only minimal effect.

> If you can try both kernel versions, the other thing you really
> should do is collect data from "vmstat 1" during the pg_dump period.
> It would help narrow what area is slower.

I have sar running on the machine, and came up with this:

07 and 30 are days of the month, 7 is last night, 30 is from September. pg_dump starts around 9pm. tail -15 gets us to about 10pm. On sep 30, the machine was running 2.6.24 and the dump ran until after 2am. Since we rebooted, it runs .27 and the dump runs until after 4am. So the last column shows higher rate values for every metric on the 30th (under .24) except for intr.
for a in user system iowait cswch tps rtps wtps intr; do
  for b in 07 30; do
    eval t$b='`sadf /var/log/sysstat/sa$b -- -A |grep -wi "$a" |tail -15 |awk "{sum+=\\$NF}END{print sum/NR}"`'
  done
  printf "%-6s %4.4s %4.4s %5.5s\n" $a $t30 $t07 `calc $t30/$t07`
done

        s30  o07 s30/o07
user   13.9 6.85 ~2.03
system 0.56 0.37 ~1.52
iowait 0.61 0.52 ~1.16
cswch  873. 672. ~1.29
intr   121. 396. ~0.30
tps    412. 346. ~1.19
rtps   147. 143. ~1.02
wtps   264. 202. ~1.30

Not sure if sar can provide other data included by vmstat: IO merged in/out, {,soft}irq ticks?

Thanks,
Justin
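The averaging step in the one-liner above can be pulled out into a small filter, which avoids the eval and nested quoting. This is a sketch under the same assumption the original makes: each sadf line carries the metric name as a separate word and the value as its last field. The function name avg_last_field is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: average the last field of the final N lines
# on stdin that contain METRIC as a whole word (case-insensitive).
# Usage: ... | avg_last_field METRIC [N]    (N defaults to 15)
avg_last_field() {
    metric=$1
    n=${2:-15}
    grep -wi -- "$metric" | tail -n "$n" |
        awk '{ sum += $NF } END { if (NR) print sum / NR }'
}
```

With this, one cell of the table would be computed as `sadf /var/log/sysstat/sa30 -- -A | avg_last_field iowait`. Nothing is printed when no line matches, which avoids a division by zero in awk.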
On Thu, 2009-10-08 at 10:44 -0700, Justin T Pryzby wrote:
> Hi Everyone

Did your scheduler change between the kernel versions?

> Not sure if sar can provide other data included by vmstat: IO merged
> in/out, {,soft}irq ticks?
>
> Thanks,
> Justin
>

--
PostgreSQL.org Major Contributor
Command Prompt, Inc: http://www.commandprompt.com/ - 503.667.4564
Consulting, Training, Support, Custom Development, Engineering
If the world pushes look it in the eye and GRR. Then push back harder. - Salamander
On Thu, Oct 08, 2009 at 10:49:37AM -0700, Joshua D. Drake wrote:
> On Thu, 2009-10-08 at 10:44 -0700, Justin T Pryzby wrote:
> > Hi Everyone
> Did your scheduler change between the kernel versions?

No, it's deadline for both.

Justin
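For anyone wanting to confirm the scheduler on a running system rather than from boot parameters: the kernel lists the available schedulers in /sys/block/<device>/queue/scheduler and marks the active one with square brackets. A minimal sketch for extracting it follows; the function name active_scheduler is hypothetical and the device name in the usage note is a placeholder.

```shell
#!/bin/sh
# Hypothetical helper: print the scheduler name shown in [brackets]
# in a sysfs scheduler file such as /sys/block/sda/queue/scheduler.
# Usage: active_scheduler FILE
active_scheduler() {
    sed -n 's/.*\[\(.*\)\].*/\1/p' "$1"
}
```

For example, `active_scheduler /sys/block/sda/queue/scheduler` can be run under each kernel; sda stands in for whichever disks sit under the LVM volume.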
Justin T Pryzby <justinp@norchemlab.com> wrote:
> On Thu, Oct 08, 2009 at 10:49:37AM -0700, Joshua D. Drake wrote:
>> Did your scheduler change between the kernel versions?
> No, it's deadline for both.

How about write barriers? I had a kernel upgrade which turned them on for xfs, with unfortunate performance impacts. The xfs docs explicitly recommend disabling it if you have a battery backed cache in your RAID controller.

-Kevin
On Thu, Oct 08, 2009 at 03:37:39PM -0500, Kevin Grittner wrote:
> Justin T Pryzby <justinp@norchemlab.com> wrote:
> > On Thu, Oct 08, 2009 at 10:49:37AM -0700, Joshua D. Drake wrote:
> >> Did your scheduler change between the kernel versions?
> > No, it's deadline for both.
>
> How about write barriers? I had a kernel upgrade which turned them on

Doesn't seem to be that either :(

[ 55.120073] Filesystem "dm-0": Disabling barriers, trial barrier write failed
crb2-db2 (254, 0)
/dev/mapper/crb2-db2 on /media/database

Justin
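The dmesg line above shows the kernel turning barriers off by itself after a failed trial write. A complementary check is whether the filesystem was explicitly mounted with an option such as nobarrier, by reading the options column of /proc/mounts. This is a sketch only: the helper name mount_has_option is hypothetical, and whether nobarrier is accepted depends on the kernel and XFS version in use.

```shell
#!/bin/sh
# Hypothetical helper: succeed if MOUNTPOINT is listed with OPTION in a
# /proc/mounts-style file (fields: device mountpoint fstype options ...).
# Usage: mount_has_option MOUNTPOINT OPTION [MOUNTS_FILE]
mount_has_option() {
    awk -v mp="$1" -v opt="$2" '
        $2 == mp {
            n = split($4, o, ",")
            for (i = 1; i <= n; i++) if (o[i] == opt) found = 1
        }
        END { exit !found }' "${3:-/proc/mounts}"
}
```

For the mount point in this thread that would be `mount_has_option /media/database nobarrier && echo "barriers disabled by mount option"`.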