Thread: dump time increase by 1h with new kernel

dump time increase by 1h with new kernel

From
Justin Pryzby
Date:
[I got no response on -general for a few days so I'm trying here]

When we upgraded from linux-2.6.24 to linux-2.6.27, our pg_dump
duration increased by 20% from 5 hours to 6.  My first attempt at
resolution was to boot with elevator=deadline.  However that's
actually the default IO scheduler in both kernels.

The two dmesg's are at:
https://www.norchemlab.com/tmp/linux-2.6.24-22.45-server
https://www.norchemlab.com/tmp/linux-2.6.27-14.41-server

The database partition is: xfs / lvm / aic79xx / scsi.

Booting back into the .24 kernel brings the pg_dump back down to 5
hours (for daily 20GB output compressed by pg_dump -Fc).

Does anyone know what might be different which could cause such a
drastic change?

Thanks,
Justin

Re: dump time increase by 1h with new kernel

From
Tom Lane
Date:
Justin Pryzby <justinp@norchemlab.com> writes:
> When we upgraded from linux-2.6.24 to linux-2.6.27, our pg_dump
> duration increased by 20% from 5 hours to 6.

Wouldn't be the first time the kernel guys broke something :-(
I think a complaint to your kernel supplier is in order.

In a coincidence, the first item in the changelog for this week's
Fedora kernel update is:

* Fri Sep 25 2009 Chuck Ebbert <cebbert@redhat.com> 2.6.30.8-64
- Fix serious CFQ performance regression.

This is surely not the exact same issue you are seeing, but it does
illustrate that performance regressions in the kernel aren't
unheard-of.

            regards, tom lane

Re: dump time increase by 1h with new kernel

From
Scott Marlowe
Date:
On Fri, Oct 2, 2009 at 7:48 PM, Justin Pryzby <justinp@norchemlab.com> wrote:
> [I got no response on -general for a few days so I'm trying here]
>
> When we upgraded from linux-2.6.24 to linux-2.6.27, our pg_dump
> duration increased by 20% from 5 hours to 6.  My first attempt at
> resolution was to boot with elevator=deadline.  However that's
> actually the default IO scheduler in both kernels.

To add to what Tom said, when you post this to something like kernel
hackers, it would really help if you could test the two other kernels
between these two to tell them exactly which one(s) cause the
regression(s).  That, and how you compiled them or where they came from
otherwise (fc, Ubuntu dev, yada)

Re: dump time increase by 1h with new kernel

From
Greg Smith
Date:
On Fri, 2 Oct 2009, Justin Pryzby wrote:

> When we upgraded from linux-2.6.24 to linux-2.6.27, our pg_dump
> duration increased by 20% from 5 hours to 6.

Why 2.6.27 of all versions?  It's one of the versions I skipped altogether
as looking like a mess.  After CFS broke everything in 2.6.23, I went right
from 2.6.22 to 2.6.28 before I found things usable again.  The first thing
you're going to hear if you try to report this in kernel land is "is it
still slow on 2.6.[last stable|head]?".

If you can try both kernel versions, the other thing you really should do
is collect data from "vmstat 1" during the pg_dump period.  It would help
narrow what area is slower.

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD
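
A minimal sketch of the "vmstat 1" capture suggested above, assuming a POSIX shell. The function name with_vmstat and the VMSTAT_CMD override are hypothetical names introduced only for this illustration; they are not part of vmstat or pg_dump.

```shell
#!/bin/sh
# Sketch: log a system sampler (default "vmstat 1") while a command runs.
# VMSTAT_CMD is a hypothetical override hook used only in this sketch.
with_vmstat() {
    log=$1
    shift
    # Start the sampler in the background, one report per second.
    ${VMSTAT_CMD:-vmstat 1} >"$log" 2>&1 &
    sampler_pid=$!
    # Run the command being measured and remember its exit status.
    status=0
    "$@" || status=$?
    # Stop the sampler; ignore errors if it has already exited.
    kill "$sampler_pid" 2>/dev/null || true
    wait "$sampler_pid" 2>/dev/null || true
    return "$status"
}
```

It could be used as `with_vmstat vmstat-2.6.27.log pg_dump -Fc -f nightly.dump mydb`, once under each kernel, where the log name, dump file name, and database name are placeholders. Comparing the two logs side by side would show whether the slower run spends its time in CPU, I/O wait, or context switching.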

Re: dump time increase by 1h with new kernel

From
Justin T Pryzby
Date:
Hi Everyone

On Fri, Oct 02, 2009 at 12:58:12PM -0700, Justin Pryzby wrote:
> When we upgraded from linux-2.6.24 to linux-2.6.27, our pg_dump
> duration increased by 20% from 5 hours to 6.  My first attempt at

On Sat, Oct 03, 2009 at 11:31:11PM -0600, Scott Marlowe wrote:
> between these two to tell them exactly which one(s) causes the
> regression(s).  That and how you compiled them or where they came from
These are both Ubuntu kernels, and there are none in between available
from their repository for testing.  I could compile one myself, but it
would have to include all the Ubuntu patches (AppArmor in
particular).

On Thu, Oct 08, 2009 at 01:14:52AM -0400, Greg Smith wrote:
> report this in kernel land is "is it still slow on 2.6.[last
> stable|head]?".
I could try *newer* kernels, but not from newer Ubuntu releases.  This
machine is running Ubuntu 8.04 with select packages from 8.10.
However, it's running postgres 8.2, which isn't included with either of
those releases.  I tried dumping with 8.3 pg_dump, which had only
minimal effect.

> If you can try both kernel versions, the other thing you really
> should do is collect data from "vmstat 1" during the pg_dump period.
> It would help narrow what area is slower.
I have sar running on the machine, and came up with this:

07 and 30 are days of the month: 07 is last night (October 7) and 30 is
September 30.  pg_dump starts around 9pm, and tail -15 gets us to about
10pm.  On Sep 30, the machine was running 2.6.24 and the dump ran
until after 2am.  Since we rebooted, it runs .27 and the dump runs
until after 4am.  The last column is the ratio of the two averages; it
shows higher rate values for every metric on the 30th (under .24)
except for intr.

for a in user system iowait cswch tps rtps wtps intr; do
  for b in 07 30; do
    eval t$b='`sadf /var/log/sysstat/sa$b -- -A | grep -wi "$a" | tail -15 | awk "{sum+=\\$NF}END{print sum/NR}"`'
  done
  printf "%-6s %4.4s %4.4s %5.5s\n" $a $t30 $t07 `calc $t30/$t07`
done
       s30  o07  s30/o07
user   13.9 6.85 ~2.03
system 0.56 0.37 ~1.52
iowait 0.61 0.52 ~1.16
cswch  873. 672. ~1.29
intr   121. 396. ~0.30
tps    412. 346. ~1.19
rtps   147. 143. ~1.02
wtps   264. 202. ~1.30

Not sure if sar can provide other data included by vmstat: IO merged
in/out, {,soft}irq ticks?

Thanks,
Justin
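
The averaging step in the one-liner above can be sketched as two small helpers, assuming a POSIX shell with awk. The names avg_last_col and ratio are hypothetical, and ratio uses awk in place of the calc utility from the original command, so its rounding may differ from the table in the last digit.

```shell
#!/bin/sh
# Average the last whitespace-separated field of the final N input lines
# (N defaults to 15, matching "tail -15" in the original command).
avg_last_col() {
    tail -n "${1:-15}" | awk '{ sum += $NF } END { if (NR) print sum / NR }'
}

# Print A / B rounded to two decimals (awk stands in for "calc" here).
ratio() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.2f\n", a / b }'
}
```

For one metric this would look like `sadf /var/log/sysstat/sa30 -- -A | grep -wi user | avg_last_col 15`, repeated for sa07, with the two results passed to ratio.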

Re: dump time increase by 1h with new kernel

From
"Joshua D. Drake"
Date:
On Thu, 2009-10-08 at 10:44 -0700, Justin T Pryzby wrote:
> Hi Everyone


Did your scheduler change between the kernel versions?

> Not sure if sar can provide other data included by vmstat: IO merged
> in/out, {,soft}irq ticks?
>
> Thanks,
> Justin
>
--
PostgreSQL.org Major Contributor
Command Prompt, Inc: http://www.commandprompt.com/ - 503.667.4564
Consulting, Training, Support, Custom Development, Engineering
If the world pushes look it in the eye and GRR. Then push back harder. - Salamander

Re: dump time increase by 1h with new kernel

From
Justin T Pryzby
Date:
On Thu, Oct 08, 2009 at 10:49:37AM -0700, Joshua D. Drake wrote:
> On Thu, 2009-10-08 at 10:44 -0700, Justin T Pryzby wrote:
> > Hi Everyone
> Did your scheduler change between the kernel versions?
No, it's deadline for both.

Justin
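
One way to confirm the active I/O scheduler under each kernel is to read the sysfs scheduler file, where the active entry is shown in square brackets. The helper below is a sketch only; the name active_scheduler is hypothetical, and the device name used in the usage note is an assumption, not taken from this thread.

```shell
#!/bin/sh
# Print the active I/O scheduler from a sysfs scheduler line on stdin,
# e.g. "noop anticipatory [deadline] cfq" prints "deadline".
active_scheduler() {
    sed -n 's/.*\[\([^]]*\)\].*/\1/p'
}
```

For example, `active_scheduler < /sys/block/sda/queue/scheduler`, with sda replaced by the disk that backs the database volume.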

Re: dump time increase by 1h with new kernel

From
"Kevin Grittner"
Date:
Justin T Pryzby <justinp@norchemlab.com> wrote:
> On Thu, Oct 08, 2009 at 10:49:37AM -0700, Joshua D. Drake wrote:
>> Did your scheduler change between the kernel versions?
> No, it's deadline for both.

How about write barriers?  I had a kernel upgrade which turned them on
for xfs, with unfortunate performance impacts.  The xfs docs
explicitly recommend disabling it if you have a battery backed cache
in your RAID controller.

-Kevin

Re: dump time increase by 1h with new kernel

From
Justin T Pryzby
Date:
On Thu, Oct 08, 2009 at 03:37:39PM -0500, Kevin Grittner wrote:
> Justin T Pryzby <justinp@norchemlab.com> wrote:
> > On Thu, Oct 08, 2009 at 10:49:37AM -0700, Joshua D. Drake wrote:
> >> Did your scheduler change between the kernel versions?
> > No, it's deadline for both.
>
> How about write barriers?  I had a kernel upgrade which turned them on
Doesn't seem to be that either :(

[   55.120073] Filesystem "dm-0": Disabling barriers, trial barrier write failed
crb2-db2        (254, 0)
/dev/mapper/crb2-db2 on /media/database

Justin
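
To compare barrier status across the two kernels, the dmesg line quoted above can be picked out mechanically. This is a sketch that assumes the message format shown in this thread; the function name barrier_disabled_devs is hypothetical, and other kernel versions may word the message differently.

```shell
#!/bin/sh
# Read kernel log text on stdin and print the name of each filesystem
# for which XFS reported "Disabling barriers".
barrier_disabled_devs() {
    sed -n 's/.*Filesystem "\([^"]*\)": Disabling barriers.*/\1/p'
}
```

Running `dmesg | barrier_disabled_devs` under each kernel and comparing the output would show whether barrier handling differs between them.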
