Discussion: Linux deadline I/O elevator tuning

From:
Mark Wong
Date:

Hi all,

Has anyone experimented with the Linux deadline parameters and have
some experiences to share?

Regards,
Mark

From:
"Kevin Grittner"
Date:

Mark Wong <> wrote:
> Has anyone experimented with the Linux deadline parameters and
> have some experiences to share?

We've always used elevator=deadline because of posts like this:

http://archives.postgresql.org/pgsql-performance/2008-04/msg00148.php

I haven't benchmarked it, but when one of our new machines seemed a
little sluggish, I found this hadn't been set.  Setting this and
rebooting Linux got us back to our normal level of performance.

-Kevin

From:
Grzegorz Jaśkiewicz
Date:

according to kernel folks, the anticipatory scheduler is even better for dbs.
Oh well, it probably means everyone has to test it on their own at the
end of the day.

From:
Matthew Wakeling
Date:

On Thu, 9 Apr 2009, Grzegorz Jaśkiewicz wrote:
> according to kernel folks, the anticipatory scheduler is even better for dbs.
> Oh well, it probably means everyone has to test it on their own at the
> end of the day.

But the anticipatory scheduler basically makes the huge assumption that
you have one single disc in the system that takes a long time to seek from
one place to another. This assumption fails on both RAID arrays and SSDs,
so I'd be interested to see some numbers to back that one up.

Matthew

--
 import oz.wizards.Magic;
   if (Magic.guessRight())...           -- Computer Science Lecturer

From:
Grzegorz Jaśkiewicz
Date:

On Thu, Apr 9, 2009 at 3:32 PM, Matthew Wakeling <> wrote:
> On Thu, 9 Apr 2009, Grzegorz Jaśkiewicz wrote:
>>
>> according to kernel folks, the anticipatory scheduler is even better for dbs.
>> Oh well, it probably means everyone has to test it on their own at the
>> end of the day.
>
> But the anticipatory scheduler basically makes the huge assumption that you
> have one single disc in the system that takes a long time to seek from one
> place to another. This assumption fails on both RAID arrays and SSDs, so I'd
> be interested to see some numbers to back that one up.

(btw, CFQ is the anticipatory scheduler).

No, they're not. They only assume that the application reads blocks in
synchronous fashion, and that the data read in block N will determine
where block N+1 is going to be.
So to avoid a possible starvation problem, it will wait for a short
amount of time, in the hope that the app will want to read the next
block on disc; putting that request at the end of the queue could
potentially starve it. (That reason alone is why 2.6 Linux feels so
much more responsive.)


--
GJ

From:
"Kevin Grittner"
Date:

Matthew Wakeling <> wrote:
> On Thu, 9 Apr 2009, Grzegorz Jaśkiewicz wrote:
>> according to kernel folks, the anticipatory scheduler is even better
>> for dbs.  Oh well, it probably means everyone has to test it on their
>> own at the end of the day.
>
> But the anticipatory scheduler basically makes the huge assumption
> that you have one single disc in the system that takes a long time
> to seek from one place to another. This assumption fails on both
> RAID arrays and SSDs, so I'd be interested to see some numbers to
> back that one up.

Yeah, we're running on servers with at least 4 effective spindles,
with some servers having several dozen effective spindles.  Assuming
a single spindle is not very effective for us.  The scheduler which
seemed sluggish for our environment was the anticipatory one, so the
kernel guys apparently aren't thinking about the type of load we have
on the hardware we have.

-Kevin

From:
"Kevin Grittner"
Date:

Grzegorz Jaśkiewicz <> wrote:
> (btw, CFQ is the anticipatory scheduler).

These guys have it wrong?:

http://www.wlug.org.nz/LinuxIoScheduler

-Kevin

From:
Matthew Wakeling
Date:

On Thu, 9 Apr 2009, Grzegorz Jaśkiewicz wrote:
> (btw, CFQ is the anticipatory scheduler).

No, CFQ and anticipatory are two completely different schedulers. You can
choose between them.

>> But the anticipatory scheduler basically makes the huge assumption that you
>> have one single disc in the system that takes a long time to seek from one
>> place to another. This assumption fails on both RAID arrays and SSDs, so I'd
>> be interested to see some numbers to back that one up.
>
> So to avoid possible starvation problem, it will wait for short amount
> of time - in hope that app will want to read possibly next block on
> disc, and putting that request at the end of queue could potentially
> starve it. (that reason alone is why 2.6 linux feels so much more
> responsive).

This only actually helps if the assumptions I stated above are true.
Anticipatory is an opportunistic scheduler - it actually withholds
requests from the disc as you describe, in the hope that a block right
next to the last one will be fetched soon. However, if you have more
than one disc, then withholding requests means that you lose the
ability to perform more than one request at once. Also, it assumes
that it will take longer to seek to the next real request than it will
for the program to issue its next request, which is broken on SSDs.
Anticipatory attempts to increase performance by being unfair - it is
essentially the opposite of CFQ.

Matthew

--
 Now you see why I said that the first seven minutes of this section will have
 you looking for the nearest brick wall to beat your head against. This is
 why I do it at the end of the lecture - so I can run.
                                        -- Computer Science lecturer

From:
Grzegorz Jaśkiewicz
Date:

On Thu, Apr 9, 2009 at 3:42 PM, Kevin Grittner
<> wrote:
> Grzegorz Jaśkiewicz <> wrote:
>> (btw, CFQ is the anticipatory scheduler).
>
> These guys have it wrong?:
>
> http://www.wlug.org.nz/LinuxIoScheduler


sorry, I meant it replaced it :) (it's the default now).


--
GJ

From:
Arjen van der Meijden
Date:

On 9-4-2009 16:09 Kevin Grittner wrote:
> I haven't benchmarked it, but when one of our new machines seemed a
> little sluggish, I found this hadn't been set.  Setting this and
> rebooting Linux got us back to our normal level of performance.

Why would you reboot after changing the elevator? For 2.6-kernels, it
can be adjusted on-the-fly for each device separately (echo 'deadline' >
/sys/block/sda/queue/scheduler).
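As a sketch of what that looks like in practice (device names are an assumption; the kernel marks the active elevator with brackets, e.g. "noop anticipatory [deadline] cfq"):

```shell
# Extract the bracketed (active) scheduler name from the sysfs file contents.
current_scheduler() {
  echo "$1" | sed -n 's/.*\[\(.*\)\].*/\1/p'
}

# Print the active elevator for each disk; guarded, since paths vary by system.
for f in /sys/block/sd*/queue/scheduler; do
  [ -r "$f" ] && printf '%s: %s\n' "$f" "$(current_scheduler "$(cat "$f")")"
done
```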

I saw a nice reduction in load and slowness too after switching from
cfq to deadline on a machine that was at its maximum I/O-capacity on a
raid-array.
Apart from deadline, 'noop' should also be interesting for RAID and
SSD-owners, as it basically just forwards the I/O-request to the device
and doesn't do much (if any?) scheduling.

Best regards,

Arjen

From:
Mark Wong
Date:

On Thu, Apr 9, 2009 at 7:00 AM, Mark Wong <> wrote:
> Hi all,
>
> Has anyone experimented with the Linux deadline parameters and have some
> experiences to share?

Hi all,

Thanks for all the responses, but I didn't mean selecting deadline as
much as its parameters such as:

antic_expire
read_batch_expire
read_expire
write_batch_expire
write_expire

Regards,
Mark

From:
"Kevin Grittner"
Date:

Arjen van der Meijden <> wrote:
> On 9-4-2009 16:09 Kevin Grittner wrote:
>> I haven't benchmarked it, but when one of our new machines seemed a
>> little sluggish, I found this hadn't been set.  Setting this and
>> rebooting Linux got us back to our normal level of performance.
>
> Why would you reboot after changing the elevator? For 2.6-kernels,
> it can be adjusted on-the-fly for each device separately
> (echo 'deadline' > /sys/block/sda/queue/scheduler).

On the OS where this happened, not yet an option:

kgrittn@DBUTL-PG:~> cat /proc/version
Linux version 2.6.5-7.315-bigsmp (geeko@buildhost) (gcc version 3.3.3
(SuSE Linux)) #1 SMP Wed Nov 26 13:03:18 UTC 2008
kgrittn@DBUTL-PG:~> ls -l /sys/block/sda/queue/
total 0
drwxr-xr-x  2 root root    0 2009-03-06 15:27 iosched
-rw-r--r--  1 root root 4096 2009-03-06 15:27 nr_requests
-rw-r--r--  1 root root 4096 2009-03-06 15:27 read_ahead_kb

On machines built more recently than the above, I do see a scheduler
entry in the /sys/block/sda/queue/ directory.  I didn't know about
this enhancement, but I'll keep it in mind.  Thanks for the tip!

> Apart from deadline, 'noop' should also be interesting for RAID and
> SSD-owners, as it basically just forwards the I/O-request to the
> device and doesn't do much (if any?) scheduling.

Yeah, I've been tempted to give that a try, given that we have BBU
cache with write-back.  Without a performance problem using the
deadline elevator, though, it hasn't seemed worth the time.

-Kevin

From:
Mark Wong
Date:

On Thu, Apr 9, 2009 at 7:53 AM, Mark Wong <> wrote:
> On Thu, Apr 9, 2009 at 7:00 AM, Mark Wong <> wrote:
>> Hi all,
>>
>> Has anyone experimented with the Linux deadline parameters and have some
>> experiences to share?
>
> Hi all,
>
> Thanks for all the responses, but I didn't mean selecting deadline as
> much as its parameters such as:
>
> antic_expire
> read_batch_expire
> read_expire
> write_batch_expire
> write_expire

And I dumped the parameters for the anticipatory scheduler. :p  Here
are the deadline parameters:

fifo_batch
front_merges
read_expire
write_expire
writes_starved
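(On a 2.6 kernel these live under the device's iosched directory; a quick sketch for dumping them, where the path and device name are assumptions for your system:)

```shell
# Format one tunable as "name = value".
show_tunable() {
  printf '%s = %s\n' "$(basename "$1")" "$2"
}

# Dump every deadline tunable for sda; guarded, since the directory may differ.
for f in /sys/block/sda/queue/iosched/*; do
  [ -r "$f" ] && show_tunable "$f" "$(cat "$f")"
done
```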

Regards,
Mark

From:
Scott Carey
Date:

The anticipatory scheduler gets absolutely atrocious performance for server
workloads on even moderate server hardware.  It is applicable only to single
spindle setups on desktop-like workloads.

Seriously, never use this for a database.  It _literally_ will limit you to
a maximum of 100 random access iops by waiting 10ms for 'nearby' LBA
requests.


For Postgres, deadline, cfq, and noop are the main options.

Noop is good for ssds and a few high performance hardware caching RAID cards
(and only a few of the good ones), and poor otherwise.

Cfq tends to favor random access over sequential access in mixed load
environments and does not tend to favor reads over writes.  Because it
batches its elevator algorithm by requesting process, it becomes less
efficient with lots of spindles where multiple processes have requests from
nearby disk regions.

Deadline tends to favor reads over writes and slightly favor sequential
access over random access (and gets more MB/sec on average as a result in
mixed loads).  It tends to work well for large stand-alone servers and not
as well for desktop/workstation type loads.

I have done a little tuning of the parameters of cfq and deadline, and never
noticed much difference.  I suppose you could shift the deadline biases to
read or write with these.
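For example, shifting the bias toward reads might look roughly like this (the values are purely illustrative, the defaults noted are typical 2.6 values rather than guarantees, and writes to /sys require root):

```shell
# Illustrative only: tighten the read deadline and let writes wait longer,
# biasing the deadline scheduler further toward reads.
echo 250   > /sys/block/sda/queue/iosched/read_expire     # typical default: 500 ms
echo 10000 > /sys/block/sda/queue/iosched/write_expire    # typical default: 5000 ms
echo 4     > /sys/block/sda/queue/iosched/writes_starved  # typical default: 2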


On 4/9/09 7:27 AM, "Grzegorz Jaśkiewicz" <> wrote:

> according to kernel folks, the anticipatory scheduler is even better for dbs.
> Oh well, it probably means everyone has to test it on their own at the
> end of the day.


From:
"Albe Laurenz *EXTERN*"
Date:

Grzegorz Jaskiewicz wrote:
> according to kernel folks, the anticipatory scheduler is even better for dbs.
> Oh well, it probably means everyone has to test it on their own at the
> end of the day.

In my test case, noop and deadline performed well, deadline being a little
better than noop.

Both anticipatory and CFQ sucked big time.

Yours,
Laurenz Albe

From:
Jeff
Date:

On Apr 10, 2009, at 2:47 AM, Albe Laurenz *EXTERN* wrote:

> Grzegorz Jaskiewicz wrote:
>> according to kernel folks, the anticipatory scheduler is even better
>> for dbs.  Oh well, it probably means everyone has to test it on their
>> own at the end of the day.
>
> In my test case, noop and deadline performed well, deadline being a
> little better than noop.
>
> Both anticipatory and CFQ sucked big time.
>

This is my experience as well, I posted about playing with the
scheduler a while ago on -performance, but I can't seem to find it.

If you have a halfway OK raid controller, CFQ is useless. You can fire
up something such as pgbench or pgiosim, fire up an iostat, and then
watch your iops jump high when you flip to noop or deadline and
plummet on cfq.  Try it, it's neat!
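That experiment might be sketched like this (device name, sample intervals, and the assumption that a pgbench load is running in another session are all illustrative):

```shell
# Flip through the schedulers while a pgbench load runs elsewhere, and
# sample iostat under each one to compare r/s and w/s per elevator.
for sched in cfq deadline noop; do
  echo "$sched" > /sys/block/sda/queue/scheduler
  echo "=== $sched ==="
  iostat -x 5 3 sda
done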

--
Jeff Trout <>
http://www.stuarthamm.net/
http://www.dellsmartexitin.com/

From:
"Kevin Grittner"
Date:

Jeff <> wrote:

> If you have a halfway OK raid controller, CFQ is useless. You can
> fire up something such as pgbench or pgiosim, fire up an iostat and
> then watch your iops jump high when you flip to noop or deadline and
> plummet on cfq.

An interesting data point, but not, by itself, conclusive.  One of the
nice things about a good scheduler is that it allows multiple writes
to the OS to be combined into a single write to the controller cache.
I think that having a large OS cache and the deadline elevator allowed
us to use what some considered extremely aggressive background writer
settings without *any* discernible increase in OS output to the disk.
The significant measure is throughput from the application point of
view; if you see that drop as cfq causes the disk I/O to drop, *then*
you've proven your point.

Of course, I'm betting that's what you do see....

-Kevin