Re: Disk latency goes up during certaing pediods

From: German Becker
Subject: Re: Disk latency goes up during certaing pediods
Date:
Msg-id: CALyjCLuDgczTxoqiXKSThqMwwAoPM_4rPE4+p8zr_g=k11crtA@mail.gmail.com
In reply to: Re: Disk latency goes up during certaing pediods  (bricklen <bricklen@gmail.com>)
Responses: Re: Disk latency goes up during certaing pediods  (bricklen <bricklen@gmail.com>)
List: pgsql-admin
To all who might be interested, I have an update on this.
I ran some tests on the old production DB, which was Postgres 8.3 (and only one disk for everything), using pgreplay to run the same queries as the 9.1 server.
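For anyone who wants to reproduce this kind of test, here is a minimal sketch of driving such a replay while watching the disk; the host name, port and file name are illustrative, and the exact logging settings pgreplay needs (statement logging plus connections/disconnections) are listed in its README:

# Capture prerequisites on the source server (see the pgreplay README for the full list):
#   log_statement = 'all', log_connections = on, log_disconnections = on

# Extended device stats every 2 seconds, in the background:
iostat -d -x -k 2 &

# Replay the captured statement log against the 8.3 test box (illustrative names):
pgreplay -h old-8.3-host -p 5432 statements.log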
Here is the output of iostat for the 8.3 server:


Device:         rrqm/s   wrqm/s     r/s     w/s   rsec/s   wsec/s avgrq-sz avgqu-sz   await  svctm  %util
sda               0.00   230.00   22.00  106.00   308.00  2692.00    23.44     0.33    2.58   2.34  30.00

Device:         rrqm/s   wrqm/s     r/s     w/s   rsec/s   wsec/s avgrq-sz avgqu-sz   await  svctm  %util
sda               0.00   166.00    9.50   65.50   160.00  1708.00    24.91     0.29    3.07   2.47  18.50

Device:         rrqm/s   wrqm/s     r/s     w/s   rsec/s   wsec/s avgrq-sz avgqu-sz   await  svctm  %util
sda               0.00   236.50    7.50  118.50   120.00  2984.00    24.63     0.39    3.61   1.55  19.50

Device:         rrqm/s   wrqm/s     r/s     w/s   rsec/s   wsec/s avgrq-sz avgqu-sz   await  svctm  %util
sda               0.00   310.50    7.50  168.50   112.00  3832.00    22.41     0.44    2.50   0.94  16.50

Device:         rrqm/s   wrqm/s     r/s     w/s   rsec/s   wsec/s avgrq-sz avgqu-sz   await  svctm  %util
sda               0.00   321.50   22.50  184.00   320.00  4048.00    21.15     0.88    4.24   1.74  36.00

Device:         rrqm/s   wrqm/s     r/s     w/s   rsec/s   wsec/s avgrq-sz avgqu-sz   await  svctm  %util
sda               0.00   266.00    4.50  155.00    64.00  3356.00    21.44     0.29    1.72   0.88  14.00


Here is the output for 9.1:


Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sdb               0.00     0.00    0.00   85.00     0.00   352.00     8.28     0.29    3.46    0.00    3.46   3.46  29.40

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sdb               0.00     0.50    0.00   97.00     0.00   450.00     9.28     0.39    4.04    0.00    4.04   3.79  36.80

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sdb               0.00     0.00    0.00   87.00     0.00   376.00     8.64     0.29    3.29    0.00    3.29   3.29  28.60

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sdb               0.00     0.50    0.00   92.00     0.00   386.00     8.39     0.32    3.43    0.00    3.43   3.28  30.20

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sdb               0.00     0.00    0.00   89.50     0.00   388.00     8.67     0.33    3.66    0.00    3.66   3.66  32.80

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sdb               0.00     0.00    0.00  104.50     0.00   432.00     8.27     0.38    3.62    0.00    3.62   3.62  37.80


(the columns are not the same, probably because of different Ubuntu versions)
What is notable is the following:

8.3 shows much lower disk utilization, even though the same disk is used for everything (including, for example, the plain log).
I think the main difference is the average request size and w/s: in 9.1 the request sizes seem to be roughly half of those in 8.3, so the w/s is roughly double (see the rough unit conversion below).
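The two iostat versions report throughput in different units (sectors vs. kB), so as a back-of-the-envelope check the first sample of each output above can be put on a common footing; this assumes the usual 512-byte sectors:

# First 8.3 sample: 2692 wsec/s over 106 w/s; first 9.1 sample: 352 wkB/s over 85 w/s.
# avgrq-sz is reported in sectors by both iostat versions.
awk 'BEGIN {
    printf "8.3: %.1f kB written per write request\n", 2692 * 512 / 1024 / 106
    printf "9.1: %.1f kB written per write request\n", 352 / 85
    printf "avgrq-sz: %.1f kB (8.3) vs %.1f kB (9.1)\n", 23.44 * 512 / 1024, 8.28 * 512 / 1024
}'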

I think this might be because of the different wal_level setting: in 8.3 we were using archive (hot_standby was not available yet), and in 9.1 we are using hot_standby. Here is what the documentation says about this:

"It is thought that there is little measurable difference in performance between using hot_standby and archive levels, so feedback is welcome if any production impacts are noticeable."

Although, of course, this might just be the difference between Postgres releases.
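For anyone who wants to confirm or rule that out, a minimal sketch of checking and changing the setting on the 9.1 side (wal_level only exists from 9.0 on; the closest 8.3 knob was archive_mode/archive_command, and the data directory path below is illustrative):

# Show what the 9.1 server is actually running with:
psql -c "SHOW wal_level;"

# Dropping it from hot_standby to archive for an A/B test goes into postgresql.conf
#   wal_level = archive
# and needs a restart:
pg_ctl restart -D /path/to/9.1/data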

Any thoughts or feedback would be appreciated.
Cheers,

Germán Becker


On Tue, Jul 30, 2013 at 1:02 PM, bricklen <bricklen@gmail.com> wrote:
On Tue, Jul 30, 2013 at 8:35 AM, German Becker <german.becker@gmail.com> wrote:
256 was set some time ago, when we were testing a different issue. I read that the only drawback is the amount of time required for recovery, which we tested and it was around 10 seconds for the 256 segments, and higher values mean less disk usage.
Anyway, all these parameters should affect the throughput to the data disks, not the WAL. Am I right?
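(As a rough check of what 256 segments means for WAL disk space: the 9.1 documentation estimates pg_xlog at up to (2 + checkpoint_completion_target) * checkpoint_segments + 1 segments of 16 MB each; the 0.9 completion target below is only an illustrative value.)

awk 'BEGIN { printf "max pg_xlog roughly %.1f GB\n", ((2 + 0.9) * 256 + 1) * 16 / 1024 }'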


checkpoint_completion_target is to help with "checkpoint smoothing", to reduce the spike in disk I/O when shared_buffers are written out. Depesz has a good article about that:  http://www.depesz.com/2010/11/03/checkpoint_completion_target/
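A minimal sketch of where those settings live; the values are illustrative rather than recommendations, and the data directory path is made up:

# postgresql.conf:
#   checkpoint_segments = 256             # as already set here
#   checkpoint_timeout = 5min             # 9.1 default
#   checkpoint_completion_target = 0.9    # spread checkpoint writes over ~90% of the interval
# all three are picked up on a reload:
pg_ctl reload -D /path/to/data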

Do your graphs show any correlation between number of WAL segments getting recycled, and disk I/O spikes? Are you logging checkpoints? If so, you could use the checkpoint times to compare against your I/O graphs. I am by no means an expert here, I'm just throwing out ideas (which might already have been suggested).
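A minimal sketch of collecting that checkpoint timing on 9.1 (no ALTER SYSTEM yet, so the setting goes straight into postgresql.conf; the log file path is illustrative):

# postgresql.conf, picked up on reload:
#   log_checkpoints = on
# then pull the checkpoint lines out of the server log to line up with the I/O graphs:
grep -E "checkpoint (starting|complete)" /var/log/postgresql/postgresql-9.1-main.log

# cumulative checkpoint counters, for a coarser view:
psql -c "SELECT checkpoints_timed, checkpoints_req FROM pg_stat_bgwriter;"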


