Fwd: Problems with pg_locks explosion

From: Armand du Plessis
Subject: Fwd: Problems with pg_locks explosion
Msg-id: CANf99sWN4LP_xF012miq=MghCYtbm369bup9tVT1+BiTOR9sUA@mail.gmail.com
In response to: Re: Problems with pg_locks explosion  (Mark Kirkwood <mark.kirkwood@catalyst.net.nz>)
Responses: Re: Fwd: Problems with pg_locks explosion  (Mark Kirkwood <mark.kirkwood@catalyst.net.nz>)
List: pgsql-performance
Thanks Mark, 

I had a look at the iostat output (on a 5s interval) and pasted a sample below. The utilization and waits seem low. Sample #1 was taken during normal operation; when the locks happen, everything basically drops to 0 across the board (second sample). My (mis)understanding of the IOPS was that it would be 1000 IOPS per volume, and that RAID0 should give me quite a bit higher throughput than a single EBS volume setup. (My naive back-of-the-envelope calculation was #volumes * PIOPS = effective IOPS :/)


Device:         rrqm/s   wrqm/s     r/s     w/s   rsec/s   wsec/s avgrq-sz avgqu-sz   await  svctm  %util
xvda              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
xvdk              0.00     0.00  141.60    0.00  5084.80     0.00    35.91     0.43    3.06   0.51   7.28
xvdj              0.00     0.00  140.40    0.40  4614.40    24.00    32.94     0.49    3.45   0.52   7.28
xvdi              0.00     0.00  123.00    2.00  4019.20   163.20    33.46     0.33    2.63   0.68   8.48
xvdh              0.00     0.00  139.80    0.80  4787.20    67.20    34.53     0.52    3.73   0.55   7.68
xvdg              0.00     0.00  143.80    0.20  4804.80    16.00    33.48     0.86    6.03   0.72  10.40
xvdf              0.00     0.00  146.40    0.00  4758.40     0.00    32.50     0.55    3.76   0.55   8.00
md127             0.00     0.00  831.20    3.40 27867.20   270.40    33.71     0.00    0.00   0.00   0.00

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.00    0.00  100.00    0.00    0.00    0.00

Device:         rrqm/s   wrqm/s     r/s     w/s   rsec/s   wsec/s avgrq-sz avgqu-sz   await  svctm  %util
xvda              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
xvdk              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
xvdj              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
xvdi              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
xvdh              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
xvdg              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
xvdf              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
md127             0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00

It only spikes to 100% util when the server restarts. What bugs me, though, is that Cloud Metrics show 100% throughput on all the volumes despite the output above.
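
As a rough check of the envelope calculation mentioned earlier in this message, the per-volume rates from the first iostat sample can be summed and compared with the nominal figure. This is only a sketch: the r/s and w/s values are copied from that sample, the 1000 PIOPS per volume is the number quoted in this thread, and the function names are made up for illustration.

```python
# (r/s, w/s) per EBS volume, copied from the first iostat sample in this message.
SAMPLE = {
    "xvdk": (141.60, 0.00),
    "xvdj": (140.40, 0.40),
    "xvdi": (123.00, 2.00),
    "xvdh": (139.80, 0.80),
    "xvdg": (143.80, 0.20),
    "xvdf": (146.40, 0.00),
}
PIOPS_PER_VOLUME = 1000  # provisioned IOPS per volume, as stated in the thread


def observed_iops(sample):
    """Sum of reads and writes per second over all member volumes."""
    return sum(r + w for r, w in sample.values())


def nominal_iops(n_volumes, piops=PIOPS_PER_VOLUME):
    """The naive estimate: #volumes * PIOPS."""
    return n_volumes * piops


obs = observed_iops(SAMPLE)
nom = nominal_iops(len(SAMPLE))
print(f"observed {obs:.1f} IOPS of nominal {nom} ({100 * obs / nom:.0f}%)")
```

With these numbers the array is doing roughly 840 requests per second in total, well below the nominal 6000, which matches the low %util values in the sample. How the provider counts I/O operations against the provisioned limit is not covered by this sketch.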

I'm looking into the vm.dirty_background_ratio and vm.dirty_ratio sysctls. Is there any guidance or links available that would be useful as a starting point?

Thanks again for the help, I really appreciate it. 

Regards,

Armand

On Tue, Apr 2, 2013 at 2:11 AM, Mark Kirkwood <mark.kirkwood@catalyst.net.nz> wrote:
In addition to tuning the various Postgres config knobs you may need to look at how your AWS server is set up. If your load is causing an IO stall then *symptoms* of this will be lots of locks...

You have quite a lot of memory (60G), so look at tuning the vm.dirty_background_ratio, vm.dirty_ratio sysctls to avoid trying to *suddenly* write out many gigs of dirty buffers.
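
To get a feel for the sizes involved, the sketch below converts the two ratios into bytes for a 60 GB machine. The 10% and 20% values are assumed here as commonly seen kernel defaults and are not taken from this server. The kernel also applies the ratios to available (free plus reclaimable) memory rather than to total RAM, so the real thresholds would be somewhat lower than these figures.

```python
GIB = 1024 ** 3


def dirty_threshold_bytes(mem_bytes, ratio_percent):
    """Bytes of dirty pages allowed before the given ratio is reached."""
    return mem_bytes * ratio_percent // 100


mem = 60 * GIB  # memory size mentioned in the thread

# Assumed ratios (not read from the actual server).
background = dirty_threshold_bytes(mem, 10)  # vm.dirty_background_ratio
foreground = dirty_threshold_bytes(mem, 20)  # vm.dirty_ratio

print(f"background writeback starts at about {background / GIB:.0f} GiB dirty")
print(f"writers are throttled at about {foreground / GIB:.0f} GiB dirty")
```

Under these assumptions several gigabytes can accumulate before the kernel starts writing anything out, which is the kind of sudden burst the paragraph above warns about.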

Your provisioned volumes are much better than the default AWS ones, but are still not hugely fast (i.e. 1000 IOPS is about 8 MB/s worth of Postgres 8k buffers). So you may need to look at adding more volumes into the array, or adding some separate ones and putting the pg_xlog directory on 'em.
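
The 8 MB/s figure is simply IOPS times page size. The sketch below spells that out and, as an illustration only, estimates how long a backlog of dirty buffers would take to drain. It assumes every I/O moves exactly one 8k page and that the full provisioned rate is sustained; the 6 GiB backlog and the 6-volume array are hypothetical inputs, and the function names are invented for this example.

```python
PAGE_BYTES = 8 * 1024    # Postgres block size (8k), as in the text
IOPS_PER_VOLUME = 1000   # provisioned IOPS per volume, as in the text


def throughput_bytes_per_s(iops, io_bytes=PAGE_BYTES):
    """Upper bound on throughput if every I/O moves exactly one page."""
    return iops * io_bytes


def flush_seconds(dirty_bytes, n_volumes, iops=IOPS_PER_VOLUME):
    """Time to write dirty_bytes across n_volumes at the full provisioned rate."""
    return dirty_bytes / (n_volumes * throughput_bytes_per_s(iops))


per_volume = throughput_bytes_per_s(IOPS_PER_VOLUME)
print(f"one volume: {per_volume / 1e6:.1f} MB/s")

# Hypothetical backlog: 6 GiB of dirty buffers written to a 6-volume array.
print(f"flush time: {flush_seconds(6 * 1024 ** 3, 6):.0f} s")
```

Even in this best case a multi-gigabyte flush takes on the order of minutes, which is why spreading the write load over more volumes or moving pg_xlog elsewhere can help.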

However, before making changes I would recommend using iostat or sar to monitor how the volumes are handling the load (I usually choose a 1 sec granularity and look for 100% util and high, i.e. several hundred ms, awaits). Also iotop could be enlightening.
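
One way to automate that check is to scan captured iostat output for saturated devices. The following is a minimal sketch, assuming the extended column layout shown earlier in this thread (a "Device:" header line with await and %util columns); other sysstat versions name the columns differently. The thresholds and function names are hypothetical choices for illustration, not recommended values.

```python
def parse_iostat(text):
    """Yield one dict per device line, keyed by the header's column names."""
    header = None
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            header = None  # a blank line ends the current report block
            continue
        if fields[0] in ("Device:", "Device"):
            header = fields[1:]
            continue
        if header and len(fields) == len(header) + 1:
            row = dict(zip(header, map(float, fields[1:])))
            row["device"] = fields[0]
            yield row


def flag_stalls(rows, util_pct=99.0, await_ms=200.0):
    """Return device names that look saturated under the assumed thresholds."""
    return [r["device"] for r in rows
            if r["%util"] >= util_pct or r["await"] >= await_ms]


# Two lines taken from the first sample quoted in this thread.
SAMPLE = """\
Device:         rrqm/s   wrqm/s     r/s     w/s   rsec/s   wsec/s avgrq-sz avgqu-sz   await  svctm  %util
xvdg              0.00     0.00  143.80    0.20  4804.80    16.00    33.48     0.86    6.03   0.72  10.40
xvdf              0.00     0.00  146.40    0.00  4758.40     0.00    32.50     0.55    3.76   0.55   8.00
"""

print(flag_stalls(parse_iostat(SAMPLE)))
```

On the sample lines nothing is flagged, since both utilization and await are far below the thresholds. In practice the text would come from something like a saved `iostat -x 1` log rather than an inline string.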

Regards

Mark
