Re: long running commits

From: Kevin Grittner
Subject: Re: long running commits
Date:
Msg-id: 4D6E4078020000250003B2BF@gw.wicourts.gov
In reply to: long running commits  ("Vaughn, Adam (IMS)" <VaughnA@imsweb.com>)
Responses: Re: long running commits  (Robert Treat <rob@xzilla.net>)
List: pgsql-admin
> "Vaughn, Adam (IMS)" <VaughnA@imsweb.com> wrote:
>
> I made all of the changes you mentioned except for the
> shared_buffers (which will require a downtime I have set for
> tonight). I do have another question though, why did you pick 512
> MB for the new setting of shared_buffers? Everything I've ever
> read says that 25% of available RAM is a conservative value for
> shared_buffers.

Well, in general 25% may be the best for overall *throughput*, but
it can often lead to latency spikes, so it depends on what you care
about.  The curve of throughput against shared_buffers has gotten
pretty close to horizontal by around 1GB in a lot of my tests.  It
doesn't, after all, reduce the size of your cache; it affects how
much of the cache access requires an OS call versus staying within
PostgreSQL functions.
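
To make that concrete, the change amounts to something like the
sketch below (the 512MB figure is just the value suggested earlier
in this thread; the conf line is illustrative):

    -- Check what the server is running with now:
    SHOW shared_buffers;

    -- The value itself is set in postgresql.conf and needs a restart:
    --   shared_buffers = 512MB    -- smaller than the 25%-of-RAM rule,
    --                             -- trading a bit of throughput for
    --                             -- steadier latency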

> Also, we had another one of these instances earlier today. During
> the 23 minute commit a single CPU was at 98% and it looked like
> all writes were backed up waiting for the commit to finalize.

What other symptoms did you notice?  Did the context switch rate
jump?  If I remember correctly you are going over the network to
your persistent storage -- did you see anything anomalous on the
network around that time?
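
On the OS side, vmstat and sar are the usual way to watch context
switches and network traffic over time.  On the database side, a
before-and-after snapshot of the background writer stats would show
who was doing the writing (a sketch; these columns are in
pg_stat_bgwriter on the versions current as of this thread):

    -- The deltas between two snapshots show whether the checkpointer
    -- or the backends themselves were flushing dirty buffers.
    SELECT checkpoints_timed, checkpoints_req,
           buffers_checkpoint, buffers_clean,
           buffers_backend, buffers_alloc
      FROM pg_stat_bgwriter;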

Greg Smith has really turned the tuning of this into a science, with
incremental adjustments and some specific monitoring techniques.  I
strongly recommend you look for that in the list archives, in
recordings or slides from his presentations on the topic, or in his
recent book.
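
For reference, the knobs and monitoring he covers are mostly the
ones below (illustrative only; check the docs for your version
before touching anything):

    SHOW checkpoint_segments;           -- WAL volume before a forced checkpoint
    SHOW checkpoint_completion_target;  -- how far to spread checkpoint writes
    SHOW log_checkpoints;               -- 'on' logs timing for every checkpoint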

> During the time our writing never got above 25 MB/s (far less than
> we can handle).

At that rate it should have been able to write your 6GB shared
buffers in about 4.1 minutes.  What did the disk writes *average*
during the incident?  6GB in 23 minutes would be 4.5MB per second.
If your average falls anywhere near that number, you have evidence
that the buffers were mostly dirty for some reason and the
checkpoint glutted on writing them.
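
Spelled out, since the numbers matter here (you can let the database
do the arithmetic):

    SELECT 6144.0 / 25 / 60   AS minutes_to_write_6gb_at_25mb_per_s,   -- ~4.1
           6144.0 / (23 * 60) AS avg_mb_per_s_if_6gb_took_23_minutes;  -- ~4.5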

> Is it possible that we're missing an index somewhere or there's
> something else going on?

A missing index would probably show up as high disk reads from
sequential passes over large tables.  A context switch storm or
network problems
seem like the two most likely causes to me, but it would pay to
monitor anything you can to help pin down the bottleneck next time
this happens.
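
One quick way to check the missing-index theory from the statistics
collector (assuming stats collection is on; the view and columns are
the standard ones):

    -- Tables with the most rows read by sequential scans since the
    -- stats were last reset; a big, busy table near the top is a hint
    -- that an index may be missing.
    SELECT relname, seq_scan, seq_tup_read, idx_scan
      FROM pg_stat_user_tables
     ORDER BY seq_tup_read DESC
     LIMIT 10;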

-Kevin
