Re: Minimizing Recovery Time (wal replication)

From: Greg Smith
Subject: Re: Minimizing Recovery Time (wal replication)
Msg-id: alpine.GSO.2.01.0904091916440.6276@westnet.com
In reply to: Minimizing Recovery Time (wal replication)  (Bryan Murphy <bmurphy1976@gmail.com>)
Responses: Re: Minimizing Recovery Time (wal replication)  (Bryan Murphy <bmurphy1976@gmail.com>)
List: pgsql-general
On Thu, 9 Apr 2009, Bryan Murphy wrote:

> (1) hot spare applies 70 to 75 wal files (~1.1g) in 2 to 3 min period

Yeah, if you ever let this many files queue up you're facing a long
recovery time.  You really need to get into a position where you're
applying WAL files regularly enough that you don't ever fall this far
behind.

> (2) hot spare pauses for 15 to 20 minutes, during this period pdflush
> consumes 99% IO (iotop).  Dirty (from /proc/meminfo) spikes to ~760mb,
> remains at that level for the first 10 minutes, and then slowly ticks
> down to 0 for the second 10 minutes.

What does vmstat say about the bi/bo during this time period?  It sounds
like the volume of random I/O produced by recovery is just backing up as
expected.  Some quick math:

15GB RAM * 5% dirty_ratio = 750MB ; that's where your measured 760MB
of dirty memory is coming from.

750MB / 10 minutes = 1.25MB/s ; that's in the normal range for random
writes with a single disk

Therefore my bet is that "vmstat 1" will show bo~=1250 the whole time
you're waiting there, with matching figures from iostat for the
database disk during that period.
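
The quick math above can be reproduced with a short sketch. It only
restates the figures from this message (15GB RAM, 5% dirty ratio, a 10
minute drain); the function names are made up for illustration, 1GB is
rounded to 1000MB as in the estimate above, and it assumes vmstat reports
bo in units of roughly 1KB per second.

```python
# Back-of-the-envelope check of the numbers quoted in the message.
# Inputs are the quoted figures, not measurements; helper names are hypothetical.

def dirty_limit_mb(ram_gb: float, dirty_pct: float) -> float:
    """Approximate ceiling on dirty page cache in MB (1 GB rounded to 1000 MB)."""
    return ram_gb * 1000 * dirty_pct / 100


def drain_rate_mb_s(dirty_mb: float, minutes: float) -> float:
    """Average write rate needed to drain dirty_mb in the given time."""
    return dirty_mb / (minutes * 60)


def expected_vmstat_bo(rate_mb_s: float) -> float:
    """Rough 'bo' figure, assuming vmstat counts ~1 KB blocks per second."""
    return rate_mb_s * 1000


limit = dirty_limit_mb(15, 5)       # dirty memory ceiling in MB
rate = drain_rate_mb_s(limit, 10)   # MB/s while the backlog drains
bo = expected_vmstat_bo(rate)       # blocks/s to look for in "vmstat 1"

print(f"dirty limit ~{limit:.0f} MB, drain rate ~{rate:.2f} MB/s, bo ~{bo:.0f}")
```

If the bo column during the pause sits well above or below that figure,
the single-disk random-write explanation deserves a second look.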

Basically your options here are:

1) Decrease the maximum possible segment backlog so you can never get this
    far behind
2) Increase the rate at which random I/O can be flushed to disk by either
    a) Improving things with a [better] battery-backed controller disk cache
    b) Striping across more disks
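
As a rough way to compare these options, here is a minimal sketch of how
the flush time moves with the backlog size and the write rate. It is a toy
model, not a measurement: it assumes random-write throughput scales
linearly with the number of striped disks, which real arrays only
approximate, and the function name and parameters are made up for
illustration.

```python
# Toy model: time to flush a backlog of dirty data to disk.
# Assumption (not from the original message): throughput scales linearly
# with the number of striped disks.

def flush_seconds(dirty_mb: float, per_disk_mb_s: float, disks: int = 1) -> float:
    """Seconds needed to write dirty_mb at per_disk_mb_s across `disks` disks."""
    if disks < 1 or per_disk_mb_s <= 0:
        raise ValueError("need at least one disk and a positive write rate")
    return dirty_mb / (per_disk_mb_s * disks)


# Figures from the message: ~750 MB of dirty data, ~1.25 MB/s random writes.
baseline = flush_seconds(750, 1.25)          # single disk
striped = flush_seconds(750, 1.25, disks=4)  # option 2b, under the linear assumption
smaller = flush_seconds(250, 1.25)           # option 1: a third of the backlog

for label, secs in [("baseline", baseline), ("4 disks", striped), ("1/3 backlog", smaller)]:
    print(f"{label}: {secs / 60:.1f} minutes")
```

Either lever shortens the pause; capping the backlog has the advantage of
needing no new hardware.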

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD
