Re: Reliability with RAID 10 SSD and Streaming Replication

From Jeff Janes
Subject Re: Reliability with RAID 10 SSD and Streaming Replication
Date
Msg-id CAMkU=1xKCdJ0hdBqu66z2Fhf-WELoNZ+CZYKdD3wHCdO49yZ+A@mail.gmail.com
In response to Reliability with RAID 10 SSD and Streaming Replication  (Cuong Hoang <climbingrose@gmail.com>)
Responses Re: Reliability with RAID 10 SSD and Streaming Replication  (Merlin Moncure <mmoncure@gmail.com>)
List pgsql-performance
On Thu, May 16, 2013 at 7:46 AM, Cuong Hoang <climbingrose@gmail.com> wrote:
Hi all,

Our application is write-heavy, and IO utilisation has been a problem for us for a while. We've decided to use a RAID 10 of 4x500GB Samsung 840 Pro drives for the master server. I'm aware of the write cache issue on SSDs in case of power loss. However, our hosting provider doesn't offer any other choice of SSD drives with a supercapacitor. To minimise risk, we will also set up another RAID 10 SAS server in streaming replication mode. For our application, a few seconds of data loss is acceptable.

My question is: would corrupted data files on the primary server affect the streaming standby? In other words, is this setup acceptable in terms of minimising the deficiencies of the SSDs?


That seems rather scary to me for two reasons.  

If the data center has a sudden power failure, why would it not take out both machines either simultaneously or in short succession?  Can you verify that the hosting provider does not have them on the same UPS (or even worse, as two virtual machines on the same physical host)?

The other issue is that you'd have to make sure the master does not restart after a crash. If your init.d scripts just blindly start postgresql, then after a sudden OS restart it will automatically enter recovery and then open as usual, even though it might be silently corrupt. At that point it will be returning incorrect query results, generating WAL based on corrupt data, and propagating that WAL to the standby. So you have to be paranoid: if the master ever crashes, it must be shot in the head and then reconstructed from the standby.
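To make the "don't blindly restart" point concrete, here is a rough sketch of a guard that an init script could call before starting the server. It is only an illustration under several assumptions: it relies on `pg_controldata` reporting a "Database cluster state" line, it assumes English (untranslated) output, and the function names and the policy of allowing autostart only from the "shut down" state are hypothetical choices of mine, not something from the original setup.

```python
import subprocess

# Assumption: only a cleanly shut down cluster may be started automatically.
# After a crash the control file typically still says "in production".
CLEAN_STATES = {"shut down"}


def cluster_state(controldata_text):
    """Return the 'Database cluster state' value from pg_controldata output."""
    for line in controldata_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "Database cluster state":
            return value.strip()
    return None  # line not found, treat as unknown


def safe_to_autostart(controldata_text):
    """True only if the last recorded state is a clean shutdown."""
    return cluster_state(controldata_text) in CLEAN_STATES


def read_controldata(data_dir):
    """Run pg_controldata against a data directory (hypothetical helper)."""
    result = subprocess.run(
        ["pg_controldata", data_dir],
        capture_output=True, text=True, check=True,
    )
    return result.stdout
```

An init script could call `safe_to_autostart(read_controldata(data_dir))` and, when it returns False, leave the server down and alert an operator so the master can be rebuilt from the standby. Note that this only detects an unclean shutdown; it cannot tell whether the data files are actually corrupt.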

Cheers,

Jeff
