Discussion: Risk of data corruption/loss?

Risk of data corruption/loss?

From
Niels Kristian Schjødt
Date:
I'm considering the following setup:

- Master server with a battery-backed RAID controller with 4 SAS disks in RAID 0 - so NO mirroring here, due to max
performance requirements.
- Slave server set up with streaming replication on 4 HDDs in RAID 10. The setup will be done with
synchronous_commit=off and synchronous_standby_names = ''

So as you might have noticed, clearly there is a risk of data loss, which is acceptable, since our data is not very
crucial. However, I have quite a hard time figuring out whether there is a risk of total data corruption across both servers
in this setup. E.g. something goes wrong on the master and the WAL files get corrupted. Will the slave then apply the WAL
files INCLUDING the corruption (e.g. an unfinished transaction etc.), or will it automatically stop restoring at the
point just BEFORE the corruption, so my only loss is data AFTER the corruption?

Hope my question is clear



Re: Risk of data corruption/loss?

From
Jeff Janes
Date:
On Wed, Mar 13, 2013 at 8:24 AM, Niels Kristian Schjødt <nielskristian@autouncle.com> wrote:
> I'm considering the following setup:
>
> - Master server with battery back raid controller with 4 SAS disks in a RAID 0 - so NO mirroring here, due to max performance requirements.
> - Slave server setup with streaming replication on 4 HDD's in RAID 10. The setup will be done with synchronous_commit=off and synchronous_standby_names = ''

Out of curiosity, in the presence of BB controller, is synchronous_commit=off getting you additional performance?

> So as you might have noticed, clearly there is a risk of data loss, which is acceptable, since our data is not very crucial. However, I have quite a hard time figuring out, if there is a risk of total data corruption across both server in this setup? E.g. something goes wrong on the master and the wal files gets corrupt. Will the slave then apply the wal files INCLUDING the corruption (e.g. an unfinished transaction etc.), or will it automatically stop restoring at the point just BEFORE the corruption, so my only loss is data AFTER the corruption?

It depends on where the corruption happens.  WAL is checksummed, so the slave will detect a mismatch and stop applying records.  However, if the corruption happens in RAM before the checksum is taken, the checksum will match and it will attempt to apply the records.
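
The distinction between the two cases can be illustrated with a small sketch. It is only a toy model: the record layout (a 4-byte CRC followed by the payload), the helper names and the use of zlib.crc32 are all assumptions made for illustration, not PostgreSQL's actual WAL record format or checksum routine.

```python
import zlib


def make_record(payload: bytes) -> bytes:
    """Build a toy record: 4-byte CRC-32 of the payload, then the payload.

    Hypothetical layout for illustration only, not the real WAL format.
    """
    crc = zlib.crc32(payload)
    return crc.to_bytes(4, "big") + payload


def record_is_valid(record: bytes) -> bool:
    """Recompute the CRC over the payload and compare it with the stored one."""
    stored = int.from_bytes(record[:4], "big")
    return zlib.crc32(record[4:]) == stored


def flip_bit(data: bytes, index: int) -> bytes:
    """Return a copy of data with the lowest bit of one byte flipped."""
    damaged = bytearray(data)
    damaged[index] ^= 0x01
    return bytes(damaged)


payload = b"UPDATE account SET balance = 100"

# Case 1: the record is written correctly and damaged afterwards
# (on disk or in transit). The stored CRC no longer matches.
good = make_record(payload)
damaged_after = flip_bit(good, 10)

# Case 2: the payload is damaged in memory BEFORE the checksum is taken.
# The CRC is computed over the bad bytes, so it matches.
damaged_before = make_record(flip_bit(payload, 5))

print("intact record valid:          ", record_is_valid(good))
print("damaged after checksum valid: ", record_is_valid(damaged_after))
print("damaged before checksum valid:", record_is_valid(damaged_before))
```

In this model a receiver would reject the record damaged after the checksum was taken, but it has no way to tell that the record damaged beforehand is wrong.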

Cheers,

Jeff

Re: Risk of data corruption/loss?

From
Niels Kristian Schjødt
Date:

On 13/03/2013 at 18.13, Jeff Janes <jeff.janes@gmail.com> wrote:

> On Wed, Mar 13, 2013 at 8:24 AM, Niels Kristian Schjødt <nielskristian@autouncle.com> wrote:
>> I'm considering the following setup:
>>
>> - Master server with battery back raid controller with 4 SAS disks in a RAID 0 - so NO mirroring here, due to max performance requirements.
>> - Slave server setup with streaming replication on 4 HDD's in RAID 10. The setup will be done with synchronous_commit=off and synchronous_standby_names = ''
>
> Out of curiosity, in the presence of BB controller, is synchronous_commit=off getting you additional performance?

Time will show :-)

>> So as you might have noticed, clearly there is a risk of data loss, which is acceptable, since our data is not very crucial. However, I have quite a hard time figuring out, if there is a risk of total data corruption across both server in this setup? E.g. something goes wrong on the master and the wal files gets corrupt. Will the slave then apply the wal files INCLUDING the corruption (e.g. an unfinished transaction etc.), or will it automatically stop restoring at the point just BEFORE the corruption, so my only loss is data AFTER the corruption?
>
> It depends on where the corruption happens.  WAL is checksummed, so the slave will detect a mismatch and stop applying records.  However, if the corruption happens in RAM before the checksum is taken, the checksum will match and it will attempt to apply the records.
>
> Cheers,
>
> Jeff

Re: Risk of data corruption/loss?

From
Joshua Berkus
Date:
Niels,

> - Master server with battery back raid controller with 4 SAS disks in
> a RAID 0 - so NO mirroring here, due to max performance
> requirements.
> - Slave server setup with streaming replication on 4 HDD's in RAID
> 10. The setup will be done with synchronous_commit=off and
> synchronous_standby_names = ''

I'd be concerned that, assuming you're making the master high-risk for performance reasons, the standby would not
keep up.

> So as you might have noticed, clearly there is a risk of data loss,
> which is acceptable, since our data is not very crucial. However, I
> have quite a hard time figuring out, if there is a risk of total
> data corruption across both server in this setup? E.g. something
> goes wrong on the master and the wal files gets corrupt. Will the
> slave then apply the wal files INCLUDING the corruption (e.g. an
> unfinished transaction etc.), or will it automatically stop
> restoring at the point just BEFORE the corruption, so my only loss
> is data AFTER the corruption?

Well, in general RAID 1 really just protects you from HDD failure, not more subtle types of corruption which occur
onboard an HDD.  So from that respect, you haven't increased your chances of data corruption at all; if the master loses
a disk, it should just stop operating; a simple check that all WALs are 16MB on the standby would do the rest. I'd be
more concerned that you're likely to be yanking and completely rebuilding the master server every 4 or 5 months.
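
The size check mentioned above could be sketched roughly as follows. This is a minimal illustration, not a tested operational tool: the function name is hypothetical, the directory to scan is whatever location holds the WAL segment files on the standby, and it assumes the default 16MB segment size and the usual 24-hex-digit segment file names.

```python
from pathlib import Path

# Default WAL segment size; an assumption that holds unless the server
# was built with a non-default segment size.
WAL_SEGMENT_BYTES = 16 * 1024 * 1024

HEX_DIGITS = set("0123456789ABCDEF")


def odd_sized_segments(wal_dir, expected=WAL_SEGMENT_BYTES):
    """Return (file name, size) pairs for WAL segments with an unexpected size.

    Only files whose names are 24 uppercase hex digits are treated as
    segments; other files (.history, .backup, ...) are skipped.
    """
    bad = []
    for path in sorted(Path(wal_dir).iterdir()):
        name = path.name
        if not path.is_file():
            continue
        if len(name) != 24 or not set(name) <= HEX_DIGITS:
            continue
        size = path.stat().st_size
        if size != expected:
            bad.append((name, size))
    return bad
```

Calling odd_sized_segments() on the standby's WAL directory and alerting when the result is non-empty would flag truncated segments. It says nothing about segments that have the right size but damaged contents.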