Re: root cause of corruption in hot standby

From Rui DeSousa
Subject Re: root cause of corruption in hot standby
Msg-id DE4377AC-24BA-45AB-9D43-0A0A340475F9@crazybean.net
In response to Re: root cause of corruption in hot standby  (Mike Broers <mbroers@gmail.com>)
Responses Re: root cause of corruption in hot standby  (Mike Broers <mbroers@gmail.com>)
List pgsql-admin
> On Oct 10, 2018, at 10:15 AM, Mike Broers <mbroers@gmail.com> wrote:
>
>
> I'll look into rsync checksums, but this corruption presented itself during a time when streaming replication was
> working fine and it wasn't restoring archived rsynced transaction logs, and hadn't done so for around 30 hours.  The
> table it complained about is accessed every minute with updates and monitoring, so I don't think it would have taken
> so long if it was due to the application of a corrupted WAL.
>

I think you missed my point.  If you are dealing with some sort of bit rot and/or data corruption on your storage
device, you need to prove it, which is very difficult to do.

You have WAL files on the primary and the same WAL files on the replica via your rsync copy job.  If you check and
recheck all the WALs daily to see whether any of the files are changing, and you find a difference, that proves there
is some sort of corruption/bit rot occurring, because the WAL files are static files.
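
The daily check could be sketched roughly as follows. This is only an illustration, not a tested tool: the function
names and the manifest file are made up for the example, and it assumes the directory holds only completed, archived
WAL segments (the segment currently being written is of course not static). Keep the manifest outside the WAL
directory so it is not scanned itself.

```python
import hashlib
import json
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so a 16 MB WAL segment is never loaded whole."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def scan(wal_dir: Path) -> dict[str, str]:
    """Map each regular file in wal_dir to its SHA-256 digest."""
    return {p.name: sha256_of(p) for p in sorted(wal_dir.iterdir()) if p.is_file()}


def changed_files(previous: dict[str, str], current: dict[str, str]) -> list[str]:
    """Names present in both scans whose digest differs."""
    return sorted(
        name
        for name, digest in current.items()
        if name in previous and previous[name] != digest
    )


def check(wal_dir: Path, manifest: Path) -> list[str]:
    """Compare wal_dir against the stored manifest and return changed file names.

    The first digest ever recorded for a file is kept, so a file that has
    changed keeps being reported on every later run.
    """
    previous = json.loads(manifest.read_text()) if manifest.exists() else {}
    current = scan(wal_dir)
    changed = changed_files(previous, current)
    merged = {**current, **previous}  # previous digests win for known files
    manifest.write_text(json.dumps(merged, indent=2, sort_keys=True))
    return changed
```

Run from cron once a day on each host, any name returned by check() is a static file whose contents changed on disk.
Copying the manifest from the primary to the replica and passing both to changed_files() would also show whether the
two copies of a segment ever diverge.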

I've seen this type of corruption before with RAID controllers that are overtaxed; they would periodically corrupt
data over time. I ended up changing from a RAID configuration to JBOD and managing the disks via ZFS instead, and never
again experienced data corruption using the exact same hardware.  ZFS also detects bit rot and corrects for it, and it
can scrub the pool to ensure the disks are not slowly rotting away.

What storage system is being used? Does it have any measures to prevent bit rot? What is the RAID configuration? I
would not recommend RAID 5 for a database; under heavy load, the performance degradation and increased likelihood of
data corruption are not worth it.

It sounds like you have some sort of environmental issue that is corrupting your data, and it is not a Postgres issue.
The problem you face is that without some sort of definitive proof you'll enter the realm of finger pointing: it's a
database issue, no, it's a storage issue, etc.

You have two replicas; one periodically fails and the other does not. The only difference is the environment in which
they operate.
