Re: root cause of corruption in hot standby

From: Rui DeSousa
Subject: Re: root cause of corruption in hot standby
Msg-id: 8303E556-079F-440A-BE2A-00A0E92EB2C7@crazybean.net
In reply to: Re: root cause of corruption in hot standby  (Mike Broers <mbroers@gmail.com>)
Responses: Re: root cause of corruption in hot standby  (Mike Broers <mbroers@gmail.com>)
List: pgsql-admin
> On Oct 9, 2018, at 1:04 PM, Mike Broers <mbroers@gmail.com> wrote:
>
> Ok so I have checksum errors in this replica AGAIN.

Mike,

I don’t think you are dealing with a “Postgres” issue, but possibly bit rot from either faulty hardware or a
misconfiguration in your stack.

If you recall, the archived WAL file was originally corrupted.  Replicating the WAL files is outside the functionality of
Postgres; thus it would be either a file replication issue, bit rot, or some other data corruption issue, but not a
Postgres bug.

This leaves me with the following two questions:

1. How was the replica instance instantiated? I would assume from your backup procedures, as your backups should be used
to help validate them.
2. Are there currently any WAL files that are corrupt?  You can quickly check using rsync with the "--checksum" option,
but don’t fix the files on the target; instead use "--dry-run" just to identify which files might have changed first.
I would check this every day until the issue is fully resolved.

 i.e. rsync --archive --checksum --verbose --dry-run {source_wals}  {replica_wals}
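
If you would rather script the daily check, here is a minimal Python sketch of the same idea. It is only an illustration, not a replacement for the rsync command above: it assumes both WAL directories are readable from one host (for example over a mount), the function names are made up for this example, and unlike rsync it only reports files that exist on both sides with different contents. Like "--dry-run", it never modifies anything.

```python
import hashlib
from pathlib import Path
from typing import List


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_mismatched_wals(source_dir: str, replica_dir: str) -> List[str]:
    """List names of files present in both directories whose contents differ.

    Read-only: nothing is copied or repaired. Files that exist only on one
    side are skipped (hypothetical helper, not part of any Postgres tooling).
    """
    source, replica = Path(source_dir), Path(replica_dir)
    mismatched = []
    for src_file in sorted(p for p in source.iterdir() if p.is_file()):
        dst_file = replica / src_file.name
        if dst_file.is_file() and sha256_of(src_file) != sha256_of(dst_file):
            mismatched.append(src_file.name)
    return mismatched
```

Calling find_mismatched_wals("{source_wals}", "{replica_wals}") with your real paths returns the names of any WAL files whose contents differ; an empty list means the files present on both sides match.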

Since you’re confident that you resolved the potential rsync race condition in archiving the WAL files, we shouldn’t see
any differences between WALs that have already been transmitted.  If we do find WALs that are different, then you’re
dealing with data corruption on the replica and need to start looking into your stack and storage system. However, if
you don’t find any corrupted WALs, then question 1 needs to be scrutinized and you really need to ensure your backups are
rock solid.

I wouldn’t bother rebuilding the VM instance until the problem is identified, unless you’re moving it to an all-new
hardware stack.

P.S. Is there any anti-virus software running on the server, or any other software that might modify files on your
behalf?



