incorrect resource manager data checksum in record

From Devin Christensen
Subject incorrect resource manager data checksum in record
Date
Msg-id CANQ55Tsoa6=vk2YkeVUN7qO-2YdqJf_AMVQxqsVTYJm0qqQQuw@mail.gmail.com
Responses Re: incorrect resource manager data checksum in record
List pgsql-general
I've been seeing this issue in 5 separate hot standby replication chains of PostgreSQL servers so far. There are 4 servers in each chain, some running Ubuntu 14.04 and others Ubuntu 16.04, with PostgreSQL >= 10.1 and <= 11. We also have a mix of ext4 and ZFS file systems. Here are the details for each chain:

First chain
===========
dc1-pg105 (pg 10.1, ub 14.04.5) (primary)
   |
   V
dc1-pg205 (pg 10.3, ub 16.04.4)
   |
   V
dc2-pg105 (pg 10.1, ub 14.04.5) <-- error first occurs here
   |
   V
dc2-pg205 (pg 10.3, ub 16.04.4) <-- and also affects this node


Second chain
===========
dc1-pg106 (pg 10.1, ub 14.04.5, ext4) (primary)
   |
   V
dc1-pg206 (pg 10.3, ub 16.04.4, zfs)
   |
   V
dc2-pg106 (pg 10.1, ub 14.04.5, ext4) <-- error first occurs here
   |
   V
dc2-pg206 (pg 10.3, ub 16.04.4, zfs) <-- and also affects this node


Third chain
===========
dc1-pg107 (pg 10.1, ub 14.04.5, ext4) (primary)
   |
   V
dc1-pg207 (pg 10.3, ub 16.04.4, zfs)
   |
   V
dc2-pg107 (pg 10.1, ub 14.04.5, ext4) <-- error first occurs here
   |
   V
dc2-pg207 (pg 10.3, ub 16.04.4, zfs) <-- and also affects this node


Fourth chain
===========
dc1-pg108 (pg 10.3, ub 16.04.4, ext4) (primary)
   |
   V
dc1-pg208 (pg 10.3, ub 16.04.4, zfs)
   |
   V
dc2-pg108 (pg 10.3, ub 16.04.4, ext4) <-- error first occurs here
   |
   V
dc2-pg208 (pg 10.3, ub 16.04.4, zfs) <-- and also affects this node


Fifth chain
===========
dc1-pg110 (pg 10.3, ub 16.04.4, ext4) (primary)
   |
   V
dc1-pg210 (pg 10.3, ub 16.04.4, zfs)
   |
   V
dc2-pg110 (pg 10.3, ub 16.04.4, ext4) <-- error first occurs here
   |
   V
dc2-pg210 (pg 10.3, ub 16.04.4, zfs) <-- and also affects this node


The pattern is the same regardless of Ubuntu or PostgreSQL version. I'm concerned this is somehow a ZFS corruption bug, because the error always occurs downstream of the first ZFS node and ZFS is a recent addition. I don't know enough about what this error means, and I haven't found much online. When I restart the affected nodes, replication resumes normally, with no side effects that I've discovered so far, but I'm no longer confident that the data downstream from the primary is valid. I'm really not sure how best to start tackling this issue, and I'm hoping to get some guidance.

The error is infrequent. We have 11 replication chains in total, and this error has occurred on 5 of them in approximately 2 months.