Re: 12.3 replicas falling over during WAL redo

From: Ben Chobot
Subject: Re: 12.3 replicas falling over during WAL redo
Msg-id: aa8d2d1c-3b30-f344-5412-b1f72d2fe011@silentmedia.com
In response to: Re: 12.3 replicas falling over during WAL redo (Alvaro Herrera <alvherre@2ndquadrant.com>)
List: pgsql-general
Alvaro Herrera wrote on 8/3/20 4:54 PM:
On 2020-Aug-03, Ben Chobot wrote:

Alvaro Herrera wrote on 8/3/20 2:34 PM:
On 2020-Aug-03, Ben Chobot wrote:
dd if=16605/16613/60529051 bs=8192 count=1 seek=6501 of=/tmp/page.6501
If I use skip instead of seek....
Argh, yes, I did correct that in my test and forgot to copy and paste.

     lsn      | checksum | flags | lower | upper | special | pagesize | version | prune_xid
--------------+----------+-------+-------+-------+---------+----------+---------+-----------
 A0A/99BA11F8 |     -215 |     0 |   180 |  7240 |    8176 |     8192 |       4 |         0

As I understand what we're looking at, this means the WAL stream was
assuming this page was last touched by A0A/AB2C43D0, but the page itself
thinks it was last touched by A0A/99BA11F8, which means at least one write
to the page is missing?
Yeah, that's exactly what we're seeing.  Somehow an older page version
was resurrected.  Of course, this should never happen.
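
The comparison being made here can be sketched in a few lines of Python. This is only an illustration of the reasoning, not anything the server runs: parse_lsn and page_is_stale are hypothetical helper names, and the two LSN values are the ones quoted above. An LSN written as X/Y is a 64-bit WAL position whose high and low 32-bit halves are printed in hexadecimal, so comparing two of them is a plain integer comparison.

```python
def parse_lsn(text):
    """Convert an LSN such as 'A0A/99BA11F8' to a 64-bit integer."""
    high, low = text.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def page_is_stale(page_lsn, expected_lsn):
    """True if the page's own LSN is older than the LSN the WAL stream expects."""
    return parse_lsn(page_lsn) < parse_lsn(expected_lsn)


if __name__ == "__main__":
    # Values quoted in this thread
    on_page = "A0A/99BA11F8"
    expected = "A0A/AB2C43D0"
    print(page_is_stale(on_page, expected))
```

A stale result means the page on disk predates the change the WAL stream assumes was already applied, which is consistent with at least one write to the page having been lost.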

So my theory has been proved.  What now?

Just to close the loop on this: we haven't seen the issue since we stopped expanding our filesystems by moving LVM extents between devices. So while I don't know exactly where the bug lies, I feel it's quite likely not in Postgres.
