Re: "invalid contrecord" error on replica

From Adrien Nayrat
Subject Re: "invalid contrecord" error on replica
Date
Msg-id d3374925-79dc-fd0d-be9f-47fb4f967804@anayrat.info
In response to Re: "invalid contrecord" error on replica  (Kyotaro Horiguchi <horikyota.ntt@gmail.com>)
Responses Re: "invalid contrecord" error on replica  (Adrien Nayrat <adrien.nayrat@anayrat.info>)
List pgsql-general
On 5/6/21 7:37 AM, Kyotaro Horiguchi wrote:
> At Sun, 2 May 2021 22:43:44 +0200, Adrien Nayrat <adrien.nayrat@anayrat.info> wrote in
>> I also dumped 00000001000000AA000000A1 on the secondary and it
>> contains all the records until AA/A1004018.
>>
>> It is really weird, I don't understand how the secondary can miss the
>> last 2 records of A0. It seems it did not receive the
>> CHECKPOINT_SHUTDOWN record?
>>
>> Any idea?
> 
> This looks like the same issue as [1]. In short, the secondary
> received an incomplete record, but the primary forgot about that
> record after restart.
> 
> Specifically, the primary was writing a WAL record that starts at A0FFFB70
> and continues into the A1xxxxxx segment. The secondary successfully received
> the first half of the record, but the primary failed to write (and then
> send) the last half of the record because the disk was full.
> 
> At this time it seems that the primary's last completed record ended
> at A0FFFB70. Then, while restarting, the CHECKPOINT_SHUTDOWN record
> overwrote the already half-sent record up to A0FFFBE8.
> 
> On the secondary side, there's only the first half of the record,
> which the primary had forgotten, and the last half, starting at
> LSN A1000000, was still in the future of the new history on the primary.
> 
> After some time the primary reaches A1000000, but the first record in
> that segment of course disagrees with the history of the secondary.
> 
> 1: https://www.postgresql.org/message-id/CBDDFA01-6E40-46BB-9F98-9340F4379505%40amazon.com
> 
> regards.
> 

Hello,

Thanks for your reply and your explanation! Now I understand; it's good to know
it is a known issue.
I'll follow this thread and hope we will find a solution. It's annoying that your
secondary breaks when your primary crashes and the only solution is to either
fetch an archived WAL file and replace it on the secondary, or completely
rebuild your secondary.
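
For anyone following along, here is a minimal sketch of the LSN arithmetic
behind the explanation quoted above. It is plain Python, not code taken from
PostgreSQL. It assumes the default 16 MB wal_segment_size and timeline 1 (as in
the file name 00000001000000AA000000A1), and it assumes the record in question
starts at AA/A0FFFB70: the high part "AA" is my guess, borrowed from
AA/A1004018. The helper names are hypothetical.

```python
# Sketch only: maps an LSN to its WAL segment file name and shows how
# little room is left in segment A0 for a record starting near its end.

WAL_SEGMENT_SIZE = 16 * 1024 * 1024  # assumption: default 16 MB segments


def parse_lsn(text: str) -> int:
    """Turn 'AA/A1004018' into a 64-bit integer position."""
    hi, lo = text.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def wal_file_name(timeline: int, lsn: int,
                  seg_size: int = WAL_SEGMENT_SIZE) -> str:
    """Name of the WAL segment file that contains the given LSN."""
    segno = lsn // seg_size
    segs_per_xlogid = 0x100000000 // seg_size
    return "%08X%08X%08X" % (
        timeline, segno // segs_per_xlogid, segno % segs_per_xlogid)


def bytes_left_in_segment(lsn: int,
                          seg_size: int = WAL_SEGMENT_SIZE) -> int:
    """Bytes between the LSN and the end of its segment."""
    return seg_size - (lsn % seg_size)


if __name__ == "__main__":
    dumped = parse_lsn("AA/A1004018")
    print(wal_file_name(1, dumped))      # 00000001000000AA000000A1

    start = parse_lsn("AA/A0FFFB70")     # assumed high part "AA"
    print(wal_file_name(1, start))       # 00000001000000AA000000A0
    print(bytes_left_in_segment(start))  # 1168
```

With these assumptions, only 1168 bytes remain in segment A0 after A0FFFB70, so
any longer record has to continue in A1. That continuation is the part the
secondary never received.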

Thanks





-- 
Adrien NAYRAT
