pg_rewind WAL segments deletion pitfall

From: Polina Bungina
Subject: pg_rewind WAL segments deletion pitfall
Msg-id: CAAtGL4AhzmBRsEsaDdz7065T+k+BscNadfTqP1NcPmsqwA5HBw@mail.gmail.com
Responses: Re: pg_rewind WAL segments deletion pitfall (Kyotaro Horiguchi <horikyota.ntt@gmail.com>)
List: pgsql-bugs

Hello,

It seems to me that there is currently a pitfall in the pg_rewind implementation.

Imagine the following situation:


There is a cluster consisting of a primary configured with wal_level='replica' and archive_mode='on', and a replica.

  1. The primary is not fast enough at archiving WAL segments (e.g. because of network issues or high CPU/disk load).
  2. The primary fails.
  3. The replica is promoted.
  4. We are not lucky enough: the timelines of the new and the old primary have diverged, so we need to run pg_rewind.
  5. We are even less lucky: the old primary still has some WAL segments with .ready signal files that were generated before the point of divergence and were not archived (e.g. 000000020004D20200000095.done, 000000020004D20200000096.ready, 000000020004D20200000097.ready, 000000020004D20200000098.ready).
  6. The promoted primary runs for some time and recycles the old WAL segments.
  7. We revive the old primary and try to rewind it.
  8. When pg_rewind finishes successfully, we see that the WAL segments with .ready files have been removed, because they were already absent on the promoted replica. We end up completely losing some WAL segments, even though we had a clear sign that they were not archived and, more importantly, pg_rewind read these segments while collecting information about the data blocks.
  9. The old primary fails to start because of the missing WAL segments (more strictly, the records between the last common checkpoint and the point of divergence), with the following log record: "ERROR:  requested WAL segment 000000020004D20200000096 has already been removed"
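
As a rough illustration of the "clear sign" mentioned in step 8, the sketch below lists the WAL files whose status file under pg_wal/archive_status still ends in .ready. It is only a minimal sketch for inspecting the old primary before a rewind: the helper name unarchived_segments and the throwaway demo directory are made up for this example and are not part of pg_rewind or of the attached patch.

```python
import tempfile
from pathlib import Path


def unarchived_segments(pg_wal: Path) -> list[str]:
    """Return names of WAL files whose archive status is still .ready.

    Hypothetical helper: it only looks at the status file names in
    pg_wal/archive_status and does not check the WAL files themselves.
    """
    status_dir = pg_wal / "archive_status"
    # Path.stem drops the final ".ready" suffix and keeps the file name.
    return sorted(p.stem for p in status_dir.glob("*.ready"))


# Demo on a temporary directory that mimics the state from step 5.
with tempfile.TemporaryDirectory() as tmp:
    status = Path(tmp) / "archive_status"
    status.mkdir()
    for name in (
        "000000020004D20200000095.done",
        "000000020004D20200000096.ready",
        "000000020004D20200000097.ready",
        "000000020004D20200000098.ready",
    ):
        (status / name).touch()
    pending = unarchived_segments(Path(tmp))

print(pending)
```

Running something like this against the real pg_wal directory of the old primary before pg_rewind would show which segments exist nowhere else and are worth copying aside.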


In this situation, after pg_rewind, the following segments are archived:

000000020004D20200000095
000000020004D20200000099.partial
000000030004D20200000099

and the following segments are lost:

000000020004D20200000096
000000020004D20200000097
000000020004D20200000098


Thus, my thoughts are: why can’t pg_rewind be a little bit wiser when creating the filemap for WAL files? Can it preserve, on the target, the WAL segments that contain those potentially lost records (after the last common checkpoint and before the point of divergence)? (See the attached patch.)
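
To make the range concrete, here is a minimal sketch of how the segment file names between two WAL positions could be derived. It is not the attached patch: the function names are made up, the two LSNs are hypothetical values chosen to match the segment names in this message, and the default 16 MB wal_segment_size is assumed. The naming follows the usual layout of a WAL file name: 8 hex digits of timeline ID, then the segment number split into two 8-digit halves.

```python
WAL_SEGMENT_SIZE = 16 * 1024 * 1024  # assumption: default 16 MB segments


def parse_lsn(lsn: str) -> int:
    """Convert an LSN written as 'XXXXXXXX/YYYYYYYY' to a 64-bit integer."""
    hi, lo = lsn.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def wal_file_name(timeline: int, segno: int,
                  seg_size: int = WAL_SEGMENT_SIZE) -> str:
    """Build the 24-character WAL file name for a segment number."""
    segs_per_id = 0x100000000 // seg_size
    return f"{timeline:08X}{segno // segs_per_id:08X}{segno % segs_per_id:08X}"


def segments_to_keep(timeline: int, checkpoint_lsn: str, divergence_lsn: str,
                     seg_size: int = WAL_SEGMENT_SIZE) -> list[str]:
    """Names of all segments touched by [checkpoint_lsn, divergence_lsn].

    Hypothetical helper: the segment holding the divergence point itself
    is included, which may or may not be what a real fix would do.
    """
    first = parse_lsn(checkpoint_lsn) // seg_size
    last = parse_lsn(divergence_lsn) // seg_size
    return [wal_file_name(timeline, s, seg_size) for s in range(first, last + 1)]


# Hypothetical positions: checkpoint inside ...96, divergence inside ...98.
keep = segments_to_keep(2, "4D202/96000028", "4D202/98FFFFF0")
print(keep)
```

A filemap that never schedules these names for removal on the target would leave the old primary with the records it needs to replay.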


If I am missing something, however, please correct me or explain why this straightforward solution cannot be implemented.


Thank you,

Polina Bungina
