[BUG] non archived WAL removed during production crash recovery

From: Jehan-Guillaume de Rorthais
Subject: [BUG] non archived WAL removed during production crash recovery
Msg-id: 20200331172229.40ee00dc@firost
Responses: Re: [BUG] non archived WAL removed during production crash recovery  (Fujii Masao <masao.fujii@oss.nttdata.com>)
List: pgsql-bugs

Hello,

A colleague of mine reported an unexpected behavior.

When a production cluster is in crash recovery, e.g. after a backend has been
killed, the WALs ready to be archived are removed before being archived.

See the attached reproduction script "non-arch-wal-on-recovery.bash".

This behavior was introduced in 78ea8b5daab9237fd42d7a8a836c1c451765499f.
The function XLogArchiveCheckDone() wrongly treats a production cluster in
crash recovery as a standby without archive_mode=always, so the check
concludes that the WAL can be removed safely.

  bool inRecovery = RecoveryInProgress();
  
  /*
   * The file is always deletable if archive_mode is "off".  On standbys
   * archiving is disabled if archive_mode is "on", and enabled with
   * "always".  On a primary, archiving is enabled if archive_mode is "on"
   * or "always".
   */
  if (!((XLogArchivingActive() && !inRecovery) ||
        (XLogArchivingAlways() && inRecovery)))
      return true;
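
To make the failing case explicit, here is a minimal standalone model of the
condition quoted above. It is only a sketch, not PostgreSQL code: the enum and
the model_* helpers are hypothetical stand-ins for archive_mode,
XLogArchivingActive() and XLogArchivingAlways(), and their behavior is assumed
from the comment in the snippet ("on" or "always" means active, "always" means
always).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the archive_mode setting. */
typedef enum
{
    MODEL_ARCHIVE_OFF,
    MODEL_ARCHIVE_ON,
    MODEL_ARCHIVE_ALWAYS
} ModelArchiveMode;

/* Assumed from the quoted comment: active for "on" or "always". */
static bool
model_archiving_active(ModelArchiveMode mode)
{
    return mode != MODEL_ARCHIVE_OFF;
}

/* Assumed from the quoted comment: true only for "always". */
static bool
model_archiving_always(ModelArchiveMode mode)
{
    return mode == MODEL_ARCHIVE_ALWAYS;
}

/*
 * Mirrors only the quoted early-return test.  Returns true when the
 * check would report the segment as deletable without looking at its
 * archive status.
 */
static bool
model_current_early_return(ModelArchiveMode mode, bool in_recovery)
{
    return !((model_archiving_active(mode) && !in_recovery) ||
             (model_archiving_always(mode) && in_recovery));
}

/* Walks through the cases discussed in this report. */
static void
model_current_cases(void)
{
    /* Primary in normal operation, archive_mode=on: status is checked. */
    assert(!model_current_early_return(MODEL_ARCHIVE_ON, false));

    /* Recovery in progress, archive_mode=always: status is checked. */
    assert(!model_current_early_return(MODEL_ARCHIVE_ALWAYS, true));

    /*
     * Recovery in progress, archive_mode=on: reported as deletable.
     * The test cannot tell a standby from a primary in crash recovery,
     * since both have in_recovery set.
     */
    assert(model_current_early_return(MODEL_ARCHIVE_ON, true));
}
```

In this model the only input describing the server state is in_recovery, so a
primary replaying WAL after a crash is indistinguishable from a standby.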

Please find attached a patch that fixes this issue by using the following test
instead:

  if (!((XLogArchivingActive() && !StandbyModeRequested) ||
        (XLogArchivingAlways() && inRecovery)))
      return true;
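
The effect of the proposed test can be sketched with a small standalone model.
This is an illustration under assumptions, not the patch itself: the enum and
the model_* helpers are hypothetical, the archive_mode semantics are taken from
the source comment ("on" or "always" means active), and standby_requested
stands in for StandbyModeRequested.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the archive_mode setting. */
typedef enum
{
    MODEL_ARCHIVE_OFF,
    MODEL_ARCHIVE_ON,
    MODEL_ARCHIVE_ALWAYS
} ModelArchiveMode;

/* Assumed semantics: active for "on" or "always". */
static bool
model_archiving_active(ModelArchiveMode mode)
{
    return mode != MODEL_ARCHIVE_OFF;
}

/* Assumed semantics: true only for "always". */
static bool
model_archiving_always(ModelArchiveMode mode)
{
    return mode == MODEL_ARCHIVE_ALWAYS;
}

/*
 * Mirrors only the proposed early-return test.  Returns true when the
 * check would report the segment as deletable without looking at its
 * archive status.
 */
static bool
model_proposed_early_return(ModelArchiveMode mode,
                            bool standby_requested,
                            bool in_recovery)
{
    return !((model_archiving_active(mode) && !standby_requested) ||
             (model_archiving_always(mode) && in_recovery));
}

/* Walks through the cases the proposed test is meant to separate. */
static void
model_proposed_cases(void)
{
    /* Primary in crash recovery, archive_mode=on: status is checked. */
    assert(!model_proposed_early_return(MODEL_ARCHIVE_ON, false, true));

    /* Standby in recovery, archive_mode=on: still reported deletable. */
    assert(model_proposed_early_return(MODEL_ARCHIVE_ON, true, true));

    /* Standby in recovery, archive_mode=always: status is checked. */
    assert(!model_proposed_early_return(MODEL_ARCHIVE_ALWAYS, true, true));
}
```

The first branch now depends on whether standby mode was requested rather than
on whether recovery is in progress, which is what separates the crash-recovery
case from the standby case in this model.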

I'm not sure whether we should rely on StandbyModeRequested for the second part
of the test as well, though. What was the point of relying on
RecoveryInProgress() to get the recovery status from shared memory?

Regards,
