Re: Race condition in recovery?

From Dilip Kumar
Subject Re: Race condition in recovery?
Date
Msg-id CAFiTN-tPh8eR1zHc7WCMbBMKn4bOfwvKK0fqKKhY6phVV4ENpg@mail.gmail.com
In reply to Re: Race condition in recovery?  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: Race condition in recovery?  (Dilip Kumar <dilipbalaut@gmail.com>)
List pgsql-hackers
On Wed, Jun 9, 2021 at 2:07 AM Robert Haas <robertmhaas@gmail.com> wrote:
> Then I tried to get things working on 9.6. There's a patch attached to
> back-port a couple of PostgresNode.pm methods from 10 to 9.6, and also
> a version of the main patch attached with the necessary wal->xlog,
> lsn->location renaming. Unfortunately ... the new test case still
> fails on 9.6 in a way that looks an awful lot like the bug isn't
> actually fixed:
>
> LOG:  primary server contains no more WAL on requested timeline 1
> cp: /Users/rhaas/pgsql/src/test/recovery/tmp_check/data_primary_enMi/archives/000000010000000000000003:
> No such file or directory
> (repeated many times)
>
> I find that the same failure happens if I back-port the master version
> of the patch to v10 or v11,

I think this fails because, prior to v12, the recovery target timeline was not set to 'latest' by default, since recovery_target_timeline was not a GUC at that time. With the change below, the test started passing on v11 (only tested on v11 so far).


diff --git a/src/test/recovery/t/025_stuck_on_old_timeline.pl b/src/test/recovery/t/025_stuck_on_old_timeline.pl
index 842878a..b3ce5da 100644
--- a/src/test/recovery/t/025_stuck_on_old_timeline.pl
+++ b/src/test/recovery/t/025_stuck_on_old_timeline.pl
@@ -50,6 +50,9 @@ my $node_cascade = get_new_node('cascade');
 $node_cascade->init_from_backup($node_standby, $backup_name,
        has_streaming => 1);
 $node_cascade->enable_restoring($node_primary);
+$node_cascade->append_conf('recovery.conf', qq(
+recovery_target_timeline='latest'
+));
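
For reference, here is a rough sketch of where this setting lives on each side of the v12 change. The surrounding lines (standby_mode, the standby.signal note) are illustrative assumptions about a typical standby setup and are not taken from the test itself:

```
# v11 and earlier: recovery.conf in the standby's data directory
standby_mode = 'on'
recovery_target_timeline = 'latest'   # must be set explicitly; the default is
                                      # the timeline current at base backup time

# v12 and later: postgresql.conf (standby mode is requested via standby.signal)
recovery_target_timeline = 'latest'   # already the default; shown for comparison only
```

This would explain why the unmodified test behaves differently on the back branches: without the explicit setting, the cascade does not follow the new timeline.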
 
However, with recovery_target_timeline='latest' set, the test now passes even without the bug fix applied. The log shows that it never tried to stream from the primary on timeline 1, so it never reached the defect location.

2021-06-09 12:11:08.618 IST [122456] LOG:  entering standby mode
2021-06-09 12:11:08.622 IST [122456] LOG:  restored log file "00000002.history" from archive
cp: cannot stat ‘/home/dilipkumar/work/PG/postgresql/src/test/recovery/tmp_check/t_025_stuck_on_old_timeline_primary_data/archives/000000010000000000000002’: No such file or directory
2021-06-09 12:11:08.627 IST [122456] LOG:  redo starts at 0/2000028
2021-06-09 12:11:08.627 IST [122456] LOG:  consistent recovery state reached at 0/3000000

Next, I will investigate why, without the fix, the test does not hit the defect location at all on v11 (and possibly v12 and v10). After that, I will check the status on the other older versions.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
