Re: Startup PANIC on standby promotion due to zero-filled WAL segment

From	Michael Paquier
Subject	Re: Startup PANIC on standby promotion due to zero-filled WAL segment
Date	December 23, 12:47:37
Msg-id	aUplOdaM4kGYd4t3@paquier.xyz
In response to	Re: Startup PANIC on standby promotion due to zero-filled WAL segment (Alena Vinter <dlaaren8@gmail.com>)
Responses	Re: Startup PANIC on standby promotion due to zero-filled WAL segment
List	pgsql-hackers

On Tue, Dec 23, 2025 at 04:33:30PM +0700, Alena Vinter wrote:
> Thanks for the review. To clarify: TLI 1 does not diverge — it is fully
> replicated to the standby before the timeline switch. The test then
> intentionally slows down replication on TLI 2 (e.g., by delaying WAL
> shipping), reproducing the scenario I illustrated. As far as I’m aware,
> `fsync` is `on` by default, and the test does not modify it — so no WAL
> records are lost due to unsafe flushing.

Don't think so, based on what is in the tree:
$ git grep "fsync = " -- *.pm
src/test/perl/PostgreSQL/Test/Cluster.pm:   print $conf "fsync = off\n";

> The core issue is that the new timeline’s segment is zero-initialized
> instead of copying the same segment from the previous timeline (as done in
> crash-recovery startup).  As a result, startup cannot finish recovery due
> to non-replicated end of WAL causing failures like “invalid magic number”.

The following addition to your proposed test is telling me an entirely
different story, making the test pass as the records of TLI 1 are
around:
 my $node_primary = PostgreSQL::Test::Cluster->new('primary');
 $node_primary->init(allows_streaming => 1);
+#$node_primary->append_conf('postgresql.conf', 'fsync=on');
 $node_primary->start;
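
For context on why a one-line append_conf call is enough to change the
outcome: when a parameter appears more than once in postgresql.conf, the
last occurrence is the one that takes effect, so a line appended after
init overrides the "fsync = off" written by Cluster.pm. A minimal shell
sketch of that rule follows. It assumes simple "name = value" lines only
(no quoting, no include files), and effective_setting is a hypothetical
helper written for illustration, not a PostgreSQL tool:

```shell
#!/bin/sh
# Hypothetical helper (not part of PostgreSQL): print the value that a
# postgresql.conf-style file ends up with for one parameter. Commented-out
# lines are ignored and the last remaining assignment wins.
effective_setting() {
    # $1 = path to the config file, $2 = parameter name
    sed -n "s/^[[:space:]]*$2[[:space:]]*=[[:space:]]*\([^#[:space:]]*\).*/\1/p" "$1" \
        | tail -n 1
}

conf=$(mktemp)

# Stand-in for what Cluster.pm writes at init time.
printf 'fsync = off\n' > "$conf"
effective_setting "$conf" fsync    # prints: off

# Stand-in for what an append_conf call adds afterwards.
printf 'fsync=on\n' >> "$conf"
effective_setting "$conf" fsync    # prints: on

rm -f "$conf"
```

On a running node, SHOW fsync reports the value actually in effect, which
is the authoritative check; the sketch only illustrates the file-level
precedence.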
--
Michael

Attachments

signature.asc
