Re: [BUG] non archived WAL removed during production crash recovery

From: Jehan-Guillaume de Rorthais
Subject: Re: [BUG] non archived WAL removed during production crash recovery
Msg-id: 20200424150300.1b3b0c20@firost
In response to: Re: [BUG] non archived WAL removed during production crash recovery  (Michael Paquier <michael@paquier.xyz>)
Responses: Re: [BUG] non archived WAL removed during production crash recovery  (Michael Paquier <michael@paquier.xyz>)
List: pgsql-bugs
On Fri, 24 Apr 2020 12:43:51 +0900
Michael Paquier <michael@paquier.xyz> wrote:

> On Thu, Apr 23, 2020 at 10:21:15PM -0400, Tom Lane wrote:
> > Looks like the news is not good :-(  
> 
> Yes, I was looking at that for the last couple of hours, and just
> pushed something to put back the buildfarm to a green state for now
> (based on the first results things seem stable now) by removing the
> defective subset of tests.
> 
> > I see that my own florican is one of the failing critters, though
> > it failed only on HEAD which seems odd.  Any suggestions what to
> > look for?  
> 
> The issue comes from the parts of the test where we expect some .ready
> files to exist (or not) after triggering a restartpoint to force some
> segments to be recycled.  And looking more at it, I suspect that the
> issue is actually that we don't make sure in the test that the
> standbys started have replayed up to the segment switch record
> triggered on the primary (the one within generate_series(10,20)), and
> then the follow-up restart point does not actually recycle the
> segments we expect to recycle.  That's more likely going to be a
> problem on slower machines as the window gets wider between the moment
> the standbys reach their consistency point and the moment the switch
> record is replayed.

Indeed.

Regarding your fix, as we don't know whether the standby has caught up with the
latest available record, there's really no point in keeping this test either:

  # Recovery with archive_mode=on should not create .ready files.
  # Note that this segment did not exist in the backup.
  ok( !-f "$standby1_data/$segment_path_2_ready",
     ".ready file for WAL segment $segment_name_2 not created on standby
      when archive_mode=on on standby" );

I agree the three tests could be removed as they were not covering the bug we
were chasing. However, they might still be useful to detect future unexpected
behavior changes. If you agree with this, please find attached a patch
proposal against HEAD that recreates these three tests **after** a waiting loop
on both standby1 and standby2. This waiting loop is inspired by the tests in
9.5 -> 10.
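
For illustration only (this is not the attached patch), the idea of such a
waiting loop can be sketched as below. The helper name poll_until and the
integer "replay position" are hypothetical stand-ins so the snippet runs
without a cluster. In a real TAP test the check would more likely go through
PostgresNode's poll_query_until() with a query comparing
pg_last_wal_replay_lsn() on the standby against an LSN captured on the
primary.

```perl
use strict;
use warnings;
use Time::HiRes qw(sleep);

# Hypothetical helper: call $check->() until it returns true or the
# attempts are exhausted. Returns 1 on success, 0 on timeout.
sub poll_until
{
	my ($check, $max_attempts, $delay) = @_;
	for my $attempt (1 .. $max_attempts)
	{
		return 1 if $check->();
		sleep($delay);
	}
	return 0;
}

# Stand-in for a standby: each poll advances a fake replay position by one.
# A real test would query the standby instead of incrementing a counter.
my $target_lsn = 5;    # position of the segment switch record (assumed)
my $replay_lsn = 0;    # fake standby replay position

my $caught_up =
  poll_until(sub { return ++$replay_lsn >= $target_lsn; }, 10, 0.01);

die "Timed out while waiting for standby to catch up" unless $caught_up;

# Only at this point is it safe to trigger the restartpoint and check for
# the presence or absence of .ready files.
```

Running the .ready checks only after the loop succeeds removes the window in
which a slow standby has reached consistency but not yet replayed the switch
record.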

Regards,
