Andres Freund <andres@anarazel.de> writes:
> On 2022-01-18 21:50:07 -0500, Tom Lane wrote:
>> This actually causes parallel check-world to fail altogether on florican's
>> host, because the initial fsync of the recovered primary takes more than 3
>> minutes when there's conflicting I/O traffic, causing pg_ctl to time out.
> Ugh.

I misspoke there: it's the standby that is performing an fsync'd
checkpoint and timing out, during the test's promote-the-standby
step.

This test attempt revealed another problem too: the standby never
shut down, and thus the calling "make" never quit until I intervened
manually. I'm not sure why. I see that Cluster::promote uses
system_or_bail() to run "pg_ctl promote" ... could it be that
BAIL_OUT prevents the script's normal END hooks from running?
But it seems like we'd have noticed that long ago.

regards, tom lane