On 2018-04-07 01:04:50 +0200, Daniel Gustafsson wrote:
> > I'm fairly certain that the bug here is a simple race condition in the
> > test (not the main code!):
>
> I wonder if it may perhaps be a case of both?
See my other message about the atomic fallback bit.
> > It's
> > exceedingly unsurprising that a 'pg_sleep(1)' is not a reliable way to
> > make sure that a process has finished exiting. Then follow-up tests fail
> > because the process is still running.
>
> I can reproduce the error when building with --disable-atomics, and it seems
> that all the failing members either do that, lack atomic.h, lack atomics or a
> combination.
atomic.h isn't important; it's only relevant on Solaris (IIRC). Only
one of the failing members lacks atomics, AFAICT. See:
On 2018-04-06 14:19:09 -0700, Andres Freund wrote:
> Is that an explanation for
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=gharial&dt=2018-04-06%2019%3A18%3A11
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=lousyjack&dt=2018-04-06%2016%3A03%3A01
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=sungazer&dt=2018-04-06%2015%3A46%3A16
> ? Those all don't seem to fall under that? Having proper atomics?
So for those it's the timing. Note that they didn't always fail, either.
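To illustrate the timing point: rather than a fixed pg_sleep(1), a test
can poll until the process is actually gone, bounded by a timeout. What
follows is only a minimal sketch in Python, not the actual test code;
wait_until() and the simulated check are hypothetical names made up for
illustration.

```python
import time


def wait_until(predicate, timeout=30.0, interval=0.1):
    """Poll predicate() until it returns True or `timeout` seconds pass.

    Returns True if the condition was met, False on timeout.
    """
    deadline = time.monotonic() + timeout
    while True:
        if predicate():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


# Simulated stand-in for "has the process exited?". A real test might
# instead look for the process in pg_stat_activity (assumption).
polls = {"count": 0}


def process_has_exited():
    polls["count"] += 1
    return polls["count"] >= 3  # pretend the process is gone by the third poll


if wait_until(process_has_exited, timeout=5.0, interval=0.01):
    print("process gone, safe to run the follow-up tests")
else:
    print("timed out waiting for process exit")
```

A slow machine then just polls a few more times instead of failing, and
a fast one doesn't wait longer than it has to.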
> > really? Let's just force the test to take at least 6s purely from
> > sleeping?
>
> The test needs continuous reading in a session to try and trigger any bugs in
> read access on the cluster during checksumming, is there a good way to do that
> in the isolationtester? I have failed to find a good way to repeat a step like
> that, but I might be missing something.
I don't know, but I do know this isn't right.
Greetings,
Andres Freund