> On 20 Aug 2021, at 20:47, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Daniel Gustafsson <daniel@yesql.se> writes:
>> If we want the test to run but not fail the entire test suite if it fails then
>> it should use a TODO block instead, but that’s intended for tests known to fail
>> and this doesn’t seem to fall in that category.
>
> That seems pretty useless. If we did break things in this area,
> such a test would not help us notice.

For sure. I wasn’t advocating it, merely pointing out that the SKIP block doesn’t
work the way it was described upthread.

> The problem with the test seems blindingly obvious from here: it
> is assuming first that psql will start fast enough to print its
> PID within one second, and next that we'll be able to issue
> the cancel (and have the backend react) in less than 2 seconds
> more. This seems about guaranteed to fail on cache-clobber
> animals, for example, but animals that are merely slow or overloaded
> would have issues too.
>
> I think you should drop the overly-cute bit with a SIGALRM handler,
> and instead have a loop-with-delay around an attempt to read the
> psql.pid file, after launching the psql run without an immediate
> wait for termination. That gets rid of the first problem (though
> you still want the loop to timeout eventually, it could wait up
> to say 180 seconds, as we do elsewhere). Then the second problem
> is easy to solve by making the pg_sleep delay twice as much.

This could perhaps be done with a PostgresNode::interactive_psql session? I
used that in a similar, though far from identical, test setup in the online
checksums patchset.
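
For illustration, a minimal sketch of the loop-with-delay idea quoted above:
poll for the PID file until it holds a complete PID, and give up after an
overall timeout. It is written in Python only to keep it self-contained; the
actual test would be Perl under the TAP framework. The function name, the
polling interval, and the requirement that the contents end in a newline (so
that a partially written PID is not picked up) are assumptions made for this
sketch. The 180 second cap is the figure suggested above.

```python
import re
import time


def wait_for_pid_file(path, timeout=180.0, interval=0.1):
    """Poll `path` until it contains a PID; raise TimeoutError after `timeout` seconds."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            with open(path) as f:
                text = f.read()
        except FileNotFoundError:
            # psql has not created the file yet.
            text = ""
        # Assumption: the writer emits digits followed by a newline, so a
        # match means the whole PID has been written.
        match = re.fullmatch(r"(\d+)\n", text)
        if match:
            return int(match.group(1))
        if time.monotonic() >= deadline:
            raise TimeoutError(f"no PID in {path} after {timeout} seconds")
        time.sleep(interval)
```

Once the PID is available, the test would send the cancel signal to psql and
check the result. Because the wait is bounded by the timeout rather than by a
fixed one-second alarm, a slow or overloaded animal only makes the test take
longer instead of failing it.
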
--
Daniel Gustafsson https://vmware.com/