Michael Paquier <michael@paquier.xyz> writes:
> The buildfarm has provided some feedback, and the new tests have been
> unstable on mamba:
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=mamba&dt=2023-10-15%2005%3A04%3A21
Yeah. FWIW, I tried to reproduce this on mamba's host, but did not
see it again in 100 or so tries.
> In more details:
> # poll_query_until timed out executing this query:
> # SELECT datname, usename, wait_event FROM pg_stat_activity
> # WHERE backend_type = 'worker_spi dynamic' AND
> # pid = ;
> # with stderr:
> # ERROR: syntax error at or near ";"
> # LINE 3: pid = ;
> So this looks like a hard failure in starting the worker that should
> bypass the role login check. The logs don't offer much information,
> but I think I know what's going on here: at this stage of the tests,
> the number of workers created is 7, very close to the limit of
> max_worker_processes, at 8 by default. So one parallel worker spawned
> by any of the other bgworkers would be enough to prevent the last one
> to start, and mamba has been slow enough in the startup of the static
> workers to show that this could be possible.
I agree that that probably is the root cause, and we should fix it
by bumping up max_worker_processes in this test.
But this failure is annoying in another way. Evidently,
worker_spi_launch returned NULL, which must have been from here:
if (!RegisterDynamicBackgroundWorker(&worker, &handle))
PG_RETURN_NULL();
Why in the world is that "return NULL", and not "ereport(ERROR)"
like all the other failure conditions in that function?
If there's actually a defensible reason for the C code to act
like that, then all the call sites need to have checks for
a null result.
(cc'ing Robert, as this coding originated at 090d0f205.)
regards, tom lane