Re: REL_15_STABLE: pgbench tests randomly failing on CI, Windows only

Поиск
Список
Период
Сортировка
От Noah Misch
Тема Re: REL_15_STABLE: pgbench tests randomly failing on CI, Windows only
Дата
Msg-id 20231011034026.75@rfd.leadboat.com
обсуждение исходный текст
Ответ на Re: REL_15_STABLE: pgbench tests randomly failing on CI, Windows only  ("David G. Johnston" <david.g.johnston@gmail.com>)
Список pgsql-hackers
On Sun, Oct 08, 2023 at 10:00:03PM -0700, David G. Johnston wrote:
> On Sun, Oct 8, 2023 at 9:10 PM Noah Misch <noah@leadboat.com> wrote:
> > I didn't think of any phrasing that clearly explained things without the
> > reader consulting the code.  I considered these:

I'm leaning toward:

  "socket file descriptor out of range for select(): %d" [style guide violation]

A true style guide purist might bury it in a detail message:

  ERROR: socket file descriptor out of range: %d
  DETAIL: select() accepts file descriptors from 0 to 1023, inclusive, in this build.
  HINT: Try fewer concurrent database clients.

> >   "socket file descriptor out of range: %d" [what range?]
> >
> Quick drive-by...but it seems that < 0 is a distinctly different problem
> than exceeding FD_SETSIZE.  I'm unsure what would cause the former but the
> error for the later seems like:
> 
> error: "Requested socket file descriptor %d exceeds connection limit of
> %d", fd, FD_SETSIZE-1
> hint: Reduce the requested number of concurrent connections
> 
> In short, the concept of range doesn't seem applicable here.  There is an
> error state at the max, and some invalid system state condition where the
> position within a set is somehow negative.  These should be separated -
> with the < 0 check happening first.

I view it as: the range of select()-able file descriptors is [0,FD_SETSIZE).
Negative is out of range.

> And apparently this condition isn't applicable when dealing with jobs in
> connect_slot?

For both pgbench.c and parallel_slot.c, there are sufficient negative-FD
checks elsewhere in the file.  Ideally, either both files would have redundant
checks, or neither file would.  I didn't feel the need to mess with that part
of the status quo.

> Also, I see that for connections we immediately issue FD_SET
> so this check on the boundary of the file descriptor makes sense.  But the
> remaining code in connect_slot doesn't involve FD_SET so the test for the
> file descriptor falling within its maximum seems to be coming out of
> nowhere.  Likely this is all good, and the lack of symmetry is just due to
> the natural progressive development of the code, but it stands out to the
> uninitiated looking at this patch.

True.  The approach in parallel_slot.c is to check the FD number each time it
opens a socket, after which its loop with FD_SET() doesn't recheck.  That's a
bit more efficient than the pgbench.c way, because each file may call FD_SET()
many times per socket.  Again, I didn't mess with that part of the status quo.



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Andres Freund
Дата:
Сообщение: Re: stopgap fix for signal handling during restore_command
Следующее
От: Michael Paquier
Дата:
Сообщение: Re: Add a new BGWORKER_BYPASS_ROLELOGINCHECK flag