Re: pgbench: could not connect to server: Resource temporarily unavailable

From Thomas Munro
Subject Re: pgbench: could not connect to server: Resource temporarily unavailable
Msg-id CA+hUKGKPyXKf2jrnSUMKc8XvRTYs+kkiZY9GA6nMdMUgLG6EaQ@mail.gmail.com
In reply to Re: pgbench: could not connect to server: Resource temporarily unavailable  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-performance
On Mon, Aug 22, 2022 at 12:20 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Thomas Munro <thomas.munro@gmail.com> writes:
> > Yeah retrying doesn't seem that nice.  +1 for a bit of documentation,
> > which I guess belongs in the server tuning part where we talk about
> > sysctls, perhaps with a link somewhere near max_connections?  More
> > recent Linux kernels bumped it to 4096 by default so I doubt it'll
> > come up much in the future, though.
>
> Hmm.  It'll be awhile till the 128 default disappears entirely
> though, especially if assorted BSDen use that too.  Probably
> worth the trouble to document.

I could try to write a doc patch if you aren't already on it.

> > Note that we also call listen()
> > with a backlog value capped to our own PG_SOMAXCONN which is 1000.  I
> > doubt many people benchmark with higher numbers of connections but
> > it'd be nicer if it worked when you do...
>
> Actually it's 10000.  Still, I wonder if we couldn't just remove
> that limit now that we've desupported a bunch of stone-age kernels.
> It's hard to believe any modern kernel can't defend itself against
> silly listen-queue requests.

Oh, right.  Looks like that was just paranoia in commit 153f4006763,
back when you got away from using the (very conservative) SOMAXCONN
macro.  Looks like that was 5 on ancient systems going back to the
original sockets stuff, and later 128 was a popular number.  Yeah I'd
say +1 for removing our cap.  I'm pretty sure every system will
internally cap whatever value we pass in if it doesn't like it, as
POSIX explicitly says it can freely do with this "hint".
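
To illustrate the "hint" behaviour, here is a minimal sketch, not
PostgreSQL code.  The function name and the requested value are made
up for the example, and the /proc path is Linux-specific.  It asks for
an oversized backlog and expects listen() to succeed anyway, with the
kernel applying its own limit silently:

```python
import os
import socket
import tempfile


def try_large_backlog(requested=1_000_000):
    """Call listen() with an oversized backlog on a Unix-domain socket.

    Returns the kernel's somaxconn limit if it can be read (Linux only),
    otherwise None.  The point is only that listen() does not fail.
    """
    tmpdir = tempfile.mkdtemp()
    path = os.path.join(tmpdir, "demo.sock")
    server = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        server.bind(path)
        # The backlog is only a hint; the kernel is expected to clamp it.
        server.listen(requested)
        try:
            # Linux-specific location of the system-wide cap (assumption:
            # other systems expose it differently, e.g. via sysctl).
            with open("/proc/sys/net/core/somaxconn") as f:
                return int(f.read().strip())
        except (OSError, ValueError):
            return None
    finally:
        server.close()
        if os.path.exists(path):
            os.unlink(path)
        os.rmdir(tmpdir)


if __name__ == "__main__":
    print("kernel cap (if known):", try_large_backlog())
```

The program cannot see the effective queue length from the return of
listen(); on Linux the value read from /proc is the cap that applies.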

The main thing I learned today is that Linux's connect(AF_UNIX)
implementation doesn't refuse connections when the listen backlog is
full, unlike other OSes.  Instead, for blocking sockets, it sleeps and
wakes with everyone else to fight over space.  I *guess* for
non-blocking sockets that introduced a small contradiction -- there
isn't the state space required to give you a working EINPROGRESS with
the same sort of behaviour (if you reified a secondary queue for that
you might as well make the primary one larger...), but they also
didn't want to give you ECONNREFUSED just because you're non-blocking,
so they went with EAGAIN, because you really do need to call again
with the sockaddr.  The reason I wouldn't want to retry is that I
guess it'd be a busy, CPU-burning loop until progress can be made,
which isn't nice, and failing with "Resource temporarily unavailable"
does in fact describe the problem to the user, if somewhat vaguely.
Hmm, maybe we could add a hint to the error, though?
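
A rough way to see this from user space is the sketch below.  It is
not taken from pgbench or libpq; the function name, the backlog of 1
and the attempt limit are arbitrary choices for the demonstration.  A
listener never calls accept(), and non-blocking clients connect until
one fails.  On Linux I'd expect the failure to be EAGAIN, and on other
systems ECONNREFUSED:

```python
import errno
import os
import socket
import tempfile


def connect_until_failure(backlog=1, max_attempts=64):
    """Make non-blocking AF_UNIX connects to a listener that never accepts.

    Returns (number_of_successful_connects, errno_of_first_failure),
    with errno None if every attempt succeeded.
    """
    tmpdir = tempfile.mkdtemp()
    path = os.path.join(tmpdir, "demo.sock")
    server = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    clients = []
    try:
        server.bind(path)
        server.listen(backlog)
        for _ in range(max_attempts):
            client = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
            client.setblocking(False)
            clients.append(client)
            try:
                client.connect(path)
            except OSError as exc:
                # BlockingIOError (EAGAIN) is a subclass of OSError.
                return len(clients) - 1, exc.errno
        return len(clients), None
    finally:
        for client in clients:
            client.close()
        server.close()
        if os.path.exists(path):
            os.unlink(path)
        os.rmdir(tmpdir)


if __name__ == "__main__":
    ok, err = connect_until_failure()
    name = errno.errorcode.get(err, "none") if err is not None else "none"
    print(f"{ok} connects succeeded, then: {name}")
```

How many connects succeed before the failure depends on how the kernel
interprets the backlog, so the sketch reports the count rather than
assuming it.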


