Discussion: pgbench: could not connect to server: Resource temporarily unavailable
Hi Everyone,
I'm trying to run pgbench with various numbers of connections. However, my DB seems to be hitting some limit around 147-150 connections. I'd like to run with at least 500 and even up to 2000 if possible.
I've already increased max_connections, shared_buffers, and kernel.shmmax, all by a factor of 20.
What's limiting my DB from allowing more connections?
This is a sample of the output I'm getting, which repeats the error 52 times (one for each failed connection):
-bash-4.2$ pgbench -c 200 -j 200 -t 100 benchy
...
connection to database "benchy" failed:
could not connect to server: Resource temporarily unavailable
Is the server running locally and accepting
connections on Unix domain socket "/var/run/postgresql/.s.PGSQL.5432"?
transaction type: <builtin: TPC-B (sort of)>
scaling factor: 50
query mode: simple
number of clients: 200
number of threads: 200
number of transactions per client: 100
number of transactions actually processed: 14800/20000
latency average = 165.577 ms
tps = 1207.895829 (including connections establishing)
tps = 1255.496312 (excluding connections establishing)
Thanks,
Kevin
Kevin McKibbin <kevinmckibbin123@gmail.com> writes:
> What's limiting my DB from allowing more connections?
> This is a sample of the output I'm getting, which repeats the error 52
> times (one for each failed connection)
> -bash-4.2$ pgbench -c 200 -j 200 -t 100 benchy
> ...
> connection to database "benchy" failed:
> could not connect to server: Resource temporarily unavailable
> Is the server running locally and accepting
> connections on Unix domain socket
> "/var/run/postgresql/.s.PGSQL.5432"?

This is apparently a client-side failure not a server-side failure
(you could confirm that by seeing whether any corresponding
failure shows up in the postmaster log).  That means that the
kernel wouldn't honor pgbench's attempt to open a connection,
which implies you haven't provisioned enough networking resources
to support the number of connections you want.  Since you haven't
mentioned what platform this is on, it's impossible to say more
than that --- but it doesn't look like Postgres configuration
settings are at issue at all.

			regards, tom lane
Sorry Tom for the duplicate email. Resending with the mailing list.
Thanks for your response. I'm using a Centos Linux environment and have the open files set very high:

-bash-4.2$ ulimit -a|grep open
open files                      (-n) 65000

What else could be limiting the connections?

Kevin

On Sat, 20 Aug 2022 at 21:20, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Kevin McKibbin <kevinmckibbin123@gmail.com> writes:
>> What's limiting my DB from allowing more connections?
>> This is a sample of the output I'm getting, which repeats the error 52
>> times (one for each failed connection)
>> -bash-4.2$ pgbench -c 200 -j 200 -t 100 benchy
>> ...
>> connection to database "benchy" failed:
>> could not connect to server: Resource temporarily unavailable
>> Is the server running locally and accepting
>> connections on Unix domain socket
>> "/var/run/postgresql/.s.PGSQL.5432"?
>
> This is apparently a client-side failure not a server-side failure
> (you could confirm that by seeing whether any corresponding
> failure shows up in the postmaster log).  That means that the
> kernel wouldn't honor pgbench's attempt to open a connection,
> which implies you haven't provisioned enough networking resources
> to support the number of connections you want.  Since you haven't
> mentioned what platform this is on, it's impossible to say more
> than that --- but it doesn't look like Postgres configuration
> settings are at issue at all.
>
> 			regards, tom lane
On 2022-08-20 Sa 23:20, Tom Lane wrote:
> Kevin McKibbin <kevinmckibbin123@gmail.com> writes:
>> What's limiting my DB from allowing more connections?
>> This is a sample of the output I'm getting, which repeats the error 52
>> times (one for each failed connection)
>> -bash-4.2$ pgbench -c 200 -j 200 -t 100 benchy
>> ...
>> connection to database "benchy" failed:
>> could not connect to server: Resource temporarily unavailable
>> Is the server running locally and accepting
>> connections on Unix domain socket
>> "/var/run/postgresql/.s.PGSQL.5432"?
> This is apparently a client-side failure not a server-side failure
> (you could confirm that by seeing whether any corresponding
> failure shows up in the postmaster log).  That means that the
> kernel wouldn't honor pgbench's attempt to open a connection,
> which implies you haven't provisioned enough networking resources
> to support the number of connections you want.  Since you haven't
> mentioned what platform this is on, it's impossible to say more
> than that --- but it doesn't look like Postgres configuration
> settings are at issue at all.

The first question in my mind from the above is where this postgres
instance is actually listening. Is it really /var/run/postgresql? Its
postmaster.pid will tell you. I have often seen client programs pick up
a system libpq which is compiled with a different default socket
directory.

cheers

andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com
Andrew Dunstan <andrew@dunslane.net> writes:
> On 2022-08-20 Sa 23:20, Tom Lane wrote:
>> Kevin McKibbin <kevinmckibbin123@gmail.com> writes:
>>> What's limiting my DB from allowing more connections?

> The first question in my mind from the above is where this postgres
> instance is actually listening. Is it really /var/run/postgresql? Its
> postmaster.pid will tell you. I have often seen client programs pick up
> a system libpq which is compiled with a different default socket directory.

I wouldn't think that'd explain a symptom of some connections succeeding
and others not within the same pgbench run.

I tried to duplicate this behavior locally (on RHEL8) and got something
interesting.  After increasing the server's max_connections to 1000,
I can do

$ pgbench -S -c 200 -j 100 -t 100 bench

and it goes through fine.  But:

$ pgbench -S -c 200 -j 200 -t 100 bench
pgbench (16devel)
starting vacuum...end.
pgbench: error: connection to server on socket "/tmp/.s.PGSQL.5440" failed: Resource temporarily unavailable
	Is the server running locally and accepting connections on that socket?
pgbench: error: could not create connection for client 154

So whatever is triggering this has nothing to do with the server,
but with how many threads are created inside pgbench.  I notice
also that sometimes it works, making it seem like possibly a race
condition.  Either that or there's some limitation on how fast
threads within a process can open sockets.

Also, I determined that libpq's connect() call is failing synchronously
(we get EAGAIN directly from the connect() call, not later).  I wondered
if libpq should accept EAGAIN as a synonym for EINPROGRESS, but no:
that just makes it fail on the next touch of the socket.

The only documented reason for connect(2) to fail with EAGAIN is

       EAGAIN Insufficient entries in the routing cache.

which seems pretty unlikely to be the issue here, since all these
connections are being made to the same local address.

On the whole this is smelling more like a Linux kernel bug than
anything else.

			regards, tom lane
On 2022-08-21 Su 17:15, Tom Lane wrote:
> Andrew Dunstan <andrew@dunslane.net> writes:
>> On 2022-08-20 Sa 23:20, Tom Lane wrote:
>>> Kevin McKibbin <kevinmckibbin123@gmail.com> writes:
>>>> What's limiting my DB from allowing more connections?
>> The first question in my mind from the above is where this postgres
>> instance is actually listening. Is it really /var/run/postgresql? Its
>> postmaster.pid will tell you. I have often seen client programs pick up
>> a system libpq which is compiled with a different default socket directory.
> I wouldn't think that'd explain a symptom of some connections succeeding
> and others not within the same pgbench run.

Oh, yes, I agree, I missed that aspect of it.

> I tried to duplicate this behavior locally (on RHEL8) and got something
> interesting.  After increasing the server's max_connections to 1000,
> I can do
>
> $ pgbench -S -c 200 -j 100 -t 100 bench
>
> and it goes through fine.  But:
>
> $ pgbench -S -c 200 -j 200 -t 100 bench
> pgbench (16devel)
> starting vacuum...end.
> pgbench: error: connection to server on socket "/tmp/.s.PGSQL.5440" failed: Resource temporarily unavailable
> 	Is the server running locally and accepting connections on that socket?
> pgbench: error: could not create connection for client 154
>
> So whatever is triggering this has nothing to do with the server,
> but with how many threads are created inside pgbench.  I notice
> also that sometimes it works, making it seem like possibly a race
> condition.  Either that or there's some limitation on how fast
> threads within a process can open sockets.
>
> Also, I determined that libpq's connect() call is failing synchronously
> (we get EAGAIN directly from the connect() call, not later).  I wondered
> if libpq should accept EAGAIN as a synonym for EINPROGRESS, but no:
> that just makes it fail on the next touch of the socket.
>
> The only documented reason for connect(2) to fail with EAGAIN is
>
>        EAGAIN Insufficient entries in the routing cache.
>
> which seems pretty unlikely to be the issue here, since all these
> connections are being made to the same local address.
>
> On the whole this is smelling more like a Linux kernel bug than
> anything else.

*nod*

cheers

andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com
Andrew Dunstan <andrew@dunslane.net> writes:
> On 2022-08-21 Su 17:15, Tom Lane wrote:
>> On the whole this is smelling more like a Linux kernel bug than
>> anything else.

> *nod*

Conceivably we could work around this in libpq: on EAGAIN, just retry
the failed connect(), or maybe better to close the socket and take it
from the top with the same target server address.  On the one hand,
reporting EAGAIN certainly sounds like an invitation to do just that.
On the other hand, if the failure is persistent then libpq is locked
up in a tight loop --- and "Insufficient entries in the routing cache"
doesn't seem like a condition that would clear immediately.

It's also pretty unclear why the kernel would want to return
EAGAIN instead of letting the nonblock connection path do the
waiting, which is why I'm suspecting a bug rather than designed
behavior.

I think I'm disinclined to install such a workaround unless we get
confirmation from some kernel hacker that it's operating as designed
and application-level retry is intended.

			regards, tom lane
On Mon, Aug 22, 2022 at 9:48 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> It's also pretty unclear why the kernel would want to return
> EAGAIN instead of letting the nonblock connection path do the
> waiting, which is why I'm suspecting a bug rather than designed
> behavior.

Could it be that it fails like that if the listen queue is full on the
other side?

https://github.com/torvalds/linux/blob/master/net/unix/af_unix.c#L1493

If it's something like that, maybe increasing
/proc/sys/net/core/somaxconn would help?  I think older kernels only
had 128 here.
Hi,

On 2022-08-21 17:15:01 -0400, Tom Lane wrote:
> I tried to duplicate this behavior locally (on RHEL8) and got something
> interesting.  After increasing the server's max_connections to 1000,
> I can do
>
> $ pgbench -S -c 200 -j 100 -t 100 bench
>
> and it goes through fine.  But:
>
> $ pgbench -S -c 200 -j 200 -t 100 bench
> pgbench (16devel)
> starting vacuum...end.
> pgbench: error: connection to server on socket "/tmp/.s.PGSQL.5440" failed: Resource temporarily unavailable
> 	Is the server running locally and accepting connections on that socket?
> pgbench: error: could not create connection for client 154
>
> So whatever is triggering this has nothing to do with the server,
> but with how many threads are created inside pgbench.  I notice
> also that sometimes it works, making it seem like possibly a race
> condition.  Either that or there's some limitation on how fast
> threads within a process can open sockets.

I think it's more likely to be caused by the net.core.somaxconn sysctl
limiting the size of the listen backlog.  The threads part just
influences the speed at which new connections are made, and thus how
quickly the backlog is filled.

Do you get the same behaviour if you set net.core.somaxconn to higher
than the number of connections?  IIRC you need to restart postgres for
it to take effect.

Greetings,

Andres Freund
Thomas Munro <thomas.munro@gmail.com> writes:
> If it's something like that, maybe increasing
> /proc/sys/net/core/somaxconn would help?  I think older kernels only
> had 128 here.

Bingo!  I see

$ cat /proc/sys/net/core/somaxconn
128

by default, which is right about where the problem starts.  After

$ sudo sh -c 'echo 1000 >/proc/sys/net/core/somaxconn'

*and restarting the PG server*, I can do a lot more threads without
a problem.  Evidently, the server's socket's listen queue length
is fixed at creation and adjusting the kernel limit won't immediately
change it.

So what we've got is that EAGAIN from connect() on a Unix socket can
mean "listen queue overflow" and the kernel won't treat that as a
nonblock-waitable condition.  Still seems like a kernel bug perhaps,
or at least a misfeature.

Not sure what I think at this point about making libpq retry after
EAGAIN.  It would make sense for this particular undocumented use
of EAGAIN, but I'm worried about others, especially the documented
reason.  On the whole I'm inclined to leave the code alone;
but is there sufficient reason to add something about adjusting
somaxconn to our documentation?

			regards, tom lane
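[Editor's note: the failure mode confirmed above is easy to reproduce outside of Postgres. The following minimal sketch (not from the thread) fills a Unix socket's listen queue without ever accepting, then shows a nonblocking connect() failing; on Linux the errno is EAGAIN, the same "Resource temporarily unavailable" pgbench reports, while BSD-family kernels report ECONNREFUSED as noted later in the thread.]

```python
import errno
import os
import socket
import tempfile

# Tiny stand-in for the postmaster's socket: listen with a very small
# backlog instead of the somaxconn-limited value a real server gets.
path = os.path.join(tempfile.mkdtemp(), "probe.sock")
server = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
server.bind(path)
server.listen(1)

# Connect repeatedly without ever calling accept(), so the queue fills.
clients = []
failure = None
for _ in range(16):
    c = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    c.setblocking(False)  # libpq likewise connects in nonblocking mode
    try:
        c.connect(path)   # AF_UNIX connect completes synchronously
        clients.append(c)
    except OSError as e:
        failure = e.errno
        c.close()
        break

# Linux reports EAGAIN when the listen queue overflows;
# FreeBSD/macOS report ECONNREFUSED instead.
print(errno.errorcode[failure])
```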
On Mon, Aug 22, 2022 at 10:55 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: > Not sure what I think at this point about making libpq retry after > EAGAIN. It would make sense for this particular undocumented use > of EAGAIN, but I'm worried about others, especially the documented > reason. On the whole I'm inclined to leave the code alone; > but is there sufficient reason to add something about adjusting > somaxconn to our documentation? My Debian system apparently has a newer man page: EAGAIN For nonblocking UNIX domain sockets, the socket is nonblocking, and the connection cannot be completed immediately. For other socket families, there are insufficient entries in the routing cache. Yeah retrying doesn't seem that nice. +1 for a bit of documentation, which I guess belongs in the server tuning part where we talk about sysctls, perhaps with a link somewhere near max_connections? More recent Linux kernels bumped it to 4096 by default so I doubt it'll come up much in the future, though. Note that we also call listen() with a backlog value capped to our own PG_SOMAXCONN which is 1000. I doubt many people benchmark with higher numbers of connections but it'd be nicer if it worked when you do... I was curious and checked how FreeBSD would handle this. Instead of EAGAIN you get ECONNREFUSED here, until you crank up kern.ipc.somaxconn, which also defaults to 128 like older Linux.
Thomas Munro <thomas.munro@gmail.com> writes:
> Yeah retrying doesn't seem that nice.  +1 for a bit of documentation,
> which I guess belongs in the server tuning part where we talk about
> sysctls, perhaps with a link somewhere near max_connections?  More
> recent Linux kernels bumped it to 4096 by default so I doubt it'll
> come up much in the future, though.

Hmm.  It'll be awhile till the 128 default disappears entirely
though, especially if assorted BSDen use that too.  Probably
worth the trouble to document.

> Note that we also call listen()
> with a backlog value capped to our own PG_SOMAXCONN which is 1000.  I
> doubt many people benchmark with higher numbers of connections but
> it'd be nicer if it worked when you do...

Actually it's 10000.  Still, I wonder if we couldn't just remove
that limit now that we've desupported a bunch of stone-age kernels.
It's hard to believe any modern kernel can't defend itself against
silly listen-queue requests.

			regards, tom lane
On Mon, Aug 22, 2022 at 12:20 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Thomas Munro <thomas.munro@gmail.com> writes:
>> Yeah retrying doesn't seem that nice.  +1 for a bit of documentation,
>> which I guess belongs in the server tuning part where we talk about
>> sysctls, perhaps with a link somewhere near max_connections?  More
>> recent Linux kernels bumped it to 4096 by default so I doubt it'll
>> come up much in the future, though.
>
> Hmm.  It'll be awhile till the 128 default disappears entirely
> though, especially if assorted BSDen use that too.  Probably
> worth the trouble to document.

I could try to write a doc patch if you aren't already on it.

>> Note that we also call listen()
>> with a backlog value capped to our own PG_SOMAXCONN which is 1000.  I
>> doubt many people benchmark with higher numbers of connections but
>> it'd be nicer if it worked when you do...
>
> Actually it's 10000.  Still, I wonder if we couldn't just remove
> that limit now that we've desupported a bunch of stone-age kernels.
> It's hard to believe any modern kernel can't defend itself against
> silly listen-queue requests.

Oh, right.  Looks like that was just paranoia in commit 153f4006763,
back when you got away from using the (very conservative) SOMAXCONN
macro.  Looks like that was 5 on ancient systems going back to the
original sockets stuff, and later 128 was a popular number.  Yeah I'd
say +1 for removing our cap.  I'm pretty sure every system will
internally cap whatever value we pass in if it doesn't like it, as
POSIX explicitly says it can freely do with this "hint".

The main thing I learned today is that Linux's connect(AF_UNIX)
implementation doesn't refuse connections when the listen backlog is
full, unlike other OSes.  Instead, for blocking sockets, it sleeps and
wakes with everyone else to fight over space.  I *guess* for
non-blocking sockets that introduced a small contradiction -- there
isn't the state space required to give you a working EINPROGRESS with
the same sort of behaviour (if you reified a secondary queue for that
you might as well make the primary one larger...), but they also
didn't want to give you ECONNREFUSED just because you're non-blocking,
so they went with EAGAIN, because you really do need to call again
with the sockaddr.  The reason I wouldn't want to call it again is
that I guess it'd be a busy CPU burning loop until progress can be
made, which isn't nice, and failing with "Resource temporarily
unavailable" to the user does in fact describe the problem, if
somewhat vaguely.  Hmm, maybe we could add a hint to the error,
though?
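[Editor's note: the client-side retry being weighed here might look like the hypothetical sketch below. This is not libpq's actual behavior (the thread ultimately decided against adopting a retry); the sleep between attempts is what would avoid the tight CPU-burning loop described above, and the function closes the socket and starts over on each attempt, as proposed earlier in the thread.]

```python
import errno
import socket
import time

def connect_with_retry(path, attempts=50, delay=0.01):
    """Hypothetical workaround: on EAGAIN from a Unix-socket connect,
    close the socket and retry from the top after a short sleep,
    rather than busy-looping on the same socket."""
    for _ in range(attempts):
        s = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        try:
            s.connect(path)
            return s
        except OSError as e:
            s.close()
            if e.errno != errno.EAGAIN:
                raise          # any other failure is reported as usual
            time.sleep(delay)  # back off instead of burning CPU
    raise OSError(errno.EAGAIN, "listen queue still full after retries")
```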
Thomas Munro <thomas.munro@gmail.com> writes:
> On Mon, Aug 22, 2022 at 12:20 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Hmm.  It'll be awhile till the 128 default disappears entirely
>> though, especially if assorted BSDen use that too.  Probably
>> worth the trouble to document.

> I could try to write a doc patch if you aren't already on it.

I haven't done anything about it yet, but could do so tomorrow or so.

(BTW, I just finished discovering that NetBSD has the same 128 limit.
It looks like they intended to make that settable via sysctl, because
it's a variable not a constant; but they haven't actually wired up the
variable to sysctl yet.)

> Oh, right.  Looks like that was just paranoia in commit 153f4006763,
> back when you got away from using the (very conservative) SOMAXCONN
> macro.  Looks like that was 5 on ancient systems going back to the
> original sockets stuff, and later 128 was a popular number.  Yeah I'd
> say +1 for removing our cap.  I'm pretty sure every system will
> internally cap whatever value we pass in if it doesn't like it, as
> POSIX explicitly says it can freely do with this "hint".

Yeah.  I hadn't thought to check the POSIX text, but their listen(2)
page is pretty clear that implementations should *silently* reduce the
value to what they can handle, not fail.  Also, SUSv2 says the same
thing in different words, so the requirement's been that way for a
very long time.  I think we could drop this ancient bit of paranoia.

> ... Hmm, maybe we could add a hint to the error,
> though?

libpq doesn't really have a notion of hints --- perhaps we ought to
fix that sometime.  But this doesn't seem like a very exciting place
to start, given the paucity of prior complaints.  (And anyway people
using other client libraries wouldn't be helped.)  I think some
documentation in the "Managing Kernel Resources" section should be
plenty for this.

			regards, tom lane
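[Editor's note: the POSIX behavior cited above, that listen() treats the backlog as a hint and silently clamps oversized values rather than failing, can be checked with a small sketch (not from the thread):]

```python
import os
import socket
import tempfile

s = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
s.bind(os.path.join(tempfile.mkdtemp(), "clamp.sock"))

# POSIX says the backlog argument is a hint: the kernel may silently
# reduce an oversized value, but must not fail because of it.
clamped_ok = True
try:
    s.listen(10_000_000)  # far above any kernel's real queue limit
except OSError:
    clamped_ok = False

print("oversized backlog accepted:", clamped_ok)
s.close()
```

This is why removing PG_SOMAXCONN is safe on any POSIX-conforming kernel: the request is capped internally (on Linux, to net.core.somaxconn) without an error.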
On Mon, Aug 22, 2022 at 2:18 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Thomas Munro <thomas.munro@gmail.com> writes:
>> On Mon, Aug 22, 2022 at 12:20 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> Hmm.  It'll be awhile till the 128 default disappears entirely
>>> though, especially if assorted BSDen use that too.  Probably
>>> worth the trouble to document.
>
>> I could try to write a doc patch if you aren't already on it.
>
> I haven't done anything about it yet, but could do so tomorrow or so.

Cool.  BTW small correction to something I said about FreeBSD: it'd be
better to document the new name kern.ipc.soacceptqueue (see listen(2)
HISTORY) even though the old name still works and matches OpenBSD and
macOS.
Thomas Munro <thomas.munro@gmail.com> writes:
> Cool.  BTW small correction to something I said about FreeBSD: it'd be
> better to document the new name kern.ipc.soacceptqueue (see listen(2)
> HISTORY) even though the old name still works and matches OpenBSD and
> macOS.

Thanks.  Sounds like we get to document at least three different
sysctl names for this setting :-(

			regards, tom lane
OK, here's some proposed patches.

0001 adds a para about how to raise the listen queue length.

0002 isn't quite related, but while writing 0001 I noticed a nearby
use of /proc/sys/... which I thought should be converted to sysctl.
IMO /proc/sys pretty much sucks, at least for documentation purposes,
for multiple reasons:

* It's unlike the way you do things on other platforms.

* "man sysctl" will lead you to useful documentation about how to use
that command.  There's no obvious way to find documentation about
/proc/sys.

* It's not at all sudo-friendly.  Compare
	sudo sh -c 'echo 0 >/proc/sys/kernel/randomize_va_space'
	sudo sysctl -w kernel.randomize_va_space=0
The former is a lot longer and it's far from obvious why you have to
do it that way.

* You have to think in sysctl terms anyway if you want to make the
setting persist across reboots, which you almost always do.

* Everywhere else in runtime.sgml, we use sysctl not /proc/sys.

0003 removes PG_SOMAXCONN.  While doing that I noticed that this
computation hadn't been touched throughout all the various
changes fooling with exactly what gets counted in MaxBackends.
I think the most appropriate definition for the listen queue
length is now MaxConnections * 2, not MaxBackends * 2, because
the other processes counted in MaxBackends don't correspond to
incoming connections.

I propose 0003 for HEAD only, but the docs changes could be
back-patched.

			regards, tom lane

diff --git a/doc/src/sgml/runtime.sgml b/doc/src/sgml/runtime.sgml
index 963b18ed85..1192faa6ae 100644
--- a/doc/src/sgml/runtime.sgml
+++ b/doc/src/sgml/runtime.sgml
@@ -1298,6 +1298,22 @@ default:\
     linkend="guc-max-files-per-process"/> configuration parameter to
     limit the consumption of open files.
    </para>
+
+   <para>
+    Another kernel limit that may be of concern when supporting large
+    numbers of client connections is the maximum socket connection queue
+    length.  If more than that many connection requests arrive within a
+    very short period, some may get rejected before the postmaster can
+    service the requests, with those clients receiving unhelpful
+    connection failure errors such as <quote>Resource temporarily
+    unavailable</quote>.  The default queue length limit is 128 on many
+    platforms.  To raise it, adjust the appropriate kernel parameter
+    via <application>sysctl</application>, then restart the postmaster.
+    The parameter is variously named <varname>net.core.somaxconn</varname>
+    on Linux, <varname>kern.ipc.soacceptqueue</varname> on newer FreeBSD,
+    and <varname>kern.ipc.somaxconn</varname> on macOS and other BSD
+    variants.
+   </para>
   </sect2>

   <sect2 id="linux-memory-overcommit">

diff --git a/doc/src/sgml/runtime.sgml b/doc/src/sgml/runtime.sgml
index 963b18ed85..1192faa6ae 100644
--- a/doc/src/sgml/runtime.sgml
+++ b/doc/src/sgml/runtime.sgml
@@ -1258,11 +1258,12 @@ default:\
    <itemizedlist>
     <listitem>
      <para>
-      On <productname>Linux</productname>
-      <filename>/proc/sys/fs/file-max</filename> determines the
-      maximum number of open files that the kernel will support.  It can
-      be changed by writing a different number into the file or by
-      adding an assignment in <filename>/etc/sysctl.conf</filename>.
+      On <productname>Linux</productname> the kernel parameter
+      <varname>fs.file-max</varname> determines the maximum number of open
+      files that the kernel will support.  It can be changed with
+      <literal>sysctl -w fs.file-max=<replaceable>N</replaceable></literal>.
+      To make the setting persist across reboots, add an assignment
+      in <filename>/etc/sysctl.conf</filename>.
       The maximum limit of files per process is fixed at the time the
       kernel is compiled; see
       <filename>/usr/src/linux/Documentation/proc.txt</filename> for

diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index 8a038d1b2a..1664fcee2a 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -4891,7 +4891,7 @@ SubPostmasterMain(int argc, char *argv[])
 * If testing EXEC_BACKEND on Linux, you should run this as root before
 * starting the postmaster:
 *
- *	echo 0 >/proc/sys/kernel/randomize_va_space
+ *	sysctl -w kernel.randomize_va_space=0
 *
 * This prevents using randomized stack and code addresses that cause the
 * child process's memory map to be different from the parent's, making it

diff --git a/src/backend/libpq/pqcomm.c b/src/backend/libpq/pqcomm.c
index 8ff3be611d..7112e9751b 100644
--- a/src/backend/libpq/pqcomm.c
+++ b/src/backend/libpq/pqcomm.c
@@ -537,13 +537,11 @@ StreamServerPort(int family, const char *hostName, unsigned short portNumber,
 	}

 	/*
-	 * Select appropriate accept-queue length limit.  PG_SOMAXCONN is only
-	 * intended to provide a clamp on the request on platforms where an
-	 * overly large request provokes a kernel error (are there any?).
+	 * Select appropriate accept-queue length limit.  It seems reasonable
+	 * to use a value similar to the maximum number of child processes
+	 * that the postmaster will permit.
 	 */
-	maxconn = MaxBackends * 2;
-	if (maxconn > PG_SOMAXCONN)
-		maxconn = PG_SOMAXCONN;
+	maxconn = MaxConnections * 2;

 	err = listen(fd, maxconn);
 	if (err < 0)

diff --git a/src/include/pg_config_manual.h b/src/include/pg_config_manual.h
index 844c3e0f09..f2a106f983 100644
--- a/src/include/pg_config_manual.h
+++ b/src/include/pg_config_manual.h
@@ -114,17 +114,6 @@
  */
 #define MAXPGPATH	1024

-/*
- * PG_SOMAXCONN: maximum accept-queue length limit passed to
- * listen(2).  You'd think we should use SOMAXCONN from
- * <sys/socket.h>, but on many systems that symbol is much smaller
- * than the kernel's actual limit.  In any case, this symbol need be
- * twiddled only if you have a kernel that refuses large limit values,
- * rather than silently reducing the value to what it can handle
- * (which is what most if not all Unixen do).
- */
-#define PG_SOMAXCONN	10000
-
 /*
  * You can try changing this if you have a machine with bytes of
  * another size, but no guarantee...
Thanks for your input everyone! I wanted to confirm that increasing the somaxconn also fixed the issue for me.
Kevin
On Tue, Aug 23, 2022 at 4:57 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> 0001 adds a para about how to raise the listen queue length.

+     service the requests, with those clients receiving unhelpful
+     connection failure errors such as <quote>Resource temporarily
+     unavailable</quote>.

LGTM but I guess I would add "... or Connection refused"?

> 0002 isn't quite related, but while writing 0001 I noticed a nearby
> use of /proc/sys/... which I thought should be converted to sysctl.
> IMO /proc/sys pretty much sucks, at least for documentation purposes,
> for multiple reasons:

+1

> 0003 removes PG_SOMAXCONN.  While doing that I noticed that this
> computation hadn't been touched throughout all the various
> changes fooling with exactly what gets counted in MaxBackends.
> I think the most appropriate definition for the listen queue
> length is now MaxConnections * 2, not MaxBackends * 2, because
> the other processes counted in MaxBackends don't correspond to
> incoming connections.

+1

> I propose 0003 for HEAD only, but the docs changes could be
> back-patched.

+1
Just curious, *backlog* defines the maximum pending connections, why
do we need to double the MaxConnections as the queue size?  It seems
*listen* with larger *backlog* will tell the OS maintain a larger
buffer?

-	maxconn = MaxBackends * 2;
-	if (maxconn > PG_SOMAXCONN)
-		maxconn = PG_SOMAXCONN;
+	maxconn = MaxConnections * 2;

On Tue, Aug 23, 2022 at 12:57 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> OK, here's some proposed patches.
>
> 0001 adds a para about how to raise the listen queue length.
>
> 0002 isn't quite related, but while writing 0001 I noticed a nearby
> use of /proc/sys/... which I thought should be converted to sysctl.
> IMO /proc/sys pretty much sucks, at least for documentation purposes,
> for multiple reasons:
>
> * It's unlike the way you do things on other platforms.
>
> * "man sysctl" will lead you to useful documentation about how to
> use that command.  There's no obvious way to find documentation
> about /proc/sys.
>
> * It's not at all sudo-friendly.  Compare
> sudo sh -c 'echo 0 >/proc/sys/kernel/randomize_va_space'
> sudo sysctl -w kernel.randomize_va_space=0
> The former is a lot longer and it's far from obvious why you have
> to do it that way.
>
> * You have to think in sysctl terms anyway if you want to make the
> setting persist across reboots, which you almost always do.
>
> * Everywhere else in runtime.sgml, we use sysctl not /proc/sys.
>
> 0003 removes PG_SOMAXCONN.  While doing that I noticed that this
> computation hadn't been touched throughout all the various
> changes fooling with exactly what gets counted in MaxBackends.
> I think the most appropriate definition for the listen queue
> length is now MaxConnections * 2, not MaxBackends * 2, because
> the other processes counted in MaxBackends don't correspond to
> incoming connections.
>
> I propose 0003 for HEAD only, but the docs changes could be
> back-patched.
>
> 			regards, tom lane

--
Regards
Junwang Zhao
Junwang Zhao <zhjwpku@gmail.com> writes:
> Just curious, *backlog* defines the maximum pending connections,
> why do we need to double the MaxConnections as the queue size?

The postmaster allows up to twice MaxConnections child processes
to exist, per the comment in canAcceptConnections:

	 * We allow more connections here than we can have backends because some
	 * might still be authenticating; they might fail auth, or some existing
	 * backend might exit before the auth cycle is completed.  The exact
	 * MaxBackends limit is enforced when a new backend tries to join the
	 * shared-inval backend array.

You can argue that 2X might not be the right multiplier, and you
can argue that the optimal listen queue length might be more or
less than the limit on number of child processes, but that's how
we've historically done it.  I'm not especially interested in
changing that without somebody making a well-reasoned case for
some other number.

			regards, tom lane
Ok, thanks for the clarification.

On Tue, Aug 23, 2022 at 11:37 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Junwang Zhao <zhjwpku@gmail.com> writes:
> > Just curious, *backlog* defines the maximum pending connections,
> > why do we need to double the MaxConnections as the queue size?
>
> The postmaster allows up to twice MaxConnections child processes
> to exist, per the comment in canAcceptConnections:
>
>  * We allow more connections here than we can have backends because some
>  * might still be authenticating; they might fail auth, or some existing
>  * backend might exit before the auth cycle is completed.  The exact
>  * MaxBackends limit is enforced when a new backend tries to join the
>  * shared-inval backend array.
>
> You can argue that 2X might not be the right multiplier, and you
> can argue that the optimal listen queue length might be more or
> less than the limit on number of child processes, but that's how
> we've historically done it.  I'm not especially interested in
> changing that without somebody making a well-reasoned case for
> some other number.
>
> 			regards, tom lane

--
Regards
Junwang Zhao
Thomas Munro <thomas.munro@gmail.com> writes:
> On Tue, Aug 23, 2022 at 4:57 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> +    service the requests, with those clients receiving unhelpful
> +    connection failure errors such as <quote>Resource temporarily
> +    unavailable</quote>.

> LGTM but I guess I would add "... or Connection refused"?

Is that the spelling that appears on FreeBSD?  Happy to add it.

			regards, tom lane
On Tue, Aug 23, 2022 at 3:53 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Thomas Munro <thomas.munro@gmail.com> writes:
> > On Tue, Aug 23, 2022 at 4:57 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > +    service the requests, with those clients receiving unhelpful
> > +    connection failure errors such as <quote>Resource temporarily
> > +    unavailable</quote>.
>
> > LGTM but I guess I would add "... or Connection refused"?
>
> Is that the spelling that appears on FreeBSD?  Happy to add it.

Yep.
On Tue, Aug 23, 2022 at 2:42 PM Thomas Munro <thomas.munro@gmail.com> wrote:
> > 0002 isn't quite related, but while writing 0001 I noticed a nearby
> > use of /proc/sys/... which I thought should be converted to sysctl.
> > IMO /proc/sys pretty much sucks, at least for documentation purposes,
> > for multiple reasons:

Oh, one comment there is actually obsolete now AFAIK.  Unless there is
some reason to think personality(ADDR_NO_RANDOMIZE) might not work in
some case where sysctl -w kernel.randomize_va_space=0 will, I think we
can just remove that.
Thomas Munro <thomas.munro@gmail.com> writes:
> Oh, one comment there is actually obsolete now AFAIK.  Unless there is
> some reason to think personality(ADDR_NO_RANDOMIZE) might not work in
> some case where sysctl -w kernel.randomize_va_space=0 will, I think we
> can just remove that.

AFAICS, f3e78069db7 silently does nothing on platforms lacking
ADDR_NO_RANDOMIZE and PROC_ASLR_FORCE_DISABLE.  Are you asserting
there are no such platforms?  (I'm happy to lose the comment if
it's really useless now, but I think we have little evidence of that.)

			regards, tom lane
On Wed, Aug 24, 2022 at 3:06 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Thomas Munro <thomas.munro@gmail.com> writes:
> > Oh, one comment there is actually obsolete now AFAIK.  Unless there is
> > some reason to think personality(ADDR_NO_RANDOMIZE) might not work in
> > some case where sysctl -w kernel.randomize_va_space=0 will, I think we
> > can just remove that.
>
> AFAICS, f3e78069db7 silently does nothing on platforms lacking
> ADDR_NO_RANDOMIZE and PROC_ASLR_FORCE_DISABLE.  Are you asserting
> there are no such platforms?

That's a Linux-only sysctl.  ADDR_NO_RANDOMIZE is also Linux-only.
Both controls are old enough to be in any kernel that anyone's
developing on.

On further reflection, though, I guess the comment is still useful.
ADDR_NO_RANDOMIZE only helps you with clusters launched by pg_ctl and
pg_regress.  A developer trying to run "postgres" directly might still
want to know about the sysctl, so I withdraw that idea.

As for whether there are platforms where it does nothing: definitely.
These are highly OS-specific, and we've only tackled Linux and FreeBSD
(with other solutions for macOS and Windows elsewhere in the tree),
but I doubt it matters: these are just the OSes that have ASLR on by
default, that someone in our community uses as a daily driver to hack
PostgreSQL on, and that has been annoyed enough to look up how to turn
it off :-)