Re: Refactoring postmaster's code to cleanup after child exit
From | Heikki Linnakangas |
---|---|
Subject | Re: Refactoring postmaster's code to cleanup after child exit |
Date | |
Msg-id | 217d43af-0287-4769-a825-cde4cfa00e6c@iki.fi |
In reply to | Re: Refactoring postmaster's code to cleanup after child exit (Thomas Munro <thomas.munro@gmail.com>) |
Responses | Re: Refactoring postmaster's code to cleanup after child exit |
List | pgsql-hackers |
On 05/10/2024 01:03, Thomas Munro wrote:
> On Sat, Oct 5, 2024 at 7:41 AM Heikki Linnakangas <hlinnaka@iki.fi> wrote:
>> My test for dead-end backends opens 20 TCP (or Unix domain) connections
>> to the server, in quick succession. That works fine on my system, and it
>> passed Cirrus CI on other platforms, but on FreeBSD it failed
>> repeatedly. The behavior in that scenario is apparently
>> platform-dependent: it depends on the accept queue size, but what
>> happens when you reach the queue size also seems to depend on the
>> platform. On my Linux system, the connect() calls in the client are
>> blocked if the server doesn't call accept() fast enough, but
>> apparently you get an error on *BSD systems.
>
> Right, we've analysed that difference in the AF_UNIX implementation
> before[1]. It shows up in the real world, where client sockets (i.e.
> libpq's) are usually non-blocking, as EAGAIN on Linux (which is not
> valid per POSIX) vs ECONNREFUSED on other OSes. All fail to connect,
> but the error message is different.

Thanks for the pointer!

> For blocking AF_UNIX client sockets like in your test, Linux
> effectively has an infinite queue made from two layers. The listen
> queue (a queue of connecting sockets) does respect the requested
> backlog size, but when it's full it has an extra trick: the connect()
> call waits (in a queue of threads) for space to become free in the
> listen queue, so it's effectively unlimited (but only for blocking
> sockets), while FreeBSD and, I suspect, any other implementation
> deriving from or reimplementing the BSD socket code gives you
> ECONNREFUSED. macOS behaves just the same as FreeBSD AFAICT, so I
> don't know why you didn't see the same thing... I guess it's just
> racing against accept() draining the queue.

In fact, I misremembered: the failure happened on macOS, *not* FreeBSD.
It could be just luck that I didn't see it on FreeBSD, though.
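A minimal sketch of the non-blocking case described above, written in Python rather than the Perl of the actual test. It binds a listener with a tiny backlog, never calls accept(), and tallies how each non-blocking connect() ends. Per the discussion, one would expect EAGAIN on Linux and ECONNREFUSED on FreeBSD and macOS once the queue is full; exact counts depend on how each kernel interprets the backlog, so none are claimed here. The function name `probe_unix_backlog`, the socket path, and the parameter values are illustrative assumptions, not part of the PostgreSQL test suite.

```python
import errno
import os
import socket
import tempfile
from collections import Counter


def probe_unix_backlog(backlog=1, attempts=20):
    """Hypothetical helper: tally the outcome of `attempts` non-blocking
    AF_UNIX connect() calls against a listener that never calls accept().

    Returns None if this platform's socket module lacks AF_UNIX.
    """
    if not hasattr(socket, "AF_UNIX"):
        return None  # e.g. Windows builds without AF_UNIX support

    outcomes = Counter()
    clients = []
    # Short directory under /tmp: sun_path is limited to roughly 104-108 bytes.
    with tempfile.TemporaryDirectory(dir="/tmp") as tmpdir:
        path = os.path.join(tmpdir, "s")
        listener = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        try:
            listener.bind(path)
            listener.listen(backlog)  # accept() is deliberately never called
            for _ in range(attempts):
                client = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
                client.setblocking(False)  # non-blocking, like libpq's sockets
                clients.append(client)
                err = client.connect_ex(path)  # returns 0 or an errno value
                if err == 0:
                    outcomes["connected"] += 1
                elif err == errno.EAGAIN:
                    outcomes["EAGAIN"] += 1  # expected on Linux once full
                elif err == errno.ECONNREFUSED:
                    outcomes["ECONNREFUSED"] += 1  # expected on BSD/macOS
                else:
                    outcomes["other"] += 1
        finally:
            for client in clients:
                client.close()
            listener.close()
    return outcomes


if __name__ == "__main__":
    print(probe_unix_backlog())
```

The clients are kept non-blocking on purpose: according to Thomas's description, a blocking client on Linux would wait inside connect() for room in the listen queue instead of failing, so the probe would simply hang rather than report EAGAIN.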
> It's possible that Windows copied the Linux behaviour for AF_UNIX,
> given that it probably has something to do with the WSL project for
> emulating Linux, but IDK.

Sadly, IO::Socket::UNIX hasn't been implemented on Windows (or at least
not in the Perl distribution we're using in Cirrus CI):

    Socket::pack_sockaddr_un not implemented on this architecture at C:/strawberry/5.26.3.1/perl/lib/Socket.pm line 872.

so I'll have to disable this test on Windows anyway.

-- 
Heikki Linnakangas
Neon (https://neon.tech)