Discussion: Sporadic connection-setup-related test failures on Cygwin in v15-
Hello hackers,

A recent lorikeet (a Cygwin animal) failure [1] revealed one more
long-standing (see also [2], [3], [4]) issue related to Cygwin:

 SELECT dblink_connect('dtest1', connection_parameters());
- dblink_connect
-----------------
- OK
-(1 row)
-
+ERROR: could not establish connection
+DETAIL: could not connect to server: Connection refused

where inst/logfile contains:

2024-07-16 05:38:21.492 EDT [66963f67.7823:4] LOG: could not accept new connection: Software caused connection abort
2024-07-16 05:38:21.492 EDT [66963f8c.79e5:170] pg_regress/dblink ERROR: could not establish connection
2024-07-16 05:38:21.492 EDT [66963f8c.79e5:171] pg_regress/dblink DETAIL: could not connect to server: Connection refused
    Is the server running locally and accepting
    connections on Unix domain socket "/home/andrew/bf/root/tmp/buildfarm-DK1yh4/.s.PGSQL.5838"?

I made a standalone reproducing script (assuming the dblink extension
installed):

numclients=50
for ((i=1;i<=1000;i++)); do
echo "iteration $i"
for ((c=1;c<=numclients;c++)); do
cat << 'EOF' | /usr/local/pgsql/bin/psql >/dev/null 2>&1 &
SELECT 'dbname='|| current_database()||' port='||current_setting('port')
  AS connstr \gset
SELECT * FROM dblink('service=no_service', 'SELECT 1') AS t(i int);
SELECT * FROM
  dblink(:'connstr', 'SELECT 1') AS t1(i int),
  dblink(:'connstr', 'SELECT 2') AS t2(i int),
  dblink(:'connstr', 'SELECT 3') AS t3(i int),
  dblink(:'connstr', 'SELECT 4') AS t4(i int),
  dblink(:'connstr', 'SELECT 5') AS t5(i int);
EOF
done
wait
grep -A1 "Software caused connection abort" server.log && break;
done

which fails for me as below:

iteration 318
2024-07-24 04:19:46.511 PDT [29062:6][postmaster][:0] LOG: could not accept new connection: Software caused connection abort
2024-07-24 04:19:46.512 PDT [25312:8][client backend][36/1996:0] ERROR: could not establish connection

The important fact here is that this failure is not reproduced after
7389aad63 (in v16), so it seems that it's somehow related to signal
processing. Given that, I'm inclined to stop here, without digging deeper,
at least until there are plans to backport that fix or something...

[1] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=lorikeet&dt=2024-07-16%2009%3A18%3A31 (REL_13_STABLE)
[2] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=lorikeet&dt=2022-07-21%2000%3A36%3A44 (REL_14_STABLE)
[3] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=lorikeet&dt=2023-07-06%2009%3A19%3A36 (REL_12_STABLE)
[4] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=lorikeet&dt=2022-02-12%2001%3A40%3A56 (REL_13_STABLE, postgres_fdw)

Best regards,
Alexander
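For anyone re-running the reproducer, the stop condition of the loop above (a grep for the accept failure in server.log) could be factored into a small helper that reports how many times the postmaster hit the error. This is only a minimal sketch, assuming bash and GNU or POSIX grep; the function name count_accept_aborts and its log-path argument are hypothetical and not part of the original script. The message text is copied from the log excerpts in this thread.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the original script): count how many times
# the postmaster logged a failed accept() in the given server log.
count_accept_aborts() {
    local logfile=$1
    # grep -c prints the number of matching lines. It exits non-zero when
    # there are no matches, so "|| true" keeps the helper usable under set -e.
    grep -c "could not accept new connection: Software caused connection abort" \
        "$logfile" || true
}
```

Inside the loop it might be used as `[ "$(count_accept_aborts server.log)" -gt 0 ] && break`, which stops on the first occurrence just as the original grep does, while also making the count available for reporting.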
On Thu, Jul 25, 2024 at 1:00 AM Alexander Lakhin <exclusion@gmail.com> wrote:
> The important fact here is that this failure is not reproduced after
> 7389aad63 (in v16), so it seems that it's somehow related to signal
> processing. Given that, I'm inclined to stop here, without digging deeper,
> at least until there are plans to backport that fix or something...

+1. I'm not planning to back-patch that work. Perhaps lorikeet
could stop testing releases < 16? They don't work and it's not our
bug[1]. We decided not to drop Cygwin support[2], but I don't think
we're learning anything from investigating that noise in the
known-broken branches.

[1] https://sourceware.org/legacy-ml/cygwin/2017-08/msg00048.html
[2] https://www.postgresql.org/message-id/5e6797e9-bc26-ced7-6c9c-59bca415598b%40dunslane.net
24.07.2024 23:58, Thomas Munro wrote:
> +1. I'm not planning to back-patch that work. Perhaps lorikeet
> could stop testing releases < 16? They don't work and it's not our
> bug[1]. We decided not to drop Cygwin support[2], but I don't think
> we're learning anything from investigating that noise in the
> known-broken branches.

Yeah, it looks like lorikeet votes +[1] for your proposal.
(I suppose it failed due to the same signal processing issue, just another
way.)

[1] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=lorikeet&dt=2024-07-24%2008%3A54%3A07

Best regards,
Alexander
On 2024-07-24 We 4:58 PM, Thomas Munro wrote:
> On Thu, Jul 25, 2024 at 1:00 AM Alexander Lakhin <exclusion@gmail.com> wrote:
>> The important fact here is that this failure is not reproduced after
>> 7389aad63 (in v16), so it seems that it's somehow related to signal
>> processing. Given that, I'm inclined to stop here, without digging deeper,
>> at least until there are plans to backport that fix or something...
>
> +1. I'm not planning to back-patch that work. Perhaps lorikeet
> could stop testing releases < 16? They don't work and it's not our
> bug[1]. We decided not to drop Cygwin support[2], but I don't think
> we're learning anything from investigating that noise in the
> known-broken branches.

Sure, it can. I've made that change.
cheers
andrew
-- Andrew Dunstan EDB: https://www.enterprisedb.com
25.07.2024 19:25, Andrew Dunstan wrote:
>> +1. I'm not planning to back-patch that work. Perhaps lorikeet
>> could stop testing releases < 16? They don't work and it's not our
>> bug[1]. We decided not to drop Cygwin support[2], but I don't think
>> we're learning anything from investigating that noise in the
>> known-broken branches.
>
> Sure, it can. I've made that change.

Thank you, Andrew!

I've moved those issues to the "Fixed" category.

Best regards,
Alexander