ssl tests fail due to TCP port conflict

Поиск
Список
Период
Сортировка
От Alexander Lakhin
Тема ssl tests fail due to TCP port conflict
Дата
Msg-id d6ee8761-39d1-0033-1afb-d5a57ee056f2@gmail.com
обсуждение исходный текст
Ответы Re: ssl tests fail due to TCP port conflict
Список pgsql-hackers
Hello hackers,

As buildfarm shows, ssl tests are not stable enough when running via meson.
I can see the following failures for the last 90 days:
REL_16_STABLE:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=serinus&dt=2024-03-12%2023%3A15%3A50
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=skink&dt=2024-03-21%2000%3A35%3A23
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=mylodon&dt=2024-03-27%2011%3A15%3A31
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=culicidae&dt=2024-04-16%2016%3A10%3A45

master:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=serinus&dt=2024-03-08%2011%3A19%3A42
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=skink&dt=2024-03-11%2022%3A23%3A28
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=tamandua&dt=2024-03-17%2023%3A03%3A50
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=serinus&dt=2024-03-20%2009%3A21%3A30
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=kestrel&dt=2024-03-20%2016%3A53%3A27
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=culicidae&dt=2024-04-07%2012%3A25%3A03
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=rorqual&dt=2024-04-08%2019%3A50%3A13
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=kestrel&dt=2024-04-19%2021%3A24%3A30
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=mylodon&dt=2024-04-22%2006%3A17%3A13
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=kestrel&dt=2024-04-29%2023%3A27%3A15
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=calliphoridae&dt=2024-04-30%2000%3A24%3A28
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=kestrel&dt=2024-06-04%2011%3A20%3A07

All the failures are caused by the server inability to restart with a
previously chosen random port on TCP/IP. For example:
2024-06-04 11:30:40.227 UTC [3373644][postmaster][:0] LOG:  starting PostgreSQL 17beta1 on x86_64-linux, compiled by 
clang-13.0.1-11, 64-bit
2024-06-04 11:30:40.231 UTC [3373644][postmaster][:0] LOG: listening on Unix socket "/tmp/tUmT8ItNQ2/.s.PGSQL.60362"
2024-06-04 11:30:40.337 UTC [3373798][startup][:0] LOG:  database system was shut down at 2024-06-04 11:21:25 UTC
...
2024-06-04 11:30:45.273 UTC [3376046][postmaster][:0] LOG:  starting PostgreSQL 17beta1 on x86_64-linux, compiled by 
clang-13.0.1-11, 64-bit
2024-06-04 11:30:45.273 UTC [3376046][postmaster][:0] LOG:  could not bind IPv4 address "127.0.0.1": Address already in
use
2024-06-04 11:30:45.273 UTC [3376046][postmaster][:0] HINT:  Is another postmaster already running on port 60362? If 
not, wait a few seconds and retry.
2024-06-04 11:30:45.273 UTC [3376046][postmaster][:0] WARNING: could not create listen socket for "127.0.0.1"

I've managed to reproduce the failure locally with the following change:
--- a/src/test/perl/PostgreSQL/Test/Cluster.pm
+++ b/src/test/perl/PostgreSQL/Test/Cluster.pm
@@ -149,7 +149,7 @@ INIT
         $ENV{PGDATABASE} = 'postgres';

         # Tracking of last port value assigned to accelerate free port lookup.
-       $last_port_assigned = int(rand() * 16384) + 49152;
+       $last_port_assigned = int(rand() * 1024) + 49152;

and multiplying one of the tests.
for i in `seq 50`; do cp .../src/test/ssl/t/001_ssltests.pl \
   .../src/test/ssl/t/001_ssltests_$i.pl; \
   sed -E "s|('t/001_ssltests.pl',)|\1\n't/001_ssltests_$i.pl',|" -i \
     .../src/test/ssl/meson.build; done

Then `meson test --suite ssl` fails for me as below:
...
26/53 postgresql:ssl / ssl/001_ssltests_26         OK 9.03s   205 subtests passed
27/53 postgresql:ssl / ssl/001_ssltests_18         ERROR 3.55s   (exit status 255 or signal 127 SIGinvalid)
 >>> OPENSSL=/usr/bin/openssl ...
28/53 postgresql:ssl / ssl/001_ssltests_25         OK 10.98s   205 subtests passed
29/53 postgresql:ssl / ssl/001_ssltests_24         OK 10.84s   205 subtests passed

testrun/ssl/001_ssltests_18/log/001_ssltests_18_primary.log contains:
2024-06-05 11:51:22.005 UTC [710541] LOG:  could not bind IPv4 address "127.0.0.1": Address already in use
2024-06-05 11:51:22.005 UTC [710541] HINT:  Is another postmaster already running on port 49632? If not, wait a few 
seconds and retry.

`grep '\b49632\b' -r testrun/` finds:
testrun/ssl/001_ssltests_18/log/regress_log_001_ssltests_18:# Checking port 49632
testrun/ssl/001_ssltests_18/log/regress_log_001_ssltests_18:# Found port 49632
testrun/ssl/001_ssltests_18/log/regress_log_001_ssltests_18:Connection string: port=49632 host=/tmp/sp_VLbpjJF
testrun/ssl/001_ssltests_18/log/001_ssltests_18_primary.log:2024-06-05 11:51:18.896 UTC [710082] LOG:  listening on
Unix
 
socket "/tmp/sp_VLbpjJF/.s.PGSQL.49632"
testrun/ssl/001_ssltests_18/log/001_ssltests_18_primary.log:2024-06-05 11:51:22.005 UTC [710541] HINT:  Is another 
postmaster already running on port 49632? If not, wait a few seconds and retry.
...
testrun/ssl/001_ssltests_23/log/regress_log_001_ssltests_23:# Checking port 49632
testrun/ssl/001_ssltests_23/log/regress_log_001_ssltests_23:# Found port 49632
testrun/ssl/001_ssltests_23/log/regress_log_001_ssltests_23:Connection string: port=49632 host=/tmp/3lxVDNzuGC
testrun/ssl/001_ssltests_23/log/001_ssltests_23_primary.log:2024-06-05 11:51:13.337 UTC [708377] LOG:  listening on
Unix
 
socket "/tmp/3lxVDNzuGC/.s.PGSQL.49632"
testrun/ssl/001_ssltests_23/log/001_ssltests_23_primary.log:2024-06-05 11:51:14.333 UTC [708715] LOG:  listening on
IPv4
 
address "127.0.0.1", port 49632
...

Another case (with psql using the port):
testrun/ssl/001_ssltests_47/log/regress_log_001_ssltests_47:# Checking port 49448
testrun/ssl/001_ssltests_47/log/regress_log_001_ssltests_47:# Found port 49448
testrun/ssl/001_ssltests_47/log/001_ssltests_47_primary.log:2024-06-05 12:20:50.178 UTC [976826] LOG:  listening on
Unix
 
socket "/tmp/GePu6gmouP/.s.PGSQL.49448"
testrun/ssl/001_ssltests_47/log/001_ssltests_47_primary.log:2024-06-05 12:20:50.491 UTC [976927] HINT:  Is another 
postmaster already running on port 49448? If not, wait a few seconds and retry.
...
testrun/ssl/001_ssltests_48/log/001_ssltests_48_primary.log:2024-06-05 12:20:50.491 UTC [976943] [unknown] LOG:  
connection received: host=localhost port=49448
The broader excerpt:
2024-06-05 12:20:50.415 UTC [976918] [unknown] LOG:  connection received: host=localhost port=50326
2024-06-05 12:20:50.418 UTC [976918] [unknown] LOG:  could not accept SSL connection: EOF detected
2024-06-05 12:20:50.433 UTC [976920] [unknown] LOG:  connection received: host=localhost port=49420
2024-06-05 12:20:50.435 UTC [976920] [unknown] LOG:  could not accept SSL connection: EOF detected
2024-06-05 12:20:50.447 UTC [976922] [unknown] LOG:  connection received: host=localhost port=49430
2024-06-05 12:20:50.452 UTC [976922] [unknown] LOG:  could not accept SSL connection: tlsv1 alert unknown ca
2024-06-05 12:20:50.466 UTC [976933] [unknown] LOG:  connection received: host=localhost port=49440
2024-06-05 12:20:50.472 UTC [976933] [unknown] LOG:  could not accept SSL connection: tlsv1 alert unknown ca
2024-06-05 12:20:50.491 UTC [976943] [unknown] LOG:  connection received: host=localhost port=49448
2024-06-05 12:20:50.497 UTC [976943] [unknown] LOG:  could not accept SSL connection: tlsv1 alert unknown ca
2024-06-05 12:20:50.513 UTC [976969] [unknown] LOG:  connection received: host=localhost port=49464
2024-06-05 12:20:50.517 UTC [976969] [unknown] LOG:  could not accept SSL connection: tlsv1 alert unknown ca
2024-06-05 12:20:50.532 UTC [976971] [unknown] LOG:  connection received: host=localhost port=49468

Maybe the ssl tests should not be considered failed in case of the TCP port
binding error?

Best regards,
Alexander



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Ranier Vilela
Дата:
Сообщение: Re: Fix use of possible uninitialized variable retval (src/pl/plpgsql/src/pl_handler.c)
Следующее
От: Dilip Kumar
Дата:
Сообщение: Re: Conflict Detection and Resolution