BUG #16678: The ecpg connect/test5 test sometimes fails on Windows

Поиск
Список
Период
Сортировка
От PG Bug reporting form
Тема BUG #16678: The ecpg connect/test5 test sometimes fails on Windows
Дата
Msg-id 16678-253e48d34dc0c376@postgresql.org
обсуждение исходный текст
Ответы Re: BUG #16678: The ecpg connect/test5 test sometimes fails on Windows  (Tom Lane <tgl@sss.pgh.pa.us>)
Список pgsql-bugs
The following bug has been logged on the website:

Bug reference:      16678
Logged by:          Alexander Lakhin
Email address:      exclusion@gmail.com
PostgreSQL version: 13.0
Operating system:   Windows
Description:

When executing `call vcregress ecpgcheck` (under a heavy load), I get
connect/test5 failures sometimes:
test connect/test4                ... ok           67 ms
test connect/test5                ... stderr FAILED      945 ms
test pgtypeslib/dt_test           ... ok          271 ms

regression.diffs contains:
diff -w -U3 .../src/interfaces/ecpg/test/expected/connect-test5.stderr
.../src/interfaces/ecpg/test/results/connect-test5.stderr
--- .../src/interfaces/ecpg/test/expected/connect-test5.stderr    2020-10-13
21:51:14 +0300
+++ .../src/interfaces/ecpg/test/results/connect-test5.stderr    2020-10-19
19:31:41 +0300
@@ -36,7 +36,9 @@
 [NO_PID]: sqlca: code: 0, state: 00000
 [NO_PID]: ECPGconnect: opening database <DEFAULT> on <DEFAULT> port
<DEFAULT>  for user regress_ecpg_user2
 [NO_PID]: sqlca: code: 0, state: 00000
-[NO_PID]: ECPGconnect: could not open database: FATAL:  database
"regress_ecpg_user2" does not exist
+[NO_PID]: ECPGconnect: could not open database: server closed the
connection unexpectedly
+    This probably means the server terminated abnormally
+    before or while processing the request.
 
 [NO_PID]: sqlca: code: 0, state: 00000
 [NO_PID]: ecpg_finish: connection main closed

postmaster.log contains:
...
2020-10-19 19:31:40.150 MSK client backend[928] ecpg/connect-test4 LOG:
could not receive data from client: An existing connection was forcibly
closed by the remote host.
    
2020-10-19 19:31:40.677 MSK client backend[2032] [unknown] FATAL:  database
"regress_ecpg_user2" does not exist
2020-10-19 19:31:41.052 MSK client backend[2532] [unknown] FATAL:  database
"regress_ecpg_user2" does not exist
2020-10-19 19:31:49.834 MSK client backend[2544] ecpg/preproc-whenever
ERROR:  relation "nonexistent" does not exist at character 15
...
So the server didn't crash, it logged the expected message correctly.

The following delay added to pqReadData (src/interfaces/libpq/fe-misc.c):
@@ -662,6 +662,8 @@ pqReadData(PGconn *conn)
                }
        }

+       pg_usleep(10000L); /* wait for 10 msec */
+
        /* OK, try to read some data */
 retry3:
        nread = pqsecure_read(conn, conn->inBuffer + conn->inEnd,

Makes the failure reproduce reliably (on each run) for me.
Also this affects the pg_dump test:
vcregress taptest src/bin/pg_dump
...
t/001_basic.pl ................ ok
t/002_pg_dump.pl .............. 1/6315
#   Failed test 'connecting to a non-existent database: matches'
#   at t/002_pg_dump.pl line 3460.
#                   'pg_dump: error: connection to database "qqq" failed:
server
 closed the connection unexpectedly
#       This probably means the server terminated abnormally
#       before or while processing the request.
# '
#     doesn't match '(?^:pg_dump\:\ error\:\ connection\ to\ database\
\"qqq\"\
failed\:\ FATAL\:\ \ database\ \"qqq\"\ does\ not\ exist)'
t/002_pg_dump.pl .............. 6113/6315 # Looks like you failed 1 test of
6315
.
t/002_pg_dump.pl .............. Dubious, test returned 1 (wstat 256,
0x100)
Failed 1/6315 subtests
t/003_pg_dump_with_server.pl .. ok
...

On Linux the same delay (and even 100 msec) doesn't cause these test
failures.
It seems that on Windows the postmaster process can close a connection
before slow client receives a FATAL message.


В списке pgsql-bugs по дате отправления:

Предыдущее
От: Tom Lane
Дата:
Сообщение: Re: BUG #16673: Stack depth limit exceeded error while running sysbench TPC-C
Следующее
От: Tom Lane
Дата:
Сообщение: Re: Postgres 13 signal 11: Segmentation fault tested on 2 independent machines