Discussion: Postgres v15 windows bincheck regression test failures
Hi All:

I upgraded to postgres v15, and I am getting intermittent failures for some of the bin regression tests when building on Windows 10. Example:

perl vcregress.pl bincheck

Installation complete.
t/001_initdb.pl .. ok
All tests successful.
Files=1, Tests=25, 12 wallclock secs ( 0.03 usr + 0.01 sys = 0.05 CPU)
Result: PASS
t/001_basic.pl ........... ok
t/002_nonesuch.pl ........ 1/?
# Failed test 'checking a non-existent database stderr /(?^:FATAL: database "qqq" does not exist)/'
# at t/002_nonesuch.pl line 25.
# 'pg_amcheck: error: connection to server at "127.0.0.1", port 49393 failed: server closed the connection unexpectedly
# This probably means the server terminated abnormally
# before or while processing the request.
# '
# doesn't match '(?^:FATAL: database "qqq" does not exist)'
t/002_nonesuch.pl ........ 97/? # Looks like you failed 1 test of 100.
t/002_nonesuch.pl ........ Dubious, test returned 1 (wstat 256, 0x100)
Failed 1/100 subtests
t/003_check.pl ........... ok
t/004_verify_heapam.pl ... ok
t/005_opclass_damage.pl .. ok

Test Summary Report
-------------------
t/002_nonesuch.pl (Wstat: 256 Tests: 100 Failed: 1)
Failed test: 3
Non-zero exit status: 1
Files=5, Tests=196, 86 wallclock secs ( 0.11 usr + 0.08 sys = 0.19 CPU)
Result: FAIL
...

I see a similar failure on the build farm at:

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=fairywren&dt=2023-06-03%2020%3A03%3A07

I have also received the same error in the pg_dump test as the build server above. Are these errors expected? Are they due to the fact that windows tests use SSPI? It seems to work correctly if I recreate all of the steps with an HBA that does not use SSPI.

thanks,
Russell
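For readers puzzling over "test returned 1 (wstat 256, 0x100)" in the output above, here is a minimal sketch of how such a wait status is usually decoded. It assumes the conventional Perl `$?` layout (exit code in the high byte, signal number in the low seven bits); the function name `decode_wstat` is made up for illustration and is not part of the test harness.

```python
def decode_wstat(wstat):
    """Split a wait status into (exit_code, signal_number).

    Assumes the conventional layout: exit code in bits 8-15,
    terminating signal in bits 0-6.
    """
    exit_code = (wstat >> 8) & 0xFF
    signal_number = wstat & 0x7F
    return exit_code, signal_number


# The value reported for t/002_nonesuch.pl in the post above.
print(decode_wstat(256))  # (1, 0)
```

Under that assumption, 256 means the test script exited with status 1 and was not killed by a signal, which agrees with the "Non-zero exit status: 1" line in the summary.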
On 2023-06-08 Th 13:41, Russell Foster wrote:
> [...]
> Are these errors expected? Are they due to the fact that
> windows tests use SSPI? It seems to work correctly if I recreate all
> of the steps with an HBA that does not use SSPI.
In general you're better off using something like this:

set PG_TEST_USE_UNIX_SOCKETS=1
set PG_REGRESS_SOCK_DIR=%LOCALAPPDATA%\Local\temp

That avoids several sorts of issues.

cheers

andrew
-- Andrew Dunstan EDB: https://www.enterprisedb.com
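The two settings suggested above can also be applied to a single test run rather than to the whole shell session. The following is only a sketch under stated assumptions: the helper names `unix_socket_test_env` and `run_bincheck` and the `tools_dir` parameter are hypothetical, and the socket directory is whatever path you choose. Only the two variable names and the `perl vcregress.pl bincheck` command come from the thread.

```python
import os
import subprocess


def unix_socket_test_env(sock_dir, base_env=None):
    """Return a copy of the environment with the two variables set.

    base_env defaults to the current process environment; the caller's
    mapping is never modified.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["PG_TEST_USE_UNIX_SOCKETS"] = "1"
    env["PG_REGRESS_SOCK_DIR"] = sock_dir
    return env


def run_bincheck(sock_dir, tools_dir="."):
    """Run the bincheck suite with Unix sockets enabled.

    Assumption: tools_dir is the directory that contains vcregress.pl.
    Returns the exit code of the perl process.
    """
    result = subprocess.run(
        ["perl", "vcregress.pl", "bincheck"],
        cwd=tools_dir,
        env=unix_socket_test_env(sock_dir),
        check=False,
    )
    return result.returncode
```

Building the environment in a separate function keeps the override local to the child process, so other builds started from the same shell are not affected.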
On Thu, Jun 8, 2023 at 3:33 PM Andrew Dunstan <andrew@dunslane.net> wrote:
> On 2023-06-08 Th 13:41, Russell Foster wrote:
> > [...]
> > Are these errors expected? Are they due to the fact that
> > windows tests use SSPI? It seems to work correctly if I recreate all
> > of the steps with an HBA that does not use SSPI.
>
> In general you're better off using something like this
>
> set PG_TEST_USE_UNIX_SOCKETS=1
> set PG_REGRESS_SOCK_DIR=%LOCALAPPDATA%\Local\temp
>
> That avoids several sorts of issues.

Thanks for responding! This does indeed work, but again it is no longer using SSPI, nor the sockets that are used in the runtime. Plus there is this scary comment in code:

/*
 * We don't use Unix-domain sockets on Windows by default, even if the
 * build supports them. (See comment at remove_temp() for a reason.)
 * Override at your own risk.
 */

Is there some sort of race condition in the SSPI code that sometimes doesn't gracefully finish/close the connection when the backend decides to exit due to error?
On Tue, Jun 20, 2023 at 07:49:52AM -0400, Russell Foster wrote:
> /*
>  * We don't use Unix-domain sockets on Windows by default, even if the
>  * build supports them. (See comment at remove_temp() for a reason.)
>  * Override at your own risk.
>  */
>
> Is there some sort of race condition in the SSPI code that sometimes
> doesn't gracefully finish/close the connection when the backend
> decides to exit due to error?

No. remove_temp() is part of test driver "pg_regress". Non-test usage is unaffected. Even for test usage, folks have reported no failures from the cause mentioned in the remove_temp() comment.
Hello,

28.07.2023 05:17, Noah Misch wrote:
> On Tue, Jun 20, 2023 at 07:49:52AM -0400, Russell Foster wrote:
>> /*
>>  * We don't use Unix-domain sockets on Windows by default, even if the
>>  * build supports them. (See comment at remove_temp() for a reason.)
>>  * Override at your own risk.
>>  */
>>
>> Is there some sort of race condition in the SSPI code that sometimes
>> doesn't gracefully finish/close the connection when the backend
>> decides to exit due to error?
> No. remove_temp() is part of test driver "pg_regress". Non-test usage is
> unaffected. Even for test usage, folks have reported no failures from the
> cause mentioned in the remove_temp() comment.

It seems to me that it's just another manifestation of bug #16678 ([1]). See also commits 6051857fc and 29992a6a5.

[1] https://www.postgresql.org/message-id/flat/16678-253e48d34dc0c376%40postgresql.org

Best regards,
Alexander
On Fri, Jul 28, 2023 at 07:00:01AM +0300, Alexander Lakhin wrote:
> 28.07.2023 05:17, Noah Misch wrote:
> > [...]
> > No. remove_temp() is part of test driver "pg_regress". Non-test usage is
> > unaffected. Even for test usage, folks have reported no failures from the
> > cause mentioned in the remove_temp() comment.
>
> It seems to me that it's just another manifestation of bug #16678 ([1]).
> See also commits 6051857fc and 29992a6a5.
>
> [1] https://www.postgresql.org/message-id/flat/16678-253e48d34dc0c376%40postgresql.org

That was about a bug that appears when using TCP sockets. The remove_temp() comment involves code that doesn't run when using TCP sockets. I don't think they can be manifestations of the same phenomenon.
28.07.2023 14:42, Noah Misch wrote:
> That was about a bug that appears when using TCP sockets. ...

Yes, and according to the failed test output, TCP sockets were used:

# 'pg_amcheck: error: connection to server at "127.0.0.1", port 49393 failed: server closed the connection unexpectedly
# This probably means the server terminated abnormally
# before or while processing the request.

Best regards,
Alexander
On Fri, Jul 28, 2023 at 04:00:00PM +0300, Alexander Lakhin wrote:
> 28.07.2023 14:42, Noah Misch wrote:
> > That was about a bug that appears when using TCP sockets. ...
>
> Yes, and according to the failed test output, TCP sockets were used:
>
> # 'pg_amcheck: error: connection to server at
> "127.0.0.1", port 49393 failed: server closed the connection
> unexpectedly
> # This probably means the server terminated abnormally
> # before or while processing the request.

I think we were talking about different details. Agreed, bug #16678 probably did cause the failure in the original post. I was saying that bug has no connection to the "scary comment", though.
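To tell the two outcomes discussed in this thread apart when triaging a log, a rough sketch like the following may help. It is not part of the PostgreSQL test suite: the function name `classify_stderr` and the three labels are invented for illustration, and the patterns are taken from the messages quoted in the original post (with flexible whitespace after "FATAL:", since the archived text may have collapsed spaces).

```python
import re

# Message the test expects from the server for a missing database.
EXPECTED = re.compile(r'FATAL:\s+database "qqq" does not exist')
# Symptom seen in the failing run: the client lost the connection
# before it could read the server's error message.
DROPPED = re.compile(r"server closed the connection unexpectedly")


def classify_stderr(stderr):
    """Return 'expected', 'connection-dropped', or 'other'."""
    if EXPECTED.search(stderr):
        return "expected"
    if DROPPED.search(stderr):
        return "connection-dropped"
    return "other"


failing = (
    'pg_amcheck: error: connection to server at "127.0.0.1", port 49393 '
    "failed: server closed the connection unexpectedly\n"
    "This probably means the server terminated abnormally\n"
    "before or while processing the request.\n"
)
print(classify_stderr(failing))
```

A "connection-dropped" result on a TCP connection matches the symptom attributed above to bug #16678; it says nothing about the remove_temp() comment, which concerns Unix-socket test runs.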