Thread: Postgres v15 windows bincheck regression test failures

Postgres v15 windows bincheck regression test failures

From
Russell Foster
Date:
Hi All:

I upgraded to postgres v15, and I am getting intermittent failures for
some of the bin regression tests when building on Windows 10. Example:

perl vcregress.pl bincheck

Installation complete.
t/001_initdb.pl .. ok
All tests successful.
Files=1, Tests=25, 12 wallclock secs ( 0.03 usr +  0.01 sys =  0.05 CPU)
Result: PASS
t/001_basic.pl ........... ok
t/002_nonesuch.pl ........ 1/?
#   Failed test 'checking a non-existent database stderr /(?^:FATAL:
database "qqq" does not exist)/'
#   at t/002_nonesuch.pl line 25.
#                   'pg_amcheck: error: connection to server at
"127.0.0.1", port 49393 failed: server closed the connection
unexpectedly
#       This probably means the server terminated abnormally
#       before or while processing the request.
# '
#     doesn't match '(?^:FATAL:  database "qqq" does not exist)'
t/002_nonesuch.pl ........ 97/? # Looks like you failed 1 test of 100.
t/002_nonesuch.pl ........ Dubious, test returned 1 (wstat 256, 0x100)
Failed 1/100 subtests
t/003_check.pl ........... ok
t/004_verify_heapam.pl ... ok
t/005_opclass_damage.pl .. ok

Test Summary Report
-------------------
t/002_nonesuch.pl      (Wstat: 256 Tests: 100 Failed: 1)
  Failed test:  3
  Non-zero exit status: 1
Files=5, Tests=196, 86 wallclock secs ( 0.11 usr +  0.08 sys =  0.19 CPU)
Result: FAIL
...

I see a similar failure on the build farm at:

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=fairywren&dt=2023-06-03%2020%3A03%3A07

I have also received the same error in the pg_dump test as the build
server above. Are these errors expected? Are they due to the fact that
windows tests use SSPI? It seems to work correctly if I recreate all
of the steps with an HBA that does not use SSPI.

thanks,
Russell
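
The failing assertion in the log above compares pg_amcheck's stderr against a regular expression. A minimal sketch of that comparison, assuming only the pattern and the stderr text shown in the report (the `healthy_stderr` string and the helper name `stderr_matches` are hypothetical, built for illustration), shows why the check fails when the server drops the connection before the FATAL message reaches the client:

```python
import re

# Pattern taken from the failed assertion in t/002_nonesuch.pl
# (note the two spaces after "FATAL:").
EXPECTED = re.compile(r'FATAL:  database "qqq" does not exist')

# stderr as reported in the failing run, with the mail client's
# line wrapping undone.
observed_stderr = (
    'pg_amcheck: error: connection to server at "127.0.0.1", port 49393 failed: '
    "server closed the connection unexpectedly\n"
    "\tThis probably means the server terminated abnormally\n"
    "\tbefore or while processing the request.\n"
)

# Hypothetical stderr for a passing run: same prefix, but the server's
# FATAL message was delivered before the connection closed.
healthy_stderr = (
    'pg_amcheck: error: connection to server at "127.0.0.1", port 49393 failed: '
    'FATAL:  database "qqq" does not exist\n'
)


def stderr_matches(stderr: str) -> bool:
    """Return True when stderr contains the message the test expects."""
    return EXPECTED.search(stderr) is not None
```

In both cases the connection attempt fails, as the test intends; the difference is only whether the client received the server's error text before the socket closed.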



Re: Postgres v15 windows bincheck regression test failures

From
Andrew Dunstan
Date:


On 2023-06-08 Th 13:41, Russell Foster wrote:
> I have also received the same error in the pg_dump test as the build
> server above. Are these errors expected? Are they due to the fact that
> windows tests use SSPI? It seems to work correctly if I recreate all
> of the steps with an HBA that does not use SSPI.


In general you're better off using something like this


set PG_TEST_USE_UNIX_SOCKETS=1
set PG_REGRESS_SOCK_DIR=%LOCALAPPDATA%\Local\temp


That avoids several sorts of issues.


cheers

andrew


--
Andrew Dunstan
EDB: https://www.enterprisedb.com
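
For anyone scripting the test run, the two environment variables suggested above can be set for a single invocation rather than for the whole shell session. This is a rough sketch, not part of the PostgreSQL tooling: the function names, the `sock_dir` and `tools_dir` parameters, and the assumption that `tools_dir` is the directory containing `vcregress.pl` are all illustrative.

```python
import os
import subprocess


def build_test_env(sock_dir, base_env=None):
    """Copy the environment and add the two variables from the message above."""
    env = dict(os.environ if base_env is None else base_env)
    env["PG_TEST_USE_UNIX_SOCKETS"] = "1"
    env["PG_REGRESS_SOCK_DIR"] = sock_dir  # directory chosen by the caller
    return env


def run_bincheck(sock_dir, tools_dir):
    """Run 'perl vcregress.pl bincheck' with the adjusted environment.

    tools_dir is assumed to be the directory that contains vcregress.pl.
    Returns the exit code of the perl process.
    """
    result = subprocess.run(
        ["perl", "vcregress.pl", "bincheck"],
        cwd=tools_dir,
        env=build_test_env(sock_dir),
    )
    return result.returncode
```

Calling `run_bincheck(sock_dir, tools_dir)` leaves the parent shell's environment untouched, which makes it easy to compare runs with and without Unix-domain sockets.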

Re: Postgres v15 windows bincheck regression test failures

From
Russell Foster
Date:
On Thu, Jun 8, 2023 at 3:33 PM Andrew Dunstan <andrew@dunslane.net> wrote:
>
>
> In general you're better off using something like this
>
>
> set PG_TEST_USE_UNIX_SOCKETS=1
> set PG_REGRESS_SOCK_DIR=%LOCALAPPDATA%\Local\temp
>
>
> That avoids several sorts of issues.
>
>
> cheers
>
> andrew
>
Thanks for responding! This does indeed work, but again it is no
longer using SSPI, nor the sockets that are used in the runtime. Plus
there is this scary comment in code:

/*
* We don't use Unix-domain sockets on Windows by default, even if the
* build supports them.  (See comment at remove_temp() for a reason.)
* Override at your own risk.
*/

Is there some sort of race condition in the SSPI code that sometimes
doesn't gracefully finish/close the connection when the backend
decides to exit due to error?




Re: Postgres v15 windows bincheck regression test failures

From
Noah Misch
Date:
On Tue, Jun 20, 2023 at 07:49:52AM -0400, Russell Foster wrote:
> /*
> * We don't use Unix-domain sockets on Windows by default, even if the
> * build supports them.  (See comment at remove_temp() for a reason.)
> * Override at your own risk.
> */
> 
> Is there some sort of race condition in the SSPI code that sometimes
> doesn't gracefully finish/close the connection when the backend
> decides to exit due to error?

No.  remove_temp() is part of test driver "pg_regress".  Non-test usage is
unaffected.  Even for test usage, folks have reported no failures from the
cause mentioned in the remove_temp() comment.



Re: Postgres v15 windows bincheck regression test failures

From
Alexander Lakhin
Date:
Hello,

28.07.2023 05:17, Noah Misch wrote:
> On Tue, Jun 20, 2023 at 07:49:52AM -0400, Russell Foster wrote:
>> /*
>> * We don't use Unix-domain sockets on Windows by default, even if the
>> * build supports them.  (See comment at remove_temp() for a reason.)
>> * Override at your own risk.
>> */
>>
>> Is there some sort of race condition in the SSPI code that sometimes
>> doesn't gracefully finish/close the connection when the backend
>> decides to exit due to error?
> No.  remove_temp() is part of test driver "pg_regress".  Non-test usage is
> unaffected.  Even for test usage, folks have reported no failures from the
> cause mentioned in the remove_temp() comment.

It seems to me that it's just another manifestation of bug #16678 ([1]).
See also commits 6051857fc and 29992a6a5.

[1] https://www.postgresql.org/message-id/flat/16678-253e48d34dc0c376%40postgresql.org

Best regards,
Alexander



Re: Postgres v15 windows bincheck regression test failures

From
Noah Misch
Date:
On Fri, Jul 28, 2023 at 07:00:01AM +0300, Alexander Lakhin wrote:
> 28.07.2023 05:17, Noah Misch wrote:
> >On Tue, Jun 20, 2023 at 07:49:52AM -0400, Russell Foster wrote:
> >>/*
> >>* We don't use Unix-domain sockets on Windows by default, even if the
> >>* build supports them.  (See comment at remove_temp() for a reason.)
> >>* Override at your own risk.
> >>*/
> >>
> >>Is there some sort of race condition in the SSPI code that sometimes
> >>doesn't gracefully finish/close the connection when the backend
> >>decides to exit due to error?
> >No.  remove_temp() is part of test driver "pg_regress".  Non-test usage is
> >unaffected.  Even for test usage, folks have reported no failures from the
> >cause mentioned in the remove_temp() comment.
> 
> It seems to me that it's just another manifestation of bug #16678 ([1]).
> See also commits 6051857fc and 29992a6a5.
> 
> [1] https://www.postgresql.org/message-id/flat/16678-253e48d34dc0c376%40postgresql.org

That was about a bug that appears when using TCP sockets.  The remove_temp()
comment involves code that doesn't run when using TCP sockets.  I don't think
they can be manifestations of the same phenomenon.



Re: Postgres v15 windows bincheck regression test failures

From
Alexander Lakhin
Date:
28.07.2023 14:42, Noah Misch wrote:
> That was about a bug that appears when using TCP sockets. ...

Yes, and according to the failed test output, TCP sockets were used:

#                   'pg_amcheck: error: connection to server at
"127.0.0.1", port 49393 failed: server closed the connection
unexpectedly
#       This probably means the server terminated abnormally
#       before or while processing the request.

Best regards,
Alexander
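
The inference above (host and port in the message imply TCP) can be sketched as a small classifier over libpq error text. The TCP pattern follows the message quoted in this thread; the Unix-socket wording (`connection to server on socket "..." failed`) is an assumption about how libpq phrases that case and does not appear in the thread. The helper name `transport_of` is hypothetical.

```python
import re

# Wording as seen in the failed test output quoted in this thread.
TCP_RE = re.compile(
    r'connection to server at "(?P<host>[^"]+)"(?: \([^)]*\))?, port (?P<port>\d+) failed'
)
# Assumed wording for Unix-domain socket connections (not shown in the thread).
UNIX_RE = re.compile(r'connection to server on socket "(?P<path>[^"]+)" failed')


def transport_of(message: str) -> str:
    """Classify a connection error as 'tcp', 'unix', or 'unknown'."""
    # Collapse whitespace so that mail-wrapped log lines still match.
    text = " ".join(message.split())
    if TCP_RE.search(text):
        return "tcp"
    if UNIX_RE.search(text):
        return "unix"
    return "unknown"
```

Applied to the pg_amcheck message from the original report, this yields "tcp", consistent with the default Windows test setup that does not use Unix-domain sockets.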



Re: Postgres v15 windows bincheck regression test failures

From
Noah Misch
Date:
On Fri, Jul 28, 2023 at 04:00:00PM +0300, Alexander Lakhin wrote:
> 28.07.2023 14:42, Noah Misch wrote:
> >That was about a bug that appears when using TCP sockets. ...
> 
> Yes, and according to the failed test output, TCP sockets were used:
> 
> #                   'pg_amcheck: error: connection to server at
> "127.0.0.1", port 49393 failed: server closed the connection
> unexpectedly
> #       This probably means the server terminated abnormally
> #       before or while processing the request.

I think we were talking about different details.  Agreed, bug #16678 probably
did cause the failure in the original post.  I was saying that bug has no
connection to the "scary comment", though.