Discussion: tests fail on windows with default git settings
Hi,

Git on Windows defaults to core.autocrlf being enabled. Which means that a
normal git clone will convert all line endings in text files.

Unfortunately that causes a few tests to fail, at least:

test_json_parser/001_test_json_parser_incremental
test_json_parser/003_test_semantic
pg_bsd_indent/001_pg_bsd_indent

In the case of test_json_parser the problem is that
test_json_parser_incremental.c assumes one can read statbuf.st_size bytes via
fread() - but that doesn't work if the input has crlf inside. Due to the crlf
conversion we reach EOF before we've read statbuf.st_size bytes, triggering an
error.

I suspect the issue with pg_bsd_indent is similar.

Do we want to support checking out with core.autocrlf? I suspect it might
just take using binary mode in a few more places.

If we do not want to support that, ISTM we ought to raise an error somewhere?
This kind of thing is pretty time consuming to track down, at least for the
windows-noob writing this email.

Greetings,

Andres Freund
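The size mismatch described above can be reproduced outside the test suite. What follows is a minimal sketch in Python, assuming that Python's universal-newline translation (`newline=None`) is an acceptable stand-in for a Windows text-mode `fread()`. The helper name `size_vs_text_read` and the sample JSON are illustrative and are not taken from test_json_parser_incremental.c.

```python
import os
import tempfile


def size_vs_text_read(data: bytes) -> tuple[int, int]:
    """Return (size reported by stat, characters returned by a text-mode read)."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "sample.json")
        with open(path, "wb") as f:  # binary write: bytes reach the disk unchanged
            f.write(data)
        stat_size = os.stat(path).st_size
        # newline=None turns every CRLF pair into a single LF while reading,
        # which is the behaviour being modelled here.
        with open(path, "r", newline=None, encoding="ascii") as f:
            text_len = len(f.read())
    return stat_size, text_len


crlf_sizes = size_vs_text_read(b'{"a": 1,\r\n "b": 2}\r\n')
lf_sizes = size_vs_text_read(b'{"a": 1,\n "b": 2}\n')
print("CRLF checkout:", crlf_sizes)
print("LF checkout:  ", lf_sizes)
```

With CRLF input the text-mode read comes up short by one character per line, so a loop that waits for `st_size` bytes hits EOF first. With LF input the two numbers agree.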
Andres Freund <andres@anarazel.de> writes:
> Do we want to support checking out with core.autocrlf?

-1. It would be a constant source of breakage, and you could never
expect that (for example) making a tarball from such a checkout
would match anyone else's results.

> If we do not want to support that, ISTM we ought to raise an error somewhere?

+1, if we can figure out how.

regards, tom lane
Hi,

On 2024-07-07 01:26:13 -0400, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > Do we want to support checking out with core.autocrlf?
>
> -1. It would be a constant source of breakage, and you could never
> expect that (for example) making a tarball from such a checkout
> would match anyone else's results.

WFM.

> > If we do not want to support that, ISTM we ought to raise an error somewhere?
>
> +1, if we can figure out how.

I can see two paths:

1) we prevent eol conversion, by using the right magic incantation in
   .gitattributes

2) we check that some canary file is correctly encoded, e.g. during meson
   configure (should suffice, this is realistically only a windows issue)

It seems that the only realistic way to achieve 1) is to remove the "text"
attribute from all files. That had me worried for a bit, thinking that might
have a larger blast radius. However, it looks like this is solely used for
line-ending conversion. The man page says:
  "This attribute marks the path as a text file, which enables end-of-line conversion:"

Which sounds like it'd work well - except that it appears to behave oddly when
updating to such a change in an existing repo -

cd /tmp/;
rm -rf pg-eol;
git -c core.eol=crlf -c core.autocrlf=true clone ~/src/postgresql pg-eol;
cd pg-eol;
git config core.eol crlf; git config core.autocrlf true;
stat src/test/modules/test_json_parser/tiny.json -> 6748 bytes

cd ~/src/postgresql
stat src/test/modules/test_json_parser/tiny.json -> 6604 bytes
echo '* -text' >> .gitattributes
git commit -a -m tmp

cd /tmp/pg-eol
git pull
git st
...
nothing to commit, working tree clean
stat src/test/modules/test_json_parser/tiny.json -> 6748 bytes

I.e. the repo still is in CRLF state.

But if I reclone at that point, the line endings are in a sane state.

IIUC this is because line-ending conversion is done only during
checkout/checkin.

There are ways to get git to redo the normalization, but it's somewhat
awkward [1].

OTOH, given that the tests already fail, I assume our windows contributors
already have disabled autocrlf?

Greetings,

Andres Freund

[1] https://docs.github.com/en/get-started/getting-started-with-git/configuring-git-to-handle-line-endings#refreshing-a-repository-after-changing-line-endings
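Path 2) above, the canary check, could look roughly like the following. This is only a sketch in Python under stated assumptions: the function names `has_crlf` and `check_canary`, the choice of canary file and the wording of the error are all hypothetical, and nothing here is an existing meson or PostgreSQL facility.

```python
import os
import tempfile


def has_crlf(path: str) -> bool:
    """True if the file contains a CRLF pair; binary mode so nothing is translated."""
    with open(path, "rb") as f:
        return b"\r\n" in f.read()


def check_canary(path: str) -> None:
    """Fail early, with a hint, if the canary file was checked out with CRLF."""
    if has_crlf(path):
        raise RuntimeError(
            f"{path} has CRLF line endings; clone again with core.autocrlf=false"
        )


# Demonstration on two throwaway files standing in for a canary file.
with tempfile.TemporaryDirectory() as tmp:
    lf_file = os.path.join(tmp, "lf.txt")
    crlf_file = os.path.join(tmp, "crlf.txt")
    with open(lf_file, "wb") as f:
        f.write(b"canary\n")
    with open(crlf_file, "wb") as f:
        f.write(b"canary\r\n")

    lf_passes = not has_crlf(lf_file)
    crlf_detected = has_crlf(crlf_file)
    try:
        check_canary(crlf_file)
        crlf_raises = False
    except RuntimeError:
        crlf_raises = True
```

Reading in binary mode matters here: a text-mode read on Windows would hide exactly the bytes the check is looking for.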
On 2024-07-07 Su 1:26 AM, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
>> Do we want to support checking out with core.autocrlf?
> -1. It would be a constant source of breakage, and you could never
> expect that (for example) making a tarball from such a checkout
> would match anyone else's results.

Yeah, totally agree.

>> If we do not want to support that, ISTM we ought to raise an error somewhere?
> +1, if we can figure out how.

ISTM the right fix is probably to use PG_BINARY_R mode instead of "r" when
opening the files, at least in the case of the test_json_parser tests.

cheers

andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com
On 2024-07-07 06:30:57 -0400, Andrew Dunstan wrote:
> On 2024-07-07 Su 1:26 AM, Tom Lane wrote:
> > Andres Freund <andres@anarazel.de> writes:
> > > Do we want to support checking out with core.autocrlf?
> > -1. It would be a constant source of breakage, and you could never
> > expect that (for example) making a tarball from such a checkout
> > would match anyone else's results.
> Yeah, totally agree.
>
> > > If we do not want to support that, ISTM we ought to raise an error somewhere?
> > +1, if we can figure out how.
>
> ISTM the right fix is probably to use PG_BINARY_R mode instead of "r" when
> opening the files, at least in the case of the test_json_parser tests.

That does seem like it'd fix this issue, assuming the parser can cope with
\r\n.

I'm actually mildly surprised that the tests don't fail when *not* using
autocrlf, because afaict test_json_parser_incremental.c doesn't set stdout to
binary and thus we presumably end up with \r\n in the output? Except that that
can't be true, because the test does pass on repos without autocrlf...

That approach does seem to mildly conflict with Tom and your preference for
fixing this by disallowing core.autocrlf? If we do so, the test never ought to
see a crlf?

Greetings,

Andres Freund
Andres Freund <andres@anarazel.de> writes:
> On 2024-07-07 06:30:57 -0400, Andrew Dunstan wrote:
>> ISTM the right fix is probably to use PG_BINARY_R mode instead of "r" when
>> opening the files, at least in the case of the test_json_parser tests.

> That approach does seem to mildly conflict with Tom and your preference for
> fixing this by disallowing core.autocrlf? If we do so, the test never ought to
> see a crlf?

Is this code that will *never* be applied to user-supplied files?
We certainly should tolerate \r\n in the general case (we even
have written-down project policy about that!). While I wouldn't
complain too hard about assuming that our own test files don't
contain \r\n, if the code might get copied into a non-test
scenario then it could create problems later.

regards, tom lane
On 2024-07-08 Mo 4:16 PM, Andres Freund wrote:
> On 2024-07-07 06:30:57 -0400, Andrew Dunstan wrote:
>> On 2024-07-07 Su 1:26 AM, Tom Lane wrote:
>>> Andres Freund <andres@anarazel.de> writes:
>>>> Do we want to support checking out with core.autocrlf?
>>> -1. It would be a constant source of breakage, and you could never
>>> expect that (for example) making a tarball from such a checkout
>>> would match anyone else's results.
>> Yeah, totally agree.
>>
>>>> If we do not want to support that, ISTM we ought to raise an error somewhere?
>>> +1, if we can figure out how.
>>
>> ISTM the right fix is probably to use PG_BINARY_R mode instead of "r" when
>> opening the files, at least in the case of the test_json_parser tests.
> That does seem like it'd fix this issue, assuming the parser can cope with
> \r\n.

Yes, the parser can handle \r\n. Note that they can only be white space in
JSON - they can only be present in string values via escapes.

> I'm actually mildly surprised that the tests don't fail when *not* using
> autocrlf, because afaict test_json_parser_incremental.c doesn't set stdout to
> binary and thus we presumably end up with \r\n in the output? Except that that
> can't be true, because the test does pass on repos without autocrlf...
>
> That approach does seem to mildly conflict with Tom and your preference for
> fixing this by disallowing core.autocrlf? If we do so, the test never ought to
> see a crlf?

IDK. I normally use core.autocrlf=false core.eol=lf on Windows. The editors
I use are reasonably well behaved ;-)

What I suggest (see attached) is we run the diff command with
--strip-trailing-cr on Windows. Then we just won't care if the expected file
and/or the output file has CRs.

Not sure what the issue is with pg_bsd_indent, though.

cheers

andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com
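To make the effect of the proposed flag concrete, here is a small Python model of a comparison that drops a trailing CR from each line before comparing. It is a sketch of the idea only, not of how diff implements `--strip-trailing-cr`, and the helper names are made up for illustration.

```python
def strip_trailing_cr(data: bytes) -> list[bytes]:
    """Split on LF and drop a single CR from the end of each line, if present."""
    return [line[:-1] if line.endswith(b"\r") else line for line in data.split(b"\n")]


def same_ignoring_cr(expected: bytes, actual: bytes) -> bool:
    """Compare two outputs line by line, treating CRLF and LF as the same ending."""
    return strip_trailing_cr(expected) == strip_trailing_cr(actual)


# Expected file checked out with CRLF, test output written with LF: no difference.
crlf_vs_lf = same_ignoring_cr(b'{"a": 1}\r\n[]\r\n', b'{"a": 1}\n[]\n')

# Any other difference, even a single trailing space, is still reported.
trailing_space = same_ignoring_cr(b'{"a": 1} \n', b'{"a": 1}\n')

print(crlf_vs_lf, trailing_space)
```

Only the line terminator is forgiven, so a real content difference between expected and actual output still fails the test.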
Hi,

On 2024-07-08 16:56:10 -0400, Andrew Dunstan wrote:
> On 2024-07-08 Mo 4:16 PM, Andres Freund wrote:
> > I'm actually mildly surprised that the tests don't fail when *not* using
> > autocrlf, because afaict test_json_parser_incremental.c doesn't set stdout to
> > binary and thus we presumably end up with \r\n in the output? Except that that
> > can't be true, because the test does pass on repos without autocrlf...
> >
> > That approach does seem to mildly conflict with Tom and your preference for
> > fixing this by disallowing core.autocrlf? If we do so, the test never ought to
> > see a crlf?
>
> IDK. I normally use core.autocrlf=false core.eol=lf on Windows. The editors
> I use are reasonably well behaved ;-)

:)

> What I suggest (see attached) is we run the diff command with
> --strip-trailing-cr on Windows. Then we just won't care if the expected file
> and/or the output file has CRs.

I was wondering about that too, but I wasn't sure we can rely on that flag
being supported...

> Not sure what the issue is with pg_bsd_indent, though.

I think it's purely that we *read* with fopen("r") and write with
fopen("wb"). Which means that any \r\n in the input will be converted to \n
in the output. That's not a problem if the repo has been cloned without
autocrlf, as there are no crlf in the expected files, but if autocrlf has been
used, the expected files don't match.

It doesn't look like it'd be trivial to make indent remember what was used in
the input. So I think for now the best path is to just use .gitattributes to
exclude the expected files from crlf conversion. If we don't want to do so
repo wide, we can do so just for these files.

Greetings,

Andres Freund
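The read-with-"r", write-with-"wb" asymmetry can be modelled in a few lines of Python. This sketch assumes a tool that leaves the text itself untouched, so that only the line endings change. The functions are stand-ins for the Windows C runtime behaviour described above and do not reflect pg_bsd_indent's actual code.

```python
def text_mode_read(raw: bytes) -> str:
    """Model of a Windows fopen("r") read: each CRLF pair arrives as one LF."""
    return raw.decode("ascii").replace("\r\n", "\n")


def binary_mode_write(text: str) -> bytes:
    """Model of fopen("wb"): bytes are written as given, LF is not expanded."""
    return text.encode("ascii")


def round_trip(raw: bytes) -> bytes:
    """Read in text mode, write in binary mode, with no other processing."""
    return binary_mode_write(text_mode_read(raw))


lf_checkout = b"int\nmain(void)\n"
crlf_checkout = b"int\r\nmain(void)\r\n"

# Without autocrlf the output matches an expected file stored with LF.
lf_matches = round_trip(lf_checkout) == lf_checkout

# With autocrlf the expected file has CRLF but the output comes back with LF.
crlf_matches = round_trip(crlf_checkout) == crlf_checkout

print(lf_matches, crlf_matches)
```

The second comparison fails even though the tool did nothing wrong, which matches the explanation that the expected files no longer line up once autocrlf has rewritten them.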
On 2024-07-08 Mo 5:44 PM, Andres Freund wrote:
>> What I suggest (see attached) is we run the diff command with
>> --strip-trailing-cr on Windows. Then we just won't care if the expected file
>> and/or the output file has CRs.
> I was wondering about that too, but I wasn't sure we can rely on that flag
> being supported...

Well, my suggestion was to use it only on Windows. I'm using the diffutils
from chocolatey, which has it, as does Msys2 diff. Not sure what you have in
the CI setup.

>> Not sure what the issue is with pg_bsd_indent, though.
> I think it's purely that we *read* with fopen("r") and write with
> fopen("wb"). Which means that any \r\n in the input will be converted to \n
> in the output. That's not a problem if the repo has been cloned without
> autocrlf, as there are no crlf in the expected files, but if autocrlf has been
> used, the expected files don't match.
>
> It doesn't look like it'd be trivial to make indent remember what was used in
> the input. So I think for now the best path is to just use .gitattributes to
> exclude the expected files from crlf conversion. If we don't want to do so
> repo wide, we can do so just for these files.

either that or we could use the --strip-trailing-cr gadget here too.
cheers
andrew
-- Andrew Dunstan EDB: https://www.enterprisedb.com
On 2024-07-09 Tu 9:52 AM, Dave Page wrote:
>>> What I suggest (see attached) is we run the diff command with
>>> --strip-trailing-cr on Windows. Then we just won't care if the expected file
>>> and/or the output file has CRs.
>> I was wondering about that too, but I wasn't sure we can rely on that flag
>> being supported...
> I have 4 different diff.exe's on my ~6 week old build VM (not counting
> shims), all of which seem to support --strip-trailing-cr. Those builds came
> with:
>
> - git
> - VC++
> - diffutils (installed by chocolatey)
> - vcpkg
>
> I think it's reasonable to assume it'll be supported.

Ok, cool. So I propose to patch the test_json_parser and pg_bsd_indent tests
to use it on Windows, later today unless there's some objection.
cheers
andrew
-- Andrew Dunstan EDB: https://www.enterprisedb.com
Hi,

On 2024-07-09 06:26:12 -0400, Andrew Dunstan wrote:
> On 2024-07-08 Mo 5:44 PM, Andres Freund wrote:
> > > What I suggest (see attached) is we run the diff command with
> > > --strip-trailing-cr on Windows. Then we just won't care if the expected file
> > > and/or the output file has CRs.
> > I was wondering about that too, but I wasn't sure we can rely on that flag
> > being supported...
>
> Well, my suggestion was to use it only on Windows. I'm using the diffutils
> from chocolatey, which has it, as does Msys2 diff. Not sure what you have in
> the CI setup.

IIRC it's git's, which in turn is based on msys/mingw.

Greetings,

Andres Freund
Hi,
On 2024-07-09 14:52:39 +0100, Dave Page wrote:
> I have 4 different diff.exe's on my ~6 week old build VM (not counting
> shims), all of which seem to support --strip-trailing-cr. Those builds came
> with:
>
> - git
> - VC++
> - diffutils (installed by chocolatey)
> - vcpkg
>
> I think it's reasonable to assume it'll be supported.
I think the more likely issue would be an older setup with an older diff,
people on windows seem to not want to touch a working setup ever :). But we
can deal with that if reports about it come in.
> > > Not sure what the issue is with pg_bsd_indent, though.
> >
>
> Yeah - that's odd, as that test always passes for me, with or without
> autocrlf.
Huh.
> The other failures I see are the following, which I'm just starting to dig
> into:
>
> 26/298 postgresql:recovery / recovery/019_replslot_limit
> ERROR 43.05s exit status 2
> 44/298 postgresql:recovery / recovery/027_stream_regress
> ERROR 383.08s exit status 1
> 50/298 postgresql:recovery / recovery/035_standby_logical_decoding
> ERROR 138.06s exit status 25
> 68/298 postgresql:recovery / recovery/040_standby_failover_slots_sync
> ERROR 132.87s exit status 25
> 170/298 postgresql:pg_dump / pg_dump/002_pg_dump
> ERROR 93.45s exit status 2
> 233/298 postgresql:bloom / bloom/001_wal
> ERROR 54.47s exit status 2
> 236/298 postgresql:subscription / subscription/001_rep_changes
> ERROR 46.46s exit status 2
> 246/298 postgresql:subscription / subscription/010_truncate
> ERROR 47.69s exit status 2
> 253/298 postgresql:subscription / subscription/013_partition
> ERROR 125.63s exit status 25
> 255/298 postgresql:subscription / subscription/022_twophase_cascade
> ERROR 58.13s exit status 2
> 257/298 postgresql:subscription / subscription/015_stream
> ERROR 128.32s exit status 2
> 262/298 postgresql:subscription / subscription/028_row_filter
> ERROR 43.14s exit status 2
> 263/298 postgresql:subscription / subscription/027_nosuperuser
> ERROR 102.02s exit status 2
> 269/298 postgresql:subscription / subscription/031_column_list
> ERROR 123.16s exit status 2
> 271/298 postgresql:subscription / subscription/032_subscribe_use_index
> ERROR 139.33s exit status 2
Hm, it'd be good to see some of the errors behind that ([1]).
I suspect it might be related to conflicting ports. I had to use
PG_TEST_USE_UNIX_SOCKETS to keep random tests from failing:
# use unix socket to prevent port conflicts
$env:PG_TEST_USE_UNIX_SOCKETS = 1;
# otherwise pg_regress insists on creating the directory and does it
# in a non-existing place, this needs to be fixed :(
mkdir d:/sockets
$env:PG_REGRESS_SOCK_DIR = "d:/sockets/"
FWIW, building a tree with the patches I sent to the list last night and
changes to make postgresql-dev.yml use a git checkout, I get:
https://github.com/anarazel/winpgbuild/actions/runs/9852370209/job/27200784987#step:12:469
Ok: 281
Expected Fail: 0
Fail: 0
Unexpected Pass: 0
Skipped: 17
Timeout: 0
This is without readline and pltcl, as neither is currently built as part of
winpgbuild. Otherwise it has all applicable dependencies enabled (no bonjour,
bsd_auth, dtrace, llvm, pam, selinux, systemd, but that's afaict expected).
Greetings,
Andres Freund
[1] I plan to submit a PR that'll collect the necessary information
On Tue, 9 Jul 2024 at 17:32, Andres Freund <andres@anarazel.de> wrote:
Hi,
On 2024-07-09 14:52:39 +0100, Dave Page wrote:
> I have 4 different diff.exe's on my ~6 week old build VM (not counting
> shims), all of which seem to support --strip-trailing-cr. Those builds came
> with:
>
> - git
> - VC++
> - diffutils (installed by chocolatey)
> - vcpkg
>
> I think it's reasonable to assume it'll be supported.
I think the more likely issue would be an older setup with an older diff,
people on windows seem to not want to touch a working setup ever :). But we
can deal with that if reports about it come in.

They've got to move to meson/ninja anyway, so... <shrug>.

> > > Not sure what the issue is with pg_bsd_indent, though.
> >
>
> Yeah - that's odd, as that test always passes for me, with or without
> autocrlf.
Huh.
> The other failures I see are the following, which I'm just starting to dig
> into:
>
> 26/298 postgresql:recovery / recovery/019_replslot_limit
> ERROR 43.05s exit status 2
> 44/298 postgresql:recovery / recovery/027_stream_regress
> ERROR 383.08s exit status 1
> 50/298 postgresql:recovery / recovery/035_standby_logical_decoding
> ERROR 138.06s exit status 25
> 68/298 postgresql:recovery / recovery/040_standby_failover_slots_sync
> ERROR 132.87s exit status 25
> 170/298 postgresql:pg_dump / pg_dump/002_pg_dump
> ERROR 93.45s exit status 2
> 233/298 postgresql:bloom / bloom/001_wal
> ERROR 54.47s exit status 2
> 236/298 postgresql:subscription / subscription/001_rep_changes
> ERROR 46.46s exit status 2
> 246/298 postgresql:subscription / subscription/010_truncate
> ERROR 47.69s exit status 2
> 253/298 postgresql:subscription / subscription/013_partition
> ERROR 125.63s exit status 25
> 255/298 postgresql:subscription / subscription/022_twophase_cascade
> ERROR 58.13s exit status 2
> 257/298 postgresql:subscription / subscription/015_stream
> ERROR 128.32s exit status 2
> 262/298 postgresql:subscription / subscription/028_row_filter
> ERROR 43.14s exit status 2
> 263/298 postgresql:subscription / subscription/027_nosuperuser
> ERROR 102.02s exit status 2
> 269/298 postgresql:subscription / subscription/031_column_list
> ERROR 123.16s exit status 2
> 271/298 postgresql:subscription / subscription/032_subscribe_use_index
> ERROR 139.33s exit status 2
Hm, it'd be good to see some of errors behind that ([1]).
I suspect it might be related to conflicting ports. I had to use
PG_TEST_USE_UNIX_SOCKETS to avoid random tests from failing:
# use unix socket to prevent port conflicts
$env:PG_TEST_USE_UNIX_SOCKETS = 1;
# otherwise pg_regress insists on creating the directory and does it
# in a non-existing place, this needs to be fixed :(
mkdir d:/sockets
$env:PG_REGRESS_SOCK_DIR = "d:/sockets/"

No, it all seems to be fallout from GSSAPI being included in the build. If I
build without that, everything passes. Most of the tests are failing with a
"too many clients already" error, but a handful do seem to include GSSAPI auth
related errors as well. For example, this is from

connection error: 'psql: error: connection to server at "127.0.0.1", port 58059 failed: could not initiate GSSAPI security context: No credentials were supplied, or the credentials were unavailable or inaccessible: Credential cache is empty
connection to server at "127.0.0.1", port 58059 failed: FATAL: sorry, too many clients already'
while running 'psql -XAtq -d port=58059 host=127.0.0.1 dbname='postgres' -f - -v ON_ERROR_STOP=1' at C:/Users/dpage/git/postgresql/src/test/perl/PostgreSQL/Test/Cluster.pm line 2129.
# Postmaster PID for node "publisher" is 14488
### Stopping node "publisher" using mode immediate
# Running: pg_ctl -D C:\Users\dpage\git\postgresql\build/testrun/subscription/001_rep_changes\data/t_001_rep_changes_publisher_data/pgdata -m immediate stop
waiting for server to shut down.... done
server stopped
# No postmaster PID for node "publisher"
# Postmaster PID for node "subscriber" is 15012
### Stopping node "subscriber" using mode immediate
# Running: pg_ctl -D C:\Users\dpage\git\postgresql\build/testrun/subscription/001_rep_changes\data/t_001_rep_changes_subscriber_data/pgdata -m immediate stop
waiting for server to shut down.... done
server stopped
# No postmaster PID for node "subscriber"
[14:46:59.068](1.346s) # Tests were run but no plan was declared and done_testing() was not seen.
[14:46:59.069](0.000s) # Looks like your test exited with 2 just after 11.
FWIW, building a tree with the patches I sent to the list last night and
changes to make postgresql-dev.yml use a git checkout, I get:
https://github.com/anarazel/winpgbuild/actions/runs/9852370209/job/27200784987#step:12:469
Ok: 281
Expected Fail: 0
Fail: 0
Unexpected Pass: 0
Skipped: 17
Timeout: 0
This is without readline and pltcl, as neither is currently built as part of
winpgbuild. Otherwise it has all applicable dependencies enabled (no bonjour,
bsd_auth, dtrace, llvm, pam, selinux, systemd, but that's afaict expected).
Greetings,
Andres Freund
[1] I plan to submit a PR that'll collect the necessary information

--
Dave Page
pgAdmin: https://www.pgadmin.org
On 2024-07-09 Tu 9:52 AM, Dave Page wrote:
>>> What I suggest (see attached) is we run the diff command with
>>> --strip-trailing-cr on Windows. Then we just won't care if the expected file
>>> and/or the output file has CRs.
>> I was wondering about that too, but I wasn't sure we can rely on that flag
>> being supported...
> I have 4 different diff.exe's on my ~6 week old build VM (not counting
> shims), all of which seem to support --strip-trailing-cr. Those builds came
> with:
>
> - git
> - VC++
> - diffutils (installed by chocolatey)
> - vcpkg
>
> I think it's reasonable to assume it'll be supported.

Ok, cool. So I propose to patch the test_json_parser and pg_bsd_indent tests
to use it on Windows, later today unless there's some objection.

As I was looking at this I wondered if there might be anywhere else that
needed adjustment. One thing that occurred to me was that maybe we should
replace the use of "-w" in pg_regress.c with this rather less dangerous flag,
so instead of ignoring any white space difference we would only ignore line
end differences. The use of "-w" apparently dates back to 2009.
Thoughts?
cheers
andrew
-- Andrew Dunstan EDB: https://www.enterprisedb.com
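The difference between the two flags can be illustrated with a pair of rough Python models. These are sketches of the comparison rules as described in this thread ("ignore any white space difference" versus "only ignore line end differences"), not of diff's implementation, and the function names and sample output are invented for the example.

```python
def lines_ignoring_all_whitespace(text: str) -> list[str]:
    """Rough model of diff -w: whitespace inside a line never counts."""
    return ["".join(line.split()) for line in text.split("\n")]


def lines_ignoring_trailing_cr(text: str) -> list[str]:
    """Rough model of --strip-trailing-cr: only a CR before the newline is dropped."""
    return [line[:-1] if line.endswith("\r") else line for line in text.split("\n")]


expected = " a | b \n---+---\n 1 | 2 \n"
crlf_output = expected.replace("\n", "\r\n")   # same content, CRLF endings
reflowed_output = "a|b\n---+---\n1|2\n"        # column padding lost: a real change

w_accepts_crlf = (lines_ignoring_all_whitespace(expected)
                  == lines_ignoring_all_whitespace(crlf_output))
w_accepts_reflow = (lines_ignoring_all_whitespace(expected)
                    == lines_ignoring_all_whitespace(reflowed_output))
cr_accepts_crlf = (lines_ignoring_trailing_cr(expected)
                   == lines_ignoring_trailing_cr(crlf_output))
cr_accepts_reflow = (lines_ignoring_trailing_cr(expected)
                     == lines_ignoring_trailing_cr(reflowed_output))

print(w_accepts_crlf, w_accepts_reflow, cr_accepts_crlf, cr_accepts_reflow)
```

Both rules tolerate the CRLF checkout, but only the whitespace-insensitive one also lets the changed column padding through, which is the sense in which it is the more dangerous of the two.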
Dave Page <dpage@pgadmin.org> writes:
> On Wed, 10 Jul 2024 at 12:12, Andrew Dunstan <andrew@dunslane.net> wrote:
>> As I was looking at this I wondered if there might be anywhere else that
>> needed adjustment. One thing that occurred to me was that maybe we
>> should replace the use of "-w" in pg_regress.c with this rather less
>> dangerous flag, so instead of ignoring any white space difference we would
>> only ignore line end differences. The use of "-w" apparently dates back to
>> 2009.

> That seems like a good improvement to me.

+1

regards, tom lane
On 2024-07-10 We 9:25 AM, Tom Lane wrote:
> Dave Page <dpage@pgadmin.org> writes:
>> On Wed, 10 Jul 2024 at 12:12, Andrew Dunstan <andrew@dunslane.net> wrote:
>>> As I was looking at this I wondered if there might be anywhere else that
>>> needed adjustment. One thing that occurred to me was that maybe we
>>> should replace the use of "-w" in pg_regress.c with this rather less
>>> dangerous flag, so instead of ignoring any white space difference we would
>>> only ignore line end differences. The use of "-w" apparently dates back to
>>> 2009.
>> That seems like a good improvement to me.
> +1

OK, done.

cheers

andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com
Hi, On Wed, 10 Jul 2024 at 17:04, Andrew Dunstan <andrew@dunslane.net> wrote: > > > On 2024-07-10 We 9:25 AM, Tom Lane wrote: > > Dave Page <dpage@pgadmin.org> writes: > >> On Wed, 10 Jul 2024 at 12:12, Andrew Dunstan <andrew@dunslane.net> wrote: > >>> As I was looking at this I wondered if there might be anywhere else that > >>> needed adjustment. One thing that occurred to me was that that maybe we > >>> should replace the use of "-w" in pg_regress.c with this rather less > >>> dangerous flag, so instead of ignoring any white space difference we would > >>> only ignore line end differences. The use of "-w" apparently dates back to > >>> 2009. > >> That seems like a good improvement to me. > > +1 > > > > > > > OK, done. It looks like Postgres CI did not like this change. 'Windows - Server 2019, VS 2019 - Meson & ninja' [1] task started to fail after this commit, there is one extra space at the end of line in regress test's output. [1] https://cirrus-ci.com/task/6753781205958656 -- Regards, Nazir Bilal Yavuz Microsoft
On 2024-07-11 Th 4:59 AM, Nazir Bilal Yavuz wrote: > Hi, > > On Wed, 10 Jul 2024 at 17:04, Andrew Dunstan <andrew@dunslane.net> wrote: >> >> On 2024-07-10 We 9:25 AM, Tom Lane wrote: >>> Dave Page <dpage@pgadmin.org> writes: >>>> On Wed, 10 Jul 2024 at 12:12, Andrew Dunstan <andrew@dunslane.net> wrote: >>>>> As I was looking at this I wondered if there might be anywhere else that >>>>> needed adjustment. One thing that occurred to me was that that maybe we >>>>> should replace the use of "-w" in pg_regress.c with this rather less >>>>> dangerous flag, so instead of ignoring any white space difference we would >>>>> only ignore line end differences. The use of "-w" apparently dates back to >>>>> 2009. >>>> That seems like a good improvement to me. >>> +1 >>> >>> >> >> OK, done. > It looks like Postgres CI did not like this change. 'Windows - Server > 2019, VS 2019 - Meson & ninja' [1] task started to fail after this > commit, there is one extra space at the end of line in regress test's > output. > > [1] https://cirrus-ci.com/task/6753781205958656 > Oh, that's annoying. Will investigate. Thanks for the heads up. cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com
On 2024-07-11 Th 7:29 AM, Andrew Dunstan wrote: > > On 2024-07-11 Th 4:59 AM, Nazir Bilal Yavuz wrote: >> Hi, >> >> On Wed, 10 Jul 2024 at 17:04, Andrew Dunstan <andrew@dunslane.net> >> wrote: >>> >>> On 2024-07-10 We 9:25 AM, Tom Lane wrote: >>>> Dave Page <dpage@pgadmin.org> writes: >>>>> On Wed, 10 Jul 2024 at 12:12, Andrew Dunstan <andrew@dunslane.net> >>>>> wrote: >>>>>> As I was looking at this I wondered if there might be anywhere >>>>>> else that >>>>>> needed adjustment. One thing that occurred to me was that that >>>>>> maybe we >>>>>> should replace the use of "-w" in pg_regress.c with this rather less >>>>>> dangerous flag, so instead of ignoring any white space difference >>>>>> we would >>>>>> only ignore line end differences. The use of "-w" apparently >>>>>> dates back to >>>>>> 2009. >>>>> That seems like a good improvement to me. >>>> +1 >>>> >>>> >>> >>> OK, done. >> It looks like Postgres CI did not like this change. 'Windows - Server >> 2019, VS 2019 - Meson & ninja' [1] task started to fail after this >> commit, there is one extra space at the end of line in regress test's >> output. >> >> [1] https://cirrus-ci.com/task/6753781205958656 >> > > Oh, that's annoying. Will investigate. Thanks for the heads up. > > > I have reverted the pg_regress.c portion of the patch. I will investigate non line-end differences on Windows further. cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com
> The other failures I see are the following, which I'm just starting to dig
> into:
>
> 26/298 postgresql:recovery / recovery/019_replslot_limit
> ERROR 43.05s exit status 2
> 44/298 postgresql:recovery / recovery/027_stream_regress
> ERROR 383.08s exit status 1
> 50/298 postgresql:recovery / recovery/035_standby_logical_decoding
> ERROR 138.06s exit status 25
> 68/298 postgresql:recovery / recovery/040_standby_failover_slots_sync
> ERROR 132.87s exit status 25
> 170/298 postgresql:pg_dump / pg_dump/002_pg_dump
> ERROR 93.45s exit status 2
> 233/298 postgresql:bloom / bloom/001_wal
> ERROR 54.47s exit status 2
> 236/298 postgresql:subscription / subscription/001_rep_changes
> ERROR 46.46s exit status 2
> 246/298 postgresql:subscription / subscription/010_truncate
> ERROR 47.69s exit status 2
> 253/298 postgresql:subscription / subscription/013_partition
> ERROR 125.63s exit status 25
> 255/298 postgresql:subscription / subscription/022_twophase_cascade
> ERROR 58.13s exit status 2
> 257/298 postgresql:subscription / subscription/015_stream
> ERROR 128.32s exit status 2
> 262/298 postgresql:subscription / subscription/028_row_filter
> ERROR 43.14s exit status 2
> 263/298 postgresql:subscription / subscription/027_nosuperuser
> ERROR 102.02s exit status 2
> 269/298 postgresql:subscription / subscription/031_column_list
> ERROR 123.16s exit status 2
> 271/298 postgresql:subscription / subscription/032_subscribe_use_index
> ERROR 139.33s exit status 2
Hm, it'd be good to see some of the errors behind that ([1]).
I suspect it might be related to conflicting ports. I had to use
PG_TEST_USE_UNIX_SOCKETS to avoid random tests from failing:
# use unix socket to prevent port conflicts
$env:PG_TEST_USE_UNIX_SOCKETS = 1;
# otherwise pg_regress insists on creating the directory and does it
# in a non-existing place, this needs to be fixed :(
mkdir d:/sockets
$env:PG_REGRESS_SOCK_DIR = "d:/sockets/"

No, it all seems to be fallout from GSSAPI being included in the build. If I build without that, everything passes. Most of the tests are failing with a "too many clients already" error, but a handful do seem to include GSSAPI auth related errors as well. For example, this is from subscription/001_rep_changes:

[14:46:57.723](2.318s) ok 11 - check rows on subscriber after table drop from publication
connection error: 'psql: error: connection to server at "127.0.0.1", port 58059 failed: could not initiate GSSAPI security context: No credentials were supplied, or the credentials were unavailable or inaccessible: Credential cache is empty
connection to server at "127.0.0.1", port 58059 failed: FATAL: sorry, too many clients already'
while running 'psql -XAtq -d port=58059 host=127.0.0.1 dbname='postgres' -f - -v ON_ERROR_STOP=1' at C:/Users/dpage/git/postgresql/src/test/perl/PostgreSQL/Test/Cluster.pm line 2129.
# Postmaster PID for node "publisher" is 14488
### Stopping node "publisher" using mode immediate
# Running: pg_ctl -D C:\Users\dpage\git\postgresql\build/testrun/subscription/001_rep_changes\data/t_001_rep_changes_publisher_data/pgdata -m immediate stop
waiting for server to shut down.... done
server stopped
# No postmaster PID for node "publisher"
# Postmaster PID for node "subscriber" is 15012
### Stopping node "subscriber" using mode immediate
# Running: pg_ctl -D C:\Users\dpage\git\postgresql\build/testrun/subscription/001_rep_changes\data/t_001_rep_changes_subscriber_data/pgdata -m immediate stop
waiting for server to shut down.... done
server stopped
# No postmaster PID for node "subscriber"
[14:46:59.068](1.346s) # Tests were run but no plan was declared and done_testing() was not seen.
[14:46:59.069](0.000s) # Looks like your test exited with 2 just after 11.
[15:28:42.693](0.001s) # Failed test 'connecting to a non-existent database: matches'
# at C:/Users/dpage/git/postgresql/src/bin/pg_dump/t/002_pg_dump.pl line 4689.
[15:28:42.694](0.001s) # 'pg_dump: error: connection to server at "127.0.0.1", port 53834 failed: could not initiate GSSAPI security context: No credentials were supplied, or the credentials were unavailable or inaccessible: Credential cache is empty
# connection to server at "127.0.0.1", port 53834 failed: FATAL: database "qqq" does not exist
# '
# doesn't match '(?^:pg_dump: error: connection to server .* failed: FATAL: database "qqq" does not exist)'
# Running: pg_dump -d regression_invalid
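One reading of the 002_pg_dump failure above is that the expected pattern cannot match because the GSSAPI message and the FATAL message end up on different lines, and "." does not match a newline by default. The sketch below reproduces that with Python's `re` module, whose default behavior for "." is the same as Perl's for this pattern. The strings are copied from the log above; `matches` and the variable names are hypothetical, and the `re.DOTALL` variant is only there to confirm the diagnosis, not to suggest how the test should be changed.

```python
import re

# The pattern from the failed test, as printed in the log.
PATTERN = (
    r'pg_dump: error: connection to server .* failed: '
    r'FATAL: database "qqq" does not exist'
)

# Output with GSSAPI compiled in: two connection attempts, two lines.
with_gssapi = (
    'pg_dump: error: connection to server at "127.0.0.1", port 53834 failed: '
    'could not initiate GSSAPI security context: No credentials were supplied, '
    'or the credentials were unavailable or inaccessible: '
    'Credential cache is empty\n'
    'connection to server at "127.0.0.1", port 53834 failed: '
    'FATAL: database "qqq" does not exist\n'
)

# Output as the test expects it: a single attempt on one line.
without_gssapi = (
    'pg_dump: error: connection to server at "127.0.0.1", port 53834 failed: '
    'FATAL: database "qqq" does not exist\n'
)


def matches(output: str, flags: int = 0) -> bool:
    """Return True if the expected pattern is found anywhere in output."""
    return re.search(PATTERN, output, flags) is not None


print(matches(without_gssapi))          # single-line error
print(matches(with_gssapi))             # ".*" cannot cross the newline
print(matches(with_gssapi, re.DOTALL))  # "." allowed to match newline
```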
On Fri, Jul 12, 2024 at 3:49 AM Dave Page <dpage@pgadmin.org> wrote:
> So I received an off-list tip to check out [1], a discussion around GSSAPI causing test failures on Windows that Alexander Lakhin was looking at. Thomas Munro's v2 patch to try to address the issue brought me down to just a single test failure with GSSAPI enabled on 17b2 (with a second, simple fix for the OpenSSL/Kerberos/x509 issue): pg_dump/002_pg_dump. The relevant section from the log looks like this:

I pushed that (ba9fcac7).

> [15:28:42.692](0.006s) not ok 2 - connecting to a non-existent database: matches
> [15:28:42.693](0.001s) # Failed test 'connecting to a non-existent database: matches'
> # at C:/Users/dpage/git/postgresql/src/bin/pg_dump/t/002_pg_dump.pl line 4689.
> [15:28:42.694](0.001s) # 'pg_dump: error: connection to server at "127.0.0.1", port 53834 failed: could not initiate GSSAPI security context: No credentials were supplied, or the credentials were unavailable or inaccessible: Credential cache is empty
> # connection to server at "127.0.0.1", port 53834 failed: FATAL: database "qqq" does not exist
> # '
> # doesn't match '(?^:pg_dump: error: connection to server .* failed: FATAL: database "qqq" does not exist)'

Does it help if you revert 29992a6?
On Sun, Jul 14, 2024 at 10:00 AM Thomas Munro <thomas.munro@gmail.com> wrote:
> On Fri, Jul 12, 2024 at 3:49 AM Dave Page <dpage@pgadmin.org> wrote:
> > # doesn't match '(?^:pg_dump: error: connection to server .* failed: FATAL: database "qqq" does not exist)'
>
> Does it help if you revert 29992a6?

FWIW I just happened to notice the same failure on Cirrus, in the github.com/postgres/postgres master branch:

https://cirrus-ci.com/task/5382396705505280

Your failure mentions GSSAPI and the above doesn't, but that'd be because for Cirrus CI we have PG_TEST_USE_UNIX_SOCKETS so it's using AF_UNIX. At one point I proposed deleting that weird GSSAPI stuff and using AF_UNIX always on Windows[1], but the feedback was that I should instead teach the whole test suite to be able to use AF_UNIX or AF_INET* on all OSes and I never got back to it.

The error does seem to be the never-ending saga from this and other threads:

https://www.postgresql.org/message-id/flat/90b34057-4176-7bb0-0dbb-9822a5f6425b%40greiz-reinsdorf.de

My uninformed impression is that graceful socket shutdowns would very likely fix the class of lost-final-message problem where the client does recv() next, including this case IIUC. It's only a partial improvement though: if the client calls send() next, I think it can still drop buffered received data, so this graceful shutdown stuff doesn't quite get us to the same situation as Unix at all points in the protocol. The real world case where that second case comes up is where the client sends a new query and on Unix gets a buffered error message saying the backend has exited due to idle timeout, but on Windows gets a connection reset message. I've wondered before if you could fix (or narrow to almost zero?) that by giving libpq a mode where it calls poll() to check for buffered readable data every single time it's about to send.

[1] https://www.postgresql.org/message-id/flat/CA%2BhUKGK30uLx9dpgkYwomgH0WVLUHytkChDgf3iUM2zp0pf_nA%40mail.gmail.com
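The "check for buffered readable data before every send" idea can be sketched outside libpq. The following is a minimal Python illustration, not libpq code and not a proposed patch: `drain_pending` and `send_checked` are hypothetical helpers, `select.select` with a zero timeout stands in for poll(), and the final message text is invented for the demo. It shows only the ordering: look for data the peer already sent, and hand that back instead of sending into a connection that is going away.

```python
import select
import socket


def drain_pending(sock: socket.socket) -> bytes:
    """Return whatever is already buffered for reading, without blocking."""
    chunks = []
    while True:
        # Zero timeout: only report data (or EOF) that is available right now.
        readable, _, _ = select.select([sock], [], [], 0)
        if not readable:
            break
        chunk = sock.recv(4096)
        if not chunk:  # empty read means the peer closed the connection
            break
        chunks.append(chunk)
    return b"".join(chunks)


def send_checked(sock: socket.socket, payload: bytes) -> bytes:
    """Send payload only if no message from the peer is waiting.

    Returns the pending bytes if there were any (payload is then not sent),
    otherwise sends payload and returns b"".
    """
    pending = drain_pending(sock)
    if pending:
        return pending
    sock.sendall(payload)
    return b""


if __name__ == "__main__":
    client, server = socket.socketpair()
    # Hypothetical final message from a backend that then exits.
    server.sendall(b"FATAL: session ended by server")
    server.close()
    print(send_checked(client, b"SELECT 1"))
    client.close()
```

This narrows the window rather than closing it: a message that arrives between the check and the send can still be lost, which matches the "narrow to almost zero" wording above.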