Обсуждение: Adding CI to our tree

Поиск
Список
Период
Сортировка

Adding CI to our tree

От
Andres Freund
Дата:
Hi,

For several development efforts I found it to be incredibly valuable to push
changes to a personal repository and see a while later whether tests succeed
on a number of different platforms.  This is especially useful for platforms
that are quite different from ones own platform, like e.g. windows in my case.

Of course everybody can set this up for themselves. However, doing so well is
a significant effort, particularly if windows is to be supported well. And
doubly so if useful things like getting backtraces for crashes is desirable
([1])

We do a form of pre-commit CI via cfbot. That is valuable. But it's not really
comparable to having CI in tree - one need to post to the list and one cannot
adjust the dependencies etc installed for the CI runs.


New contributors (and quite a bit of older ones too) IMO expect to be able to
see whether their changes work as-is, without sending a patch to the list.


An obvious criticism of the effort to put CI runner infrastructure into core
is that they are effectively all proprietary technology, and that we should be
hesistant to depend too much on one of them. I think that's a valid
concern. However, once one CI integration is done, a good chunk (but not all!)
the work is transferrable to another CI solution, which I do think reduces the
dependency sufficiently.


The attached patch adds CI using cirrus-ci. The reason for choosing cirrus
were that
a) Thomas has ended up using cirrus for cfbot
b) cirrus provides a comparatively wide variety of operating systems
c) it allows custom VM images to be used.
d) it does not require a login to look at

c) is very valuable to be able to test e.g. upcoming linux versions,
pre-installing software on systems that do not support docker (freebsd), and
being faster to boot once the image is more than a trivial size.  I've created
a number of images for testing of the aio patchset [2]


Right now the patch attached
- runs check-world on FreeBSD, Linux, macOS - all using gcc
  - freebsd, linux use a custom generated image
  - macOS installs missing dependencies at runtime, with some caching
  - all use ccache to make subsequent compilation faster
- runs all the tests I could find on windows, via vcregress.pl
- checks for compiler warnings on linux, with both clang and gcc

- captures all logs after a failing run
- generates backtraces from core files (the output format differs between platforms)
- allows to limit CI to certain OSs, by adding
  ci-os-only: (freebsd|linux|macos|windows)+ to the commit message
  (useful when fixing a platform dependent problem)

Example output of a
- successful run: https://cirrus-ci.com/build/4625606928236544
- github interface for the same: https://github.com/anarazel/postgres/runs/3772435617
- failed run on windows, with backtrace: https://cirrus-ci.com/task/6640561307254784?logs=cat_dumps#L150

Comments? Rotten tomatoes?


There's some polishing we should do before actually adding this to the
tree. But I wanted to discuss the idea before investing even more time.

One policy discussion that we'd have to have is who should control the images
used for CI. Right now that's on my personal google cloud account - which I am
happy to do, but medium - long term that'd not be optimal.


Thanks to Thomas - I based this on hist .cirrus.yml file. And made several
contributions later. Also thanks to Andrew, who helped out with some windows
issues I hit at some point...

Greetings,

Andres Freund

[1] I did get this to work on windows, but it does require a small set of
changes to work reliably, I'll start a separate thread about it.
[2] https://github.com/anarazel/pg-vm-images/

Вложения

Re: Adding CI to our tree

От
Peter Geoghegan
Дата:
On Fri, Oct 1, 2021 at 3:28 PM Andres Freund <andres@anarazel.de> wrote:
> For several development efforts I found it to be incredibly valuable to push
> changes to a personal repository and see a while later whether tests succeed
> on a number of different platforms.  This is especially useful for platforms
> that are quite different from ones own platform, like e.g. windows in my case.

> An obvious criticism of the effort to put CI runner infrastructure into core
> is that they are effectively all proprietary technology, and that we should be
> hesistant to depend too much on one of them. I think that's a valid
> concern. However, once one CI integration is done, a good chunk (but not all!)
> the work is transferrable to another CI solution, which I do think reduces the
> dependency sufficiently.

I agree with everything you've said, including the nuanced parts about
the possible downsides.

We already know what happens when one of these CI providers stops
providing open source projects with free resources, because that's
exactly what happened with Travis CI: projects that use their
infrastructure are mildly inconvenienced. I can't see any real notable
downside, as long as we just use the resources that they make
available for development work. Clearly these services could never
replace the buildfarm.

-- 
Peter Geoghegan



Re: Adding CI to our tree

От
Andres Freund
Дата:
Hi,

On 2021-10-02 01:49:39 +0200, 0010203112132233 wrote:
> On Sat, 2 Oct 2021 at 00:28, Andres Freund <andres@anarazel.de> wrote:
> > New contributors (and quite a bit of older ones too) IMO expect to be able to
> > see whether their changes work as-is, without sending a patch to the list.
> 
> Have they checked 'make installcheck-world'? I believe that is one of
> the first action items on the 'So, you want to become a developer?'
> wiki page, after downloading the sources. Of course, that is limited
> to only the environment of the user, but that's what they're generally
> developing for, right?

If you want to get a change into postgres, it almost always needs to actually
work on all operating systems, and always needs to at least not cause build
failures on all platforms.



> Furthermore, after looking it through, I think that Cirrus is an
> unfortunate choice as a CI platform of preference, as you cannot use
> it without access to Github (which is problematic for people located
> in certain localities due to USA regulations).

I agree that it's not optimal that cirrus isn't available on all git hosting
platforms. Hence saying that I think it's likely we'd end up adding a few more
platforms over time.  If we factor the meat of the work into an helper script,
so that the CI specific bit is just a couple invocation of that script, it's
not a lot of overhead to have 2-3 CI platforms.


> If we're going to include CI configuration for private use, I'd prefer if it
> were a CI that can be enjoyed in private without pushing code to a 3rd
> party.

FWIW, you can use cirrus locally on your machine:
https://github.com/cirruslabs/cirrus-cli

It'll not be able to run all kinds of tasks though (e.g. no windows docker on
a linux host, dealing with the license costs for that presumably would be
nontrivial).


> Lastly, I consider CI configuration similar to IDE configuration: each
> developer has their own preferred tools which they use, but we don't
> favour one over the other. We don't include IDE-specific configuration
> files either, or at least, the policy is against that.
> 
> So, I greatly appreciate the effort, but I don't think this is
> something that should be committed into core. Maybe as a dedicated
> wiki page detailing configurations for CI, similar to the Buildfarm
> page?

That doesn't scale - I've actually added CI to all my substantial development
work in my private branches, and it's pretty annoying to need to do so every
time. And there's many developers who won't go through the effort most of the
time.

It's not like this forces you to use cirrus or anything. For people that don't
want to use CI, It'll make cfbot a bit more effective (because people can
adjust what it tests as appropriate for $patch), but that's it.

Greetings,

Andres Freund



Re: Adding CI to our tree

От
Thomas Munro
Дата:
On Sat, Oct 2, 2021 at 1:10 PM Andres Freund <andres@anarazel.de> wrote:
> On 2021-10-02 01:49:39 +0200, 0010203112132233 wrote:
> > Furthermore, after looking it through, I think that Cirrus is an
> > unfortunate choice as a CI platform of preference, as you cannot use
> > it without access to Github (which is problematic for people located
> > in certain localities due to USA regulations).
>
> I agree that it's not optimal that cirrus isn't available on all git hosting
> platforms. Hence saying that I think it's likely we'd end up adding a few more
> platforms over time.  If we factor the meat of the work into an helper script,
> so that the CI specific bit is just a couple invocation of that script, it's
> not a lot of overhead to have 2-3 CI platforms.

BTW I think they might be considering supporting other code hosting
platforms (at least they ask for feedback on this at
https://cirrus-ci.org/guide/quick-start/ ).

> > Lastly, I consider CI configuration similar to IDE configuration: each
> > developer has their own preferred tools which they use, but we don't
> > favour one over the other. We don't include IDE-specific configuration
> > files either, or at least, the policy is against that.

We have some files in the tree to help users of Emacs, vim, and even
make github format text the way we like.

Personally, I think that if someone is willing to develop and maintain
high quality CI control files that work for any public
free-for-open-source CI system, then we should accept them too.  It
costs very little to have a few .something.yml files at top level.  If
at any point the file for a given provider is showing signs of being
unmaintained, we can remove it.  Personally, I'm willing and able to
help maintain Cirrus control files, not least because it means that
cfbot will become simpler and will match exactly what you can get in
your own github account.

I really like Cirrus because our project has more portability concerns
than most, and most other CIs are like "we got both kinds, country and
western!".  I wanted to add FreeBSD to cfbot, which is something they
advertise as a feature, but it looks like at least 3 other OSes we
target would probably work just as well given a suitable image.



Re: Adding CI to our tree

От
Tom Lane
Дата:
Andres Freund <andres@anarazel.de> writes:
> It's not like this forces you to use cirrus or anything. For people that don't
> want to use CI, It'll make cfbot a bit more effective (because people can
> adjust what it tests as appropriate for $patch), but that's it.

Yeah.  I cannot see any reason to object to Andres' 0002 patch: you can
just ignore those files if you don't want to use cirrus.  It does set a
precedent that we'd also accept infrastructure for other CI systems,
but as long as they're similarly noninvasive, why not?  (Maybe there
needs to be one more directory level though, ie ci/cirrus/whatever.
I don't want to end up with one toplevel directory per CI platform.)

I don't know enough about Windows to evaluate 0001, but I'm a little
worried about it because it looks like it's changing our *production*
error handling on that platform.

As for 0003, wasn't that committed already?

            regards, tom lane



Re: Adding CI to our tree

От
Daniel Gustafsson
Дата:
> On 2 Oct 2021, at 00:27, Andres Freund <andres@anarazel.de> wrote:

> For several development efforts I found it to be incredibly valuable to push
> changes to a personal repository and see a while later whether tests succeed
> on a number of different platforms.  This is especially useful for platforms
> that are quite different from ones own platform, like e.g. windows in my case.

Same, and for my case I run several CI jobs to compile/test against different
OpenSSL versions etc.

> Of course everybody can set this up for themselves. However, doing so well is
> a significant effort, particularly if windows is to be supported well. And
> doubly so if useful things like getting backtraces for crashes is desirable
> ([1])

+1 on adding these, rather than having everyone duplicate the effort.  Those
who don't want to use them can disregard them.

> Right now the patch attached
> - runs check-world on FreeBSD, Linux, macOS - all using gcc
>  - freebsd, linux use a custom generated image
>  - macOS installs missing dependencies at runtime, with some caching
>  - all use ccache to make subsequent compilation faster
> - runs all the tests I could find on windows, via vcregress.pl
> - checks for compiler warnings on linux, with both clang and gcc

Why not compiling with OpenSSL on FreeBSD and macOS?  On FreeBSD all you need
is --with-ssl=openssl while on macOS you need to point to the headers and libs
like:

  --with-includes=/usr/local/include:/usr/local/opt/openssl/include
--with-libs=/usr/local/libs:/usr/local/opt/openssl/lib

One thing to note for Cirrus on macOS (I've never seen it anywhere else) is
that it intermittently will fail on a too long socketpath:

  Unix-domain socket path
"/private/var/folders/wh/z5_y2cv53sg24tzvtw_f_y1m0000gn/T/cirrus-ci-build/src/bin/pg_upgrade/.s.PGSQL.51696"is too long
(maximum103 bytes) 

Exporting PGSOCKETDIR can avoid that annoyance.

+  tests_script:
+    - su postgres -c 'ulimit -c unlimited ; ${TIMEOUT_CMD} make -s ${CHECK} ${CHECKFLAGS} -j8'
Don't you need PG_TEST_EXTRA=ssl here to ensure the src/test/ssl tests are run?

--
Daniel Gustafsson        https://vmware.com/




Re: Adding CI to our tree

От
Andres Freund
Дата:
Hi,

On 2021-10-02 20:42:00 +0200, Daniel Gustafsson wrote:
> > On 2 Oct 2021, at 00:27, Andres Freund <andres@anarazel.de> wrote:
> > Right now the patch attached
> > - runs check-world on FreeBSD, Linux, macOS - all using gcc
> >  - freebsd, linux use a custom generated image
> >  - macOS installs missing dependencies at runtime, with some caching
> >  - all use ccache to make subsequent compilation faster
> > - runs all the tests I could find on windows, via vcregress.pl
> > - checks for compiler warnings on linux, with both clang and gcc
> 
> Why not compiling with OpenSSL on FreeBSD and macOS?  On FreeBSD all you need
> is --with-ssl=openssl while on macOS you need to point to the headers and libs
> like:
>
>   --with-includes=/usr/local/include:/usr/local/opt/openssl/include
--with-libs=/usr/local/libs:/usr/local/opt/openssl/lib

Yea, there's several things like that, that should be added. The CI files
originated from development where breakage around SSL wasn't likely (AIO,
shared memory stats, procarray scalability etc), so I didn't focussed on that
angle.

Needing to get all that stuff right on multiple platforms is one of the
reasons why I think having this thing in-tree would be good. No need for
everyone to discover the magic incantations themselves. Even if you e.g. might
want to extend them to test multiple SSL versions or such, it's a lot easier
to do that if the basics are there.


> One thing to note for Cirrus on macOS (I've never seen it anywhere else) is
> that it intermittently will fail on a too long socketpath:

I've seen it somewhere else before. It wasn't even intermittent - it always
failed. I worked around that by setting CIRRUS_WORKING_DIR: ${HOME}/pgsql/ -
also made output including filenames easier to read ;)


> +  tests_script:
> +    - su postgres -c 'ulimit -c unlimited ; ${TIMEOUT_CMD} make -s ${CHECK} ${CHECKFLAGS} -j8'
> Don't you need PG_TEST_EXTRA=ssl here to ensure the src/test/ssl tests are run?

Probably. I quickly added that stuff, we'll see how many mistakes I made:
https://cirrus-ci.com/build/5846034501861376

Greetings,

Andres Freund



Re: Adding CI to our tree

От
Andres Freund
Дата:
Hi,

On 2021-10-02 12:41:07 -0700, Andres Freund wrote:
> On 2021-10-02 20:42:00 +0200, Daniel Gustafsson wrote:
> > +  tests_script:
> > +    - su postgres -c 'ulimit -c unlimited ; ${TIMEOUT_CMD} make -s ${CHECK} ${CHECKFLAGS} -j8'
> > Don't you need PG_TEST_EXTRA=ssl here to ensure the src/test/ssl tests are run?
> 
> Probably. I quickly added that stuff, we'll see how many mistakes I made:
> https://cirrus-ci.com/build/5846034501861376

I wonder if we shouldn't stop skipping the ssl / kerberos / ldap (and perhaps
others) tests in the makefile, and instead do so in the tap tests
themselves. Then one can see them included as the skipped in the tap result
output, which seems like it'd make it easier to discover them?

Greetings,

Andres Freund



Re: Adding CI to our tree

От
Daniel Gustafsson
Дата:
> On 2 Oct 2021, at 21:45, Andres Freund <andres@anarazel.de> wrote:

> I wonder if we shouldn't stop skipping the ssl / kerberos / ldap (and perhaps
> others) tests in the makefile, and instead do so in the tap tests
> themselves. Then one can see them included as the skipped in the tap result
> output, which seems like it'd make it easier to discover them?

I am definitely in favor of doing that, better to see them skipped rather than
having to remember to opt in.  We even do so already to some extent already,
like for example the SSL tests:

    if ($ENV{with_ssl} ne 'openssl')
    {
        plan skip_all => 'OpenSSL not supported by this build';
    }

--
Daniel Gustafsson        https://vmware.com/




Re: Adding CI to our tree

От
Daniel Gustafsson
Дата:
> On 2 Oct 2021, at 21:41, Andres Freund <andres@anarazel.de> wrote:

>> One thing to note for Cirrus on macOS (I've never seen it anywhere else) is
>> that it intermittently will fail on a too long socketpath:
>
> I've seen it somewhere else before. It wasn't even intermittent - it always
> failed. I worked around that by setting CIRRUS_WORKING_DIR: ${HOME}/pgsql/ -
> also made output including filenames easier to read ;)

Aha, nice trick!  Didn't know about that one but that's easier than setting
specific dirs via PG* environment vars.

--
Daniel Gustafsson        https://vmware.com/




Re: Adding CI to our tree

От
Andres Freund
Дата:
Hi,

On 2021-10-02 11:05:20 -0400, Tom Lane wrote:
> Yeah.  I cannot see any reason to object to Andres' 0002 patch: you can
> just ignore those files if you don't want to use cirrus.  It does set a
> precedent that we'd also accept infrastructure for other CI systems,
> but as long as they're similarly noninvasive, why not?

Exactly.


> (Maybe there needs to be one more directory level though, ie
> ci/cirrus/whatever.  I don't want to end up with one toplevel directory per
> CI platform.)

Good question - it definitely shouldn't be one toplevel directory per CI
platform (although some will require their own hidden toplevel directories,
like .github/workflows etc). I'd hope to share a bunch of the infrastructure
between them over time, so perhaps we don't need a deeper hierarchy.


> I don't know enough about Windows to evaluate 0001, but I'm a little
> worried about it because it looks like it's changing our *production*
> error handling on that platform.

Yea. It's clearly not ready as-is - it's the piece that I was planning to
write a separate email about.


It's hard to understand what *precisely* SEM_NOGPFAULTERRORBOX etc do.

What I do know is that without the _set_abort_behavior() stuff abort() doesn't
trigger windows' "crash" paths in at least debugging builds, and that the
SetErrorMode() and _CrtSetReportMode() changes are necessary to get segfaults
to reach the crash paths.

The in-tree behaviour turns out to make debugging on windows a major pain, at
least when compiling with msvc. Crashes never trigger core dumps or "just in
time" debugging (their term for invoking a debugger upon crash), so one has to
attach to processes before they crash, to have any chance of debugging.

As far as I can tell this also means that at least for debugging builds,
pgwin32_install_crashdump_handler() is pretty much dead weight -
crashDumpHandler() never gets invoked. I think it may get invoked for abort()s
in production builds, but probably not for segfaults.

And despite SEM_NOGPFAULTERRORBOX we display those annoying "popup" boxes
telling us about the crash and giving the option to retry, ignore, something
something.   It's all a bit baffling.



> As for 0003, wasn't that committed already?

Not at the time I was writing the email, but now it is, yes.

Greetings,

Andres Freund



Re: Adding CI to our tree

От
Andres Freund
Дата:
Hi,

On 2021-10-02 20:42:00 +0200, Daniel Gustafsson wrote:
> Same, and for my case I run several CI jobs to compile/test against different
> OpenSSL versions etc.

On that note: Did you do this for windows? If so, I'd rather not figure that
out myself...

Greetings,

Andres Freund



Re: Adding CI to our tree

От
Daniel Gustafsson
Дата:
> On 2 Oct 2021, at 22:01, Andres Freund <andres@anarazel.de> wrote:
> On 2021-10-02 20:42:00 +0200, Daniel Gustafsson wrote:
>> Same, and for my case I run several CI jobs to compile/test against different
>> OpenSSL versions etc.
>
> On that note: Did you do this for windows? If so, I'd rather not figure that
> out myself...

Not with Cirrus, I've been using Appveyor for Windows and they provide 1.0.2 -
3.0.0 which can easily set in config.pl with for example:

    $config->{openssl} = 'C:\OpenSSL-v111-Win64';

--
Daniel Gustafsson        https://vmware.com/




Re: Adding CI to our tree

От
Andrew Dunstan
Дата:
On 10/2/21 11:05 AM, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
>> It's not like this forces you to use cirrus or anything. For people that don't
>> want to use CI, It'll make cfbot a bit more effective (because people can
>> adjust what it tests as appropriate for $patch), but that's it.
> Yeah.  I cannot see any reason to object to Andres' 0002 patch: you can
> just ignore those files if you don't want to use cirrus.  



Yeah. I enable cirrus selectively on my github repos, which makes it 
close to impossible to get an unwanted effect.


One of the things I like about this is that it institutionalizes some
knowledge that has hitherto been mostly private. I have a lot of this in
a setup I use for spinning up test instances, but this makes a lot of
that sort of knowledge more broadly available.


I hope it will also encourage people to test more widely, given how easy
it will make it.


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com




Re: Adding CI to our tree

От
Tom Lane
Дата:
Andrew Dunstan <andrew@dunslane.net> writes:
> I hope it will also encourage people to test more widely, given how easy
> it will make it.

If you'd like that, there would need to be some (ahem) documentation
of how to use it.

            regards, tom lane



Re: Adding CI to our tree

От
Andres Freund
Дата:
Hi,

On 2021-10-02 16:44:44 -0400, Tom Lane wrote:
> Andrew Dunstan <andrew@dunslane.net> writes:
> > I hope it will also encourage people to test more widely, given how easy
> > it will make it.
>
> If you'd like that, there would need to be some (ahem) documentation
> of how to use it.

Yea, definitely necessary. Where would we want it to be? ci/README.md? That'd
be viewable on the various git hosting platforms. I guess there's an argument
for it to be in the sgml docs, but that doesn't seem all that useful in this
case.

Greetings,

Andres Freund



Re: Adding CI to our tree

От
Tom Lane
Дата:
Andres Freund <andres@anarazel.de> writes:
> On 2021-10-02 16:44:44 -0400, Tom Lane wrote:
>> If you'd like that, there would need to be some (ahem) documentation
>> of how to use it.

> Yea, definitely necessary. Where would we want it to be? ci/README.md? That'd
> be viewable on the various git hosting platforms. I guess there's an argument
> for it to be in the sgml docs, but that doesn't seem all that useful in this
> case.

A README seems plenty good enough to me.  Maybe -0.1 for making
it .md rather than plain text ... plain text is our habit everywhere
else AFAIR.

            regards, tom lane



Re: Adding CI to our tree

От
Andres Freund
Дата:
Hi,

On October 2, 2021 1:18:38 PM PDT, Daniel Gustafsson <daniel@yesql.se> wrote:
>> On 2 Oct 2021, at 22:01, Andres Freund <andres@anarazel.de> wrote:
>> On 2021-10-02 20:42:00 +0200, Daniel Gustafsson wrote:
>>> Same, and for my case I run several CI jobs to compile/test against different
>>> OpenSSL versions etc.
>>
>> On that note: Did you do this for windows? If so, I'd rather not figure that
>> out myself...
>
>Not with Cirrus, I've been using Appveyor for Windows and they provide 1.0.2 -
>3.0.0 which can easily set in config.pl with for example:
>
>    $config->{openssl} = 'C:\OpenSSL-v111-Win64';

Got the build part working (although the state of msvc compatible openssl distribution on windows seems a bit scary).
Howeverthe ssl tests don't fully succeed: 

https://cirrus-ci.com/task/6264790323560448?logs=ssl#L655

 I didn't see code in the bf client code running the test so perhaps that's not too surprising :/

Did you run those tests on windows?

Regards,

Andres

--
Sent from my Android device with K-9 Mail. Please excuse my brevity.



ssl tests fail on windows / slurp_file() offset doesn't work on win

От
Andres Freund
Дата:
Hi,

On 2021-10-02 21:05:17 -0700, Andres Freund wrote:
> Got the build part working (although the state of msvc compatible openssl
> distribution on windows seems a bit scary). However the ssl tests don't
> fully succeed:
> 
> https://cirrus-ci.com/task/6264790323560448?logs=ssl#L655
> 
>  I didn't see code in the bf client code running the test so perhaps that's
>  not too surprising :/
> 
> Did you run those tests on windows?

As you can see in the test output, every mismatch prints the whole file,
despite only intending to show the tail. Which appears to be because the
windows portion of 3c5b0685b921 doesn't actually work.  The reason for that in
turn is that afaict the setFilePointer doesn't change the file position in a
way that affects perl.

Consequently, if I force the !win32 path, the tests pass.

At first I assumed the cause of this is that while the setFilePointer() modifies the
state of the underlying handle, it doesn't actually let perl know about
that. Due to buffering etc perl likely has its own bookeeping about the
position in the file. There's some pretty clear hints in
https://perldoc.perl.org/functions/seek

But the problem turns out to be that it's bogus to pass $fh to
setFilePointer(). That's a perl handle, not an win32 handle. Fixing that seems
to make the tests pass.


Why did 3c5b0685b921 choose to use setFilePointer() in the first place? At
this point it's a perl filehandle, so we should just use perl seek?


Leaving the concrete breakage aside, I'm somewhat unhappy that there's not a
single comment explaining why TestLib.pm is trying to use native windows
APIs.

Isn't the code as-is also "leaking" an open IO::Handle? There's a
CloseHandle($fHandle), but nothing is done to $fh. But perhaps there's some
perl magic cleaning things up? Even if so, loks like just closing $fh will
close the handle as well...

Greetings,

Andres Freund



Re: ssl tests fail on windows / slurp_file() offset doesn't work on win

От
Andres Freund
Дата:
Hi,

On 2021-10-03 10:18:31 -0700, Andres Freund wrote:
> As you can see in the test output, every mismatch prints the whole file,
> despite only intending to show the tail. Which appears to be because the
> windows portion of 3c5b0685b921 doesn't actually work.  The reason for that in
> turn is that afaict the setFilePointer doesn't change the file position in a
> way that affects perl.
> 
> Consequently, if I force the !win32 path, the tests pass.
> 
> At first I assumed the cause of this is that while the setFilePointer() modifies the
> state of the underlying handle, it doesn't actually let perl know about
> that. Due to buffering etc perl likely has its own bookeeping about the
> position in the file. There's some pretty clear hints in
> https://perldoc.perl.org/functions/seek
> 
> But the problem turns out to be that it's bogus to pass $fh to
> setFilePointer(). That's a perl handle, not an win32 handle. Fixing that seems
> to make the tests pass.

It does (I only let it run to the ssl test, then pushed a newer revision):
https://cirrus-ci.com/task/5345293928497152?logs=ssl#L5


> Why did 3c5b0685b921 choose to use setFilePointer() in the first place? At
> this point it's a perl filehandle, so we should just use perl seek?
> 
> 
> Leaving the concrete breakage aside, I'm somewhat unhappy that there's not a
> single comment explaining why TestLib.pm is trying to use native windows
> APIs.
> 
> Isn't the code as-is also "leaking" an open IO::Handle? There's a
> CloseHandle($fHandle), but nothing is done to $fh. But perhaps there's some
> perl magic cleaning things up? Even if so, loks like just closing $fh will
> close the handle as well...

I think something roughly like the attached might be a good idea. Runs locally
on linux, and hopefully still on windows

https://cirrus-ci.com/build/4857291573821440

Greetings,

Andres Freund

Вложения

Re: Adding CI to our tree

От
Daniel Gustafsson
Дата:
> On 3 Oct 2021, at 06:05, Andres Freund <andres@anarazel.de> wrote:

> Did you run those tests on windows?

Sorry, failed to mention I only compile it for now, I hadn't reached trying to
run the tests yet.  I see you started on that in this thread, so thank you for
that!

--
Daniel Gustafsson        https://vmware.com/




Re: ssl tests fail on windows / slurp_file() offset doesn't work on win

От
Andrew Dunstan
Дата:
On 10/3/21 1:30 PM, Andres Freund wrote:
>
>> Why did 3c5b0685b921 choose to use setFilePointer() in the first place? At
>> this point it's a perl filehandle, so we should just use perl seek?
>>
>>
>> Leaving the concrete breakage aside, I'm somewhat unhappy that there's not a
>> single comment explaining why TestLib.pm is trying to use native windows
>> APIs.
>>
>> Isn't the code as-is also "leaking" an open IO::Handle? There's a
>> CloseHandle($fHandle), but nothing is done to $fh. But perhaps there's some
>> perl magic cleaning things up? Even if so, loks like just closing $fh will
>> close the handle as well...
> I think something roughly like the attached might be a good idea. Runs locally
> on linux, and hopefully still on windows
>
> https://cirrus-ci.com/build/4857291573821440
>

Looks sane, thanks.


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com




Re: ssl tests fail on windows / slurp_file() offset doesn't work on win

От
Andres Freund
Дата:
On 2021-10-04 11:07:07 -0400, Andrew Dunstan wrote:
> Looks sane, thanks.

Thanks for looking. Pushed to all branches.



Re: Adding CI to our tree

От
Thomas Munro
Дата:
On Sat, Oct 2, 2021 at 11:27 AM Andres Freund <andres@anarazel.de> wrote:
> - runs check-world on FreeBSD, Linux, macOS - all using gcc

Small correction: on macOS and FreeBSD it's using the vendor compiler,
which is some kind of clang.

BTW, on those two OSes there are some messages like this each time a
submake dumps its output to the log:

[03:36:16.591] fcntl(): Bad file descriptor

It seems worth putting up with these compared to the alternatives of
either not using -j, not using -Otarget and having the output of
parallel tests all mashed up and unreadable (that still happen
sometimes but it's unlikely, because the submakes write() whole output
chunks at infrequent intervals), or redirecting to a file so you can't
see the realtime test output on the main CI page (not so fun, you have
to wait until it's finished and view it as an 'artifact').  I tried to
write a patch for GNU make to fix that[1], let's see if something
happens.

[1] https://savannah.gnu.org/bugs/?52922



Re: Adding CI to our tree

От
Andres Freund
Дата:
Hi,

On 2021-10-06 17:01:53 +1300, Thomas Munro wrote:
> On Sat, Oct 2, 2021 at 11:27 AM Andres Freund <andres@anarazel.de> wrote:
> > - runs check-world on FreeBSD, Linux, macOS - all using gcc
>
> Small correction: on macOS and FreeBSD it's using the vendor compiler,
> which is some kind of clang.

Oh, oops. I guess that's even better ;).


> I tried to write a patch for GNU make to fix that[1], let's see if something
> happens.
>
> [1] https://savannah.gnu.org/bugs/?52922

It'd be nice to get rid of these... They're definitely confusing.

Greetings,

Andres Freund



Re: Adding CI to our tree

От
Peter Eisentraut
Дата:
On 02.10.21 00:27, Andres Freund wrote:
> The attached patch adds CI using cirrus-ci.

I like this in principle.  But I don't understand what the docker stuff 
is about.  I have used Cirrus CI before, and didn't have to do anything 
about Docker.  This could use some explanation.



Re: Adding CI to our tree

От
Andres Freund
Дата:
Hi,

On 2021-10-10 21:48:09 +0200, Peter Eisentraut wrote:
> On 02.10.21 00:27, Andres Freund wrote:
> > The attached patch adds CI using cirrus-ci.
> 
> I like this in principle.  But I don't understand what the docker stuff is
> about.  I have used Cirrus CI before, and didn't have to do anything about
> Docker.  This could use some explanation.

You don't *have* to do anything about docker - but especially for windows it
takes longer to build without your own container, because we'd need to install
our dependencies every time. And that turns out to take a while.

Right now the docker containers are built as part of CI (cirrus rebuilds them
when the container definition changes), but that doesn't have to be that way,
we could do so independently of cirrus, so that they are usable on other
platforms as well - although it's advantageous to use the cirrus containers as
the base, as they're cached on the buildhosts.


In principle we could also use docker for the linux tests, but I found that we
can get better results using full blown virtual machines. Those I currently
build from a separate repo, as mentioned upthread.


There is a linux docker container, but that currently runs a separate task
that compiles with -Werror for gcc, clang with / without asserts. That's a
separate task so that compile warnings don't prevent one from seeing whether
tests worked etc.

One thing I was thinking of adding to the "compile warning" task was to
cross-compile postgres from linux using mingw - that's a lot faster than
running the windows builds, and it's not too hard to break that accidentally.

Greetings,

Andres Freund



Re: Adding CI to our tree

От
Matthias van de Meent
Дата:
On Sat, 2 Oct 2021 at 17:05, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Andres Freund <andres@anarazel.de> writes:
> > It's not like this forces you to use cirrus or anything. For people that don't
> > want to use CI, It'll make cfbot a bit more effective (because people can
> > adjust what it tests as appropriate for $patch), but that's it.

I don't disagree on that part, but I fail to see what makes the
situations of an unused CI config file in the tree and an unused
`/.idea/` or `/.vs/` specifier in the .gitignore [0][1] distinct
enough for it to be resolved differently. Both are quality-of-life
additions for those that use that tool, while non-users of that tool
can ignore those configuration entries.

> Yeah.  I cannot see any reason to object to Andres' 0002 patch: you can
> just ignore those files if you don't want to use cirrus.  It does set a
> precedent that we'd also accept infrastructure for other CI systems,
> but as long as they're similarly noninvasive, why not?  (Maybe there
> needs to be one more directory level though, ie ci/cirrus/whatever.
> I don't want to end up with one toplevel directory per CI platform.)

With the provided arguments I won't object to the addition of these
config files, but I would appreciate it if a clear policy could be
provided on the inclusion of configurations for external tools that
are not expected to be used by all users of the repository, such as
CI, editors and IDEs.

Kind regards,

Matthias van de Meent

[0]
https://www.postgresql.org/message-id/flat/OS3PR01MB71593D78DD857C2BBA9FB824F2A69%40OS3PR01MB7159.jpnprd01.prod.outlook.com
[1] https://www.postgresql.org/message-id/flat/15BFD11D-5D72-46B2-BDB1-2DF4E049C13D%40me.com



Re: Adding CI to our tree

От
Tom Lane
Дата:
Matthias van de Meent <boekewurm+postgres@gmail.com> writes:
> I don't disagree on that part, but I fail to see what makes the
> situations of an unused CI config file in the tree and an unused
> `/.idea/` or `/.vs/` specifier in the .gitignore [0][1] distinct
> enough for it to be resolved differently. Both are quality-of-life
> additions for those that use that tool, while non-users of that tool
> can ignore those configuration entries.

Um ... I don't see a connection at all.  One is talking about files
we put into the git tree, and one is talking about files that are
*not* in the tree.

We do have a policy that files that are created by a supported build
process should be .gitignore'd, so that might lead to more .gitignore
entries as this idea moves ahead.  I'm not on board though with the
idea of .gitignore'ing anything that anybody anywhere thinks is junk.
That's more likely to lead to conflicts and mistakes than anything
useful.  We expect developers to have personal excludesfile lists
that block off editor backup files and other cruft from the tools
that they personally use but are not required by the build.

            regards, tom lane



Re: Adding CI to our tree

От
Andreas Karlsson
Дата:
On 10/21/21 5:55 PM, Matthias van de Meent wrote:
> On Sat, 2 Oct 2021 at 17:05, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>
>> Andres Freund <andres@anarazel.de> writes:
>>> It's not like this forces you to use cirrus or anything. For people that don't
>>> want to use CI, It'll make cfbot a bit more effective (because people can
>>> adjust what it tests as appropriate for $patch), but that's it.
> 
> I don't disagree on that part, but I fail to see what makes the
> situations of an unused CI config file in the tree and an unused
> `/.idea/` or `/.vs/` specifier in the .gitignore [0][1] distinct
> enough for it to be resolved differently. Both are quality-of-life
> additions for those that use that tool, while non-users of that tool
> can ignore those configuration entries.

There is a better solution to that. Just add those files to the global 
gitignore on your machine. You will want to ignore those files in all 
git repositories on your machine anyway. On the other hand the 
configuration files for the CI are relevant to just the PostgreSQL repo.

Andreas



Re: Adding CI to our tree

От
Vladimir Sitnikov
Дата:
>Just add those files to the global gitignore on your machine

While global gitignore is a nice feature, it won't protect users who do not know they need to create a global ignore file.
Adding explicit excludes for well-known temporary files into PostgreSQL sources makes it easier to work with the sources for everybody.
Less manual configuration is better for having a productive environment.

On top of that, there is even a use-case for having .idea folder in Git:
.idea/icon.png and .idea/icon_dark.png files are displayed in JetBrains IDEs so the list of projects becomes easier to distinguish.
AFAIK, a standard icon configuration does not yet exist: https://github.com/editorconfig/editorconfig/issues/425


Vladimir

Re: Adding CI to our tree

От
Andres Freund
Дата:
Hi,

Attached is an updated version of the CI patches.

Changes:
- more optional features are enabled on various platforms, including
  building with openssl on windows
- added somewhat minimal, README explaining how CI can enabled in a
  repository
- some cleanup to the windows crash reporting support. I also moved the
  code main.c code changes to after the CI stuff themselves, as it might
  be a bit harder to get into a committable shape (at least for me)

Greetings,

Andres Freund

Вложения

Re: Adding CI to our tree

От
Melanie Plageman
Дата:
Hi,

I reviewed the first patch in this set
(v2-0001-ci-Add-CI-for-FreeBSD-Linux-MacOS-and-Windows-uti.patch).

For the README, I found the instructions very clear. My only concern is
that the cirrus-ci UI will change and the instructions on how to enable
cirrus-ci on a repository will not be accessible in the same way in the
future.
That being said, I found your instructions easier to follow than those
on [1].
Perhaps it is better to wait until it becomes a problem and then, at
that point, change the README to guide people to the quickstart link.

I have attached a patch which does a small refactor using a yaml anchor
and aliases (tried it and it seems to work for me).

A few questions and thoughts:

- do you not need to change the default core resource limits for
  FreeBSD?

- Would you find it valuable to set a few more coredump_filter bits?
  Might be worth setting bits 2 and 3 (see [2])-- in addition to the
  defaults (on Linux -- I don't know what the equivalent is on other
  platforms).

- I found this line a bit confusing, so maybe it is worth a comment
  sysinfo_script:
    - export || true

- For the docker files, I think it is recommended to run "docker build"
  only from within the specific build context (not in the top-level
  directory), so I don't think you should need the dockerignore file.

  Also, instead of putting the two docker files in the same directory,
  you could put them in dedicated directories (making those directories
  their build context). That way if you change one you don't end up
  rebuilding the other.

- In ci/docker/linux_debian_bullseye, you can make this change:
  -  apt-get clean
  +  apt-get clean && \
  +  rm -f /var/lib/apt/lists/*

  to make that layer smaller.

- Melanie

[1] https://cirrus-ci.org/guide/quick-start/
[2] https://man7.org/linux/man-pages/man5/core.5.html

Вложения

Re: Adding CI to our tree

От
Thomas Munro
Дата:
On Sat, Nov 20, 2021 at 11:17 AM Melanie Plageman
<melanieplageman@gmail.com> wrote:
> - do you not need to change the default core resource limits for
>   FreeBSD?

Unfortunately the performance really sucks on that FreeBSD CI system
if you crank it up, and I haven't had time to figure out why yet :-/
Possibly something to do with large numbers of files being created and
unlinked concurrently under -jX on slow storage is going awry...



Re: Adding CI to our tree

От
Andres Freund
Дата:
Hi,

On 2021-11-19 17:17:44 -0500, Melanie Plageman wrote:
> For the README, I found the instructions very clear. My only concern is
> that the cirrus-ci UI will change and the instructions on how to enable
> cirrus-ci on a repository will not be accessible in the same way in the
> future.

I think we can just adjust things at that point, I'm not too worried about
past instructions not working.


> I have attached a patch which does a small refactor using a yaml anchor
> and aliases (tried it and it seems to work for me).

Oh, neat. Yaml is so weird.



> - Would you find it valuable to set a few more coredump_filter bits?
>   Might be worth setting bits 2 and 3 (see [2])-- in addition to the
>   defaults (on Linux -- I don't know what the equivalent is on other
>   platforms).

I don't think we need 2/3 - we don't have file backed mappings. In some
situations setting bit 6 (shared huge pages) would make sense - but here we
don't configure them...


> - I found this line a bit confusing, so maybe it is worth a comment
>   sysinfo_script:
>     - export || true

We can probably just get rid of the ||. It was just so that a missing 'export'
builtin didn't cause execution to abort. But that won't randomly vanish, so
it's fine.


> - For the docker files, I think it is recommended to run "docker build"
>   only from within the specific build context (not in the top-level
>   directory), so I don't think you should need the dockerignore file.

We don't have control over that in this case - it's cirrus invoking docker,
and it just uses the whole repo as the context.


> - In ci/docker/linux_debian_bullseye, you can make this change:
>   -  apt-get clean
>   +  apt-get clean && \
>   +  rm -f /var/lib/apt/lists/*

Might not matter too much compared to the size of the whole thing, but it
definitely won't hurt...

Greetings,

Andres Freund



Re: Adding CI to our tree

От
Andres Freund
Дата:
Hi,

Attached is an updated version of the CI patches. An example of a test run
with the attached version of this
https://cirrus-ci.com/build/6501998521483264

I again included the commit allowing crash dumps to be collected on windows,
but I don't think it can be merged as-is, and should be left for later.


Changes since v2:
- Address review comments

- Build with further optional features enabled. I think just about everything
  reasonable is now enabled on freebsd, linux and macos. There's quite a bit
  more that could be done on windows, but I think it's good enough for now.

- I added cross-compilation to windows from linux, to the "warnings"
  task. Occasionally there are build-system issues specific to
  cross-compilation, and the set of warnings are different.

- Docs are now built as part of the 'CompilerWarnings' task.

- I improved the CI README a bit more, in particular I added docs for the
  'ci-os-only' tag I added to the CI logic, which lets one select which
  operating systems test get to run on.

- Some of the 'Warnings' tasks now build with --enable-dtrace (once with
  optimizations, once without). It's pretty easy to break probes without
  seeing the problem locally.

- Switched to using PG_TEST_USE_UNIX_SOCKETS for windows. Without that I was
  seeing occasional spurious test failures due to the use of PROVE_FLAGS=
  -j10, to make the otherwise serial execution of tests on windows bearable.

- switch macos task to use monterey

- plenty smaller changes / cleanups


There of course is a lot more that can be done [1], but I am pretty happy with
what this covers.


I'd like to commit this soon. There's two aspects that perhaps deserve a bit
more discussion before doing so though:

One I explicitly brought up before:

On 2021-10-01 15:27:52 -0700, Andres Freund wrote:
> One policy discussion that we'd have to have is who should control the images
> used for CI. Right now that's on my personal google cloud account - which I am
> happy to do, but medium - long term that'd not be optimal.

The proposed CI script uses custom images to run linux and freebsd tests. They
are automatically built every day from the repository https://github.com/anarazel/pg-vm-images/

These images have all the prerequisites pre-installed. For Linux something
similar can be achieved by using dockerfiles referenced the .cirrus.yml file,
but for FreeBSD that's not available. Installing the necessary dependencies on
every run is too time intensive. For linux, the tests run a lot faster in
full-blown VMs than in docker, and full VMs allow a lot more control /
debugging.

I think this may be OK for now, but I also could see arguments for wanting to
transfer the image specifications and the google account to PG properties.


The second attention-worthy point is the choice of a new toplevel ci/
directory as the location for resources referencenced by CI. A few other
projects also use ci/, but I can also see arguments for moving the contents to
e.g. src/tools/ci or such?

Greetings,

Andres Freund


[1] Some ideas for what could make sense to extend CI to in the future:

- also test with msys / mingw on windows

- provide more optional dependencies for windows build

- Extend the set of compiler warnings - as the compiler version is controlled,
  we could be more aggressive than we can be via configure.ac.

- Add further distributions / platforms. Possibly as "manual" tasks - the
  amount resources one CI user gets is limited, so running tests on all
  platforms all the time would make tests take longer. Interesting things
  could be:

  - further linux distributions, particularly long-term supported ones

  - Some of the other BSDs. There currently are no pre-made images for
    openbsd/netbsd, but it shouldn't be too hard to script building them.

  - running some tests on ARM could be interesting, cirrus supports that for
    container based builds now

- run checks like cpluspluscheck as part of the CompilerWarnings task

- add tasks for running tests with tools like asan / ubsan (valgrind will be
  too slow).

- consider enable compile-time debugging options like COPY_PARSE_PLAN_TREES,
  and run-time force_parallel_mode = regress on some platforms. They seem to
  catch a lot of problems during development and are likely affordable enough.

Вложения

Re: Adding CI to our tree

От
Justin Pryzby
Дата:
On Mon, Dec 13, 2021 at 01:12:23PM -0800, Andres Freund wrote:
> Hi,
> 
> Attached is an updated version of the CI patches. An example of a test run
> with the attached version of this
> https://cirrus-ci.com/build/6501998521483264

sudo is used exactly twice; maybe it's not needed at all ?

> +task:
> +  name: FreeBSD
...
> +  sysconfig_script:
> +    - sudo sysctl kern.corefile='/tmp/%N.%P.core'

> +task:
> +  name: macOS
...
> +  core_install_script:
> +    - sudo chmod 777 /cores

typos:
binararies
dont't



Re: Adding CI to our tree

От
Andres Freund
Дата:
Hi,

On 2021-12-13 16:02:50 -0600, Justin Pryzby wrote:
> On Mon, Dec 13, 2021 at 01:12:23PM -0800, Andres Freund wrote:
> > Hi,
> > 
> > Attached is an updated version of the CI patches. An example of a test run
> > with the attached version of this
> > https://cirrus-ci.com/build/6501998521483264
> 
> sudo is used exactly twice; maybe it's not needed at all ?

The macos one is needed, but the freebsd one indeed isn't.


> > +task:
> > +  name: FreeBSD
> ...
> > +  sysconfig_script:
> > +    - sudo sysctl kern.corefile='/tmp/%N.%P.core'
> 
> > +task:
> > +  name: macOS
> ...
> > +  core_install_script:
> > +    - sudo chmod 777 /cores
> 
> typos:
> binararies
> dont't

Oops, thanks.


Typos and sudo use are fixed in the repo [1].

Greetings,

Andres Freund

[1] https://github.com/anarazel/postgres/tree/ci



Re: Adding CI to our tree

От
Tom Lane
Дата:
Andres Freund <andres@anarazel.de> writes:
> On 2021-12-13 16:02:50 -0600, Justin Pryzby wrote:
>> sudo is used exactly twice; maybe it's not needed at all ?

> The macos one is needed, but the freebsd one indeed isn't.

I'm with Justin on this one.  I would view a script trying to
mess with /cores as a hostile act.  PG cores on macOS tend to
be extremely large and can fill up your disk fairly quickly
if you don't know they're being accumulated.  I think it's okay
to suggest in the documentation that people might want to allow
cores to be dropped, but the script has NO business trying to
force that.

            regards, tom lane



Re: Adding CI to our tree

От
Andres Freund
Дата:
Hi,

On 2021-12-13 18:14:52 -0500, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > On 2021-12-13 16:02:50 -0600, Justin Pryzby wrote:
> >> sudo is used exactly twice; maybe it's not needed at all ?
>
> > The macos one is needed, but the freebsd one indeed isn't.
>
> I'm with Justin on this one.  I would view a script trying to
> mess with /cores as a hostile act.  PG cores on macOS tend to
> be extremely large and can fill up your disk fairly quickly
> if you don't know they're being accumulated.  I think it's okay
> to suggest in the documentation that people might want to allow
> cores to be dropped, but the script has NO business trying to
> force that.

I'm not quite following. This is a ephemeral CI instance?

Greetings,

Andres Freund



Re: Adding CI to our tree

От
Thomas Munro
Дата:
On Tue, Dec 14, 2021 at 10:12 AM Andres Freund <andres@anarazel.de> wrote:
> Attached is an updated version of the CI patches. An example of a test run
> with the attached version of this
> https://cirrus-ci.com/build/6501998521483264

I've been pushing various versions of these patches into my own
development branches for a while now; they're working very nicely and
helping me a lot.  This is vastly better than anything I was doing
before, especially on Windows which is a blind spot for most of us.
It'll be great to see this committed, and continue improving it
in-tree.  I'd better go and figure out how to fix cfbot when this
lands...

> I think this may be OK for now, but I also could see arguments for wanting to
> transfer the image specifications and the google account to PG properties.

No clue on the GCP account side of it (does pginfra already have
one?), but for the repo I guess it would seem natural to have one on
git.postgresql.org infra, mirrored (just like the main repo) to a repo
on project-owned github.com/postgres, from which image building is
triggered.  Then it could be maintained by the whole PostgreSQL
project, patches discussed on -hackers, a bit like pg_bsd_indent.
Perhaps with some way to trigger test image builds, so that people
working on it don't need their own GCP account to do a trial run.

+  # Test that code can be built with gcc/clang without warnings

As a TODO note, I think we should eventually run a warnings check for
MSVC too.  IIUC we only aim to be warning free in assertion builds on
that platform, because it has no PG_USED_FOR_ASSERTS_ONLY (I think it
has it in C++ but not C?), but that's something.

+  # XXX: Only do this if there have been changes in doc/ since last build
+  always:
+    docs_build_script: |
+      time ./configure \
+        --cache gcc.cache CC="ccache gcc"
+      time make -s -j4 -C doc

Another TODO note: perhaps we could also make the documentation
results a browsable artefact with a short retention time, if that's a
thing.  (I've been confused about how to spell "artefact" for some
time now, and finally I know why: in the US it has an i; I blame Noah
Websta, whose name I have decided to improve.)

I feel like I should apologise in advance for this level of
nit-picking about English grammar, but:

+2) For not yet merged development work, CI can be enabled for some git hosting
+   providers. This allows to test patches on a number of platforms before they
+   are merged (or even submitted).

You can "allow testing" (gerund verb), you can "allow
developers/us/one/... to test" (infinitive, but with a noun phrase to
say who's allowed to do the thing), you can "allow verification of
..." (noun phrase), you can "be allowed to test" (passive), but you
can't "allow to test": it's not allowed![1] :-)

+# It might be nicer to switch to the openssl built as part of curl-for-win,
+# but recent releases only build openssl 3, and that still seems troublesome
+# on windows,

s/windows,/Windows./

+  name: FreeBSD

FreeBSD is missing --with-llvm.  If you add package "llvm" to your
image builder you'll currently get LLVM 9, then
LLVM_CONFIG="llvm-config" CXX="ccache c++" CLANG="ccache clang".  Or
we could opt for something more modern with package llvm13 and program
names llvm-config13 and clang13.

> The second attention-worthy point is the choice of a new toplevel ci/
> directory as the location for resources referencenced by CI. A few other
> projects also use ci/, but I can also see arguments for moving the contents to
> e.g. src/tools/ci or such?

I'd be +0.75 for moving it to src/tools/ci.

> [1] Some ideas for what could make sense to extend CI to in the future:

To your list, I'd add:

* 32 bit
* test coverage report
* ability to capture complete Window install directory as an artefact
so a Windows user without a dev environment could try out a proposed
change/CF entry/...

I hope we can get ccache working on Windows.

[1] https://english.stackexchange.com/questions/60271/grammatical-complements-for-allow/60285#60285



Re: Adding CI to our tree

От
Andres Freund
Дата:
Hi,

On 2021-12-14 16:51:58 +1300, Thomas Munro wrote:
> I'd better go and figure out how to fix cfbot when this lands...

I assume it'd be:
- stop adding the CI stuff
- adjust links to CI tasks, appveyor wouldn't be used anymore
- perhaps reference individual tasks from the cfbot page?


> > I think this may be OK for now, but I also could see arguments for wanting to
> > transfer the image specifications and the google account to PG properties.
> 
> No clue on the GCP account side of it (does pginfra already have
> one?), but for the repo I guess it would seem natural to have one on
> git.postgresql.org infra, mirrored (just like the main repo) to a repo
> on project-owned github.com/postgres, from which image building is
> triggered.  Then it could be maintained by the whole PostgreSQL
> project, patches discussed on -hackers, a bit like pg_bsd_indent.
> Perhaps with some way to trigger test image builds, so that people
> working on it don't need their own GCP account to do a trial run.

I think that's a good medium-term goal, I'd not make it a prerequisite for
merging myself.


> +  # Test that code can be built with gcc/clang without warnings
> 
> As a TODO note, I think we should eventually run a warnings check for
> MSVC too.  IIUC we only aim to be warning free in assertion builds on
> that platform, because it has no PG_USED_FOR_ASSERTS_ONLY (I think it
> has it in C++ but not C?), but that's something.

Hm. Not entirely sure how to do that without doing a separate windows build,
which is too slow...


> +  # XXX: Only do this if there have been changes in doc/ since last build
> +  always:
> +    docs_build_script: |
> +      time ./configure \
> +        --cache gcc.cache CC="ccache gcc"
> +      time make -s -j4 -C doc
> 
> Another TODO note: perhaps we could also make the documentation
> results a browsable artefact with a short retention time, if that's a
> thing.

Might be doable, but I'd guess that the volume of data it'd generate make it
not particularly attractive.


> I feel like I should apologise in advance for this level of
> nit-picking about English grammar, but:

:)

Will try to fix.


> +  name: FreeBSD
> 
> FreeBSD is missing --with-llvm.

That was kind of intentional, I guess I should add a comment about it. The CI
image for freebsd already starts slower due to its size, and is on the slower
side compared to anything !windows, so I'm not sure it's worth enabling llvm
there? It's probably not bad to have one platform testing without llvm.


> > [1] Some ideas for what could make sense to extend CI to in the future:
> 
> To your list, I'd add:
> 
> * 32 bit

That'd be easy.


> * test coverage report

If the output size is reasonable, that should be doable as well.


> * ability to capture complete Window install directory as an artefact
> so a Windows user without a dev environment could try out a proposed
> change/CF entry/...

I think the size of these artifacts would make this not something to enable by
default. But perhaps a manually triggered task would make sense?


> I hope we can get ccache working on Windows.

They did merge a number of the other required changes for that over the
weekend. I'll try once they released...

Thanks!

Andres Freund



Re: Adding CI to our tree

От
Daniel Gustafsson
Дата:
> On 14 Dec 2021, at 05:11, Andres Freund <andres@anarazel.de> wrote:
> On 2021-12-14 16:51:58 +1300, Thomas Munro wrote:
>> I'd better go and figure out how to fix cfbot when this lands...
>
> I assume it'd be:
> - stop adding the CI stuff
> - adjust links to CI tasks, appveyor wouldn't be used anymore
> - perhaps reference individual tasks from the cfbot page?

+1 on leveraging the CI tasks in the tree in the CFBot.  For a patch like the
libnss TLS backend one it would be a great help to both developer and reviewer
to have that codepath actually built and tested as part of the CFBot; being
able to tweak the CI tasks used in the CFBot per patch would be very helpful.

--
Daniel Gustafsson        https://vmware.com/




Re: Adding CI to our tree

От
Justin Pryzby
Дата:
On Mon, Dec 13, 2021 at 03:45:23PM -0800, Andres Freund wrote:
> Hi,
> 
> On 2021-12-13 18:14:52 -0500, Tom Lane wrote:
> > Andres Freund <andres@anarazel.de> writes:
> > > On 2021-12-13 16:02:50 -0600, Justin Pryzby wrote:
> > >> sudo is used exactly twice; maybe it's not needed at all ?
> >
> > > The macos one is needed, but the freebsd one indeed isn't.
> >
> > I'm with Justin on this one.  I would view a script trying to
> > mess with /cores as a hostile act.  PG cores on macOS tend to
> > be extremely large and can fill up your disk fairly quickly
> > if you don't know they're being accumulated.  I think it's okay
> > to suggest in the documentation that people might want to allow
> > cores to be dropped, but the script has NO business trying to
> > force that.
> 
> I'm not quite following. This is a ephemeral CI instance?

As for myself, all I meant is that it's better to write it with zero sudos than
one (for the same reason that it's better to write with one than with two).

-- 
Justin



Re: Adding CI to our tree

От
Tom Lane
Дата:
Justin Pryzby <pryzby@telsasoft.com> writes:
> On Mon, Dec 13, 2021 at 03:45:23PM -0800, Andres Freund wrote:
>> On 2021-12-13 18:14:52 -0500, Tom Lane wrote:
>>> I'm with Justin on this one.  I would view a script trying to
>>> mess with /cores as a hostile act.

>> I'm not quite following. This is a ephemeral CI instance?

> As for myself, all I meant is that it's better to write it with zero sudos than
> one (for the same reason that it's better to write with one than with two).

What I'm concerned about is that it's unsafe to run the script in
any non-throwaway environment.  That doesn't seem desirable.

            regards, tom lane



Re: Adding CI to our tree

От
Daniel Gustafsson
Дата:
> On 15 Dec 2021, at 16:21, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Justin Pryzby <pryzby@telsasoft.com> writes:
>> On Mon, Dec 13, 2021 at 03:45:23PM -0800, Andres Freund wrote:
>>> On 2021-12-13 18:14:52 -0500, Tom Lane wrote:
>>>> I'm with Justin on this one.  I would view a script trying to
>>>> mess with /cores as a hostile act.
>
>>> I'm not quite following. This is a ephemeral CI instance?
>
>> As for myself, all I meant is that it's better to write it with zero sudos than
>> one (for the same reason that it's better to write with one than with two).
>
> What I'm concerned about is that it's unsafe to run the script in
> any non-throwaway environment.  That doesn't seem desirable.

I don't think anyone should expect any part of the .cirrus.yml script to be
safe or applicable to a local environment, so I think this is something we can
solve with documentation.

--
Daniel Gustafsson        https://vmware.com/




Re: Adding CI to our tree

От
Peter Eisentraut
Дата:
On 13.12.21 22:12, Andres Freund wrote:
> Attached is an updated version of the CI patches. An example of a test run
> with the attached version of this
> https://cirrus-ci.com/build/6501998521483264

+  only_if: $CIRRUS_CHANGE_MESSAGE !=~ '.*\nci-os-only:.*' || 
$CIRRUS_CHANGE_MESSAGE =~ '.*\nci-os-only:[^\n]*freebsd.*'

I'm not in favor of this kind of thing.  I don't understand how this is 
useful, other than perhaps for developing *this* patch.  I don't think 
people would like having these tags in the mainline, and if it's for 
local use, then people can adjust the file locally.

+      CC="ccache cc" CFLAGS="-O0 -ggdb"'

I don't think using -O0 is the right thing.  It will miss some compiler 
warnings, and it will not thoroughly test the compiler.   We should test 
using the configurations that are close to what users actually use.

+    - su postgres -c 'gmake -s -j3 && gmake -s -j3 -C contrib'

Why doesn't this use make world (or world-bin, if you prefer).

Why does this use -j3 if there are two CPUs configured?  (Perhaps the 
number of CPUs should be put into a variable.)

I don't like that the -s option is used.  I would like to see what 
commands are executed.

+    cpu: 4

Why does the Linux job use 4 CPUs and the FreeBSD job 2?

+    - export

I don't think that is portable to all shells.

+    - su postgres -c 'time script test.log gmake -s -j2 ${CHECK} 
${CHECKFLAGS}'

+    su postgres -c '\
+      ulimit -c unlimited; \
+      make -s ${CHECK} ${CHECKFLAGS} -j8 \
+      '

Not clear why these are so different.  Don't we need the test.log file 
for Linux?  Don't we need the ulimit call for FreeBSD?  Why the -j8 
option even though 4 CPUs have been configured?

+    brew install \
+      ccache \
+      coreutils \
+      icu4c \
+      krb5 \
+      llvm \
+      lz4 \
+      make \
+      openldap \
+      openssl \
+      python@3.10 \
+      tcl-tk

Curious why coreutils and make are installed?  The system-supplied tools 
ought to work.

+    brew cleanup -s

Seems unnecessary?

+    PKG_CONFIG_PATH="/usr/local/opt/krb5/lib/pkgconfig:$PKG_CONFIG_PATH"

AFAICT, we don't use pkg-config for the krb5 package.

+    - gmake -s -j12 && gmake -s -j12 -C contrib

These numbers should also be explained or configured somewhere. 
Possibly query the number of CPUs on the instance.

+    PROVE_FLAGS: -j10

Why only on Windows?

+    # Installation on windows currently only completely works from 
src\tools\msvc

If that is so, let's fix that.  But see that install.pl contains

if (-e "src/tools/msvc/buildenv.pl")

etc. it seems to want to be able to be invoked from the top level.

+    - cd src\tools\msvc && perl .\install.pl 
%CIRRUS_WORKING_DIR%\tmp_install

Confusing mix of forward and backward slashes in the Windows section.  I 
think forward slashes should work everywhere.

+  test_plcheck_script:
+    - perl src/tools/msvc/vcregress.pl plcheck

etc. Couldn't we enhance vcregress.pl to take multiple arguments or take 
a "check-world" argument or something.  Otherwise, this will be tedious 
to keep updated.

+  test_subscriptioncheck_script:
+    - perl src/tools/msvc/vcregress.pl taptest .\src\test\subscription\

This is even worse.  I don't want to have to hand-register every new TAP 
test.

+  always:
+    gcc_warning_script: |
+      time ./configure \
+        --cache gcc.cache CC="ccache gcc" \
+        --enable-dtrace

I don't know why we wouldn't need the full set of options here.  It's 
not like optional code never has compiler warnings.

+  # cross-compile to windows
+  always:
+    mingw_cross_warning_script: |

I would welcome a native mingw build with full options and test suite 
run.  This cross-compiling stuff is of course interesting, but I'm not 
sure why it is being preferred over a full native run.

--- /dev/null
+++ b/.dockerignore
@@ -0,0 +1,7 @@
+# Ignore everything, except ci/

I wonder whether this would interfere with other uses of docker.  I 
suppose people might have their custom setups for building docker images 
from PostgreSQL sources.  It seems weird that this file gets this 
prominence, saying that the canonical use of docker inside PostgreSQL 
sources is for Cirrus CI.

It would be useful if the README explained the use of docker.  As I 
mentioned before, it's not immediately clear why docker is used at all 
in this.

The docker file for Windows contains a lot of hardcoded version numbers. 
  This should at least be refactored a little bit so that it is clear 
which version numbers should be updated and how.  Or better yet, avoid 
the need to constantly update version numbers.  For example, the Python 
patch release changes every few weeks (e.g., 3.10.0 isn't current 
anymore).  Also, the way OpenSSL is installed looks a bit fishy.  Is 
this what people actually use in practice?  How can we make it match 
actual practice better?

+# So that tests using the "manually" started postgres on windows can use
+# prepared statements
+max_prepared_transactions = 10

Maybe add that to the pg_ctl invocation in the Windows section instead.

+# Settings that make logs more useful
+log_autovacuum_min_duration = 0
+log_checkpoints = true
+log_connections = true
+log_disconnections = true
+log_line_prefix = '%m [%p][%b] %q[%a][%v:%x] '
+log_lock_waits = true

If we think these are useful, we should make the test suite drivers set 
them for all users.

> One I explicitly brought up before:
> 
> On 2021-10-01 15:27:52 -0700, Andres Freund wrote:
>> One policy discussion that we'd have to have is who should control the images
>> used for CI. Right now that's on my personal google cloud account - which I am
>> happy to do, but medium - long term that'd not be optimal.
> 
> The proposed CI script uses custom images to run linux and freebsd tests. They
> are automatically built every day from the repository https://github.com/anarazel/pg-vm-images/
> 
> These images have all the prerequisites pre-installed. For Linux something
> similar can be achieved by using dockerfiles referenced the .cirrus.yml file,
> but for FreeBSD that's not available. Installing the necessary dependencies on
> every run is too time intensive. For linux, the tests run a lot faster in
> full-blown VMs than in docker, and full VMs allow a lot more control /
> debugging.
> 
> I think this may be OK for now, but I also could see arguments for wanting to
> transfer the image specifications and the google account to PG properties.

For the above reasons of lack of documentation, I still don't understand 
the whole docker flow here.  Do you mean, the docker files included in 
your patch are not actually used as part of the CI run; instead you use 
them to build images manually, which are then pulled in by the test runs?

If so, apart from the general problem of having this go through some 
personal account, I also have concerns how this can be kept up to date, 
given how often the dependent software changes, as mentioned above.

I think it would be much easier to get this project over the initial 
hump if we skipped the whole docker business and just set the images up 
from scratch on each run.

> The second attention-worthy point is the choice of a new toplevel ci/
> directory as the location for resources referencenced by CI. A few other
> projects also use ci/, but I can also see arguments for moving the contents to
> e.g. src/tools/ci or such?

Or src/tools/cirrus/?  This providers came and go, and before long there 
might be interest in another one.

> - Extend the set of compiler warnings - as the compiler version is controlled,
>    we could be more aggressive than we can be via configure.ac.

Not sure about that.  I don't want this to evolve into some separate 
pool of policies that yells at you because of some settings that you 
never heard of.  If we think other warnings are useful, we should 
provide a way to select them, perhaps optionally, from the usual build 
system.

> - consider enable compile-time debugging options like COPY_PARSE_PLAN_TREES,
>    and run-time force_parallel_mode = regress on some platforms. They seem to
>    catch a lot of problems during development and are likely affordable enough.

That would be useful if we can think of a way to select it optionally.



Re: Adding CI to our tree

От
Andrew Dunstan
Дата:
On 12/13/21 16:12, Andres Freund wrote:
> Hi,
>
> Attached is an updated version of the CI patches. An example of a test run
> with the attached version of this
> https://cirrus-ci.com/build/6501998521483264
>

Maye I have missed it, but why are we using ccache here? That seems a
bit pointless in an ephemeral instance.


cheers


andrew


--
Andrew Dunstan
EDB: https://www.enterprisedb.com




Re: Adding CI to our tree

От
Tom Lane
Дата:
Andrew Dunstan <andrew@dunslane.net> writes:
> Maye I have missed it, but why are we using ccache here? That seems a
> bit pointless in an ephemeral instance.

I believe Munro's cfbot tooling is able to save and re-use ccache
across successive instantiations of a build instance.  I've not
looked at this code, but if it can do that there'd be point to it.

            regards, tom lane



Re: Adding CI to our tree

От
Andres Freund
Дата:
Hi,

On 2021-12-17 12:34:36 +0100, Peter Eisentraut wrote:
> On 13.12.21 22:12, Andres Freund wrote:
> > Attached is an updated version of the CI patches. An example of a test run
> > with the attached version of this
> > https://cirrus-ci.com/build/6501998521483264
> 
> +  only_if: $CIRRUS_CHANGE_MESSAGE !=~ '.*\nci-os-only:.*' ||
> $CIRRUS_CHANGE_MESSAGE =~ '.*\nci-os-only:[^\n]*freebsd.*'
> 
> I'm not in favor of this kind of thing.  I don't understand how this is
> useful, other than perhaps for developing *this* patch.  I don't think
> people would like having these tags in the mainline, and if it's for local
> use, then people can adjust the file locally.

Definitely not for mainline. But it's extremely useful for development. If you
iteratively try to fix windows, running all the other tests can be slower -
there's a concurrency limit in how many tests you can run for free...


> +      CC="ccache cc" CFLAGS="-O0 -ggdb"'
> 
> I don't think using -O0 is the right thing.  It will miss some compiler
> warnings, and it will not thoroughly test the compiler.   We should test
> using the configurations that are close to what users actually use.

Hm. I personally always end up using -O0 for the actual development tree, and
it seems a lot of others do as well. Building with -O2 makes backtraces etc
just less useful.


> +    - su postgres -c 'gmake -s -j3 && gmake -s -j3 -C contrib'
> 
> Why doesn't this use make world (or world-bin, if you prefer).

I started working on this well before world-bin existed. And using 'world' as
the target builds the docs, which is quite expensive... I happened to actually
make the change to world-bin yesterday for the next version to send :)


> Why does this use -j3 if there are two CPUs configured?  (Perhaps the number
> of CPUs should be put into a variable.)

I tried a few and that worked best.


> I don't like that the -s option is used.  I would like to see what commands
> are executed.

I can change it - but it makes it *much* harder to spot compiler warnings.


> +    cpu: 4
> 
> Why does the Linux job use 4 CPUs and the FreeBSD job 2?

I'll add a comment about it. Two reasons
1) the limits on cirrus are lower for linux than freebsd:
   https://cirrus-ci.org/faq/
2) There's some issues on freebsd where test performance regressess *very*
   substantially with higher concurrency. Thomas and I looked a bunch into it
   without figuring out the details.


> +    - export

> I don't think that is portable to all shells.

Doesn't really need to be?


> +    - su postgres -c 'time script test.log gmake -s -j2 ${CHECK}
> ${CHECKFLAGS}'
> 
> +    su postgres -c '\
> +      ulimit -c unlimited; \
> +      make -s ${CHECK} ${CHECKFLAGS} -j8 \
> +      '
> 
> Not clear why these are so different.  Don't we need the test.log file for
> Linux?

There's a comment about the use of script:
  # Use of script is to avoid make complaints about fcntl()
  # https://savannah.gnu.org/bugs/?60774

that bug is specific to platforms that don't allow locking pipes. Which linux
does allow, but freebsd doesn't.


> Don't we need the ulimit call for FreeBSD?

I think the default core limits were different, I will check.


> Why the -j8 option even though 4 CPUs have been configured?

That might have been an accident.


> +    brew install \
> +      ccache \
> +      coreutils \
> +      icu4c \
> +      krb5 \
> +      llvm \
> +      lz4 \
> +      make \
> +      openldap \
> +      openssl \
> +      python@3.10 \
> +      tcl-tk
> 
> Curious why coreutils and make are installed?  The system-supplied tools
> ought to work.

make because newer versions of make have -Otarget, which makes concurrent
check-world output at least kind-of readable.


> +    brew cleanup -s
> 
> Seems unnecessary?

It reduces the size of the cached downloads. Not much point in keeping older
versions of the package around. Populating the cache takes time.


> +    PKG_CONFIG_PATH="/usr/local/opt/krb5/lib/pkgconfig:$PKG_CONFIG_PATH"
> 
> AFAICT, we don't use pkg-config for the krb5 package.

I now converted this to a loop.


> +    - gmake -s -j12 && gmake -s -j12 -C contrib
> 
> These numbers should also be explained or configured somewhere. Possibly
> query the number of CPUs on the instance.

macOS instances have a fixed number of cores - 12. Might make sense to query
it, but not sure what a good portable way there is.


> +    PROVE_FLAGS: -j10
> 
> Why only on Windows?

Because windows doesn't have a way to run tests in parallel in another
way. prove-level concurrency is the only thing. Whereas other platforms can
run tests in parallel via make. Combining both tends to not work very well in
my experience.


> +    # Installation on windows currently only completely works from
> src\tools\msvc
> 
> If that is so, let's fix that.

I did report the problem - just haven't gotten around to fixing it.  Note this
is also how the buildfarm invokes installation... The problem is that
Install.pm includes config.pl from the current directory, IIRC.

At some point I needed to restrict to dealing with the current state - there's
plenty other bugs.


> +    - cd src\tools\msvc && perl .\install.pl
> %CIRRUS_WORKING_DIR%\tmp_install
> 
> Confusing mix of forward and backward slashes in the Windows section.  I
> think forward slashes should work everywhere.

They would work here I think, but no, they don't work everywhere :(


> +  test_plcheck_script:
> +    - perl src/tools/msvc/vcregress.pl plcheck
> 
> etc. Couldn't we enhance vcregress.pl to take multiple arguments or take a
> "check-world" argument or something.  Otherwise, this will be tedious to
> keep updated.

> +  test_subscriptioncheck_script:
> +    - perl src/tools/msvc/vcregress.pl taptest .\src\test\subscription\
> 
> This is even worse.  I don't want to have to hand-register every new TAP
> test.

I strongly agree. There were several tests that the buildfarm on windows
didn't ever run before I started working on this. And clearly no windows
developer is going to manually invoke ~10 test steps.


> +  always:
> +    gcc_warning_script: |
> +      time ./configure \
> +        --cache gcc.cache CC="ccache gcc" \
> +        --enable-dtrace
> 
> I don't know why we wouldn't need the full set of options here.  It's not
> like optional code never has compiler warnings.

I mostly didn't like the repetition of long argument lists. There's probably a
decent way to deal with that.


> +  # cross-compile to windows
> +  always:
> +    mingw_cross_warning_script: |
> 
> I would welcome a native mingw build with full options and test suite run.
> This cross-compiling stuff is of course interesting, but I'm not sure why it
> is being preferred over a full native run.

I have a new colleague working on scripting the setup of mingw on
windows. Besides not being available yet, it's *much* *much* slower to build
postgres. This is a useful and quicker screening.


> --- /dev/null
> +++ b/.dockerignore
> @@ -0,0 +1,7 @@
> +# Ignore everything, except ci/
> 
> I wonder whether this would interfere with other uses of docker.  I suppose
> people might have their custom setups for building docker images from
> PostgreSQL sources.  It seems weird that this file gets this prominence,
> saying that the canonical use of docker inside PostgreSQL sources is for
> Cirrus CI.

I have a hard time seeing uses of docker where this would be a problem. If you
actually use the whole tree as context you pretty much need to exclude most of
it, otherwise docker (and other tools) tar the whole tree and send it to the
daemon.


> It would be useful if the README explained the use of docker.  As I
> mentioned before, it's not immediately clear why docker is used at all in
> this.

It boils down to this: Windows tests on cirrus are always run via docker
(presumably because the licensing otherwise is more expensive). And for linux,
it's considerably quicker to start up a container, rather than a full VM - but
a full VM is slower.


> The docker file for Windows contains a lot of hardcoded version numbers.
> This should at least be refactored a little bit so that it is clear which
> version numbers should be updated and how.  Or better yet, avoid the need to
> constantly update version numbers.

I don't really see a great way to avoid them in general. And e.g. with perl
the issue is that plperl straight up doesn't work with a newer perl version :(.


> Also, the way OpenSSL is installed looks a bit fishy.  Is this what people
> actually use in practice?  How can we make it match actual practice better?

I wish I knew. I didn't see any good practice for this anywhere.


> +# So that tests using the "manually" started postgres on windows can use
> +# prepared statements
> +max_prepared_transactions = 10
> 
> Maybe add that to the pg_ctl invocation in the Windows section instead.

It doesn't hurt anything else, so I don't really think it's worth going a
platform dependent way?


> +# Settings that make logs more useful
> +log_autovacuum_min_duration = 0
> +log_checkpoints = true
> +log_connections = true
> +log_disconnections = true
> +log_line_prefix = '%m [%p][%b] %q[%a][%v:%x] '
> +log_lock_waits = true
> 
> If we think these are useful, we should make the test suite drivers set them
> for all users.

Some of these are set for some but not some other test drivers. And it's quite
useful for out-of-tree development to be able to quickly add options to all
tests... And no test driver helps the windows case that needs the manually
started postgres (the state of windows testing really is dreadful).


> For the above reasons of lack of documentation, I still don't understand the
> whole docker flow here.  Do you mean, the docker files included in your
> patch are not actually used as part of the CI run; instead you use them to
> build images manually, which are then pulled in by the test runs?

As submitted this is about VM images, not docker images. Used by linux (for
the main test run) and freebsd (cirrus doesn't support anything else). Some
other platforms could also be supported that way with a bit more work
(openbsd, netbsd).  For some other platforms cirrus only supports docker
containers (windows, linux on arm).

The docker containers can be built on-demand (that's what the dockerfile:
... syntax for e.g. windows does). So yes, they currently are used. cirrus
rebuilds them whenever their content (including referenced files) changes.


But the building of containers happens in every repo enabling the tests.  That
takes quite a while and uses a lot of space (the additional windows
installation is like ~6GB). So I was working over the last few days on moving
the containers to be built the alongside the VM images. Then it looks more
similar across all the platforms.


> I think it would be much easier to get this project over the initial hump if
> we skipped the whole docker business and just set the images up from scratch
> on each run.

It's not feasible. The windows stuff takes *way* too long (as in 40min),
freebsd takes quite long, linux long. And the amount of traffic it generates
to install packages over and over again isn't great either. The containers /
images are from within google's network, and thus free - the package repos
don't have that advantage.

FWIW, homebrew at some point complained about a huge fraction of their cost
being from CI. I know debian had issues with that on and off as well. While I
couldn't solve the homebrew issue via VM images, I made it at least cache the
homebrew downloads between runs.


My current (not yet submitted version) comment about this is:

> Images used for CI
> ==================
> 
> To keep CI times tolerable, most platforms use pre-generated images. Some
> platforms use containers, others use full VMs. Images for both are generated
> separately from CI runs, otherwise each git repository that is being tested
> would need to build its own set of containers, which would be wasteful (both
> in space and time.
> 
> These images are built from the specifications in github.com/anarazel/pg-vm-images/

I also did the work of redoing the necessary setup and documenting all the
steps ([1] although there could be a bit more handholding) . It'd now not be
hard to transfer the image building into different / shared ownership.

I'll expand this section.


> > The second attention-worthy point is the choice of a new toplevel ci/
> > directory as the location for resources referencenced by CI. A few other
> > projects also use ci/, but I can also see arguments for moving the contents to
> > e.g. src/tools/ci or such?
> 
> Or src/tools/cirrus/?  This providers came and go, and before long there
> might be interest in another one.

I think quite a bit of the work will be portable between CI providers. I think
having all the different CI things in one directory will make it more obvious
than having a bunch of provider names one might or might not know.

I'm imaginging that over time we'd put some of the stuff in the .cirrus.yml
file into scripts in src/tools/ci/, so they can be used by different CI
providers.


> > - Extend the set of compiler warnings - as the compiler version is controlled,
> >    we could be more aggressive than we can be via configure.ac.
> 
> Not sure about that.  I don't want this to evolve into some separate pool of
> policies that yells at you because of some settings that you never heard of.
> If we think other warnings are useful, we should provide a way to select
> them, perhaps optionally, from the usual build system.

I'd be happy with that as well. I do think that the compiler version
dependency makes it bit harder to this usefully from configure though.


> > - consider enable compile-time debugging options like COPY_PARSE_PLAN_TREES,
> >    and run-time force_parallel_mode = regress on some platforms. They seem to
> >    catch a lot of problems during development and are likely affordable enough.
> 
> That would be useful if we can think of a way to select it optionally.

What kind of optionally are you thinking? Commit message contents? A central
flag somewhere?

I was thinking it might be worth enabling such options on one of the
platforms, so that one gets coverage by default. People remembering
e.g. COPY_PARSE_PLAN_TREES don't tend to need it...

Greetings,

Andres Freund

[1] https://github.com/anarazel/pg-vm-images/blob/main/gcp_project_setup.txt



Re: Adding CI to our tree

От
Andres Freund
Дата:
Hi,

On 2021-12-17 09:36:05 -0500, Tom Lane wrote:
> Andrew Dunstan <andrew@dunslane.net> writes:
> > Maye I have missed it, but why are we using ccache here? That seems a
> > bit pointless in an ephemeral instance.
> 
> I believe Munro's cfbot tooling is able to save and re-use ccache
> across successive instantiations of a build instance.  I've not
> looked at this code, but if it can do that there'd be point to it.

Yes, the ccache cache is persisted across runs (see the *_cache and
upload_cache inststructions). It makes a quite substantial difference. One
reason the windows runs are a lot slower than the others is just that visual
studio isn't yet supported by ccache, and that there doesn't seem to be good
other tools.

The ccache maintainers merged more of the msvc support last weekend! So I have
quite a bit of hope of being able to use ccache there as well.

Greetings,

Andres Freund



Re: Adding CI to our tree

От
Daniel Gustafsson
Дата:
> On 17 Dec 2021, at 20:31, Andres Freund <andres@anarazel.de> wrote:
> On 2021-12-17 12:34:36 +0100, Peter Eisentraut wrote:

>> I don't like that the -s option is used.  I would like to see what commands
>> are executed.
>
> I can change it - but it makes it *much* harder to spot compiler warnings.

Having used Cirrus et.al a fair bit I strongly agree with Andres, working with
huge logs in the browser is painful whereas -s makes it useable even on mobile
devices.

--
Daniel Gustafsson        https://vmware.com/




Re: Adding CI to our tree

От
Andres Freund
Дата:
On 2021-12-17 11:31:59 -0800, Andres Freund wrote:
> > Don't we need the ulimit call for FreeBSD?
> 
> I think the default core limits were different, I will check.

Yep, freebsd has -c unlimited by default:
https://cirrus-ci.com/task/6199382239346688?logs=sysinfo#L23
vs
https://cirrus-ci.com/task/4792007355793408?logs=sysinfo#L32



Re: Adding CI to our tree

От
Andrew Dunstan
Дата:
On 12/17/21 14:34, Andres Freund wrote:
> Hi,
>
> On 2021-12-17 09:36:05 -0500, Tom Lane wrote:
>> Andrew Dunstan <andrew@dunslane.net> writes:
>>> Maye I have missed it, but why are we using ccache here? That seems a
>>> bit pointless in an ephemeral instance.
>> I believe Munro's cfbot tooling is able to save and re-use ccache
>> across successive instantiations of a build instance.  I've not
>> looked at this code, but if it can do that there'd be point to it.
> Yes, the ccache cache is persisted across runs (see the *_cache and
> upload_cache inststructions). It makes a quite substantial difference. One
> reason the windows runs are a lot slower than the others is just that visual
> studio isn't yet supported by ccache, and that there doesn't seem to be good
> other tools.
>
> The ccache maintainers merged more of the msvc support last weekend! So I have
> quite a bit of hope of being able to use ccache there as well.
>

Ok. I have had to disable ccache for fairywren (msys2) because it caused
serious instability.


cheers


andrew


--
Andrew Dunstan
EDB: https://www.enterprisedb.com




Re: Adding CI to our tree

От
Andres Freund
Дата:
Hi,

On 2021-12-20 11:21:05 -0800, Andres Freund wrote:
> Attached is v4 of the CI patch.

I'd like to push this - any objections? It's not disruptive to anything but
cfbot, so we can incrementally improve it further.

I'll try to sync pushing with Thomas, so that he can adjust cfbot to not add
the CI changes anymore / adjust the links to the CI status URLs etc.

Greetings,

Andres Freund



Re: Adding CI to our tree

От
Daniel Gustafsson
Дата:
> On 29 Dec 2021, at 21:17, Andres Freund <andres@anarazel.de> wrote:
> On 2021-12-20 11:21:05 -0800, Andres Freund wrote:

>> Attached is v4 of the CI patch.
> 
> I'd like to push this - any objections? It's not disruptive to anything but
> cfbot, so we can incrementally improve it further.

No objection, I'm +1 on getting this in.

--
Daniel Gustafsson        https://vmware.com/




Re: Adding CI to our tree

От
Andres Freund
Дата:
Hi,

Attached is v5 of the CI patch. Not a lot of changes:
- a bunch of copy-editing, wrote a commit message etc
- use ccache for CXX/CLANG in the CompilerWarnings task, I had missed
  that when making the task use all --with-* flags
- switch linux to use ossp-uuid. I tried to switch macos at first, but
  that doesn't currently seem to work.
- minor other cleanups

This time I've only attached the main CI patch, not the one making core
dumps on windows work. That's not yet committable...

I plan to push this after another cycle of tests passing (and driving
over the bay bridge...) unless I hear protests.

Greetings,

Andres Freund

Вложения

Re: Adding CI to our tree

От
Justin Pryzby
Дата:
commit message: agithub

the the the buildfarm.
=> the

access too.
=> to

# Due to that it using concurrency within prove is helpful.
=> Due to that, it's useful to run prove with multiple jobs.

further details src/tools/ci/README
=> further details , see src/tools/ci/README

script uses a pseudo-tty, which do support locking.
=> which does

To limit unneccessary work only run this once normal linux test succeeded
=> unnecessary
=> succeeds

-- 
Justin



Re: Adding CI to our tree

От
Andres Freund
Дата:
Hi,

On 2021-12-30 20:28:46 -0600, Justin Pryzby wrote:
> [ language fixes]

Thanks!

> script uses a pseudo-tty, which do support locking.
> => which does

This didn't seem right either way - it's pseudo-ttys that don't support
locking, so plural seemed appropriate? I changed it to "script uses
pseudo-ttys, which do" instead...

Regards,

Andres



Re: Adding CI to our tree

От
Andres Freund
Дата:
On 2021-12-30 17:46:52 -0800, Andres Freund wrote:
> I plan to push this after another cycle of tests passing (and driving
> over the bay bridge...) unless I hear protests.

Pushed.

Marked CF entry as committed.



Re: Adding CI to our tree

От
Alvaro Herrera
Дата:
On 2021-Dec-30, Andres Freund wrote:

> On 2021-12-30 17:46:52 -0800, Andres Freund wrote:
> > I plan to push this after another cycle of tests passing (and driving
> > over the bay bridge...) unless I hear protests.
> 
> Pushed.
> 
> Marked CF entry as committed.

I tried it using the column filter patch.  It worked on the first
attempt.

Thanks!

-- 
Álvaro Herrera              Valdivia, Chile  —  https://www.EnterpriseDB.com/



Re: Adding CI to our tree

От
Justin Pryzby
Дата:
I noticed a patch failing in cfbot everywhere except windows:

https://commitfest.postgresql.org/36/3476/
| Invalid relcache when ADD PRIMARY KEY USING INDEX

It's because vcregress skips tests which have NO_INSTALLCHECK=1.

Is it desirable to enable more module/contrib tests for windows CI ?

This does a few, but there's a few others which would require the server to
be restarted to set shared_preload_libraries for each module.

diff --git a/.cirrus.yml b/.cirrus.yml
index 19b3737fa11..c427b468334 100644
--- a/.cirrus.yml
+++ b/.cirrus.yml
@@ -390,7 +390,7 @@ task:
     - perl src/tools/msvc/vcregress.pl check parallel
   startcreate_script:
     # paths to binaries need backslashes
-    - tmp_install\bin\pg_ctl.exe initdb -D tmp_check/db -l tmp_check/initdb.log
+    - tmp_install\bin\pg_ctl.exe initdb -D tmp_check/db -l tmp_check/initdb.log --options=--no-sync
     - echo include '%TEMP_CONFIG%' >> tmp_check/db/postgresql.conf
     - tmp_install\bin\pg_ctl.exe start -D tmp_check/db -l tmp_check/postmaster.log
   test_pl_script:
diff --git a/contrib/test_decoding/Makefile b/contrib/test_decoding/Makefile
index 9a31e0b8795..14fd847ba7f 100644
--- a/contrib/test_decoding/Makefile
+++ b/contrib/test_decoding/Makefile
@@ -10,7 +10,7 @@ ISOLATION = mxact delayed_startup ondisk_startup concurrent_ddl_dml \
     oldest_xmin snapshot_transfer subxact_without_top concurrent_stream \
     twophase_snapshot
 
-REGRESS_OPTS = --temp-config $(top_srcdir)/contrib/test_decoding/logical.conf
+REGRESS_OPTS = --temp-config=$(top_srcdir)/contrib/test_decoding/logical.conf
 ISOLATION_OPTS = --temp-config $(top_srcdir)/contrib/test_decoding/logical.conf
 
 # Disabled because these tests require "wal_level=logical", which
diff --git a/src/tools/ci/pg_ci_base.conf b/src/tools/ci/pg_ci_base.conf
index d8faa9c26c1..52cdb697a57 100644
--- a/src/tools/ci/pg_ci_base.conf
+++ b/src/tools/ci/pg_ci_base.conf
@@ -12,3 +12,24 @@ log_connections = true
 log_disconnections = true
 log_line_prefix = '%m [%p][%b] %q[%a][%v:%x] '
 log_lock_waits = true
+
+# test_decoding
+wal_level = logical
+max_replication_slots = 4
+logical_decoding_work_mem = 64kB
+
+# commit_ts
+track_commit_timestamp = on
+
+## worker_spi
+#shared_preload_libraries = worker_spi
+#worker_spi.database = contrib_regression
+
+## pg_stat_statements
+##shared_preload_libraries=pg_stat_statements
+
+## test_rls_hooks
+#shared_preload_libraries=test_rls_hooks
+
+## snapshot_too_old
+#old_snapshot_threshold = 60min
diff --git a/src/tools/msvc/vcregress.pl b/src/tools/msvc/vcregress.pl
index 8f3e3fa937b..7e2cc971a42 100644
--- a/src/tools/msvc/vcregress.pl
+++ b/src/tools/msvc/vcregress.pl
@@ -443,6 +443,7 @@ sub plcheck
 sub subdircheck
 {
     my $module = shift;
+    my $obey_installcheck = shift || 1;
 
     if (   !-d "$module/sql"
         || !-d "$module/expected"
@@ -452,7 +453,7 @@ sub subdircheck
     }
 
     chdir $module;
-    my @tests = fetchTests();
+    my @tests = fetchTests($obey_installcheck);
 
     # Leave if no tests are listed in the module.
     if (scalar @tests == 0)
@@ -516,6 +517,14 @@ sub contribcheck
         my $status = $? >> 8;
         $mstat ||= $status;
     }
+
+    subdircheck('test_decoding', -1);
+    $mstat ||= $? >> 8;
+
+    # The DB would need to be restarted
+    #subdircheck('pg_stat_statements', -1);
+    #$mstat ||= $? >> 8;
+
     exit $mstat if $mstat;
     return;
 }
@@ -530,6 +539,19 @@ sub modulescheck
         my $status = $? >> 8;
         $mstat ||= $status;
     }
+
+    subdircheck('commit_ts', -1);
+    $mstat ||= $? >> 8;
+
+    subdircheck('test_rls_hooks', -1);
+    $mstat ||= $? >> 8;
+
+    ## The DB would need to be restarted
+    #subdircheck('worker_spi', -1);
+    #$mstat ||= $? >> 8;
+
+    # src/test/modules/snapshot_too_old/Makefile
+
     exit $mstat if $mstat;
     return;
 }
@@ -726,6 +748,7 @@ sub fetchTests
     my $m = <$handle>;
     close($handle);
     my $t = "";
+    my $obey_installcheck = shift || 1;
 
     $m =~ s{\\\r?\n}{}g;
 
@@ -733,7 +756,7 @@ sub fetchTests
     # so bypass its run by returning an empty set of tests.
     if ($m =~ /^\s*NO_INSTALLCHECK\s*=\s*\S+/m)
     {
-        return ();
+        return () if $obey_installcheck == 1;
     }
 
     if ($m =~ /^REGRESS\s*=\s*(.*)$/gm)



Re: Adding CI to our tree

От
Andres Freund
Дата:
Hi,

On 2022-01-09 13:16:50 -0600, Justin Pryzby wrote:
> I noticed a patch failing in cfbot everywhere except windows:
> 
> https://commitfest.postgresql.org/36/3476/
> | Invalid relcache when ADD PRIMARY KEY USING INDEX
> 
> It's because vcregress skips tests which have NO_INSTALLCHECK=1.

> Is it desirable to enable more module/contrib tests for windows CI ?

Yes. I think the way we run windows tests is pretty bad - it's not reasonable
that each developer needs to figure out 20 magic incantations to run all tests
on windows.


> This does a few, but there's a few others which would require the server to
> be restarted to set shared_preload_libraries for each module.
> 
> diff --git a/.cirrus.yml b/.cirrus.yml
> index 19b3737fa11..c427b468334 100644
> --- a/.cirrus.yml
> +++ b/.cirrus.yml
> @@ -390,7 +390,7 @@ task:
>      - perl src/tools/msvc/vcregress.pl check parallel
>    startcreate_script:
>      # paths to binaries need backslashes
> -    - tmp_install\bin\pg_ctl.exe initdb -D tmp_check/db -l tmp_check/initdb.log
> +    - tmp_install\bin\pg_ctl.exe initdb -D tmp_check/db -l tmp_check/initdb.log --options=--no-sync
>      - echo include '%TEMP_CONFIG%' >> tmp_check/db/postgresql.conf
>      - tmp_install\bin\pg_ctl.exe start -D tmp_check/db -l tmp_check/postmaster.log
>    test_pl_script:

> diff --git a/contrib/test_decoding/Makefile b/contrib/test_decoding/Makefile
> index 9a31e0b8795..14fd847ba7f 100644
> --- a/contrib/test_decoding/Makefile
> +++ b/contrib/test_decoding/Makefile
> @@ -10,7 +10,7 @@ ISOLATION = mxact delayed_startup ondisk_startup concurrent_ddl_dml \
>      oldest_xmin snapshot_transfer subxact_without_top concurrent_stream \
>      twophase_snapshot
>  
> -REGRESS_OPTS = --temp-config $(top_srcdir)/contrib/test_decoding/logical.conf
> +REGRESS_OPTS = --temp-config=$(top_srcdir)/contrib/test_decoding/logical.conf
>  ISOLATION_OPTS = --temp-config $(top_srcdir)/contrib/test_decoding/logical.conf
>

Not sure why these are part of the diff?


> diff --git a/src/tools/ci/pg_ci_base.conf b/src/tools/ci/pg_ci_base.conf
> index d8faa9c26c1..52cdb697a57 100644
> --- a/src/tools/ci/pg_ci_base.conf
> +++ b/src/tools/ci/pg_ci_base.conf
> @@ -12,3 +12,24 @@ log_connections = true
>  log_disconnections = true
>  log_line_prefix = '%m [%p][%b] %q[%a][%v:%x] '
>  log_lock_waits = true
> +
> +# test_decoding
> +wal_level = logical
> +max_replication_slots = 4
> +logical_decoding_work_mem = 64kB
> [ more ]

This doesn't really seem like a scalable path forward - duplicating
configuration in more places doesn't seem sane. It seems it'd make more sense
to teach vcregress.pl to run NO_INSTALLCHECK targets properly? ISTM that
changing the options passed to pg_regress based on fetchTests() return value
wouldn't be too hard?

Greetings,

Andres Freund



Re: Adding CI to our tree

От
Andres Freund
Дата:
Hi,

On 2021-10-02 12:59:09 -0700, Andres Freund wrote:
> On 2021-10-02 11:05:20 -0400, Tom Lane wrote:
> > I don't know enough about Windows to evaluate 0001, but I'm a little
> > worried about it because it looks like it's changing our *production*
> > error handling on that platform.
> 
> Yea. It's clearly not ready as-is - it's the piece that I was planning to
> write a separate email about.

> 
> It's hard to understand what *precisely* SEM_NOGPFAULTERRORBOX etc do.
> 
> What I do know is that without the _set_abort_behavior() stuff abort() doesn't
> trigger windows' "crash" paths in at least debugging builds, and that the
> SetErrorMode() and _CrtSetReportMode() changes are necessary to get segfaults
> to reach the crash paths.
> 
> The in-tree behaviour turns out to make debugging on windows a major pain, at
> least when compiling with msvc. Crashes never trigger core dumps or "just in
> time" debugging (their term for invoking a debugger upon crash), so one has to
> attach to processes before they crash, to have any chance of debugging.
> 
> As far as I can tell this also means that at least for debugging builds,
> pgwin32_install_crashdump_handler() is pretty much dead weight -
> crashDumpHandler() never gets invoked. I think it may get invoked for abort()s
> in production builds, but probably not for segfaults.
> 
> And despite SEM_NOGPFAULTERRORBOX we display those annoying "popup" boxes
> telling us about the crash and giving the option to retry, ignore, something
> something.   It's all a bit baffling.

FWIW, the latest version of this patch (including an explanation why
SEM_NOGPFAULTERRORBOX isn't useful for our purposes [anymore]) is at (and
above)
https://postgr.es/m/20220110005704.es4el6i2nxlxzwof%40alap3.anarazel.de

Greetings,

Andres Freund



Re: Adding CI to our tree

От
Justin Pryzby
Дата:
On Sun, Jan 09, 2022 at 11:57:44AM -0800, Andres Freund wrote:
> On 2022-01-09 13:16:50 -0600, Justin Pryzby wrote:
> > diff --git a/contrib/test_decoding/Makefile b/contrib/test_decoding/Makefile
> > index 9a31e0b8795..14fd847ba7f 100644
> > --- a/contrib/test_decoding/Makefile
> > +++ b/contrib/test_decoding/Makefile
> > @@ -10,7 +10,7 @@ ISOLATION = mxact delayed_startup ondisk_startup concurrent_ddl_dml \
> >      oldest_xmin snapshot_transfer subxact_without_top concurrent_stream \
> >      twophase_snapshot
> >  
> > -REGRESS_OPTS = --temp-config $(top_srcdir)/contrib/test_decoding/logical.conf
> > +REGRESS_OPTS = --temp-config=$(top_srcdir)/contrib/test_decoding/logical.conf
> >  ISOLATION_OPTS = --temp-config $(top_srcdir)/contrib/test_decoding/logical.conf
> 
> Not sure why these are part of the diff?

Because otherwise vcregress runs pg_regress --temp-config test1 test2 [...]
..which means test1 gets eaten as the argument to --temp-config

> > diff --git a/src/tools/ci/pg_ci_base.conf b/src/tools/ci/pg_ci_base.conf
> > index d8faa9c26c1..52cdb697a57 100644
> > --- a/src/tools/ci/pg_ci_base.conf
> > +++ b/src/tools/ci/pg_ci_base.conf
> > @@ -12,3 +12,24 @@ log_connections = true
> >  log_disconnections = true
> >  log_line_prefix = '%m [%p][%b] %q[%a][%v:%x] '
> >  log_lock_waits = true
> > +
> > +# test_decoding
> > +wal_level = logical
> > +max_replication_slots = 4
> > +logical_decoding_work_mem = 64kB
> > [ more ]
> 
> This doesn't really seem like a scalable path forward - duplicating
> configuration in more places doesn't seem sane. It seems it'd make more sense
> to teach vcregress.pl to run NO_INSTALLCHECK targets properly? ISTM that
> changing the options passed to pg_regress based on fetchTests() return value
> wouldn't be too hard?

It needs to run the tests with separate instance.  Maybe you're suggesting to
use --temp-instance.

It needs to avoid running on the buildfarm, right ?

-- 
Justin

Вложения

Re: Adding CI to our tree

От
Andres Freund
Дата:
Hi,

On 2021-12-30 17:46:52 -0800, Andres Freund wrote:
> I plan to push this after another cycle of tests passing (and driving
> over the bay bridge...) unless I hear protests.

I noticed that it's harder to see compile warnings on msvc in CI than it was
at an earlier point.  There used to be a summary of errors at the end.

That turns out to be an uninteded consequence of the option to reduce msbuild
verbosity.


> +    # Use parallelism, disable file tracker, we're never going to rebuild...
> +    MSBFLAGS: -m -verbosity:minimal /p:TrackFileAccess=false

Unless somebody protests quickly, I'm going to add
  "-consoleLoggerParameters:Summary;ForceNoAlign"
to that.

Greetings,

Andres Freund



Re: Adding CI to our tree

От
Andres Freund
Дата:
Hi,

On 2022-01-10 16:07:48 -0600, Justin Pryzby wrote:
> On Sun, Jan 09, 2022 at 11:57:44AM -0800, Andres Freund wrote:
> > On 2022-01-09 13:16:50 -0600, Justin Pryzby wrote:
> > > diff --git a/contrib/test_decoding/Makefile b/contrib/test_decoding/Makefile
> > > index 9a31e0b8795..14fd847ba7f 100644
> > > --- a/contrib/test_decoding/Makefile
> > > +++ b/contrib/test_decoding/Makefile
> > > @@ -10,7 +10,7 @@ ISOLATION = mxact delayed_startup ondisk_startup concurrent_ddl_dml \
> > >      oldest_xmin snapshot_transfer subxact_without_top concurrent_stream \
> > >      twophase_snapshot
> > >
> > > -REGRESS_OPTS = --temp-config $(top_srcdir)/contrib/test_decoding/logical.conf
> > > +REGRESS_OPTS = --temp-config=$(top_srcdir)/contrib/test_decoding/logical.conf
> > >  ISOLATION_OPTS = --temp-config $(top_srcdir)/contrib/test_decoding/logical.conf
> >
> > Not sure why these are part of the diff?
>
> Because otherwise vcregress runs pg_regress --temp-config test1 test2 [...]
> ..which means test1 gets eaten as the argument to --temp-config

Ah. I see you changed that globally, good...

I'll probably apply that part and 0002 separately.


> > This doesn't really seem like a scalable path forward - duplicating
> > configuration in more places doesn't seem sane. It seems it'd make more sense
> > to teach vcregress.pl to run NO_INSTALLCHECK targets properly? ISTM that
> > changing the options passed to pg_regress based on fetchTests() return value
> > wouldn't be too hard?
>
> It needs to run the tests with separate instance.  Maybe you're suggesting to
> use --temp-instance.

Yes.


> It needs to avoid running on the buildfarm, right ?

I guess so. It currently appears to have its own logic for finding contrib
(and other) tap tests:

        foreach my $testdir (glob("$pgsql/contrib/*"))
        {
                next unless -d "$testdir/t";
                my $testname = basename($testdir);
                next unless step_wanted("contrib-$testname");
                print time_str(), "running contrib test $testname ...\n" if $verbose;
                run_tap_test("$testdir", "contrib-$testname", undef);
        }

but does run vcregress contribcheck, modulescheck.


Andrew, do you see a better way than what Justin is proposing here?

Greetings,

Andres Freund



Re: Adding CI to our tree

От
Justin Pryzby
Дата:
On Thu, Jan 13, 2022 at 10:55:27AM -0800, Andres Freund wrote:
> I'll probably apply that part and 0002 separately.

I've hacked on it a bit more now..

Question: are data-checksums tested at all ?  The only thing I can find is that
some buildfarm members *might* exercise it during installcheck.

I added pg_regress --initdb-opts since that seems to be a deficiency.

-- 
Justin

Вложения

Re: Adding CI to our tree

От
Andres Freund
Дата:
Hi,

On 2022-01-13 13:06:42 -0600, Justin Pryzby wrote:
> Question: are data-checksums tested at all ?  The only thing I can find is that
> some buildfarm members *might* exercise it during installcheck.

There's some coverage via src/bin/pg_basebackup/t/010_pg_basebackup.pl and
src/bin/pg_checksums/t/002_actions.pl - but that's not a whole lot.

Might be worth converting one of the "additional" pg_regress runs to use
data-checksums? E.g. pg_upgrade's, or the one being introduced in the "replay"
test?
https://postgr.es/m/CA%2BhUKGK-%2Bmg6RWiDu0JudF6jWeL5%2BgPmi8EKUm1eAzmdbwiE_A%40mail.gmail.com


> From b67cd05895c8fa42e13e6743db36412a68292607 Mon Sep 17 00:00:00 2001
> From: Justin Pryzby <pryzbyj@telsasoft.com>
> Date: Sun, 9 Jan 2022 22:54:32 -0600
> Subject: [PATCH 2/7] CI: run initdb with --no-sync for windows

Applied this already.



> From 885becd19f630a69ab1de44cefcdda21ca8146d6 Mon Sep 17 00:00:00 2001
> From: Justin Pryzby <pryzbyj@telsasoft.com>
> Date: Tue, 11 Jan 2022 01:30:37 -0600
> Subject: [PATCH 4/7] cirrus/linux: script test.log..
> 
> For consistency, and because otherwise errors can be lost (?) or hard to find.

> -      make -s ${CHECK} ${CHECKFLAGS} -j${TEST_JOBS}
> +      script --command "make -s ${CHECK} ${CHECKFLAGS} -j${TEST_JOBS}" test.log
>      EOF

I'm not following this one? all the output is in the CI run already, you can
download it already as well?

The only reason to use script as a wrapper is that otherwise make on
freebsd/macos warns about fcntl failures?


>    only_if: $CIRRUS_CHANGE_MESSAGE !=~ '.*\nci-os-only:.*' || $CIRRUS_CHANGE_MESSAGE =~
'.*\nci-os-only:[^\n]*linux.*'
> @@ -183,7 +178,7 @@ task:
>      mkdir -p ${CCACHE_DIR}
>      chown -R postgres:postgres ${CCACHE_DIR}
>      echo '* - memlock 134217728' > /etc/security/limits.d/postgres.conf
> -    su postgres -c "ulimit -l -H && ulimit -l -S"
> +    su postgres -c "ulimit -l -H && ulimit -l -S" # XXX

Hm?


Greetings,

Andres Freund



Re: Adding CI to our tree

От
Andrew Dunstan
Дата:
On 1/13/22 13:55, Andres Freund wrote:
>> It needs to avoid running on the buildfarm, right ?
> I guess so. It currently appears to have its own logic for finding contrib
> (and other) tap tests:
>
>         foreach my $testdir (glob("$pgsql/contrib/*"))
>         {
>                 next unless -d "$testdir/t";
>                 my $testname = basename($testdir);
>                 next unless step_wanted("contrib-$testname");
>                 print time_str(), "running contrib test $testname ...\n" if $verbose;
>                 run_tap_test("$testdir", "contrib-$testname", undef);
>         }
>
> but does run vcregress contribcheck, modulescheck.
>
>
> Andrew, do you see a better way than what Justin is proposing here?
>

I can probably adjust to whatever we decide to do. But I think we're
really just tinkering at the edges here. What I think we really need is
the moral equivalent of `make check-world` in one invocation of
vcregress.pl.

While we're on the subject of vcregress.pl, there's this recent
discussion, which is on my list of things to return to:
<https://www.postgresql.org/message-id/46c40cc7-db28-b684-379d-43b34daa5ffa%40dunslane.net>


cheers


andrew


--
Andrew Dunstan
EDB: https://www.enterprisedb.com




Re: Adding CI to our tree

От
Andres Freund
Дата:
Hi,

On 2022-01-13 15:27:40 -0500, Andrew Dunstan wrote:
> I can probably adjust to whatever we decide to do. But I think we're
> really just tinkering at the edges here. What I think we really need is
> the moral equivalent of `make check-world` in one invocation of
> vcregress.pl.

I agree strongly that we need that. But I think a good chunk of Justin's
changes are actually required to get there?

Specifically, unless we want lots of duplicated logic in vcregress.pl, we
need to make vcregress know how to run NO_INSTALLCHECK test. The option added
was just so the buildfarm doesn't start to run tests multiple times...

Greetings,

Andres Freund



Re: Adding CI to our tree

От
Justin Pryzby
Дата:
On Fri, Jan 14, 2022 at 03:34:11PM -0800, Andres Freund wrote:
> Hi,
> 
> On 2022-01-13 15:27:40 -0500, Andrew Dunstan wrote:
> > I can probably adjust to whatever we decide to do. But I think we're
> > really just tinkering at the edges here. What I think we really need is
> > the moral equivalent of `make check-world` in one invocation of
> > vcregress.pl.
> 
> I agree strongly that we need that. But I think a good chunk of Justin's
> changes are actually required to get there?
> 
> Specifically, unless we want lots of duplicated logic in vcregress.pl, we
> need to make vcregress know how to run NO_INSTALLCHECK test. The option added
> was just so the buildfarm doesn't start to run tests multiple times...

The main reason I made the INSTALLCHECK runs conditional (they only run if a
new option is specified) is because of these comments:

| # Disabled because these tests require "shared_preload_libraries=pg_stat_statements",
| # which typical installcheck users do not have (e.g. buildfarm clients).
| NO_INSTALLCHECK = 1

Also, I saw that you saw that Thomas discovered/pointed out that a bunch of TAP
tests aren't being run by CI.   I think vcregress should have an "alltap"
target that runs everything like glob("**/t/").  CI would use that instead of
the existing ssl, auth, subscription, recovery, and bin targets.  The buildfarm
could switch to that after it's been published.

https://www.postgresql.org/message-id/20220114234947.av4kkhuj7netsy5r%40alap3.anarazel.de

-- 
Justin



Re: Adding CI to our tree

От
Andrew Dunstan
Дата:
On 1/14/22 18:54, Justin Pryzby wrote:
> On Fri, Jan 14, 2022 at 03:34:11PM -0800, Andres Freund wrote:
>> Hi,
>>
>> On 2022-01-13 15:27:40 -0500, Andrew Dunstan wrote:
>>> I can probably adjust to whatever we decide to do. But I think we're
>>> really just tinkering at the edges here. What I think we really need is
>>> the moral equivalent of `make check-world` in one invocation of
>>> vcregress.pl.
>> I agree strongly that we need that. But I think a good chunk of Justin's
>> changes are actually required to get there?
>>
>> Specifically, unless we want lots of duplicated logic in vcregress.pl, we
>> need to make vcregress know how to run NO_INSTALLCHECK test. The option added
>> was just so the buildfarm doesn't start to run tests multiple times...
> The main reason I made the INSTALLCHECK runs conditional (they only run if a
> new option is specified) is because of these comments:
>
> | # Disabled because these tests require "shared_preload_libraries=pg_stat_statements",
> | # which typical installcheck users do not have (e.g. buildfarm clients).
> | NO_INSTALLCHECK = 1
>
> Also, I saw that you saw that Thomas discovered/pointed out that a bunch of TAP
> tests aren't being run by CI.   I think vcregress should have an "alltap"
> target that runs everything like glob("**/t/").  CI would use that instead of
> the existing ssl, auth, subscription, recovery, and bin targets.  The buildfarm
> could switch to that after it's been published.
>
> https://www.postgresql.org/message-id/20220114234947.av4kkhuj7netsy5r%40alap3.anarazel.de




The buildfarm is moving in the opposite direction, to disaggregate
steps. There are several reasons for that, including that it makes for
less log output that you need to churn through o find out what's gone
wrong in a particular case, and that it makes disabling certain test
sets via the buildfarm client's `skip-steps' feature possible.


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com




Re: Adding CI to our tree

От
Andres Freund
Дата:
Hi,

On 2022-01-17 10:25:12 -0500, Andrew Dunstan wrote:
> The buildfarm is moving in the opposite direction, to disaggregate
> steps.

I'm a bit confused as to where you want changes to vcregress.pl
going. Upthread you argued against adding more granular targets to
vcregress. But this seems to be arguing against that?


> There are several reasons for that, including that it makes for
> less log output that you need to churn through o find out what's gone
> wrong in a particular case, and that it makes disabling certain test
> sets via the buildfarm client's `skip-steps' feature possible.

FWIW, to me this shouldn't require a lot of separate manual test
invocations. And continuing to have lots of granular test invocations from the
buildfarm client is *bad*, because it requires constantly syncing up the set
of test targets.

It also makes the buildfarm far slower than necessary, because it runs a lot
of stuff serially that it could run in parallel. This is particularly a
problem for things like valgrind runs, where individual tests are quite slow -
but just throwing more CPUs at it would help a lot.

We should set things up so that:

a) The output of each test can easily be associated with the corresponding set
   of log files.
b) The list of tests run can be determined generically by looking at the
   filesystems
c) For each test run, it's easy to see whether it failed or succeeded

As part of the meson stuff (which has its own test runner), I managed to get a
bit towards this by changing the log output hierarchy so that each test gets
its own directory for log files (regress_log_*, initdb.log, postmaster.log,
regression.diffs, server log files). What's missing is a
test.{failed,succeeded} marker or such, to make it easy to report the log
files for just failed tasks.

Greetings,

Andres Freund



Re: Adding CI to our tree

От
Robert Haas
Дата:
On Mon, Jan 17, 2022 at 1:19 PM Andres Freund <andres@anarazel.de> wrote:
> FWIW, to me this shouldn't require a lot of separate manual test
> invocations. And continuing to have lots of granular test invocations from the
> buildfarm client is *bad*, because it requires constantly syncing up the set
> of test targets.

I have a lot of sympathy with Andrew here, actually. If you just do
'make check-world' and assume that will cover everything, you get one
giant output file. That is not great at all. People who are looking
through buildfarm results do not want to have to look through giant
logfiles hunting for the failure; they want to look at the stuff
that's just directly relevant to the failure they saw. The current BF
is actually pretty bad at this. You can click on various things on a
buildfarm results page, but it's not very clear where those links are
taking you, and the pages at least in my browser (normally Chrome)
render so slowly as to make the whole thing almost unusable. I'd like
to have a thing where the buildfarm shows a list of tests in red or
green and I can click links next to each test to see the various logs
that test produced. That's really hard to accomplish if you just run
all the tests with one invocation - any technique you use to find the
boundaries between one test's output and the next will prove to be
unreliable.

But having said that, I also agree that it sucks to have to keep
updating the BF client every time we want to do any kind of
test-related changes in the main source tree. One way around that
would be to put a file in the main source tree that the build farm
client can read to know what to do. Another would be to have the BF
client download the latest list of steps from somewhere instead of
having it in the source code, so that it can be updated without
everyone needing to update their machine. There might well be other
approaches that are even better. But the "ask Andrew to adjust the BF
client and then have everybody install the new version" approach upon
which we have been relying up until now is not terribly scalable.

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Re: Adding CI to our tree

От
Tom Lane
Дата:
Robert Haas <robertmhaas@gmail.com> writes:
> I have a lot of sympathy with Andrew here, actually. If you just do
> 'make check-world' and assume that will cover everything, you get one
> giant output file. That is not great at all.

Yeah.  I agree with Andrew that we want output that is more modular,
not less so.  But we do need to find a way to have less knowledge
hard-wired in the buildfarm client script.

> But having said that, I also agree that it sucks to have to keep
> updating the BF client every time we want to do any kind of
> test-related changes in the main source tree. One way around that
> would be to put a file in the main source tree that the build farm
> client can read to know what to do. Another would be to have the BF
> client download the latest list of steps from somewhere instead of
> having it in the source code, so that it can be updated without
> everyone needing to update their machine.

The obvious place for "somewhere" is "the main source tree", so I
doubt your second suggestion is better than your first.  But your
first does seem like a plausible way to proceed.

Another way to think of it, maybe, is to migrate chunks of the
buildfarm client script itself into the source tree.  I'd rather
that developers not need to become experts on the buildfarm client
to make adjustments to the test process --- but I suspect that
a simple script like "run make check in these directories" is
not going to be flexible enough for everything.

            regards, tom lane



Re: Adding CI to our tree

От
Andres Freund
Дата:
Hi,

On 2022-01-17 13:50:04 -0500, Robert Haas wrote:
> On Mon, Jan 17, 2022 at 1:19 PM Andres Freund <andres@anarazel.de> wrote:
> > FWIW, to me this shouldn't require a lot of separate manual test
> > invocations. And continuing to have lots of granular test invocations from the
> > buildfarm client is *bad*, because it requires constantly syncing up the set
> > of test targets.
> 
> I have a lot of sympathy with Andrew here, actually. If you just do
> 'make check-world' and assume that will cover everything, you get one
> giant output file. That is not great at all.

I very much agree with check-world output being near unusable.


>  That's really hard to accomplish if you just run all the tests with one
> invocation - any technique you use to find the boundaries between one test's
> output and the next will prove to be unreliable.

I think it's not actually that hard, with something like I described in the
email upthread, with each tests going into a prescribed location, and the
on-disk status being inspectable in an automated way. check-world could invoke
a command to summarize the tests at the end in a .NOTPARALLEL, to make the
local case easier.

pg_regress/isolation style tests have the "summary" output in regression.out,
but we remove it on success.
Tap tests have their output in the regress_log_* files, however these are far
more verbose than the output from running the tests directly.

I wonder if we should keep regression.out and also keep a copy of the
"shorter" tap test output in a file?


> But having said that, I also agree that it sucks to have to keep
> updating the BF client every time we want to do any kind of
> test-related changes in the main source tree.

It also sucks locally. I *hate* having to dig through a long check-world
output to find the first failure.

This subthread is about the windows tests specifically, where it's even worse
- there's no way to run all tests. Nor a list of test targets that's even
halfway complete :/. One just has to know that one has to invoke a myriad of
vcregress.pl taptest path/to/directory

Greetings,

Andres Freund



Re: Adding CI to our tree

От
Tom Lane
Дата:
Andres Freund <andres@anarazel.de> writes:
> I think it's not actually that hard, with something like I described in the
> email upthread, with each tests going into a prescribed location, and the
> on-disk status being inspectable in an automated way. check-world could invoke
> a command to summarize the tests at the end in a .NOTPARALLEL, to make the
> local case easier.

That sounds a bit, um, make-centric.  At this point it seems to me
we ought to be thinking about how it'd work under meson.

> This subthread is about the windows tests specifically, where it's even worse
> - there's no way to run all tests.

That's precisely because the windows build doesn't use make.
We shouldn't be thinking about inventing two separate dead-end
solutions to this problem.

            regards, tom lane



Re: Adding CI to our tree

От
Andres Freund
Дата:
Hi,

On 2022-01-17 14:30:53 -0500, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > I think it's not actually that hard, with something like I described in the
> > email upthread, with each tests going into a prescribed location, and the
> > on-disk status being inspectable in an automated way. check-world could invoke
> > a command to summarize the tests at the end in a .NOTPARALLEL, to make the
> > local case easier.
>
> That sounds a bit, um, make-centric.  At this point it seems to me
> we ought to be thinking about how it'd work under meson.

Some of this is a lot easier with meson. It has a builtin test runner, which
understands tap (thereby not requiring prove anymore). Those tests can be
listed (meson test --list).

Depending on the option this results in a list of all tests with just the
"topline" name of passing tests and error output from failing tests, or all
output all the time or ... At the end it prints a summary of test counts and a
list of failed tests.

Here's an example (the leading timestamps are from CI, not meson), on windows:
https://api.cirrus-ci.com/v1/task/6009638771490816/logs/check.log

The test naming isn't necessarily great, but that's my fault.

Running all the tests with meson takes a good bit less time than running most,
but far from all, tests using vcregress.pl:
https://cirrus-ci.com/build/4611852939296768



meson test makes it far easier to spot which tests failed, it's consistent
across operating systems, allows to skip individual tests, etc.

However: It doesn't address the log collection issue in itself. For that we'd
still need to collect them in a way that's easier to associate with individual
tests.

In the meson branch I made it so that each test (including individual tap
ones) has it's own log directory, which makes it easier to select all the logs
for a failing test etc.


> > This subthread is about the windows tests specifically, where it's even worse
> > - there's no way to run all tests.
>
> That's precisely because the windows build doesn't use make.
> We shouldn't be thinking about inventing two separate dead-end
> solutions to this problem.

Agreed. I think some improvements, e.g. around making the logs easier to
associate with an individual test, is orthogonal to the buildsystem issue.


I think it might still be worth adding stopgap way of running all tap tests on
windows though. Having a vcregress.pl function to find all directories with t/
and run the tests there, shouldn't be a lot of code...

Greetings,

Andres Freund



Re: Adding CI to our tree

От
Andrew Dunstan
Дата:
On 1/17/22 13:19, Andres Freund wrote:
> Hi,
>
> On 2022-01-17 10:25:12 -0500, Andrew Dunstan wrote:
>> The buildfarm is moving in the opposite direction, to disaggregate
>> steps.
> I'm a bit confused as to where you want changes to vcregress.pl
> going. Upthread you argued against adding more granular targets to
> vcregress. But this seems to be arguing against that?


OK, let me clear that up. Yes I argued upthread for a 'make check-world'
equivalent, because it's useful for developers on Windows. But I don't
really want to use it in the buildfarm client, where I prefer to run
fine-grained tests.


>
>
>> There are several reasons for that, including that it makes for
>> less log output that you need to churn through o find out what's gone
>> wrong in a particular case, and that it makes disabling certain test
>> sets via the buildfarm client's `skip-steps' feature possible.
> FWIW, to me this shouldn't require a lot of separate manual test
> invocations. And continuing to have lots of granular test invocations from the
> buildfarm client is *bad*, because it requires constantly syncing up the set
> of test targets.


Well, the logic we use for TAP tests is we run them for the following if
they have a t subdirectory, subject to configuration settings like
PG_TEST_EXTRA:

 src/test/bin/*

 contrib/*

 src/test/modules/*

 src/test/*


As long as any new TAP test gets place in such a location nothing new is
required in the buildfarm client.


>
> It also makes the buildfarm far slower than necessary, because it runs a lot
> of stuff serially that it could run in parallel. 


That's a legitimate concern. I have made some strides towards gross
parallelism in the buildfarm by providing for different branches to be
run in parallel. This has proven to be fairly successful (i.e. I have
not run into any complaints, and several of my own animals utilize it).
I can certainly look at doing something of the sort for an individual
branch run.


> This is particularly a
> problem for things like valgrind runs, where individual tests are quite slow -
> but just throwing more CPUs at it would help a lot.


When I ran a valgrind animal, `make check` was horribly slow, and it's
already parallelized. But it was on a VM and I forget how many CPUs the
VM had.


>
> We should set things up so that:
>
> a) The output of each test can easily be associated with the corresponding set
>    of log files.
> b) The list of tests run can be determined generically by looking at the
>    filesystems
> c) For each test run, it's easy to see whether it failed or succeeded
>
> As part of the meson stuff (which has its own test runner), I managed to get a
> bit towards this by changing the log output hierarchy so that each test gets
> its own directory for log files (regress_log_*, initdb.log, postmaster.log,
> regression.diffs, server log files). What's missing is a
> test.{failed,succeeded} marker or such, to make it easy to report the log
> files for just failed tasks.


One thing I have been working on is a way to split the log output of an
individual buildfarm step, hitherto just a text blob, , so that you can
easily navigate to say regress_log_0003-foo-step.log without having to
page through myriads of stuff. It's been on the back burner but I need
to prioritize it, maybe when the current CF is done.


cheers


andrew


--
Andrew Dunstan
EDB: https://www.enterprisedb.com




Re: Adding CI to our tree

От
Peter Eisentraut
Дата:
On 17.01.22 22:13, Andrew Dunstan wrote:
> Well, the logic we use for TAP tests is we run them for the following if
> they have a t subdirectory, subject to configuration settings like
> PG_TEST_EXTRA:
> 
>   src/test/bin/*
>   contrib/*
>   src/test/modules/*
>   src/test/*
> 
> As long as any new TAP test gets place in such a location nothing new is
> required in the buildfarm client.

But if I wanted to add TAP tests to libpq, then I'd still be stuck.  Why 
not change the above list to "anywhere"?




Re: Adding CI to our tree

От
Andrew Dunstan
Дата:
On 1/18/22 08:06, Peter Eisentraut wrote:
> On 17.01.22 22:13, Andrew Dunstan wrote:
>> Well, the logic we use for TAP tests is we run them for the following if
>> they have a t subdirectory, subject to configuration settings like
>> PG_TEST_EXTRA:
>>
>>   src/test/bin/*
>>   contrib/*
>>   src/test/modules/*
>>   src/test/*
>>
>> As long as any new TAP test gets place in such a location nothing new is
>> required in the buildfarm client.
>
> But if I wanted to add TAP tests to libpq, then I'd still be stuck. 
> Why not change the above list to "anywhere"?



Sure, very doable, although we will still need some special logic for
src/test - there are security implication from running the ssl, ldap and
kerberos tests by default. See its Makefile.


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com




Re: Adding CI to our tree

От
Andres Freund
Дата:
Hi,

On 2022-01-18 11:20:08 -0500, Andrew Dunstan wrote:
> Sure, very doable, although we will still need some special logic for
> src/test - there are security implication from running the ssl, ldap and
> kerberos tests by default. See its Makefile.

ISTM that we should move the PG_TEST_EXTRA handling into the tests. Then we'd
not need to duplicate them in the make / msvc world and we'd see them as
skipped tests when not enabled.

Greetings,

Andres Freund



Re: Adding CI to our tree

От
Andrew Dunstan
Дата:
On 1/18/22 12:44, Andres Freund wrote:
> Hi,
>
> On 2022-01-18 11:20:08 -0500, Andrew Dunstan wrote:
>> Sure, very doable, although we will still need some special logic for
>> src/test - there are security implication from running the ssl, ldap and
>> kerberos tests by default. See its Makefile.
> ISTM that we should move the PG_TEST_EXTRA handling into the tests. Then we'd
> not need to duplicate them in the make / msvc world and we'd see them as
> skipped tests when not enabled.
>

Yeah, good idea. Especially if we can backpatch that. The buildfarm
client would also get simpler, so it would be doubleplusgood.


cheers


andrew


--
Andrew Dunstan
EDB: https://www.enterprisedb.com




Re: Adding CI to our tree

От
Justin Pryzby
Дата:
On Mon, Jan 17, 2022 at 12:16:19PM -0800, Andres Freund wrote:
> I think it might still be worth adding stopgap way of running all tap tests on
> windows though. Having a vcregress.pl function to find all directories with t/
> and run the tests there, shouldn't be a lot of code...

I started doing that, however it makes CI/windows even slower.  I think it'll
be necessary to run prove with all the tap tests to parallelize them, rather
than looping around directories, many of which have only a single file, and are
run serially.

https://github.com/justinpryzby/postgres/commits/citest-cirrus
This has the link to a recent cirrus result if you'd want to look.
I suppose I should start a new thread.  

There's a bunch of other stuff for cirrus in there (and bits and pieces of
other branches).

 .  cirrus: spell ccache_maxsize
 .  cirrus: avoid unnecessary double star **
 .  vcregress/ci: test modules/contrib with NO_INSTALLCHECK=1
 .  vcregress: style
 .  wip: vcsregress: add alltaptests
 .  wip: run upgrade tests with data-checksums
 .  pg_regress --initdb-opts
 .  wip: pg_upgrade: show list of files copied only in vebose mode
 .  wip: cirrus: include hints how to install OS packages..
 .  wip: cirrus: code coverage
 .  cirrus: upload html docs as artifacts
 .  wip: cirrus/windows: save compiled build as an artifact



Re: Adding CI to our tree

От
Andres Freund
Дата:
Hi,

On 2022-01-18 15:08:47 -0600, Justin Pryzby wrote:
> On Mon, Jan 17, 2022 at 12:16:19PM -0800, Andres Freund wrote:
> > I think it might still be worth adding stopgap way of running all tap tests on
> > windows though. Having a vcregress.pl function to find all directories with t/
> > and run the tests there, shouldn't be a lot of code...
> 
> I started doing that, however it makes CI/windows even slower.

To be expected... Perhaps the caching approach I just posted in [1] would buy
most of it back though...


> I think it'll be necessary to run prove with all the tap tests to
> parallelize them, rather than looping around directories, many of which have
> only a single file, and are run serially.

That's unfortunately not trivially possible. Quite a few tests currently rely
on being called in a specific directory. We should fix this, but it's not a
trivial amount of work.

Greetings,

Andres Freund

[1] https://postgr.es/m/20220119010034.javla5sakeh2a4fa%40alap3.anarazel.de



Re: Adding CI to our tree

От
Tom Lane
Дата:
I just found one thing making check-world slower than it ought to be:
src/test/recovery/t/008_fsm_truncation.pl does

$node_primary->append_conf(
    'postgresql.conf', qq{
fsync = on
wal_log_hints = on
max_prepared_transactions = 5
autovacuum = off
});

There is no reason for this script to be overriding Cluster.pm's
fsync = off setting.  This actually causes parallel check-world to
fail altogether on florican's host, because the initial fsync of
the recovered primary takes more than 3 minutes when there's
conflicting I/O traffic, causing pg_ctl to time out.

This appears to go back to 917dc7d23 of 2016, so I think it just
predates our recognition that we should disable fsync in routine
tests.

            regards, tom lane



Re: Adding CI to our tree

От
Andres Freund
Дата:
Hi,

On 2022-01-18 21:50:07 -0500, Tom Lane wrote:
> I just found one thing making check-world slower than it ought to be:
> src/test/recovery/t/008_fsm_truncation.pl does
> 
> $node_primary->append_conf(
>     'postgresql.conf', qq{
> fsync = on
> wal_log_hints = on
> max_prepared_transactions = 5
> autovacuum = off
> });
> 
> There is no reason for this script to be overriding Cluster.pm's
> fsync = off setting.
> 
> This appears to go back to 917dc7d23 of 2016, so I think it just
> predates our recognition that we should disable fsync in routine
> tests.

Yea, I noticed this too. I was wondering if there's a conceivable reason to
actually want fsyncs, but I couldn't come up with one.

On systems where IO isn't overloaded, the main problem with this test are
elsewhere: It multiple times waits for VACUUMs that are blocked truncating the
table. Which these days takes 5 seconds. Thus the test takes quite a while.

To me VACUUM_TRUNCATE_LOCK_TIMEOUT = 5s seems awfully long. On a system with a
lot of tables that's much more than vacuum will take. So this can easily lead
to using up all autovacuum workers...



> This actually causes parallel check-world to fail altogether on florican's
> host, because the initial fsync of the recovered primary takes more than 3
> minutes when there's conflicting I/O traffic, causing pg_ctl to time out.

Ugh.

I noticed a few other sources of "unnecessary" fsyncs.  The most frequent
being the durable_rename() of backup_manifest in pg_basebackup.c. Manifests are
surprisingly large, 135k for a freshly initdb'd cluster.


There's an fsync in walmethods.c:tar_close() that sounds intentional, but I
don't really understand what the comment:

    /* Always fsync on close, so the padding gets fsynced */
    if (tar_sync(f) < 0)

Greetings,

Andres Freund



Re: Adding CI to our tree

От
Tom Lane
Дата:
Andres Freund <andres@anarazel.de> writes:
> On 2022-01-18 21:50:07 -0500, Tom Lane wrote:
>> There is no reason for this script to be overriding Cluster.pm's
>> fsync = off setting.
>> This appears to go back to 917dc7d23 of 2016, so I think it just
>> predates our recognition that we should disable fsync in routine
>> tests.

> Yea, I noticed this too. I was wondering if there's a conceivable reason to
> actually want fsyncs, but I couldn't come up with one.

On the one hand, it feels a little wrong if our test suites never
reach our fsync calls at all.  On the other hand, it's not clear
what is meaningful about testing fsync when your test scenario
doesn't include a plug pull.

I'd be okay with having some exercise of the fsync code paths in
a test that is (a) designated for the purpose and (b) arranged
to not take an excessive amount of time, even under heavy load.
008_fsm_truncation.pl is neither of those things.  It seems
entirely random that it has fsync = on when we don't test that
elsewhere.

            regards, tom lane



Re: Adding CI to our tree

От
Tom Lane
Дата:
Andres Freund <andres@anarazel.de> writes:
> On 2022-01-18 21:50:07 -0500, Tom Lane wrote:
>> This actually causes parallel check-world to fail altogether on florican's
>> host, because the initial fsync of the recovered primary takes more than 3
>> minutes when there's conflicting I/O traffic, causing pg_ctl to time out.

> Ugh.

I misspoke there: it's the standby that is performing an fsync'd
checkpoint and timing out, during the test's promote-the-standby
step.

This test attempt revealed another problem too: the standby never
shut down, and thus the calling "make" never quit, until I intervened
manually.  I'm not sure why.  I see that Cluster::promote uses
system_or_bail() to run "pg_ctl promote" ... could it be that
BAIL_OUT causes the normal script END hooks to not get run?
But it seems like we'd have noticed that long ago.

            regards, tom lane



Re: Adding CI to our tree

От
Tom Lane
Дата:
I wrote:
> This test attempt revealed another problem too: the standby never
> shut down, and thus the calling "make" never quit, until I intervened
> manually.  I'm not sure why.  I see that Cluster::promote uses
> system_or_bail() to run "pg_ctl promote" ... could it be that
> BAIL_OUT causes the normal script END hooks to not get run?
> But it seems like we'd have noticed that long ago.

I failed to reproduce any failure in the promote step, and I now
think I was mistaken and it happened during the standby's initial
start.  I can reproduce that very easily by setting PGCTLTIMEOUT
to a second or two; with fsync enabled, it takes the standby more
than that to reach a consistent state.  And the cause of that
is obvious: Cluster::start thinks that if "pg_ctl start" failed,
there couldn't possibly be a postmaster running.  So it doesn't
bother to update self->_pid, and then the END hook thinks there
is nothing to do.

Now, leaving an idle postmaster hanging around isn't a mortal sin,
since it'll go away by itself shortly after the next cycle of
testing does an "rm -rf" on its data directory.  But it's ugly,
and conceivably it could cause problems for later testing on
machines with limited shmem or semaphore space.

The attached simple fix gets rid of this problem.  Any objections?

            regards, tom lane

diff --git a/src/test/perl/PostgreSQL/Test/Cluster.pm b/src/test/perl/PostgreSQL/Test/Cluster.pm
index 7af0f8db13..fd0738d12d 100644
--- a/src/test/perl/PostgreSQL/Test/Cluster.pm
+++ b/src/test/perl/PostgreSQL/Test/Cluster.pm
@@ -845,6 +845,11 @@ sub start
     {
         print "# pg_ctl start failed; logfile:\n";
         print PostgreSQL::Test::Utils::slurp_file($self->logfile);
+
+        # pg_ctl could have timed out, so check to see if there's a pid file;
+        # without this, we fail to shut down the new postmaster later.
+        $self->_update_pid(-1);
+
         BAIL_OUT("pg_ctl start failed") unless $params{fail_ok};
         return 0;
     }
@@ -1123,7 +1128,10 @@ archive_command = '$copy_command'
     return;
 }

-# Internal method
+# Internal method to update $self->{_pid}
+# $is_running = 1: pid file should be there
+# $is_running = 0: pid file should NOT be there
+# $is_running = -1: we aren't sure
 sub _update_pid
 {
     my ($self, $is_running) = @_;
@@ -1138,7 +1146,7 @@ sub _update_pid
         close $pidfile;

         # If we found a pidfile when there shouldn't be one, complain.
-        BAIL_OUT("postmaster.pid unexpectedly present") unless $is_running;
+        BAIL_OUT("postmaster.pid unexpectedly present") if $is_running == 0;
         return;
     }

@@ -1146,7 +1154,7 @@ sub _update_pid
     print "# No postmaster PID for node \"$name\"\n";

     # Complain if we expected to find a pidfile.
-    BAIL_OUT("postmaster.pid unexpectedly not present") if $is_running;
+    BAIL_OUT("postmaster.pid unexpectedly not present") if $is_running == 1;
     return;
 }


Re: Adding CI to our tree

От
Andres Freund
Дата:
Hi,

On 2022-01-19 15:05:44 -0500, Tom Lane wrote:
> And the cause of that is obvious: Cluster::start thinks that if "pg_ctl
> start" failed, there couldn't possibly be a postmaster running.  So it
> doesn't bother to update self->_pid, and then the END hook thinks there is
> nothing to do.

Ah, right.

I'm doubtful that it's good that we use BAIL_OUT so liberally, because it
prevents further tests from running (i.e. if 001 bails, 002+ doesn't run),
which doesn't generally seem like the right thing to do after a single test
fails. But that's really independent of the fix you make here.


> The attached simple fix gets rid of this problem.  Any objections?

Nope, sounds like a plan.

> diff --git a/src/test/perl/PostgreSQL/Test/Cluster.pm b/src/test/perl/PostgreSQL/Test/Cluster.pm
> index 7af0f8db13..fd0738d12d 100644
> --- a/src/test/perl/PostgreSQL/Test/Cluster.pm
> +++ b/src/test/perl/PostgreSQL/Test/Cluster.pm
> @@ -845,6 +845,11 @@ sub start
>      {
>          print "# pg_ctl start failed; logfile:\n";
>          print PostgreSQL::Test::Utils::slurp_file($self->logfile);
> +
> +        # pg_ctl could have timed out, so check to see if there's a pid file;
> +        # without this, we fail to shut down the new postmaster later.
> +        $self->_update_pid(-1);

I'd maybe do s/later/in the END block/ or such, so that one knows where to
look. Took me a minute to find it again.

Greetings,

Andres Freund



Re: Adding CI to our tree

От
Tom Lane
Дата:
Andres Freund <andres@anarazel.de> writes:
> I'm doubtful that it's good that we use BAIL_OUT so liberally, because it
> prevents further tests from running (i.e. if 001 bails, 002+ doesn't run),
> which doesn't generally seem like the right thing to do after a single test
> fails. But that's really independent of the fix you make here.

Agreed, that's a separate question.  It does seem like "stop this script
and move to the next one" would be better behavior, though.

> I'd maybe do s/later/in the END block/ or such, so that one knows where to
> look. Took me a minute to find it again.

OK.

            regards, tom lane



pg_basebackup fsyncs some files despite --no-sync (was: Adding CI to our tree)

От
Andres Freund
Дата:
Hi,

On 2022-01-18 20:16:46 -0800, Andres Freund wrote:
> I noticed a few other sources of "unnecessary" fsyncs.  The most frequent
> being the durable_rename() of backup_manifest in pg_basebackup.c. Manifests are
> surprisingly large, 135k for a freshly initdb'd cluster.

Robert, I assume the fsync for manifests isn't ignoring --no-sync for a
particular reason?

The attached patch adds no-sync handling to the manifest rename, as well as
one case in the directory wal method.


It's a bit painful that we have to have code like

            if (dir_data->sync)
                r = durable_rename(tmppath, tmppath2);
            else
            {
                if (rename(tmppath, tmppath2) != 0)
                {
                    pg_log_error("could not rename file \"%s\" to \"%s\": %m",
                                 tmppath, tmppath2);
                    r = -1;
                }
            }

It seems like it'd be better to set it up so that durable_rename() could
decide internally wether to fsync, or have a wrapper around durable_rename()?


> There's an fsync in walmethods.c:tar_close() that sounds intentional, but I
> don't really understand what the comment:
> 
>     /* Always fsync on close, so the padding gets fsynced */
>     if (tar_sync(f) < 0)

tar_sync() actually checks for tar_data->sync, so it doesn't do an
fsync. Arguably the comment is a bit confusing, but ...

Greetings,

Andres Freund

Вложения

Re: pg_basebackup fsyncs some files despite --no-sync (was: Adding CI to our tree)

От
Andres Freund
Дата:
Hi,

On 2022-01-21 12:00:57 -0800, Andres Freund wrote:
> The attached patch adds no-sync handling to the manifest rename, as well as
> one case in the directory wal method.

Pushed that.

Greetings,

Andres Freund



Re: Adding CI to our tree

От
Justin Pryzby
Дата:
On Tue, Jan 18, 2022 at 03:08:47PM -0600, Justin Pryzby wrote:
> On Mon, Jan 17, 2022 at 12:16:19PM -0800, Andres Freund wrote:
> > I think it might still be worth adding stopgap way of running all tap tests on
> > windows though. Having a vcregress.pl function to find all directories with t/
> > and run the tests there, shouldn't be a lot of code...
> 
> I started doing that, however it makes CI/windows even slower.  I think it'll
> be necessary to run prove with all the tap tests to parallelize them, rather
> than looping around directories, many of which have only a single file, and are
> run serially.

FYI: I've rebased these against your cirrus/windows changes.

A recent cirrus result is here; this has other stuff in the branch, so you can
see the artifacts with HTML docs and code coverage.

https://github.com/justinpryzby/postgres/runs/5046465342

95793675633 cirrus: spell ccache_maxsize
e5286ede1b4 cirrus: avoid unnecessary double star **
03f6de4643e cirrus: include hints how to install OS packages..
39cc2130e76 cirrus/linux: script test.log..
398b7342349 cirrus/macos: uname -a and export at end of sysinfo
9d0f03d3450 wip: cirrus: code coverage
bff64e8b998 cirrus: upload html docs as artifacts
80f52c3b172 wip: only upload changed docs
7f3b50f1847 pg_upgrade: show list of files copied only in vebose mode
ba229165fe1 wip: run pg_upgrade tests with data-checksums
97d7b039b9b vcregress/ci: test modules/contrib with NO_INSTALLCHECK=1
654b6375401 wip: vcsregress: add alltaptests

BTW, I think the double star added in f3feff825 probably doesn't need to be
double, either:

path: "crashlog-**.txt"

-- 
Justin



Re: Adding CI to our tree

От
Andres Freund
Дата:
Hi,

On 2022-02-02 21:58:28 -0600, Justin Pryzby wrote:
> FYI: I've rebased these against your cirrus/windows changes.

Did you put then on a dedicated branch, or only intermixed with other changes?


> A recent cirrus result is here; this has other stuff in the branch, so you can
> see the artifacts with HTML docs and code coverage.

I'm a bit worried about the increased storage and runtime overhead due to the
docs changes. We probably can make it a good bit cheaper though.


> https://github.com/justinpryzby/postgres/runs/5046465342


> 95793675633 cirrus: spell ccache_maxsize

Yep, will apply with a bunch of your other changes, if you answer a question
or two...


> e5286ede1b4 cirrus: avoid unnecessary double star **

Can't get excited about this, but whatever.

What I am excited about is that some of your other changes showed that we
don't need separate *_artifacts for separate directories anymore. That used to
be the case, but an array of paths is now supported. Putting log, diffs, and
regress_log in one directory will be considerably more convenient...


> 03f6de4643e cirrus: include hints how to install OS packages..

What's the idea behind

#echo 'deb http://deb.debian.org/debian bullseye main' >>/etc/apt/sources.list

That's already in sources.list, so I'm not sure what this shows?


I think it may be a bit cleaner to have the "install additional packages"
"template step" be just that, and not merge in other contents into it?


> 39cc2130e76 cirrus/linux: script test.log..

I still don't understand what this commit is trying to achieve.


> 398b7342349 cirrus/macos: uname -a and export at end of sysinfo

Shrug.


> 9d0f03d3450 wip: cirrus: code coverage

I don't think it's good to just unconditionally reference the master branch
here. It'll do bogus stuff once 15 is branched off. It works for cfbot, but
not other uses.

Perhaps we could have a cfbot special case (by checking for a repository
variable variable indicating the base branch) and show the incremental changes
to that branch? Or we could just check which branch has the smallest distance
in #commits?


If cfbot weren't a thing, I'd just make a code coverage / docs generation a
manual task (startable by a click in the UI). But that requires permission on
the repository...


Hm. I wonder if cfbot could maintain the code not as branches as such, but as
pull requests? Those include information about what the base branch is ;)


Is looking at the .c files in the change really a reliable predictor of where
code coverage changes? I'm doubtful. Consider stuff like removing the last
user of some infrastructure or such. Or adding the first.


> bff64e8b998 cirrus: upload html docs as artifacts
> 80f52c3b172 wip: only upload changed docs

Similar to the above.


> 7f3b50f1847 pg_upgrade: show list of files copied only in vebose mode

I think that should be discussed on a different thread.


> 97d7b039b9b vcregress/ci: test modules/contrib with NO_INSTALLCHECK=1

Probably also worth breaking out into a new thread.


> 654b6375401 wip: vcsregress: add alltaptests

I assume this doesn't yet work to a meaningful degree? Last time I checked
there were quite a few tests that needed to be invoked in a specific
directory.  In the meson branch I worked around that by having a wrapper
around the invocation of individual tap tests that changes CWD.

Greetings,

Andres Freund



Re: Adding CI to our tree

От
Justin Pryzby
Дата:
On Thu, Feb 03, 2022 at 11:57:18AM -0800, Andres Freund wrote:
> On 2022-02-02 21:58:28 -0600, Justin Pryzby wrote:
> > FYI: I've rebased these against your cirrus/windows changes.
> 
> Did you put then on a dedicated branch, or only intermixed with other changes?

Yes it's intermixed (because I have too many branches, and because in this case
it's useful to show the doc builds and coverage artifacts).

> > A recent cirrus result is here; this has other stuff in the branch, so you can
> > see the artifacts with HTML docs and code coverage.
> 
> I'm a bit worried about the increased storage and runtime overhead due to the
> docs changes. We probably can make it a good bit cheaper though.

If you mean overhead of additional disk operations, it shouldn't be an issue,
since the git clone uses shared references (not even hardlinks).

If you meant storage capacity, I'm only uploading the *changed* docs as
artifacts.  The motivation being that it's a lot more convenient to look though
a single .html, and not hundreds.

> What's the idea behind
> #echo 'deb http://deb.debian.org/debian bullseye main' >>/etc/apt/sources.list
> That's already in sources.list, so I'm not sure what this shows?

At one point I thought I needed it - maybe all I needed was "apt-get update"..

> > 9d0f03d3450 wip: cirrus: code coverage
>
> I don't think it's good to just unconditionally reference the master branch
> here. It'll do bogus stuff once 15 is branched off. It works for cfbot, but
> not other uses.

That's only used for filtering changed files.  It uses git diff --cherry-pick
postgres/master..., which would *try* to avoid showing anything which is also
in master.

> Is looking at the .c files in the change really a reliable predictor of where
> code coverage changes? I'm doubtful. Consider stuff like removing the last
> user of some infrastructure or such. Or adding the first.

You're right that it isn't particularly accurate, but it's a step forward if
lots of patches use it to check/improve coverge of new code.

In addition to the HTML generated for changed .c files, it's possible to create
a coverage.gcov output for everything, which could be retrieved to generate
full HTML locally.  It's ~8MB (or 2MB after gzip).

> > bff64e8b998 cirrus: upload html docs as artifacts
> > 80f52c3b172 wip: only upload changed docs
> 
> Similar to the above.

Do you mean it's not reliable ?  This is looking at which .html have changed
(not sgml).  Surely that's reliable ?

> > 654b6375401 wip: vcsregress: add alltaptests
> 
> I assume this doesn't yet work to a meaningful degree? Last time I checked
> there were quite a few tests that needed to be invoked in a specific
> directory.

It works - tap_check() does chdir().  But it's slow, and maybe should try to
implement a check-world target.  It currently fails in 027_stream_regress.pl,
although I keep hoping that it had been fixed...
https://cirrus-ci.com/task/6116235950686208

(BTW, I just realized that that commit should also remove the recoverycheck
call.)

-- 
Justin



Re: Adding CI to our tree

От
Andres Freund
Дата:
Hi,

On February 3, 2022 9:04:04 PM PST, Justin Pryzby <pryzby@telsasoft.com> wrote:
>On Thu, Feb 03, 2022 at 11:57:18AM -0800, Andres Freund wrote:
>> On 2022-02-02 21:58:28 -0600, Justin Pryzby wrote:
>> > FYI: I've rebased these against your cirrus/windows changes.
>>
>> What's the idea behind
>> #echo 'deb http://deb.debian.org/debian bullseye main' >>/etc/apt/sources.list
>> That's already in sources.list, so I'm not sure what this shows?
>
>At one point I thought I needed it - maybe all I needed was "apt-get update"..

Yes, that's needed. There's no old pre fetched package list, because it'd be outdated anyway, and work *sometimes* for
somepackages... They're also not small (image size influences job start speed heavily). 


>> > 9d0f03d3450 wip: cirrus: code coverage
>>
>> I don't think it's good to just unconditionally reference the master branch
>> here. It'll do bogus stuff once 15 is branched off. It works for cfbot, but
>> not other uses.
>
>That's only used for filtering changed files.  It uses git diff --cherry-pick
>postgres/master..., which would *try* to avoid showing anything which is also
>in master.

The point is that master is not a relevant point of comparison when a commit in the 14 branch is tested.


Regards,

Andres
--
Sent from my Android device with K-9 Mail. Please excuse my brevity.



Re: Adding CI to our tree

От
Andres Freund
Дата:
Hi,

On 2022-02-03 23:04:04 -0600, Justin Pryzby wrote:
> > I assume this doesn't yet work to a meaningful degree? Last time I checked
> > there were quite a few tests that needed to be invoked in a specific
> > directory.
> 
> It works - tap_check() does chdir().

Ah, I thought you'd implemented a target that does it all in one prove
invocation...


> It currently fails in 027_stream_regress.pl, although I keep hoping that it
> had been fixed...

That's likely because you're not setting REGRESS_OUTPUTDIR like
src/test/recovery/Makefile and recoverycheck() are doing.


Greetings,

Andres Freund



Re: Adding CI to our tree

От
Justin Pryzby
Дата:
On Tue, Jan 18, 2022 at 05:16:26PM -0800, Andres Freund wrote:
> On 2022-01-18 15:08:47 -0600, Justin Pryzby wrote:
> > On Mon, Jan 17, 2022 at 12:16:19PM -0800, Andres Freund wrote:
> > > I think it might still be worth adding stopgap way of running all tap tests on
> > > windows though. Having a vcregress.pl function to find all directories with t/
> > > and run the tests there, shouldn't be a lot of code...
> > 
> > I started doing that, however it makes CI/windows even slower.
...
> > I think it'll be necessary to run prove with all the tap tests to
> > parallelize them, rather than looping around directories, many of which have
> > only a single file, and are run serially.
> 
> That's unfortunately not trivially possible. Quite a few tests currently rely
> on being called in a specific directory. We should fix this, but it's not a
> trivial amount of work.

On Sat, Feb 05, 2022 at 07:23:39PM -0800, Andres Freund wrote:
> On 2022-02-03 23:04:04 -0600, Justin Pryzby wrote:
> > > I assume this doesn't yet work to a meaningful degree? Last time I checked
> > > there were quite a few tests that needed to be invoked in a specific
> > > directory.
> > 
> > It works - tap_check() does chdir().
> 
> Ah, I thought you'd implemented a target that does it all in one prove
> invocation...

I had some success with that, but it doesn't seem to be significantly faster -
it looks a lot like the tests are not actually running in parallel.  I tried
some variations like passing the list of dirs vs the list of files, and
--jobs=9 vs -j9, without success.

https://cirrus-ci.com/task/5580584675180544

https://github.com/justinpryzby/postgres/commit/a865adc5b8c
fc7b3ea8bce vcregress/ci: test modules/contrib with NO_INSTALLCHECK=1
03adb043d16 wip: vcsregress: add alltaptests
63bf0796ffd wip: vcregress: run alltaptests in parallel
9dc327f6b30 f!wip: vcregress: run alltaptests in a single prove invocation
a865adc5b8c tmp: run tap tests first

> > It currently fails in 027_stream_regress.pl, although I keep hoping that it
> > had been fixed...
> 
> That's likely because you're not setting REGRESS_OUTPUTDIR like
> src/test/recovery/Makefile and recoverycheck() are doing.

Yes, thanks.

-- 
Justin



Re: Adding CI to our tree

От
Andres Freund
Дата:
Hi,

On 2022-02-12 16:06:40 -0600, Justin Pryzby wrote:
> I had some success with that, but it doesn't seem to be significantly faster -
> it looks a lot like the tests are not actually running in parallel.

Note that prove unfortunately serializes the test output to be in the order it
started them, even when tests run concurrently. Extremely unhelpful, but ...

Isn't this kind of a good test time? I thought earlier your alltaptests target
took a good bit longer?

One nice bit is that the output is a *lot* easier to read.


You could try improving the total time by having prove remember slow tests and
use that file to run the slowest tests first next time. --state slow,save or
such I believe. Of course we'd have to save that state file...

To verify that tests actually run concurrently you could emit a few
notes. Looks like those are output immediately...

Greetings,

Andres Freund



Re: Adding CI to our tree

От
Andres Freund
Дата:
Hi,

On 2022-02-03 11:57:18 -0800, Andres Freund wrote:
> > 95793675633 cirrus: spell ccache_maxsize
> 
> Yep, will apply with a bunch of your other changes, if you answer a question
> or two...

Pushed.


> > e5286ede1b4 cirrus: avoid unnecessary double star **
> 
> Can't get excited about this, but whatever.
> 
> What I am excited about is that some of your other changes showed that we
> don't need separate *_artifacts for separate directories anymore. That used to
> be the case, but an array of paths is now supported. Putting log, diffs, and
> regress_log in one directory will be considerably more convenient...

pushed together.


> > 398b7342349 cirrus/macos: uname -a and export at end of sysinfo
> 
> Shrug.

Pushed.

Greetings,

Andres Freund



Re: Adding CI to our tree

От
Andres Freund
Дата:
Hi,

Alvaro adding you because the "branch" discussion in the MERGE thread.


On 2022-02-03 23:04:04 -0600, Justin Pryzby wrote:
> > I'm a bit worried about the increased storage and runtime overhead due to the
> > docs changes. We probably can make it a good bit cheaper though.
>
> If you mean overhead of additional disk operations, it shouldn't be an issue,
> since the git clone uses shared references (not even hardlinks).

I was concerned about those and just the increased runtime of the script. Our
sources are 130MB, leaving the shared .git aside. But maybe it's just fine.

We probably can just get rid of the separate clone and configure run though?
Build the docs, copy the output, do a git co parent docs/, build again?


What was the reason behind moving the docs stuff from the compiler warnings
task to linux? Not that either fits very well... I think it might be worth
moving the docs stuff into its own task, using a 1 CPU container (docs build
isn't parallel anyway). Think that'll be easier to see in the cfbot page. I
think it's also good to run it independent of the linux task succeeding - a
docs failure seems like a separate enough issue.


> > > 9d0f03d3450 wip: cirrus: code coverage
> >
> > I don't think it's good to just unconditionally reference the master branch
> > here. It'll do bogus stuff once 15 is branched off. It works for cfbot, but
> > not other uses.
>
> That's only used for filtering changed files.  It uses git diff --cherry-pick
> postgres/master..., which would *try* to avoid showing anything which is also
> in master.

You commented in another email on this:

On 2022-02-11 12:51:50 -0600, Justin Pryzby wrote:
> Because I put your patch on top of some other branch with the CI coverage (and
> other stuff).
>
> It tries to only show files changed by the branch being checked:
> https://github.com/justinpryzby/postgres/commit/d668142040031915
>
> But it has to figure out where the branch "starts".  Which I did by looking at
> "git diff --cherry-pick origin..."
>
> Andres thinks that does the wrong thing if CI is run manually (not by CFBOT)
> for patches against backbranches.  I'm not sure git diff --cherry-pick is
> widely known/used, but I think using that relative to master may be good
> enough.

Note that I'm not concerned about "manually" running CI against other branches
- I'm concerned about the point where where 15 branches off and CI will
automatically also run against 15. E.g. in the postgres repo
https://cirrus-ci.com/github/postgres/postgres/

I can see a few ways to deal with this:
1) iterate over release branches and see which has the smallest diff
2) parse the branch name, if it's a cfbot run, we know it's master, otherwise skip
3) change cfbot so that it maintains things as pull requests, which have a
   base branch


> > Is looking at the .c files in the change really a reliable predictor of where
> > code coverage changes? I'm doubtful. Consider stuff like removing the last
> > user of some infrastructure or such. Or adding the first.
>
> You're right that it isn't particularly accurate, but it's a step forward if
> lots of patches use it to check/improve coverge of new code.

Maybe it's good enough... The overhead in test runtime is noticeable (~5.30m
-> ~7.15m), but probably acceptable. Although I also would like to enable
-fsanitize=alignment and -fsanitize=alignment, which add about 2 minutes as
well (-fsanitize=address is a lot more expensive), they both work best on
linux.


> In addition to the HTML generated for changed .c files, it's possible to create
> a coverage.gcov output for everything, which could be retrieved to generate
> full HTML locally.  It's ~8MB (or 2MB after gzip).

Note sure that doing doing it locally adds much over just running tests
locally.

Greetings,

Andres Freund



Re: Adding CI to our tree

От
Justin Pryzby
Дата:
On Sat, Feb 12, 2022 at 04:24:20PM -0800, Andres Freund wrote:
> > > e5286ede1b4 cirrus: avoid unnecessary double star **
> > 
> > Can't get excited about this, but whatever.
> > 
> > What I am excited about is that some of your other changes showed that we
> > don't need separate *_artifacts for separate directories anymore. That used to
> > be the case, but an array of paths is now supported. Putting log, diffs, and
> > regress_log in one directory will be considerably more convenient...
> 
> pushed together.

While rebasing, I noticed an error.

You wrote **/.diffs, but should be **/*.diffs

--- a/.cirrus.yml
+++ b/.cirrus.yml
@@ -30,15 +30,11 @@ env:
 # What files to preserve in case tests fail
 on_failure: &on_failure
   log_artifacts:
-    path: "**/**.log"
+    paths:
+      - "**/*.log"
+      - "**/.diffs"
+      - "**/regress_log_*"
     type: text/plain
-  regress_diffs_artifacts:
-    path: "**/**.diffs"
-    type: text/plain
-  tap_artifacts:
-    path: "**/regress_log_*"
-    type: text/plain
-



Re: Adding CI to our tree

От
Andres Freund
Дата:
On 2022-02-12 20:47:04 -0600, Justin Pryzby wrote:
> While rebasing, I noticed an error.
> 
> You wrote **/.diffs, but should be **/*.diffs

Embarassing. Thanks for noticing! Pushed the fix...



Re: Adding CI to our tree

От
Justin Pryzby
Дата:
On Sat, Feb 12, 2022 at 05:20:08PM -0800, Andres Freund wrote:
> On 2022-02-03 23:04:04 -0600, Justin Pryzby wrote:
> > > I'm a bit worried about the increased storage and runtime overhead due to the
> > > docs changes. We probably can make it a good bit cheaper though.
> >
> > If you mean overhead of additional disk operations, it shouldn't be an issue,
> > since the git clone uses shared references (not even hardlinks).
> 
> I was concerned about those and just the increased runtime of the script. Our
> sources are 130MB, leaving the shared .git aside. But maybe it's just fine.
> 
> We probably can just get rid of the separate clone and configure run though?
> Build the docs, copy the output, do a git co parent docs/, build again?

Yes - works great, thanks.

> What was the reason behind moving the docs stuff from the compiler warnings
> task to linux?

I wanted to build docs even if the linux task fails.  To allow CFBOT to link to
them, so somoene can always review the docs, in HTML (rather than reading SGML
with lines prefixed with +).

> Not that either fits very well... I think it might be worth
> moving the docs stuff into its own task, using a 1 CPU container (docs build
> isn't parallel anyway). Think that'll be easier to see in the cfbot page. I

Yeah.  The only drawback is the duplication (including the "git parent" stuff).

BTW, docs can be built in parallel, and CI is using BUILD_JOBS: 4.
/usr/bin/xmllint --path . --noout --valid postgres.sgml
/usr/bin/xmllint --path . --noout --valid postgres.sgml
/usr/bin/xsltproc --path . --stringparam pg.version '15devel'  stylesheet.xsl postgres.sgml
/usr/bin/xsltproc --path . --stringparam pg.version '15devel'  stylesheet-man.xsl postgres.sgml

> 1) iterate over release branches and see which has the smallest diff

Maybe for each branch: do echo `git revlist or merge base |wc -l` $branch; done |sort -n |head -1

> > > Is looking at the .c files in the change really a reliable predictor of where
> > > code coverage changes? I'm doubtful. Consider stuff like removing the last
> > > user of some infrastructure or such. Or adding the first.
> >
> > You're right that it isn't particularly accurate, but it's a step forward if
> > lots of patches use it to check/improve coverge of new code.
> 
> Maybe it's good enough... The overhead in test runtime is noticeable (~5.30m
> -> ~7.15m), but probably acceptable. Although I also would like to enable
> -fsanitize=alignment and -fsanitize=alignment, which add about 2 minutes as
> well (-fsanitize=address is a lot more expensive), they both work best on
> linux.

There's other things that it'd be nice to enable, but maybe these don't need to
be on by default.  Maybe just have a list of optional compiler flags (and hints
when they're useful).  Like WRITE_READ_PARSE_PLAN_TREES.

> > In addition to the HTML generated for changed .c files, it's possible to create
> > a coverage.gcov output for everything, which could be retrieved to generate
> > full HTML locally.  It's ~8MB (or 2MB after gzip).
> 
> Note sure that doing doing it locally adds much over just running tests
> locally.

You're right, since one needs to have the patched source files to generate the
HTML.

On Thu, Feb 03, 2022 at 11:57:18AM -0800, Andres Freund wrote:
> I think it may be a bit cleaner to have the "install additional packages"
> "template step" be just that, and not merge in other contents into it?

I renamed the "cores" task since it consistently makes me think you're doing
with CPU cores.  It took it as an opportunity to generalize the task.

These patches are ready for review.  I'll plan to mail about TAP stuff
tomorrow.

Вложения

Re: Adding CI to our tree

От
Andres Freund
Дата:
Hi,

On 2022-02-12 23:19:38 -0600, Justin Pryzby wrote:
> On Sat, Feb 12, 2022 at 05:20:08PM -0800, Andres Freund wrote:
> > What was the reason behind moving the docs stuff from the compiler warnings
> > task to linux?
> 
> I wanted to build docs even if the linux task fails.  To allow CFBOT to link to
> them, so somoene can always review the docs, in HTML (rather than reading SGML
> with lines prefixed with +).

I'd be ok with running the compiler warnings job without the dependency, if
that's the connection.


> BTW, docs can be built in parallel, and CI is using BUILD_JOBS: 4.
> /usr/bin/xmllint --path . --noout --valid postgres.sgml
> /usr/bin/xmllint --path . --noout --valid postgres.sgml
> /usr/bin/xsltproc --path . --stringparam pg.version '15devel'  stylesheet.xsl postgres.sgml
> /usr/bin/xsltproc --path . --stringparam pg.version '15devel'  stylesheet-man.xsl postgres.sgml

Sure, it just doesn't make a difference:

make -j48 -C doc/src/sgml/ maintainer-clean && time make -j48 -C doc/src/sgml/
real    0m34.626s
user    0m34.342s
sys    0m0.298s

make -j48 -C doc/src/sgml/ maintainer-clean && time make -C doc/src/sgml/

real    0m34.780s
user    0m34.494s
sys    0m0.285s



> > 1) iterate over release branches and see which has the smallest diff
> 
> Maybe for each branch: do echo `git revlist or merge base |wc -l` $branch; done |sort -n |head -1
> 
> > > > Is looking at the .c files in the change really a reliable predictor of where
> > > > code coverage changes? I'm doubtful. Consider stuff like removing the last
> > > > user of some infrastructure or such. Or adding the first.
> > >
> > > You're right that it isn't particularly accurate, but it's a step forward if
> > > lots of patches use it to check/improve coverge of new code.
> > 
> > Maybe it's good enough... The overhead in test runtime is noticeable (~5.30m
> > -> ~7.15m), but probably acceptable. Although I also would like to enable
> > -fsanitize=alignment and -fsanitize=alignment, which add about 2 minutes as
> > well (-fsanitize=address is a lot more expensive), they both work best on
> > linux.
> 
> There's other things that it'd be nice to enable, but maybe these don't need to
> be on by default.  Maybe just have a list of optional compiler flags (and hints
> when they're useful).  Like WRITE_READ_PARSE_PLAN_TREES.

I think it'd be good to enable a reasonable set by default. Particularly for
newer contributors stuff like forgetting in/out/readfuncs, or not knowing
about some undefined behaviour, is easy. Probably makes sense to use different
settings on different tasks.

Greetings,

Andres Freund



Re: Adding CI to our tree

От
Robert Haas
Дата:
On Sun, Feb 13, 2022 at 3:30 AM Andres Freund <andres@anarazel.de> wrote:
> > There's other things that it'd be nice to enable, but maybe these don't need to
> > be on by default.  Maybe just have a list of optional compiler flags (and hints
> > when they're useful).  Like WRITE_READ_PARSE_PLAN_TREES.
>
> I think it'd be good to enable a reasonable set by default. Particularly for
> newer contributors stuff like forgetting in/out/readfuncs, or not knowing
> about some undefined behaviour, is easy. Probably makes sense to use different
> settings on different tasks.

This is exactly why I'm not a huge fan of having ci stuff in the tree.
It supposes that there's one right way to do a build, but in reality,
different people want and indeed need to use different options for all
kinds of reasons. That's the whole value of having things like
configure and pg_config_manual.h. When we start arguing about whether
or ci builds should use -DWRITE_READ_PARSE_PLAN_TREES we're inevitably
into the realm where no choice is objectively better, and whoever
yells the loudest will get it the way they want, and then somebody
else later will say "well that's dumb I don't like that" or even just
"well that's not the right thing for testing MY patch," at which point
the previous mailing list discussion will be cited as "precedent" for
what was essentially an arbitrary decision made by 1 or 2 people.

Mind you, I'm not trying to hold back the tide. I realize that in-tree
ci stuff is very much in vogue. But I think it would be a very healthy
thing if we acknowledged that what goes in there is far more arbitrary
than most of what we put in the tree.

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Re: Adding CI to our tree

От
Tom Lane
Дата:
Robert Haas <robertmhaas@gmail.com> writes:
> This is exactly why I'm not a huge fan of having ci stuff in the tree.
> It supposes that there's one right way to do a build, but in reality,
> different people want and indeed need to use different options for all
> kinds of reasons. That's the whole value of having things like
> configure and pg_config_manual.h. When we start arguing about whether
> or ci builds should use -DWRITE_READ_PARSE_PLAN_TREES we're inevitably
> into the realm where no choice is objectively better,

Right.  Can we set things up so that it's not too painful to inject
custom build options into a CI build?  I should think that at the
very least one needs to be able to vary the configure switches and
CPPFLAGS/CFLAGS.

            regards, tom lane



Re: Adding CI to our tree

От
Andres Freund
Дата:
Hi,

On 2022-02-13 12:13:17 -0500, Tom Lane wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
> > This is exactly why I'm not a huge fan of having ci stuff in the tree.
> > It supposes that there's one right way to do a build, but in reality,
> > different people want and indeed need to use different options for all
> > kinds of reasons.

Sure. But why is that an argument against "having ci stuff in the tree"?

All it does is to make sure that a certain base-level of testing is easy to
achieve for everyone. I don't like working on windows or mac, but my patches
often have platform dependent bits. Now it's less likely that I need to
manually interact with windows.

I don't think we can (or well should) replace the buildfarm with the CI
stuff. The buildfarm provides extensive and varied coverage for master/release
branches. Which isn't feasible for unmerged development work, including cfbot,
from a resource usage POV alone.


> > That's the whole value of having things like
> > configure and pg_config_manual.h. When we start arguing about whether
> > or ci builds should use -DWRITE_READ_PARSE_PLAN_TREES we're inevitably
> > into the realm where no choice is objectively better,

> Right.  Can we set things up so that it's not too painful to inject
> custom build options into a CI build?

What kind of injection are you thinking about? A patch author can obviously
just add options in .cirrus.yml. That's something possible now, that was not
possible with cfbot applying its own .cirrus.yml

It'd be nice if there were a way to do it more easily for msvc and configure
builds together, right now it'd require modifying those tasks in different
ways. But that's not really a CI question.


I'd like to have things like -fanitize=aligned and
-DWRITE_READ_PARSE_PLAN_TREES on by default for CI, primarily for cfbot's
benefit. Most patch authors won't know about using
-DWRITE_READ_PARSE_PLAN_TREES etc, so they won't even think about enabling
them. We're *really* not doing well on the "timely review" side of things, so
we at least should not waste time on high latency back-forth for easily
automatically detectable things.


> I should think that at the very least one needs to be able to vary the
> configure switches and CPPFLAGS/CFLAGS.

Do you mean as part of a patch tested with cfbot, CI running for pushes to
your own repository, or ...?

Greetings,

Andres Freund



Re: Adding CI to our tree

От
Tom Lane
Дата:
Andres Freund <andres@anarazel.de> writes:
> On 2022-02-13 12:13:17 -0500, Tom Lane wrote:
>> Right.  Can we set things up so that it's not too painful to inject
>> custom build options into a CI build?

> What kind of injection are you thinking about?

That's exactly what needs to be decided.

> A patch author can obviously
> just add options in .cirrus.yml. That's something possible now, that was not
> possible with cfbot applying its own .cirrus.yml

Fine, but are committers supposed to keep track of the fact that they
shouldn't commit that part of a patch?  I'd prefer something a bit more
out-of-band.  I don't know this technology well enough to propose a way.

> I'd like to have things like -fanitize=aligned and
> -DWRITE_READ_PARSE_PLAN_TREES on by default for CI, primarily for cfbot's
> benefit. Most patch authors won't know about using
> -DWRITE_READ_PARSE_PLAN_TREES etc, so they won't even think about enabling
> them. We're *really* not doing well on the "timely review" side of things, so
> we at least should not waste time on high latency back-forth for easily
> automatically detectable things.

I don't personally have an objection to either of those; maybe Robert
does.

            regards, tom lane



Re: Adding CI to our tree

От
Andres Freund
Дата:
Hi,

On 2022-02-13 15:09:20 -0500, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > On 2022-02-13 12:13:17 -0500, Tom Lane wrote:
> >> Right.  Can we set things up so that it's not too painful to inject
> >> custom build options into a CI build?
> 
> > What kind of injection are you thinking about?
> 
> That's exactly what needs to be decided.
> [...]
> I'd prefer something a bit more out-of-band.  I don't know this technology
> well enough to propose a way.

I don't yet understand the precise use case for adjustments well enough to
propose something. Who would like to adjust something for what purpose?  The
"original" author, for a one-off test? A reviewer / committer, to track down a
hunch?

If it's about CI runs in in a personal repository, one can set additional
environment vars from the CI management interface. We can make sure they work
(the ci stuff probably overrides CFLAGS, but COPT should work) and document
the way to do so.


> > A patch author can obviously
> > just add options in .cirrus.yml. That's something possible now, that was not
> > possible with cfbot applying its own .cirrus.yml
> 
> Fine, but are committers supposed to keep track of the fact that they
> shouldn't commit that part of a patch?

I'd say it depends on the the specific modification - there's some patches
where it seems to make sense to adjust extend CI as part of it and have it
merged. But yea, in other cases committers would have to take them out.


For more on-off stuff one would presumably not want to spam the list the list
with a full patchset to trigger a cfbot run, nor wait till cfbot gets round to
that patch again. When pushing to a personal repo it's of course easy to just
have a commit on-top of what's submitted to play around with compile options.

Greetings,

Andres Freund



Re: Adding CI to our tree

От
Justin Pryzby
Дата:
On Sat, Feb 12, 2022 at 04:24:20PM -0800, Andres Freund wrote:
> > What I am excited about is that some of your other changes showed that we
> > don't need separate *_artifacts for separate directories anymore. That used to
> > be the case, but an array of paths is now supported. Putting log, diffs, and
> > regress_log in one directory will be considerably more convenient...
> 
> pushed together.

This change actually complicates things.

Before, there was log/src/test/recovery/tmp_check/log, with a few files for
001, a few for 002, a few for 003.  This are a lot of output files, but at
least they're all related.

Now, there's a single log/tmp_check/log, which has logs for the entire tap
tests.  There's 3 pages of 001*, 2 pages of 002*, 3 pages of 003, etc.  

-- 
Justin



Re: Adding CI to our tree

От
Andres Freund
Дата:
Hi,

On 2022-02-13 15:02:50 -0600, Justin Pryzby wrote:
> On Sat, Feb 12, 2022 at 04:24:20PM -0800, Andres Freund wrote:
> > > What I am excited about is that some of your other changes showed that we
> > > don't need separate *_artifacts for separate directories anymore. That used to
> > > be the case, but an array of paths is now supported. Putting log, diffs, and
> > > regress_log in one directory will be considerably more convenient...
> >
> > pushed together.
>
> This change actually complicates things.
>
> Before, there was log/src/test/recovery/tmp_check/log, with a few files for
> 001, a few for 002, a few for 003.  This are a lot of output files, but at
> least they're all related.

> Now, there's a single log/tmp_check/log, which has logs for the entire tap
> tests.  There's 3 pages of 001*, 2 pages of 002*, 3 pages of 003, etc.

Hm? Doesn't look like that to me, and I don't see why it would work that way?
This didn't do anything to flatten the directory hierarchy, just combine three
hierarchies into one?

What I see, and what I expect, is that logs end up in
e.g. log/src/test/recovery/tmp_check/log but that that directory contains
regress_log_*, as well as *.log?  Before one needed to go through the
hierarchy multiple times to see both regress_log_ (i.e. tap test log) as well
as 0*.log (i.e. server logs).

A random example: https://cirrus-ci.com/task/5152523873943552 shows the logs
for the failure in log/src/bin/pg_upgrade/tmp_check/log


If you're seeing this on windows on one of your test branches, that's much
more likely to be caused by the alltaptests stuff, than by the change in
artifact instruction.


Greetings,

Andres Freund



Re: Adding CI to our tree

От
Justin Pryzby
Дата:
On Sun, Feb 13, 2022 at 01:23:16PM -0800, Andres Freund wrote:
> If you're seeing this on windows on one of your test branches, that's much
> more likely to be caused by the alltaptests stuff, than by the change in
> artifact instruction.

Oh - I suppose you're right.  That's an unfortunate consequence of running a
single prove instance without chdir.

-- 
Justin



Re: Adding CI to our tree

От
Justin Pryzby
Дата:
On Sat, Feb 12, 2022 at 02:26:25PM -0800, Andres Freund wrote:
> On 2022-02-12 16:06:40 -0600, Justin Pryzby wrote:
> > I had some success with that, but it doesn't seem to be significantly faster -
> > it looks a lot like the tests are not actually running in parallel.

Note that the total test time is close to the sum of the individual test times.
But I think that may be an artifact of how prove is showing/attributing times
to each test (which, if so, is misleading).

> Note that prove unfortunately serializes the test output to be in the order it
> started them, even when tests run concurrently. Extremely unhelpful, but ...

Are you sure ?  When I run it locally, I see:
rm -fr src/test/recovery/tmp_check ; time PERL5LIB=`pwd`/src/test/perl TESTDIR=`pwd`/src/test/recovery
PATH=`pwd`/tmp_install/usr/local/pgsql/bin:$PATHPG_REGRESS=`pwd`/src/test/regress/pg_regress
REGRESS_SHLIB=`pwd`/src/test/regress/regress.soprove --time -j4 --ext '*.pl' `find src -name t`
 
...
[15:34:48] src/bin/scripts/t/101_vacuumdb_all.pl ....................... ok      104 ms ( 0.00 usr  0.00 sys +  2.35
cusr 0.47 csys =  2.82 CPU)
 
[15:34:49] src/bin/scripts/t/090_reindexdb.pl .......................... ok     8894 ms ( 0.06 usr  0.01 sys + 14.45
cusr 3.38 csys = 17.90 CPU)
 
[15:34:50] src/bin/pg_config/t/001_pg_config.pl ........................ ok       79 ms ( 0.00 usr  0.01 sys +  0.23
cusr 0.04 csys =  0.28 CPU)
 
[15:34:50] src/bin/pg_waldump/t/001_basic.pl ........................... ok       35 ms ( 0.00 usr  0.00 sys +  0.26
cusr 0.02 csys =  0.28 CPU)
 
[15:34:51] src/bin/pg_test_fsync/t/001_basic.pl ........................ ok      100 ms ( 0.01 usr  0.00 sys +  0.24
cusr 0.04 csys =  0.29 CPU)
 
[15:34:51] src/bin/pg_archivecleanup/t/010_pg_archivecleanup.pl ........ ok      177 ms ( 0.02 usr  0.00 sys +  0.26
cusr 0.03 csys =  0.31 CPU)
 
[15:34:55] src/bin/scripts/t/100_vacuumdb.pl ........................... ok    11267 ms ( 0.12 usr  0.04 sys + 13.47
cusr 3.20 csys = 16.83 CPU)
 
[15:34:57] src/bin/scripts/t/102_vacuumdb_stages.pl .................... ok     5802 ms ( 0.06 usr  0.01 sys +  7.70
cusr 1.37 csys =  9.14 CPU)
 
...

=> scripts/ stuff, followed by other stuff, followed by more, slower, scripts/ stuff.

But I never saw that in cirrus.

> Isn't this kind of a good test time? I thought earlier your alltaptests target
> took a good bit longer?

The original alltaptests runs in 16m 21s.
https://cirrus-ci.com/task/6679061752709120

2 weeks ago, it was ~14min with your patch to cache initdb.
https://cirrus-ci.com/task/5439320633901056

As I recall, that didn't seem to improve runtime when combined with my parallel
patch.

> One nice bit is that the output is a *lot* easier to read.
> 
> You could try improving the total time by having prove remember slow tests and
> use that file to run the slowest tests first next time. --state slow,save or
> such I believe. Of course we'd have to save that state file...

In a test, this hurt rather than helped (13m 42s).
https://cirrus-ci.com/task/6359167186239488

I'm not surprised - it makes sense to run 10 fast tests at once, but usually
doesn't make sense to run 10 slow tests tests at once (at least a couple of
which are doing something intensive).  It was faster (12m16s) to do it
backwards (fastest tests first).
https://cirrus-ci.com/task/5745115443494912

BTW, does it make sense to remove test_regress_parallel_script ?  The
pg_upgrade run would do the same things, no ?  If so, it might make sense to
run that first.  OTOH, you suggested to run the upgrade tests with checksums
enabled, which seems like a good idea.

Note that in the attached patches, I changed the msvc "warnings" to use "tee".

I don't know how to fix the pipeline test in a less hacky way...

You said the docs build should be a separate task, but then said that it'd be
okay to remove the dependency.  So I did it both ways.  There's currently some
duplication between the docs patch and code coverage patch.

-- 
Justin

Вложения

Re: Adding CI to our tree

От
Andres Freund
Дата:
Hi,

On 2022-02-13 15:31:20 -0600, Justin Pryzby wrote:
> Oh - I suppose you're right.  That's an unfortunate consequence of running a
> single prove instance without chdir.

I don't think it's chdir that's relevant (that changes into the source
directory after all). It's the TESTDIR environment variable.

I was thinking that we should make Utils.pm's INIT block responsible for
figuring out both the directory a test should run in and the log location,
instead having that in vcregress.pl and Makefile.global.in. Mostly because
doing it in the latter means we can't start tests with different TESTDIR and
working dir at the same time.

If instead we pass the location of the top-level build and top-level source
directory from vcregress.pl / Makefile.global, the tap test infrastructure can
figure out that stuff themselves, on a per-test basis.

For msvc builds we probably would need to pass in some information that allow
Utils.pm to set up PATH appropriately. I think that might just require knowing
that a) msvc build system is used b) Release vs Debug.

Greetings,

Andres Freund



Re: Adding CI to our tree

От
Andres Freund
Дата:
Hi,

On 2022-02-13 15:42:13 -0600, Justin Pryzby wrote:
> > Note that prove unfortunately serializes the test output to be in the order it
> > started them, even when tests run concurrently. Extremely unhelpful, but ...
> 
> Are you sure ?

Somewhat. I think it's a question of the prove version and some autodetection
of what type of environment prove is running in (stdin/stdout/stderr). I don't
remember the details, but at some point I pinpointed the source of the
serialization, and verified that parallelization makes a significant
difference in runtime even without being easily visible :(.  But this is all
vague memory, so I might be wrong.

Reminds me that somebody (ugh, me???) should fix the perl > 5.26
incompatibilities on windows, then we'd also get a newer prove...



> > One nice bit is that the output is a *lot* easier to read.
> > 
> > You could try improving the total time by having prove remember slow tests and
> > use that file to run the slowest tests first next time. --state slow,save or
> > such I believe. Of course we'd have to save that state file...
> 
> In a test, this hurt rather than helped (13m 42s).
> https://cirrus-ci.com/task/6359167186239488
> 
> I'm not surprised - it makes sense to run 10 fast tests at once, but usually
> doesn't make sense to run 10 slow tests tests at once (at least a couple of
> which are doing something intensive).  It was faster (12m16s) to do it
> backwards (fastest tests first).
> https://cirrus-ci.com/task/5745115443494912

Hm.

I know I saw significant reduction in test times locally with meson by
starting slow tests earlier, because they're the limiting factor for the
*overall* test runtime - but I have more resources than on cirrus.  Even
locally on a windows VM, with the current buildsystem, I found that moving 027
to earlier withing recoverycheck reduced the test time.

But it's possible that with all tests being scheduled concurrently, starting
the slow tests early leads to sufficient resource overcommit to be
problematic.


> BTW, does it make sense to remove test_regress_parallel_script ?  The
> pg_upgrade run would do the same things, no ?  If so, it might make sense to
> run that first.  OTOH, you suggested to run the upgrade tests with checksums
> enabled, which seems like a good idea.

No, I don't think so. The main regression tests are by far the most important
thing during normal development. Just relying on main regression test runs
embedded in other tests, with different output and config of the main
regression test imo is just confusing.


Greetings,

Andres Freund



Re: Adding CI to our tree

От
Justin Pryzby
Дата:
On Sun, Feb 13, 2022 at 01:53:19PM -0800, Andres Freund wrote:
> Hi,
> 
> On 2022-02-13 15:31:20 -0600, Justin Pryzby wrote:
> > Oh - I suppose you're right.  That's an unfortunate consequence of running a
> > single prove instance without chdir.
> 
> I don't think it's chdir that's relevant (that changes into the source
> directory after all). It's the TESTDIR environment variable.
> 
> I was thinking that we should make Utils.pm's INIT block responsible for
> figuring out both the directory a test should run in and the log location,
> instead having that in vcregress.pl and Makefile.global.in. Mostly because
> doing it in the latter means we can't start tests with different TESTDIR and
> working dir at the same time.
> 
> If instead we pass the location of the top-level build and top-level source
> directory from vcregress.pl / Makefile.global, the tap test infrastructure can
> figure out that stuff themselves, on a per-test basis.
> 
> For msvc builds we probably would need to pass in some information that allow
> Utils.pm to set up PATH appropriately. I think that might just require knowing
> that a) msvc build system is used b) Release vs Debug.

I'm totally unsure if this resembles what you're thinking of, and I'm surprised
I got it working so easily.  But it gets the tap test output in separate dirs,
and CI is passing for everyone (windows failed because I injected a "false" to
force it to upload artifacts).

https://github.com/justinpryzby/postgres/runs/5211673291

commit 899e562102dd7a663cb087cdf88f0f78f8302492
Author: Justin Pryzby <pryzbyj@telsasoft.com>
Date:   Tue Feb 15 20:02:36 2022 -0600

    wip: set TESTDIR from src/test/perl rather than Makefile/vcregress

diff --git a/src/Makefile.global.in b/src/Makefile.global.in
index 05c54b27def..1e49d8c8c37 100644
--- a/src/Makefile.global.in
+++ b/src/Makefile.global.in
@@ -450,7 +450,7 @@ define prove_installcheck
 rm -rf '$(CURDIR)'/tmp_check
 $(MKDIR_P) '$(CURDIR)'/tmp_check
 cd $(srcdir) && \
-   TESTDIR='$(CURDIR)' PATH="$(bindir):$(CURDIR):$$PATH" \
+   PATH="$(bindir):$(CURDIR):$$PATH" \
    PGPORT='6$(DEF_PGPORT)' top_builddir='$(CURDIR)/$(top_builddir)' \
    PG_REGRESS='$(CURDIR)/$(top_builddir)/src/test/regress/pg_regress' \
    $(PROVE) $(PG_PROVE_FLAGS) $(PROVE_FLAGS) $(if $(PROVE_TESTS),$(PROVE_TESTS),t/*.pl)
@@ -460,7 +460,7 @@ define prove_installcheck
 rm -rf '$(CURDIR)'/tmp_check
 $(MKDIR_P) '$(CURDIR)'/tmp_check
 cd $(srcdir) && \
-   TESTDIR='$(CURDIR)' PATH="$(bindir):$(CURDIR):$$PATH" \
+   PATH="$(bindir):$(CURDIR):$$PATH" \
    PGPORT='6$(DEF_PGPORT)' top_builddir='$(top_builddir)' \
    PG_REGRESS='$(top_builddir)/src/test/regress/pg_regress' \
    $(PROVE) $(PG_PROVE_FLAGS) $(PROVE_FLAGS) $(if $(PROVE_TESTS),$(PROVE_TESTS),t/*.pl)
@@ -471,7 +471,7 @@ define prove_check
 rm -rf '$(CURDIR)'/tmp_check
 $(MKDIR_P) '$(CURDIR)'/tmp_check
 cd $(srcdir) && \
-   TESTDIR='$(CURDIR)' $(with_temp_install) PGPORT='6$(DEF_PGPORT)' \
+   $(with_temp_install) PGPORT='6$(DEF_PGPORT)' \
    PG_REGRESS='$(CURDIR)/$(top_builddir)/src/test/regress/pg_regress' \
    $(PROVE) $(PG_PROVE_FLAGS) $(PROVE_FLAGS) $(if $(PROVE_TESTS),$(PROVE_TESTS),t/*.pl)
 endef
diff --git a/src/bin/psql/t/010_tab_completion.pl b/src/bin/psql/t/010_tab_completion.pl
index 005961f34d4..a86dc78a365 100644
--- a/src/bin/psql/t/010_tab_completion.pl
+++ b/src/bin/psql/t/010_tab_completion.pl
@@ -70,7 +70,7 @@ delete $ENV{LS_COLORS};
 # to run in the build directory so that we can use relative paths to
 # access the tmp_check subdirectory; otherwise the output from filename
 # completion tests is too variable.
-if ($ENV{TESTDIR})
+if ($ENV{TESTDIR} && 0)
 {
     chdir $ENV{TESTDIR} or die "could not chdir to \"$ENV{TESTDIR}\": $!";
 }
diff --git a/src/test/modules/libpq_pipeline/t/001_libpq_pipeline.pl
b/src/test/modules/libpq_pipeline/t/001_libpq_pipeline.pl
index facfec5cad4..2a0eca77440 100644
--- a/src/test/modules/libpq_pipeline/t/001_libpq_pipeline.pl
+++ b/src/test/modules/libpq_pipeline/t/001_libpq_pipeline.pl
@@ -49,9 +49,7 @@ for my $testname (@tests)
         my $expected;
         my $result;
 
-        # Hack to allow TESTDIR=. during parallel tap tests
-        my $inputdir = "$ENV{'TESTDIR'}/src/test/modules/libpq_pipeline";
-        $inputdir = "$ENV{'TESTDIR'}" if ! -e $inputdir;
+        my $inputdir = "$ENV{'TESTDIR'}/tmp_check";
         $expected = slurp_file_eval("$inputdir/traces/$testname.trace");
         next unless $expected ne "";
         $result = slurp_file_eval($traceout);
diff --git a/src/test/perl/PostgreSQL/Test/Utils.pm b/src/test/perl/PostgreSQL/Test/Utils.pm
index 57fcb240898..5429de41ed5 100644
--- a/src/test/perl/PostgreSQL/Test/Utils.pm
+++ b/src/test/perl/PostgreSQL/Test/Utils.pm
@@ -184,19 +184,21 @@ INIT
     # test may still fail, but it's more likely to report useful facts.
     $SIG{PIPE} = 'IGNORE';
 
-    # Determine output directories, and create them.  The base path is the
-    # TESTDIR environment variable, which is normally set by the invoking
-    # Makefile.
-    $tmp_check = $ENV{TESTDIR} ? "$ENV{TESTDIR}/tmp_check" : "tmp_check";
+    my $test_dir = File::Spec->rel2abs(dirname($0));
+    my $test_name = basename($0);
+    $test_name =~ s/\.[^.]+$//;
+
+    # Determine output directories, and create them.
+    # TODO: set TESTDIR and srcdir?
+    $tmp_check = "$test_dir/tmp_check";
     $log_path = "$tmp_check/log";
+    $ENV{TESTDIR} = $test_dir;
 
     mkdir $tmp_check;
     mkdir $log_path;
 
     # Open the test log file, whose name depends on the test name.
-    $test_logfile = basename($0);
-    $test_logfile =~ s/\.[^.]+$//;
-    $test_logfile = "$log_path/regress_log_$test_logfile";
+    $test_logfile = "$log_path/regress_log_$test_name";
     open my $testlog, '>', $test_logfile
       or die "could not open STDOUT to logfile \"$test_logfile\": $!";
 
diff --git a/src/tools/msvc/vcregress.pl b/src/tools/msvc/vcregress.pl
index fdb6f44eded..d7794c5766a 100644
--- a/src/tools/msvc/vcregress.pl
+++ b/src/tools/msvc/vcregress.pl
@@ -261,10 +261,8 @@ sub tap_check
     $ENV{PG_REGRESS}    = "$topdir/$Config/pg_regress/pg_regress";
     $ENV{REGRESS_SHLIB} = "$topdir/src/test/regress/regress.dll";
 
-    $ENV{TESTDIR} = "$dir";
     my $module = basename $dir;
-    # add the module build dir as the second element in the PATH
-    $ENV{PATH} =~ s!;!;$topdir/$Config/$module;!;
+    $ENV{VCREGRESS_MODE} = $Config;
 
     print "============================================================\n";
     print "Checking @args\n";



Re: Adding CI to our tree

От
Andres Freund
Дата:
Hi,

On February 15, 2022 10:12:36 PM PST, Justin Pryzby <pryzby@telsasoft.com> wrote:
>On Sun, Feb 13, 2022 at 01:53:19PM -0800, Andres Freund wrote:
>> Hi,
>>
>> On 2022-02-13 15:31:20 -0600, Justin Pryzby wrote:
>> > Oh - I suppose you're right.  That's an unfortunate consequence of running a
>> > single prove instance without chdir.
>>
>> I don't think it's chdir that's relevant (that changes into the source
>> directory after all). It's the TESTDIR environment variable.
>>
>> I was thinking that we should make Utils.pm's INIT block responsible for
>> figuring out both the directory a test should run in and the log location,
>> instead having that in vcregress.pl and Makefile.global.in. Mostly because
>> doing it in the latter means we can't start tests with different TESTDIR and
>> working dir at the same time.
>>
>> If instead we pass the location of the top-level build and top-level source
>> directory from vcregress.pl / Makefile.global, the tap test infrastructure can
>> figure out that stuff themselves, on a per-test basis.
>>
>> For msvc builds we probably would need to pass in some information that allow
>> Utils.pm to set up PATH appropriately. I think that might just require knowing
>> that a) msvc build system is used b) Release vs Debug.
>
>I'm totally unsure if this resembles what you're thinking of, and I'm surprised
>I got it working so easily.  But it gets the tap test output in separate dirs,
>and CI is passing for everyone (windows failed because I injected a "false" to
>force it to upload artifacts).
>
>https://github.com/justinpryzby/postgres/runs/5211673291

Yes, that's along the lines I was thinking. I only checked it on my phone, so it certainly isn't a careful look...

I think this should be discussed in a separate thread, for visibility.

FWIW, I'd like to additionally add marker files in INIT and remove them in END. And create files signaling success and
failurein END. That would allow automated selection of log files of failed tests... 

Andres

Regards,

Andres
--
Sent from my Android device with K-9 Mail. Please excuse my brevity.



Re: Adding CI to our tree

От
Peter Eisentraut
Дата:
On 13.02.22 09:30, Andres Freund wrote:
>> BTW, docs can be built in parallel, and CI is using BUILD_JOBS: 4.
>> /usr/bin/xmllint --path . --noout --valid postgres.sgml
>> /usr/bin/xmllint --path . --noout --valid postgres.sgml
>> /usr/bin/xsltproc --path . --stringparam pg.version '15devel'  stylesheet.xsl postgres.sgml
>> /usr/bin/xsltproc --path . --stringparam pg.version '15devel'  stylesheet-man.xsl postgres.sgml
> Sure, it just doesn't make a difference:
> 
> make -j48 -C doc/src/sgml/ maintainer-clean && time make -j48 -C doc/src/sgml/
> real    0m34.626s
> user    0m34.342s
> sys    0m0.298s
> 
> make -j48 -C doc/src/sgml/ maintainer-clean && time make -C doc/src/sgml/
> 
> real    0m34.780s
> user    0m34.494s
> sys    0m0.285s

Note that the default target in doc/src/sgml/ is "html", not "all".  If 
you build "all", you build "html" plus "man", which can be run in 
parallel.  (It's only two jobs, of course.)  If you're more ambitious, 
you could also run the PDF builds.




Re: Adding CI to our tree (ccache)

От
Justin Pryzby
Дата:
Have you tried to use the yet-to-be-released ccache with MSVC ?

Also, do you know about msbuild /outputResultsCache ?
When I tried that, it gave a bunch of error.

https://cirrus-ci.com/task/5697497241747456

|[16:35:13.605]      1>c:\cirrus\pgsql.sln.metaproj : error : MSB4252: Project "c:\cirrus\pgsql.sln" with global
properties[c:\cirrus\pgsql.sln] 
|[16:35:13.615] c:\cirrus\pgsql.sln.metaproj : error :     (TrackFileAccess=false; CLToolExe=clcache.exe)
[c:\cirrus\pgsql.sln]
|[16:35:13.615] c:\cirrus\pgsql.sln.metaproj : error :     is building project "c:\cirrus\initdb.vcxproj" with global
properties[c:\cirrus\pgsql.sln] 
|[16:35:13.615] c:\cirrus\pgsql.sln.metaproj : error :     (TrackFileAccess=false; CLToolExe=clcache.exe;
BuildingSolutionFile=true;CurrentSolutionConfigurationContents=<SolutionConfiguration> [c:\cirrus\pgsql.sln] 
|[16:35:13.615] c:\cirrus\pgsql.sln.metaproj : error :   <ProjectConfiguration
Project="{1BD4D6DB-9B78-4A46-B2A7-04508802E281}"AbsolutePath="c:\cirrus\initdb.vcxproj"
BuildProjectInSolution="True">Debug|x64</ProjectConfiguration>[c:\cirrus\pgsql.sln] 
|...
|[16:35:14.518]        c:\cirrus\pgsql.sln.metaproj : error :   <ProjectConfiguration
Project="{7E9336CA-5E94-4D99-9D34-BF65ED440A6F}"AbsolutePath="c:\cirrus\euc2004_sjis2004.vcxproj"
BuildProjectInSolution="True">Debug|x64</ProjectConfiguration>[c:\cirrus\pgsql.sln] 
|[16:35:14.518]        c:\cirrus\pgsql.sln.metaproj : error : </SolutionConfiguration>; SolutionDir=c:\cirrus\;
SolutionExt=.sln;SolutionFileName=pgsql.sln; SolutionName=pgsql; SolutionPath=c:\cirrus\pgsql.sln; Configuration=Debug;
Platform=x64)[c:\cirrus\pgsql.sln] 
|[16:35:14.518]        c:\cirrus\pgsql.sln.metaproj : error :     with the (default) target(s) but the build result for
thebuilt project is not in the engine cache. In isolated builds this could mean one of the following:
[c:\cirrus\pgsql.sln]
|[16:35:14.518]        c:\cirrus\pgsql.sln.metaproj : error :     - the reference was called with a target which is not
specifiedin the ProjectReferenceTargets item in project "c:\cirrus\pgsql.sln" [c:\cirrus\pgsql.sln] 
|[16:35:14.518]        c:\cirrus\pgsql.sln.metaproj : error :     - the reference was called with global properties
thatdo not match the static graph inferred nodes [c:\cirrus\pgsql.sln] 
|[16:35:14.518]        c:\cirrus\pgsql.sln.metaproj : error :     - the reference was not explicitly specified as a
ProjectReferenceitem in project "c:\cirrus\pgsql.sln" [c:\cirrus\pgsql.sln] 
|[16:35:14.518]        c:\cirrus\pgsql.sln.metaproj : error :      [c:\cirrus\pgsql.sln]
|[16:35:14.518]
|[16:35:14.518]     0 Warning(s)
|[16:35:14.518]     149 Error(s)

Did you ever try to use clcache (or others) ?

When I tried, it refused to cache because of our debug settings
(DebugInformationFormat) - which seem to be enabled even in release mode.

I wonder if that'll be an issue for ccache, too.  I think that line may need to
be conditional on debug mode.

https://cirrus-ci.com/task/4808554103177216

|[17:14:28.765]   C:\ProgramData\chocolatey\lib\clcache\clcache\clcache.py Expanded commandline '['/c',
'/Isrc/include','/Isrc/include/port/win32', '/Isrc/include/port/win32_msvc', '/Ic:/openssl/1.1/\\include', '/Zi',
'/nologo','/W3', '/WX-', '/diagnostics:column', '/Ox', '/D', 'WIN32', '/D', '_WINDOWS', '/D', '__WINDOWS__', '/D',
'__WIN32__','/D', 'WIN32_STACK_RLIMIT=4194304', '/D', '_CRT_SECURE_NO_DEPRECATE', '/D', '_CRT_NONSTDC_NO_DEPRECATE',
'/D','FRONTEND', '/D', '_MBCS', '/GF', '/Gm-', '/EHsc', '/MD', '/GS', '/fp:precise', '/Zc:wchar_t', '/Zc:forScope',
'/Zc:inline','/Fo.\\Release\\libpgcommon\\', '/Fd.\\Release\\libpgcommon\\libpgcommon.pdb', '/external:W3', '/Gd',
'/TC','/wd4018', '/wd4244', '/wd4273', '/wd4101', '/wd4102', '/wd4090', '/wd4267', '/FC', '/errorReport:queue', '/MP',
'src/common/archive.c','src/common/base64.c', 'src/common/checksum_helper.c', 'src/common/config_info.c',
'src/common/controldata_utils.c','src/common/cryptohash_openssl.c', 'src/common/d2s.c', 'src/common/encnames.c',
'src/common/exec.c','src/common/f2s.c', 'src/common/fe_memutils.c', 'src/common/file_perm.c',
'src/common/file_utils.c','src/common/hashfn.c', 'src/common/hmac_openssl.c', 'src/common/ip.c',
'src/common/jsonapi.c','src/common/keywords.c', 'src/common/kwlookup.c', 'src/common/link-canary.c',
'src/common/logging.c','src/common/md5_common.c', 'src/common/pg_get_line.c', 'src/common/pg_lzcompress.c',
'src/common/pg_prng.c','src/common/pgfnames.c', 'src/common/protocol_openssl.c', 'src/common/psprintf.c',
'src/common/relpath.c','src/common/restricted_token.c', 'src/common/rmtree.c', 'src/common/saslprep.c',
'src/common/scram-common.c','src/common/sprompt.c', 'src/common/string.c', 'src/common/stringinfo.c',
'src/common/unicode_norm.c','src/common/username.c', 'src/common/wait_error.c', 'src/common/wchar.c']' 
|[17:14:28.765]   C:\ProgramData\chocolatey\lib\clcache\clcache\clcache.py Cannot cache invocation as ['/c',
'/Isrc/include','/Isrc/include/port/win32', '/Isrc/include/port/win32_msvc', '/Ic:/openssl/1.1/\\include', '/Zi',
'/nologo','/W3', '/WX-', '/diagnostics:column', '/Ox', '/D', 'WIN32', '/D', '_WINDOWS', '/D', '__WINDOWS__', '/D',
'__WIN32__','/D', 'WIN32_STACK_RLIMIT=4194304', '/D', '_CRT_SECURE_NO_DEPRECATE', '/D', '_CRT_NONSTDC_NO_DEPRECATE',
'/D','FRONTEND', '/D', '_MBCS', '/GF', '/Gm-', '/EHsc', '/MD', '/GS', '/fp:precise', '/Zc:wchar_t', '/Zc:forScope',
'/Zc:inline','/Fo.\\Release\\libpgcommon\\', '/Fd.\\Release\\libpgcommon\\libpgcommon.pdb', '/external:W3', '/Gd',
'/TC','/wd4018', '/wd4244', '/wd4273', '/wd4101', '/wd4102', '/wd4090', '/wd4267', '/FC', '/errorReport:queue', '/MP',
'src/common/archive.c','src/common/base64.c', 'src/common/checksum_helper.c', 'src/common/config_info.c',
'src/common/controldata_utils.c','src/common/cryptohash_openssl.c', 'src/common/d2s.c', 'src/common/encnames.c',
'src/common/exec.c','src/common/f2s.c', 'src/common/fe_memutils.c', 'src/common/file_perm.c',
'src/common/file_utils.c','src/common/hashfn.c', 'src/common/hmac_openssl.c', 'src/common/ip.c',
'src/common/jsonapi.c','src/common/keywords.c', 'src/common/kwlookup.c', 'src/common/link-canary.c',
'src/common/logging.c','src/common/md5_common.c', 'src/common/pg_get_line.c', 'src/common/pg_lzcompress.c',
'src/common/pg_prng.c','src/common/pgfnames.c', 'src/common/protocol_openssl.c', 'src/common/psprintf.c',
'src/common/relpath.c','src/common/restricted_token.c', 'src/common/rmtree.c', 'src/common/saslprep.c',
'src/common/scram-common.c','src/common/sprompt.c', 'src/common/string.c', 'src/common/stringinfo.c',
'src/common/unicode_norm.c','src/common/username.c', 'src/common/wait_error.c', 'src/common/wchar.c']: external debug
information(/Zi) is not supported 

--
Justin



Re: Adding CI to our tree (ccache)

От
Andres Freund
Дата:
Hi,

On 2022-02-20 13:36:55 -0600, Justin Pryzby wrote:
> Have you tried to use the yet-to-be-released ccache with MSVC ?

Yes, it doesn't work, because it requires cl.exe to be used in a specific way
(only a single input file, specific output file naming). Which would require a
decent amount of changes to src/tools/msvc. I think it's more realistic with
meson etc.


> Also, do you know about msbuild /outputResultsCache ?

I don't think it's really usable for what we need. But it's hard to tell.


> Did you ever try to use clcache (or others) ?
> 
> When I tried, it refused to cache because of our debug settings
> (DebugInformationFormat) - which seem to be enabled even in release mode.

> I wonder if that'll be an issue for ccache, too.  I think that line may need to
> be conditional on debug mode.

That's relatively easily solvable by using a different debug format IIRC (/Z7
or such).

Greetings,

Andres Freund



Re: Adding CI to our tree (ccache)

От
Justin Pryzby
Дата:
On Sun, Feb 20, 2022 at 12:47:31PM -0800, Andres Freund wrote:
> > Did you ever try to use clcache (or others) ?
> > 
> > When I tried, it refused to cache because of our debug settings
> > (DebugInformationFormat) - which seem to be enabled even in release mode.
> 
> > I wonder if that'll be an issue for ccache, too.  I think that line may need to
> > be conditional on debug mode.
> 
> That's relatively easily solvable by using a different debug format IIRC (/Z7
> or such).

Yes. I got that working for CI by overriding with a value from the environment.
https://cirrus-ci.com/task/6191974075072512

This is right after rebasing, so it doesn't save anything, but normally cuts
build time to 90sec, which isn't impressive, but it's something.

BTW, I think it's worth compiling the windows build with optimizations (as I
did here).  At least with all the tap tests, this pays for itself.  I suppose
you don't want to use a Release build, but optimizations could be enabled by
an(other) environment variable.

-- 
Justin



Re: Adding CI to our tree

От
Justin Pryzby
Дата:

Re: Adding CI to our tree

От
Andres Freund
Дата:
Hi,

> Subject: [PATCH 2/7] cirrus: upload changed html docs as artifacts

I still think the determination of the base branch needs to be resolved before
this can be considered.


> Always run doc build; to allow them to be shown in cfbot, they should not be
> skipped if the linux build fails.
> 
> This could be done on the client side (cfbot).  One advantage of doing it here
> is that fewer docs are uploaded - many patches won't upload docs at all.

Imo this stuff is largely independent from the commit subject....


> XXX: if this is run in the same task, the configure flags should probably be
> consistent ?

What do you mean?


> Subject: [PATCH 3/7] s!build docs as a separate task..

Could you reorder this to earlier, then we can merge it before resolving the
branch issues. And we don't waffle on the CompilerWarnings dependency.


> I believe this'll automatically show up as a separate "column" on the cfbot
> page.

Yup.


> +# Verify docs can be built, and upload changed docs as artifacts
> +task:
> +  name: HTML docs
> +
> +  env:
> +    CPUS: 1
> +
> +  only_if: $CIRRUS_CHANGE_MESSAGE !=~ '.*\nci-os-only:.*' || $CIRRUS_CHANGE_MESSAGE =~
'.*\nci-os-only:[^\n]*(docs|html).*'
> +
> +  container:
> +    image: $CONTAINER_REPO/linux_debian_bullseye_ci:latest
> +    cpu: $CPUS
> +

how about using something like (the syntax might be slightly off)
  skip: !changesInclude('doc/**')
to avoid running it for the many pushes where no docs are changed?


> +  sysinfo_script: |
> +    id
> +    uname -a
> +    cat /proc/cmdline
> +    ulimit -a -H && ulimit -a -S
> +    export
> +
> +    git remote -v
> +    git branch -a
> +    git remote add postgres https://github.com/postgres/postgres
> +    time git fetch -v postgres master
> +    git log -1 postgres/master
> +    git diff --name-only postgres/master..

Hardly "sysinfo"?


> Subject: [PATCH 4/7] wip: cirrus: code coverage
> 
> XXX: lcov should be installed in the OS image

FWIW, you can open a PR in https://github.com/anarazel/pg-vm-images/
both the debian docker and VM have their packages installed
via scripts/linux_debian_install_deps.sh


> From 226699150e3e224198fc297689add21bece51c4b Mon Sep 17 00:00:00 2001
> From: Justin Pryzby <pryzbyj@telsasoft.com>
> Date: Sun, 9 Jan 2022 18:25:02 -0600
> Subject: [PATCH 5/7] cirrus/vcregress: test modules/contrib with
>  NO_INSTALLCHECK=1

I don't want to commit the vcregress.pl part myself. But if you split off I'm
happy to push the --temp-config bits.


> From 08933bcd93d4f57ad73ab6df2f1627b93e61b459 Mon Sep 17 00:00:00 2001
> From: Justin Pryzby <pryzbyj@telsasoft.com>
> Date: Sun, 16 Jan 2022 12:51:13 -0600
> Subject: [PATCH 6/7] wip: cirrus/windows: save tmp_install as an artifact..
> 
> ..to allow users to easily download compiled binaries to try a patch.
> If they don't have a development environment handy or not familiar with
> compilation.
> 
> XXX: maybe this should be conditional or commented out ?

Yea, I don't want to do this by default, that's a fair bit of data that very
likely nobody will ever access. One can make entire tasks triggered manually,
but that'd then require building again :/.



> From a7d2bba6f51d816412fb645b0d4821c36ee5c400 Mon Sep 17 00:00:00 2001
> From: Justin Pryzby <pryzbyj@telsasoft.com>
> Date: Sun, 20 Feb 2022 15:01:59 -0600
> Subject: [PATCH 7/7] wip: cirrus/windows: add compiler_warnings_script
> 
> I'm not sure how to write this test in windows shell; it's also not easy to
> write it in posix sh, since windows shell is somehow interpretting && and ||...

You could put the script in src/tools/ci and call it from the script to avoid
the quoting issues.

Would be good to add a comment explaining the fileLoggerParameters1 thing and
a warning that compiler_warnings_script should be the last script.


Greetings,

Andres Freund



Re: Adding CI to our tree

От
Andres Freund
Дата:
On 2022-02-26 17:09:08 -0800, Andres Freund wrote:
> You could put the script in src/tools/ci and call it from the script to avoid
> the quoting issues.

Might also be a good idea for the bulk of the docs / coverage stuff, even if
there are no quoting issues.



Re: Adding CI to our tree

От
Justin Pryzby
Дата:
On Sat, Feb 26, 2022 at 05:09:08PM -0800, Andres Freund wrote:
> > XXX: if this is run in the same task, the configure flags should probably be
> > consistent ?
> 
> What do you mean?

I mean that commit to run CompilerWarnings unconditionally built docs with
different flags than the other stuff in that task.  If it's going to be a
separate task, then that doesn't matter.

> > +# Verify docs can be built, and upload changed docs as artifacts
> > +task:
> > +  name: HTML docs
> > +
> > +  env:
> > +    CPUS: 1
> > +
> > +  only_if: $CIRRUS_CHANGE_MESSAGE !=~ '.*\nci-os-only:.*' || $CIRRUS_CHANGE_MESSAGE =~
'.*\nci-os-only:[^\n]*(docs|html).*'
> > +
> > +  container:
> > +    image: $CONTAINER_REPO/linux_debian_bullseye_ci:latest
> > +    cpu: $CPUS
> > +
> 
> how about using something like (the syntax might be slightly off)
>   skip: !changesInclude('doc/**')
> to avoid running it for the many pushes where no docs are changed?

This doesn't do the right thing - I just tried.
https://cirrus-ci.org/guide/writing-tasks/#environment-variables
| changesInclude function can be very useful for skipping some tasks when no changes to sources have been made since
thelast successful Cirrus CI build.
 

That means it will not normally rebuild docs (and then this still requires
resolving the "base branch").

-- 
Justin



Re: Adding CI to our tree

От
Andres Freund
Дата:
Hi,

On 2022-02-26 20:43:52 -0600, Justin Pryzby wrote:
> This doesn't do the right thing - I just tried.
> https://cirrus-ci.org/guide/writing-tasks/#environment-variables
> | changesInclude function can be very useful for skipping some tasks when no changes to sources have been made since
thelast successful Cirrus CI build.
 

> That means it will not normally rebuild docs (and then this still requires
> resolving the "base branch").

Why would we want to rebuild docs if they're the same as in the last build for
the same branch?  For cfbot purposes each commit is independent from the prior
commit, so it should rebuild it every time if the CF entry has changes to the
docs.

Greetings,

Andres Freund



Re: Adding CI to our tree

От
Justin Pryzby
Дата:
On Sat, Feb 26, 2022 at 06:50:00PM -0800, Andres Freund wrote:
> Hi,
> 
> On 2022-02-26 20:43:52 -0600, Justin Pryzby wrote:
> > This doesn't do the right thing - I just tried.
> > https://cirrus-ci.org/guide/writing-tasks/#environment-variables
> > | changesInclude function can be very useful for skipping some tasks when no changes to sources have been made
sincethe last successful Cirrus CI build.
 
> 
> > That means it will not normally rebuild docs (and then this still requires
> > resolving the "base branch").
> 
> Why would we want to rebuild docs if they're the same as in the last build for
> the same branch?  For cfbot purposes each commit is independent from the prior
> commit, so it should rebuild it every time if the CF entry has changes to the
> docs.

I did git commit --amend --no-edit and repushed to github to trigger a new CI
run, and it did this: https://github.com/justinpryzby/postgres/runs/5347878714

This is in a branch with changes to doc.  I wasn't intending it to skip
building docs on this branch just because the same, changed docs were
previously built.

Why wouldn't the docs be built following the same logic as the rest of the
sources ?  If someone renames or removes an xref target, shouldn't CI fail on
its next run for a patch which tries to reference it ?  It would fail on the
buildfarm, and I think one major use for the CI is to minimize the post-push
cleanup cycles.

Are you sure about cfbot ?  AIUI cirrus would see that docs didn't change
relative to the previous run for branch: commitfest/NN/MMMM.

-- 
Justin



Re: Adding CI to our tree

От
Andres Freund
Дата:
Hi,

On 2022-02-26 21:10:57 -0600, Justin Pryzby wrote:
> I did git commit --amend --no-edit and repushed to github to trigger a new CI
> run, and it did this: https://github.com/justinpryzby/postgres/runs/5347878714
>
> This is in a branch with changes to doc.  I wasn't intending it to skip
> building docs on this branch just because the same, changed docs were
> previously built.

But why not? If nothing in docs/ changes, there's little point? It'd probably
be good to check .cirrus.yml and docs/**, to make sure that .cirrus logic
changes rerun as well.


> Why wouldn't the docs be built following the same logic as the rest of the
> sources?

Tests have a certain rate of spurious failure, so rerunning them makes
sense. But what do we gain by rebuilding the docs? And especially, what do we
gain about uploading the docs e.g. in the postgres/postgres repo?


> If someone renames or removes an xref target, shouldn't CI fail on its next
> run for a patch which tries to reference it ?

Why wouldn't it?


> It would fail on the buildfarm, and I think one major use for the CI is to
> minimize the post-push cleanup cycles.

I personally see it more as access to a "mini buildfarm" before patches are
committable, but that's of course compatible.


> Are you sure about cfbot ?  AIUI cirrus would see that docs didn't change
> relative to the previous run for branch: commitfest/NN/MMMM.

Not entirely sure, but it's what I remember observing when ammending commits
in a repo using changesInclues. If I were to build a feature like it I'd look
at the list of files of
  git diff $(git merge-base last_green new_commit)..new_commit

or such.  cfbot doesn't commit incrementally but replaces the prior commit, so
I suspect it'll always be viewn as new. But who knows, shouldn't be hard to
figure out.

Greetings,

Andres Freund



Re: Adding CI to our tree

От
Justin Pryzby
Дата:
On Sat, Feb 26, 2022 at 08:08:38PM -0800, Andres Freund wrote:
> On 2022-02-26 21:10:57 -0600, Justin Pryzby wrote:
> > If someone renames or removes an xref target, shouldn't CI fail on its next
> > run for a patch which tries to reference it ?
> 
> Why wouldn't it?

I suppose you're right - I was thinking that cirrus was checking whether the
*patch* had changed any matching files, but it probably checks (as it should)
whether "the sources" have changed.

Hmm, it's behaving strangely...if there's a single argument ('docs/**'), then
it will skip the docs task if I resubmit it after git commit --amend --no-edit.
But with multiple args ('.cirrus.yaml', 'docs/**') it reruns it...
I tried it as skip: !changesInclude() and by adding it to the existing only_if:
(.. || ..) && changesInclude().

> > Are you sure about cfbot ?  AIUI cirrus would see that docs didn't change
> > relative to the previous run for branch: commitfest/NN/MMMM.
> 
> Not entirely sure, but it's what I remember observing when ammending commits
> in a repo using changesInclues. If I were to build a feature like it I'd look
> at the list of files of
>   git diff $(git merge-base last_green new_commit)..new_commit
> 
> or such.  cfbot doesn't commit incrementally but replaces the prior commit, so
> I suspect it'll always be viewn as new. But who knows, shouldn't be hard to
> figure out.

Anyway...

I still think that if "Build Docs" is a separate cirrus task, it should rebuild
docs on every CI run, even if they haven't changed, for any patch that touches
docs/.  It'll be confusing if cfbot shows 5 green circles and 4 of them were
built 1 day ago, and 1 was built 3 weeks ago.  Docs are the task that runs
quickest, so I don't think it's worth doing anything special there (especially
without understanding the behavior of changesInclude()).

Also, to allow users to view the built HTML docs, cfbot would need to 1) keep
track of previous CI runs; and 2) logic to handle "skipped" CI runs, to allow
showing artifacts from the previous run.  If it's not already done, I think the
first half is a good idea on its own.  But the 2nd part doesn't seem desirable.

However, I realized that we can filter on cfbot with either of these:
| $CIRRUS_CHANGE_TITLE =~ '^\[CF...'
| git log -1 |grep '^Author: Commitfest Bot <cfbot@cputube.org>'
If we can assume that cfbot will continue submitting branches as a single
patch, this resolves the question of a "base branch", for cfbot.

(Actually, I'd prefer if it preserved the original patches as separate commits,
but that isn't what it does).  

These patches implement that idea, and make "code coverage" and "HTML diffs"
stuff only run for cfbot commits.  This still needs another round of testing,
though.

-- 
Justin

PS. I've just done this.  I'm unsure whether to say that it's wonderful or
terrible.  This would certainly be better if each branch preserved the original
set of patches.

$ git remote add cfbot https://github.com/postgresql-cfbot/postgresql
$ git fetch cfbot
$ git branch -a |grep -c cfbot
5417

Вложения

Re: Adding CI to our tree

От
Justin Pryzby
Дата:
On Mon, Feb 28, 2022 at 02:58:02PM -0600, Justin Pryzby wrote:
> I still think that if "Build Docs" is a separate cirrus task, it should rebuild
> docs on every CI run, even if they haven't changed, for any patch that touches
> docs/.  It'll be confusing if cfbot shows 5 green circles and 4 of them were
> built 1 day ago, and 1 was built 3 weeks ago.  Docs are the task that runs
> quickest, so I don't think it's worth doing anything special there (especially
> without understanding the behavior of changesInclude()).
> 
> Also, to allow users to view the built HTML docs, cfbot would need to 1) keep
> track of previous CI runs; and 2) logic to handle "skipped" CI runs, to allow
> showing artifacts from the previous run.  If it's not already done, I think the
> first half is a good idea on its own.  But the 2nd part doesn't seem desirable.

Maybe changesInclude() could work if we use this URL (from cirrus'
documentation), which uses the artifacts from the last successful build.

https://api.cirrus-ci.com/v1/artifact/github/justinpryzby/postgres/Documentation/html_docs/html_docs/00-doc.html?branch=citest-cirrus2

That requires knowing the file being modified, so we'd have to generate an
index of changed files - which I've started doing here.

> However, I realized that we can filter on cfbot with either of these:
> | $CIRRUS_CHANGE_TITLE =~ '^\[CF...'
> | git log -1 |grep '^Author: Commitfest Bot <cfbot@cputube.org>'
> If we can assume that cfbot will continue submitting branches as a single
> patch, this resolves the question of a "base branch", for cfbot.

I don't know what you think of that idea, but I think I want to amend my
proposal: show HTML and coverage artifacts for HEAD~1, unless set otherwise by
an environment var.  Today, that'd do the right thing for cfbot, and also for
any 1-patch commits.

> These patches implement that idea, and make "code coverage" and "HTML diffs"
> stuff only run for cfbot commits.  This still needs another round of testing,
> though.

The patch was missing a file due to an issue while rebasing - oops.

BTW (regarding the last patch), I just noticed that -Og optimization can cause
warnings with gcc-4.8.5-39.el7.x86_64.

be-fsstubs.c: In function 'be_lo_export':
be-fsstubs.c:522:24: warning: 'fd' may be used uninitialized in this function [-Wmaybe-uninitialized]
  if (CloseTransientFile(fd) != 0)
                        ^
trigger.c: In function 'ExecCallTriggerFunc':
trigger.c:2400:2: warning: 'result' may be used uninitialized in this function [-Wmaybe-uninitialized]
  return (HeapTuple) DatumGetPointer(result);
  ^
xml.c: In function 'xml_pstrdup_and_free':
xml.c:1205:2: warning: 'result' may be used uninitialized in this function [-Wmaybe-uninitialized]
  return result;

-- 
Justin

Вложения

Re: Adding CI to our tree (ccache)

От
Justin Pryzby
Дата:
On Sun, Feb 20, 2022 at 12:47:31PM -0800, Andres Freund wrote:
> On 2022-02-20 13:36:55 -0600, Justin Pryzby wrote:
> > Have you tried to use the yet-to-be-released ccache with MSVC ?
> 
> Yes, it doesn't work, because it requires cl.exe to be used in a specific way
> (only a single input file, specific output file naming). Which would require a
> decent amount of changes to src/tools/msvc. I think it's more realistic with
> meson etc.

Did you get to the point that that causes a problem, or did you just realize
that it was a limitation that seems to preclude its use ?  If so, could you
send the branch/commit you had ?

The error I'm getting when I try to use ccache involves .rst files, which don't
exist (and which ccache doesn't know how to find or ignore).
https://cirrus-ci.com/task/5441491957972992

I gather this is the difference between "compiling with MSVC" and compiling
with a visual studio project.



Re: Adding CI to our tree (ccache)

От
Andres Freund
Дата:
Hi,

On 2022-03-03 22:56:15 -0600, Justin Pryzby wrote:
> On Sun, Feb 20, 2022 at 12:47:31PM -0800, Andres Freund wrote:
> > On 2022-02-20 13:36:55 -0600, Justin Pryzby wrote:
> > > Have you tried to use the yet-to-be-released ccache with MSVC ?
> > 
> > Yes, it doesn't work, because it requires cl.exe to be used in a specific way
> > (only a single input file, specific output file naming). Which would require a
> > decent amount of changes to src/tools/msvc. I think it's more realistic with
> > meson etc.
> 
> Did you get to the point that that causes a problem, or did you just realize
> that it was a limitation that seems to preclude its use ?

I tried to use it, but saw that no caching was happening, and debugged
it. Which yielded that it can't be used due to the way output files are
specified (and due to multiple files, but that can be prevented with an
msbuild parameter).


> If so, could you send the branch/commit you had ?

This was in a local VM, not cirrus. I ended up making it work with the meson
build, after a bit of fiddling. Although bypassing msbuild (by building with
ninja, using cl.exe) is a larger win...


> The error I'm getting when I try to use ccache involves .rst files, which don't
> exist (and which ccache doesn't know how to find or ignore).
> https://cirrus-ci.com/task/5441491957972992

ccache has code to deal with response files. I suspect the problem here is
rather that ccache expects the compiler as an argument.


> I gather this is the difference between "compiling with MSVC" and compiling
> with a visual studio project.

I doubt it's related to that.

Greetings,

Andres Freund



Re: Adding CI to our tree (ccache)

От
Justin Pryzby
Дата:
On Fri, Mar 04, 2022 at 05:30:03PM -0800, Andres Freund wrote:
> I tried to use it, but saw that no caching was happening, and debugged
> it. Which yielded that it can't be used due to the way output files are
> specified (and due to multiple files, but that can be prevented with an
> msbuild parameter).

Could you give a hint about to the msbuild param to avoid processing multiple
files with cl.exe?  I'm not able to find it...

I don't know about the issue with output filenames ..

-- 
Justin



Re: Adding CI to our tree (ccache)

От
Andres Freund
Дата:
Hi,

On 2022-03-06 10:16:54 -0600, Justin Pryzby wrote:
> On Fri, Mar 04, 2022 at 05:30:03PM -0800, Andres Freund wrote:
> > I tried to use it, but saw that no caching was happening, and debugged
> > it. Which yielded that it can't be used due to the way output files are
> > specified (and due to multiple files, but that can be prevented with an
> > msbuild parameter).
> 
> Could you give a hint about to the msbuild param to avoid processing multiple
> files with cl.exe?  I'm not able to find it...

/p:UseMultiToolTask=true


> I don't know about the issue with output filenames ..

I don't remember the precise details anymore, but it boils down to ccache
requiring the output filename to be specified, but we only specify the output
directory. Or very similar.

Greetings,

Andres Freund



Re: Adding CI to our tree (ccache)

От
Justin Pryzby
Дата:
On Mon, Mar 07, 2022 at 11:10:54AM -0800, Andres Freund wrote:
> On 2022-03-06 10:16:54 -0600, Justin Pryzby wrote:
> > On Fri, Mar 04, 2022 at 05:30:03PM -0800, Andres Freund wrote:
> > > I tried to use it, but saw that no caching was happening, and debugged
> > > it. Which yielded that it can't be used due to the way output files are
> > > specified (and due to multiple files, but that can be prevented with an
> > > msbuild parameter).
> > 
> > Could you give a hint about to the msbuild param to avoid processing multiple
> > files with cl.exe?  I'm not able to find it...
> 
> /p:UseMultiToolTask=true

Wow - thanks ;)

> > I don't know about the issue with output filenames ..
> 
> I don't remember the precise details anymore, but it boils down to ccache
> requiring the output filename to be specified, but we only specify the output
> directory. Or very similar.

There's already a problem report and PR for this.
I didn't test it, but I hope it'll be fixed in their next minor release.

https://github.com/ccache/ccache/issues/1018

-- 
Justin



Re: Adding CI to our tree

От
Justin Pryzby
Дата:
I'm curious what you think of this patch.

It makes check-world on freebsd over 30% faster - saving 5min.

Apparently gcc -Og was added in gcc 4.8 (c. 2013).

On Wed, Mar 02, 2022 at 02:50:58PM -0600, Justin Pryzby wrote:
> From d180953d273c221a30c5e9ad8d74b1b4dfc60bd1 Mon Sep 17 00:00:00 2001
> From: Justin Pryzby <pryzbyj@telsasoft.com>
> Date: Sun, 27 Feb 2022 15:17:50 -0600
> Subject: [PATCH 7/7] cirrus: compile with -Og..
> 
> To improve performance of check-world, and improve debugging, without
> significantly slower builds (they're cached anyway).
> 
> This makes freebsd check-world run in 8.5 minutes rather than 15 minutes.
> ---
>  .cirrus.yml                      | 12 +++++++-----
>  src/tools/msvc/MSBuildProject.pm |  4 ++--
>  2 files changed, 9 insertions(+), 7 deletions(-)
> 
> diff --git a/.cirrus.yml b/.cirrus.yml
> index 6f05d420c85..8b673bf58cf 100644
> --- a/.cirrus.yml
> +++ b/.cirrus.yml
> @@ -113,7 +113,7 @@ task:
>          \
>          CC="ccache cc" \
>          CXX="ccache c++" \
> -        CFLAGS="-O0 -ggdb"
> +        CFLAGS="-Og -ggdb"
>      EOF
>    build_script: su postgres -c "gmake -s -j${BUILD_JOBS} world-bin"
>    upload_caches: ccache
> @@ -208,8 +208,8 @@ task:
>          CC="ccache gcc" \
>          CXX="ccache g++" \
>          CLANG="ccache clang" \
> -        CFLAGS="-O0 -ggdb" \
> -        CXXFLAGS="-O0 -ggdb"
> +        CFLAGS="-Og -ggdb" \
> +        CXXFLAGS="-Og -ggdb"
>      EOF
>    build_script: su postgres -c "make -s -j${BUILD_JOBS} world-bin"
>    upload_caches: ccache
> @@ -329,8 +329,8 @@ task:
>        CC="ccache cc" \
>        CXX="ccache c++" \
>        CLANG="ccache ${brewpath}/llvm/bin/ccache" \
> -      CFLAGS="-O0 -ggdb" \
> -      CXXFLAGS="-O0 -ggdb" \
> +      CFLAGS="-Og -ggdb" \
> +      CXXFLAGS="-Og -ggdb" \
>        \
>        LLVM_CONFIG=${brewpath}/llvm/bin/llvm-config \
>        PYTHON=python3
> @@ -383,6 +383,8 @@ task:
>      # -fileLoggerParameters1: write to msbuild.warn.log.
>      MSBFLAGS: -m -verbosity:minimal "-consoleLoggerParameters:Summary;ForceNoAlign" /p:TrackFileAccess=false -nologo
-fileLoggerParameters1:warningsonly;logfile=msbuild.warn.log
>  
> +    MSBUILD_OPTIMIZE: MaxSpeed
> +
>      # If tests hang forever, cirrus eventually times out. In that case log
>      # output etc is not uploaded, making the problem hard to debug. Of course
>      # tests internally should have shorter timeouts, but that's proven to not
> diff --git a/src/tools/msvc/MSBuildProject.pm b/src/tools/msvc/MSBuildProject.pm
> index 5e312d232e9..05e0c41eb5c 100644
> --- a/src/tools/msvc/MSBuildProject.pm
> +++ b/src/tools/msvc/MSBuildProject.pm
> @@ -85,7 +85,7 @@ EOF
>          $f, 'Debug',
>          {
>              defs    => "_DEBUG;DEBUG=1",
> -            opt     => 'Disabled',
> +            opt     => $ENV{MSBUILD_OPTIMIZE} || 'Disabled',
>              strpool => 'false',
>              runtime => 'MultiThreadedDebugDLL'
>          });
> @@ -94,7 +94,7 @@ EOF
>          'Release',
>          {
>              defs    => "",
> -            opt     => 'Full',
> +            opt     => $ENV{MSBUILD_OPTIMIZE} || 'Full',
>              strpool => 'true',
>              runtime => 'MultiThreadedDLL'
>          });
> -- 
> 2.17.1
> 



Re: Adding CI to our tree

От
Andres Freund
Дата:
Hi,

On 2022-03-09 11:47:23 -0600, Justin Pryzby wrote:
> I'm curious what you think of this patch.
> 
> It makes check-world on freebsd over 30% faster - saving 5min.

That's nice! While -Og makes interactive debugging noticeably harder IME, it's
not likely to be a large enough difference just for backtraces etc.

I'm far less convinced that using "MaxSpeed" for the msvc build is a good
idea. I've looked at one or two backtraces of optimized msvc builds and
backtraces were quite a bit worse - and they're not great to start with.  What
was the win there?

Greetings,

Andres Freund



Re: Adding CI to our tree

От
Justin Pryzby
Дата:
On Wed, Mar 09, 2022 at 10:12:54AM -0800, Andres Freund wrote:
> On 2022-03-09 11:47:23 -0600, Justin Pryzby wrote:
> > I'm curious what you think of this patch.
> > 
> > It makes check-world on freebsd over 30% faster - saving 5min.
> 
> That's nice! While -Og makes interactive debugging noticeably harder IME, it's
> not likely to be a large enough difference just for backtraces etc.

Yeah.  gcc(1) claims that -Og can improve debugging.

I should've mentioned that this seems to mitigate the performance effect of
--coverage on linux, too.

> I'm far less convinced that using "MaxSpeed" for the msvc build is a good
> idea. I've looked at one or two backtraces of optimized msvc builds and
> backtraces were quite a bit worse - and they're not great to start with.  What
> was the win there?

Did you compare FULL optimization or "favor speed/size" or "default"
optimization ?

It's worth trading some some build time (especially with a compiler cache) for
test time (especially with alltaptests).  But I didn't check backtraces, and I
didn't compare the various optimization options.  The argument may not be as
strong for windows, since it has no build cache (and it has no -Og).  We'd save
a bit more when also running the other tap tests.

CI runs are probably not very consistent, but I've just run
https://cirrus-ci.com/task/5236145167532032
master is the average of 4 patches at the top of cfbot.

                / master / patched / change
subscription    / 197s   / 195s    / +2s
recovery        / 234s   / 212s    / -22s
bin             / 383s   / 373s    / -10s

-- 
Justin



Re: Adding CI to our tree

От
Thomas Munro
Дата:
On Thu, Mar 10, 2022 at 9:37 AM Justin Pryzby <pryzby@telsasoft.com> wrote:
> On Wed, Mar 09, 2022 at 10:12:54AM -0800, Andres Freund wrote:
> > On 2022-03-09 11:47:23 -0600, Justin Pryzby wrote:
> > > I'm curious what you think of this patch.
> > >
> > > It makes check-world on freebsd over 30% faster - saving 5min.
> >
> > That's nice! While -Og makes interactive debugging noticeably harder IME, it's
> > not likely to be a large enough difference just for backtraces etc.
>
> Yeah.  gcc(1) claims that -Og can improve debugging.

Wow, I see the effect on Cirrus -- test_world ran in 8:55 instead of
12:43 when I tried (terrible absolute times, but fantastic
improvement!).  Hmm, on my local FreeBSD 13 box I saw 5:07 -> 5:03
with this change.  My working theory had been that there is something
bad happening in the I/O stack under concurrency making FreeBSD on
Cirrus/GCP very slow (ie patterns to stall on slow cloud I/O waits,
something I hope to dig into when post-freeze round tuits present
themselves), but that doesn't gel with this huge improvement from
noodling with optimiser details, and I don't know why I don't see
something similar locally.  I'm confused.

Just BTW it's kinda funny that we say -ggdb for macOS and then we use
lldb to debug cores in cores_backtrace.sh.  I suppose it would be more
correct to say -glldb, but doubt it matters much...



Re: Adding CI to our tree

От
Andres Freund
Дата:
Hi,

On 2022-03-10 15:43:16 +1300, Thomas Munro wrote:
> Wow, I see the effect on Cirrus -- test_world ran in 8:55 instead of
> 12:43 when I tried (terrible absolute times, but fantastic
> improvement!).  Hmm, on my local FreeBSD 13 box I saw 5:07 -> 5:03
> with this change.  My working theory had been that there is something
> bad happening in the I/O stack under concurrency making FreeBSD on
> Cirrus/GCP very slow (ie patterns to stall on slow cloud I/O waits,
> something I hope to dig into when post-freeze round tuits present
> themselves), but that doesn't gel with this huge improvement from
> noodling with optimiser details, and I don't know why I don't see
> something similar locally.  I'm confused.

The "terrible IO wait" thing was before we reduced the number of CPUs and
concurrent jobs. It makes sense to me that with just two CPUs we're CPU bound,
in which case -Og obviously can make a difference.


> Just BTW it's kinda funny that we say -ggdb for macOS and then we use
> lldb to debug cores in cores_backtrace.sh.  I suppose it would be more
> correct to say -glldb, but doubt it matters much...

Yea. I used -ggdb because I didn't know -glldb existed :). And there's also
the advantage that -ggdb works both with gcc and clang, whereas -glldb doesn't
seem to be known to gcc.

Greetings,

Andres Freund



Re: Adding CI to our tree

От
Thomas Munro
Дата:
On Thu, Mar 10, 2022 at 4:33 PM Andres Freund <andres@anarazel.de> wrote:
> On 2022-03-10 15:43:16 +1300, Thomas Munro wrote:
> > I'm confused.
>
> The "terrible IO wait" thing was before we reduced the number of CPUs and
> concurrent jobs. It makes sense to me that with just two CPUs we're CPU bound,
> in which case -Og obviously can make a difference.

Oh, duh, yeah, that makes sense when you put it that way and
considering the CPU graph:

-O0: https://cirrus-ci.com/task/4578631912521728
-Og: https://cirrus-ci.com/task/5239486182326272



Re: Adding CI to our tree

От
Andres Freund
Дата:
Hi,

On 2022-03-02 14:50:58 -0600, Justin Pryzby wrote:
> From 883edaa653bcf7f1a2369d8edf46eaaac1ba0ba2 Mon Sep 17 00:00:00 2001
> From: Justin Pryzby <pryzbyj@telsasoft.com>
> Date: Mon, 17 Jan 2022 00:53:04 -0600
> Subject: [PATCH 1/7] cirrus: include hints how to install OS packages..
> 
> This is useful for patches during development, but once a feature is merged,
> new libraries should be added to the OS image files, rather than installed
> during every CI run forever into the future.
> ---
>  .cirrus.yml | 16 +++++++++++++---
>  1 file changed, 13 insertions(+), 3 deletions(-)
> 
> diff --git a/.cirrus.yml b/.cirrus.yml
> index d10b0a82f9f..1b7c36283e9 100644
> --- a/.cirrus.yml
> +++ b/.cirrus.yml
> @@ -73,10 +73,11 @@ task:
>      chown -R postgres:postgres .
>      mkdir -p ${CCACHE_DIR}
>      chown -R postgres:postgres ${CCACHE_DIR}
> -  setup_cores_script: |
> +  setup_os_script: |
>      mkdir -m 770 /tmp/cores
>      chown root:postgres /tmp/cores
>      sysctl kern.corefile='/tmp/cores/%N.%P.core'
> +    #pkg install -y ...

Would you mind if I split this into setup_core_files_script and
setup_additional_packages_script:?


> +  # The commit that this branch is rebased on.  There's no easy way to find this.
> +  # This does the right thing for cfbot, which always squishes all patches into a single commit.
> +  # And does the right thing for any 1-patch commits.
> +  # Patches series manually submitted to cirrus may benefit from setting this
> +  # to the number of patches in the series (or directly to the commit the series was rebased on).
> +  BASE_COMMIT: HEAD~1

Still think that something using
  git merge-base $CIRRUS_LAST_GREEN_CHANGE HEAD

might be better. With a bit of error handling for unset
CIRRUS_LAST_GREEN_CHANGE and for git not seeing enough history for
CIRRUS_LAST_GREEN_CHANGE.


> +    apt-get update
> +    apt-get -y install lcov

FWIW, I just added that to the install script used for the container / VM
image build. So it'll be pre-installed once that completes.

https://cirrus-ci.com/build/5818788821073920


> From feceea4413b84f478e6a0888cdfab4be1c80767a Mon Sep 17 00:00:00 2001
> From: Justin Pryzby <pryzbyj@telsasoft.com>
> Date: Sun, 20 Feb 2022 15:01:59 -0600
> Subject: [PATCH 6/7] wip: cirrus/windows: add compiler_warnings_script
> 
> I'm not sure how to write this test in windows shell; it's also not easy to
> write it in posix sh, since windows shell is somehow interpretting && and ||...

That comment isn't accurate anymore now that it's in an external script,
right?


Greetings,

Andres Freund



Re: Adding CI to our tree

От
Justin Pryzby
Дата:
On Thu, Mar 10, 2022 at 12:50:15PM -0800, Andres Freund wrote:
> > -  setup_cores_script: |
> > +  setup_os_script: |
> >      mkdir -m 770 /tmp/cores
> >      chown root:postgres /tmp/cores
> >      sysctl kern.corefile='/tmp/cores/%N.%P.core'
> > +    #pkg install -y ...
> 
> Would you mind if I split this into setup_core_files_script and
> setup_additional_packages_script:?

That's fine.  FYI I'm also planning on using choco install --no-progress
I could resend my latest patches shortly.

> > Subject: [PATCH 6/7] wip: cirrus/windows: add compiler_warnings_script
> > 
> > I'm not sure how to write this test in windows shell; it's also not easy to
> > write it in posix sh, since windows shell is somehow interpretting && and ||...
> 
> That comment isn't accurate anymore now that it's in an external script,
> right?

No, it is accurate.  What I mean is that it's also hard to write it as a
1-liner using posix sh, since the || (and &&) seemed to be interpretted by
cmd.exe and needed escaping - gross.

-- 
Justin



Re: Adding CI to our tree

От
Justin Pryzby
Дата:

Re: Adding CI to our tree

От
Andres Freund
Дата:
Hi,

Pushed 0001, 0002. Only change I made was to add
DEBIAN_FRONTEND=noninteractive to the apt-get invocations, because some
packages will fail / complain verbosely if there's no interactive prompt
during installation.

Greetings,

Andres Freund



Re: Adding CI to our tree

От
Justin Pryzby
Дата:
On Fri, Mar 18, 2022 at 03:45:03PM -0700, Andres Freund wrote:
> Pushed 0001, 0002. Only change I made was to add

Thanks - is there any reason not to do the MSVC compiler warnings one, too ?

I see that it'll warn about issues with at least 3 patches (including one of
yours).

-- 
Justin



Re: Adding CI to our tree

От
Andres Freund
Дата:
Hi,

On 2022-03-22 23:14:23 -0500, Justin Pryzby wrote:
> On Fri, Mar 18, 2022 at 03:45:03PM -0700, Andres Freund wrote:
> > Pushed 0001, 0002. Only change I made was to add
> 
> Thanks - is there any reason not to do the MSVC compiler warnings one, too ?

Purely a lack of round tuits. IIRC I thought there was a small aspect that
still needed some polishing, but didn't have time to tackle it yet.

Greetings,

Andres Freund



Re: Adding CI to our tree

От
Thomas Munro
Дата:
On Thu, Mar 10, 2022 at 9:37 AM Justin Pryzby <pryzby@telsasoft.com> wrote:
> -Og

Adding this to CXXFLAGS caused a torrent of warnings from g++ about
LLVM headers, which I also see on my local system for LLVM 11 and LLVM
14:

[19:47:11.047] /usr/lib/llvm-11/include/llvm/ADT/Twine.h: In member
function ‘llvm::CallInst*
llvm::IRBuilderBase::CreateCall(llvm::FunctionType*, llvm::Value*,
llvm::ArrayRef<llvm::Value*>, const llvm::Twine&, llvm::MDNode*)’:
[19:47:11.047] /usr/lib/llvm-11/include/llvm/ADT/Twine.h:229:16:
warning: ‘<anonymous>.llvm::Twine::LHS.llvm::Twine::Child::twine’ may
be used uninitialized in this function [-Wmaybe-uninitialized]
[19:47:11.047] 229 | !LHS.twine->isBinary())
[19:47:11.047] | ~~~~^~~~~

https://cirrus-ci.com/task/5597526098182144?logs=build#L6

Not sure who to complain to about that...



Re: Adding CI to our tree

От
Justin Pryzby
Дата:
On Thu, Mar 24, 2022 at 09:52:39AM +1300, Thomas Munro wrote:
> On Thu, Mar 10, 2022 at 9:37 AM Justin Pryzby <pryzby@telsasoft.com> wrote:
> > -Og
> 
> Adding this to CXXFLAGS caused a torrent of warnings from g++ about
> LLVM headers, which I also see on my local system for LLVM 11 and LLVM
> 14:

Yes, I mentioned seeing some other warnings here.
20220302205058.GJ15744@telsasoft.com

I think warnings were cleaned up with -O0/1/2 but not with -Og.



Re: Adding CI to our tree

От
Andres Freund
Дата:
Hi,

Now that zstd is used, enable it in CI. I plan to commit this shortly, unless
somebody sees a reason not to do so.

Greetings,

Andres Freund

Вложения

Re: Adding CI to our tree

От
Thomas Munro
Дата:
On Wed, Oct 6, 2021 at 5:01 PM Thomas Munro <thomas.munro@gmail.com> wrote:
> BTW, on those two OSes there are some messages like this each time a
> submake dumps its output to the log:
>
> [03:36:16.591] fcntl(): Bad file descriptor
>
> It seems worth putting up with these compared to the alternatives of
> either not using -j, not using -Otarget and having the output of
> parallel tests all mashed up and unreadable (that still happen
> sometimes but it's unlikely, because the submakes write() whole output
> chunks at infrequent intervals), or redirecting to a file so you can't
> see the realtime test output on the main CI page (not so fun, you have
> to wait until it's finished and view it as an 'artifact').  I tried to
> write a patch for GNU make to fix that[1], let's see if something
> happens.
>
> [1] https://savannah.gnu.org/bugs/?52922

For the record, GNU make finally fixed this problem (though Andres
found a workaround anyway), but in any case it won't be in the
relevant package repos before we switch over to Meson/Ninja :-)