Cirrus-ci is lowering free CI cycles - what to do with cfbot, etc?

From: Andres Freund
Subject: Cirrus-ci is lowering free CI cycles - what to do with cfbot, etc?
Msg-id: 20230808021541.7lbzdefvma7qmn3w@awork3.anarazel.de
Replies: Re: Cirrus-ci is lowering free CI cycles - what to do with cfbot, etc?  (Andres Freund <andres@anarazel.de>)
Re: Cirrus-ci is lowering free CI cycles - what to do with cfbot, etc?  (Heikki Linnakangas <hlinnaka@iki.fi>)
Re: Cirrus-ci is lowering free CI cycles - what to do with cfbot, etc?  (Peter Eisentraut <peter@eisentraut.org>)
Re: Cirrus-ci is lowering free CI cycles - what to do with cfbot, etc?  ("Tristan Partin" <tristan@neon.tech>)
Re: Cirrus-ci is lowering free CI cycles - what to do with cfbot, etc?  (Andres Freund <andres@anarazel.de>)
List: pgsql-hackers
Hi,

As some of you might have seen when running CI, cirrus-ci is restricting how
many CI cycles everyone can use for free (announcement at [1]). This takes
effect September 1st.

This obviously has consequences both for individual users of CI as well as
cfbot.


The first thing I think we should do is to lower the cost of CI. One thing I
had not entirely realized previously is that macos CI is by far the most
expensive CI to provide. That's not just the case with cirrus-ci, but also
with other providers.  See the series of patches described later in the email.


To me, the situation for cfbot is different than the one for individual
users.

IMO, for the individual user case it's important to use CI for "free", without
a whole lot of complexity. Which imo rules out approaches like providing
$cloud_provider compute accounts; that's too much setup work.  With the
improvements detailed below, cirrus' free CI would last about 65 runs /
month.

For cfbot I hope we can find funding to pay for compute to use for CI. By far
the most expensive bit is macos, to a significant degree due to macos
licensing terms not allowing more than 2 VMs on a physical host :(.


The reasons we chose cirrus-ci were:

a) Ability to use full VMs, rather than a pre-selected set of VMs, which
   allows us to test a larger number of platforms

b) Ability to link to log files, without requiring an account. E.g. github
   actions doesn't allow viewing logs unless logged in.

c) Amount of compute available.


The set of free CI providers has shrunk since we chose cirrus, as have the
"free" resources provided. I started a wiki page, quite incomplete as of now,
at [4].


Potential paths forward for individual CI:

- migrate wholesale to another CI provider

- split CI tasks across different CI providers, rely on github et al
  displaying the CI status for different platforms

- give up


Potential paths forward for cfbot, in addition to the above:

- Pay for compute / ask the various cloud providers to grant us compute
  credits. At least some of the cloud providers can be used via cirrus-ci.

- Host (some) CI runners ourselves. Particularly with macos and windows, that
  could provide significant savings.

- Build our own system, using buildbot, jenkins or whatnot.


Opinions as to what to do?



The attached series of patches:

1) Makes startup of macos instances faster, using more efficient caching of
   the required packages. Also submitted as [2].

2) Introduces a template initdb that's reused during the tests. Also submitted
   as [3]

3) Remove use of -DRANDOMIZE_ALLOCATED_MEMORY from macos tasks. It's
   expensive. And CI also uses asan on linux, so I don't think it's really
   needed.

4) Switch tasks to use debugoptimized builds. Previously many tasks used -Og,
   to get decent backtraces etc. But the amount of CPU burned that way is too
   large. One issue with that is that use of ccache becomes much more crucial,
   uncached build times do significantly increase.

5) Move use of -Dsegsize_blocks=6 from macos to linux

   Macos is expensive, -Dsegsize_blocks=6 slows things down. Alternatively we
   could stop covering both meson and autoconf segsize_blocks. It does affect
   runtime on linux as well.

6) Disable write cache flushes on windows

   It's a bit ugly to do this without using the UI... Shaves off about 30s
   from the tests.

7) pg_regress only checked once a second whether postgres started up, but it's
   usually much faster. Use pg_ctl's logic.  It might be worth replacing the
   use of psql with directly using libpq in pg_regress instead, looks like the
   overhead of repeatedly starting psql is noticeable.
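
To illustrate the idea behind patch 7, here is a minimal Python sketch of a
startup wait loop with a short polling interval. This is not the pg_regress or
pg_ctl code (both are C). The function `wait_until_ready`, the `is_ready`
callback and the 0.1s interval are all made up for illustration.

```python
import time


def wait_until_ready(is_ready, timeout=60.0, interval=0.1):
    """Poll is_ready() until it returns True or `timeout` seconds elapse.

    Returns the number of polls made on success and raises TimeoutError
    otherwise. `interval` is a hypothetical value, not pg_ctl's actual one.
    """
    deadline = time.monotonic() + timeout
    polls = 0
    while True:
        polls += 1
        if is_ready():
            return polls
        if time.monotonic() >= deadline:
            raise TimeoutError(f"server not ready after {timeout}s")
        time.sleep(interval)


# Fake "server" that becomes ready 0.25s after the start of the wait.
start = time.monotonic()
polls = wait_until_ready(lambda: time.monotonic() - start >= 0.25)
waited = time.monotonic() - start
print(f"ready after {polls} polls, waited {waited:.2f}s")
```

With a one-second interval the same fake server would only be noticed after a
full second, which is the kind of wasted time the patch is after.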


FWIW: with the patches applied, the "credit costs" in cirrus CI are roughly
like the following (depends on caching etc):

task costs in credits
    linux-sanity:            0.01
    linux-compiler-warnings: 0.05
    linux-meson:             0.07
    freebsd:                 0.08
    linux-autoconf:          0.09
    windows:                 0.18
    macos:                   0.28
total task runtime is 40.8
cost in credits is 0.76, monthly credits of 50 allow approx 66.10 runs/month
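
As a rough cross-check of the figures above, the following Python sketch sums
the per-task credits as printed and divides the monthly allowance by the
total. The names `TASK_CREDITS`, `MONTHLY_CREDITS` and `runs_per_month` are
chosen for illustration only; the numbers are copied from the listing.
Because the printed per-task costs are rounded to two decimals, the result
comes out slightly below the 66.10 quoted above, which presumably was computed
from unrounded values.

```python
# Per-task credit costs, copied from the listing above (rounded to 2 decimals).
TASK_CREDITS = {
    "linux-sanity": 0.01,
    "linux-compiler-warnings": 0.05,
    "linux-meson": 0.07,
    "freebsd": 0.08,
    "linux-autoconf": 0.09,
    "windows": 0.18,
    "macos": 0.28,
}
MONTHLY_CREDITS = 50  # free monthly allowance, as stated above


def runs_per_month(task_credits, monthly_credits):
    """Return (cost of one full CI run, number of runs the allowance covers)."""
    cost_per_run = sum(task_credits.values())
    return cost_per_run, monthly_credits / cost_per_run


cost, runs = runs_per_month(TASK_CREDITS, MONTHLY_CREDITS)
most_expensive = max(TASK_CREDITS, key=TASK_CREDITS.get)
share = TASK_CREDITS[most_expensive] / cost

print(f"cost per run: {cost:.2f} credits, approx {runs:.1f} runs/month")
print(f"most expensive task: {most_expensive} ({share:.0%} of a run)")
```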


Greetings,

Andres Freund

[1] https://cirrus-ci.org/blog/2023/07/17/limiting-free-usage-of-cirrus-ci/
[2] https://www.postgresql.org/message-id/20230805202539.r3umyamsnctysdc7%40awork3.anarazel.de
[3] https://postgr.es/m/20220120021859.3zpsfqn4z7ob7afz@alap3.anarazel.de
