Discussion: Security lessons from liblzma
You might have seen reports today about a very complex exploit added to recent versions of liblzma. Fortunately, it was only enabled two months ago and has not been pushed to most stable operating systems like Debian and Ubuntu. The original detection report is: https://www.openwall.com/lists/oss-security/2024/03/29/4 And this ycombinator discussion has details: https://news.ycombinator.com/item?id=39865810 It looks like an earlier commit with a binary blob "test data" contained the bulk of the backdoor, then the configure script enabled it, and then later commits patched up valgrind errors caused by the backdoor. See the commit links in the "Compromised Repository" section. And I think the configure change came in through the autoconf output file 'configure', not configure.ac. This is my main take-away from this: we must stop using upstream configure and other "binary" scripts. Delete them all and run "autoreconf -fi" to recreate them. (Debian already does something like this, I think.) Now, we don't take pull requests, and all our committers are known individuals, but this might have cautionary lessons for us. -- Bruce Momjian <bruce@momjian.us> https://momjian.us EDB https://enterprisedb.com Only you can decide what is important to you.
On Sat, Mar 30, 2024 at 11:37 AM Bruce Momjian <bruce@momjian.us> wrote: > You might have seen reports today about a very complex exploit added to > recent versions of liblzma. Fortunately, it was only enabled two months > ago and has not been pushed to most stable operating systems like Debian > and Ubuntu. The original detection report is: > > https://www.openwall.com/lists/oss-security/2024/03/29/4 Incredible work from Andres. The attackers made a serious strategic mistake: they made PostgreSQL slightly slower.
Hi, On 2024-03-29 18:37:24 -0400, Bruce Momjian wrote: > You might have seen reports today about a very complex exploit added to > recent versions of liblzma. Fortunately, it was only enabled two months > ago and has not been pushed to most stable operating systems like Debian > and Ubuntu. The original detection report is: > > https://www.openwall.com/lists/oss-security/2024/03/29/4 > > And this ycombinator discussion has details: > > https://news.ycombinator.com/item?id=39865810 > > It looks like an earlier commit with a binary blob "test data" > contained the bulk of the backdoor, then the configure script > enabled it, and then later commits patched up valgrind errors > caused by the backdoor. See the commit links in the "Compromised > Repository" section. > > and I think the configure came in through the autoconf output file > 'configure', not configure.ac: > > This is my main take-away from this. We must stop using upstream > configure and other "binary" scripts. Delete them all and run > "autoreconf -fi" to recreate them. (Debian already does something > like this I think.) I don't think that's a useful lesson here, actually. In this case, if you ran autoreconf -fi in a released tarball, it'd just recreate precisely what the tarball already contained, including the exploit. I think the issue in this case was rather that the tarball contains files that are not in the git repository - a lot of them. The attackers injected the 'activating' part of the backdoor into the release tarball, while it was not present in the git tree. They did that because they knew that distributions often build from release tarballs. If the pre-backdoor release tarballs had been identical to the git repository, this would likely have been noticed by packagers - but it was normal for there to be a lot of new files. We also traditionally had a lot of generated files in the tarball that weren't in our git tree - but we removed a lot of that a few months ago, when we stopped including bison/flex/other generated code. We still include generated docs, but those are much harder to exploit at scale. However, they still make it harder to verify that the release is exactly the same as the git tree. > Now, we don't take pull requests, and all our committers are known > individuals, but this might have cautionary lessons for us. I am doubtful that every committer would find something sneaky hidden in e.g. one of the test changes in a large commit. It's not too hard to hide something sneaky. In comparison, hiding something in configure.ac seems less likely to succeed IMO; it tends to be more scrutinized. And hiding just in configure directly wouldn't get you far, it'd just get removed when the committer, or some other committer at a later time, regenerates configure. Greetings, Andres Freund
> On 29 Mar 2024, at 23:59, Andres Freund <andres@anarazel.de> wrote: > On 2024-03-29 18:37:24 -0400, Bruce Momjian wrote: >> Now, we don't take pull requests, and all our committers are known >> individuals, but this might have cautionary lessons for us. > > I am doubtful that every committer would find something sneaky hidden in > e.g. one of the test changes in a large commit. It's not too hard to hide > something sneaky. One take-away for me is how important it is to ship recipes for regenerating any testdata which is included in generated/compiled/binary format. Kind of like how we, in our tree, ship the config for the test TLS certificates and keys, which can be manually inspected and used to rebuild the testdata (although the risk for injections in this particular case seems low). Bad things can still be injected, but formats which allow manual review at least go some way towards lowering risk. -- Daniel Gustafsson
Daniel Gustafsson <daniel@yesql.se> writes: > One take-away for me is how important it is to ship recipes for regenerating > any testdata which is included in generated/compiled/binary format. IMO that's a hard, no-exceptions requirement just for maintainability's sake, never mind security risks. regards, tom lane
On Fri, Mar 29, 2024 at 7:00 PM Andres Freund <andres@anarazel.de> wrote: > I am doubtful that every committer would find something sneaky hidden in > e.g. one of the test changes in a large commit. It's not too hard to hide > something sneaky. I comparison to that hiding something in configure.ac seems > less likely to succeed IMO, that imo tends to be more scrutinized. And hiding > just in configure directly wouldn't get you far, it'd just get removed when > the committer or some other committer at a later time, regenerates configure. I agree with this. If I were trying to get away with a malicious commit, I'd look for files that other people would be unlikely to examine closely, or would have difficulty examining closely. Test data or test scripts seem like great possibilities. And I also would like it to be part of some relatively large commit that is annoying to read through visually. We don't have a lot of binary format files in the tree, which is good, but there are probably some things like Unicode tables and ECPG expected output files that very, very few people ever actually examine. If we had a file in the tree that looked, based on the name, like an expected output file for a test, but there was no corresponding test, how many of us would notice that? How many of us would scrutinize it? Imagine hiding something bad in the middle of that file somewhere. Maybe we need some kind of tool that scores files for risk. Longer files are riskier. Binary files are riskier, as are text files that are something other than plain English/C code/SGML. Files that haven't changed in a long time are not risky, but files with few recent changes are riskier than files with many recent changes, especially if only 1 or 2 committers made all of those changes, and especially if those commits also touched a lot of other files. Of course, if we had a tool like this that were public, I suppose anyone targeting PG would look at the tool and try to find ways around its heuristics. But maybe we should have something and not disclose the whole algorithm publicly, or even if we do disclose it all, having something is probably better than having nothing. It might force a hypothetical bad actor to do things that would be more likely to be noticed by the humans. We might also want to move toward signing commits and tags. One of the meson maintainers was recommending that on-list not long ago. We should think about weaknesses that might occur during the packaging process, too. If someone who claims to be packaging PG is really packaging PG w/badstuff123.patch, how would we catch that? An awful lot of what we do operates on the principle that we know the people who are involved and trust them, and I'm glad we do trust them, but the world is full of people who trusted somebody too much and regretted it afterwards. The fact that we have many committers rather than a single maintainer probably reduces risk at least as far as the source repository is concerned, because there are more people paying attention to potentially notice something that isn't as it should be. But it's also more potential points of compromise, and a lot of things outside of that repository are not easy to audit. I can't for example verify what the infrastructure team is doing, or what Tom does when he builds the release tarballs. It seems like a stretch to imagine someone taking over Tom's online identity while simultaneously rendering him incommunicado ...
but at the same time, the people behind this attack obviously put a lot of work into it and had a lot of resources available to craft the attack. We shouldn't make the mistake of assuming that bad things can't happen to us. -- Robert Haas EDB: http://www.enterprisedb.com
Hi, On 2024-03-30 16:50:26 -0400, Robert Haas wrote: > We might also want to move toward signing commits and tags. One of the > meson maintainers was recommending that on-list not long ago. I don't know how valuable the commit signing really is, but I strongly agree that we should sign both tags and tarballs. > We should think about weaknesses that might occur during the packaging > process, too. If someone who alleges that their packaging PG is really > packaging PG w/badstuff123.patch, how would we catch that? I don't think we realistically can. The environment, configure and compiler options will influence things too much to do any sort of automatic verification. But that shouldn't stop us from ensuring that at least the packages distributed via *.postgresql.org are reproducibly built. Another good avenue for introducing an attack would be to propose some distro-specific changes to the packaging teams - there are a lot fewer eyes there. I think it might be worth working with some of our packagers to integrate more of their changes into our tree. > I can't for example verify what the infrastructure team is doing, or what > Tom does when he builds the release tarballs. This one, however, I think we could improve upon. Making sure the tarball generation is completely binary reproducible and providing means of checking that would surely help. This will be a lot easier if we, as discussed elsewhere, I believe, split out the generated docs into a separately downloadable archive. We already stopped including other generated files recently. > We shouldn't make the mistake of assuming that bad things can't happen to > us. +1 Greetings, Andres Freund
On Sat, Mar 30, 2024 at 04:50:26PM -0400, Robert Haas wrote: > On Fri, Mar 29, 2024 at 7:00 PM Andres Freund <andres@anarazel.de> wrote: > > I am doubtful that every committer would find something sneaky hidden in > > e.g. one of the test changes in a large commit. It's not too hard to hide > > something sneaky. I comparison to that hiding something in configure.ac seems > > less likely to succeed IMO, that imo tends to be more scrutinized. And hiding > > just in configure directly wouldn't get you far, it'd just get removed when > > the committer or some other committer at a later time, regenerates configure. > > I agree with this. If I were trying to get away with a malicious > commit, I'd look for files that other people would be unlikely to > examine closely, or would have difficulty examining closely. Test data > or test scripts seem like great possibilities. And I also would like > it to be part of some relatively large commit that is annoying to read > through visually. We don't have a lot of binary format files in the > tree, which is good, but there's probably some things like Unicode > tables and ECPG expected output files that very, very few people ever > actually examine. If we had a file in the tree that looked based on > the name like an expected output file for a test, but there was no > corresponding test, how many of us would notice that? How many of us > would scrutinize it? Imagine hiding something bad in the middle of > that file somewhere. So, in this case, the hooks were in 'configure', but not configure.ac, and the exploit was in a test file which was in the tarball but _not_ in the git tree. So, they used the obfuscation of 'configure's syntax, and the lack of git oversight by not putting the test files in the git tree. -- Bruce Momjian <bruce@momjian.us> https://momjian.us EDB https://enterprisedb.com Only you can decide what is important to you.
On 3/30/24 17:12, Andres Freund wrote: > Hi, > > On 2024-03-30 16:50:26 -0400, Robert Haas wrote: >> We might also want to move toward signing commits and tags. One of the >> meson maintainers was recommending that on-list not long ago. > > I don't know how valuable the commit signing really is, but I strongly agree > that we should sign both tags and tarballs. +1 >> We should think about weaknesses that might occur during the packaging >> process, too. If someone who alleges that their packaging PG is really >> packaging PG w/badstuff123.patch, how would we catch that? > > I don't think we realistically can. The environment, configure and compiler > options will influence things too much to do any sort of automatic > verification. > > But that shouldn't stop us from ensuring that at least the packages > distributed via *.postgresql.org are reproducibly built. > > Another good avenue for introducing an attack would be to propose some distro > specific changes to the packaging teams - there's a lot fewer eyes there. I > think it might be worth working with some of our packagers to integrate more > of their changes into our tree. Huge +1 to that. I have thought many times, and even said to Devrim, a huge number of people trust him to not be evil. Virtually every RPM source, including ours, contains out of tree patches that get applied on top of the release tarball. At least for the PGDG packages, it would be nice to integrate them into our git repo as build options or whatever so that the packages could be built without any patches applied to it. Add a tarball that is signed and traceable back to the git tag, and we would be in a much better place than we are now. >> I can't for example verify what the infrastructure team is doing, Not sure what you feel like you should be able to follow -- anything specific? >> or what Tom does when he builds the release tarballs. Tom follows this, at least last time I checked: https://wiki.postgresql.org/wiki/Release_process > This one however, I think we could improve upon. Making sure the tarball > generation is completely binary reproducible and providing means of checking > that would surely help. This will be a lot easier if we, as dicussed > elsewhere, I believe, split out the generated docs into a separately > downloadable archive. We already stopped including other generated files > recently. again, big +1 >> We shouldn't make the mistake of assuming that bad things can't happen to >> us. > > +1 and again with the +1 ;-) -- Joe Conway PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
On 3/30/24 19:54, Joe Conway wrote: >> On 2024-03-30 16:50:26 -0400, Robert Haas wrote: >>> or what Tom does when he builds the release tarballs. > > Tom follows this, at least last time I checked: > > https://wiki.postgresql.org/wiki/Release_process Reading through that, I wonder if this part is true anymore: In principle this could be done anywhere, but again there's a concern about reproducibility, since the results may vary depending on installed bison, flex, docbook, etc versions. Current practice is to always do this as pgsql on borka.postgresql.org, so it can only be done by people who have a login there. In detail: Maybe if we split out the docs from the release tarball, we could also add the script (mk-release) to our git repo? Some other aspects of that wiki page look out of date too. Perhaps it needs an overall update? Maybe Tom and/or Magnus could weigh in here. -- Joe Conway PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Joe Conway <mail@joeconway.com> writes: > On 3/30/24 19:54, Joe Conway wrote: >> Tom follows this, at least last time I checked: >> https://wiki.postgresql.org/wiki/Release_process > Reading through that, I wonder if this part is true anymore: > In principle this could be done anywhere, but again there's a concern > about reproducibility, since the results may vary depending on > installed bison, flex, docbook, etc versions. Current practice is to > always do this as pgsql on borka.postgresql.org, so it can only be > done by people who have a login there. In detail: The reproducibility argument would still apply to the docs (in whatever form we're packaging them), but hopefully not to the basic source tarball. > Maybe if we split out the docs from the release tarball, we could also > add the script (mk-release) to our git repo? If memory serves, the critical steps are already in our source tree, as "make dist" (but I'm not sure how that's going to work if we want to get away from using autoconf/make). It's not clear to me how much of the rest of mk-release is relevant to people who might be trying to generate things elsewhere. I'd like mk-release to continue to work for older branches, too, so it's going to be some sort of hybrid mess for a few years here. regards, tom lane
On Sat, Mar 30, 2024 at 07:54:00PM -0400, Joe Conway wrote: > Virtually every RPM source, including ours, contains out of tree patches > that get applied on top of the release tarball. At least for the PGDG > packages, it would be nice to integrate them into our git repo as build > options or whatever so that the packages could be built without any patches > applied to it. Add a tarball that is signed and traceable back to the git > tag, and we would be in a much better place than we are now. How would someone access the out-of-tree patches? I think Debian includes the patches in its source tarball. -- Bruce Momjian <bruce@momjian.us> https://momjian.us EDB https://enterprisedb.com Only you can decide what is important to you.
On Sat, Mar 30, 2024 at 09:52:47PM -0400, Bruce Momjian wrote: > On Sat, Mar 30, 2024 at 07:54:00PM -0400, Joe Conway wrote: > > Virtually every RPM source, including ours, contains out of tree patches > > that get applied on top of the release tarball. At least for the PGDG > > packages, it would be nice to integrate them into our git repo as build > > options or whatever so that the packages could be built without any patches > > applied to it. Add a tarball that is signed and traceable back to the git > > tag, and we would be in a much better place than we are now. > > How would someone access the out-of-tree patches? I think Debian > includes the patches in its source tarball. If you ask where they are maintained, the answer is here: https://salsa.debian.org/postgresql/postgresql/-/tree/17/debian/patches?ref_type=heads the other major versions have their own branch. Michael
On 3/30/24 21:52, Bruce Momjian wrote: > On Sat, Mar 30, 2024 at 07:54:00PM -0400, Joe Conway wrote: >> Virtually every RPM source, including ours, contains out of tree patches >> that get applied on top of the release tarball. At least for the PGDG >> packages, it would be nice to integrate them into our git repo as build >> options or whatever so that the packages could be built without any patches >> applied to it. Add a tarball that is signed and traceable back to the git >> tag, and we would be in a much better place than we are now. > > How would someone access the out-of-tree patches? I think Debian > includes the patches in its source tarball. I am saying maybe those patches should be eliminated in favor of our tree including build options that would produce the same result. For example, these patches are applied to our release tarball files when the RPM is being built for pg16 on RHEL 9: ----- https://git.postgresql.org/gitweb/?p=pgrpms.git;a=blob;f=rpm/redhat/main/non-common/postgresql-16/main/postgresql-16-rpm-pgsql.patch;h=d9b6d12b7517407ac81352fa325ec91b05587641;hb=HEAD https://git.postgresql.org/gitweb/?p=pgrpms.git;a=blob;f=rpm/redhat/main/non-common/postgresql-16/main/postgresql-16-var-run-socket.patch;h=f2528efaf8f4681754b20283463eff3e14eedd39;hb=HEAD https://git.postgresql.org/gitweb/?p=pgrpms.git;a=blob;f=rpm/redhat/main/non-common/postgresql-16/main/postgresql-16-conf.patch;h=da28ed793232316dd81fdcbbe59a6479b054a364;hb=HEAD https://git.postgresql.org/gitweb/?p=pgrpms.git;a=blob;f=rpm/redhat/main/non-common/postgresql-16/main/postgresql-16-perl-rpath.patch;h=748c42f0ec2c9730af3143e90e5b205c136f40d9;hb=HEAD ----- Nothing too crazy, but wouldn't it be better if no patches were required at all? Ideally we should have reproducible builds so that starting with our tarball (which is traceable back to the git release tag) one can easily obtain the same binary as what the RPMs/DEBs deliver. -- Joe Conway PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Hi, On Sun, 2024-03-31 at 08:15 -0400, Joe Conway wrote: > > I am saying maybe those patches should be eliminated in favor of our > tree including build options that would produce the same result. Works for me, as long as I can commit them and upcoming potential patches to the PostgreSQL git repo. OTOH, we also carry non-patches like README files, systemd unit files, pam files, setup scripts, etc., which are very RPM-specific. Regards, -- Devrim Gündüz Open Source Solution Architect, PostgreSQL Major Contributor Twitter: @DevrimGunduz , @DevrimGunduzTR
Joe Conway <mail@joeconway.com> writes: > I am saying maybe those patches should be eliminated in favor of our > tree including build options that would produce the same result. I don't really see how that can be expected to work sanely. It turns the responsibility for platform-specific build issues on its head, and it doesn't work at all for issues discovered after we make a release. The normal understanding of how you can vet a distro's package is that you look at the package contents (the SRPM in Red Hat world and whatever the equivalent concept is elsewhere), check that the contained tarball matches upstream and that the patches and build instructions look sane, and then build it locally and check for a match to the distro's binary package. Even if we could overcome the obstacles to putting the patch files into the upstream tarball, we're surely not going to include the build instructions, so we'd not have moved the needle very far in terms of whether the packager could do something malicious. regards, tom lane
Hi, On Sat, 2024-03-30 at 21:52 -0400, Bruce Momjian wrote: > How would someone access the out-of-tree patches? Here are the v17 patches: https://git.postgresql.org/gitweb/?p=pgrpms.git;a=tree;f=rpm/redhat/main/non-common/postgresql-17/main You can replace -17 with -16 (and etc) for the other major releases. Please note that both Debian folks and me build about 300 other packages to support the ecosystem. Just saying. Regards, -- Devrim Gündüz Open Source Solution Architect, PostgreSQL Major Contributor Twitter: @DevrimGunduz , @DevrimGunduzTR
On 3/31/24 11:49, Tom Lane wrote: > Joe Conway <mail@joeconway.com> writes: >> I am saying maybe those patches should be eliminated in favor of our >> tree including build options that would produce the same result. > > I don't really see how that can be expected to work sanely. > It turns the responsibility for platform-specific build issues > on its head, and it doesn't work at all for issues discovered > after we make a release. The normal understanding of how you > can vet a distro's package is that you look at the package > contents (the SRPM in Red Hat world and whatever the equivalent > concept is elsewhere), check that the contained tarball > matches upstream and that the patches and build instructions > look sane, and then build it locally and check for a match to > the distro's binary package. Even if we could overcome the > obstacles to putting the patch files into the upstream tarball, > we're surely not going to include the build instructions, so > we'd not have moved the needle very far in terms of whether the > packager could do something malicious. True enough I guess. But it has always bothered me how many patches get applied to the upstream tarballs by the OS package builders. Some of them, e.g. glibc on RHEL 7, include more than 1000 patches that you would have to manually vet if you cared enough and had the skills. Last time I looked at the openssl package sources it was similar in volume and complexity. They might as well be called forks if everyone were being honest about it... I know our PGDG packages are no big deal compared to that, fortunately. -- Joe Conway PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Hi, On Sun, Mar 31, 2024 at 01:05:40PM -0400, Joe Conway wrote: > But it has always bothered me how many patches get applied to the upstream > tarballs by the OS package builders. Some of them, e.g. glibc on RHEL 7, > include more than 1000 patches that you would have to manually vet if you > cared enough and had the skills. Last time I looked at the openssl package > sources it was similar in volume and complexity. They might as well be > called forks if everyone were being honest about it... I think this is more an artifact of how RHEL development works, i.e. trying to ship the same major version of glibc for 10 years while still fixing lots of bugs and adding some performance improvements that larger customers ask for. So I guess a lot of those 1000 patches are just cherry-picks / backports of upstream commits from newer releases. I guess it would be useful to have another look at the patches that are being applied for apt/yum.postgresql.org for the 18 release cycle, but I do not think those are a security problem. Not sure about the RPM builds, but at least in theory the APT builds should be reproducible. What would be a significant gain in security/trust would be an easy service/recipe for verifying reproducibility: (i) by independently building packages (and maybe the more popular extensions) and comparing them to the {apt,yum}.postgresql.org repository packages, and (ii) by being able to build the release tarballs reproducibly. Michael
Michael Banck <mbanck@gmx.net> writes: > On Sun, Mar 31, 2024 at 01:05:40PM -0400, Joe Conway wrote: >> But it has always bothered me how many patches get applied to the upstream >> tarballs by the OS package builders. > I think this more an artifact of how RHEL development works, i.e. trying > to ship the same major version of glibc for 10 years, but still fix lots > of bugs and possibly some performance improvements your larger customers > ask for. So I guess a lot of those 1000 patches are just cherry-picks / > backports of upstream commits from newer releases. Yeah. Also, precisely because they keep supporting versions that are out-of-support according to upstream, the idea that all the patches can be moved upstream isn't going to work for them, and they're unlikely to be excited about partial solutions. The bigger problem though is: if we do this, are we going to take patches that we fundamentally don't agree with? For instance, if a packager chooses to rip out the don't-run-server-as-root check. (Pretty sure I've heard of people doing that.) That would look like blessing things we don't think are good ideas, and it would inevitably lead to long arguments with packagers about why-dont-you-do-this-some-other-way. I'm not excited about that prospect. regards, tom lane
Hi, On 2024-03-31 12:18:29 +0200, Michael Banck wrote: > If you ask where they are maintained, the answer is here: > > https://salsa.debian.org/postgresql/postgresql/-/tree/17/debian/patches?ref_type=heads > > the other major versions have their own branch. Luckily these are all quite small, leaving little space to hide stuff. I'd still like to get rid of at least some of them. I've previously proposed a patch to make pkglibdir configurable; I think we should just go for that. For the various defines, ISTM we should just make them #ifndef guarded, then they could be overridden by defining them at configure time. Some of them, like DEFAULT_PGSOCKET_DIR, seem to be overridden by just about every distro. And others would be nice to easily override anyway - I, e.g., dislike the default DEFAULT_PAGER value. On 2024-03-31 16:55:26 +0100, Devrim Gündüz wrote: > Here are the v17 patches: > > https://git.postgresql.org/gitweb/?p=pgrpms.git;a=tree;f=rpm/redhat/main/non-common/postgresql-17/main A bit bigger / more patches, but still not too bad. postgresql-17-conf.patch: Uncomments a few values to their defaults, which is a bit odd. postgresql-17-ecpg_config.h, postgresql-17-pg_config.h: Hm, wonder if we should make this easier somehow. Perhaps we ought to support installing the various *config.h headers into a different directory than the architecture-independent stuff? At least on Debian-based systems it seems we ought to support installing pg_config.h etc. into /usr/include/<triplet> or something along those lines. postgresql-17-rpm-pgsql.patch: We should probably make this stuff properly configurable. The current logic with inferring whether to add /postgresql is just weird. Perhaps a configure option that defaults to the current logic when set to an empty string but can be overridden? Greetings, Andres Freund
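To make the #ifndef-guard idea concrete, here is a minimal sketch of what it could look like for two of the defines mentioned above. This is illustration only, not a patch: the fallback values shown are assumptions based on the current hardcoded defaults, and the exact header (e.g. pg_config_manual.h) and spelling would be up to whoever writes the real change.

    /*
     * Sketch: let packagers override selected defaults at build time, e.g.
     * with CPPFLAGS='-DDEFAULT_PGSOCKET_DIR="/run/postgresql"', while keeping
     * the historical values as fallbacks.
     */
    #ifndef DEFAULT_PGSOCKET_DIR
    #define DEFAULT_PGSOCKET_DIR "/tmp"     /* assumed current default */
    #endif

    #ifndef DEFAULT_PAGER
    #define DEFAULT_PAGER "more"            /* assumed current default */
    #endif

A distro that today patches these values could then drop that patch and pass the override on the configure/compile command line instead.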
I looked through the repositories of 19 Linux distros [1] to see what kind of patches are applied often. Many of them share the same package managers / repositories and thus the same patches. I made sure to look at some smaller, "other" distros as well, to see what kind of problems appear outside the mainstream distros. Andres Freund: > I've previously proposed a patch to make pkglibdir configurable, I think we > should just go for that. +1. Other paths which some distros need to configure are pkgincludedir and pgxs (independently of pkglibdir). Also a configurable directory to look up extensions, possibly even to be changed at run-time like [2]. The patch says this: > This directory is prepended to paths when loading extensions (control and SQL files), and to the '$libdir' directive when loading modules that back functions. The location is made configurable to allow build-time testing of extensions that have not been installed to their proper location yet. This seems like a great thing to have. This might also be relevant in light of recent discussions in the ecosystem around extension management. What all the path-related issues have in common is that, while it's easy to move files around to their proper locations later, they all need to adjust pg_config's output. Andres Freund: > For the various defines, ISTM we should just make them #ifndef guarded, then > they could be overridden by defining them at configure time. Some of them, > like DEFAULT_PGSOCKET_DIR seem to be overridden by just about every > distro. And others would be nice to easily override anyway, I e.g. dislike the > default DEFAULT_PAGER value. DEFAULT_PAGER is also overridden by a few distros, and DEFAULT_EDITOR by one as well. As you said, DEFAULT_PGSOCKET_DIR is overridden in **a lot** of them. Andres Freund: > postgresql-17-rpm-pgsql.patch: > > We should probably make this stuff properly configurable. The current logic > with inferring whether to add /postgresql is just weird. Perhaps a configure > option that defaults to the current logic when set to an empty string but can > be overridden? +1 for that option to force the suffix, no matter whether postgres/pgsql are in the path already. Some other commonly patched issues are: - Building only libpq, or only libecpg - or disabling both and building only the server code. This comes up often, but it's not supported nicely in our build system yet. I think meson already has some build targets for parts of that, but it's very hard / impossible to then install only this subset of the build. It would be great to be able to build and install only the frontend code (including only the frontend headers) or only the backend code (including headers) etc. - Related to the above is pg_config and how to deal with it when installing separate client and server packages, in different locations, too. Some distros provide a second version of pg_config as pg_server_config. Those two issues and the path-related issues above make it harder than it should be to provide separate packages for each major version which can be installed at the same time (versioned prefixes, multiple server packages, but maybe only a single libpq package, etc.).
Some small patches that might not be widespread, but could possibly still be upstreamed: - jit-s390x [3] (Alpine, Debian, Fedora, openSUSE) - pg_config --major-version option for extension builds [4] (Alpine) - Some fixes for man pages [5] (AlmaLinux, CentOS, Fedora) TLDR: I think it should be possible to make the build system more flexible in some areas without introducing distro specific things in core. This should eliminate the need for many of the same patches across the board for a lot of distros. Best, Wolfgang [1]: ALT Linux, Adélie Linux, AlmaLinux, Alpine Linux, Arch Linux, CentOS, Crux, Debian, Fedora, Gentoo, GoboLinux, Guix, Mandriva, NixOS, OpenWRT, Rocky Linux, Ubuntu, Void Linux, openSUSE [2]: https://salsa.debian.org/postgresql/postgresql/-/blob/17/debian/patches/extension_destdir?ref_type=heads [3]: https://salsa.debian.org/postgresql/postgresql/-/blob/17/debian/patches/jit-s390x?ref_type=heads [4]: https://gitlab.alpinelinux.org/alpine/aports/-/blob/master/main/postgresql16/pg_config-add-major-version.patch?ref_type=heads [5]: https://gitlab.com/redhat/centos-stream/rpms/postgresql/-/blob/c9s/postgresql-man.patch
On Apr 1, 2024, at 06:55, walther@technowledgy.de wrote: > Also a configurable directoy to look up extensions, possibly even to be changed at run-time like [2]. The patch says this: > >> This directory is prepended to paths when loading extensions (control and SQL files), and to the '$libdir' directive when loading modules that back functions. The location is made configurable to allow build-time testing of extensions that have not been installed to their proper location yet. > > This seems like a great thing to have. This might also be relevant in light of recent discussions in the ecosystem around extension management. > > All the path-related issues have in common, that while it's easy to move files around to their proper locations later, they all need to adjust pg_config's output. Funny timing: I was planning to resurrect this old patch[1] and propose it this week. One of the motivators is the increasing use of Docker images in Kubernetes to run Postgres, where there's a desire to keep the core service and extensions immutable, and to have a second directory mounted to a persistent volume into which other extensions can be installed and preserved independently of the Docker image. The current approach involves symlinking shenanigans[2] that complicate things pretty substantially, making it more difficult to administer. A second directory fit for purpose would be far better. There are some other motivators, so I'll do some additional diligence and start a separate thread (or reply to the original[3]). Best, David [1] https://commitfest.postgresql.org/5/170/ [2] https://speakerdeck.com/ongres/postgres-extensions-in-kubernetes?slide=14 [3] https://www.postgresql.org/message-id/flat/51AE0845.8010600%40ocharles.org.uk
On Sun, Mar 31, 2024 at 02:12:57PM -0700, Andres Freund wrote: > Hi, > > On 2024-03-31 12:18:29 +0200, Michael Banck wrote: > > If you ask where they are maintained, the answer is here: > > > > https://salsa.debian.org/postgresql/postgresql/-/tree/17/debian/patches?ref_type=heads > > > > the other major versions have their own branch. > > Luckily these are all quite small, leaving little space to hide stuff. I'd > still like to get rid of at least some of them. > > I've previously proposed a patch to make pkglibdir configurable, I think we > should just go for that. > > For the various defines, ISTM we should just make them #ifndef guarded, then > they could be overridden by defining them at configure time. Some of them, > like DEFAULT_PGSOCKET_DIR seem to be overridden by just about every > distro. And others would be nice to easily override anyway, I e.g. dislike the > default DEFAULT_PAGER value. I realize we can move some changes into our code, but packagers are still going to need a way to do immediate adjustments to match their OS in time frames that don't match the Postgres release schedule. I was more asking if users have access to patches so they could recreate the build by using the Postgres git tree and supplied OS-specific patches. -- Bruce Momjian <bruce@momjian.us> https://momjian.us EDB https://enterprisedb.com Only you can decide what is important to you.
Bruce Momjian <bruce@momjian.us> writes: > I was more asking if users have access to patches so they could recreate > the build by using the Postgres git tree and supplied OS-specific > patches. AFAIK, every open-source distro makes all the pieces needed to rebuild their packages available to users. It wouldn't be much of an open-source situation otherwise. You do have to learn their package build process. regards, tom lane
On Fri, Mar 29, 2024 at 06:37:24PM -0400, Bruce Momjian wrote: > You might have seen reports today about a very complex exploit added to > recent versions of liblzma. Fortunately, it was only enabled two months > ago and has not been pushed to most stable operating systems like Debian > and Ubuntu. The original detection report is: > > https://www.openwall.com/lists/oss-security/2024/03/29/4 I was watching this video about the exploit: https://www.youtube.com/watch?v=bS9em7Bg0iU and at 2:29, they mention "hero software developer", our own Andres Freund as the person who discovered the exploit. I noticed the author's name at the openwall email link above, but I assumed it was someone else with the same name. They mentioned it was found while researching Postgres performance, and then I noticed the email address matched! I thought the analogy he uses at the end of the video is very clear. -- Bruce Momjian <bruce@momjian.us> https://momjian.us EDB https://enterprisedb.com Only you can decide what is important to you.
On Mon, Apr 1, 2024 at 03:17:55PM -0400, Tom Lane wrote: > Bruce Momjian <bruce@momjian.us> writes: > > I was more asking if users have access to patches so they could recreate > > the build by using the Postgres git tree and supplied OS-specific > > patches. > > AFAIK, every open-source distro makes all the pieces needed to > rebuild their packages available to users. It wouldn't be much > of an open-source situation otherwise. You do have to learn > their package build process. I wasn't clear if all the projects provide a source tree that can be verified against the project's source tree, and then independent patches, or if the patches were integrated and therefore harder to verify against the project source tree. -- Bruce Momjian <bruce@momjian.us> https://momjian.us EDB https://enterprisedb.com Only you can decide what is important to you.
Bruce Momjian <bruce@momjian.us> writes: > On Mon, Apr 1, 2024 at 03:17:55PM -0400, Tom Lane wrote: >> AFAIK, every open-source distro makes all the pieces needed to >> rebuild their packages available to users. It wouldn't be much >> of an open-source situation otherwise. You do have to learn >> their package build process. > I wasn't clear if all the projects provide a source tree that can be > verified against the project's source tree, and then independent > patches, or if the patches were integrated and therefore harder to > verify against the project source tree. In the systems I'm familiar with, an SRPM-or-equivalent includes the pristine upstream tarball and then some patch files to apply to it. The patch files have to be maintained anyway, and if you don't ship them then you're not shipping "source". regards, tom lane
On 2024-03-31 Su 17:12, Andres Freund wrote: > Hi, > > On 2024-03-31 12:18:29 +0200, Michael Banck wrote: >> If you ask where they are maintained, the answer is here: >> >> https://salsa.debian.org/postgresql/postgresql/-/tree/17/debian/patches?ref_type=heads >> >> the other major versions have their own branch. > Luckily these are all quite small, leaving little space to hide stuff. I'd > still like to get rid of at least some of them. > > I've previously proposed a patch to make pkglibdir configurable, I think we > should just go for that. > > For the various defines, ISTM we should just make them #ifndef guarded, then > they could be overridden by defining them at configure time. Some of them, > like DEFAULT_PGSOCKET_DIR seem to be overridden by just about every > distro. And others would be nice to easily override anyway, I e.g. dislike the > default DEFAULT_PAGER value. > +1 to this proposal. cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com
Hi, As most will know by now, the way the xz debacle was able to make sshd vulnerable was through a dependency from sshd to libsystemd and then from libsystemd to liblzma. One lesson from this is that unnecessary dependencies can still increase risk. It's worth noting that we have an optional dependency on libsystemd as well. Openssh has now integrated [1] a patch to remove the dependency on libsystemd for triggering service manager readiness notifications, by inlining the necessary function. That's not hard; the protocol is pretty simple. I suspect we should do the same. We're not even close to being a target as attractive as openssh, but still, it seems unnecessary. An intro to the protocol is at [2], with real content and an outline of the relevant code at [3]. An argument could be made to instead just remove support, but I think it's quite valuable to have intra-service dependencies that can rely on the server actually having started up. Greetings, Andres Freund [1] https://bugzilla.mindrot.org/show_bug.cgi?id=2641 [2] https://www.freedesktop.org/software/systemd/man/devel/systemd.html#Readiness%20Protocol [3] https://www.freedesktop.org/software/systemd/man/devel/sd_notify.html#Notes
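To give a sense of how small the inlined function would be, here is a rough, untested sketch of the readiness notification based on the protocol documentation linked above - not the actual openssh patch or proposed PostgreSQL code, and the function name is made up:

    #include <stddef.h>
    #include <stdlib.h>
    #include <string.h>
    #include <sys/socket.h>
    #include <sys/un.h>
    #include <unistd.h>

    /*
     * Sketch: send "READY=1" as a single datagram to the unix socket named by
     * NOTIFY_SOCKET.  A name starting with '@' denotes an abstract-namespace
     * socket (leading NUL byte).
     */
    static int
    notify_ready(void)
    {
        const char *path = getenv("NOTIFY_SOCKET");
        struct sockaddr_un addr;
        socklen_t   addrlen;
        int         fd;
        ssize_t     sent;

        if (path == NULL || path[0] == '\0')
            return 0;               /* not launched by a service manager */
        if (strlen(path) >= sizeof(addr.sun_path))
            return -1;              /* socket name too long */

        memset(&addr, 0, sizeof(addr));
        addr.sun_family = AF_UNIX;
        strcpy(addr.sun_path, path);
        if (addr.sun_path[0] == '@')
            addr.sun_path[0] = '\0';        /* abstract namespace */
        addrlen = offsetof(struct sockaddr_un, sun_path) + strlen(path);

        fd = socket(AF_UNIX, SOCK_DGRAM | SOCK_CLOEXEC, 0);
        if (fd < 0)
            return -1;
        sent = sendto(fd, "READY=1", strlen("READY=1"), 0,
                      (struct sockaddr *) &addr, addrlen);
        close(fd);
        return (sent < 0) ? -1 : 0;
    }

A real version would presumably also want to retry on EINTR and report failures, but the wire protocol itself really is just that one datagram.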
On 30.03.24 00:14, Daniel Gustafsson wrote: > One take-away for me is how important it is to ship recipes for regenerating > any testdata which is included in generated/compiled/binary format. Kind of > how we in our tree ship the config for test TLS certificates and keys which can > be manually inspected, and used to rebuild the testdata (although the risk for > injections in this particular case seems low). Bad things can still be > injected, but formats which allow manual review at least goes some way towards > lowering risk. I actually find the situation with the test TLS files quite unsatisfactory. While we have build rules, the output files are not reproducible, both because of some inherent randomness in the generation, and because different OpenSSL versions produce different details. So you just have to "trust" that what's there now makes sense. Of course, you can use openssl tools to unpack these files, but that is difficult and unreliable unless you know exactly what you are looking for. Also, for example, do we even know whether all the files that are there now are even used by any tests? A few years ago, some guy on the internet sent in a purported update to one of the files [0], which I ended up committing, but I remember that that situation gave me quite some pause at the time. It would be better if we created the required test files as part of the test run. (Why not? Too slow?) Alternatively, I have been thinking that maybe we could make the output more reproducible by messing with whatever random seed OpenSSL uses. Or maybe use a Python library to create the files. Some things to think about. [0]: https://www.postgresql.org/message-id/FEF81714-D479-4512-839B-C769D2605F8A%40yesql.se
> On 3 Apr 2024, at 20:09, Peter Eisentraut <peter@eisentraut.org> wrote: > > On 30.03.24 00:14, Daniel Gustafsson wrote: >> One take-away for me is how important it is to ship recipes for regenerating >> any testdata which is included in generated/compiled/binary format. Kind of >> how we in our tree ship the config for test TLS certificates and keys which can >> be manually inspected, and used to rebuild the testdata (although the risk for >> injections in this particular case seems low). Bad things can still be >> injected, but formats which allow manual review at least goes some way towards >> lowering risk. > > I actually find the situation with the test TLS files quite unsatisfactory. While we have build rules, the output files are not reproducible, both because of some inherent randomness in the generation, and because different OpenSSL versions produce different details. This testdata is by nature not reproducible, and twisting arms to make it so will only result in testing against synthetic data, which risks hiding bugs IMO. > So you just have to "trust" that what's there now makes sense. Not entirely, you can review the input files which are used to generate the test data and verify that those make sense.. > Of course, you can use openssl tools to unpack these files, but that is difficult and unreliable unless you know exactly what you are looking for. ..and check like you mention above, but it's as you say not entirely trivial. It would be nice to improve this but I'm not sure how. Document how to inspect the data in src/test/ssl/README perhaps? > It would be better if we created the required test files as part of the test run. (Why not? Too slow?) The make sslfiles step requires OpenSSL 1.1.1, which is higher than what we require to be installed to build postgres. The higher-than-base requirement is due to it building test data only used when running 1.1.1 or higher, so technically it could be made to work if anyone is interested in investing time in 1.0.2. Time is one aspect: on my crusty old laptop it takes ~2 seconds to regenerate the files. That in itself isn't that much, but we've rejected test-time additions far smaller than that. We could however make CI and the Buildfarm run the regeneration and leave it up to each developer to decide locally? Or remove them and regenerate on the first SSL test run and then use the cached ones after that? On top of time, I have a feeling the regeneration won't run on Windows. When it's converted to use Meson then that might be fixed though. -- Daniel Gustafsson
On Wed, Apr 3, 2024 at 7:57 PM Andres Freund <andres@anarazel.de> wrote:
> Hi,
> As most will know by now, the way xz debacle was able to make sshd vulnerable
> was through a dependency from sshd to libsystemd and then from libsystemd to
> liblzma. One lesson from this is that unnecessary dependencies can still
> increase risk.
Yeah, I think that's something to consider for every dependency added. I think we're fairly often protected against "adding too many libraries" because many libraries simply don't exist for all the platforms we want to build on. But it's nevertheless something to think about each time.
> It's worth noting that we have an optional dependency on libsystemd as well.
> Openssh has now integrated [1] a patch to remove the dependency on libsystemd
> for triggering service manager readyness notifications, by inlining the
> necessary function. That's not hard, the protocol is pretty simple.
> I suspect we should do the same. We're not even close to being a target as
> attractive as openssh, but still, it seems unnecessary.
+1.
When the code is this simple, we should definitely consider carrying it ourselves. At least if we don't expect to need *other* functionality from the same library in the future, which I doubt we will from libsystemd.
> An argument could be made to instead just remove support, but I think it's
> quite valuable to have intra service dependencies that can rely on the server
> actually having started up.
If we remove support we're basically just asking most of our linux packagers to add it back in, and they will add it back in the same way we did it. I think we do everybody a disservice if we do that. It's useful functionality.
//Magnus
Magnus Hagander <magnus@hagander.net> writes: > On Wed, Apr 3, 2024 at 7:57 PM Andres Freund <andres@anarazel.de> wrote: >> Openssh has now integrated [1] a patch to remove the dependency on >> libsystemd >> for triggering service manager readyness notifications, by inlining the >> necessary function. That's not hard, the protocol is pretty simple. >> I suspect we should do the same. We're not even close to being a target as >> attractive as openssh, but still, it seems unnecessary. > +1. I didn't read the patch, but if it's short and stable enough then this seems like a good idea. (If openssh and we are using such a patch, that will probably be a big enough stake in the ground to prevent somebody deciding to change the protocol ...) >> An argument could be made to instead just remove support, but I think it's >> quite valuable to have intra service dependencies that can rely on the >> server actually having started up. > If we remove support we're basically just asking most of our linux > packagers to add it back in, and they will add it back in the same way we > did it. I think we do everybody a disservice if we do that. It's useful > functionality. Yeah, that idea seems particularly silly in view of the desire expressed earlier in this thread to reduce the number of patches carried by packagers. People packaging for systemd-using distros will not consider that this functionality is optional. regards, tom lane
Hi, On 2024-04-03 17:58:55 -0400, Tom Lane wrote: > Magnus Hagander <magnus@hagander.net> writes: > > On Wed, Apr 3, 2024 at 7:57 PM Andres Freund <andres@anarazel.de> wrote: > >> Openssh has now integrated [1] a patch to remove the dependency on > >> libsystemd > >> for triggering service manager readyness notifications, by inlining the > >> necessary function. That's not hard, the protocol is pretty simple. > >> I suspect we should do the same. We're not even close to being a target as > >> attractive as openssh, but still, it seems unnecessary. > > > +1. > > I didn't read the patch, but if it's short and stable enough then this > seems like a good idea. It's basically just checking for an env var, opening the unix socket indicated by that, writing a string to it and closing the socket again. > (If openssh and we are using such a patch, that will probably be a big > enough stake in the ground to prevent somebody deciding to change the > protocol ...) One version of the openssh patch to remove liblzma was submitted by one of the core systemd devs, so I think they agree that it's a stable API. The current protocol supports adding more information by adding attributes, so it should be extensible enough anyway. > >> An argument could be made to instead just remove support, but I think it's > >> quite valuable to have intra service dependencies that can rely on the > >> server actually having started up. > > > If we remove support we're basically just asking most of our linux > > packagers to add it back in, and they will add it back in the same way we > > did it. I think we do everybody a disservice if we do that. It's useful > > functionality. > > Yeah, that idea seems particularly silly in view of the desire > expressed earlier in this thread to reduce the number of patches > carried by packagers. People packaging for systemd-using distros > will not consider that this functionality is optional. Yep. Greetings, Andres Freund
On 03.04.24 23:19, Magnus Hagander wrote: > When the code is this simple, we should definitely consider carrying it > ourselves. At least if we don't expect to need *other* functionality > from the same library in the future, which I doubt we will from libsystemd. Well, I've long had it on my list to do some integration to log directly to the journal, so you can preserve metadata better. I'm not sure right now whether this would use libsystemd, but it's not like there is absolutely no other systemd-related functionality that could be added. Personally, I think this proposed change is trying to close a barndoor after a horse has bolted. There are many more interesting and scary libraries in the dependency tree of "postgres", so just picking off one right now doesn't really accomplish anything. The next release of libsystemd will drop all the compression libraries as hard dependencies, so the issue in that sense is gone anyway. Also, fun fact: liblzma is also a dependency via libxml2.
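For context on the journal idea, direct journal logging would presumably go through libsystemd's sd_journal_send(), roughly like the hypothetical sketch below; the PGDATABASE/PGAPPNAME field names are invented for illustration. Which is also part of the point: that kind of integration is a reason to keep the library dependency rather than drop it.

    #include <syslog.h>
    #include <systemd/sd-journal.h>

    /*
     * Hypothetical sketch: log a message with structured fields so the
     * metadata is preserved in the journal instead of being flattened into
     * a single syslog line.  Fields other than MESSAGE/PRIORITY are made up.
     */
    static void
    log_to_journal(const char *msg, const char *dbname, const char *appname)
    {
        (void) sd_journal_send("MESSAGE=%s", msg,
                               "PRIORITY=%i", LOG_INFO,
                               "PGDATABASE=%s", dbname,
                               "PGAPPNAME=%s", appname,
                               NULL);
    }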
On Wed, Apr 3, 2024 at 3:42 PM Daniel Gustafsson <daniel@yesql.se> wrote: > > So you just have to "trust" that what's there now makes sense. > > Not entirely, you can review the input files which are used to generate the > test data and verify that those make sense.. Yeah, I mean, in theory I suppose that's true, but look at this commit: https://git.tukaani.org/?p=xz.git;a=commitdiff;h=6e636819e8f070330d835fce46289a3ff72a7b89 In case the link stops working for some reason, the commit message is as follows: JT> Tests: Update two test files. JT> JT> The original files were generated with random local to my machine. JT> To better reproduce these files in the future, a constant seed was used JT> to recreate these files. Essentially, your argument is the same as his, namely: hey, don't worry, you could totally verify these test files if you wanted to! But of course, nobody did, because it was hard, and everybody had better things to do with their time. And I think the same thing is probably true here: nobody really is going to verify much about these files. I just went and ran openssl x509 -in $f -text on each .crt file and it worked, so all of those files look basically like certificates. But like, hypothetically, how do I know that the modulus chosen for a particular certificate was chosen at random, rather than maliciously? Similarly for the key files. Are there padding bytes in any of these files that could be used to store evil information? I don't know that, either. I'm not sure how far it's worth continuing down this path of paranoia; I doubt that Daniel Gustafsson is a clever alias for a nefarious cabal of state-sponsored hackers, and the hypotheses that supposedly-random values were chosen non-randomly borders on unfalsifiable. Nonetheless, I think Peter is correct to point out that these are the kinds of files about which it is reasonable to be concerned, because they seem to have properties quite similar to those of the files used in an actual attack. -- Robert Haas EDB: http://www.enterprisedb.com
> On 4 Apr 2024, at 21:38, Robert Haas <robertmhaas@gmail.com> wrote: > Essentially, your argument is the same as his, namely: hey, don't > worry, you could totally verify these test files if you wanted to! But > of course, nobody did, because it was hard, and everybody had better > things to do with their time. And I think the same thing is probably > true here: nobody really is going to verify much about these files. I don't disagree, like I said in that very email: it's non-trivial and I wish we could make it better somehow, but I don't have an abundance of good ideas. Removing the generated versions and creating them when running tests makes sneaking in malicious content harder since it then has to be submitted in clear-text *only*. The emphasis is added since it's like that today as well: *I* fully trust our team of committers to not accept a binary file in a patch without replacing it with a regenerated version, but enforcing it might make it easier for a wider community to share that level of trust? -- Daniel Gustafsson
On Thu, Apr 4, 2024 at 4:25 PM Daniel Gustafsson <daniel@yesql.se> wrote: > I don't disagree, like I said that very email: it's non-trivial and I wish we > could make it better somehow, but I don't hav an abundance of good ideas. Is the basic issue that we can't rely on the necessary toolchain to be present on every machine where someone might try to build PostgreSQL? > Removing the generated versions and creating them when running tests makes > sneaking in malicious content harder since it then has to be submitted in > clear-text *only*. The emphasis added since it's like that today as well: *I* > fully trust our team of committers to not accept a binary file in a patch > without replacing with a regenerated version, but enforcing it might make it > easier for a wider community to share that level of trust? To be honest, I'm not at all sure that I would have considered regenerating a binary file to be a must-do kind of a thing, so I guess that's a lesson learned for me. Trust is a really tricky thing in cases like this. It's not just about whether some committer is secretly a malicious actor; it's also about whether everyone understands the best practices and follows them consistently. In that regard, I don't even trust myself. I hope that it's unlikely that I would mistakenly commit something malicious, but I think it could happen, and I think it could happen to anyone else, too. -- Robert Haas EDB: http://www.enterprisedb.com
Robert Haas <robertmhaas@gmail.com> writes: > On Thu, Apr 4, 2024 at 4:25 PM Daniel Gustafsson <daniel@yesql.se> wrote: >> I don't disagree, like I said that very email: it's non-trivial and I wish we >> could make it better somehow, but I don't hav an abundance of good ideas. > Is the basic issue that we can't rely on the necessary toolchain to be > present on every machine where someone might try to build PostgreSQL? IIUC, it's not really that, but that regenerating these files is expensive; multiple seconds even on fast machines. Putting that into tests that are run many times a day is unappetizing. regards, tom lane
> On 4 Apr 2024, at 22:40, Robert Haas <robertmhaas@gmail.com> wrote: > > On Thu, Apr 4, 2024 at 4:25 PM Daniel Gustafsson <daniel@yesql.se> wrote: >> I don't disagree, like I said that very email: it's non-trivial and I wish we >> could make it better somehow, but I don't hav an abundance of good ideas. > > Is the basic issue that we can't rely on the necessary toolchain to be > present on every machine where someone might try to build PostgreSQL? AFAIK we haven't historically enforced that installations have the openssl binary in PATH, but it would be a pretty low bar to add. The bigger issue is likely finding someone to port this to Windows; it probably won't be too hard, but as with all things building on Windows, we need someone skilled in that area to do it. >> Removing the generated versions and creating them when running tests makes >> sneaking in malicious content harder since it then has to be submitted in >> clear-text *only*. The emphasis added since it's like that today as well: *I* >> fully trust our team of committers to not accept a binary file in a patch >> without replacing with a regenerated version, but enforcing it might make it >> easier for a wider community to share that level of trust? > > To be honest, I'm not at all sure that I would have considered > regenerating a binary file to be a must-do kind of a thing, so I guess > that's a lesson learned for me. Trust is a really tricky thing in > cases like this. It's not just about whether some committer is > secretly a malicious actor; it's also about whether everyone > understands the best practices and follows them consistently. In that > regard, I don't even trust myself. I hope that it's unlikely that I > would mistakenly commit something malicious, but I think it could > happen, and I think it could happen to anyone else, too. It absolutely could. Re-reading Ken Thompson's Turing Lecture "Reflections on Trusting Trust" at periodic intervals is a good reminder of just how complicated this is. -- Daniel Gustafsson
> On 4 Apr 2024, at 22:47, Tom Lane <tgl@sss.pgh.pa.us> wrote: > > Robert Haas <robertmhaas@gmail.com> writes: >> On Thu, Apr 4, 2024 at 4:25 PM Daniel Gustafsson <daniel@yesql.se> wrote: >>> I don't disagree, like I said that very email: it's non-trivial and I wish we >>> could make it better somehow, but I don't hav an abundance of good ideas. > >> Is the basic issue that we can't rely on the necessary toolchain to be >> present on every machine where someone might try to build PostgreSQL? > > IIUC, it's not really that, but that regenerating these files is > expensive; multiple seconds even on fast machines. Putting that > into tests that are run many times a day is unappetizing. That's one aspect of it. We could cache the results of course to amortize the cost over multiple test-runs but at the end of the day it will add time to test-runs regardless of what we do. One thing to consider would be to try and rearrange/refactor the tests to require a smaller set of keys and certificates. I haven't looked into what sort of savings that could yield (if any) but if we go the route of regeneration at test-time we shouldn't leave potential savings on the table. -- Daniel Gustafsson
It would be better if we created the required test files as part of the
test run. (Why not? Too slow?) Alternatively, I have been thinking
that maybe we could make the output more reproducible by messing with
whatever random seed OpenSSL uses. Or maybe use a Python library to
create the files. Some things to think about.
I think this last idea is the way to go. I've hand-crafted GIF images and PGP messages in the past; surely we have enough combined brain power around here to craft our own SSL files? It may even be a wheel that someone has invented already.
Cheers,
Greg
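If the Python-library route were explored, a minimal sketch of generating a throwaway self-signed server certificate at test time could look like the following (using the third-party "cryptography" package; the names, key size and output paths are purely illustrative, not the layout the ssl tests actually use):

    # Sketch: generate a throwaway self-signed certificate at test time instead
    # of shipping a pre-generated binary file.  Names, key size and file paths
    # are assumptions for illustration only.
    import datetime

    from cryptography import x509
    from cryptography.hazmat.primitives import hashes, serialization
    from cryptography.hazmat.primitives.asymmetric import rsa
    from cryptography.x509.oid import NameOID

    key = rsa.generate_private_key(public_exponent=65537, key_size=2048)
    name = x509.Name([x509.NameAttribute(NameOID.COMMON_NAME, "pg-test-server")])
    now = datetime.datetime.now(datetime.timezone.utc)
    cert = (
        x509.CertificateBuilder()
        .subject_name(name)
        .issuer_name(name)                       # self-signed: issuer == subject
        .public_key(key.public_key())
        .serial_number(x509.random_serial_number())
        .not_valid_before(now)
        .not_valid_after(now + datetime.timedelta(days=1))
        .sign(key, hashes.SHA256())
    )

    with open("server.key", "wb") as f:
        f.write(key.private_bytes(
            serialization.Encoding.PEM,
            serialization.PrivateFormat.TraditionalOpenSSL,
            serialization.NoEncryption(),
        ))
    with open("server.crt", "wb") as f:
        f.write(cert.public_bytes(serialization.Encoding.PEM))

Whether such output could ever be made byte-for-byte reproducible is a separate question, since the key generation itself consumes randomness.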
On Thu, Apr 4, 2024 at 10:56:01PM +0200, Daniel Gustafsson wrote: > > On 4 Apr 2024, at 22:47, Tom Lane <tgl@sss.pgh.pa.us> wrote: > > > > Robert Haas <robertmhaas@gmail.com> writes: > >> On Thu, Apr 4, 2024 at 4:25 PM Daniel Gustafsson <daniel@yesql.se> wrote: > >>> I don't disagree, like I said that very email: it's non-trivial and I wish we > >>> could make it better somehow, but I don't hav an abundance of good ideas. > > > >> Is the basic issue that we can't rely on the necessary toolchain to be > >> present on every machine where someone might try to build PostgreSQL? > > > > IIUC, it's not really that, but that regenerating these files is > > expensive; multiple seconds even on fast machines. Putting that > > into tests that are run many times a day is unappetizing. > > That's one aspect of it. We could cache the results of course to amortize the > cost over multiple test-runs but at the end of the day it will add time to > test-runs regardless of what we do. > > One thing to consider would be to try and rearrange/refactor the tests to > require a smaller set of keys and certificates. I haven't looked into what > sort of savings that could yield (if any) but if we go the route of > regeneration at test-time we shouldn't leave potential savings on the table. Rather than everyone testing it on every build, couldn't we have an automated test every night that checked binary files? -- Bruce Momjian <bruce@momjian.us> https://momjian.us EDB https://enterprisedb.com Only you can decide what is important to you.
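A nightly check along those lines could start out very simple — a sketch, assuming it runs from the top of a git checkout and treats any tracked file containing NUL bytes as "binary" (a heuristic, not an established project policy):

    # Sketch: list tracked files that look binary, so new or changed blobs
    # stand out for review.  The NUL-byte heuristic is an assumption.
    import os
    import subprocess

    tracked = subprocess.run(
        ["git", "ls-files", "-z"], capture_output=True, check=True
    ).stdout.decode().split("\0")

    binary_files = []
    for path in filter(None, tracked):
        if not os.path.isfile(path):
            continue
        with open(path, "rb") as f:
            chunk = f.read(8192)  # the first 8 kB is enough to spot most blobs
        if b"\0" in chunk:
            binary_files.append(path)

    print("Tracked files that look binary:")
    for path in sorted(binary_files):
        print(f"  {path}")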
On Thu, 4 Apr 2024 at 22:56, Daniel Gustafsson <daniel@yesql.se> wrote: > > > On 4 Apr 2024, at 22:47, Tom Lane <tgl@sss.pgh.pa.us> wrote: > > > > Robert Haas <robertmhaas@gmail.com> writes: > >> On Thu, Apr 4, 2024 at 4:25 PM Daniel Gustafsson <daniel@yesql.se> wrote: > >>> I don't disagree, like I said that very email: it's non-trivial and I wish we > >>> could make it better somehow, but I don't hav an abundance of good ideas. > > > >> Is the basic issue that we can't rely on the necessary toolchain to be > >> present on every machine where someone might try to build PostgreSQL? > > > > IIUC, it's not really that, but that regenerating these files is > > expensive; multiple seconds even on fast machines. Putting that > > into tests that are run many times a day is unappetizing. > > That's one aspect of it. We could cache the results of course to amortize the > cost over multiple test-runs but at the end of the day it will add time to > test-runs regardless of what we do. How about we make it meson/make targets, so they are simply cached just like any of our other build artefacts are cached. Then only clean builds are impacted, not every test run.
> On 4 Apr 2024, at 23:02, Jelte Fennema-Nio <postgres@jeltef.nl> wrote: > How about we make it meson/make targets, so they are simply cached > just like any of our other build artefacts are cached. Then only clean > builds are impacted, not every test run. They already are (well, make not meson yet), they're just not hooked up to any top-level commands. Running "make sslfiles-clean sslfiles" in src/test/ssl regenerates all the files from the base config files that define their contents. -- Daniel Gustafsson
Jelte Fennema-Nio <postgres@jeltef.nl> writes: > How about we make it meson/make targets, so they are simply cached > just like any of our other build artefacts are cached. Then only clean > builds are impacted, not every test run. Every buildfarm and CI run is "clean" in those terms. regards, tom lane
On Thu, 4 Apr 2024 at 23:06, Daniel Gustafsson <daniel@yesql.se> wrote: > > > On 4 Apr 2024, at 23:02, Jelte Fennema-Nio <postgres@jeltef.nl> wrote: > > > How about we make it meson/make targets, so they are simply cached > > just like any of our other build artefacts are cached. Then only clean > > builds are impacted, not every test run. > > They already are (well, make not meson yet), they're just not hooked up to any > top-level commands. Running "make sslfiles-clean sslfiles" in src/test/ssl > regenerates all the files from the base config files that define their > contents. Okay, turns out even generating them in parallel isn't any faster than running that sequentially. I guess it's because of the strong random generation being the slow part. The command I used was the following and took ~5 seconds on my machine: make -C src/test/ssl sslfiles-clean && make -C src/test/ssl sslfiles -j20 And I think that's actually a good thing, because that would mean total build time is pretty much not impacted if we included it as part of the regular clean build. Since building these certs is bottlenecked on randomness, not on CPU (as pretty much all of our other build artifacts are). So they should pipeline pretty well with the other items, assuming build concurrency is set slightly higher than the number of cores. As a proof of concept I ran the above command in a simple bash loop constantly: while true; do make -C src/test/ssl sslfiles-clean && make -C src/test/ssl sslfiles -j20; done And then ran a clean (parallel) build in another shell: ninja -C build clean && ninja -C build And total build time went from 41 to 43 seconds. To be clear, that's while constantly running the ssl file creation. If I only run the creation once, there's no noticeable increase in build time at all.
Jelte Fennema-Nio <postgres@jeltef.nl> writes: > Okay turns out even generating them in parallel isn't any faster than > running that sequentially. I guess it's because of the strong random > generation being the slow part. Command I used was the following and > took ~5 seconds on my machine: > make -C src/test/ssl sslfiles-clean && make -C src/test/ssl sslfiles -j20 Just for comparison's sake, this takes about 2 minutes on mamba's host. Now that's certainly museum-grade hardware, but I don't think it's even the slowest machine in the buildfarm. On a Raspberry Pi 4B, it was about 25 seconds. (I concur with your result that parallelism helps little.) regards, tom lane
On Thu, Apr 4, 2024 at 4:48 PM Daniel Gustafsson <daniel@yesql.se> wrote: > AFAIK we haven't historically enforced that installations have the openssl > binary in PATH, but it would be a pretty low bar to add. The bigger issue is > likely to find someone to port this to Windows, it probably won't be too hard > but as with all things building on Windows, we need someone skilled in that > area to do it. I wonder how hard it would be to just code up our own binary to do this. If it'd be a pain to do that, or to maintain it across SSL versions, then it's a bad plan and we shouldn't do it. But if it's not that much code, maybe it'd be worth considering. I'm also sort of afraid that we're getting sucked into thinking real hard about this SSL certificate issue rather than trying to brainstorm all the other places that might be problematic. The latter might be a more fruitful exercise (or maybe not, what do I know?). -- Robert Haas EDB: http://www.enterprisedb.com
On Fri, Apr 5, 2024 at 6:24 AM Robert Haas <robertmhaas@gmail.com> wrote: > I wonder how hard it would be to just code up our own binary to do > this. If it'd be a pain to do that, or to maintain it across SSL > versions, then it's a bad plan and we shouldn't do it. But if it's not > that much code, maybe it'd be worth considering. I think my biggest concern, other than the maintenance costs, would be the statement "we know SSL works on Windows because we test it against some certificates we hand-rolled ourselves." We can become experts in certificate formats, of course, but... does it buy us much? If someone comes and complains that a certificate doesn't work correctly (as they have *very* recently [3]), I would like to be able to write a test that uses what OpenSSL actually generates as opposed to learning how to make it myself first. > I'm also sort of afraid that we're getting sucked into thinking real > hard about this SSL certificate issue rather than trying to brainstorm > all the other places that might be problematic. The latter might be a > more fruitful exercise (or maybe not, what do I know?). +1. Don't get me wrong: I spent a lot of time refactoring the sslfiles machinery a while back, and I would love for it to all go away. I don't really want to interrupt any lines of thought that are moving in that direction. Please continue. _And also:_ the xz attack relied on a long chain of injections, both technical and social. I'm still wrapping my head around Russ Cox's writeup [1, 2], but the "hidden blob of junk" is only a single part of all that. I'm not even sure it was a necessary part; it just happened to work well for this particular project and line of attack. I've linked Russ Cox in particular because Golang has made a bunch of language-level decisions with the supply chain in mind, including the philosophy that a build should ideally not be able to run arbitrary code at all, and therefore generated files _must_ be checked in. I remember $OLDJOB having buildbots that would complain if the contents of the file you checked in didn't match what was (reproducibly!) generated. I think there's a lot more to think about here. --Jacob [1] https://research.swtch.com/xz-timeline [2] https://research.swtch.com/xz-script [3] https://www.postgresql.org/message-id/flat/17760-b6c61e752ec07060%40postgresql.org
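The buildbot-style reproducibility checks mentioned above could look roughly like the sketch below: regenerate a supposedly-generated file into a scratch location and fail if it differs from the committed copy. This only works where generation is deterministic (which, as discussed upthread, the SSL files currently are not), and the command and file names here are placeholders rather than real targets:

    # Sketch: fail if a committed "generated" file differs from freshly
    # generated output.  Only meaningful when generation is deterministic;
    # the command and file names below are placeholders.
    import filecmp
    import subprocess
    import sys
    import tempfile
    from pathlib import Path

    COMMITTED = Path("path/to/generated.out")                              # placeholder
    REGEN_CMD = ["some-generator", "--output", "{out}", "path/to/source.in"]  # placeholder

    with tempfile.TemporaryDirectory() as scratch:
        out = Path(scratch) / COMMITTED.name
        subprocess.run([arg.format(out=out) for arg in REGEN_CMD], check=True)
        if not filecmp.cmp(COMMITTED, out, shallow=False):
            sys.exit(f"{COMMITTED} does not match freshly generated output")
        print(f"{COMMITTED} matches freshly generated output")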
On Fri, Apr 05, 2024 at 11:40:46AM -0700, Jacob Champion wrote: > On Fri, Apr 5, 2024 at 6:24 AM Robert Haas <robertmhaas@gmail.com> wrote: >> I'm also sort of afraid that we're getting sucked into thinking real >> hard about this SSL certificate issue rather than trying to brainstorm >> all the other places that might be problematic. The latter might be a >> more fruitful exercise (or maybe not, what do I know?). > > +1. Don't get me wrong: I spent a lot of time refactoring the sslfiles > machinery a while back, and I would love for it to all go away. I > don't really want to interrupt any lines of thought that are moving in > that direction. Please continue. There are a few things that I've not seen mentioned on this thread. Any random byte sequences included in the tree should have ways to be regenerated. One problem with xz was that the binary blobs were directly part of the tree, with the input file and the test crafted so that the test would skip portions of the input, and one line in ./configure being enough to switch the backdoor to evil mode (correct me if I read that wrong). There could be other things that one would be tempted to add to Postgres as test data and then use to introduce a backdoor with a slight tweak of the source tarball. To name a few: - Binary WAL sequences, arguing that this WAL data is useful even with alignment restrictions. - Data for COPY. - Physical dumps, in raw or just SQL data. Anything like that can also be used in some test data provided by somebody in a proposed patch or a bug report, for the sake of reproducing an issue. If there is an attack on this data, another contributor could run it and get his/her own host pwned. One thing that would be easy to hide is something that reads an on-disk file, buried in a large dump file with something edited in its inner parts. An extra question is whether we should expand the use of signed commits, and potentially physical keys for committers provided by pg-infra, which would be the source of trust. Some have mentioned that in the past, and this could reduce the risk of committing problematic code if a committer's host is pwned because the physical key would be required. Saying that, my spidey sense tingles at the recent commit 3311ea86edc7, which had the idea of introducing a 20k-line output file based on a 378-line input file full of random URLs. In my experience, tests don't need to be that large to be useful, and the input data is very hard to parse. -- Michael
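On the signed-commits idea, enforcement could start from something as simple as a script that walks a commit range and rejects anything without a valid signature — a sketch using plain git commands; the range and policy here are illustrative only:

    # Sketch: check that every commit in a range carries a valid signature,
    # using "git rev-list" and "git verify-commit".  The range is an example.
    import subprocess
    import sys

    RANGE = "origin/master~50..origin/master"  # example range, not project policy

    shas = subprocess.run(
        ["git", "rev-list", RANGE], capture_output=True, text=True, check=True
    ).stdout.split()

    unsigned = [
        sha for sha in shas
        # verify-commit exits non-zero when there is no valid signature
        if subprocess.run(["git", "verify-commit", sha], capture_output=True).returncode != 0
    ]

    if unsigned:
        print("Commits without a valid signature:")
        print("\n".join(f"  {sha}" for sha in unsigned))
        sys.exit(1)
    print(f"All {len(shas)} commits in {RANGE} are signed.")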
Hi, > There are many more interesting and scary libraries in the dependency > tree of "postgres", so just picking off one right now doesn't really > accomplish anything. The next release of libsystemd will drop all > the compression libraries as hard dependencies, so the issue in that > sense is gone anyway. Also, fun fact: liblzma is also a dependency > via libxml2. Having an audit of all libraries linked to postgres and their level of trust should help to pinpoint the next weak point. I'm pretty sure we have several of these tiny libraries, maintained by a lone hacker with no spare time, linked in somewhere. What is the next xz? Regards, Étienne -- DALIBO
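Such an audit could start from something as blunt as listing what the installed postgres binary actually pulls in at run time — a sketch assuming a Linux host with ldd available and an assumed install path:

    # Sketch: list the shared libraries the postgres binary pulls in, as raw
    # input for a dependency audit.  Assumes Linux/ldd; the path is an assumption.
    import subprocess

    POSTGRES_BINARY = "/usr/lib/postgresql/16/bin/postgres"  # assumed install path

    output = subprocess.run(
        ["ldd", POSTGRES_BINARY], capture_output=True, text=True, check=True
    ).stdout

    libraries = sorted(
        line.split()[0] for line in output.splitlines() if "=>" in line
    )
    print(f"{POSTGRES_BINARY} links against {len(libraries)} shared libraries:")
    for lib in libraries:
        print(f"  {lib}")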
On Fri, Apr 5, 2024 at 5:14 PM Michael Paquier <michael@paquier.xyz> wrote: > Saying that, my spidey sense tingles at the recent commit > 3311ea86edc7, that had the idea to introduce a 20k line output file > based on a 378 line input file full of random URLs. In my experience, > tests don't require to be that large to be useful, and the input data > is very hard to parse. That's a good point. I've proposed a patch over at [1] to shrink it substantially. --Jacob [1] https://www.postgresql.org/message-id/CAOYmi%2B%3Dkx14ui_A__4L_XcFePSuUuR1kwJfUKxphuZU_i6%3DWpA%40mail.gmail.com
On Sat, Mar 30, 2024 at 04:50:26PM -0400, Robert Haas wrote: > An awful lot of what we do operates on the principle that we know the > people who are involved and trust them, and I'm glad we do trust them, > but the world is full of people who trusted somebody too much and > regretted it afterwards. The fact that we have many committers rather > than a single maintainer probably reduces risk at least as far as the > source repository is concerned, because there are more people paying > attention to potentially notice something that isn't as it should be. One unwritten requirement for committers is that we are able to communicate with them securely. If we cannot do that, they potentially could be forced by others, e.g., governments, to add code to our repositories. Unfortunately, there is no good way for them to tell us once they are unable to communicate with us securely. I suppose some special word could be used to communicate that status --- that is how it was done in non-electronic communication in the past. -- Bruce Momjian <bruce@momjian.us> https://momjian.us EDB https://enterprisedb.com Only you can decide what is important to you.
On 4/8/24 22:57, Bruce Momjian wrote: > On Sat, Mar 30, 2024 at 04:50:26PM -0400, Robert Haas wrote: >> An awful lot of what we do operates on the principle that we know the >> people who are involved and trust them, and I'm glad we do trust them, >> but the world is full of people who trusted somebody too much and >> regretted it afterwards. The fact that we have many committers rather >> than a single maintainer probably reduces risk at least as far as the >> source repository is concerned, because there are more people paying >> attention to potentially notice something that isn't as it should be. > > One unwritten requirement for committers is that we are able to > communicate with them securely. If we cannot do that, they potentially > could be forced by others, e.g., governments, to add code to our > repositories. > > Unfortunately, there is on good way for them to communicate with us > securely once they are unable to communicate with us securely. I > suppose some special word could be used to communicate that status --- > that is how it was done in non-electronic communication in the past. I don't know how that really helps. If one of our committers is under duress, they probably cannot risk outing themselves anyway. The best defense, IMHO, is the fact that our source code is open and can be reviewed freely. The trick is to get folks to do the review. I know, for example, at $past_employer we had a requirement to get someone on our staff to review every single commit in order to maintain certain certifications. Of course there is no guarantee that such reviews would catch everything, but maybe we could establish post commit reviews by contributors in a more rigorous way? Granted, getting more qualified volunteers is not a trivial problem... -- Joe Conway PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
On Thu, Apr 4, 2024 at 1:10 AM Peter Eisentraut <peter@eisentraut.org> wrote:
> On 03.04.24 23:19, Magnus Hagander wrote:
> > When the code is this simple, we should definitely consider carrying it
> > ourselves. At least if we don't expect to need *other* functionality
> > from the same library in the future, which I doubt we will from libsystemd.
> Well, I've long had it on my list to do some integration to log directly
> to the journal, so you can preserve metadata better. I'm not sure right
> now whether this would use libsystemd, but it's not like there is
> absolutely no other systemd-related functionality that could be added.
Ah interesting. I hadn't thought of that use-case.
> Personally, I think this proposed change is trying to close a barndoor
> after a horse has bolted. There are many more interesting and scary
> libraries in the dependency tree of "postgres", so just picking off one
> right now doesn't really accomplish anything. The next release of
> libsystemd will drop all the compression libraries as hard dependencies,
> so the issue in that sense is gone anyway. Also, fun fact: liblzma is
> also a dependency via libxml2.
To be clear, I didn't mean to single out this one, just saying that it's something we should keep in consideration in general when adding library dependencies. Every new dependency, no matter how small, increases the management burden and the risks that come with it. And we should just be aware of that and weigh them against each other.
As in, we should *consider* it; that doesn't mean we should necessarily *do* it.
(And yes, there are many scary dependencies down the tree)
Hi, On 2024-04-04 01:10:20 +0200, Peter Eisentraut wrote: > On 03.04.24 23:19, Magnus Hagander wrote: > > When the code is this simple, we should definitely consider carrying it > > ourselves. At least if we don't expect to need *other* functionality > > from the same library in the future, which I doubt we will from > > libsystemd. > > Well, I've long had it on my list to do some integration to log directly to > the journal, so you can preserve metadata better. I'm not sure right now > whether this would use libsystemd, but it's not like there is absolutely no > other systemd-related functionality that could be added. Interesting. I think that'd in all likelihood end up using libsystemd. > Personally, I think this proposed change is trying to close a barndoor after > a horse has bolted. There are many more interesting and scary libraries in > the dependency tree of "postgres", so just picking off one right now doesn't > really accomplish anything. The next release of libsystemd will drop all > the compression libraries as hard dependencies, so the issue in that sense > is gone anyway. Also, fun fact: liblzma is also a dependency via libxml2. I agree that doing this just because of future risk in liblzma is probably not worth it. Despite soon not being an unconditional dependency of libsystemd anymore, I'd guess that in a few months it's going to be one of the better vetted libraries out there. But I don't think that means it's not worth reducing the dependencies that we have unconditionally loaded. Having a dependency to a fairly large library (~900k), which could be avoided with ~30 LOC, is just not worth it. Independent of liblzma. Not from a performance POV, nor from a risk POV. I'm actually fairly bothered by us linking to libxml2. It was effectively unmaintained for most of the last decade, with just very occasional drive-by commits. And it's not that there weren't significant bugs or such. Maintenance has picked up some, but it's still not well maintained, I'd say. If I wanted to attack postgres, it's where I'd start. My worry level, in decreasing order, about postmaster-level dependencies: - libxml2 - effectively unmaintained, just about maintained today - gssapi - heavily undermaintained from what I can see, lots of complicated code - libldap - undermaintained, lots of creaky old code - libpam - undermaintained, lots of creaky old code - the rest Greetings, Andres Freund
On Fri, Apr 12, 2024 at 09:00:11AM -0700, Andres Freund wrote: > I'm actually fairly bothered by us linking to libxml2. It was effectively > unmaintained for most of the last decade, with just very occasional drive-by > commits. And it's not that there weren't significant bugs or such. Maintenance > has picked up some, but it's still not well maintained, I'd say. If I wanted > to attack postgres, it's where I'd start. Indeed, libxml2 worries me too, as much as out-of-core extensions. There are a bunch of these out there, some of them not that maintained, and they could face similar attacks. -- Michael