Discussion: Test suite fails on alpha architecture
Hello PostgreSQL developers,

The testsuite of 8.3 beta 2 fails on the Alpha architecture (versions up to 8.2 worked fine). Apparently there is some disagreement about how to report divisions by zero:

float8.out:
- ERROR: value out of range: overflow
+ ERROR: invalid argument for power function

errors.out:
- ERROR: division by zero
+ ERROR: floating-point exception
+ DETAIL: An invalid floating-point operation was signaled. This probably means an out-of-range result or an invalid operation, such as division by zero.

and some more (case, transactions, guc, plpgsql).

The full build log including diffs and initdb/postmaster logs is on
http://experimental.ftbfs.de/fetch.php?&pkg=postgresql-8.3&ver=8.3%7Ebeta2-1&arch=alpha&stamp=1193991806&file=log&as=raw

Thank you!

Martin
--
Martin Pitt http://www.piware.de
Ubuntu Developer http://www.ubuntu.com
Debian Developer http://www.debian.org
Martin Pitt <martin@piware.de> writes: > The testsuite of 8.3 beta 2 fails on the Alpha architecture (versions > up to 8.2 worked fine). We redid some of the float error handling for 8.3, in hopes of getting closer to the IEEE standard behavior for NaNs and infinities and so on. I guess that isn't working on your Alpha. I have a vague recollection that Alphas use non-IEEE floats so maybe this is not too surprising. Can you grant one of us access to the machine to work on it? Or poke into it yourself? regards, tom lane
Hi, Tom Lane [2007-11-03 14:27 -0400]: > Martin Pitt <martin@piware.de> writes: > > The testsuite of 8.3 beta 2 fails on the Alpha architecture (versions > > up to 8.2 worked fine). > > We redid some of the float error handling for 8.3, in hopes of getting > closer to the IEEE standard behavior for NaNs and infinities and so on. > I guess that isn't working on your Alpha. I have a vague recollection > that Alphas use non-IEEE floats so maybe this is not too surprising. > > Can you grant one of us access to the machine to work on it? I don't own any alpha machine, but maybe Frank, Steven, or anyone from the Debian alpha porter list can create a temporary account for you? > Or poke into it yourself? There is no developer accessible alpha porter box for Debian unfortunately. :( Thank you, Martin -- Martin Pitt http://www.piware.de Ubuntu Developer http://www.ubuntu.com Debian Developer http://www.debian.org
On Sat, Nov 03, 2007 at 06:32:34PM -0400, Martin Pitt wrote: > Tom Lane [2007-11-03 14:27 -0400]: > > Martin Pitt <martin@piware.de> writes: > > > The testsuite of 8.3 beta 2 fails on the Alpha architecture (versions > > > up to 8.2 worked fine). > > > > We redid some of the float error handling for 8.3, in hopes of getting > > closer to the IEEE standard behavior for NaNs and infinities and so on. > > I guess that isn't working on your Alpha. I have a vague recollection > > that Alphas use non-IEEE floats so maybe this is not too surprising. > > > > Can you grant one of us access to the machine to work on it? > > I don't own any alpha machine, but maybe Frank, Steven, or anyone from > the Debian alpha porter list can create a temporary account for you? I'm not sure how we handle that for our experimental buildds. Admins? Gruesse, -- Frank Lichtenheld <djpig@debian.org> www: http://www.djpig.de/
Hi *: Martin Pitt wrote: > Hi, > > Tom Lane [2007-11-03 14:27 -0400]: >> Martin Pitt <martin@piware.de> writes: >>> The testsuite of 8.3 beta 2 fails on the Alpha architecture (versions >>> up to 8.2 worked fine). >> We redid some of the float error handling for 8.3, in hopes of getting >> closer to the IEEE standard behavior for NaNs and infinities and so on. >> I guess that isn't working on your Alpha. I have a vague recollection >> that Alphas use non-IEEE floats so maybe this is not too surprising. >> >> Can you grant one of us access to the machine to work on it? > > I don't own any alpha machine, but maybe Frank, Steven, or anyone from > the Debian alpha porter list can create a temporary account for you? > >> Or poke into it yourself? > > There is no developer accessible alpha porter box for Debian > unfortunately. :( > Since Debian is having some problems with its alpha development machine, the Gentoo/Alpha port is happy to offer some help with this problem. We can provide with shell account access (or even a chroot) in our development machine (AlphaServer ES40) for debugging this PostgreSQL bug. If it only happens on Debian, I could create a debian chroot for testing. Gentoo/Alpha Team can be reached by mail: alpha@gentoo.org or by IRC in Freenode #gentoo-alpha. Thanks and feel free to ask if you need something more. -- Jose Luis Rivero <yoswink@gentoo.org> Gentoo/Doc Gentoo/Alpha
"José Luis Rivero (yoswink)" <yoswink@gentoo.org> writes: > Since Debian is having some problems with its alpha development machine, > the Gentoo/Alpha port is happy to offer some help with this problem. > We can provide with shell account access (or even a chroot) in our > development machine (AlphaServer ES40) for debugging this PostgreSQL > bug. If it only happens on Debian, I could create a debian chroot for > testing. I'm guessing that it's specific to Alpha (and maybe glibc) but not any particular Linux distro. So let's try Gentoo first, and then Martin can check if the fix works for Debian. If you could set me up a shell account accessible by ssh, I should have time to poke at this tomorrow. I don't need root access but will need all the usual C development tools (gcc, gdb, etc). Thanks for helping! regards, tom lane
Hi! On Tue, 06 Nov 2007, Tom Lane wrote: > If you could set me up a shell account accessible by ssh, I should have > time to poke at this tomorrow. I don't need root access but will need > all the usual C development tools (gcc, gdb, etc). Just send me a (preferably signed) mail with desired username and SSH-PubKey. Regards, Tobias -- In the future, everyone will be anonymous for 15 minutes.
On Tue, Nov 06, 2007 at 12:52:34PM -0500, Tom Lane wrote: > "José Luis Rivero (yoswink)" <yoswink@gentoo.org> writes: > > Since Debian is having some problems with its alpha development machine, > > the Gentoo/Alpha port is happy to offer some help with this problem. > > We can provide with shell account access (or even a chroot) in our > > development machine (AlphaServer ES40) for debugging this PostgreSQL > > bug. If it only happens on Debian, I could create a debian chroot for > > testing. > I'm guessing that it's specific to Alpha (and maybe glibc) but not any > particular Linux distro. It may be specific to particular versions of glibc and the kernel. At least one of the test regressions is actually due to the bug described in <http://lists.debian.org/debian-alpha/2007/10/msg00014.html>; I haven't dug into the rest of the failures further at this point. But if it can be reproduced on other distros as well, all the better. -- Steve Langasek Give me a lever long enough and a Free OS Debian Developer to set it on, and I can move the world. vorlon@debian.org http://www.debian.org/
Steve Langasek <vorlon@debian.org> writes:
> It may be specific to particular versions of glibc and the kernel. At least
> one of the test regressions is actually due to the bug described in
> <http://lists.debian.org/debian-alpha/2007/10/msg00014.html>; I haven't dug
> into the rest of the failures further at this point.

Thanks for the tip about that bug. Using the gentoo project's kindly-lent Alpha, I see that the failure in our float8 regression test is indeed explained by floor() doing the wrong thing. The case that fails is

    regression=# select (-34.84)::float8 ^ '1e200';
    ERROR: 2201F: invalid argument for power function
    LOCATION: dpow, float.c:1337

where we are expecting to get "value out of range: overflow". Instead this test is failing:

    /*
     * The SQL spec requires that we emit a particular SQLSTATE error code for
     * certain error conditions.
     */
    if ((arg1 == 0 && arg2 < 0) ||
        (arg1 < 0 && floor(arg2) != arg2))
        ereport(ERROR,
                (errcode(ERRCODE_INVALID_ARGUMENT_FOR_POWER_FUNCTION),
                 errmsg("invalid argument for power function")));

and indeed

    regression=# select floor(1e200::float8) - 1e200::float8;
            ?column?
    ------------------------
     -1.69964157701365e+184
    (1 row)

so it seems floor(3m) is off by one in the last place.

> But if it can be reproduced on other distros as well, all the better.

All the other diffs that Martin showed are divide-by-zero failures, and I do not see any of them on Gentoo's machine. I think that this must be a compiler bug. The first example in his diffs is just "select 1/0", which executes this code:

    int32 arg1 = PG_GETARG_INT32(0);
    int32 arg2 = PG_GETARG_INT32(1);
    int32 result;

    if (arg2 == 0)
        ereport(ERROR,
                (errcode(ERRCODE_DIVISION_BY_ZERO),
                 errmsg("division by zero")));

    result = arg1 / arg2;

It looks to me like Debian's compiler must be allowing the division instruction to be speculatively executed before the if-test branch is taken. Perhaps it is supposing that this is OK because control will return from ereport(), when in fact it will not (the routine throws a longjmp). Since we've not seen such behavior on any other platform, however, I suspect this is just a bug and not intentional.

FWIW the Gentoo machine is running

    $ gcc -v
    Using built-in specs.
    Target: alpha-unknown-linux-gnu
    Configured with: /var/tmp/portage/sys-devel/gcc-4.1.2/work/gcc-4.1.2/configure --prefix=/usr --bindir=/usr/alpha-unknown-linux-gnu/gcc-bin/4.1.2 --includedir=/usr/lib/gcc/alpha-unknown-linux-gnu/4.1.2/include --datadir=/usr/share/gcc-data/alpha-unknown-linux-gnu/4.1.2 --mandir=/usr/share/gcc-data/alpha-unknown-linux-gnu/4.1.2/man --infodir=/usr/share/gcc-data/alpha-unknown-linux-gnu/4.1.2/info --with-gxx-include-dir=/usr/lib/gcc/alpha-unknown-linux-gnu/4.1.2/include/g++-v4 --host=alpha-unknown-linux-gnu --build=alpha-unknown-linux-gnu --disable-altivec --enable-nls --without-included-gettext --with-system-zlib --disable-checking --disable-werror --enable-secureplt --disable-libunwind-exceptions --disable-multilib --disable-libmudflap --disable-libssp --disable-libgcj --enable-languages=c,c++,fortran --enable-shared --enable-threads=posix --enable-__cxa_atexit --enable-clocale=gnu
    Thread model: posix
    gcc version 4.1.2 (Gentoo 4.1.2)

Bottom line is that I see nothing here that the Postgres project can fix --- these are library and compiler bugs.

regards, tom lane
Martin Pitt <martin@piware.de> writes: > The testsuite of 8.3 beta 2 fails on the Alpha architecture (versions > up to 8.2 worked fine). Apparently there is some disagreement about > how to report divisions by zero: BTW, having now looked closely at the diffs, the problems do not seem to be anywhere near the code we changed for 8.3. So I think the real issue is that your compiler and glibc changed under you. Could you perhaps retest 8.2 with the current toolchain and confirm that it fails too? regards, tom lane
Hi Tom, Tom Lane [2007-11-07 13:49 -0500]: > Bottom line is that I see nothing here that the Postgres project can > fix --- these are library and compiler bugs. Thank you for your detailed analysis! I'll file bugs in the appropriate places then. Thanks, Martin -- Martin Pitt http://www.piware.de Ubuntu Developer http://www.ubuntu.com Debian Developer http://www.debian.org
Tom Lane <tgl@sss.pgh.pa.us> writes: > All the other diffs that Martin showed are divide-by-zero failures, > and I do not see any of them on Gentoo's machine. I think that this > must be a compiler bug. The first example in his diffs is just > "select 1/0", which executes this code: > > int32 arg1 = PG_GETARG_INT32(0); > int32 arg2 = PG_GETARG_INT32(1); > int32 result; > > if (arg2 == 0) > ereport(ERROR, > (errcode(ERRCODE_DIVISION_BY_ZERO), > errmsg("division by zero"))); > > result = arg1 / arg2; > > It looks to me like Debian's compiler must be allowing the division > instruction to be speculatively executed before the if-test branch > is taken. Perhaps it is supposing that this is OK because control > will return from ereport(), when in fact it will not (the routine > throws a longjmp). Since we've not seen such behavior on any other > platform, however, I suspect this is just a bug and not intentional. Can you create a stand-alone testcase for this? -- Falk
Falk Hueffner <falk@debian.org> writes:
> Tom Lane <tgl@sss.pgh.pa.us> writes:
>> It looks to me like Debian's compiler must be allowing the division
>> instruction to be speculatively executed before the if-test branch
>> is taken.

> Can you create a stand-alone testcase for this?

I don't have access to a machine on which the failure occurs, but perhaps Martin can try it. I'd think it'd be pretty easy, say

    #include <stdio.h>
    #include <stdlib.h>

    void
    ereport(const char *msg)
    {
        fprintf(stderr, "%s\n", msg);
        exit(0);
    }

    int
    main(int argc, char **argv)
    {
        int arg1 = atoi(argv[1]);
        int arg2 = atoi(argv[2]);
        int result;

        if (arg2 == 0)
            ereport("division by zero");

        result = arg1 / arg2;

        printf("%d\n", result);

        return 0;
    }

    cc -g -O2 -fPIC -fno-strict-aliasing -mieee -D_GNU_SOURCE bug.c
    ./a.out 1 0

I would not be surprised at all if it's compile-switch dependent; these look to be the switches Martin tested with.

regards, tom lane
On Wed, Nov 07, 2007 at 01:49:53PM -0500, Tom Lane wrote: > Steve Langasek <vorlon@debian.org> writes: > > It may be specific to particular versions of glibc and the kernel. At least > > one of the test regressions is actually due to the bug described in > > <http://lists.debian.org/debian-alpha/2007/10/msg00014.html>; I haven't dug > > into the rest of the failures further at this point. > > But if it can be reproduced on other distros as well, all the better. > All the other diffs that Martin showed are divide-by-zero failures, > and I do not see any of them on Gentoo's machine. I think that this > must be a compiler bug. The first example in his diffs is just > "select 1/0", which executes this code: > int32 arg1 = PG_GETARG_INT32(0); > int32 arg2 = PG_GETARG_INT32(1); > int32 result; > if (arg2 == 0) > ereport(ERROR, > (errcode(ERRCODE_DIVISION_BY_ZERO), > errmsg("division by zero"))); > result = arg1 / arg2; > It looks to me like Debian's compiler must be allowing the division > instruction to be speculatively executed before the if-test branch > is taken. Perhaps it is supposing that this is OK because control > will return from ereport(), when in fact it will not (the routine > throws a longjmp). Since we've not seen such behavior on any other > platform, however, I suspect this is just a bug and not intentional. > FWIW the Gentoo machine is running > $ gcc -v > Using built-in specs. 
> Target: alpha-unknown-linux-gnu > Configured with: /var/tmp/portage/sys-devel/gcc-4.1.2/work/gcc-4.1.2/configure --prefix=/usr --bindir=/usr/alpha-unknown-linux-gnu/gcc-bin/4.1.2--includedir=/usr/lib/gcc/alpha-unknown-linux-gnu/4.1.2/include --datadir=/usr/share/gcc-data/alpha-unknown-linux-gnu/4.1.2 --mandir=/usr/share/gcc-data/alpha-unknown-linux-gnu/4.1.2/man --infodir=/usr/share/gcc-data/alpha-unknown-linux-gnu/4.1.2/info --with-gxx-include-dir=/usr/lib/gcc/alpha-unknown-linux-gnu/4.1.2/include/g++-v4--host=alpha-unknown-linux-gnu --build=alpha-unknown-linux-gnu--disable-altivec --enable-nls --without-included-gettext --with-system-zlib --disable-checking--disable-werror --enable-secureplt --disable-libunwind-exceptions --disable-multilib --disable-libmudflap--disable-libssp --disable-libgcj --enable-languages=c,c++,fortran --enable-shared --enable-threads=posix--enable-__cxa_atexit --enable-clocale=gnu > Thread model: posix > gcc version 4.1.2 (Gentoo 4.1.2) Ok, and Debian is building with gcc 4.2: $ gcc -v Using built-in specs. Target: alpha-linux-gnu Configured with: ../src/configure -v --enable-languages=c,c++,fortran,objc,obj-c++,treelang --prefix=/usr --enable-shared --with-system-zlib --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --enable-nls --with-gxx-include-dir=/usr/include/c++/4.2 --program-suffix=-4.2 --enable-clocale=gnu --enable-libstdcxx-debug --enable-mpfr --disable-libssp --with-long-double-128 --enable-checking=release --build=alpha-linux-gnu --host=alpha-linux-gnu --target=alpha-linux-gnu Thread model: posix gcc version 4.2.3 20071014 (prerelease) (Debian 4.2.2-3) $ Any chance of testing with a newer version of gcc on Gentoo as well to help confirm that the compiler is to blame? > Bottom line is that I see nothing here that the Postgres project can > fix --- these are library and compiler bugs. 
Right; though whereas the floor() bug could simply be ignored since it will be fixed in glibc (or the kernel) when the time comes, if the other regressions are the result of a compiler problem then ignoring those failures would indeed mean distributing broken binaries. -- Steve Langasek Give me a lever long enough and a Free OS Debian Developer to set it on, and I can move the world. vorlon@debian.org http://www.debian.org/
Tom Lane wrote: > BTW, having now looked closely at the diffs, the problems do not seem to > be anywhere near the code we changed for 8.3. So I think the real > issue is that your compiler and glibc changed under you. Could you > perhaps retest 8.2 with the current toolchain and confirm that it > fails too? I'm not in the Debian team but may this help? http://buildd.debian.org/build.php?arch=alpha&pkg=postgresql-8.2 I'm very interested in 8.2.5 going into Testing for it to reach Backports, and it turns out that the Alpha build is blocking it. -- Pedro Gimeno
Pedro Gimeno <pgsql-001@personal.formauri.es> writes: > Tom Lane wrote: >> BTW, having now looked closely at the diffs, the problems do not seem to >> be anywhere near the code we changed for 8.3. So I think the real >> issue is that your compiler and glibc changed under you. Could you >> perhaps retest 8.2 with the current toolchain and confirm that it >> fails too? > I'm not in the Debian team but may this help? > http://buildd.debian.org/build.php?arch=alpha&pkg=postgresql-8.2 Yeah, that seems to confirm that it is a tools problem rather than anything we did to 8.3 ... and moreover that the breakage went in sometime between Jun 23 and Aug 17. regards, tom lane
On Wed, Nov 07, 2007 at 02:41:51PM -0500, Steve Langasek wrote: > On Wed, Nov 07, 2007 at 01:49:53PM -0500, Tom Lane wrote: > > All the other diffs that Martin showed are divide-by-zero failures, > > and I do not see any of them on Gentoo's machine. I think that this > > must be a compiler bug. The first example in his diffs is just > > "select 1/0", which executes this code: > > > int32 arg1 = PG_GETARG_INT32(0); > > int32 arg2 = PG_GETARG_INT32(1); > > int32 result; > > > if (arg2 == 0) > > ereport(ERROR, > > (errcode(ERRCODE_DIVISION_BY_ZERO), > > errmsg("division by zero"))); > > > result = arg1 / arg2; > > > It looks to me like Debian's compiler must be allowing the division > > instruction to be speculatively executed before the if-test branch > > is taken. Perhaps it is supposing that this is OK because control > > will return from ereport(), when in fact it will not (the routine > > throws a longjmp). Since we've not seen such behavior on any other > > platform, however, I suspect this is just a bug and not intentional. > > > FWIW the Gentoo machine is running > > > $ gcc -v > > Using built-in specs. 
> > Target: alpha-unknown-linux-gnu > > Configured with: /var/tmp/portage/sys-devel/gcc-4.1.2/work/gcc-4.1.2/configure --prefix=/usr --bindir=/usr/alpha-unknown-linux-gnu/gcc-bin/4.1.2--includedir=/usr/lib/gcc/alpha-unknown-linux-gnu/4.1.2/include --datadir=/usr/share/gcc-data/alpha-unknown-linux-gnu/4.1.2 --mandir=/usr/share/gcc-data/alpha-unknown-linux-gnu/4.1.2/man --infodir=/usr/share/gcc-data/alpha-unknown-linux-gnu/4.1.2/info --with-gxx-include-dir=/usr/lib/gcc/alpha-unknown-linux-gnu/4.1.2/include/g++-v4--host=alpha-unknown-linux-gnu --build=alpha-unknown-linux-gnu--disable-altivec --enable-nls --without-included-gettext --with-system-zlib --disable-checking--disable-werror --enable-secureplt --disable-libunwind-exceptions --disable-multilib --disable-libmudflap--disable-libssp --disable-libgcj --enable-languages=c,c++,fortran --enable-shared --enable-threads=posix--enable-__cxa_atexit --enable-clocale=gnu > > Thread model: posix > > gcc version 4.1.2 (Gentoo 4.1.2) > > Ok, and Debian is building with gcc 4.2: > > $ gcc -v > Using built-in specs. > Target: alpha-linux-gnu > Configured with: ../src/configure -v > --enable-languages=c,c++,fortran,objc,obj-c++,treelang --prefix=/usr > --enable-shared --with-system-zlib --libexecdir=/usr/lib > --without-included-gettext --enable-threads=posix --enable-nls > --with-gxx-include-dir=/usr/include/c++/4.2 --program-suffix=-4.2 > --enable-clocale=gnu --enable-libstdcxx-debug --enable-mpfr --disable-libssp > --with-long-double-128 --enable-checking=release --build=alpha-linux-gnu > --host=alpha-linux-gnu --target=alpha-linux-gnu > Thread model: posix > gcc version 4.2.3 20071014 (prerelease) (Debian 4.2.2-3) > $ > > Any chance of testing with a newer version of gcc on Gentoo as well to help > confirm that the compiler is to blame? 
> In Gentoo the testcase gives the same "division by zero" under these gcc versions: Current Stable: gcc version 4.1.2 (Gentoo 4.1.2 p1.0.2) Current Testing: gcc version 4.2.2 (Gentoo 4.2.2 p1.0) Feel free to add me if you have an open bug for this, in order to test anything you need or provide some more information about our platform. Thanks. > > Bottom line is that I see nothing here that the Postgres project can > > fix --- these are library and compiler bugs. > > Right; though whereas the floor() bug could simply be ignored since it will > be fixed in glibc (or the kernel) when the time comes, if the other > regressions are the result of a compiler problem then ignoring those > failures would indeed mean distributing broken binaries. > -- Jose Luis Rivero <yoswink@gentoo.org> Gentoo/Doc Gentoo/Alpha
Heya, I know I'm quite late with my answer, sorry. Frank Lichtenheld <djpig@debian.org> writes: > On Sat, Nov 03, 2007 at 06:32:34PM -0400, Martin Pitt wrote: >>> Can you grant one of us access to the machine to work on it? >> I don't own any alpha machine, but maybe Frank, Steven, or anyone from >> the Debian alpha porter list can create a temporary account for you? > I'm not sure how we handle that for our experimental buildds. Admins? One of the alphas used in the experimental buildd network is actually in bdale's basement, so I'm not really happy to hand out access to it. The other one (digitalis), which is hosted at the university of Darmstadt and is under our full control, should actually be used as a porting machine if needed. Debian Developers [1] can get access to them by pinging either Andreas Barth, Martin Zobel-Helas or me. We have our own userdir-ldap setup, so please include a mail address and a verifiable GPG key in your ping, together with a short description of what you want to do. Marc Footnotes: [1] And Debian contributors, as long as there is some sort of trust relationship -- BOFH #357: I'd love to help you -- it's just that the Boss won't let me near the computer.
On Wed, Nov 07, 2007 at 02:44:23PM -0500, Tom Lane wrote: > I don't have access to a machine on which the failure occurs, but > perhaps Martin can try it. I'd think it'd be pretty easy, say > #include <stdio.h> > #include <stdlib.h> > void > ereport(const char *msg) > { > fprintf(stderr, "%s\n", msg); > exit(0); > } > > int > main(int argc, char **argv) > { > int arg1 = atoi(argv[1]); > int arg2 = atoi(argv[2]); > int result; > > if (arg2 == 0) > ereport("division by zero"); > > result = arg1 / arg2; > > printf("%d\n", result); > > return 0; > } > cc -g -O2 -fPIC -fno-strict-aliasing -mieee -D_GNU_SOURCE bug.c > ./a.out 1 0 > I would not be surprised at all if it's compile-switch dependent; these > look to be the switches Martin tested with. So strangely, when I first ran this test case I recall being able to reproduce the SIGFPE; but now going back to it I'm getting the correct "division by zero" output. But postgresql still fails to build with the same errors as before. FWIW, the first test suite failure involving floor() has been resolved now in the glibc package in unstable. -- Steve Langasek Give me a lever long enough and a Free OS Debian Developer to set it on, and I can move the world. vorlon@debian.org http://www.debian.org/
Hi, Tom Lane [2007-11-07 13:49 -0500]: > All the other diffs that Martin showed are divide-by-zero failures, > and I do not see any of them on Gentoo's machine. I think that this > must be a compiler bug. The first example in his diffs is just > "select 1/0", which executes this code: > > int32 arg1 = PG_GETARG_INT32(0); > int32 arg2 = PG_GETARG_INT32(1); > int32 result; > > if (arg2 == 0) > ereport(ERROR, > (errcode(ERRCODE_DIVISION_BY_ZERO), > errmsg("division by zero"))); > > result = arg1 / arg2; > > It looks to me like Debian's compiler must be allowing the division > instruction to be speculatively executed before the if-test branch > is taken. Perhaps it is supposing that this is OK because control > will return from ereport(), when in fact it will not (the routine > throws a longjmp). Since we've not seen such behavior on any other > platform, however, I suspect this is just a bug and not intentional. I tried this on a Debian Alpha porter box (thanks, Steve, for pointing me at it) with Debian's gcc 4.2.2. Latest sid indeed still has this bug (the floor() one is confirmed fixed), not only on Alpha, but also on sparc. Since the simple test case did not reproduce the error, I tried to make a more sophisticated one which resembles more closely what PostgreSQL does (sigsetjmp/siglongjmp instead of exit(), some macros, etc.). Unfortunately in vain, since the test case still works perfectly with both no compiler options and also the ones used for PostgreSQL. I attach it here nevertheless just in case someone has more luck than me. So I tried to approach it from the other side: Building postgresql with CFLAGS="-O0 -g" or "-O1 -g" works correctly, but with "-O2 -g" I get above bug. So I guess I'll build with -O1 for the time being on sparc and alpha to get correct binaries until this is sorted out. Any idea what else I could try? Thanks, Martin -- Martin Pitt http://www.piware.de Ubuntu Developer http://www.ubuntu.com Debian Developer http://www.debian.org
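Editorial sketch: the attachment itself is not preserved here. As a rough idea of the kind of test described above, and explicitly not Martin's actual file, a sigsetjmp/siglongjmp variant of the earlier test case might look like the following. All names (`report_error`, `checked_div`, `try_div`) are hypothetical stand-ins and not PostgreSQL's ereport() machinery.

```c
#define _POSIX_C_SOURCE 200809L

#include <setjmp.h>
#include <stddef.h>

/* Hypothetical stand-in for the backend's error-recovery jump target. */
static sigjmp_buf error_context;
static const char *last_error = NULL;

/*
 * Hypothetical stand-in for ereport(ERROR, ...): records the message and
 * jumps back to the recovery point, so control never returns to the caller.
 */
static void
report_error(const char *msg)
{
    last_error = msg;
    siglongjmp(error_context, 1);
}

/* Same shape as the integer division code quoted above. */
static int
checked_div(int arg1, int arg2)
{
    int result;

    if (arg2 == 0)
        report_error("division by zero");

    /* Must only be reached when arg2 is nonzero. */
    result = arg1 / arg2;

    return result;
}

/*
 * Returns 0 and stores the quotient on success, or -1 if checked_div()
 * reported an error through report_error().
 */
static int
try_div(int arg1, int arg2, int *quotient)
{
    if (sigsetjmp(error_context, 1) != 0)
        return -1;              /* arrived here via siglongjmp */

    *quotient = checked_div(arg1, arg2);
    return 0;
}
```

With a correct compiler, try_div(1, 0, &q) should return -1 without the division ever executing; a SIGFPE at that point would be the behavior reported in this thread. As noted above, Martin's own test did not reproduce the problem, so this sketch should not be expected to either.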
Attachments
Martin Pitt [2007-12-04 23:43 +0100]: > So I tried to approach it from the other side: Building postgresql > with CFLAGS="-O0 -g" or "-O1 -g" works correctly, but with "-O2 -g" I > get above bug. Just FAOD, building with gcc 4.1 and -O2 works fine. I guess this sufficiently proves that this is a gcc 4.2 bug. Martin -- Martin Pitt http://www.piware.de Ubuntu Developer http://www.ubuntu.com Debian Developer http://www.debian.org