Thread: Test suite fails on alpha architecture

Test suite fails on alpha architecture

From
Martin Pitt
Date:
Hello PostgreSQL developers,

The testsuite of 8.3 beta 2 fails on the Alpha architecture (versions
up to 8.2 worked fine). Apparently there is some disagreement about
how to report divisions by zero:

float8.out:
- ERROR:  value out of range: overflow
+ ERROR:  invalid argument for power function

errors.out:
- ERROR:  division by zero
+ ERROR:  floating-point exception
+ DETAIL:  An invalid floating-point operation was signaled. This probably means an out-of-range result or an invalid operation, such as division by zero.

and some more (case, transactions, guc, plpgsql).

The full build log including diffs and initdb/postmaster logs is on
http://experimental.ftbfs.de/fetch.php?&pkg=postgresql-8.3&ver=8.3%7Ebeta2-1&arch=alpha&stamp=1193991806&file=log&as=raw

Thank you!

Martin
--
Martin Pitt         http://www.piware.de
Ubuntu Developer   http://www.ubuntu.com
Debian Developer   http://www.debian.org

Re: Test suite fails on alpha architecture

From
Tom Lane
Date:
Martin Pitt <martin@piware.de> writes:
> The testsuite of 8.3 beta 2 fails on the Alpha architecture (versions
> up to 8.2 worked fine).

We redid some of the float error handling for 8.3, in hopes of getting
closer to the IEEE standard behavior for NaNs and infinities and so on.
I guess that isn't working on your Alpha.  I have a vague recollection
that Alphas use non-IEEE floats so maybe this is not too surprising.

Can you grant one of us access to the machine to work on it?
Or poke into it yourself?

            regards, tom lane

Re: Test suite fails on alpha architecture

From
Martin Pitt
Date:
Hi,

Tom Lane [2007-11-03 14:27 -0400]:
> Martin Pitt <martin@piware.de> writes:
> > The testsuite of 8.3 beta 2 fails on the Alpha architecture (versions
> > up to 8.2 worked fine).
>
> We redid some of the float error handling for 8.3, in hopes of getting
> closer to the IEEE standard behavior for NaNs and infinities and so on.
> I guess that isn't working on your Alpha.  I have a vague recollection
> that Alphas use non-IEEE floats so maybe this is not too surprising.
>
> Can you grant one of us access to the machine to work on it?

I don't own any alpha machine, but maybe Frank, Steven, or anyone from
the Debian alpha porter list can create a temporary account for you?

> Or poke into it yourself?

There is no developer-accessible alpha porter box for Debian,
unfortunately. :(

Thank you,

Martin

--
Martin Pitt         http://www.piware.de
Ubuntu Developer   http://www.ubuntu.com
Debian Developer   http://www.debian.org

Re: Test suite fails on alpha architecture

From
Frank Lichtenheld
Date:
On Sat, Nov 03, 2007 at 06:32:34PM -0400, Martin Pitt wrote:
> Tom Lane [2007-11-03 14:27 -0400]:
> > Martin Pitt <martin@piware.de> writes:
> > > The testsuite of 8.3 beta 2 fails on the Alpha architecture (versions
> > > up to 8.2 worked fine).
> >
> > We redid some of the float error handling for 8.3, in hopes of getting
> > closer to the IEEE standard behavior for NaNs and infinities and so on.
> > I guess that isn't working on your Alpha.  I have a vague recollection
> > that Alphas use non-IEEE floats so maybe this is not too surprising.
> >
> > Can you grant one of us access to the machine to work on it?
>
> I don't own any alpha machine, but maybe Frank, Steven, or anyone from
> the Debian alpha porter list can create a temporary account for you?

I'm not sure how we handle that for our experimental buildds. Admins?

Regards,
--
Frank Lichtenheld <djpig@debian.org>
www: http://www.djpig.de/

Re: Test suite fails on alpha architecture

From
"José Luis Rivero (yoswink)"
Date:
Hi *:

Martin Pitt wrote:
> Hi,
>
> Tom Lane [2007-11-03 14:27 -0400]:
>> Martin Pitt <martin@piware.de> writes:
>>> The testsuite of 8.3 beta 2 fails on the Alpha architecture (versions
>>> up to 8.2 worked fine).
>> We redid some of the float error handling for 8.3, in hopes of getting
>> closer to the IEEE standard behavior for NaNs and infinities and so on.
>> I guess that isn't working on your Alpha.  I have a vague recollection
>> that Alphas use non-IEEE floats so maybe this is not too surprising.
>>
>> Can you grant one of us access to the machine to work on it?
>
> I don't own any alpha machine, but maybe Frank, Steven, or anyone from
> the Debian alpha porter list can create a temporary account for you?
>
>> Or poke into it yourself?
>
> There is no developer accessible alpha porter box for Debian
> unfortunately. :(
>

Since Debian is having some problems with its alpha development machine,
the Gentoo/Alpha port is happy to offer some help with this problem.

We can provide shell account access (or even a chroot) on our
development machine (AlphaServer ES40) for debugging this PostgreSQL
bug. If it only happens on Debian, I could create a debian chroot for
testing.

Gentoo/Alpha Team can be reached by mail: alpha@gentoo.org or by IRC in
Freenode #gentoo-alpha.

Thanks and feel free to ask if you need something more.

--
Jose Luis Rivero <yoswink@gentoo.org>
Gentoo/Doc Gentoo/Alpha


Re: Test suite fails on alpha architecture

From
Tom Lane
Date:
"José Luis Rivero (yoswink)" <yoswink@gentoo.org> writes:
> Since Debian is having some problems with its alpha development machine,
> the Gentoo/Alpha port is happy to offer some help with this problem.

> We can provide with shell account access (or even a chroot) in our
> development machine (AlphaServer ES40) for debugging this PostgreSQL
> bug. If it only happens on Debian, I could create a debian chroot for
> testing.

I'm guessing that it's specific to Alpha (and maybe glibc) but not any
particular Linux distro.  So let's try Gentoo first, and then Martin can
check if the fix works for Debian.

If you could set me up a shell account accessible by ssh, I should have
time to poke at this tomorrow.  I don't need root access but will need
all the usual C development tools (gcc, gdb, etc).

Thanks for helping!
        regards, tom lane


Re: Test suite fails on alpha architecture

From
Tobias Klausmann
Date:
Hi!

On Tue, 06 Nov 2007, Tom Lane wrote:
> If you could set me up a shell account accessible by ssh, I should have
> time to poke at this tomorrow.  I don't need root access but will need
> all the usual C development tools (gcc, gdb, etc).

Just send me a (preferably signed) mail with desired username and
SSH-PubKey.

Regards,
Tobias
--
In the future, everyone will be anonymous for 15 minutes.

Re: Test suite fails on alpha architecture

From
Steve Langasek
Date:
On Tue, Nov 06, 2007 at 12:52:34PM -0500, Tom Lane wrote:
> "José Luis Rivero (yoswink)" <yoswink@gentoo.org> writes:
> > Since Debian is having some problems with its alpha development machine,
> > the Gentoo/Alpha port is happy to offer some help with this problem.

> > We can provide with shell account access (or even a chroot) in our
> > development machine (AlphaServer ES40) for debugging this PostgreSQL
> > bug. If it only happens on Debian, I could create a debian chroot for
> > testing.

> I'm guessing that it's specific to Alpha (and maybe glibc) but not any
> particular Linux distro.

It may be specific to particular versions of glibc and the kernel.  At least
one of the test regressions is actually due to the bug described in
<http://lists.debian.org/debian-alpha/2007/10/msg00014.html>; I haven't dug
into the rest of the failures further at this point.

But if it can be reproduced on other distros as well, all the better.

--
Steve Langasek                   Give me a lever long enough and a Free OS
Debian Developer                   to set it on, and I can move the world.
vorlon@debian.org                                   http://www.debian.org/


Re: Test suite fails on alpha architecture

From
Tom Lane
Date:
Steve Langasek <vorlon@debian.org> writes:
> It may be specific to particular versions of glibc and the kernel.  At least
> one of the test regressions is actually due to the bug described in
> <http://lists.debian.org/debian-alpha/2007/10/msg00014.html>; I haven't dug
> into the rest of the failures further at this point.

Thanks for the tip about that bug.  Using the gentoo project's
kindly-lent Alpha, I see that the failure in our float8 regression test
is indeed explained by floor() doing the wrong thing.  The case that
fails is

regression=# select (-34.84)::float8  ^ '1e200';
ERROR:  2201F: invalid argument for power function
LOCATION:  dpow, float.c:1337

where we are expecting to get "value out of range: overflow".  Instead this
test is failing:

    /*
     * The SQL spec requires that we emit a particular SQLSTATE error code for
     * certain error conditions.
     */
    if ((arg1 == 0 && arg2 < 0) ||
        (arg1 < 0 && floor(arg2) != arg2))
        ereport(ERROR,
                (errcode(ERRCODE_INVALID_ARGUMENT_FOR_POWER_FUNCTION),
                 errmsg("invalid argument for power function")));

and indeed

regression=# select floor(1e200::float8) - 1e200::float8;
        ?column?
------------------------
 -1.69964157701365e+184
(1 row)

so it seems floor(3m) is off by one in the last place.
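
To make the failure mode concrete, here is a minimal sketch of how an inexact floor() flips the dpow() check quoted above from "overflow" to "invalid argument". All names are hypothetical, and the faulty floor is only a simulation (it returns a value slightly below its input); it is not the actual glibc code. The exact stand-in relies on the fact that every double of magnitude 2^52 or more is already an integer.

```c
#include <assert.h>
#include <float.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical hook so the check can be run against different floor() variants. */
typedef double (*floor_fn)(double);

/* Correct result for huge inputs only: any double >= 2^52 is already integral. */
double
exact_floor_huge(double x)
{
    return x;
}

/* Simulated faulty floor(): yields a value slightly below x (assumption, not glibc's code). */
double
faulty_floor_huge(double x)
{
    return x * (1.0 - DBL_EPSILON);
}

/* Mirrors the argument check quoted from dpow(): true means "invalid argument". */
bool
invalid_power_args(double arg1, double arg2, floor_fn floor_impl)
{
    assert(floor_impl != NULL);
    return (arg1 == 0 && arg2 < 0) ||
           (arg1 < 0 && floor_impl(arg2) != arg2);
}
```

With the exact variant, (-34.84, 1e200) passes the check and the later overflow test gets its chance to fire; with the faulty variant the same arguments are rejected up front, which matches the regression diff.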

> But if it can be reproduced on other distros as well, all the better.

All the other diffs that Martin showed are divide-by-zero failures,
and I do not see any of them on Gentoo's machine.  I think that this
must be a compiler bug.  The first example in his diffs is just
"select 1/0", which executes this code:

    int32        arg1 = PG_GETARG_INT32(0);
    int32        arg2 = PG_GETARG_INT32(1);
    int32        result;

    if (arg2 == 0)
        ereport(ERROR,
                (errcode(ERRCODE_DIVISION_BY_ZERO),
                 errmsg("division by zero")));

    result = arg1 / arg2;

It looks to me like Debian's compiler must be allowing the division
instruction to be speculatively executed before the if-test branch
is taken.  Perhaps it is supposing that this is OK because control
will return from ereport(), when in fact it will not (the routine
throws a longjmp).  Since we've not seen such behavior on any other
platform, however, I suspect this is just a bug and not intentional.

FWIW the Gentoo machine is running

$ gcc -v
Using built-in specs.
Target: alpha-unknown-linux-gnu
Configured with: /var/tmp/portage/sys-devel/gcc-4.1.2/work/gcc-4.1.2/configure --prefix=/usr
--bindir=/usr/alpha-unknown-linux-gnu/gcc-bin/4.1.2 --includedir=/usr/lib/gcc/alpha-unknown-linux-gnu/4.1.2/include
--datadir=/usr/share/gcc-data/alpha-unknown-linux-gnu/4.1.2
--mandir=/usr/share/gcc-data/alpha-unknown-linux-gnu/4.1.2/man
--infodir=/usr/share/gcc-data/alpha-unknown-linux-gnu/4.1.2/info
--with-gxx-include-dir=/usr/lib/gcc/alpha-unknown-linux-gnu/4.1.2/include/g++-v4 --host=alpha-unknown-linux-gnu
--build=alpha-unknown-linux-gnu --disable-altivec --enable-nls --without-included-gettext --with-system-zlib
--disable-checking --disable-werror --enable-secureplt --disable-libunwind-exceptions --disable-multilib
--disable-libmudflap --disable-libssp --disable-libgcj --enable-languages=c,c++,fortran --enable-shared
--enable-threads=posix --enable-__cxa_atexit --enable-clocale=gnu
Thread model: posix
gcc version 4.1.2 (Gentoo 4.1.2)

Bottom line is that I see nothing here that the Postgres project can
fix --- these are library and compiler bugs.

            regards, tom lane

Re: Test suite fails on alpha architecture

From
Tom Lane
Date:
Martin Pitt <martin@piware.de> writes:
> The testsuite of 8.3 beta 2 fails on the Alpha architecture (versions
> up to 8.2 worked fine). Apparently there is some disagreement about
> how to report divisions by zero:

BTW, having now looked closely at the diffs, the problems do not seem to
be anywhere near the code we changed for 8.3.  So I think the real issue
is that your compiler and glibc changed under you.  Could you perhaps
retest 8.2 with the current toolchain and confirm that it fails too?

            regards, tom lane

Re: Test suite fails on alpha architecture

From
Martin Pitt
Date:
Hi Tom,

Tom Lane [2007-11-07 13:49 -0500]:
> Bottom line is that I see nothing here that the Postgres project can
> fix --- these are library and compiler bugs.

Thank you for your detailed analysis! I'll file bugs in the
appropriate places then.

Thanks,

Martin

--
Martin Pitt         http://www.piware.de
Ubuntu Developer   http://www.ubuntu.com
Debian Developer   http://www.debian.org

Re: Test suite fails on alpha architecture

From
Falk Hueffner
Date:
Tom Lane <tgl@sss.pgh.pa.us> writes:

> All the other diffs that Martin showed are divide-by-zero failures,
> and I do not see any of them on Gentoo's machine.  I think that this
> must be a compiler bug.  The first example in his diffs is just
> "select 1/0", which executes this code:
>
>     int32        arg1 = PG_GETARG_INT32(0);
>     int32        arg2 = PG_GETARG_INT32(1);
>     int32        result;
>
>     if (arg2 == 0)
>         ereport(ERROR,
>                 (errcode(ERRCODE_DIVISION_BY_ZERO),
>                  errmsg("division by zero")));
>
>     result = arg1 / arg2;
>
> It looks to me like Debian's compiler must be allowing the division
> instruction to be speculatively executed before the if-test branch
> is taken.  Perhaps it is supposing that this is OK because control
> will return from ereport(), when in fact it will not (the routine
> throws a longjmp).  Since we've not seen such behavior on any other
> platform, however, I suspect this is just a bug and not intentional.

Can you create a stand-alone testcase for this?

--
    Falk

Re: Test suite fails on alpha architecture

From
Tom Lane
Date:
Falk Hueffner <falk@debian.org> writes:
> Tom Lane <tgl@sss.pgh.pa.us> writes:
>> It looks to me like Debian's compiler must be allowing the division
>> instruction to be speculatively executed before the if-test branch
>> is taken.

> Can you create a stand-alone testcase for this?

I don't have access to a machine on which the failure occurs, but
perhaps Martin can try it.  I'd think it'd be pretty easy, say

#include <stdio.h>
#include <stdlib.h>

void
ereport(const char *msg)
{
    fprintf(stderr, "%s\n", msg);
    exit(0);
}

int
main(int argc, char **argv)
{
    int    arg1 = atoi(argv[1]);
    int    arg2 = atoi(argv[2]);
    int    result;

    if (arg2 == 0)
        ereport("division by zero");

    result = arg1 / arg2;

    printf("%d\n", result);

    return 0;
}


cc -g -O2 -fPIC -fno-strict-aliasing -mieee -D_GNU_SOURCE bug.c
./a.out 1 0

I would not be surprised at all if it's compile-switch dependent; these
look to be the switches Martin tested with.

            regards, tom lane

Re: Test suite fails on alpha architecture

From
Steve Langasek
Date:
On Wed, Nov 07, 2007 at 01:49:53PM -0500, Tom Lane wrote:
> Steve Langasek <vorlon@debian.org> writes:
> > It may be specific to particular versions of glibc and the kernel.  At least
> > one of the test regressions is actually due to the bug described in
> > <http://lists.debian.org/debian-alpha/2007/10/msg00014.html>; I haven't dug
> > into the rest of the failures further at this point.

> > But if it can be reproduced on other distros as well, all the better.

> All the other diffs that Martin showed are divide-by-zero failures,
> and I do not see any of them on Gentoo's machine.  I think that this
> must be a compiler bug.  The first example in his diffs is just
> "select 1/0", which executes this code:

>     int32        arg1 = PG_GETARG_INT32(0);
>     int32        arg2 = PG_GETARG_INT32(1);
>     int32        result;

>     if (arg2 == 0)
>         ereport(ERROR,
>                 (errcode(ERRCODE_DIVISION_BY_ZERO),
>                  errmsg("division by zero")));

>     result = arg1 / arg2;

> It looks to me like Debian's compiler must be allowing the division
> instruction to be speculatively executed before the if-test branch
> is taken.  Perhaps it is supposing that this is OK because control
> will return from ereport(), when in fact it will not (the routine
> throws a longjmp).  Since we've not seen such behavior on any other
> platform, however, I suspect this is just a bug and not intentional.

> FWIW the Gentoo machine is running

> $ gcc -v
> Using built-in specs.
> Target: alpha-unknown-linux-gnu
> Configured with: /var/tmp/portage/sys-devel/gcc-4.1.2/work/gcc-4.1.2/configure --prefix=/usr
> --bindir=/usr/alpha-unknown-linux-gnu/gcc-bin/4.1.2 --includedir=/usr/lib/gcc/alpha-unknown-linux-gnu/4.1.2/include
> --datadir=/usr/share/gcc-data/alpha-unknown-linux-gnu/4.1.2
> --mandir=/usr/share/gcc-data/alpha-unknown-linux-gnu/4.1.2/man
> --infodir=/usr/share/gcc-data/alpha-unknown-linux-gnu/4.1.2/info
> --with-gxx-include-dir=/usr/lib/gcc/alpha-unknown-linux-gnu/4.1.2/include/g++-v4 --host=alpha-unknown-linux-gnu
> --build=alpha-unknown-linux-gnu --disable-altivec --enable-nls --without-included-gettext --with-system-zlib
> --disable-checking --disable-werror --enable-secureplt --disable-libunwind-exceptions --disable-multilib
> --disable-libmudflap --disable-libssp --disable-libgcj --enable-languages=c,c++,fortran --enable-shared
> --enable-threads=posix --enable-__cxa_atexit --enable-clocale=gnu
> Thread model: posix
> gcc version 4.1.2 (Gentoo 4.1.2)

Ok, and Debian is building with gcc 4.2:

$ gcc -v
Using built-in specs.
Target: alpha-linux-gnu
Configured with: ../src/configure -v
--enable-languages=c,c++,fortran,objc,obj-c++,treelang --prefix=/usr
--enable-shared --with-system-zlib --libexecdir=/usr/lib
--without-included-gettext --enable-threads=posix --enable-nls
--with-gxx-include-dir=/usr/include/c++/4.2 --program-suffix=-4.2
--enable-clocale=gnu --enable-libstdcxx-debug --enable-mpfr --disable-libssp
--with-long-double-128 --enable-checking=release --build=alpha-linux-gnu
--host=alpha-linux-gnu --target=alpha-linux-gnu
Thread model: posix
gcc version 4.2.3 20071014 (prerelease) (Debian 4.2.2-3)
$

Any chance of testing with a newer version of gcc on Gentoo as well to help
confirm that the compiler is to blame?

> Bottom line is that I see nothing here that the Postgres project can
> fix --- these are library and compiler bugs.

Right; though whereas the floor() bug could simply be ignored since it will
be fixed in glibc (or the kernel) when the time comes, if the other
regressions are the result of a compiler problem then ignoring those
failures would indeed mean distributing broken binaries.

--
Steve Langasek                   Give me a lever long enough and a Free OS
Debian Developer                   to set it on, and I can move the world.
vorlon@debian.org                                   http://www.debian.org/

Re: Test suite fails on alpha architecture

From
Pedro Gimeno
Date:
Tom Lane wrote:

> BTW, having now looked closely at the diffs, the problems do not seem to
> be anywhere near the code we changed for 8.3.  So I think the real
> issue is that your compiler and glibc changed under you.  Could you
> perhaps retest 8.2 with the current toolchain and confirm that it
> fails too?

I'm not on the Debian team, but might this help?

http://buildd.debian.org/build.php?arch=alpha&pkg=postgresql-8.2

I'm very interested in 8.2.5 going into Testing so that it can reach
Backports, and it turns out that the Alpha build is blocking it.

-- Pedro Gimeno

Re: Test suite fails on alpha architecture

From
Tom Lane
Date:
Pedro Gimeno <pgsql-001@personal.formauri.es> writes:
> Tom Lane wrote:
>> BTW, having now looked closely at the diffs, the problems do not seem to
>> be anywhere near the code we changed for 8.3.  So I think the real
>> issue is that your compiler and glibc changed under you.  Could you
>> perhaps retest 8.2 with the current toolchain and confirm that it
>> fails too?

> I'm not in the Debian team but may this help?
> http://buildd.debian.org/build.php?arch=alpha&pkg=postgresql-8.2

Yeah, that seems to confirm that it is a tools problem rather than
anything we did to 8.3 ... and moreover that the breakage went in
sometime between Jun 23 and Aug 17.

            regards, tom lane

Re: Test suite fails on alpha architecture

From
Jose Luis Rivero
Date:
On Wed, Nov 07, 2007 at 02:41:51PM -0500, Steve Langasek wrote:
> On Wed, Nov 07, 2007 at 01:49:53PM -0500, Tom Lane wrote:
> > All the other diffs that Martin showed are divide-by-zero failures,
> > and I do not see any of them on Gentoo's machine.  I think that this
> > must be a compiler bug.  The first example in his diffs is just
> > "select 1/0", which executes this code:
>
> >     int32        arg1 = PG_GETARG_INT32(0);
> >     int32        arg2 = PG_GETARG_INT32(1);
> >     int32        result;
>
> >     if (arg2 == 0)
> >         ereport(ERROR,
> >                 (errcode(ERRCODE_DIVISION_BY_ZERO),
> >                  errmsg("division by zero")));
>
> >     result = arg1 / arg2;
>
> > It looks to me like Debian's compiler must be allowing the division
> > instruction to be speculatively executed before the if-test branch
> > is taken.  Perhaps it is supposing that this is OK because control
> > will return from ereport(), when in fact it will not (the routine
> > throws a longjmp).  Since we've not seen such behavior on any other
> > platform, however, I suspect this is just a bug and not intentional.
>
> > FWIW the Gentoo machine is running
>
> > $ gcc -v
> > Using built-in specs.
> > Target: alpha-unknown-linux-gnu
> > Configured with: /var/tmp/portage/sys-devel/gcc-4.1.2/work/gcc-4.1.2/configure --prefix=/usr
> > --bindir=/usr/alpha-unknown-linux-gnu/gcc-bin/4.1.2 --includedir=/usr/lib/gcc/alpha-unknown-linux-gnu/4.1.2/include
> > --datadir=/usr/share/gcc-data/alpha-unknown-linux-gnu/4.1.2
> > --mandir=/usr/share/gcc-data/alpha-unknown-linux-gnu/4.1.2/man
> > --infodir=/usr/share/gcc-data/alpha-unknown-linux-gnu/4.1.2/info
> > --with-gxx-include-dir=/usr/lib/gcc/alpha-unknown-linux-gnu/4.1.2/include/g++-v4 --host=alpha-unknown-linux-gnu
> > --build=alpha-unknown-linux-gnu --disable-altivec --enable-nls --without-included-gettext --with-system-zlib
> > --disable-checking --disable-werror --enable-secureplt --disable-libunwind-exceptions --disable-multilib
> > --disable-libmudflap --disable-libssp --disable-libgcj --enable-languages=c,c++,fortran --enable-shared
> > --enable-threads=posix --enable-__cxa_atexit --enable-clocale=gnu
> > Thread model: posix
> > gcc version 4.1.2 (Gentoo 4.1.2)
>
> Ok, and Debian is building with gcc 4.2:
>
> $ gcc -v
> Using built-in specs.
> Target: alpha-linux-gnu
> Configured with: ../src/configure -v
> --enable-languages=c,c++,fortran,objc,obj-c++,treelang --prefix=/usr
> --enable-shared --with-system-zlib --libexecdir=/usr/lib
> --without-included-gettext --enable-threads=posix --enable-nls
> --with-gxx-include-dir=/usr/include/c++/4.2 --program-suffix=-4.2
> --enable-clocale=gnu --enable-libstdcxx-debug --enable-mpfr --disable-libssp
> --with-long-double-128 --enable-checking=release --build=alpha-linux-gnu
> --host=alpha-linux-gnu --target=alpha-linux-gnu
> Thread model: posix
> gcc version 4.2.3 20071014 (prerelease) (Debian 4.2.2-3)
> $
>
> Any chance of testing with a newer version of gcc on Gentoo as well to help
> confirm that the compiler is to blame?
>

In Gentoo the testcase gives the same "division by zero" under these
gcc versions:

Current Stable:
    gcc version 4.1.2 (Gentoo 4.1.2 p1.0.2)

Current Testing:
    gcc version 4.2.2 (Gentoo 4.2.2 p1.0)

Feel free to add me if you have an open bug for this, in order to test anything you
need or provide some more information about our platform.

Thanks.

> > Bottom line is that I see nothing here that the Postgres project can
> > fix --- these are library and compiler bugs.
>
> Right; though whereas the floor() bug could simply be ignored since it will
> be fixed in glibc (or the kernel) when the time comes, if the other
> regressions are the result of a compiler problem then ignoring those
> failures would indeed mean distributing broken binaries.
>

--
Jose Luis Rivero <yoswink@gentoo.org>
Gentoo/Doc Gentoo/Alpha

Re: Test suite fails on alpha architecture

From
Marc 'HE' Brockschmidt
Date:
Heya,

I know I'm quite late with my answer, sorry.

Frank Lichtenheld <djpig@debian.org> writes:
> On Sat, Nov 03, 2007 at 06:32:34PM -0400, Martin Pitt wrote:
>>> Can you grant one of us access to the machine to work on it?
>> I don't own any alpha machine, but maybe Frank, Steven, or anyone from
>> the Debian alpha porter list can create a temporary account for you?
> I'm not sure how we handle that for our experimental buildds. Admins?

One of the alphas used in the experimental buildd network is actually in
bdale's basement, so I'm not really happy to hand out access to it. The
other one (digitalis), which is hosted at the University of Darmstadt
and is under our full control, should actually be used as a porting
machine if needed.

Debian Developers [1] can get access to them by pinging either Andreas
Barth, Martin Zobel-Helas or me. We have our own userdir-ldap setup, so
please include a mail address and a verifiable GPG key in your ping,
together with a short description of what you want to do.

Marc

Footnotes:
[1]  And Debian contributors, as long as there is some sort of trust
     relationship
--
BOFH #357:
I'd love to help you -- it's just that the Boss won't let me near
the computer.

Re: Test suite fails on alpha architecture

From
Steve Langasek
Date:
On Wed, Nov 07, 2007 at 02:44:23PM -0500, Tom Lane wrote:

> I don't have access to a machine on which the failure occurs, but
> perhaps Martin can try it.  I'd think it'd be pretty easy, say

> #include <stdio.h>
> #include <stdlib.h>

> void
> ereport(const char *msg)
> {
>     fprintf(stderr, "%s\n", msg);
>     exit(0);
> }
>
> int
> main(int argc, char **argv)
> {
>     int    arg1 = atoi(argv[1]);
>     int    arg2 = atoi(argv[2]);
>     int    result;
>
>     if (arg2 == 0)
>         ereport("division by zero");
>
>     result = arg1 / arg2;
>
>     printf("%d\n", result);
>
>     return 0;
> }

> cc -g -O2 -fPIC -fno-strict-aliasing -mieee -D_GNU_SOURCE bug.c
> ./a.out 1 0

> I would not be surprised at all if it's compile-switch dependent; these
> look to be the switches Martin tested with.

So strangely, when I first ran this test case I recall being able to
reproduce the SIGFPE; but now going back to it I'm getting the correct
"division by zero" output.

But postgresql still fails to build with the same errors as before.

FWIW, the first test suite failure involving floor() has been resolved now
in the glibc package in unstable.

--
Steve Langasek                   Give me a lever long enough and a Free OS
Debian Developer                   to set it on, and I can move the world.
vorlon@debian.org                                   http://www.debian.org/

Re: Test suite fails on alpha architecture

From
Martin Pitt
Date:
Hi,

Tom Lane [2007-11-07 13:49 -0500]:
> All the other diffs that Martin showed are divide-by-zero failures,
> and I do not see any of them on Gentoo's machine.  I think that this
> must be a compiler bug.  The first example in his diffs is just
> "select 1/0", which executes this code:
>
>     int32        arg1 = PG_GETARG_INT32(0);
>     int32        arg2 = PG_GETARG_INT32(1);
>     int32        result;
>
>     if (arg2 == 0)
>         ereport(ERROR,
>                 (errcode(ERRCODE_DIVISION_BY_ZERO),
>                  errmsg("division by zero")));
>
>     result = arg1 / arg2;
>
> It looks to me like Debian's compiler must be allowing the division
> instruction to be speculatively executed before the if-test branch
> is taken.  Perhaps it is supposing that this is OK because control
> will return from ereport(), when in fact it will not (the routine
> throws a longjmp).  Since we've not seen such behavior on any other
> platform, however, I suspect this is just a bug and not intentional.

I tried this on a Debian Alpha porter box (thanks, Steve, for pointing
me at it) with Debian's gcc 4.2.2. Latest sid indeed still has this
bug (the floor() one is confirmed fixed), not only on Alpha, but also
on sparc.

Since the simple test case did not reproduce the error, I tried to
make a more sophisticated one which resembles more closely what
PostgreSQL does (sigsetjmp/siglongjmp instead of exit(), some macros,
etc.). Unfortunately that was in vain: the test case still works
perfectly, both with no compiler options and with the ones used for
PostgreSQL. I attach it here nevertheless just in case someone has
more luck than me.

So I tried to approach it from the other side: Building postgresql
with CFLAGS="-O0 -g" or "-O1 -g" works correctly, but with "-O2 -g" I
get the above bug.

So I guess I'll build with -O1 for the time being on sparc and alpha
to get correct binaries until this is sorted out. Any idea what else I
could try?

Thanks,

Martin

--
Martin Pitt        http://www.piware.de
Ubuntu Developer   http://www.ubuntu.com
Debian Developer   http://www.debian.org

Re: Test suite fails on alpha architecture

From
Martin Pitt
Date:
Martin Pitt [2007-12-04 23:43 +0100]:
> So I tried to approach it from the other side: Building postgresql
> with CFLAGS="-O0 -g" or "-O1 -g" works correctly, but with "-O2 -g" I
> get above bug.

Just FAOD, building with gcc 4.1 and -O2 works fine. I guess this
sufficiently proves that this is a gcc 4.2 bug.

Martin
--
Martin Pitt        http://www.piware.de
Ubuntu Developer   http://www.ubuntu.com
Debian Developer   http://www.debian.org