Обсуждение: Recent failures on buildfarm member hornet

Поиск
Список
Период
Сортировка

Recent failures on buildfarm member hornet

От
Tom Lane
Дата:
hornet has failed its last five runs with

2020-10-05 22:45:42.784 UTC [34734498:40] pg_regress/create_aggregate LOG:  statement: create aggregate
my_percentile_disc(float8ORDER BY anyelement) ( 
      stype = internal,
      sfunc = ordered_set_transition,
      finalfunc = percentile_disc_final,
      finalfunc_extra = true,
      finalfunc_modify = read_write
    );
TRAP: FailedAssertion("variadicArgType != InvalidOid", File: "pg_aggregate.c", Line: 216, PID: 34734498)

After looking at the commits immediately preceding the first failure, and
digging around in the aggregate-related code, it seems like commit
cc99baa43 (Improve pg_list.h's linitial(), lsecond() and co macros)
must've broke it somehow.  The nearest thing that I can see to a theory
is that where DefineAggregate does
        numDirectArgs = intVal(lsecond(args));
it's coming out with the wrong result, leading to a failure of the
numDirectArgs-vs-numArgs sanity check in AggregateCreate.  But how could
that be?  I hesitate to blame the compiler twice in one week.  OTOH, it's
a not-very-mainstream compiler on a not-very-mainstream architecture.

Noah, can you poke into this in a little more detail and try to verify
what is happening?

            regards, tom lane



Re: Recent failures on buildfarm member hornet

От
Noah Misch
Дата:
On Mon, Oct 05, 2020 at 08:42:15PM -0400, Tom Lane wrote:
> hornet has failed its last five runs with
> 
> 2020-10-05 22:45:42.784 UTC [34734498:40] pg_regress/create_aggregate LOG:  statement: create aggregate
my_percentile_disc(float8ORDER BY anyelement) (
 
>       stype = internal,
>       sfunc = ordered_set_transition,
>       finalfunc = percentile_disc_final,
>       finalfunc_extra = true,
>       finalfunc_modify = read_write
>     );
> TRAP: FailedAssertion("variadicArgType != InvalidOid", File: "pg_aggregate.c", Line: 216, PID: 34734498)
> 
> After looking at the commits immediately preceding the first failure, and
> digging around in the aggregate-related code, it seems like commit 
> cc99baa43 (Improve pg_list.h's linitial(), lsecond() and co macros)
> must've broke it somehow.  The nearest thing that I can see to a theory
> is that where DefineAggregate does
>         numDirectArgs = intVal(lsecond(args));
> it's coming out with the wrong result, leading to a failure of the
> numDirectArgs-vs-numArgs sanity check in AggregateCreate.

Building the tree with -O0 suppresses the problem.  (xlc does not have -O1.)
Building just aggregatecmds.c and pg_aggregate.c that way does not, so I
suppose the damage arose elsewhere.

> But how could
> that be?  I hesitate to blame the compiler twice in one week.  OTOH, it's
> a not-very-mainstream compiler on a not-very-mainstream architecture.

A compiler bug is plausible.  The compiler is eight years old, and we're not
seeing the problem on 32-bit (mandrill) or on 2019-vintage xlc (hoverfly).
Shall I make this animal use -O0 on v14+?



Re: Recent failures on buildfarm member hornet

От
Tom Lane
Дата:
Noah Misch <noah@leadboat.com> writes:
> On Mon, Oct 05, 2020 at 08:42:15PM -0400, Tom Lane wrote:
>> hornet has failed its last five runs with
>> TRAP: FailedAssertion("variadicArgType != InvalidOid", File: "pg_aggregate.c", Line: 216, PID: 34734498)

> Building the tree with -O0 suppresses the problem.  (xlc does not have -O1.)

OK, that's pretty unsurprising.

> Building just aggregatecmds.c and pg_aggregate.c that way does not, so I
> suppose the damage arose elsewhere.

Now that *is* surprising.  Could you poke a little further to determine
which module is getting miscompiled?  (gram.o seems like the next most
likely bet.)

            regards, tom lane



Re: Recent failures on buildfarm member hornet

От
Noah Misch
Дата:
On Tue, Oct 06, 2020 at 09:56:49PM -0400, Tom Lane wrote:
> Noah Misch <noah@leadboat.com> writes:
> > On Mon, Oct 05, 2020 at 08:42:15PM -0400, Tom Lane wrote:
> >> hornet has failed its last five runs with
> >> TRAP: FailedAssertion("variadicArgType != InvalidOid", File: "pg_aggregate.c", Line: 216, PID: 34734498)
> 
> > Building the tree with -O0 suppresses the problem.  (xlc does not have -O1.)
> 
> OK, that's pretty unsurprising.
> 
> > Building just aggregatecmds.c and pg_aggregate.c that way does not, so I
> > suppose the damage arose elsewhere.
> 
> Now that *is* surprising.  Could you poke a little further to determine
> which module is getting miscompiled?  (gram.o seems like the next most
> likely bet.)

gram.o was it.  (The way I rebuilt gram.o also rebuilt parser.o and scan.o,
but those seem far less likely.)



Re: Recent failures on buildfarm member hornet

От
Tom Lane
Дата:
Noah Misch <noah@leadboat.com> writes:
> On Tue, Oct 06, 2020 at 09:56:49PM -0400, Tom Lane wrote:
>> Now that *is* surprising.  Could you poke a little further to determine
>> which module is getting miscompiled?  (gram.o seems like the next most
>> likely bet.)

> gram.o was it.  (The way I rebuilt gram.o also rebuilt parser.o and scan.o,
> but those seem far less likely.)

This suggests that the problem is misoptimization of gram.y's
makeOrderedSetArgs:

    /* don't merge into the next line, as list_concat changes directargs */
    ndirectargs = list_length(directargs);

    return list_make2(list_concat(directargs, orderedargs),
                      makeInteger(ndirectargs));

I think that if the compiler did what the comment says not to, it'd match
this symptom.  However, I'm baffled as to why our recent pg_list.h changes
would've affected that.

            regards, tom lane



Re: Recent failures on buildfarm member hornet

От
David Rowley
Дата:
Thanks for investigating this, Noah.

On Thu, 8 Oct 2020 at 06:13, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Noah Misch <noah@leadboat.com> writes:
> > gram.o was it.  (The way I rebuilt gram.o also rebuilt parser.o and scan.o,
> > but those seem far less likely.)
>
> This suggests that the problem is misoptimization of gram.y's
> makeOrderedSetArgs:

It would be interesting to see gram.s from both cc99baa4~1 and cc99baa4.

David



Re: Recent failures on buildfarm member hornet

От
Noah Misch
Дата:
On Thu, Oct 08, 2020 at 09:15:12AM +1300, David Rowley wrote:
> On Thu, 8 Oct 2020 at 06:13, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > Noah Misch <noah@leadboat.com> writes:
> > > gram.o was it.  (The way I rebuilt gram.o also rebuilt parser.o and scan.o,
> > > but those seem far less likely.)
> >
> > This suggests that the problem is misoptimization of gram.y's
> > makeOrderedSetArgs:
> 
> It would be interesting to see gram.s from both cc99baa4~1 and cc99baa4.

Attached.  Both generated like this:

  (cd src/backend/parser && xlc_r -D_LARGE_FILES=1 -qnoansialias -g -O2 -S -qmaxmem=33554432 -I. -I.
-I../../../src/include-c gram.c; )
 

(Again, the compiler is eight years old, and we're not seeing the problem on
32-bit (mandrill) or on 2019-vintage xlc (hoverfly).  As soon as the situation
no longer piques your curiosity, I'm happy to make hoverfly use -O0 on v14+.)

Вложения

Re: Recent failures on buildfarm member hornet

От
Tom Lane
Дата:
Noah Misch <noah@leadboat.com> writes:
> On Thu, Oct 08, 2020 at 09:15:12AM +1300, David Rowley wrote:
>> It would be interesting to see gram.s from both cc99baa4~1 and cc99baa4.

> Attached.  Both generated like this:

Hm.  I'm too lazy to go bone up on PPC64 ABI conventions, but this does
look suspiciously like the compiler is doing what I feared:

GOOD:

    lwa        r31,4(r27)            # fetching list_length(directargs) ?
    .line    16295
    ori        r3,r27,0x0000
    bl         .list_concat{PR}
    ori        r0,r0,0x0000
    std        r3,112(SP)
    .line    16295
    extsw      r3,r31        # ... and passing it to makeInteger
    bl         .makeInteger{PR}
    ori        r0,r0,0x0000

BAD:

    ori        r3,r31,0x0000
    bl         .list_concat{PR}
    ori        r0,r0,0x0000
    std        r3,112(SP)
    .line    16288
    lwa        r3,4(r31)            # fetching list_length(directargs) ?
    bl         .makeInteger{PR}
    ori        r0,r0,0x0000

(I'm confused about why the line numbers don't match up, since cc99baa4
did not touch gram.y.  But whatever.)

I'm tempted to propose the attached small code rearrangement, which
might dissuade the compiler from thinking it can get away with this.
While I concur with your point that an old xlc version might not be
that exciting, there could be other compilers doing the same thing
in the future.

            regards, tom lane

diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index 0d101d8171..480d168346 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -16291,7 +16291,7 @@ makeOrderedSetArgs(List *directargs, List *orderedargs,
                    core_yyscan_t yyscanner)
 {
     FunctionParameter *lastd = (FunctionParameter *) llast(directargs);
-    int            ndirectargs;
+    Value       *ndirectargs;

     /* No restriction unless last direct arg is VARIADIC */
     if (lastd->mode == FUNC_PARAM_VARIADIC)
@@ -16315,10 +16315,10 @@ makeOrderedSetArgs(List *directargs, List *orderedargs,
     }

     /* don't merge into the next line, as list_concat changes directargs */
-    ndirectargs = list_length(directargs);
+    ndirectargs = makeInteger(list_length(directargs));

     return list_make2(list_concat(directargs, orderedargs),
-                      makeInteger(ndirectargs));
+                      ndirectargs);
 }

 /* insertSelectOptions()

Re: Recent failures on buildfarm member hornet

От
Tom Lane
Дата:
I wrote:
> I'm tempted to propose the attached small code rearrangement, which
> might dissuade the compiler from thinking it can get away with this.
> While I concur with your point that an old xlc version might not be
> that exciting, there could be other compilers doing the same thing
> in the future.

After thinking about it a bit more, I'm not even convinced that what
xlc seems to be doing is illegal per C spec.  There are no sequence
points within

    return list_make2(list_concat(directargs, orderedargs),
                      makeInteger(ndirectargs));

and therefore there's an argument to be made that the compiler
doesn't have to care whether any side-effects of list_concat() occur
before or after the evaluation of makeInteger(ndirectargs).
If the potential side-effects of list_concat() can be disregarded
until the end of this statement, then the code change is perfectly legal.

Maybe some very careful language-lawyering could prove differently,
but it's not as open-and-shut as one could wish.  So now I'm thinking
that we need this patch anyway, xlc or not.

            regards, tom lane



Re: Recent failures on buildfarm member hornet

От
Noah Misch
Дата:
On Wed, Oct 07, 2020 at 06:03:16PM -0400, Tom Lane wrote:
> I wrote:
> > I'm tempted to propose the attached small code rearrangement, which
> > might dissuade the compiler from thinking it can get away with this.

That patch does get hornet's -O2 build to again pass "make check".  It doesn't
harm the code, so we may as well use it.

> > While I concur with your point that an old xlc version might not be
> > that exciting, there could be other compilers doing the same thing
> > in the future.
> 
> After thinking about it a bit more, I'm not even convinced that what
> xlc seems to be doing is illegal per C spec.  There are no sequence
> points within
> 
>     return list_make2(list_concat(directargs, orderedargs),
>                       makeInteger(ndirectargs));

There is, however, a sequence point between list_length(directargs) and
list_concat(), and the problem arises because xlc reorders those two.  It's
true that makeInteger() could run before or after list_concat(), but that
alone would not have been a problem.



Re: Recent failures on buildfarm member hornet

От
Tom Lane
Дата:
Noah Misch <noah@leadboat.com> writes:
> On Wed, Oct 07, 2020 at 06:03:16PM -0400, Tom Lane wrote:
>> After thinking about it a bit more, I'm not even convinced that what
>> xlc seems to be doing is illegal per C spec.  There are no sequence
>> points within
>> 
>>     return list_make2(list_concat(directargs, orderedargs),
>>                       makeInteger(ndirectargs));

> There is, however, a sequence point between list_length(directargs) and
> list_concat(), and the problem arises because xlc reorders those two.  It's
> true that makeInteger() could run before or after list_concat(), but that
> alone would not have been a problem.

Yeah, that is the theory on which the existing code is built,
specifically that the list_length fetch must occur before list_concat
runs.  What I am wondering about is a more aggressive interpretation of
"sequence point", namely that the compiler is free to disregard exactly
when list_concat's side-effects occur between this statement's sequence
points.  I'm not sure that the C spec allows that interpretation, but
I'm not sure it doesn't, either.

            regards, tom lane



Re: Recent failures on buildfarm member hornet

От
Noah Misch
Дата:
On Wed, Oct 07, 2020 at 06:22:04PM -0400, Tom Lane wrote:
> Noah Misch <noah@leadboat.com> writes:
> > On Wed, Oct 07, 2020 at 06:03:16PM -0400, Tom Lane wrote:
> >> After thinking about it a bit more, I'm not even convinced that what
> >> xlc seems to be doing is illegal per C spec.  There are no sequence
> >> points within
> >> 
> >>     return list_make2(list_concat(directargs, orderedargs),
> >>                       makeInteger(ndirectargs));
> 
> > There is, however, a sequence point between list_length(directargs) and
> > list_concat(), and the problem arises because xlc reorders those two.  It's
> > true that makeInteger() could run before or after list_concat(), but that
> > alone would not have been a problem.
> 
> Yeah, that is the theory on which the existing code is built,
> specifically that the list_length fetch must occur before list_concat
> runs.  What I am wondering about is a more aggressive interpretation of
> "sequence point", namely that the compiler is free to disregard exactly
> when list_concat's side-effects occur between this statement's sequence
> points.  I'm not sure that the C spec allows that interpretation, but
> I'm not sure it doesn't, either.

C does not allow that, because the list_length() happens in a different "full
expression" (different statement):

    ndirectargs = list_length(directargs);/* noah adds: a sequence point is here */

    return list_make2(list_concat(directargs, orderedargs),
                      makeInteger(ndirectargs));

A compiler may implement like this:

    ndirectargs = list_length(directargs);
    a := list_concat(directargs, orderedargs);
    b := makeInteger(ndirectargs);
    return list_make2(a, b);

Or this:

    ndirectargs = list_length(directargs);
    b := makeInteger(ndirectargs);
    a := list_concat(directargs, orderedargs);
    return list_make2(a, b);

But not like this:

    a := list_concat(directargs, orderedargs);
    ndirectargs = list_length(directargs);
    b := makeInteger(ndirectargs);
    return list_make2(a, b);