Thread: A note about recent ecpg buildfarm failures

A note about recent ecpg buildfarm failures

From: Tom Lane
Date: Tue, Feb 26, 2019 at 01:25:29PM -0500

Since my commits 9e138a401 et al on Saturday, buildfarm members
blobfish, brotula, and wunderpus have been showing core dumps
in the ecpg preprocessor.  This seemed inexplicable given what
the commits changed, and even more so seeing that only HEAD is
failing, while the change was back-patched into all branches.

Mark Wong and I poked into this off-list, and what we find is that
this seems to be a compiler bug.  Those animals are all running
nearly the same version of clang (3.8.x / ppc64le).  Looking into
the assembly code for preproc.y, the crash is occurring at a branch
that is supposed to jump forward exactly 32768 bytes, but according
to gdb's disassembler it's jumping backwards exactly -32768 bytes,
into invalid memory.  It will come as no surprise to hear that the
branch displacement field in PPC conditional branches is 16 bits
wide, so that positive 32768 doesn't fit but negative 32768 does.
Evidently what is happening is that either the compiler or the
assembler is failing to detect the edge-case field overflow and
switch to different coding.  So the apparent dependency on 9e138a401
is because that happened to insert exactly the right number of
instructions in-between to trigger this scenario.  It's pure luck we
didn't trip over it before, although none of those buildfarm animals
have been around for all that long.

Moral: don't use clang 3.8.x on ppc64.  I think Mark is going
to upgrade those animals to some more recent compiler version.

            regards, tom lane
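
The displacement arithmetic described above can be sketched in a few lines. This is only an illustration of why +32768 does not fit in a 16-bit signed field while -32768 does, and of what happens if the value is truncated without a range check. The helper names fits_signed and wrap_signed are hypothetical and are not taken from clang, the assembler, or PostgreSQL; the sketch does not claim to reproduce the actual compiler bug.

```python
def fits_signed(value: int, bits: int = 16) -> bool:
    """Return True if value is representable as a signed field of the given width."""
    lo = -(1 << (bits - 1))       # -32768 for 16 bits
    hi = (1 << (bits - 1)) - 1    # +32767 for 16 bits
    return lo <= value <= hi


def wrap_signed(value: int, bits: int = 16) -> int:
    """Truncate value to the given width and reinterpret it as signed.

    This models what a fixed-width field would hold if the range check
    were skipped (an assumption made for illustration).
    """
    raw = value & ((1 << bits) - 1)
    if raw & (1 << (bits - 1)):   # sign bit set: value is negative
        raw -= 1 << bits
    return raw


for displacement in (32764, 32768, -32768):
    print(displacement, fits_signed(displacement), wrap_signed(displacement))
# 32764 True 32764
# 32768 False -32768
# -32768 True -32768
```

A forward displacement of 32768 fails the range check, and if it is stored anyway it reads back as -32768, which matches the backwards jump seen in gdb.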


Re: A note about recent ecpg buildfarm failures

From: Robert Haas
Date:

On Tue, Feb 26, 2019 at 1:25 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Since my commits 9e138a401 et al on Saturday, buildfarm members
> blobfish, brotula, and wunderpus have been showing core dumps
> in the ecpg preprocessor.  This seemed inexplicable given what
> the commits changed, and even more so seeing that only HEAD is
> failing, while the change was back-patched into all branches.
>
> Mark Wong and I poked into this off-list, and what we find is that
> this seems to be a compiler bug.  Those animals are all running
> nearly the same version of clang (3.8.x / ppc64le).  Looking into
> the assembly code for preproc.y, the crash is occurring at a branch
> that is supposed to jump forward exactly 32768 bytes, but according
> to gdb's disassembler it's jumping backwards exactly -32768 bytes,
> into invalid memory.  It will come as no surprise to hear that the
> branch displacement field in PPC conditional branches is 16 bits
> wide, so that positive 32768 doesn't fit but negative 32768 does.
> Evidently what is happening is that either the compiler or the
> assembler is failing to detect the edge-case field overflow and
> switch to different coding.  So the apparent dependency on 9e138a401
> is because that happened to insert exactly the right number of
> instructions in-between to trigger this scenario.  It's pure luck we
> didn't trip over it before, although none of those buildfarm animals
> have been around for all that long.
>
> Moral: don't use clang 3.8.x on ppc64.  I think Mark is going
> to upgrade those animals to some more recent compiler version.

Wow, that's some pretty impressive debugging!

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: A note about recent ecpg buildfarm failures

From: Mark Wong
Date:

On Tue, Feb 26, 2019 at 01:25:29PM -0500, Tom Lane wrote:
> Since my commits 9e138a401 et al on Saturday, buildfarm members
> blobfish, brotula, and wunderpus have been showing core dumps
> in the ecpg preprocessor.  This seemed inexplicable given what
> the commits changed, and even more so seeing that only HEAD is
> failing, while the change was back-patched into all branches.
> 
> Mark Wong and I poked into this off-list, and what we find is that
> this seems to be a compiler bug.  Those animals are all running
> nearly the same version of clang (3.8.x / ppc64le).  Looking into
> the assembly code for preproc.y, the crash is occurring at a branch
> that is supposed to jump forward exactly 32768 bytes, but according
> to gdb's disassembler it's jumping backwards exactly -32768 bytes,
> into invalid memory.  It will come as no surprise to hear that the
> branch displacement field in PPC conditional branches is 16 bits
> wide, so that positive 32768 doesn't fit but negative 32768 does.
> Evidently what is happening is that either the compiler or the
> assembler is failing to detect the edge-case field overflow and
> switch to different coding.  So the apparent dependency on 9e138a401
> is because that happened to insert exactly the right number of
> instructions in-between to trigger this scenario.  It's pure luck we
> didn't trip over it before, although none of those buildfarm animals
> have been around for all that long.
> 
> Moral: don't use clang 3.8.x on ppc64.  I think Mark is going
> to upgrade those animals to some more recent compiler version.

I've tried clang 3.9 and 4.0 by hand and they seem to be ok.  These were
the other two readily available versions on Debian stretch.

I'll stop those other clang-3.8 animals...

Regards,
Mark

--
Mark Wong
2ndQuadrant - PostgreSQL Solutions for the Enterprise
https://www.2ndQuadrant.com/