Re: Upgrade 10.5->10.6 : db crash BUS ERROR (sig 10), reproducible

From: Andrew Gierth
Subject: Re: Upgrade 10.5->10.6 : db crash BUS ERROR (sig 10), reproducible
Msg-id: 875zstb2hr.fsf@news-spur.riddles.org.uk
In reply to: Re: Upgrade 10.5->10.6 : db crash BUS ERROR (sig 10), reproducible  (Andrew Gierth <andrew@tao11.riddles.org.uk>)
List: pgsql-admin

>>>>> "Andrew" == Andrew Gierth <andrew@tao11.riddles.org.uk> writes:

 Andrew> So I'm going to guess that your bug 236025 is actually an
 Andrew> alignment problem, with the compiler making some assumption
 Andrew> about alignment that we're violating. I'll investigate and see
 Andrew> what I can find.

OK, I have completed my analysis of both reports.

The bottom line is that this is a disagreement between GCC and the
(clang-compiled) system libraries over what the stack alignment should
be; GCC wants and assumes 16-byte alignment, but clang won't provide
that. It's not any kind of bug in PostgreSQL.

For most applications there is no issue because GCC aligns the stack
itself on entry into main(), so the only time it becomes an issue is if
two conditions are met: (1) the application must call into an outside
(non-GCC-compiled) library which then calls _back_ into the application,
AND (2) the subsequent code executes instructions that rely on the stack
alignment for correctness (and not just performance).
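
To see condition (1) on a given build, something like the following can
be used as a rough probe. It is only a sketch: qsort() stands in for "an
outside library that calls back into the application", and the names
misalign16, cmp_int and probe_callback_alignment are made up for
illustration. The callback declares a 16-byte-aligned local, so if the
compiler trusts an incoming stack alignment that the caller did not
provide, the reported value is nonzero.

```c
#include <stdint.h>
#include <stdlib.h>

/* Misalignment last observed inside the callback; -1 until it has run. */
static int callback_misalign = -1;

/* Bytes by which an address misses a 16-byte boundary (0 means aligned). */
int misalign16(const volatile void *p)
{
    return (int) ((uintptr_t) p & 15u);
}

/* Called by libc's qsort(): library code calling back into the application. */
static int cmp_int(const void *a, const void *b)
{
    /* The compiler places this where it believes a 16-byte boundary is. */
    _Alignas(16) volatile char probe[16];
    int x = *(const int *) a;
    int y = *(const int *) b;

    probe[0] = 0;
    callback_misalign = misalign16(probe);
    return (x > y) - (x < y);
}

/* Sort a tiny array through libc and report what the callback saw. */
int probe_callback_alignment(void)
{
    int v[] = {3, 1, 2};

    qsort(v, sizeof v / sizeof v[0], sizeof v[0], cmp_int);
    return callback_misalign;
}
```

A result of 0 means the callback ran on a 16-byte-aligned stack; anything
else means that an instruction requiring that alignment could fault at
that point, which is condition (2).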

PostgreSQL compiled by GCC on i386 without architecture options will not
rely on the alignment of the stack so condition (2) is not met. Only if
you specify an architecture such as -march=pentium3 (which enables SSE)
will any instructions be used which require strict alignment.

It may not be obvious how condition (1) is met, but notice that the
report from Peter has the crash happening in either a background worker
or the checkpointer process; this is significant because those are
spawned from postmaster while in a signal handler, and the signal
handler's stack frame has disturbed the stack alignment (and with the
system libraries compiled with clang and not gcc, no attempt is made to
adjust that).
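
The same kind of probe can be run from inside a signal handler. Again
this is a hedged sketch and not anything taken from the PostgreSQL
source: SIGUSR1, on_usr1 and probe_signal_alignment are arbitrary choices
for illustration. It installs a handler, raises the signal, and reports
how far a 16-byte-aligned local in the handler actually was from a
16-byte boundary.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stdint.h>
#include <string.h>

/* Misalignment observed inside the handler; -1 until the handler has run. */
static volatile sig_atomic_t handler_misalign = -1;

static void on_usr1(int signo)
{
    /* Placed where the compiler assumes a 16-byte boundary to be. */
    _Alignas(16) volatile char probe[16];

    (void) signo;
    probe[0] = 0;
    handler_misalign = (int) ((uintptr_t) (const volatile void *) probe & 15u);
}

/* Returns 0..15 (misalignment seen in the handler), or -1 on failure. */
int probe_signal_alignment(void)
{
    struct sigaction sa, old;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_usr1;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGUSR1, &sa, &old) != 0)
        return -1;

    handler_misalign = -1;
    raise(SIGUSR1);                 /* handler runs before raise() returns */
    sigaction(SIGUSR1, &old, NULL); /* restore the previous disposition */
    return (int) handler_misalign;
}
```

A process forked from inside such a handler keeps running on top of the
handler's stack frame, so whatever misalignment the handler sees is
inherited by everything the child does afterwards.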

So the implications for the PostgreSQL port on FreeBSD/i386 are:

1. If you compile with GCC and no architecture options you should have
no problems on any CPU.

This presumably covers the case of the packaged binaries.

2. If you compile with GCC and any of -msse, -msse2, -march=pentium3 or
later, or any similar flag that enables use of SSE or later (I believe
that no MMX instructions require special alignment), then you will also
need -mstackrealign (or patch the source to add the equivalent attribute
to every signal handler function or other callback, which I don't really
recommend). (Maybe the port should add this option defensively?)

The crash in FreeBSD bug #236025 is explained by the fact that the
user had -msse2 set when compiling with GCC. Peter's crash is explained
by the use of -march=pentium3 when compiling with GCC.

3. If you compile with clang and -msse2 then there should be no stack
alignment issues (since clang doesn't assume the stack is aligned) but
obviously you then can't run the binary on a pre-Pentium 4 CPU.
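
For reference, the per-function alternative to -mstackrealign mentioned
in item 2 would look roughly like the sketch below. I am assuming GCC's
x86 function attribute force_align_arg_pointer is the "equivalent
attribute" meant there; REALIGN_STACK, on_usr2 and
probe_realigned_handler are names invented for this example. The
attribute makes the function's prologue realign the stack itself rather
than trusting the caller.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stdint.h>
#include <string.h>

/* Assumption: only x86 compilers that speak GNU C accept this attribute. */
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
#define REALIGN_STACK __attribute__((force_align_arg_pointer))
#else
#define REALIGN_STACK
#endif

/* Misalignment observed inside the handler; -1 until the handler has run. */
static volatile sig_atomic_t realigned_misalign = -1;

/* Entry point called from outside GCC-compiled code: realign on entry. */
REALIGN_STACK
static void on_usr2(int signo)
{
    _Alignas(16) volatile char probe[16];

    (void) signo;
    probe[0] = 0;
    realigned_misalign = (int) ((uintptr_t) (const volatile void *) probe & 15u);
}

/* Returns the misalignment seen in the realigned handler, or -1 on failure. */
int probe_realigned_handler(void)
{
    struct sigaction sa, old;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_usr2;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGUSR2, &sa, &old) != 0)
        return -1;

    realigned_misalign = -1;
    raise(SIGUSR2);
    sigaction(SIGUSR2, &old, NULL);
    return (int) realigned_misalign;
}
```

Building with -mstackrealign has the same effect without touching the
source, which is why I'd prefer the flag over annotating every handler
and callback by hand.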

-- 
Andrew (irc:RhodiumToad)

