Re: 9.4 beta1 crash on Debian sid/i386

From: Christoph Berg
Subject: Re: 9.4 beta1 crash on Debian sid/i386
Date:
Msg-id: 20140518090834.GA18253@msgid.df7cb.de
In reply to: Re: 9.4 beta1 crash on Debian sid/i386  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses: Re: 9.4 beta1 crash on Debian sid/i386
List: pgsql-hackers
Re: Tom Lane 2014-05-18 <9058.1400385611@sss.pgh.pa.us>
> Christoph Berg <cb@df7cb.de> writes:
> > Re: Tom Lane 2014-05-14 <1357.1400028161@sss.pgh.pa.us>
> >> It would appear that something is wrong with check_stack_depth(),
> >> and/or getrlimit(RLIMIT_STACK) is lying to us about the available stack.
> 
> > ulimit -s is 8192 (kB); max_stack_depth is 2MB.
> 
> > check_stack_depth looks right, max_stack_depth_bytes there is 2097152
> > and I can see stack_base_ptr - &stack_top_loc grow over repeated
> > invocations of the function (stack_depth itself is optimized out).
> > Still, it never enters "if (stack_depth > max_stack_depth_bytes...)".
> 
> Hm.  Did you check that stack_base_ptr is non-NULL?  If it were somehow
> not getting set, that would disable the error report.  But on most
> architectures that would also result in silly values for the pointer
> difference, so I doubt this is the issue.

stack_base_ptr was non-NULL. The stack size started around 3 or 5kB
(I don't remember exactly), and grew by something like a few hundred
bytes in each iteration, so this looked sane.
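
For reference, the kind of check being discussed can be sketched roughly
as follows. This is a simplified illustration and not the PostgreSQL
source: the function names stack_is_too_deep and stack_rlimit_bytes are
hypothetical, and only getrlimit(RLIMIT_STACK) is a real system call. It
shows the two points raised above: a NULL base pointer disables the
check, and otherwise the depth is the distance between the recorded base
and the address of a local variable.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/resource.h>

/*
 * Return the soft RLIMIT_STACK value in bytes, or -1 if it is unlimited
 * or cannot be determined.  With "ulimit -s 8192" this would be 8388608.
 * (Hypothetical helper, for illustration only.)
 */
long stack_rlimit_bytes(void)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_STACK, &rl) != 0)
        return -1;
    if (rl.rlim_cur == RLIM_INFINITY)
        return -1;
    return (long) rl.rlim_cur;
}

/*
 * Return 1 if the distance between the recorded stack base and the
 * current position exceeds max_depth_bytes, else 0.
 * (Hypothetical helper; "here" would normally be the address of a
 * local variable in the calling function.)
 */
int stack_is_too_deep(const char *base, const char *here, long max_depth_bytes)
{
    intptr_t depth;

    assert(max_depth_bytes > 0);

    if (base == NULL)
        return 0;           /* base never recorded: check is disabled */

    depth = (intptr_t) base - (intptr_t) here;
    if (depth < 0)
        depth = -depth;     /* the stack may grow in either direction */

    return depth > (intptr_t) max_depth_bytes;
}
```

With max_stack_depth at 2MB the limit passed in would be 2097152, well
below the 8192kB reported by ulimit -s, so under these assumptions the
error should fire long before the process runs out of real stack.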

> > Interestingly, the Debian buildd managed to run the testsuite for
> > i386, while I could reproduce the problem on the pgapt build machine
> > and on my notebook, so there must be some system difference. Possibly
> > the reason is these two machines are running a 64bit kernel and I'm
> > building in a 32bit chroot, though that hasn't been a problem before.
> 
> I'm suspicious that something has changed in your build environment,
> because that stack-checking logic hasn't changed since these commits:

It's something in the combination of build and runtime environment. I
can reproduce the problem in the package that the Debian
i386/experimental buildd has compiled, including passing the
regression tests there. Possibly a change in libc there. I'll try to
ask some kernel/libc people if they have an idea. My current bet is on
the gcc hardening flags we are using.

> The lack of reports from the buildfarm or other users is also evidence
> against there being a widespread issue here.

The only animal running Debian testing/unstable I can see is dugong,
which is ia64, an architecture that was removed from Debian some months ago.
I guess I should look into setting up a new animal for this.

> A different thought: I have heard of environments in which the available
> stack depth is much less than what ulimit would suggest because the ulimit
> space gets split up for multiple per-thread stacks.  That should not be
> happening in a Postgres backend, since we don't do threading, but I'm
> running out of ideas to investigate ...

I've done some builds now and there's no clear picture yet of when the
problem occurs. Still trying...

Christoph
-- 
cb@df7cb.de | http://www.df7cb.de/


