Re: 9.4 beta1 crash on Debian sid/i386

From Christoph Berg
Subject Re: 9.4 beta1 crash on Debian sid/i386
Date
Msg-id 20140519115318.GB7296@msgid.df7cb.de
In response to Re: 9.4 beta1 crash on Debian sid/i386  (Christoph Berg <cb@df7cb.de>)
Responses Re: 9.4 beta1 crash on Debian sid/i386
Re: 9.4 beta1 crash on Debian sid/i386
List pgsql-hackers
Re: To Tom Lane 2014-05-19 <20140519091808.GA7296@msgid.df7cb.de>
> Re: Tom Lane 2014-05-18 <26862.1400449277@sss.pgh.pa.us>
> > OK, so the problem is that getrlimit(RLIMIT_STACK) is lying to us about
> > the available stack depth.  I'd classify that as a kernel bug.  I wonder
> > if it's a different manifestation of this issue:
> > https://bugzilla.redhat.com/show_bug.cgi?id=952946
> > 
> > A different line of thought is that if ulimit -s is 8192, why are we
> > not getting 8MB of stack?  But in any case, if we're only going to
> > get 1944kB, getrlimit ought to tell us that.
> 
> The issue looks exactly like what you are writing in that bugzilla
> bug, including the fact that [stack] in /proc/maps gets replaced by
> [heap] once the bus error happens (Comment 11).

I've done some more digging. The problem also exists on plain 32-bit
kernels, not only on 64-bit kernels running a 32-bit userland. (Tested
on Debian Wheezy's 3.2.57 kernel.)

The problem seems to be that the address layout puts heap and stack
too close together - there's only about 125MB between the start of
heap and the end of stack. Apparently 9.4 is a bit more memory-hungry
on the heap side when running infinite_recurse(), so it gets a SIGBUS
before it reaches the 2MB max_stack_depth. In 9.3 I can easily see the
same problem with max_stack_depth = '7MB': at the time of the crash,
the stack is 2797568 bytes as reported by /proc/<pid>/maps, and in
9.1, the crash happens at 3084288 bytes. (Both do catch the problem
properly with max_stack_depth = '2MB', at which point 2105344 bytes of
stack are allocated.)
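
For anyone wanting to reproduce these numbers, here is a minimal sketch
(Linux-only, not PostgreSQL code) that compares the soft RLIMIT_STACK
with what is actually mapped. The function names are made up for
illustration, and the sample line in the comment uses invented
addresses chosen so that the size matches the 9.3 figure above.

```c
#include <stdio.h>
#include <string.h>
#include <sys/resource.h>

/*
 * Parse one line of /proc/<pid>/maps.  Returns 1 and fills *start / *end
 * if the line carries the given tag, e.g. "[stack]" or "[heap]".
 * Example line (addresses invented for illustration):
 *   bf9e5000-bfc90000 rw-p 00000000 00:00 0          [stack]
 */
int
parse_maps_line(const char *line, const char *tag,
                unsigned long *start, unsigned long *end)
{
    if (strstr(line, tag) == NULL)
        return 0;
    return sscanf(line, "%lx-%lx", start, end) == 2;
}

/* Size in bytes of the tagged mapping of this process, or -1 if absent. */
long
self_mapping_size(const char *tag)
{
    FILE       *f = fopen("/proc/self/maps", "r");
    char        line[512];
    long        size = -1;

    if (f == NULL)
        return -1;
    while (fgets(line, sizeof(line), f) != NULL)
    {
        unsigned long start, end;

        if (parse_maps_line(line, tag, &start, &end))
        {
            size = (long) (end - start);
            break;
        }
    }
    fclose(f);
    return size;
}

/* Print the soft stack limit next to what the kernel has mapped so far. */
void
report_stack(void)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_STACK, &rl) == 0 && rl.rlim_cur != RLIM_INFINITY)
        printf("RLIMIT_STACK soft limit: %lu bytes\n",
               (unsigned long) rl.rlim_cur);
    printf("[stack] mapped: %ld bytes\n", self_mapping_size("[stack]"));
    printf("[heap]  mapped: %ld bytes\n", self_mapping_size("[heap]"));
}
```

Calling report_stack() from a deeply recursing function should show
[stack] growing while the limit stays put, which is the "upper limit,
not a reservation" behaviour.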

Debian/Ubuntu have been using hardened PostgreSQL builds for years
now, including running the regression tests - apparently we were
always close to a crash, it just had not happened yet.

So there are a few points to consider:
* ASLR leaves only 125MB for brk()-style heap plus stack
* RLIMIT_STACK is treated as an upper limit, not a reservation
* PostgreSQL thinks max_stack_depth=2MB plus check_stack_depth() is safe, instead of having a SIGBUS handler
* PostgreSQL allocates lots of heap using brk() instead of mmap()

If any one of these didn't hold, the problem wouldn't appear.

I'm not sure where to go from here. Getting the kernel (or the libc)
changed seems hard, and that would probably only affect future
distributions anyway. A short-term fix might be to reduce
max_stack_depth for the regression tests, which would still test the
functionality, but leave the problem open for production.
Implementing a SIGBUS/SIGSEGV handler would probably mean that the
whole ouch-lets-restart-on-error logic would become ineffective,
unless we check which address caused the error and decide whether it
was part of the stack or not.
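
To make the handler idea concrete, here is a rough sketch of the
address check, assuming a downward-growing stack and a stack-top
address recorded at backend start. None of these names exist in
PostgreSQL; stack_top, stack_span and the functions are hypothetical.
The handler has to run on an alternate stack, since the normal one is
unusable at that point.

```c
#define _XOPEN_SOURCE 700
#include <signal.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical globals: an address near the top of the stack, recorded at
 * startup, and how far below it a fault still counts as "stack". */
static uintptr_t stack_top;
static size_t stack_span;

/* 1 if addr lies within span bytes below top (downward-growing stack). */
int
fault_is_in_stack(uintptr_t addr, uintptr_t top, size_t span)
{
    return addr <= top && top - addr <= span;
}

static void
fault_handler(int signo, siginfo_t *info, void *context)
{
    (void) context;
    if (fault_is_in_stack((uintptr_t) info->si_addr, stack_top, stack_span))
    {
        static const char msg[] = "stack depth limit exceeded\n";
        ssize_t     rc;

        /* A real backend would have to get back to error recovery here;
         * this sketch only reports and exits. */
        rc = write(STDERR_FILENO, msg, sizeof(msg) - 1);
        (void) rc;
        _exit(2);
    }

    /* Not the stack: restore the default action and return, so the
     * faulting instruction runs again and kills the process as before. */
    signal(signo, SIG_DFL);
}

/* Install the handler on an alternate stack.  Returns 0 on success. */
int
install_fault_handler(void *top, size_t span)
{
    static char altstack[64 * 1024];
    stack_t     ss;
    struct sigaction sa;

    stack_top = (uintptr_t) top;
    stack_span = span;

    memset(&ss, 0, sizeof(ss));
    ss.ss_sp = altstack;
    ss.ss_size = sizeof(altstack);
    if (sigaltstack(&ss, NULL) != 0)
        return -1;

    memset(&sa, 0, sizeof(sa));
    sa.sa_sigaction = fault_handler;
    sa.sa_flags = SA_SIGINFO | SA_ONSTACK;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGSEGV, &sa, NULL) != 0 ||
        sigaction(SIGBUS, &sa, NULL) != 0)
        return -1;
    return 0;
}
```

The non-stack branch keeps today's behaviour (the process dies from the
signal and the postmaster restarts everything); only faults inside the
stack range are treated differently.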

A hack would be to touch some address deep in the stack early at
backend start, so the address space would be reserved for the stack.
Though it seems ugly to do that for all backends, not only those that
actually use a lot of stack. (The cost would be one memory page, which
isn't too much, OTOH.)
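
A sketch of what that hack could look like, assuming GCC or Clang and a
downward-growing stack. reserve_stack() and STACK_RESERVE_BYTES are
made-up names, and 2MB is just the default max_stack_depth discussed
above. A large local array moves the stack pointer down first, so the
write at the low end is a legitimate stack access that makes the kernel
extend the stack mapping.

```c
#include <stddef.h>

/* Assumption: reserve as much as the default max_stack_depth of 2MB. */
#define STACK_RESERVE_BYTES ((size_t) 2 * 1024 * 1024)

/*
 * Touch the far end of a large stack frame once, early at startup.
 * noinline (a GCC/Clang attribute) keeps the big frame out of the
 * caller, so the space is released again on return while the kernel's
 * stack mapping stays extended.  Returns the number of bytes spanned.
 */
__attribute__((noinline)) size_t
reserve_stack(void)
{
    volatile char probe[STACK_RESERVE_BYTES];

    /* Lowest address of the frame on a downward-growing stack. */
    probe[0] = 0;
    return sizeof(probe);
}
```

Note that compilers built with stack-clash protection probe every page
of such a frame, so there the cost would be the full 2MB rather than
one page.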

Christoph
-- 
cb@df7cb.de | http://www.df7cb.de/


