Re: Why is infinite_recurse test suddenly failing?

From: Mark Wong
Subject: Re: Why is infinite_recurse test suddenly failing?
Msg-id: 20190514145901.GA10216@2ndQuadrant.com
In reply to: Re: Why is infinite_recurse test suddenly failing?  (Andres Freund <andres@anarazel.de>)
List: pgsql-hackers

On Fri, May 10, 2019 at 11:27:07AM -0700, Andres Freund wrote:
> Hi,
> 
> On 2019-05-10 11:38:57 -0400, Tom Lane wrote:
> > Core was generated by `postgres: debian regression [local] SELECT                                     '.
> > Program terminated with signal SIGSEGV, Segmentation fault.
> > #0  sysmalloc (nb=8208, av=0x3fff916e0d28 <main_arena>) at malloc.c:2748
> > 2748    malloc.c: No such file or directory.
> > #0  sysmalloc (nb=8208, av=0x3fff916e0d28 <main_arena>) at malloc.c:2748
> > #1  0x00003fff915bedc8 in _int_malloc (av=0x3fff916e0d28 <main_arena>, bytes=8192) at malloc.c:3865
> > #2  0x00003fff915c1064 in __GI___libc_malloc (bytes=8192) at malloc.c:2928
> > #3  0x00000000106acfd8 in AllocSetContextCreateInternal (parent=0x1000babdad0, name=0x1085508c "inline_function", minContextSize=<optimized out>, initBlockSize=<optimized out>, maxBlockSize=8388608) at aset.c:477
> > #4  0x00000000103d5e00 in inline_function (funcid=20170, result_type=<optimized out>, result_collid=<optimized out>, input_collid=<optimized out>, funcvariadic=<optimized out>, func_tuple=<optimized out>, context=0x3fffe3da15d0, args=<optimized out>) at clauses.c:4459
> > #5  simplify_function (funcid=<optimized out>, result_type=<optimized out>, result_typmod=<optimized out>, result_collid=<optimized out>, input_collid=<optimized out>, args_p=<optimized out>, funcvariadic=<optimized out>, process_args=<optimized out>, allow_non_const=<optimized out>, context=<optimized out>) at clauses.c:4040
> > #6  0x00000000103d2e74 in eval_const_expressions_mutator (node=0x1000babe968, context=0x3fffe3da15d0) at clauses.c:2474
> > #7  0x00000000103511bc in expression_tree_mutator (node=<optimized out>, mutator=0x103d2b10 <eval_const_expressions_mutator>, context=0x3fffe3da15d0) at nodeFuncs.c:2893
> 
> 
> > So that lets out any theory that somehow we're getting into a weird
> > control path that misses calling check_stack_depth;
> > expression_tree_mutator does so for one, and it was called just nine
> > stack frames down from the crash.
> 
> Right. There's plenty places checking it...
> 
> 
> > I am wondering if, somehow, the stack depth limit seen by the postmaster
> > sometimes doesn't apply to its children.  That would be pretty wacko
> > kernel behavior, especially if it's only intermittently true.
> > But we're running out of other explanations.
> 
> I wonder if this is a SIGSEGV that actually signals an OOM
> situation. Linux, if it can't actually extend the stack on-demand due to
> OOM, sends a SIGSEGV.  The signal has that information, but
> unfortunately the buildfarm code doesn't print it.  p $_siginfo would
> show us some of that...
> 
> Mark, how tight is the memory on that machine?

There's about 2GB allocated:

debian@postgresql-debian:~$ cat /proc/meminfo
MemTotal:        2080704 kB
MemFree:         1344768 kB
MemAvailable:    1824192 kB


At the moment it looks like plenty. :)  Maybe I should set something up
to monitor these things.
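
A minimal sketch of what such monitoring could look like, assuming a Linux host with the usual /proc/meminfo layout shown above. The function names (parse_meminfo, available_fraction, read_meminfo) are made up for illustration and are not part of any buildfarm tooling; a real setup would call this periodically (from cron, say) and log or alert on a threshold.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integer values.

    Most entries are reported in kB; the unit suffix is dropped here.
    """
    info = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue  # not a "Key: value" line
        fields = rest.split()
        if fields and fields[0].isdigit():
            info[key.strip()] = int(fields[0])
    return info


def available_fraction(info):
    """Fraction of total memory the kernel estimates is available."""
    return info["MemAvailable"] / info["MemTotal"]


def read_meminfo(path="/proc/meminfo"):
    """Read and parse the live file (Linux only)."""
    with open(path) as f:
        return parse_meminfo(f.read())


if __name__ == "__main__":
    info = read_meminfo()
    print("MemAvailable: %d kB (%.1f%% of MemTotal)"
          % (info["MemAvailable"], 100.0 * available_fraction(info)))
```

Note that a snapshot like this only shows the state at the moment it is taken; it says nothing about how tight memory was when the crash happened, which is why sampling over time would be needed.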

> Does dmesg have any other
> information (often segfaults are logged by the kernel with the code
> IIRC).

It's been up for about 49 days:

debian@postgresql-debian:~$ uptime
 14:54:30 up 49 days, 14:59,  3 users,  load average: 0.00, 0.34, 1.04


I see one line from dmesg that is related to postgres:

[3939350.616849] postgres[17057]: bad frame in setup_rt_frame: 00003fffe3d9fe00 nip 00003fff915bdba0 lr 00003fff915bde9c


But only that one time in 49 days of uptime.  Otherwise I see a half dozen
hung_task_timeout_secs messages around jbd2 and dhclient.
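
To place that dmesg line in time, the bracketed prefix can be read as seconds since boot and compared with the uptime output above. This is a rough sketch: it assumes the default dmesg timestamp format, and the kernel's timestamp clock can drift from wall-clock uptime, so the result is only approximate. The helper name parse_dmesg_line is hypothetical.

```python
import re
from datetime import timedelta

# "[seconds.micros] message", the default dmesg timestamp prefix.
DMESG_RE = re.compile(r"^\[\s*(\d+\.\d+)\]\s+(.*)$")


def parse_dmesg_line(line):
    """Return (offset_since_boot, message), or None if the line has no timestamp."""
    m = DMESG_RE.match(line)
    if m is None:
        return None
    return timedelta(seconds=float(m.group(1))), m.group(2)


if __name__ == "__main__":
    line = ("[3939350.616849] postgres[17057]: bad frame in setup_rt_frame: "
            "00003fffe3d9fe00 nip 00003fff915bdba0 lr 00003fff915bde9c")
    offset, message = parse_dmesg_line(line)
    # Uptime as reported above: 49 days, 14:59.
    uptime = timedelta(days=49, hours=14, minutes=59)
    print("logged about", offset, "after boot")
    print("that is about", uptime - offset, "before the uptime snapshot")
```

With the values above, the message was logged roughly 45 and a half days after boot, which is about four days before the uptime snapshot was taken.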

Regards,
Mark

-- 
Mark Wong
2ndQuadrant - PostgreSQL Solutions for the Enterprise
https://www.2ndQuadrant.com/


