Re: Why is infinite_recurse test suddenly failing?
From | Mark Wong |
---|---|
Subject | Re: Why is infinite_recurse test suddenly failing? |
Date | |
Msg-id | 20190514145901.GA10216@2ndQuadrant.com |
In response to | Re: Why is infinite_recurse test suddenly failing? (Andres Freund <andres@anarazel.de>) |
List | pgsql-hackers |
On Fri, May 10, 2019 at 11:27:07AM -0700, Andres Freund wrote:
> Hi,
>
> On 2019-05-10 11:38:57 -0400, Tom Lane wrote:
> > Core was generated by `postgres: debian regression [local] SELECT '.
> > Program terminated with signal SIGSEGV, Segmentation fault.
> > #0 sysmalloc (nb=8208, av=0x3fff916e0d28 <main_arena>) at malloc.c:2748
> > 2748 malloc.c: No such file or directory.
> > #0 sysmalloc (nb=8208, av=0x3fff916e0d28 <main_arena>) at malloc.c:2748
> > #1 0x00003fff915bedc8 in _int_malloc (av=0x3fff916e0d28 <main_arena>, bytes=8192) at malloc.c:3865
> > #2 0x00003fff915c1064 in __GI___libc_malloc (bytes=8192) at malloc.c:2928
> > #3 0x00000000106acfd8 in AllocSetContextCreateInternal (parent=0x1000babdad0, name=0x1085508c "inline_function", minContextSize=<optimized out>, initBlockSize=<optimized out>, maxBlockSize=8388608) at aset.c:477
> > #4 0x00000000103d5e00 in inline_function (funcid=20170, result_type=<optimized out>, result_collid=<optimized out>, input_collid=<optimized out>, funcvariadic=<optimized out>, func_tuple=<optimized out>, context=0x3fffe3da15d0, args=<optimized out>) at clauses.c:4459
> > #5 simplify_function (funcid=<optimized out>, result_type=<optimized out>, result_typmod=<optimized out>, result_collid=<optimized out>, input_collid=<optimized out>, args_p=<optimized out>, funcvariadic=<optimized out>, process_args=<optimized out>, allow_non_const=<optimized out>, context=<optimized out>) at clauses.c:4040
> > #6 0x00000000103d2e74 in eval_const_expressions_mutator (node=0x1000babe968, context=0x3fffe3da15d0) at clauses.c:2474
> > #7 0x00000000103511bc in expression_tree_mutator (node=<optimized out>, mutator=0x103d2b10 <eval_const_expressions_mutator>, context=0x3fffe3da15d0) at nodeFuncs.c:2893
> >
> > So that lets out any theory that somehow we're getting into a weird
> > control path that misses calling check_stack_depth;
> > expression_tree_mutator does so for one, and it was called just nine
> > stack frames down from the crash.
>
> Right. There's plenty places checking it...
>
>
> > I am wondering if, somehow, the stack depth limit seen by the postmaster
> > sometimes doesn't apply to its children. That would be pretty wacko
> > kernel behavior, especially if it's only intermittently true.
> > But we're running out of other explanations.
>
> I wonder if this is a SIGSEGV that actually signals an OOM
> situation. Linux, if it can't actually extend the stack on-demand due to
> OOM, sends a SIGSEGV. The signal has that information, but
> unfortunately the buildfarm code doesn't print it. p $_siginfo would
> show us some of that...
>
> Mark, how tight is the memory on that machine?

There's about 2GB allocated:

debian@postgresql-debian:~$ cat /proc/meminfo
MemTotal: 2080704 kB
MemFree: 1344768 kB
MemAvailable: 1824192 kB

At the moment it looks like plenty. :) Maybe I should set something up
to monitor these things.

> Does dmesg have any other
> information (often segfaults are logged by the kernel with the code
> IIRC).

It's been up for about 49 days:

debian@postgresql-debian:~$ uptime
14:54:30 up 49 days, 14:59, 3 users, load average: 0.00, 0.34, 1.04

I see one line from dmesg that is related to postgres:

[3939350.616849] postgres[17057]: bad frame in setup_rt_frame: 00003fffe3d9fe00 nip 00003fff915bdba0 lr 00003fff915bde9c

But only that one time in 49 days up. Otherwise I see a half dozen
hung_task_timeout_secs messages around jdb2 and dhclient.

Regards,
Mark

--
Mark Wong
2ndQuadrant - PostgreSQL Solutions for the Enterprise
https://www.2ndQuadrant.com/
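
On the idea of setting something up to monitor memory: a minimal sketch, assuming Python is available on the machine, that parses /proc/meminfo-style text and reports what fraction of memory is still available. The function names (parse_meminfo, available_fraction, low_memory) and the 10% threshold are hypothetical choices for illustration and are not part of the buildfarm tooling.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of field name -> value in kB."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # skip lines that are not "Name: value" pairs
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def available_fraction(fields):
    """Return MemAvailable / MemTotal, or None if either field is missing."""
    total = fields.get("MemTotal")
    avail = fields.get("MemAvailable")
    if not total or avail is None:
        return None
    return avail / total


def low_memory(fields, threshold=0.10):
    """True when less than `threshold` of memory is available (hypothetical cutoff)."""
    frac = available_fraction(fields)
    return frac is not None and frac < threshold


# Sample values copied from the /proc/meminfo output quoted in this message.
SAMPLE = """\
MemTotal:        2080704 kB
MemFree:         1344768 kB
MemAvailable:    1824192 kB
"""
```

On the host itself, parse_meminfo(open("/proc/meminfo").read()) could be called periodically and the result logged next to the regression test runs. With the figures quoted above the available fraction comes out at roughly 0.88, consistent with memory looking plentiful at the moment; it says nothing about what was free at the time of the crash.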