Re: [BUGS] signal 11 segfaults with parallel workers

From Rick Otten
Subject Re: [BUGS] signal 11 segfaults with parallel workers
Date
Msg-id CAMAYy4LwudXQ326o-xZdf2WZiWrA8iu8S6FNPxcPtvPN0b1xRw@mail.gmail.com
In response to Re: [BUGS] signal 11 segfaults with parallel workers  (Rick Otten <rottenwindfish@gmail.com>)
Responses Re: [BUGS] signal 11 segfaults with parallel workers  (Amit Kapila <amit.kapila16@gmail.com>)
Re: [BUGS] signal 11 segfaults with parallel workers  (Andres Freund <andres@anarazel.de>)
List pgsql-bugs
Ok, I got a core this time at 23:00 when the database went down.
Here is the basic backtrace:

$  gdb /usr/lib/postgresql/9.6/bin/postgres core
GNU gdb (Ubuntu 7.11.1-0ubuntu1~16.5) 7.11.1
Copyright (C) 2016 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
Find the GDB manual and other documentation resources online at:
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from /usr/lib/postgresql/9.6/bin/postgres...Reading symbols from /usr/lib/debug/.build-id/32/108810b4ff9528a94d48315dd9333c501fc52d.debug...done.
done.
[New LWP 4294]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `postgres: bgworker: parallel worker f'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  MemoryContextAlloc (context=0x0, size=size@entry=1024) at /build/postgresql-9.6-5bnRDZ/postgresql-9.6-9.6.3/build/../src/backend/utils/mmgr/mcxt.c:761
761 /build/postgresql-9.6-5bnRDZ/postgresql-9.6-9.6.3/build/../src/backend/utils/mmgr/mcxt.c: No such file or directory.
(gdb) bt
#0  MemoryContextAlloc (context=0x0, size=size@entry=1024) at /build/postgresql-9.6-5bnRDZ/postgresql-9.6-9.6.3/build/../src/backend/utils/mmgr/mcxt.c:761
#1  0x0000560b7a518ec4 in SPI_connect () at /build/postgresql-9.6-5bnRDZ/postgresql-9.6-9.6.3/build/../src/backend/executor/spi.c:102
#2  0x00007fec467b9261 in _PG_init () from /usr/lib/postgresql/9.6/lib/multicorn.so
#3  0x0000560b7a717cf2 in internal_load_library (libname=libname@entry=0x7ff48208dbf8 <error: Cannot access memory at address 0x7ff48208dbf8>)
    at /build/postgresql-9.6-5bnRDZ/postgresql-9.6-9.6.3/build/../src/backend/utils/fmgr/dfmgr.c:276
#4  0x0000560b7a7188c0 in RestoreLibraryState (start_address=0x7ff48208dbf8 <error: Cannot access memory at address 0x7ff48208dbf8>)
    at /build/postgresql-9.6-5bnRDZ/postgresql-9.6-9.6.3/build/../src/backend/utils/fmgr/dfmgr.c:741
#5  0x0000560b7a3ee4f7 in ParallelWorkerMain (main_arg=<optimized out>)
    at /build/postgresql-9.6-5bnRDZ/postgresql-9.6-9.6.3/build/../src/backend/access/transam/parallel.c:1065
#6  0x0000560b7a59ae29 in StartBackgroundWorker () at /build/postgresql-9.6-5bnRDZ/postgresql-9.6-9.6.3/build/../src/backend/postmaster/bgworker.c:742
#7  0x0000560b7a5a701b in do_start_bgworker (rw=<optimized out>)
    at /build/postgresql-9.6-5bnRDZ/postgresql-9.6-9.6.3/build/../src/backend/postmaster/postmaster.c:5579
#8  maybe_start_bgworkers () at /build/postgresql-9.6-5bnRDZ/postgresql-9.6-9.6.3/build/../src/backend/postmaster/postmaster.c:5776
#9  0x0000560b7a5a7cd5 in sigusr1_handler (postgres_signal_arg=<optimized out>)
    at /build/postgresql-9.6-5bnRDZ/postgresql-9.6-9.6.3/build/../src/backend/postmaster/postmaster.c:4973
#10 <signal handler called>
#11 0x00007ff480425573 in __select_nocancel () at ../sysdeps/unix/syscall-template.S:84
#12 0x0000560b7a3858ef in ServerLoop () at /build/postgresql-9.6-5bnRDZ/postgresql-9.6-9.6.3/build/../src/backend/postmaster/postmaster.c:1679
#13 0x0000560b7a5a9053 in PostmasterMain (argc=1, argv=<optimized out>)
    at /build/postgresql-9.6-5bnRDZ/postgresql-9.6-9.6.3/build/../src/backend/postmaster/postmaster.c:1323
#14 0x0000560b7a387511 in main (argc=1, argv=0x560b7ba23630) at /build/postgresql-9.6-5bnRDZ/postgresql-9.6-9.6.3/build/../src/backend/main/main.c:228
(gdb) 

The query that took it down this time (based on the pid reported in the stack trace) does indeed use a parallel plan, but it is a simple query.  I was surprised to see the multicorn library in this trace: the query has nothing to do with the multicorn FDW installed on the system.
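
For longer traces, frames that gdb attributes to a shared library (the "from /path/to/lib.so" form seen in frame #2 above) can be picked out mechanically. The following is only a minimal sketch; the function name frames_from_libraries and the regular expression are illustrative assumptions, not part of gdb or PostgreSQL, and the pattern only covers the frame layout shown in this backtrace.

```python
import re

# Matches gdb frames attributed to a shared object, e.g.
#   #2  0x00007fec467b9261 in _PG_init () from /usr/lib/.../multicorn.so
# Assumption: frames follow the layout shown in the backtrace above.
FRAME_FROM_LIB = re.compile(
    r"^#(\d+)\s+(?:0x[0-9a-fA-F]+\s+in\s+)?(\S+)\s+\(.*\)\s+from\s+(\S+)\s*$"
)


def frames_from_libraries(backtrace):
    """Return (frame number, function, library path) for each 'from <lib>' frame."""
    found = []
    for line in backtrace.splitlines():
        match = FRAME_FROM_LIB.match(line.strip())
        if match:
            found.append((int(match.group(1)), match.group(2), match.group(3)))
    return found
```

Fed the output of "bt" as a string, it returns only the frames that come from loadable libraries, which makes an unexpected extension such as multicorn.so easy to spot.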

I've run the query several times in the last few minutes and can't get it to generate a core again.



On Sun, Jul 30, 2017 at 5:25 PM, Rick Otten <rottenwindfish@gmail.com> wrote:
Well, I'm not sure how to inspect the temp tablespace other than from the filesystem itself.  I have it configured on its own disk.  Usually the disk space ebbs and flows with query activity.  Since we've been crashing, however, it never reclaims the disk space that was in use just before the crash, so our temp space "floor" keeps getting higher and higher.

At least that is what it has been doing for the past week or two, and what it looked like this morning.  Now that the database has been back up for 8 or 9 hours following this controlled restart, I just went to look at it, and all of the temp space has been reclaimed - for the first time since the crashing started. ... Interesting...
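
One way to watch that "floor" from the filesystem is to total the files under the pgsql_tmp directories, which is where PostgreSQL keeps its temporary files inside a tablespace. This is a rough sketch under stated assumptions: the function name temp_file_usage is made up for illustration, and the directory passed in is whatever path the temp tablespace was created on.

```python
import os


def temp_file_usage(tablespace_dir):
    """Count and total the files below any pgsql_tmp directory.

    Returns (file_count, total_bytes). tablespace_dir is the location of
    the temp tablespace on disk (a caller-supplied path, not discovered here).
    """
    count = 0
    total = 0
    for root, _dirs, files in os.walk(tablespace_dir):
        # Only look inside directories that sit under a pgsql_tmp component.
        relative = os.path.relpath(root, tablespace_dir)
        if "pgsql_tmp" not in relative.split(os.sep):
            continue
        for name in files:
            try:
                total += os.path.getsize(os.path.join(root, name))
                count += 1
            except OSError:
                pass  # file was removed between listing and stat
    return count, total
```

Run periodically against the temp tablespace path (for example from cron), it shows whether the usage drops back toward zero when the server is idle or stays at the level left behind by the last crash.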


On Sun, Jul 30, 2017 at 11:22 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Rick Otten <rottenwindfish@gmail.com> writes:
> One thing that is bugging me is I think when the database crashes, it
> doesn't clean up the temp_tablespace(s).

Hm, interesting, what do you see in there?

                        regards, tom lane

