Re: [HACKERS] [COMMITTERS] pgsql: Perform only one ReadControlFile() during startup.

From Tom Lane
Subject Re: [HACKERS] [COMMITTERS] pgsql: Perform only one ReadControlFile() during startup.
Msg-id 14134.1505572349@sss.pgh.pa.us
Responses [HACKERS] Re: [COMMITTERS] pgsql: Perform only one ReadControlFile() during startup.  (Andres Freund <andres@anarazel.de>)
List pgsql-hackers
Andres Freund <andres@anarazel.de> writes:
> Perform only one ReadControlFile() during startup.

This patch or something closely related to it has broken the postmaster's
ability to recover from a backend crash.  For example, after exercising
the backend crash Andreas just reported:

regression=# select from information_schema.user_mapping_options;
server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.
 
The connection to the server was lost. Attempting reset: Failed.
!> \q

attempting to reconnect fails, because the postmaster isn't there.
It left a core file behind though, in which I find

Program terminated with signal 11, Segmentation fault.
#0  0x000000000088b792 in GetMemoryChunkContext (pointer=0x7fdb36fc3f00)
    at ../../../../src/include/utils/memutils.h:124
124             AssertArg(MemoryContextIsValid(context));
(gdb) bt
#0  0x000000000088b792 in GetMemoryChunkContext (pointer=0x7fdb36fc3f00)
    at ../../../../src/include/utils/memutils.h:124
#1  pfree (pointer=0x7fdb36fc3f00) at mcxt.c:951
#2  0x0000000000512843 in XLOGShmemInit () at xlog.c:4897
#3  0x0000000000737fd9 in CreateSharedMemoryAndSemaphores (
    makePrivate=0 '\000', port=5440) at ipci.c:220
#4  0x00000000006e4a78 in reset_shared () at postmaster.c:2516
#5  PostmasterStateMachine () at postmaster.c:3832
#6  0x00000000006e541d in reaper (postgres_signal_arg=<value optimized out>)
    at postmaster.c:3081
#7  <signal handler called>
#8  0x0000003b78ae1603 in __select_nocancel ()
    at ../sysdeps/unix/syscall-template.S:82
#9  0x00000000008a432a in pg_usleep (microsec=<value optimized out>)
    at pgsleep.c:56
#10 0x00000000006e75d7 in ServerLoop (argc=<value optimized out>,
    argv=<value optimized out>) at postmaster.c:1705
#11 PostmasterMain (argc=<value optimized out>, argv=<value optimized out>)
    at postmaster.c:1364

It's dying at "pfree(localControlFile)".  localControlFile seems to
be pointing at a region of memory that's entirely zeroes; certainly
the data that it just moved into shared memory is all zeroes.
It looks like someone didn't think hard enough about when to reset
ControlFile to null.
        regards, tom lane
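
The failure mode described above can be modelled in a few lines of standalone C. This is only a sketch under stated assumptions, not the code in xlog.c: the names ControlData, shmem_area, control, read_control_local, reset_shmem, shmem_init and simulate_restart are all hypothetical, a static struct stands in for the shared memory segment, and malloc/free stand in for palloc/pfree. Instead of actually freeing the stale pointer (which is what appears to crash the postmaster), the model only reports that it would have done so.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the control file contents. */
typedef struct
{
    int  version;
    long checkpoint;
} ControlData;

static ControlData  shmem_area;      /* models the shared memory segment */
static ControlData *control = NULL;  /* models the global ControlFile pointer */

/* Models reading the control file into process-local memory. */
static void read_control_local(void)
{
    control = malloc(sizeof(ControlData));
    if (control == NULL)
        abort();
    control->version = 1002;
    control->checkpoint = 42;
}

/* Models re-creating shared memory after a backend crash: all zeroes. */
static void reset_shmem(void)
{
    memset(&shmem_area, 0, sizeof(shmem_area));
}

/*
 * Models the shared memory init step: treat the current pointer as a
 * local copy, move its contents into shared memory, then free it.
 * Returns 1 if the "local copy" was really a stale pointer into shared
 * memory, which is the case where a real free would be invalid.
 */
static int shmem_init(void)
{
    ControlData *local = control;
    int          stale = (local == &shmem_area);

    control = &shmem_area;
    if (local != NULL && !stale)
    {
        memcpy(control, local, sizeof(ControlData));
        free(local);
    }
    return stale;
}

/*
 * Runs first startup, then a crash restart.  With reset_pointer == 0 the
 * global pointer is left as-is across the restart (the suspected bug);
 * with reset_pointer != 0 it is set to NULL and the data is re-read.
 */
static int simulate_restart(int reset_pointer)
{
    int stale;

    control = NULL;
    reset_shmem();

    read_control_local();
    stale = shmem_init();            /* first startup: a true local copy */
    assert(stale == 0);

    reset_shmem();                   /* backend crash: segment re-created */
    if (reset_pointer)
    {
        control = NULL;
        read_control_local();
    }
    stale = shmem_init();
    if (!stale)
        assert(control->checkpoint == 42);
    return stale;
}
```

In this model, simulate_restart(0) returns 1: on the second pass the supposed local copy is the zeroed shared region itself, matching the all-zeroes data seen in the core file. simulate_restart(1) returns 0, since the pointer was reset to NULL and a fresh local copy was read before the second init.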


