backend hangs at immediate shutdown (Re: Back-branch update releases coming in a couple weeks)

From MauMau
Subject backend hangs at immediate shutdown (Re: Back-branch update releases coming in a couple weeks)
Date
Msg-id 20DAEA8949EC4E2289C6E8E58560DEC0@maumau
In reply to Back-branch update releases coming in a couple weeks  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: backend hangs at immediate shutdown (Re: Back-branch update releases coming in a couple weeks)  (Tom Lane <tgl@sss.pgh.pa.us>)
Re: backend hangs at immediate shutdown (Re: Back-branch update releases coming in a couple weeks)  (Peter Eisentraut <peter_e@gmx.net>)
List pgsql-hackers
From: "Tom Lane" <tgl@sss.pgh.pa.us>
> Since we've fixed a couple of relatively nasty bugs recently, the core
> committee has determined that it'd be a good idea to push out PG update
> releases soon.  The current plan is to wrap on Monday Feb 4 for public
> announcement Thursday Feb 7.  If you're aware of any bug fixes you think
> ought to get included, now's the time to get them done ...

I've just encountered another serious bug, which I would like to see fixed 
in the upcoming minor release.

I'm using streaming replication with PostgreSQL 9.1.6 on Linux (RHEL6.2, 
kernel 2.6.32).  But this problem should happen regardless of the use of 
streaming replication.

When I ran "pg_ctl stop -mi" against the primary, some applications 
connected to the primary did not stop.  The cause was that the backends were 
deadlocked in quickdie() with a call stack like the following.  I'm sorry, I 
left the stack trace file on the testing machine, so I'll show you the 
precise stack trace tomorrow.

some lock function
malloc()
gettext()
errhint()
quickdie()
<signal handler called because of SIGQUIT>
free()
...
PostgresMain()
...

The root cause is that gettext() is called in the signal handler quickdie() 
via errhint().  As you know, malloc() cannot be called in a signal handler:

http://www.gnu.org/software/libc/manual/html_node/Nonreentrancy.html#Nonreentrancy

[Excerpt]
On most systems, malloc and free are not reentrant, because they use a 
static data structure which records what memory blocks are free. As a 
result, no library functions that allocate or free memory are reentrant. 
This includes functions that allocate space to store a result.


And gettext() calls malloc(), as reported below:

http://lists.gnu.org/archive/html/bug-coreutils/2005-04/msg00056.html

I think the solution is the typical one.  That is, quickdie() just records 
the receipt of SIGQUIT by setting a global variable and calls siglongjmp(), 
and the tasks currently done in quickdie() are performed when sigsetjmp() 
returns in PostgresMain().

What do you think about the solution?  Could you include the fix?  If it's 
okay and you want, I'll submit the patch.

Regards
MauMau



