backend hangs at immediate shutdown (Re: Back-branch update releases coming in a couple weeks)
From | MauMau |
---|---|
Subject | backend hangs at immediate shutdown (Re: Back-branch update releases coming in a couple weeks) |
Date | |
Msg-id | 20DAEA8949EC4E2289C6E8E58560DEC0@maumau |
In reply to | Back-branch update releases coming in a couple weeks (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses |
Re: backend hangs at immediate shutdown (Re: Back-branch update releases coming in a couple weeks)
Re: backend hangs at immediate shutdown (Re: Back-branch update releases coming in a couple weeks) |
List | pgsql-hackers |

From: "Tom Lane" <tgl@sss.pgh.pa.us>
> Since we've fixed a couple of relatively nasty bugs recently, the core
> committee has determined that it'd be a good idea to push out PG update
> releases soon. The current plan is to wrap on Monday Feb 4 for public
> announcement Thursday Feb 7. If you're aware of any bug fixes you think
> ought to get included, now's the time to get them done ...

I've just encountered another serious bug, which I would like to see fixed in the upcoming minor release.

I'm using streaming replication with PostgreSQL 9.1.6 on Linux (RHEL6.2, kernel 2.6.32). However, this problem should occur whether or not streaming replication is in use.

When I ran "pg_ctl stop -mi" against the primary, some applications connected to the primary did not stop. The cause was that the backends were deadlocked in quickdie(), with a call stack like the following. I'm sorry, I left the stack trace file on the testing machine, so I'll show you the precise stack trace tomorrow.

    some lock function
    malloc()
    gettext()
    errhint()
    quickdie()
    <signal handler called because of SIGQUIT>
    free()
    ...
    PostgresMain()
    ...

The root cause is that gettext() is called in the signal handler quickdie() via errhint(). As you know, malloc() cannot be called in a signal handler:

http://www.gnu.org/software/libc/manual/html_node/Nonreentrancy.html#Nonreentrancy

[Excerpt]
On most systems, malloc and free are not reentrant, because they use a static data structure which records what memory blocks are free. As a result, no library functions that allocate or free memory are reentrant. This includes functions that allocate space to store a result.

And gettext() calls malloc(), as reported below:

http://lists.gnu.org/archive/html/bug-coreutils/2005-04/msg00056.html

I think the solution is the typical one. That is, quickdie() should just record the receipt of SIGQUIT by setting a global variable and then call siglongjmp(), and the tasks currently done in quickdie() should be performed when sigsetjmp() returns in PostgresMain().

What do you think about this solution? Could you include the fix? If it's okay and you want, I'll submit the patch.

Regards
MauMau
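
The proposed approach can be sketched roughly as follows. This is only a minimal standalone illustration, not PostgreSQL code: `on_sigquit`, `run_until_quit`, `quit_pending`, `jmp_armed`, and the two demo workloads are hypothetical names, and `run_until_quit()` merely stands in for the main loop. The handler does nothing except set a flag and jump; the shutdown work happens after `sigsetjmp()` returns.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

/* All names below are hypothetical; this is not PostgreSQL source code. */

static sigjmp_buf quit_jmp;                     /* target of the handler's jump */
static volatile sig_atomic_t quit_pending = 0;  /* set when SIGQUIT arrives */
static volatile sig_atomic_t jmp_armed = 0;     /* nonzero while quit_jmp is valid */

/* Signal handler: record the signal and leave. No gettext(), no malloc(). */
static void
on_sigquit(int signo)
{
    (void) signo;
    quit_pending = 1;
    if (jmp_armed)
        siglongjmp(quit_jmp, 1);
}

/*
 * Stand-in for the main loop. Runs work() and returns 0 if it completes,
 * 1 if SIGQUIT cut it short, -1 if the handler could not be installed.
 */
int
run_until_quit(void (*work)(void))
{
    static const char msg[] = "terminating connection: immediate shutdown\n";
    struct sigaction sa;

    quit_pending = 0;
    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_sigquit;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGQUIT, &sa, NULL) != 0)
        return -1;

    /* savemask = 1, so the jump also restores the signal mask. */
    if (sigsetjmp(quit_jmp, 1) == 0)
    {
        jmp_armed = 1;
        /* The signal may have arrived just before the jump was armed. */
        if (!quit_pending)
        {
            work();
            jmp_armed = 0;
            return 0;
        }
    }

    /* Reached via siglongjmp() from the handler, or via the early flag check. */
    jmp_armed = 0;
    assert(quit_pending);
    /* Only async-signal-safe calls here; see the caveat below. */
    if (write(STDERR_FILENO, msg, sizeof(msg) - 1) < 0)
    {
        /* nothing useful to do if the write fails */
    }
    return 1;
}

/* Demo workloads: one delivers SIGQUIT to itself, one does nothing. */
void work_interrupted(void) { raise(SIGQUIT); }
void work_uninterrupted(void) { }
```

With these definitions, `run_until_quit(work_interrupted)` returns 1 and `run_until_quit(work_uninterrupted)` returns 0. One caveat: POSIX advises that when a handler jumps out of an interrupted non-async-signal-safe function such as free(), only async-signal-safe functions should be called afterwards. That is why the post-jump path in this sketch uses write() rather than anything that might allocate memory. Whether the real quickdie() tasks can be restricted that way is something the sketch does not settle.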