Re: Possible problem with shm_mq spin lock

From: Haribabu Kommi
Subject: Re: Possible problem with shm_mq spin lock
Msg-id: CAJrrPGcYxYDT8vX_-+uPyc-Cit-je6_MskqFCBDi2j1BWofv2A@mail.gmail.com
In reply to: Re: Possible problem with shm_mq spin lock (Andres Freund <andres@2ndquadrant.com>)
Responses: Re: Possible problem with shm_mq spin lock
List: pgsql-hackers
On Sun, Oct 26, 2014 at 10:17 AM, Andres Freund <andres@2ndquadrant.com> wrote:
> Hi,
>
> On 2014-10-26 08:52:42 +1100, Haribabu Kommi wrote:
>> I am thinking of a possible problem with the shm_mq structure's spin lock.
>> This is used for protecting the shm_mq structure.
>>
>> If the process receives a SIGQUIT signal while it is executing code
>> under the spin lock, it leads to a deadlock.
>>
>> SIGQUIT->proc_exit->shm_mq_detach->try to acquire spin lock. The spin
>> lock is already held by the same process.
>>
>> It is very difficult to reproduce the problem because the code under
>> the lock is very minimal.
>> Please let me know if I missed anything.
>
> I think you missed the following bit in postgres.c:
>
> /*
>  * quickdie() occurs when signalled SIGQUIT by the postmaster.
>  *
>  * Some backend has bought the farm,
>  * so we need to stop what we're doing and exit.
>  */
> void
> quickdie(SIGNAL_ARGS)
> {
> ...
>         /*
>          * We DO NOT want to run proc_exit() callbacks -- we're here because
>          * shared memory may be corrupted, so we don't want to try to clean up our
>          * transaction.  Just nail the windows shut and get out of town.  Now that
>          * there's an atexit callback to prevent third-party code from breaking
>          * things by calling exit() directly, we have to reset the callbacks
>          * explicitly to make this work as intended.
>          */
>         on_exit_reset();
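To make the original scenario concrete, here is a minimal sketch in plain POSIX C; it is not PostgreSQL code. It assumes a toy test-and-set spinlock built on C11 atomics in place of PostgreSQL's real spinlock, and a hypothetical detach_on_exit() standing in for the exit-time shm_mq_detach() path. Because a real spin lock acquire would loop forever at that point, the handler only tries once and records that the lock is already held by the interrupted code of the same process.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <string.h>

/* Toy test-and-set spinlock: a simplified stand-in, NOT PostgreSQL's spinlock. */
static atomic_bool mq_mutex = false;

/* Set by the signal handler when it finds the lock already taken. */
static volatile sig_atomic_t lock_was_held = 0;

/* Hypothetical stand-in for the exit-time detach path (shm_mq_detach).
 * A real spinlock acquire would spin here forever; this sketch tries
 * once so the demo can report the problem instead of hanging. */
static void detach_on_exit(void)
{
    if (atomic_exchange(&mq_mutex, true))
        lock_was_held = 1;               /* held by this very process: self-deadlock */
    else
        atomic_store(&mq_mutex, false);  /* acquired it; release again */
}

static void quit_handler(int signo)
{
    (void) signo;
    detach_on_exit();
}

/* Returns 1 if the handler found the lock held by the interrupted code,
 * 0 if not, -1 if the handler could not be installed. */
int signal_inside_spinlock(void)
{
    struct sigaction sa, old;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = quit_handler;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGQUIT, &sa, &old) != 0)
        return -1;

    lock_was_held = 0;
    while (atomic_exchange(&mq_mutex, true))  /* acquire: enter the critical section */
        ;
    assert(atomic_load(&mq_mutex));
    raise(SIGQUIT);                  /* handler runs before raise() returns */
    atomic_store(&mq_mutex, false);  /* release */

    sigaction(SIGQUIT, &old, NULL);  /* restore the previous disposition */
    return lock_was_held;
}
```

Calling signal_inside_spinlock() should return 1: the handler runs while the critical section still holds the lock, which is exactly the window described in the original report.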

Thanks for the details. Sorry, it is not proc_exit() itself; it is the exit
callback functions that can cause the problem.
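A toy model of an exit-callback list may help show why the callbacks matter. This is a sketch under stated assumptions: the list is a simple array run in reverse registration order, and register_exit_cb(), run_exit_cbs(), reset_exit_cbs() and mq_detach_cb() are hypothetical names, not PostgreSQL's on_shmem_exit()/on_exit_reset() implementation. It shows that clearing the list before exiting, as quickdie() does with on_exit_reset(), keeps the detach callback from running.

```c
#include <assert.h>

/* Hypothetical, simplified model of a backend's exit-callback list. */
typedef void (*exit_cb)(void);

#define MAX_EXIT_CBS 8
static exit_cb exit_cbs[MAX_EXIT_CBS];
static int n_exit_cbs = 0;

static int detach_calls = 0;

/* Stands in for shm_mq_detach_callback(), which would take the queue's spin lock. */
static void mq_detach_cb(void) { detach_calls++; }

static void register_exit_cb(exit_cb cb)
{
    assert(n_exit_cbs < MAX_EXIT_CBS);
    exit_cbs[n_exit_cbs++] = cb;
}

/* Run callbacks in reverse registration order (a modeling assumption). */
static void run_exit_cbs(void)
{
    while (n_exit_cbs > 0)
        exit_cbs[--n_exit_cbs]();
}

/* Forget all registered callbacks, in the spirit of on_exit_reset(). */
static void reset_exit_cbs(void) { n_exit_cbs = 0; }

/* Returns how many times the detach callback ran during the simulated exit. */
int detach_calls_on_quickdie(int reset_first)
{
    detach_calls = 0;
    reset_exit_cbs();
    register_exit_cb(mq_detach_cb);
    if (reset_first)
        reset_exit_cbs();   /* quickdie()-style: drop the callbacks first */
    run_exit_cbs();         /* models exit() -> atexit_callback -> shmem_exit */
    return detach_calls;
}
```

Here detach_calls_on_quickdie(0) returns 1 and detach_calls_on_quickdie(1) returns 0.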

The following call stack shows where the problem can happen if the signal
handler is called after the worker has acquired the spin lock.

Breakpoint 1, 0x000000000072dd83 in shm_mq_detach ()
(gdb) bt
#0  0x000000000072dd83 in shm_mq_detach ()
#1  0x000000000072e7db in shm_mq_detach_callback ()
#2  0x0000000000726d71 in dsm_detach ()
#3  0x0000000000726c43 in dsm_backend_shutdown ()
#4  0x0000000000727450 in shmem_exit ()
#5  0x00000000007272fc in proc_exit_prepare ()
#6  0x0000000000727501 in atexit_callback ()
#7  0x00000030ff435da2 in exit () from /lib64/libc.so.6
#8  0x00000000006ddaec in bgworker_quickdie ()
#9  <signal handler called>
#10 0x000000000072ce9a in shm_mq_sendv ()
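In this backtrace bgworker_quickdie() calls exit(), which goes through libc's exit processing into atexit_callback() and from there down the whole detach chain. Below is a minimal POSIX sketch of the difference between leaving a SIGQUIT handler with exit() and with _exit(). fake_atexit_callback() and the other names are hypothetical stand-ins, and the pipe exists only so the parent can observe whether the callback ran in the child.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static int report_fd = -1;

/* Hypothetical stand-in for atexit_callback(), the entry point of the
 * proc_exit_prepare() -> shmem_exit() -> ... -> shm_mq_detach() chain. */
static void fake_atexit_callback(void)
{
    ssize_t r = write(report_fd, "C", 1);  /* tell the parent we ran */
    (void) r;
}

/* exit() runs atexit() callbacks (and is not async-signal-safe). */
static void quit_with_exit(int signo) { (void) signo; exit(2); }

/* _exit() terminates without running atexit() callbacks. */
static void quit_with_underscore_exit(int signo) { (void) signo; _exit(2); }

/* Forks a child that registers an atexit() callback and then takes SIGQUIT.
 * Returns 1 if the callback ran, 0 if it did not, -1 on setup failure. */
int atexit_runs_on_sigquit(int use_underscore_exit)
{
    int fds[2];
    if (pipe(fds) != 0)
        return -1;
    fflush(NULL);                    /* avoid duplicating buffered output in the child */
    pid_t pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {                  /* child */
        close(fds[0]);
        report_fd = fds[1];
        atexit(fake_atexit_callback);
        struct sigaction sa;
        memset(&sa, 0, sizeof sa);
        sa.sa_handler = use_underscore_exit ? quit_with_underscore_exit : quit_with_exit;
        sigemptyset(&sa.sa_mask);
        sigaction(SIGQUIT, &sa, NULL);
        raise(SIGQUIT);
        _exit(0);                    /* not reached */
    }
    close(fds[1]);                   /* parent keeps only the read end */
    char c;
    ssize_t n = read(fds[0], &c, 1); /* 0 (EOF) if the child never wrote */
    close(fds[0]);
    int status = 0;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    assert(WIFEXITED(status) && WEXITSTATUS(status) == 2);
    return n == 1;
}
```

atexit_runs_on_sigquit(0) should return 1 and atexit_runs_on_sigquit(1) should return 0. Note also that exit() is not on the POSIX list of async-signal-safe functions, while _exit() is, which is a further reason to avoid exit() in a handler that can interrupt arbitrary code.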


Regards,
Hari Babu
Fujitsu Australia


