Re: BUG #9551: Hang in State "authentication" Prevents Vacuum from Freeing Dead Rows

Поиск
Список
Период
Сортировка
От Tom Lane
Тема Re: BUG #9551: Hang in State "authentication" Prevents Vacuum from Freeing Dead Rows
Дата
Msg-id 7087.1394738351@sss.pgh.pa.us
обсуждение исходный текст
Ответ на BUG #9551: Hang in State "authentication" Prevents Vacuum from Freeing Dead Rows  (jmorton@gmail.com)
Ответы Re: BUG #9551: Hang in State "authentication" Prevents Vacuum from Freeing Dead Rows  ("James L. Morton" <jmorton@gmail.com>)
Список pgsql-bugs
jmorton@gmail.com writes:
> (gdb) bt
> #0  0x00000030820e104e in __lll_lock_wait_private () from /lib64/libc.so.6
> #1  0x00000030820d1bf7 in _L_lock_646 () from /lib64/libc.so.6
> #2  0x00000030820d168a in __vsyslog_chk () from /lib64/libc.so.6
> #3  0x00000030820d1a73 in __syslog_chk () from /lib64/libc.so.6
> #4  0x00000000006dd6f1 in EmitErrorReport ()
> #5  0x00000000006ddaf5 in errfinish ()
> #6  0x0000000000623f6a in ProcessInterrupts ()
> #7  0x000000000062439c in StatementCancelHandler ()
> #8  <signal handler called>
> #9  0x00000030820306f7 in kill () from /lib64/libc.so.6
> #10 0x0000000000616d01 in ?? ()
> #11 0x0000000000616e3f in handle_sig_alarm ()
> #12 <signal handler called>
> #13 0x00000030820d5eb5 in send () from /lib64/libc.so.6
> #14 0x00000030820d1750 in __vsyslog_chk () from /lib64/libc.so.6
> #15 0x00000030820d1a73 in __syslog_chk () from /lib64/libc.so.6
> #16 0x00000000006dd6f1 in EmitErrorReport ()
> #17 0x00000000006ddaf5 in errfinish ()
> #18 0x0000000000583c57 in ?? ()
> #19 0x0000000000583c81 in ?? ()
> #20 0x00000000005858af in ClientAuthentication ()
> #21 0x00000000006e9bb6 in InitPostgres ()
> #22 0x000000000062609d in PostgresMain ()
> #23 0x00000000005e9dfc in ?? ()
> #24 0x00000000005eab81 in PostmasterMain ()
> #25 0x000000000058e38e in main ()

Interesting stack trace.  Evidently, syslog() on your platform has some
internal mutex.  What seems to have happened is that SIGALRM interrupted
an attempt to log a client-authentication-related message, and then the
error message that it attempted to print resulted in recursive entry to
syslog(), causing a hang since the outer invocation already had the mutex.

This pathway would be impossible if the error level were ERROR, because
errfinish() clears ImmediateInterruptOK in that case, which would prevent
the SIGALRM interrupt handler from doing anything much.  A likely bet
though is that the error level in the outer call was FATAL (ie,
authentication failure); or it might've been just a LOG message.

I'm thinking the most robust solution to this is to have errfinish()
do this at entry:

   bool save_ImmediateInterruptOK = ImmediateInterruptOK;
   ImmediateInterruptOK = false;

and then, *if* it decides to return to caller, do this just before
returning:

   ImmediateInterruptOK = save_ImmediateInterruptOK;
   if (save_ImmediateInterruptOK)
       CHECK_FOR_INTERRUPTS();

This should make the error reporting logic safe against interrupts even
if invoked while ImmediateInterruptOK is true; which probably shouldn't
happen during normal backend operation but is definitely possible during
client authentication.

BTW, are you using an unusually short value for authentication_timeout?
This case doesn't seem likely to arise with normal values, since the
timeout isn't expected to occur unless something's totally wedged.

            regards, tom lane

В списке pgsql-bugs по дате отправления:

Предыдущее
От: Fujii Masao
Дата:
Сообщение: Re: BUG #9118: WAL Sender does not disconnect replication clients during shutdown
Следующее
От:
Дата:
Сообщение: Very slow query in PostgreSQL 9.3.3