Discussion: recovery is stuck when children are not processing SIGQUIT from previous crash
recovery is stuck when children are not processing SIGQUIT from previous crash
From
Peter Eisentraut
Date:
I have observed the following situation a few times now (weeks or months apart), most recently with 8.3.7. Some postgres child process crashes. The postmaster notices and sends SIGQUIT to all other children. Once all other children have exited, it would enter recovery. But for some reason, some children are not processing the SIGQUIT signal and are basically just stuck. That means the whole database system is then stuck and won't continue without manual intervention. If I go in manually and SIGKILL the offending processes, everything proceeds normally, recovery finishes, and the system is up again.

I haven't had the chance yet to analyze why the SIGQUIT signals are getting stuck. Be that as it may, it appears there are no provisions for this case. I couldn't find any documentation or previous reports on this sort of thing.

One might imagine a feature where the postmaster resorts to throwing SIGKILLs around after a while, similar to how init scripts are sometimes set up. But perhaps manual intervention is the way to go. Comments?
Peter Eisentraut <peter_e@gmx.net> writes:
> I have observed the following situation a few times now (weeks or months
> apart), most recently with 8.3.7. Some postgres child process crashes.
> The postmaster notices and sends SIGQUIT to all other children. Once
> all other children have exited, it would enter recovery. But for some
> reason, some children are not processing the SIGQUIT signal and are
> basically just stuck. That means the whole database system is then
> stuck and won't continue without manual intervention. If I go in
> manually and SIGKILL the offending processes, everything proceeds
> normally, recovery finishes, and the system is up again.

We need some investigation into why that is happening.

> I haven't had the chance yet to analyze why the SIGQUIT signals are
> getting stuck. Be that as it may, it appears there are no provisions
> for this case. I couldn't find any documentation or previous reports on
> this sort of thing. One might imagine a feature where the postmaster
> resorts to throwing SIGKILLs around after a while, similar to how init
> scripts are sometimes set up.

I'd prefer not to go there, at least not without a demonstration that this will solve a bug that's unsolvable otherwise. If a child is really stuck in a state that doesn't accept SIGQUIT, it probably won't accept SIGKILL either (eg, uninterruptable disk wait). Or maybe we just have some errant code that is blocking SIGQUIT; but that's a garden variety bug IMO, not something that needs major new postmaster logic to work around.

			regards, tom lane
Re: recovery is stuck when children are not processing SIGQUIT from previous crash
From
Peter Eisentraut
Date:
On Wed, 2009-09-23 at 10:04 -0400, Tom Lane wrote:
> I'd prefer not to go there, at least not without a demonstration that
> this will solve a bug that's unsolvable otherwise. If a child is
> really stuck in a state that doesn't accept SIGQUIT, it probably
> won't accept SIGKILL either (eg, uninterruptable disk wait). Or maybe
> we just have some errant code that is blocking SIGQUIT; but that's
> a garden variety bug IMO, not something that needs major new postmaster
> logic to work around.

strace on the backend processes all showed them waiting at

    futex(0x7f1ee5e21c90, FUTEX_WAIT_PRIVATE, 2, NULL

Notably, the first argument was the same for all of them.

I gather that a futex is a Linux kernel thing, which is probably then used by glibc to implement some pthreads stuff. Anyone know more?

But yes, using SIGKILL on these processes works without problem.
Re: recovery is stuck when children are not processing SIGQUIT from previous crash
From
Alvaro Herrera
Date:
Peter Eisentraut wrote:
> strace on the backend processes all showed them waiting at
>
> futex(0x7f1ee5e21c90, FUTEX_WAIT_PRIVATE, 2, NULL
>
> Notably, the first argument was the same for all of them.
>
> I gather that a futex is a Linux kernel thing, which is probably then
> used by glibc to implement some pthreads stuff. Anyone know more?

Maybe a backtrace from GDB would tell us more.

--
Alvaro Herrera                     http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Peter Eisentraut <peter_e@gmx.net> writes:
> strace on the backend processes all showed them waiting at
> futex(0x7f1ee5e21c90, FUTEX_WAIT_PRIVATE, 2, NULL
> Notably, the first argument was the same for all of them.

Probably means they are blocked on semaphores. Stack traces would be much more informative ...

			regards, tom lane
Re: recovery is stuck when children are not processing SIGQUIT from previous crash
From
Peter Eisentraut
Date:
On lör, 2009-09-26 at 12:19 -0400, Tom Lane wrote:
> Peter Eisentraut <peter_e@gmx.net> writes:
> > strace on the backend processes all showed them waiting at
> > futex(0x7f1ee5e21c90, FUTEX_WAIT_PRIVATE, 2, NULL
> > Notably, the first argument was the same for all of them.
>
> Probably means they are blocked on semaphores. Stack traces would
> be much more informative ...

Got one now:

#0  0x00007f65951eaf8e in ?? () from /lib/libc.so.6
#1  0x00007f65951dc218 in ?? () from /lib/libc.so.6
#2  0x00007f65951dbcdd in __vsyslog_chk () from /lib/libc.so.6
#3  0x00007f65951dc1a0 in syslog () from /lib/libc.so.6
#4  0x00000000006694bd in EmitErrorReport () at elog.c:1404
#5  0x0000000000669935 in errfinish (dummy=-1790575472) at elog.c:415
#6  0x00000000005c291e in quickdie (postgres_signal_arg=<value optimized out>) at postgres.c:2502
#7  <signal handler called>
#8  0x00007f65951e0513 in send () from /lib/libc.so.6
#9  0x00007f65951dbeed in __vsyslog_chk () from /lib/libc.so.6
#10 0x00007f65951dc1a0 in syslog () from /lib/libc.so.6
#11 0x00000000006694bd in EmitErrorReport () at elog.c:1404
#12 0x0000000000669935 in errfinish (dummy=3) at elog.c:415
#13 0x00000000005c291e in quickdie (postgres_signal_arg=<value optimized out>) at postgres.c:2502
#14 <signal handler called>
#15 0x00007f65951e0303 in recv () from /lib/libc.so.6
#16 0x00000000005486a8 in secure_read (port=0x24a76f0, ptr=0x9ac680, len=8192) at be-secure.c:319
#17 0x000000000054f3d0 in pq_recvbuf () at pqcomm.c:754
#18 0x000000000054f817 in pq_getbyte () at pqcomm.c:795
#19 0x00000000005c4d10 in PostgresMain (argc=4, argv=<value optimized out>, username=0x2478728 "xyz") at postgres.c:317
#20 0x000000000059938d in ServerLoop () at postmaster.c:3218
#21 0x000000000059a0cf in PostmasterMain (argc=5, argv=0x24731d0) at postmaster.c:1031
#22 0x0000000000551245 in main (argc=5, argv=<value optimized out>) at main.c:188

Looks like a race condition or lockup in the syslog code.
Re: recovery is stuck when children are not processing SIGQUIT from previous crash
From
Marko Kreen
Date:
On 11/12/09, Peter Eisentraut <peter_e@gmx.net> wrote:
> On lör, 2009-09-26 at 12:19 -0400, Tom Lane wrote:
> > Peter Eisentraut <peter_e@gmx.net> writes:
> > > strace on the backend processes all showed them waiting at
> > > futex(0x7f1ee5e21c90, FUTEX_WAIT_PRIVATE, 2, NULL
> > > Notably, the first argument was the same for all of them.
> >
> > Probably means they are blocked on semaphores. Stack traces would
> > be much more informative ...
>
> Looks like a race condition or lockup in the syslog code.

AFAICS syslog() is not safe to use in a signal handler:

http://www.opengroup.org/onlinepubs/9699919799/functions/V2_chap02.html#tag_15_04_03_03

--
marko
Peter Eisentraut <peter_e@gmx.net> writes:
>>> strace on the backend processes all showed them waiting at
>>> futex(0x7f1ee5e21c90, FUTEX_WAIT_PRIVATE, 2, NULL
>>> Notably, the first argument was the same for all of them.

> Looks like a race condition or lockup in the syslog code.

Hm, why are there two <signal handler> calls in the stack? The only thing I can think of is that we sent SIGQUIT twice. That's probably bad --- is there any obvious path through the postmaster that would do that?

The other thought is that quickdie should block signals before starting to do anything.

			regards, tom lane
Re: recovery is stuck when children are not processing SIGQUIT from previous crash
From
Marko Kreen
Date:
On 11/12/09, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> The other thought is that quickdie should block signals before
> starting to do anything.

There would still be the possibility of recursive syslog() calls. Shouldn't we fix that too?

I'm not sure how exactly. If the recursive elog() must stay, then perhaps a simple 'volatile int' guard around syslog()?

--
marko
Marko Kreen <markokr@gmail.com> writes:
> On 11/12/09, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> The other thought is that quickdie should block signals before
>> starting to do anything.

> There would still be possibility of recursive syslog() calls.
> Shouldn't we fix that too?

That's what the signal block would do.

			regards, tom lane
Re: recovery is stuck when children are not processing SIGQUIT from previous crash
From
Marko Kreen
Date:
On 11/12/09, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Marko Kreen <markokr@gmail.com> writes:
> > On 11/12/09, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >> The other thought is that quickdie should block signals before
> >> starting to do anything.
>
> > There would still be possibility of recursive syslog() calls.
> > Shouldn't we fix that too?
>
> That's what the signal block would do.

usual elog
  syslog
    <signal>
      quickdie
        block signals
        syslog

You talked about blocking in quickdie(), but you'd need to block in elog().

--
marko
Marko Kreen <markokr@gmail.com> writes:
> You talked about blocking in quickdie(), but you'd need
> to block in elog().

I'm not really particularly worried about that case. By that logic, we could not use quickdie at all, because any part of the system might be doing something that wouldn't survive being interrupted. In practice the code path isn't sufficiently used or critical enough to be worth trying to make that bulletproof.

It does strike me that we might someday add code to the postmaster to SIGKILL processes that fail to exit in a reasonably prompt fashion after SIGQUIT, on the theory that they might be stuck in something like this. But for now, I'm more interested in a one-line fix that will deal with the actually observed case ...

			regards, tom lane
Re: recovery is stuck when children are not processing SIGQUIT from previous crash
From
Peter Eisentraut
Date:
On tor, 2009-11-12 at 10:45 -0500, Tom Lane wrote:
> In practice the code path isn't sufficiently used or critical
> enough to be worth trying to make that bulletproof.

Well, the subject line is "recovery is stuck". Not critical enough?
Re: recovery is stuck when children are not processing SIGQUIT from previous crash
From
Marko Kreen
Date:
On 11/12/09, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Marko Kreen <markokr@gmail.com> writes:
> > You talked about blocking in quickdie(), but you'd need
> > to block in elog().
>
> I'm not really particularly worried about that case. By that logic,
> we could not use quickdie at all, because any part of the system
> might be doing something that wouldn't survive being interrupted.

Not really - we'd need to care only about the parts that quickdie() (or any other signal handler) wants to use. Which basically means elog() only.

OK, full elog() is a beast, but why would the SIGQUIT handler need full elog()? How about we export a minimal log-writing function and make that signal-safe - that is, drop the message if logging is already active. This would exchange a potential crash/deadlock for a lost message, which seems like slightly better behaviour.

--
marko
Peter Eisentraut <peter_e@gmx.net> writes:
> On tor, 2009-11-12 at 10:45 -0500, Tom Lane wrote:
>> In practice the code path isn't sufficiently used or critical
>> enough to be worth trying to make that bulletproof.

> Well, the subject line is "recovery is stuck". Not critical enough?

The particular case looks like it could be solved by disabling interrupts at the start of quickdie(). My point is that doing more than that is going to involve a large amount of work for small amount of return.

			regards, tom lane