Thread: recovery is stuck when children are not processing SIGQUIT from previous crash

recovery is stuck when children are not processing SIGQUIT from previous crash

From
Peter Eisentraut
Date:
I have observed the following situation a few times now (weeks or months
apart), most recently with 8.3.7.  Some postgres child process crashes.
The postmaster notices and sends SIGQUIT to all other children.  Once
all other children have exited, it would enter recovery.  But for some
reason, some children are not processing the SIGQUIT signal and are
basically just stuck.  That means the whole database system is then
stuck and won't continue without manual intervention.  If I go in
manually and SIGKILL the offending processes, everything proceeds
normally, recovery finishes, and the system is up again.

I haven't had the chance yet to analyze why the SIGQUIT signals are
getting stuck.  Be that as it may, it appears there are no provisions
for this case.  I couldn't find any documentation or previous reports on
this sort of thing.  One might imagine a feature where the postmaster
resorts to throwing SIGKILLs around after a while, similar to how init
scripts are sometimes set up.  But perhaps manual intervention is the
way to go.

Comments?
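
A minimal sketch of the escalation idea floated above, assuming a parent that sends SIGQUIT, polls for exit during a grace period, and only then falls back to SIGKILL. The names `quit_then_kill` and `spawn_stuck_child` and the 10 ms polling interval are hypothetical; this is not postmaster code, only an illustration of the init-script-style pattern using plain POSIX calls.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/*
 * Hypothetical helper, not PostgreSQL code: ask a child to quit, wait up to
 * grace_ms milliseconds, then fall back to SIGKILL.
 * Returns 0 if the child exited after SIGQUIT alone, 1 if SIGKILL was
 * needed, and -1 on error.
 */
int
quit_then_kill(pid_t child, int grace_ms)
{
    const struct timespec poll_interval = {0, 10 * 1000 * 1000};   /* 10 ms */
    int     waited_ms;
    int     status;
    pid_t   r;

    if (kill(child, SIGQUIT) != 0)
        return -1;

    for (waited_ms = 0; waited_ms < grace_ms; waited_ms += 10)
    {
        r = waitpid(child, &status, WNOHANG);
        if (r == child)
            return 0;           /* child honored SIGQUIT */
        if (r < 0)
            return -1;
        nanosleep(&poll_interval, NULL);
    }

    /* Grace period expired: SIGKILL cannot be caught, blocked, or ignored. */
    if (kill(child, SIGKILL) != 0)
        return -1;
    if (waitpid(child, &status, 0) != child)
        return -1;
    return 1;
}

/* Hypothetical test helper: fork a child that never reacts to SIGQUIT. */
pid_t
spawn_stuck_child(void)
{
    sigset_t    block, old;
    pid_t       pid;

    /* Block SIGQUIT before fork() so the child inherits the blocked mask. */
    sigemptyset(&block);
    sigaddset(&block, SIGQUIT);
    sigprocmask(SIG_BLOCK, &block, &old);

    pid = fork();
    if (pid == 0)
    {
        for (;;)
            pause();            /* SIGQUIT stays pending forever */
    }

    sigprocmask(SIG_SETMASK, &old, NULL);   /* parent restores its own mask */
    return pid;
}
```

Calling `quit_then_kill(spawn_stuck_child(), 100)` exercises the fallback path. Note that a process in uninterruptible kernel wait would not respond to SIGKILL either until the wait ends, so a sketch like this only helps with children stuck in user space.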


Re: recovery is stuck when children are not processing SIGQUIT from previous crash

From
Tom Lane
Date:
Peter Eisentraut <peter_e@gmx.net> writes:
> I have observed the following situation a few times now (weeks or months
> apart), most recently with 8.3.7.  Some postgres child process crashes.
> The postmaster notices and sends SIGQUIT to all other children.  Once
> all other children have exited, it would enter recovery.  But for some
> reason, some children are not processing the SIGQUIT signal and are
> basically just stuck.  That means the whole database system is then
> stuck and won't continue without manual intervention.  If I go in
> manually and SIGKILL the offending processes, everything proceeds
> normally, recovery finishes, and the system is up again.

We need some investigation into why that is happening.

> I haven't had the chance yet to analyze why the SIGQUIT signals are
> getting stuck.  Be that as it may, it appears there are no provisions
> for this case.  I couldn't find any documentation or previous reports on
> this sort of thing.  One might imagine a feature where the postmaster
> resorts to throwing SIGKILLs around after a while, similar to how init
> scripts are sometimes set up.

I'd prefer not to go there, at least not without a demonstration that
this will solve a bug that's unsolvable otherwise.  If a child is
really stuck in a state that doesn't accept SIGQUIT, it probably
won't accept SIGKILL either (e.g., uninterruptible disk wait).  Or maybe
we just have some errant code that is blocking SIGQUIT; but that's
a garden variety bug IMO, not something that needs major new postmaster
logic to work around.

            regards, tom lane

Re: recovery is stuck when children are not processing SIGQUIT from previous crash

From
Peter Eisentraut
Date:
On Wed, 2009-09-23 at 10:04 -0400, Tom Lane wrote:
> I'd prefer not to go there, at least not without a demonstration that
> this will solve a bug that's unsolvable otherwise.  If a child is
> really stuck in a state that doesn't accept SIGQUIT, it probably
> won't accept SIGKILL either (e.g., uninterruptible disk wait).  Or maybe
> we just have some errant code that is blocking SIGQUIT; but that's
> a garden variety bug IMO, not something that needs major new postmaster
> logic to work around.

strace on the backend processes all showed them waiting at

futex(0x7f1ee5e21c90, FUTEX_WAIT_PRIVATE, 2, NULL

Notably, the first argument was the same for all of them.

I gather that a futex is a Linux kernel thing, which is probably then
used by glibc to implement some pthreads stuff.  Anyone know more?

But yes, using SIGKILL on these processes works without problem.


Re: recovery is stuck when children are not processing SIGQUIT from previous crash

From
Alvaro Herrera
Date:
Peter Eisentraut wrote:

> strace on the backend processes all showed them waiting at
>
> futex(0x7f1ee5e21c90, FUTEX_WAIT_PRIVATE, 2, NULL
>
> Notably, the first argument was the same for all of them.
>
> I gather that a futex is a Linux kernel thing, which is probably then
> used by glibc to implement some pthreads stuff.  Anyone know more?

Maybe a backtrace from GDB would tell us more.

--
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

Re: recovery is stuck when children are not processing SIGQUIT from previous crash

From
Tom Lane
Date:
Peter Eisentraut <peter_e@gmx.net> writes:
> strace on the backend processes all showed them waiting at
> futex(0x7f1ee5e21c90, FUTEX_WAIT_PRIVATE, 2, NULL
> Notably, the first argument was the same for all of them.

Probably means they are blocked on semaphores.  Stack traces would
be much more informative ...

            regards, tom lane

Re: recovery is stuck when children are not processing SIGQUIT from previous crash

From
Peter Eisentraut
Date:
On Sat, 2009-09-26 at 12:19 -0400, Tom Lane wrote:
> Peter Eisentraut <peter_e@gmx.net> writes:
> > strace on the backend processes all showed them waiting at
> > futex(0x7f1ee5e21c90, FUTEX_WAIT_PRIVATE, 2, NULL
> > Notably, the first argument was the same for all of them.
>
> Probably means they are blocked on semaphores.  Stack traces would
> be much more informative ...

Got one now:

#0  0x00007f65951eaf8e in ?? () from /lib/libc.so.6
#1  0x00007f65951dc218 in ?? () from /lib/libc.so.6
#2  0x00007f65951dbcdd in __vsyslog_chk () from /lib/libc.so.6
#3  0x00007f65951dc1a0 in syslog () from /lib/libc.so.6
#4  0x00000000006694bd in EmitErrorReport () at elog.c:1404
#5  0x0000000000669935 in errfinish (dummy=-1790575472) at elog.c:415
#6  0x00000000005c291e in quickdie (postgres_signal_arg=<value optimized out>) at postgres.c:2502
#7  <signal handler called>
#8  0x00007f65951e0513 in send () from /lib/libc.so.6
#9  0x00007f65951dbeed in __vsyslog_chk () from /lib/libc.so.6
#10 0x00007f65951dc1a0 in syslog () from /lib/libc.so.6
#11 0x00000000006694bd in EmitErrorReport () at elog.c:1404
#12 0x0000000000669935 in errfinish (dummy=3) at elog.c:415
#13 0x00000000005c291e in quickdie (postgres_signal_arg=<value optimized out>) at postgres.c:2502
#14 <signal handler called>
#15 0x00007f65951e0303 in recv () from /lib/libc.so.6
#16 0x00000000005486a8 in secure_read (port=0x24a76f0, ptr=0x9ac680, len=8192) at be-secure.c:319
#17 0x000000000054f3d0 in pq_recvbuf () at pqcomm.c:754
#18 0x000000000054f817 in pq_getbyte () at pqcomm.c:795
#19 0x00000000005c4d10 in PostgresMain (argc=4, argv=<value optimized out>, username=0x2478728 "xyz") at postgres.c:317
#20 0x000000000059938d in ServerLoop () at postmaster.c:3218
#21 0x000000000059a0cf in PostmasterMain (argc=5, argv=0x24731d0) at postmaster.c:1031
#22 0x0000000000551245 in main (argc=5, argv=<value optimized out>) at main.c:188

Looks like a race condition or lockup in the syslog code.


Re: recovery is stuck when children are not processing SIGQUIT from previous crash

From
Marko Kreen
Date:
On 11/12/09, Peter Eisentraut <peter_e@gmx.net> wrote:
> On Sat, 2009-09-26 at 12:19 -0400, Tom Lane wrote:
>  > Peter Eisentraut <peter_e@gmx.net> writes:
>  > > strace on the backend processes all showed them waiting at
>  > > futex(0x7f1ee5e21c90, FUTEX_WAIT_PRIVATE, 2, NULL
>  > > Notably, the first argument was the same for all of them.
>  >
>  > Probably means they are blocked on semaphores.  Stack traces would
>  > be much more informative ...

>  Looks like a race condition or lockup in the syslog code.

AFAICS syslog() is not safe to use in a signal handler:

  http://www.opengroup.org/onlinepubs/9699919799/functions/V2_chap02.html#tag_15_04_03_03

--
marko

Re: recovery is stuck when children are not processing SIGQUIT from previous crash

From
Tom Lane
Date:
Peter Eisentraut <peter_e@gmx.net> writes:
>>> strace on the backend processes all showed them waiting at
>>> futex(0x7f1ee5e21c90, FUTEX_WAIT_PRIVATE, 2, NULL
>>> Notably, the first argument was the same for all of them.

> Looks like a race condition or lockup in the syslog code.

Hm, why are there two <signal handler> calls in the stack?
The only thing I can think of is that we sent SIGQUIT twice.
That's probably bad --- is there any obvious path through
the postmaster that would do that?

The other thought is that quickdie should block signals before
starting to do anything.

            regards, tom lane
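
A minimal sketch of "block signals before starting to do anything", assuming plain POSIX calls. The thread does not show an actual patch, so `quickdie_sketch` and `run_quickdie_sketch` are hypothetical names, and the handler body uses write() and _exit() in place of the real error-reporting path.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical emergency-exit handler in the spirit of quickdie(). */
static void
quickdie_sketch(int signo)
{
    static const char msg[] = "terminating on SIGQUIT\n";
    sigset_t    all;

    (void) signo;

    /* First action: block every blockable signal so nothing re-enters us. */
    sigfillset(&all);
    sigprocmask(SIG_SETMASK, &all, NULL);

    /* write() and _exit() are async-signal-safe. */
    if (write(STDERR_FILENO, msg, sizeof(msg) - 1) < 0)
    {
        /* nothing useful can be done about a failed write here */
    }
    _exit(2);
}

/* Hypothetical driver: run the handler in a child, return its exit status. */
int
run_quickdie_sketch(void)
{
    int     status;
    pid_t   pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0)
    {
        struct sigaction sa;

        sa.sa_handler = quickdie_sketch;
        sigemptyset(&sa.sa_mask);
        sa.sa_flags = 0;
        sigaction(SIGQUIT, &sa, NULL);
        raise(SIGQUIT);
        _exit(0);               /* not reached if the handler ran */
    }

    if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

Filling `sa.sa_mask` with sigfillset() when installing the handler would have a similar effect, since that mask is applied for the duration of the handler.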

Re: recovery is stuck when children are not processing SIGQUIT from previous crash

From
Marko Kreen
Date:
On 11/12/09, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>  The other thought is that quickdie should block signals before
>  starting to do anything.

There would still be possibility of recursive syslog() calls.
Shouldn't we fix that too?

I'm not sure how exactly.  If the recursive elog() must stay, then
perhaps a simple 'volatile int' guard around syslog()?

--
marko

Re: recovery is stuck when children are not processing SIGQUIT from previous crash

From
Tom Lane
Date:
Marko Kreen <markokr@gmail.com> writes:
> On 11/12/09, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> The other thought is that quickdie should block signals before
>> starting to do anything.

> There would still be possibility of recursive syslog() calls.
> Shouldn't we fix that too?

That's what the signal block would do.

            regards, tom lane

Re: recovery is stuck when children are not processing SIGQUIT from previous crash

From
Marko Kreen
Date:
On 11/12/09, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Marko Kreen <markokr@gmail.com> writes:
>  > On 11/12/09, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>  >> The other thought is that quickdie should block signals before
>  >> starting to do anything.
>
>  > There would still be possibility of recursive syslog() calls.
>  > Shouldn't we fix that too?
>
>
> That's what the signal block would do.

usual elog
  syslog
    <signal>
      quickdie
        block signals
        syslog
You talked about blocking in quickdie(), but you'd need
to block in elog().

--
marko

Re: recovery is stuck when children are not processing SIGQUIT from previous crash

From
Tom Lane
Date:
Marko Kreen <markokr@gmail.com> writes:
> You talked about blocking in quickdie(), but you'd need
> to block in elog().

I'm not really particularly worried about that case.  By that logic,
we could not use quickdie at all, because any part of the system
might be doing something that wouldn't survive being interrupted.
In practice the code path isn't sufficiently used or critical
enough to be worth trying to make that bulletproof.

It does strike me that we might someday add code to the postmaster
to SIGKILL processes that fail to exit in a reasonably prompt fashion
after SIGQUIT, on the theory that they might be stuck in something
like this.  But for now, I'm more interested in a one-line fix that
will deal with the actually observed case ...

            regards, tom lane

Re: recovery is stuck when children are not processing SIGQUIT from previous crash

From
Peter Eisentraut
Date:
On Thu, 2009-11-12 at 10:45 -0500, Tom Lane wrote:
> In practice the code path isn't sufficiently used or critical
> enough to be worth trying to make that bulletproof.

Well, the subject line is "recovery is stuck".  Not critical enough?


Re: recovery is stuck when children are not processing SIGQUIT from previous crash

From
Marko Kreen
Date:
On 11/12/09, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Marko Kreen <markokr@gmail.com> writes:
>  > You talked about blocking in quickdie(), but you'd need
>  > to block in elog().
>
>  I'm not really particularly worried about that case.  By that logic,
>  we could not use quickdie at all, because any part of the system
>  might be doing something that wouldn't survive being interrupted.

Not really - we'd need to care only about parts that quickdie()
(or any other signal handler) wants to use.  Which basically means
elog() only.

OK, full elog() is a beast, but why would the SIGQUIT handler need full
elog()?  How about we export a minimal log-writing function and make
that signal-safe - that is, drop the message if already active.  This
will exchange a potential crash/deadlock for a lost message, which seems
slightly better behaviour.

--
marko
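
A minimal sketch of the "drop the message if already active" idea, assuming a single-threaded process where the only re-entry comes from signal handlers. `minimal_log`, `on_sigquit_log`, and `demo_drop_if_active` are hypothetical names; the guard uses `volatile sig_atomic_t`, the portable counterpart of the 'volatile int' suggested earlier in the thread.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stddef.h>
#include <unistd.h>

static volatile sig_atomic_t log_active = 0;
static volatile sig_atomic_t dropped = 0;

/*
 * Hypothetical minimal logger: writes len bytes of msg to fd, or drops the
 * message if a call is already in progress on this stack.
 * Returns 1 if written, 0 if dropped.
 */
int
minimal_log(int fd, const char *msg, size_t len)
{
    if (log_active)
    {
        dropped++;              /* lose the message instead of re-entering */
        return 0;
    }
    log_active = 1;
    if (write(fd, msg, len) < 0)
    {
        /* nothing sensible to do about a failed log write here */
    }
    log_active = 0;
    return 1;
}

static void
on_sigquit_log(int signo)
{
    static const char msg[] = "SIGQUIT received\n";

    (void) signo;
    minimal_log(STDERR_FILENO, msg, sizeof(msg) - 1);
}

/* Hypothetical demo: returns the number of messages dropped. */
int
demo_drop_if_active(void)
{
    struct sigaction sa;

    sa.sa_handler = on_sigquit_log;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    if (sigaction(SIGQUIT, &sa, NULL) != 0)
        return -1;

    dropped = 0;

    log_active = 1;             /* pretend main-line code is mid-log */
    raise(SIGQUIT);             /* handler's message is dropped */
    log_active = 0;

    raise(SIGQUIT);             /* handler's message is written */

    return dropped;
}
```

The check-then-set on `log_active` is adequate here only because a handler runs to completion before the interrupted code resumes; it would not be sufficient across threads.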

Re: recovery is stuck when children are not processing SIGQUIT from previous crash

From
Tom Lane
Date:
Peter Eisentraut <peter_e@gmx.net> writes:
> On Thu, 2009-11-12 at 10:45 -0500, Tom Lane wrote:
>> In practice the code path isn't sufficiently used or critical
>> enough to be worth trying to make that bulletproof.

> Well, the subject line is "recovery is stuck".  Not critical enough?

The particular case looks like it could be solved by disabling
interrupts at the start of quickdie().  My point is that doing more than
that is going to involve a large amount of work for a small amount of
return.

            regards, tom lane