Discussion: Coping with backend crash in libpq

Coping with backend crash in libpq

From
Tom Lane
Date:
I've just noticed that libpq doesn't cope very gracefully if the backend
exits when not in the middle of a query (i.e., because the postmaster told
it to quit after some other BE crashed).  The behavior in psql, for
example, is that the next time you issue a query, psql just exits
without printing anything at all.  This is Not Friendly, especially
considering that the BE sent a nice little notice message before it quit.

The main problem is that if the next thing you do is to send a new query,
send() sees that the connection has been closed and generates a SIGPIPE
signal.  By default that terminates the frontend process.
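A minimal sketch of the failure described above, assuming a POSIX system: a local socketpair() stands in for the frontend/backend connection, one end is closed to play the exited backend, and a forked child (the "frontend") then calls send() with the default SIGPIPE action. The function name is hypothetical; it reports whether the child was killed by SIGPIPE instead of getting an error return it could act on.

```c
#include <signal.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Returns 1 if a child that calls send() on a socket whose peer has already
 * closed is killed by SIGPIPE, 0 if it survives, and -1 on setup failure.
 * The socketpair is a local stand-in for the frontend/backend connection.
 */
int send_after_peer_close_kills_with_sigpipe(void)
{
    int sv[2];
    int status;
    pid_t pid;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return -1;
    close(sv[1]);                  /* the "backend" end goes away */

    pid = fork();
    if (pid < 0) {
        close(sv[0]);
        return -1;
    }
    if (pid == 0) {
        /* Child plays the frontend, with the default SIGPIPE action. */
        signal(SIGPIPE, SIG_DFL);
        (void) send(sv[0], "Q", 1, 0);
        _exit(0);                  /* reached only if SIGPIPE did not kill us */
    }

    close(sv[0]);
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFSIGNALED(status) && WTERMSIG(status) == SIGPIPE;
}
```

On systems that raise SIGPIPE here (Linux, for one), the function returns 1: the child never gets to look at send()'s return value, which is exactly why psql exits silently.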

We could cure this by having libpq disable SIGPIPE, but we would have
to disable it before each send() and re-enable afterwards to avoid
affecting the behavior of the rest of the frontend application.
Two additional kernel calls per query sounds like a lot of overhead.
(We do actually do this when trying to close the connection, but not
during normal queries.)

Perhaps a better answer is to have PQsendQuery check for fresh input
from the backend before trying to send the query.  This would have two
side effects:
  1. If a NOTICE message has arrived, we could print it.
  2. If EOF is detected, we will reset the connection state to
     CONNECTION_BAD, which PQsendQuery can use to avoid trying to send.

The minimum cost to do this is one kernel call (a select(), which
unfortunately is probably a fairly expensive call) in the normal
case where no new input has arrived.  Another objection is that it's
not 100% bulletproof --- if the backend closes the connection in the
window between select() and send() then you can still get SIGPIPE'd.
The odds of this seem pretty small, however.

I'm inclined to go with answer #2 (checking for input before sending),
because it seems to have less of a performance impact, and it will ensure
that the backend's polite "The Postmaster has informed me that some other
backend died abnormally and possibly corrupted shared memory." message
gets displayed.  With approach #1 (toggling SIGPIPE around each send())
we'd still have to go through some pushups to get the notice to come out.

Does anyone have an objection, or a better idea?

            regards, tom lane

Re: [INTERFACES] Coping with backend crash in libpq

From
Karl Denninger
Date:
On Tue, Jul 28, 1998 at 01:23:35PM -0400, Tom Lane wrote:
> Does anyone have an objection, or a better idea?

Not really.

I've noticed this kind of problem where the backend will fault in some way,
and after it does so, the library gets "confused".

We have a couple of processes here that are NEVER supposed to exit.  They
open a connection for each transaction, and close it at the end.  If
something happens to the backend where it dies abnormally, these processes
will sometimes get into an odd state in the libpq library where all new
connection attempts fail immediately.

I've yet to find a foolproof coding way around this particular problem.

--
--
Karl Denninger (karl@MCS.Net)| MCSNet - Serving Chicagoland and Wisconsin
http://www.mcs.net/          | T1's from $600 monthly / All Lines K56Flex/DOV
                 | NEW! Corporate ISDN Prices dropped by up to 50%!
Voice: [+1 312 803-MCS1 x219]| EXCLUSIVE NEW FEATURE ON ALL PERSONAL ACCOUNTS
Fax:   [+1 312 803-4929]     | *SPAMBLOCK* Technology now included at no cost

Re: [HACKERS] Coping with backend crash in libpq

From
Bruce Momjian
Date:
> I've just noticed that libpq doesn't cope very gracefully if the backend
> exits when not in the middle of a query (i.e., because the postmaster told
> it to quit after some other BE crashed).  The behavior in psql, for
> example, is that the next time you issue a query, psql just exits
> without printing anything at all.  This is Not Friendly, especially
> considering that the BE sent a nice little notice message before it quit.

I say, install the signal handler for SIGPIPE on connection startup; when
you install it, the call returns the previously defined action.  If we find
there was a previously defined action, we can re-install theirs, and let
it handle the SIGPIPE.  If an application later defines its own
SIGPIPE handler, overriding ours, then they get no error message.

However, I see psql setting the SIGPIPE handler all over the place, so I
don't think that will work there.  How about SIGURG?  Oops, not portable
for unix domain sockets.  Can we send a signal to the process, telling
it the backend has exited?  We have that information now, so why not use
it.  Define a signal handler for SIGURG or SIGUSR1, and have that print
out a message.  If the app redefines that, it will get confused when we
send the signal from the postmaster.  Oops, we can't send signals to the
client because they may be owned by other users.

I am stumped.  Let me think about it.



--
Bruce Momjian                          |  830 Blythe Avenue
maillist@candle.pha.pa.us              |  Drexel Hill, Pennsylvania 19026
  +  If your life is a hard drive,     |  (610) 353-9879(w)
  +  Christ can be your backup.        |  (610) 853-3000(h)

Re: [HACKERS] Coping with backend crash in libpq

From
dg@illustra.com (David Gould)
Date:
> I say, install the signal handler for SIGPIPE on connection startup; when
> you install it, the call returns the previously defined action.  If we find
> there was a previously defined action, we can re-install theirs, and let
> it handle the SIGPIPE.  If an application later defines its own
> SIGPIPE handler, overriding ours, then they get no error message.
>
> However, I see psql setting the SIGPIPE handler all over the place, so I
> don't think that will work there.  How about SIGURG?  Oops, not portable
> for unix domain sockets.  Can we send a signal to the process, telling
> it the backend has exited?  We have that information now, so why not use
> it.  Define a signal handler for SIGURG or SIGUSR1, and have that print
> out a message.  If the app redefines that, it will get confused when we
> send the signal from the postmaster.  Oops, we can't send signals to the
> client because they may be owned by other users.
>
> I am stumped.  Let me think about it.

Hmmm, perhaps fix psql so that it uses SIGPIPE more sensibly. SIGPIPE really
is the right signal to catch here.

-dg

David Gould            dg@illustra.com           510.628.3783 or 510.305.9468
Informix Software  (No, really)         300 Lakeside Drive  Oakland, CA 94612
 - If simplicity worked, the world would be overrun with insects. -

Re: [HACKERS] Coping with backend crash in libpq

From
Tom Lane
Date:
>> I say, install the signal handler for SIGPIPE on connection startup; when
>> you install it, the call returns the previously defined action.  If we find
>> there was a previously defined action, we can re-install theirs, and let
>> it handle the SIGPIPE.  If an application later defines its own
>> SIGPIPE handler, overriding ours, then they get no error message.

This makes our correct functioning dependent on the application's
SIGPIPE handler, which doesn't strike me as a good solution.
Another problem is that if we leave a SIGPIPE handler in place,
it will get called for SIGPIPEs on *other* pipes that the surrounding
application may have open, and we have no way to know what the right
response is.  (AFAIK a SIGPIPE handler can't even portably tell which
connection has SIGPIPEd.)

>> Can we send a signal to the process, telling
>> it the backend has exited?

No.  The client isn't necessarily even on the same machine as the
postmaster/backend.  Even if it were, I don't think we can take over
a signal code that the frontend application might be using for something
else.

> Hmmm, perhaps fix psql so that it uses SIGPIPE more sensibly. SIGPIPE really
> is the right signal to catch here.

Well, psql is also using SIGPIPE sensibly: it's trying to prevent a
hangup when sending data down a pipe to a subprocess that might
terminate early.  The real problem here is that SIGPIPE is designed
wrong.  It ought to be possible to enable/disable SIGPIPE on a per-
file-handle basis ... but AFAIK that's not possible, and it's certainly
not portable even if some Unixes support it.
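For comparison with this 1998 remark: later systems did add per-call or per-socket control, namely the MSG_NOSIGNAL flag to send() (Linux, and later POSIX.1-2008) and the SO_NOSIGPIPE socket option (macOS and several BSDs). The sketch below, with a hypothetical helper name, uses whichever one the platform defines; it does nothing for systems that offer neither, which was the portability concern raised here.

```c
#include <errno.h>
#include <sys/socket.h>
#include <sys/types.h>

/*
 * send() that suppresses SIGPIPE for this one call or socket, where the
 * platform provides a way.  With a closed peer it returns -1 and sets
 * errno to EPIPE instead of raising SIGPIPE.
 */
ssize_t send_no_sigpipe(int fd, const void *buf, size_t len)
{
#if defined(MSG_NOSIGNAL)
    /* Per-call flag: Linux, and POSIX.1-2008. */
    return send(fd, buf, len, MSG_NOSIGNAL);
#elif defined(SO_NOSIGPIPE)
    /* Per-socket option: macOS and several BSDs. */
    int on = 1;
    if (setsockopt(fd, SOL_SOCKET, SO_NOSIGPIPE, &on, sizeof on) < 0)
        return -1;
    return send(fd, buf, len, 0);
#else
    /* No per-handle control here; a process-wide approach is still needed. */
    (void) fd;
    (void) buf;
    (void) len;
    errno = ENOSYS;
    return -1;
#endif
}
```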


I'm still in favor of the check-for-input-just-before-send solution.
That does leave a small window where we can fail, but really the failure
should be pretty improbable: you have to assume that some other backend
coredumps while yours is idle, and in a window of microseconds right
before you are going to send a new command to your backend.  I think the
mess-with-catching-SIGPIPE approach is actually more likely to have
problems in practice.  It could interfere with normal functioning of
the frontend app, whereas any possible failure of the other way requires
a previous failure in some backend.  Production backends shouldn't
coredump too darn often, one hopes.

            regards, tom lane