Discussion: Client failure allows backend to continue
As part of the training class I did, some people tested what happens
when the client allocates tons of memory to store a result and aborts.

What we found was that though elog was properly called:

    elog(COMMERROR, "pq_recvbuf: recv() failed: %m");

(I think that was the message.) the backend did not exit and kept
eating CPU. I think the problem is that the elog code only exits on
ERROR, not COMMERROR. Is there some way to fix this?

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
Bruce Momjian <pgman@candle.pha.pa.us> writes:
> As part of the training class I did, some people tested what happens
> when the client allocates tons of memory to store a result and aborts.
> What we found was that though elog was properly called:
> elog(COMMERROR, "pq_recvbuf: recv() failed: %m");
> (I think that was the message.) the backend did not exit and kept
> eating CPU. I think the problem is that the elog code only exits on
> ERROR, not COMMERROR. Is there some way to fix this?
There's been talk of setting the QueryCancel flag after detecting a
client communication failure ... but no one has ever done the legwork
to see if that works nicely, or what downsides it might have.
regards, tom lane
Tom Lane wrote:
> Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > As part of the training class I did, some people tested what happens
> > when the client allocates tons of memory to store a result and aborts.
> > What we found was that though elog was properly called:
> >     elog(COMMERROR, "pq_recvbuf: recv() failed: %m");
> > (I think that was the message.) the backend did not exit and kept
> > eating CPU. I think the problem is that the elog code only exits on
> > ERROR, not COMMERROR. Is there some way to fix this?
>
> There's been talk of setting the QueryCancel flag after detecting a
> client communication failure ... but no one has ever done the legwork
> to see if that works nicely, or what downsides it might have.

Why is COMMERROR not doing the longjump like ERROR?
Bruce Momjian <pgman@candle.pha.pa.us> writes:
> Why is COMMERROR not doing the longjump like ERROR?
Because it's defined to be like LOG.
A more useful reply might be that I'm not sure it's safe to abort in the
client I/O routines.
regards, tom lane
Tom Lane wrote:
> Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > Why is COMMERROR not doing the longjump like ERROR?
>
> Because it's defined to be like LOG.
>
> A more useful reply might be that I'm not sure it's safe to abort in the
> client I/O routines.

Well, if we get an I/O error, I can't imagine why we would continue
doing anything --- are any of those recoverable? Do we need a separate
error type for I/O messages?
Bruce Momjian <pgman@candle.pha.pa.us> writes:
> Well, if we get an I/O error, I can't imagine why we would continue
> doing anything --- are any of those recoverable?
Well, that's what's not clear --- it's hard to tell if a write failure
is a hard error or just transient. If we make like elog(ERROR),
returning to the main loop, and then a read from the client *doesn't*
fail, we'll try to continue ... but we've just screwed the pooch,
because we have not sent a complete message and therefore certainly have
messed up frontend/backend synchronization. I have no idea whether it's
really possible to recover from this situation or not, but that approach
surely won't work.
If you want to take a kamikaze any-comm-error-means-we're-dead approach,
you might think about elog(FATAL). But that tries to send a message to
the client. Instant infinite loop, if the error is hard.
Complaints to the postmaster log, and abort at the next safe place
(*not* partway through message output) seem like the way to go to me.
> Do we need a separate error type for I/O messages?
Uh ... see COMMERROR.
regards, tom lane
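The synchronization problem described above can be illustrated with a small, self-contained sketch. It assumes a hypothetical framing in which every message is one length byte followed by that many payload bytes; this is not the actual frontend/backend protocol, and every `sketch_` name is invented for illustration. The sender's first write stops partway through, the sender then carries on with a new message, and the reader ends up treating the second message's bytes as the tail of the first.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical framing (NOT the real PostgreSQL wire protocol):
 * one length byte, then that many payload bytes.
 *
 * Write a message, but let only `send_limit` payload bytes get out,
 * simulating a write that fails partway. Returns bytes written.
 */
static size_t sketch_put_message(unsigned char *out, const char *payload,
                                 size_t send_limit)
{
    size_t len = strlen(payload);
    size_t sent = len < send_limit ? len : send_limit;

    assert(len <= 255);               /* must fit in the one-byte header */
    out[0] = (unsigned char) len;     /* header promises the full length */
    memcpy(out + 1, payload, sent);   /* but only `sent` bytes are written */
    return 1 + sent;
}

/*
 * Read one message from the stream into dst (NUL-terminated).
 * Returns bytes consumed, or 0 if the message is incomplete.
 */
static size_t sketch_read_message(const unsigned char *in, size_t avail,
                                  char *dst, size_t dstlen)
{
    size_t len;

    if (avail < 1)
        return 0;
    len = in[0];
    if (avail < 1 + len || dstlen < len + 1)
        return 0;
    memcpy(dst, in + 1, len);
    dst[len] = '\0';
    return 1 + len;
}

/*
 * The sender fails two bytes into "HELLO", returns to its main loop,
 * and then sends a complete "OK". The reader still expects five
 * payload bytes, so it swallows the header and body of "OK" as data.
 */
static size_t sketch_desync_demo(char *dst, size_t dstlen)
{
    unsigned char stream[64];
    size_t used = 0;

    used += sketch_put_message(stream + used, "HELLO", 2); /* partial */
    used += sketch_put_message(stream + used, "OK", 2);    /* complete */
    return sketch_read_message(stream, used, dst, dstlen);
}
```

In this sketch the reader never sees "OK" as a message of its own, which is the kind of lost synchronization that makes continuing after a partial send unsafe.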
Well, setting query_cancel then seems like a logical solution because it
will exit at a reasonable point, hopefully. Right now we have
statement_timeout, and that exits at a given time, but I suppose it
doesn't exit while data is transferring, so it may be different.

Tom Lane wrote:
> Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > Well, if we get an I/O error, I can't imagine why we would continue
> > doing anything --- are any of those recoverable?
>
> Well, that's what's not clear --- it's hard to tell if a write failure
> is a hard error or just transient. If we make like elog(ERROR),
> returning to the main loop, and then a read from the client *doesn't*
> fail, we'll try to continue ... but we've just screwed the pooch,
> because we have not sent a complete message and therefore certainly have
> messed up frontend/backend synchronization. I have no idea whether it's
> really possible to recover from this situation or not, but that approach
> surely won't work.
>
> If you want to take a kamikaze any-comm-error-means-we're-dead approach,
> you might think about elog(FATAL). But that tries to send a message to
> the client. Instant infinite loop, if the error is hard.
>
> Complaints to the postmaster log, and abort at the next safe place
> (*not* partway through message output) seem like the way to go to me.
>
> > Do we need a separate error type for I/O messages?
>
> Uh ... see COMMERROR.
>
> regards, tom lane
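The approach the thread converges on (log the failure, set a flag, and abort at the next safe place rather than partway through message output) might look roughly like the sketch below. It is only an illustration under stated assumptions: all `sketch_` names and both flags are hypothetical stand-ins, not PostgreSQL symbols, and the "query" is a plain loop that sends one row per iteration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-ins; none of these are real PostgreSQL symbols. */
static volatile bool sketch_conn_lost = false; /* set on a hard send failure */
static bool sketch_in_message = false;         /* true while a message is partly sent */

/*
 * Simulated low-level send. On failure it only logs (to stderr, standing
 * in for the postmaster log) and sets the flag; it never jumps out of the
 * I/O routine and never tries to report the error to the client.
 */
static bool sketch_send_bytes(const char *buf, size_t len, bool simulate_failure)
{
    (void) buf;
    (void) len;
    if (simulate_failure)
    {
        fprintf(stderr, "LOG: send() failed, marking connection lost\n");
        sketch_conn_lost = true;
        return false;
    }
    return true;
}

/* Safe-point check: only legal when no message is half-written. */
static bool sketch_should_abort(void)
{
    assert(!sketch_in_message);
    return sketch_conn_lost;
}

/*
 * Fake query that sends nrows rows. The send for row `fail_at` fails
 * (pass -1 for no failure). Returns how many row sends were attempted
 * before the query stopped.
 */
static int sketch_run_query(int nrows, int fail_at)
{
    int attempted = 0;

    sketch_conn_lost = false;
    for (int row = 0; row < nrows; row++)
    {
        if (sketch_should_abort())      /* abort between messages only */
            break;
        sketch_in_message = true;
        sketch_send_bytes("row", 3, row == fail_at);
        sketch_in_message = false;
        attempted++;
    }
    return attempted;
}
```

For example, `sketch_run_query(1000, 2)` stops after three attempted sends instead of working through all 1000 rows, because the flag set during the failed send is noticed at the next check between messages. Whether a real backend has enough such check points during data transfer is the open question raised in the thread.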