Thread: dubious improvement in new psql

dubious improvement in new psql

From
Tom Lane
Date:
The new psql automatically tries to reconnect if the backend disconnects
unexpectedly.  This feature strikes me as ill-conceived; furthermore
it appears to be buggy.

It's ill-conceived because:
(1) under WAL, following a backend crash the postmaster is going to be
spending a few seconds reinitializing; an immediate reconnect attempt
is almost guaranteed to fail.
(2) if I'm running an SQL script, I think it's extremely foolhardy
to press on with executing the script as though nothing had happened.
A backend crash is not an event to be lightly ignored.

It's buggy because: it doesn't work reliably.  While poking at the
backend's problems with oversize btree index entries, I saw psql claim
it had successfully reconnected, and then go into a catatonic state.
It wouldn't give me a new command prompt (not even with ^C), wouldn't
exit with ^D, and had to be killed from another shell window.

This behavior doesn't seem to happen for every crash, but I'm not
really interested in trying to debug it.  I think the "feature"
ought to be ripped out.
        regards, tom lane


Re: [HACKERS] dubious improvement in new psql

From
Peter Eisentraut
Date:
On Sat, 25 Dec 1999, Tom Lane wrote:

> The new psql automatically tries to reconnect if the backend disconnects
> unexpectedly.  This feature strikes me as ill-conceived; furthermore
> it appears to be buggy.
> 
> It's ill-conceived because:
> (1) under WAL, following a backend crash the postmaster is going to be
> spending a few seconds reinitializing; an immediate reconnect attempt
> is almost guaranteed to fail.

Good point.

> (2) if I'm running an SQL script, I think it's extremely foolhardy
> to press on with executing the script as though nothing had happened.
> A backend crash is not an event to be lightly ignored.

It only does the reconnect thing if it's used interactively.

I suppose leaving psql in an unconnected state (which does exist) would be
a better solution. I'll investigate the behaviour you observed below after
I get back from my vacation.

> 
> It's buggy because: it doesn't work reliably.  While poking at the
> backend's problems with oversize btree index entries, I saw psql claim
> it had successfully reconnected, and then go into a catatonic state.
> It wouldn't give me a new command prompt (not even with ^C), wouldn't
> exit with ^D, and had to be killed from another shell window.
> 
> This behavior doesn't seem to happen for every crash, but I'm not
> really interested in trying to debug it.  I think the "feature"
> ought to be ripped out.
> 
>             regards, tom lane

-- 
Peter Eisentraut                  Sernanders vaeg 10:115
peter_e@gmx.net                   75262 Uppsala
http://yi.org/peter-e/            Sweden



Re: [HACKERS] dubious improvement in new psql

From
Don Baccus
Date:
At 11:14 PM 12/28/99 +0100, Peter Eisentraut wrote:
>On Sat, 25 Dec 1999, Tom Lane wrote:
>
>> The new psql automatically tries to reconnect if the backend disconnects
>> unexpectedly.  This feature strikes me as ill-conceived; furthermore
>> it appears to be buggy.
>> 
>> It's ill-conceived because:
>> (1) under WAL, following a backend crash the postmaster is going to be
>> spending a few seconds reinitializing; an immediate reconnect attempt
>> is almost guaranteed to fail.
>
>Good point.
>
>> (2) if I'm running an SQL script, I think it's extremely foolhardy
>> to press on with executing the script as though nothing had happened.
>> A backend crash is not an event to be lightly ignored.
>
>It only does the reconnect thing if it's used interactively.

This raises a question, then.  What should drivers for (say) web
servers that are expected to stay up 24/7 do if reconnecting to a
broken db connection can't be made reliable?

I've currently rewritten the AOLserver driver to do just that, and
it's working fine with 6.5.3.  The AOLserver driver for Oracle most
certainly can reconnect to a broken connection - to tell folks that
this can't be done with the WAL version of Postgres will simply
reinforce those of my friends who laugh at me for trying to use
Postgres instead of simply biting the bullet and buying an Oracle
license...



- Don Baccus, Portland OR <dhogaza@pacifier.com>
  Nature photos, on-line guides, Pacific Northwest Rare Bird Alert Service
  and other goodies at http://donb.photo.net.
 


Re: [HACKERS] dubious improvement in new psql

From
Tom Lane
Date:
Don Baccus <dhogaza@pacifier.com> writes:
> The AOLserver driver for Oracle most
> certainly can reconnect to a broken connection - to tell folks that
> this can't be done with the WAL version of Postgres

I said no such thing!

You certainly *can* reconnect, although under WAL it will take a delay
(or better, a retry loop).
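
A retry loop of that kind could be sketched roughly as follows. This is only an illustration under stated assumptions: `reconnect_fn`, `reconnect_with_retry`, `sleep_ms`, and `succeed_after` are hypothetical names, and the reconnect attempt is passed in as a callback so that the sketch compiles without libpq. In a libpq client the callback would presumably call PQreset() and then check PQstatus().

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 199309L   /* for nanosleep() */
#endif
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical callback: returns true once the connection is usable again.
 * With libpq it might wrap PQreset() followed by a PQstatus() check. */
typedef bool (*reconnect_fn)(void *ctx);

/* Sleep for the given number of milliseconds (POSIX nanosleep). */
static void sleep_ms(long ms)
{
    struct timespec ts;

    ts.tv_sec = ms / 1000;
    ts.tv_nsec = (ms % 1000) * 1000000L;
    nanosleep(&ts, NULL);
}

/* Try to reconnect up to max_attempts times, pausing delay_ms after each
 * failed attempt except the last.  Returns the number of the attempt that
 * succeeded (1-based), or 0 if every attempt failed. */
int reconnect_with_retry(reconnect_fn attempt, void *ctx,
                         int max_attempts, long delay_ms)
{
    assert(attempt != NULL);

    for (int i = 1; i <= max_attempts; i++)
    {
        if (attempt(ctx))
            return i;
        if (i < max_attempts)
            sleep_ms(delay_ms);
    }
    return 0;
}

/* Stand-in for a server that is still reinitializing: ctx points to an int
 * holding how many more attempts should fail before one succeeds. */
bool succeed_after(void *ctx)
{
    int *failures_left = ctx;

    if (*failures_left > 0)
    {
        (*failures_left)--;
        return false;
    }
    return true;
}
```

With the stand-in callback and a counter of 2, `reconnect_with_retry(succeed_after, &n, 5, 1)` fails twice and succeeds on the third attempt, so it returns 3. What needs to be redone after a successful reconnect is still the caller's business.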

However, I think reconnection has to be integrated into the
application's logic at a level where you can have some idea of what
needs to be redone after reconnecting.  That's why I objected to having
psql do it.  If psql's only going to do it interactively then I guess
it's safe enough, though.

Question for discussion: when the WAL postmaster is running a database
start or restart, perhaps it should simply delay processing of new
connection requests until the DB is ready, instead of rejecting them
immediately?  That would eliminate the need for retry loops in
applications, and thereby avoid wasted retry processing on both sides.
On the other hand, I can see where an unexpected multi-second delay to
connect might be bad news, too.  Comments?
        regards, tom lane


Re: [HACKERS] dubious improvement in new psql

From
Ed Loehr
Date:
Tom Lane wrote:

> Question for discussion: when the WAL postmaster is running a database
> start or restart, perhaps it should simply delay processing of new
> connection requests until the DB is ready, instead of rejecting them
> immediately?  That would eliminate the need for retry loops in
> applications, and thereby avoid wasted retry processing on both sides.
> On the other hand, I can see where an unexpected multi-second delay to
> connect might be bad news, too.  Comments?

Suggestion:  Make the delay/reconnect optional with configurable
parameters for how many times to retry, how long to retry, etc.

I have an Apache mod-perl app already doing this reconnect logic, and I'm
very glad my app has control over those parameters.
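
As a rough sketch of what such configurable parameters might look like on the client side (the struct and function names here are hypothetical, not PostgreSQL or libpq options), the limits can be kept in one small policy struct that the retry code consults before each attempt:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical retry settings; illustrative names only. */
typedef struct
{
    int  max_attempts;    /* give up after this many tries; 0 = no limit */
    long max_elapsed_ms;  /* give up after this much total time; 0 = no limit */
    long delay_ms;        /* pause between tries */
} retry_policy;

/* Decide whether another attempt is allowed, given how many attempts have
 * been made so far and how long the caller has been retrying. */
bool retry_allowed(const retry_policy *p, int attempts_made, long elapsed_ms)
{
    assert(p != NULL);

    if (p->max_attempts > 0 && attempts_made >= p->max_attempts)
        return false;
    if (p->max_elapsed_ms > 0 && elapsed_ms >= p->max_elapsed_ms)
        return false;
    return true;
}

/* How long to wait before the next attempt: the configured delay, shortened
 * so that the wait does not overrun the total time budget. */
long retry_wait_ms(const retry_policy *p, long elapsed_ms)
{
    assert(p != NULL);

    if (p->max_elapsed_ms > 0)
    {
        long left = p->max_elapsed_ms - elapsed_ms;

        if (left < 0)
            left = 0;
        if (left < p->delay_ms)
            return left;
    }
    return p->delay_ms;
}
```

Keeping the decision in the application means each caller can pick its own trade-off between waiting out a restart and failing fast.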

Cheers,
Ed Loehr





Re: [HACKERS] dubious improvement in new psql

From
Don Baccus
Date:
At 01:48 PM 1/1/00 -0500, Tom Lane wrote:

>I said no such thing!
>
>You certainly *can* reconnect, although under WAL it will take a delay
>(or better, a retry loop).
>
>However, I think reconnection has to be integrated into the
>application's logic at a level where you can have some idea of what
>needs to be redone after reconnecting.  That's why I objected to having
>psql do it.  If psql's only going to do it interactively then I guess
>it's safe enough, though.

OK, my misunderstanding.  I couldn't understand why psql in interactive
mode should be a problem and took your comments in a more general context.

>
>Question for discussion: when the WAL postmaster is running a database
>start or restart, perhaps it should simply delay processing of new
>connection requests until the DB is ready, instead of rejecting them
>immediately?  That would eliminate the need for retry loops in
>applications, and thereby avoid wasted retry processing on both sides.
>On the other hand, I can see where an unexpected multi-second delay to
>connect might be bad news, too.  Comments?

I've been thinking about this one, actually...

Perhaps letting the caller decide in some manner?  In my driver environment
I'm not really supposed to call sleep or the like and a busy-wait for the
connection(s) to be rebuilt probably isn't the best thing to do, since the
postmaster is going to be hard at work straightening out things with the
WAL.
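
One way a driver could avoid both sleep and a busy-wait is to keep a small timer per broken connection and let its own event loop ask whether an attempt is due. The sketch below is only that, a sketch: `reconnect_timer` and its functions are hypothetical names, times come from whatever clock the caller already has, and the doubling of the interval after each failure is an illustrative choice meant to ease off while the postmaster is busy.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-connection state kept by the driver. */
typedef struct
{
    long next_try_ms;      /* earliest time for the next attempt */
    long interval_ms;      /* current gap between attempts */
    long max_interval_ms;  /* upper bound for the gap */
} reconnect_timer;

/* Start timing after a connection is found to be broken at now_ms. */
void reconnect_timer_init(reconnect_timer *t, long now_ms,
                          long first_interval_ms, long max_interval_ms)
{
    assert(t != NULL);

    t->interval_ms = first_interval_ms;
    t->max_interval_ms = max_interval_ms;
    t->next_try_ms = now_ms + first_interval_ms;
}

/* Called from the driver's own loop; never blocks.  True when the next
 * reconnect attempt may be made. */
bool reconnect_due(const reconnect_timer *t, long now_ms)
{
    return now_ms >= t->next_try_ms;
}

/* Record a failed attempt: double the gap (up to the cap) and schedule
 * the next try relative to now_ms. */
void reconnect_failed(reconnect_timer *t, long now_ms)
{
    t->interval_ms *= 2;
    if (t->interval_ms > t->max_interval_ms)
        t->interval_ms = t->max_interval_ms;
    t->next_try_ms = now_ms + t->interval_ms;
}
```

The caller stays in control: it decides when to check `reconnect_due`, and nothing here sleeps or spins.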



- Don Baccus, Portland OR <dhogaza@pacifier.com>
  Nature photos, on-line guides, Pacific Northwest Rare Bird Alert Service
  and other goodies at http://donb.photo.net.
 


Re: [HACKERS] dubious improvement in new psql

From
Peter Eisentraut
Date:
Okay, I looked at the code again and I can't see anything wrong
conceptually. It follows libpq semantics which I remember to have grabbed
from the documentation:

    results = PQexec(pset->db, query);
    /* do something with result */
    if (PQstatus(pset->db) == CONNECTION_BAD)
    {
        fputs("The connection to the server was lost. Attempting reset: ", stderr);
        PQreset(pset->db);
        if (PQstatus(pset->db) == CONNECTION_BAD)
        {
            fputs("Failed.\n", stderr);
            PQfinish(pset->db);
            PQclear(results);
            pset->db = NULL;
            return false;
        }
        else
            fputs("Succeeded.\n", stderr);
    }
 


If you can still reproduce this somehow, I'd like to know where it hangs
and/or what the output was.


On 1999-12-25, Tom Lane mentioned:

> The new psql automatically tries to reconnect if the backend disconnects
> unexpectedly.  This feature strikes me as ill-conceived; furthermore
> it appears to be buggy.
> 
> It's ill-conceived because:
> (1) under WAL, following a backend crash the postmaster is going to be
> spending a few seconds reinitializing; an immediate reconnect attempt
> is almost guaranteed to fail.

Then rip out PQreset. It's not psql's job to make these kinds of
decisions.

> (2) if I'm running an SQL script, I think it's extremely foolhardy
> to press on with executing the script as though nothing had happened.
> A backend crash is not an event to be lightly ignored.

Then rip out PQreset. To quote from the docs:

"This function will close the connection to the backend and attempt to
reestablish a new connection to the same postmaster, using all the same
parameters previously used. This may be useful for error recovery if a
working connection is lost."

I don't know all the possible ways a backend can go down, but one of them
might be a short network failure. In that case attempting a reset might be
the reasonable thing to do. Again, this should be addressed at the libpq
level.

> 
> It's buggy because: it doesn't work reliably.  While poking at the
> backend's problems with oversize btree index entries, I saw psql claim
> it had successfully reconnected, and then go into a catatonic state.

Look at the above code; seems like a libpq problem.

> It wouldn't give me a new command prompt (not even with ^C), wouldn't
> exit with ^D, and had to be killed from another shell window.
> 
> This behavior doesn't seem to happen for every crash, but I'm not
> really interested in trying to debug it.  I think the "feature"

I am. :)

> ought to be ripped out.

-- 
Peter Eisentraut                  Sernanders väg 10:115
peter_e@gmx.net                   75262 Uppsala
http://yi.org/peter-e/            Sweden