Re: beta3 & the open items list

From Robert Haas
Subject Re: beta3 & the open items list
Date
Msg-id AANLkTikhcwlko4QsPonDRiYbgxdscH-9dR2IBqqmgdAT@mail.gmail.com
In response to Re: beta3 & the open items list  (Greg Stark <gsstark@mit.edu>)
Responses Re: beta3 & the open items list  (Greg Stark <gsstark@mit.edu>)
List pgsql-hackers
On Sun, Jun 20, 2010 at 9:31 PM, Greg Stark <gsstark@mit.edu> wrote:
> On Mon, Jun 21, 2010 at 12:42 AM, Florian Pflug <fgp@phlo.org> wrote:
>> I'd buy that if all timeouts and retry counts would default to
>> +infinity. But they don't, and hence sufficiently long network
>> outages *will* cause connection aborts anyway. That a particular
>> connection might survive due to inactivity proves nothing, since
>> whether the connection is active or inactive during an outage is
>> usually outside of anyone's control.
>>
>> I really fail to see why anyone would prefer connections (and
>> therefore transactions!) getting stuck forever over a few spurious
>> disconnects. The former always require manual intervention and cause
>> all sorts of performance and disk-space issues, while the latter
>> won't even be an issue for well-written clients who just reconnect
>> and retry.
>>
>
> So just as a data point I'm routinely annoyed by reopening my screen
> session and finding various sessions have died since the day
> before. Usually this is caused by broken firewalls but there are also
> a bunch of SSH options which some servers have enabled which cause my
> sessions to never survive very long if there are any network outages.
> Servers where those options are disabled work fine.
>
> I admit this is a very different use case though and since we have
> control over the behaviour when the connection breaks perhaps the
> analogy falls apart completely. I'm not sure we can guarantee that
> reconnecting is always so simple though. What if the user set up an
> SSH gateway or needs some extra authentication to make the connection?
> Are users expecting the slave to randomly disconnect and reconnect
> willy nilly or are they expecting that once it connects it'll keep
> using that connection forever?

I feel like we're getting off in the weeds, here.  Obviously, the user
would ideally like the connection to the master to last forever, but
equally obviously, if the master unexpectedly reboots, they'd like the
slave to notice - ideally within some reasonable time period - that it
needs to reconnect.  There's no perfect way to distinguish "the master
croaked" from "the network administrator unplugged the Ethernet cable
and is planning to plug it back in any hour now", so we'll just need
to pick some reasonable timeout and go with it.  To my way of
thinking, if the master hasn't responded in a minute or two, that's a
sign that it's time to declare the connection dead.  Retrying the
connection *should* be cheap.  If the user has set things up so that a
TCP connection from slave to master is not straightforward, the user
has configured it incorrectly, and no matter what we do it's not going
to be reliable.

I still think there's a decent argument that we might want to have a
protocol-level heartbeat rather than a TCP-level heartbeat.  But doing
the latter is, I think, good enough for 9.0.  We're pretty much
speculating about what the problems with that approach might be, so
getting too worked up about fixing them at this point seems premature.
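
For illustration only, a TCP-level heartbeat of the kind discussed above can be requested per socket through the operating system's keepalive options. The sketch below is not taken from the PostgreSQL source; the helper name `enable_tcp_keepalive` and the default timings (chosen to land on the "minute or two" mentioned earlier) are assumptions, and the three tuning options are platform-specific, so any the platform does not expose are skipped.

```python
import socket


def enable_tcp_keepalive(sock, idle=60, interval=10, count=6):
    """Enable TCP keepalive on sock and tune probe timing where possible.

    Hypothetical helper with assumed defaults. Returns the approximate
    number of seconds before an idle, dead peer is detected, assuming
    all three tuning options were actually applied.
    """
    # Ask the kernel to probe the peer when the connection is idle.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)

    # These constants exist on Linux; other platforms may lack some of
    # them, in which case the system-wide defaults stay in effect.
    tuning = (
        ("TCP_KEEPIDLE", idle),       # idle seconds before the first probe
        ("TCP_KEEPINTVL", interval),  # seconds between unanswered probes
        ("TCP_KEEPCNT", count),       # unanswered probes before giving up
    )
    for name, value in tuning:
        option = getattr(socket, name, None)
        if option is not None:
            sock.setsockopt(socket.IPPROTO_TCP, option, value)

    return idle + interval * count


if __name__ == "__main__":
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    print("dead peer noticed after roughly", enable_tcp_keepalive(s), "seconds")
    s.close()
```

With these assumed values the connection would be declared dead after about two minutes of silence. Keepalive probes only fire on an idle connection, which is one reason a protocol-level heartbeat might still be worth considering later.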

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

