Re: max_standby_delay considered harmful

Поиск
Список
Период
Сортировка
От Robert Haas
Тема Re: max_standby_delay considered harmful
Дата
Msg-id AANLkTinigyOHRex2G2WrJKHlVZXPMpTTrjhg_WCbAeaM@mail.gmail.com
обсуждение исходный текст
Ответ на Re: max_standby_delay considered harmful  (Tom Lane <tgl@sss.pgh.pa.us>)
Ответы Re: max_standby_delay considered harmful  (Josh Berkus <josh@agliodbs.com>)
Список pgsql-hackers
On Mon, May 3, 2010 at 3:39 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> On Mon, May 3, 2010 at 11:37 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> I'm inclined to think that we should throw away all this logic and just
>>> have the slave cancel competing queries if the replay process waits
>>> more than max_standby_delay seconds to acquire a lock.
>
>> What if we somehow get into a situation where the replay process is
>> waiting for a lock over and over and over again, because it keeps
>> killing conflicting processes but something restarts them and they
>> take locks over again?
>
> They won't be able to take locks "over again", because the lock manager
> won't allow requests to pass a pending previous request, except in
> very limited circumstances that shouldn't hold here.  They'll queue
> up behind the replay process's lock request, not in front of it.
> (If that isn't the case, it needs to be fixed, quite independently
> of this concern.)

Well, the new backends needn't try to take "the same" locks as the
existing backends - the point is that in the worst case this proposal
means waiting max_standby_delay for EACH replay that requires taking a
lock.  And that might be a LONG time.

One idea I had while thinking this over was to bound the maximum
amount of unapplied WAL rather than the absolute amount of time lag.
Now, that's a little fruity, because your WAL volume might fluctuate
considerably, so you wouldn't really know how far the slave was behind
the master chronologically.  However, it would avoid all the time skew
issues, and it would also more accurately model the idea of a bound on
recovery time should we need to promote the standby to master, so
maybe it works out to a win.  You could still end up stuck
semi-permanently behind, but never by more than N segments.

Stephen's idea of a mode where we wait up to max_standby_delay for a
lock but then kill everything in our path until we've caught up again
is another possible way of approaching this problem, although it may
lead to "kill storms".  Some of that may be inevitable, though: a
bound on WAL lag has the same issue - if the primary is generating WAL
faster than the standby can apply it, the standby will eventually
decide to slaughter everything in its path.

...Robert


В списке pgsql-hackers по дате отправления:

Предыдущее
От: Simon Riggs
Дата:
Сообщение: Re: max_standby_delay considered harmful
Следующее
От: Robert Haas
Дата:
Сообщение: Re: buildfarm building all live branches from git