Re: Issues with Quorum Commit

Поиск

Список

Период

Сортировка

От	Bruce Momjian
Тема	Re: Issues with Quorum Commit
Дата	20 октября 2010 г. 21:49:31
Msg-id	201010210049.o9L0n6114296@momjian.us обсуждение исходный текст
Ответ на	Re: Issues with Quorum Commit (Tom Lane <tgl@sss.pgh.pa.us>)
Список	pgsql-hackers

Дерево обсуждения

Tom Lane wrote:
> Greg Smith <greg@2ndquadrant.com> writes:
> > I don't see this as needing any implementation any more complicated than 
> > the usual way such timeouts are handled.  Note how long you've been 
> > trying to reach the standby.  Default to -1 for forever.  And if you hit 
> > the timeout, mark the standby as degraded and force them to do a proper 
> > resync when they disconnect.  Once that's done, then they can re-enter 
> > sync rep mode again, via the same process a new node would have done so.
> 
> Well, actually, that's *considerably* more complicated than just a
> timeout.  How are you going to "mark the standby as degraded"?  The
> standby can't keep that information, because it's not even connected
> when the master makes the decision.  ISTM that this requires
> 
> 1. a unique identifier for each standby (not just role names that
> multiple standbys might share);
> 
> 2. state on the master associated with each possible standby -- not just
> the ones currently connected.
> 
> Both of those are perhaps possible, but the sense I have of the
> discussion is that people want to avoid them.
> 
> Actually, #2 seems rather difficult even if you want it.  Presumably
> you'd like to keep that state in reliable storage, so it survives master
> crashes.  But how you gonna commit a change to that state, if you just
> lost every standby (suppose master's ethernet cable got unplugged)?
> Looks to me like it has to be reliable non-replicated storage.  Leaving
> aside the question of how reliable it can really be if not replicated,
> it's still the case that we have noplace to put such information given
> the WAL-is-across-the-whole-cluster design.

I assumed we would have a parameter called "sync_rep_failure" that would
take a command and the command would be called when communication to the
slave was lost.  If you restart, it tries again and might call the
function again.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + It's impossible for everything to be true. +

В списке pgsql-hackers по дате отправления:

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Re: Issues with Quorum Commit