Re: Issues with Quorum Commit

From Aidan Van Dyk
Subject Re: Issues with Quorum Commit
Date
Msg-id AANLkTi=B0d75Pf4W4GUgKVHhCJs_Rh=CMWNp5xfT40B_@mail.gmail.com
In reply to Re: Issues with Quorum Commit  (Josh Berkus <josh@agliodbs.com>)
Responses Re: Issues with Quorum Commit  (Josh Berkus <josh@agliodbs.com>)
Re: Issues with Quorum Commit  ("Kevin Grittner" <Kevin.Grittner@wicourts.gov>)
Re: Issues with Quorum Commit  (Markus Wanner <markus@bluegap.ch>)
Re: Issues with Quorum Commit  (Simon Riggs <simon@2ndQuadrant.com>)
List pgsql-hackers
On Thu, Oct 7, 2010 at 1:22 PM, Josh Berkus <josh@agliodbs.com> wrote:

> So if you have k = 3 and N = 10, then you can have 10 standbys and only
> 3 of them need to ack any specific commit for the master to proceed. As
> long as (a) you retain at least one of the 3 which ack'd, and (b) you
> have some way of determining which standby is the most "caught up", data
> loss is fairly unlikely; you'd need to lose 4 of the 10, and the wrong
> 4, to lose data.
>
> The advantage of this for availability over just having k = N = 3 comes
> when one of the standbys is responding slowly (due to traffic) or goes
> offline unexpectedly due to a hardware failure.  In the k = N = 3 case,
> the system halts.  In the k = 3, N = 10 case, you can lose up to 7
> standbys without the system going down.
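
For reference, the availability arithmetic in the quoted text can be sketched roughly as below. This is only an illustration of the k-of-N counting; the function names are made up for the example and do not correspond to any PostgreSQL setting or API.

```python
def standbys_that_can_fail(n: int, k: int) -> int:
    """Number of the n standbys that can be down or slow while the
    master can still collect k acks for a commit."""
    if not 0 < k <= n:
        raise ValueError("need 0 < k <= n")
    return n - k


def commit_proceeds(acks_received: int, k: int) -> bool:
    """The master proceeds once at least k standbys have acked."""
    return acks_received >= k


# k = N = 3: one slow or dead standby is enough to stall commits.
print(standbys_that_can_fail(3, 3))    # 0
# k = 3, N = 10: up to 7 standbys can be lost.
print(standbys_that_can_fail(10, 3))   # 7
```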

Sure, but here is where I might not be following.

If you want "synchronous replication" because you want "query
availability" while making sure you're not getting "stale" queries from
all your slaves, then using your k < N (k = 3 and N = 10) situation is
screwing yourself.

To get "non-stale" responses, you can only query those k=3 servers.
But you've shot yourself in the foot because you don't know which
3/10 those will be.  The other 7 *are* stale (by definition).  They
talk about picking the "caught up" slave when the master fails, but
you actually need to do that for *every query*.
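
A toy model of that point, as a sketch only: it assumes the worst case, where the 7 standbys that did not ack have not applied the commit at all (in practice they may simply be lagging), and it uses a plain integer as a stand-in for a WAL position. All names here are hypothetical.

```python
import random


def commit(standby_pos, new_pos, k, rng):
    """Advance k randomly chosen standbys (the ones that win the race to
    ack) to new_pos; the rest keep their old position. Returns the set
    of standby indexes that acked."""
    ackers = set(rng.sample(range(len(standby_pos)), k))
    for i in ackers:
        standby_pos[i] = new_pos
    return ackers


def stale_standbys(standby_pos, master_pos):
    """Indexes of standbys that are behind the master's position."""
    return [i for i, pos in enumerate(standby_pos) if pos < master_pos]


rng = random.Random(0)        # fixed seed so the run is repeatable
positions = [0] * 10          # N = 10 standbys, all at position 0
ackers = commit(positions, new_pos=1, k=3, rng=rng)
stale = stale_standbys(positions, master_pos=1)

# Which 3 acked depends on the race; a client cannot know in advance.
print(sorted(ackers))
print(len(stale))             # 7
```

Which indexes end up in `ackers` changes from commit to commit, so a client wanting a non-stale read would have to find out the current set before every query.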

If you say they are "pretty close so by the time you get the query to
them they will be caught up", well then, all you really want is good
async replication, you don't really *need* the synchronous part.

The only case I see a "race to quorum" type of k < N being useful is
if you're just trying to duplicate data everywhere, but not actually
querying any of the replicas.  I can see that "all queries go to the
master, but the chances are pretty high that multiple machines are
going to fail so I want >> multiple replicas" being useful, but I
*don't* think that's what most people are wanting in their "I want 3
of 10 servers to ack the commit".

The difference between good async and sync is only the *guarantee*.
If you don't need the guarantee, you don't need the synchronous part.

a.


--
Aidan Van Dyk                                             Create like a god,
aidan@highrise.ca                                       command like a king,
http://www.highrise.ca/                                   work like a slave.

