Re: Issues with Quorum Commit

From: Simon Riggs
Subject: Re: Issues with Quorum Commit
Date:
Msg-id: 1286313540.2025.2923.camel@ebony
In response to: Re: Issues with Quorum Commit  (Jeff Davis <pgsql@j-davis.com>)
Responses: Re: Issues with Quorum Commit  (Jeff Davis <pgsql@j-davis.com>)
List: pgsql-hackers
On Tue, 2010-10-05 at 13:45 -0700, Jeff Davis wrote:
> On Tue, 2010-10-05 at 12:11 -0700, Josh Berkus wrote:
> > B. Eventual Inconsistency
> > -------------------------
> > If we have a quorum commit, it's possible for any individual standby to
> > be indefinitely ahead of any standby which is not needed by the quorum.
> >  This means that:
> > 
> > -- There is no clear criteria for when a standby which is not needed for
> > quorum should be considered no longer a synch standby, and
> > -- Applications cannot make assumptions that synch rep promises some
> > specific window of synchronicity, eliminating a lot of the value of
> > quorum commit.
> 
> Point B seems particularly dangerous.
> 
> When you lose one of the systems and the lagging server becomes required
> for quorum, then all of a sudden you could be facing a huge delay to
> commit the next transaction (because it needs to catch up on a lot of
> WAL replay). This can happen even without a network problem at all, and
> seems very likely to result in the lagging system being considered
> "down" due to a timeout. Not good, because the reason it is required for
> quorum is because another standby just went down.
> 
> In other words, a lagging standby combined with a timeout mechanism is
> essentially useless, because it will never catch up in time to be a part
> of the quorum.

Thanks for explaining what was meant.

This issue is a serious problem for the "apply to *all* servers" mode that
Heikki has been describing as a useful use case. We register a
standby, it goes down and we decide to wait for it. Then when it does
come back up it takes ages to catch up.

This is really the nail in the coffin for the "All" servers use case,
and a significant blow to the requirement for standby registration.

If we use N+1 redundancy as I have explained, then this situation does
not occur until you have fewer than N standbys available. But then it's
no surprise that RAID-5 won't work with 4 drives either.
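
To make the N+1 argument concrete, here is a minimal sketch of how long a
commit waits under a quorum rule. It is a toy model, not how PostgreSQL
implements anything: the function name commit_wait, the list of per-standby
acknowledgement delays and the numbers are all assumptions made up for
illustration. A standby that is down is modelled as an infinite delay.

```python
import math


def commit_wait(ack_delays, quorum):
    """Return the time until `quorum` standbys have acknowledged a commit.

    ack_delays: hypothetical per-standby acknowledgement delay in seconds;
                math.inf marks a standby that is currently down.
    quorum:     number of acknowledgements the master waits for.
    """
    if quorum <= 0:
        return 0.0  # nothing to wait for
    if quorum > len(ack_delays):
        return math.inf  # more acks required than standbys registered
    # The commit completes when the quorum-th fastest ack arrives.
    return sorted(ack_delays)[quorum - 1]


# N = 2 acknowledgements required, N + 1 = 3 standbys registered.
healthy = [0.01, 0.02, 0.03]
one_down = [0.01, math.inf, 0.03]
two_down = [0.01, math.inf, math.inf]

print(commit_wait(healthy, 2))
print(commit_wait(one_down, 2))
print(commit_wait(two_down, 2))
print(commit_wait(one_down, 3))  # the "All" servers case with one standby down
```

In this model, losing one of the three standbys still lets a quorum of 2
complete. The wait becomes unbounded only once fewer than 2 standbys are
available, or as soon as any standby is down when every registered standby
must acknowledge.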

-- Simon Riggs           www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Training and Services


