Re: Issues with Quorum Commit

From: Robert Haas
Subject: Re: Issues with Quorum Commit
Msg-id: AANLkTikg6e2XoUa2Adsccxyvd2BiV0j9mdx9=DhLz6Zb@mail.gmail.com
In response to: Re: Issues with Quorum Commit  (Simon Riggs <simon@2ndQuadrant.com>)
Responses: Re: Issues with Quorum Commit  (Simon Riggs <simon@2ndQuadrant.com>)
List: pgsql-hackers

On Tue, Oct 5, 2010 at 5:10 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
> The points appear to be directed at "quorum commit", which is a name
> I've used. But most of the points apply more to Fujii's patch than my
> own. I can only presume that Josh wants to prevent us from adopting a
> design that allows sync against multiple standbys.

This looks to me like a cheap shot that doesn't advance the
discussion.  You are the first to complain when people don't take your
ideas as seriously as you feel they should.

>> > A. Permanent Synchronization Failure
>> > ---------------------------------
>> > Quorum commit, like other forms of more-than-one-standby synch rep,
>> > offers the possibility that one or more standbys could end up
>> > irretrievably desynchronized with the master.
>> >
>> > 1. Quorum is 3 servers (out of 5) with mode "apply"
>> > 2. Standbys 2 and 4 receive and apply transaction # 20001.
>> > 3. Due to a network issue, no other standby applies #20001.
>> > 4. Accordingly, the master rolls back #20001 and cancels, either due to
>> > timeout or DBA cancel.
>>
>> The master can not roll back or cancel the transaction. That's
>> completely infeasible, the WAL record has been written to local disk
>> already. The best it can do is halt and wait for enough standbys to
>> appear to fulfill the quorum. The client will hang waiting for the
>> COMMIT to finish, and the transaction will appear as in-progress to
>> other transactions.
>
> Yes, that point has long been understood. Neither patch does this, and
> in fact the issue is a completely general one.

Yep.
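
The behavior described in the quoted text can be sketched in a few lines of Python. This is a toy model under stated assumptions, not code from either patch: the class `QuorumMaster` and its methods are hypothetical names, standbys are plain strings, and "on local disk" is modeled as membership in a set. It only illustrates that a commit short of quorum stays in-progress rather than being rolled back.

```python
class QuorumMaster:
    """Toy model of a master waiting for a quorum of standby acknowledgments.

    Hypothetical illustration only; names and structure are assumptions.
    """

    def __init__(self, quorum):
        self.quorum = quorum
        self.local_wal = set()  # xids whose commit record is already on local disk
        self.acks = {}          # xid -> set of standbys that acknowledged it

    def commit(self, xid):
        # Once the commit record is written locally there is no way back:
        # this model deliberately offers no rollback operation.
        self.local_wal.add(xid)
        self.acks.setdefault(xid, set())

    def acknowledge(self, xid, standby):
        self.acks.setdefault(xid, set()).add(standby)

    def status(self, xid):
        if xid not in self.local_wal:
            return "unknown"
        if len(self.acks[xid]) >= self.quorum:
            return "committed"
        # Short of quorum: the client keeps waiting and other
        # transactions see this one as still in progress.
        return "in-progress"


master = QuorumMaster(quorum=3)
master.commit(20001)
master.acknowledge(20001, "standby2")
master.acknowledge(20001, "standby4")
print(master.status(20001))  # in-progress
master.acknowledge(20001, "standby5")
print(master.status(20001))  # committed
```

With the scenario from the quoted list (quorum of 3, only standbys 2 and 4 responding), the transaction simply stays in-progress until a third acknowledgment arrives.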

>> There's a subtle point here that I don't think has been discussed yet: If
>> the master is forcibly restarted at that point, with pg_ctl restart -m
>> immediate, strictly speaking the master should start up in the same
>> state, with the unlucky transaction still appearing as in-progress,
>> until the standby acknowledges.
>
> That is a very important point, but again, nothing to do with quorum
> commit. For strict correctness, we should do that. Are you suggesting we
> should do that here?

I agree that this has nothing to do with quorum commit.  It does have
to do with synchronous replication, but I'm skeptical that we want to
get into it for this release, if ever.

>> > 5. #2 and #5 are now hopelessly out of synch with the master.
>>
>> > B. Eventual Inconsistency
>> > -------------------------
>> > If we have a quorum commit, it's possible for any individual standby to
>> > be indefinitely ahead of any standby which is not needed by the quorum.
>> >   This means that:
>> >
>> > -- There is no clear criteria for when a standby which is not needed for
>> > quorum should be considered no longer a synch standby, and
>> > -- Applications cannot make assumptions that synch rep promises some
>> > specific window of synchronicity, eliminating a lot of the value of
>> > quorum commit.
>>
>> Yep.
>
> Could the person that wrote that actually explain what a "specific
> window of synchronicity" is? I'm not sure whether to agree, or disagree.

Me either.

>> > C. Performance
>> > --------------
>> > Doing quorum commit requires significant extra accounting on the
>> > master's part: it must keep track of how many standbys committed for
>> > each pending transaction (and remember there may be many at the same
>> > time).
>> >
>> > Doing so could involve significant response-time overhead added to the
>> > simple case where there is only one standby, as well as memory usage,
>> > and likely a lot of troubleshooting of the mechanism from us.
>>
>> My gut feeling is that overhead will pale to insignificance compared to
>> the network and other overheads of actually getting the WAL to the
>> standby and processing the acknowledgments.
>
> You're ignoring Josh's points. Those exact points have been made by me
> in support of the design of my patch and against Fujii's. The mechanism
> to do this will be more complex and more likely to break. And it will be
> slower and that is a concern for me.

I don't think Heikki ignored Josh's points, and I do think Heikki's
analysis is correct.
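
As a rough illustration of why the bookkeeping need not be expensive, here is a minimal Python sketch. It rests on an assumption that is not stated in this thread: each standby reports a single cumulative WAL position, so an acknowledgment of position X covers every earlier commit. Under that assumption the master needs one number per standby rather than a counter per pending transaction. The function name `quorum_lsn` and the use of plain integers for WAL positions are hypothetical.

```python
def quorum_lsn(acked_lsn, quorum):
    """Return the highest WAL position acknowledged by at least `quorum` standbys.

    acked_lsn maps a standby name to the highest position it has acknowledged.
    Returns 0 when fewer than `quorum` standbys have reported at all.
    Hypothetical helper; positions are plain integers for simplicity.
    """
    if quorum < 1:
        raise ValueError("quorum must be at least 1")
    positions = sorted(acked_lsn.values(), reverse=True)
    if len(positions) < quorum:
        return 0
    # The quorum-th largest position is one that `quorum` standbys have reached.
    return positions[quorum - 1]


acked = {"s1": 100, "s2": 250, "s3": 180, "s4": 250, "s5": 90}
print(quorum_lsn(acked, 3))  # 180
```

Every commit waiting at or below the returned position can be released at once, so the cost per acknowledgment depends on the number of standbys, not on the number of transactions in flight.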

>> > D. Adding/Replacing Quorum Members
>> > ----------------------------------
>> > For Quorum commit to be really valuable, we need to be able to add new
>> > quorum members and remove dead ones *without stopping the master*.  Per
>> > discussion about the startup issues with only one master, we have not
>> > worked out how to do this for synch rep standbys.  It's reasonable to
>> > assume that this will be more complex for a quorum group than with a
>> > single synch standby.
>> >
>> > Consider the case, for example, where due to a network outage we have
>> > dropped below quorum.  What is the strategy for getting the system
>> > running again by adding standbys?
>>
>> You start a new one from the latest base backup and let it catch up?
>> Possibly modifying the config file in the master to let it know about
>> the new standby, if we go down that path. This part doesn't seem
>> particularly hard to me.
>
> Agreed, not sure of the issue there.

Also agreed.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

