Re: Synchronous Standalone Master Redoux

From Jose Ildefonso Camargo Tolosa
Subject Re: Synchronous Standalone Master Redoux
Date
Msg-id CAETJ_S_GReZ05SCy=dzAGN5+KAQ5gGmS5q-v2D7fU0_PkGJmtg@mail.gmail.com
In reply to Re: Synchronous Standalone Master Redoux  (Amit kapila <amit.kapila@huawei.com>)
Responses Re: Synchronous Standalone Master Redoux  (Josh Berkus <josh@agliodbs.com>)
List pgsql-hackers
On Sat, Jul 14, 2012 at 12:42 AM, Amit kapila <amit.kapila@huawei.com> wrote:
>> From: Jose Ildefonso Camargo Tolosa [ildefonso.camargo@gmail.com]
>> Sent: Saturday, July 14, 2012 9:36 AM
>>On Fri, Jul 13, 2012 at 11:12 PM, Amit kapila <amit.kapila@huawei.com> wrote:
>> From: pgsql-hackers-owner@postgresql.org [pgsql-hackers-owner@postgresql.org] on behalf of Jose Ildefonso Camargo Tolosa [ildefonso.camargo@gmail.com]
>> Sent: Saturday, July 14, 2012 6:08 AM
>> On Fri, Jul 13, 2012 at 10:22 AM, Bruce Momjian <bruce@momjian.us> wrote:
>>> On Fri, Jul 13, 2012 at 09:12:56AM +0200, Hampus Wessman wrote:
>>>
>>>>> So how about this for a Postgres TODO:
>>>>>
>>>>>         Add configuration variable to allow Postgres to disable synchronous
>>>>>         replication after a specified timeout, and add variable to alert
>>>>>         administrators of the change.
>>
>>>> I agree we need a TODO for this, but... I think timeout-only is not
>>>> the best choice. There should be a maximum timeout (as a last
>>>> resort: the maximum time we are willing to wait for the standby; this
>>>> has to have the option of "forever"), but PostgreSQL certainly has
>>>> to detect the *complete* disconnection of the standby (or of all standbys
>>>> in synchronous_standby_names). If it detects that no standbys are
>>>> eligible for sync standby AND the option to fall back to async is
>>>> enabled, it will go into standalone mode (as if
>>>> synchronous_standby_names were empty); otherwise (if the option is
>>>> disabled) it will just continue to wait forever (the "last resort"
>>>> timeout is ignored if the fallback option is disabled). I would
>>>> call these "soft_synchronous_standby" and
>>>> "soft_synchronous_standby_timeout" (in seconds, 0=forever, a sane
>>>> value would be ~5 seconds) or something like that (I'm quite bad at
>>>> picking names :( ).
>>
>> > After it has gone to standalone mode, if the standby comes back, will it be able to return to sync mode with it?
>
>> That's the idea, yes, after the standby comes back, the master would
>> act as if the sync standby connected for the first time: first going
>> through the "catchup" mode, and "once the lag between standby and
>> primary reaches zero "(...)" we move to real-time streaming state"
>> (from 9.1 docs), at that point: normal sync behavior is restored.
>
> Idea-wise, it looks okay, but are you sure that the current code/design can handle it the way you are suggesting?
> I am not sure it can work, because it might be the case that, due to network instability, the master has gone into standalone mode,
> and now that the standby is able to communicate again, it might be expecting to get more data rather than go into catchup mode.
> I believe someone who is an expert in this area of the code can comment here to make it more concrete.

Well, I'd need to dive into the code, but as far as I know, it is the
master that decides to be in "catchup" mode, and the standby just takes
care of sending feedback to the master.  Also, it has to handle this
situation already: currently, if the master goes away because it crashed
or because of network issues, the standby doesn't really know why, and
it will reconnect to the master and do whatever it needs to do to get in
sync with the master again (be it retrying the connection several times
while the master is restarting, or simply reconnecting to a waiting
master and requesting the pending WAL segments).  There has to be code in
place to handle those issues, because it is already working.  I'm trying
to get a solution that is as non-intrusive as possible, with as little
added code as possible, so that performance doesn't suffer: reuse the
current logic and actions, with small alterations.
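
To make the proposed behaviour easier to discuss, here is a rough sketch
in Python of the decision the master would make. This is not PostgreSQL
code, just a toy model. The two setting names are the hypothetical ones
suggested earlier in this thread (neither exists in PostgreSQL), and the
function and enum names are made up for illustration. It assumes that a
standby still in "catchup" is not yet eligible as a sync standby, and
that the timeout only applies while an eligible standby is connected but
slow to acknowledge.

```python
from dataclasses import dataclass
from enum import Enum


class StandbyState(Enum):
    DISCONNECTED = "disconnected"
    CATCHUP = "catchup"      # reconnected, still replaying missing WAL
    STREAMING = "streaming"  # lag reached zero, real-time streaming


class CommitMode(Enum):
    WAIT_FOR_STANDBY = "wait_for_standby"  # normal synchronous behaviour
    STANDALONE = "standalone"  # as if synchronous_standby_names were empty


@dataclass
class Settings:
    # Hypothetical settings from this thread; not real PostgreSQL parameters.
    soft_synchronous_standby: bool = False
    soft_synchronous_standby_timeout: float = 5.0  # seconds, 0 = forever


def standby_state(connected: bool, lag_bytes: int) -> StandbyState:
    """Model the master's view of a standby: catchup until lag is zero."""
    if not connected:
        return StandbyState.DISCONNECTED
    return StandbyState.STREAMING if lag_bytes == 0 else StandbyState.CATCHUP


def commit_mode(settings: Settings, standby: StandbyState,
                seconds_waiting: float) -> CommitMode:
    """Decide whether a commit keeps waiting or the master goes standalone."""
    if not settings.soft_synchronous_standby:
        # Fallback disabled: wait forever, the timeout is ignored.
        return CommitMode.WAIT_FOR_STANDBY
    if standby is not StandbyState.STREAMING:
        # Assumption: no eligible sync standby, so fall back right away.
        return CommitMode.STANDALONE
    timeout = settings.soft_synchronous_standby_timeout
    if timeout > 0 and seconds_waiting >= timeout:
        # "Last resort": the standby is connected but too slow to answer.
        return CommitMode.STANDALONE
    return CommitMode.WAIT_FOR_STANDBY
```

With the fallback enabled, a standby that reconnects goes through
CATCHUP while the master stays standalone, and sync behaviour resumes
only once standby_state() reports STREAMING.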

>
> With Regards,
> Amit Kapila.



--
Ildefonso Camargo
Command Prompt, Inc. - http://www.commandprompt.com/
PostgreSQL Support, Training, Professional Services and Development
High Availability, Oracle Conversion, Postgres-XC
@cmdpromptinc - 509-416-6579

