Обсуждение: Should we remove "not fast" promotion at all?

Поиск
Список
Период
Сортировка

Should we remove "not fast" promotion at all?

От
Fujii Masao
Дата:
Hi all,

We discussed the $SUBJECT in the following threads:
http://www.postgresql.org/message-id/CA+TgmoZbR+WL8E7MF_KRp6fY4FD2pMr11TPiuyjMFX_Vtg1Wrw@mail.gmail.com
http://www.postgresql.org/message-id/CAHGQGwEBUvgcx8X+Z0Hh+VdwYcJ8KCuRuLt1jSsxeLxPcX=0_w@mail.gmail.com

Our consensus seems to remove "not fast" promotion at all
because there is no use case for that promotion.

Attached patch removes "not fast" promotion. Barring any objections,
I will commit this patch.

Regards,

On Sat, Aug 3, 2013 at 4:31 PM, Tomonari Katsumata
<t.katsumata1122@gmail.com> wrote:
> Hi,
>
> I made a patch for REL9_3_STABLE which gets rid of
> old promote processing. please check it.
> This patch make PostgreSQL do fast promoting(*) always.
> (*) which means skipping long checkpoint before increasing
> timeline.
>
> And after this, I'll do make another patch for unlinking files which are
> created by user as a trigger_file or "pg_ctl promote" command.
>
> ---------------
> Tomonari Katsumata
> 2013/7/30 Fujii Masao <masao.fujii@gmail.com>
>>
>> On Sat, Jul 27, 2013 at 6:57 PM, Tomonari Katsumata
>> <t.katsumata1122@gmail.com> wrote:
>> > Hi,
>> >
>> >
>> >>>> Yes, it prevents PROMOTE_SIGNAL_FILE from remaining even if
>> >>>> both promote files exist.
>> >>>>
>> >>> The command("unlink(PROMOTE_SIGNAL_FILE)") here is for
>> >>> unusualy case.
>> >>> Because the case is when done both procedures below.
>> >>>  - user create "promote" file on PGDATA
>> >>>  - user issue "pg_ctl promote"
>> >>>
>> >>> I understand the reason.
>> >>> But I think it's better to unlink(PROMOTE_SIGNAL_FILE) before
>> >>> unlink(FAST_PROMOTE_SIGNAL_FILE).
>> >>> Because FAST_PROMOTE_SIGNAL_FILE is definetly there but
>> >>> PROMOTE_SIGNAL_FILE is sometimes there or not there.
>> >>
>> >> I could not understand why that's better. Could you elaborate that?
>> >>
>> > I'm sorry for less explanation.
>> >
>> > I've thought that errno would be set ENOENT and
>> > this may lead something wrong.
>> > I checked this and I know it's not problem.
>> >
>> > sorry for confusing you.
>> >
>> >
>> >
>> >>> And I have another question linking this behavior.
>> >>> I think TriggerFile should be removed too.
>> >>> This is corner-case but it will happen.
>> >>> How do you think of it ?
>> >>
>> >> I don't have strong opinion about that. I've never heard the complaint
>> >> about that current behavior so far.
>> >>
>> > For example, please imagine the cascading replication environment and
>> > using old master as a standby without copying the timeline history file
>> > to new standby.
>> >
>> > -------
>> > 1. replicating 3 servers(A,B,C)
>> > A->B->C
>> > ("trigger_file = /tmp/trig" is set in recovery_recovery.conf on B and
>> > C.)
>> >
>> > 2. stop server A and promoting server B with "touch /tmp/trig;pg_ctl
>> > promote"
>>
>> Why do you need to both create the trigger file and run pg_ctl promote?
>>
>> Anyway, if the patch is useful for fail-safe and it doesn't break the
>> current
>> behavior, I'd be happy to apply it. You are suggesting that we should
>> remove
>> the trigger file in CheckForStandbyTrigger() even if pg_ctl promote is
>> executed.
>> But there can be some cases where we can get out of the WAL replay loop,
>> for example, reach the recovery_target_xxx. So ISTM we should try to
>> remove
>> both the trigger file and "promote" file at the end of recovery
>> instead. Thought?
>>
>> > B->C
>> > (/tmp/trig file remains on server B)
>> >
>> > 4. stop server B and promoting server C with "pg_ctl promote"
>> > C
>> >
>> > 5. making server B connect for standby of server C
>> > C->B
>> > ---------
>> >
>> > In step5 server B will promote as soon as it starts,
>> > because "/tmp/trig" is stil there.
>> >
>> >
>> >
>> >>>> One question is that: we really still need to support normal promote?
>> >>>> pg_ctl promote provides only way to do fast promotion. If we want to
>> >>>> do normal promotion, we need to create PROMOTE_SIGNAL_FILE
>> >>>> and send the SIGUSR1 signal to postmaster by hand. This seems messy.
>> >>>>
>> >>>> I think that we should remove normal promotion at all, or change
>> >>>> pg_ctl promote so that provides also the way to do normal promotion.
>> >>>>
>> >>> I think he merit of "fast promote" is
>> >>>  - allowing quick connection by skipping checkpoint
>> >>> and its demerit is
>> >>>  - taking little bit longer when crash-recovery
>> >>>
>> >>> If it is seldom to happen its crash soon after promoting
>> >>> and "fast promte" never breaks consistency of database cluster,
>> >>> I think we don't need normal promotion.
>> >>
>> >> You can execute checkpoint after fast promotion for that.
>> >>
>> > OK.
>> > Then I think we should do below things.
>> > - removing normal promotion at all from source
>> > - adding the know-how you suggest on document
>>
>> IMO either is necessary.
>>
>> Regards,
>>
>> --
>> Fujii Masao
>
>



--
Fujii Masao

Вложения

Re: Should we remove "not fast" promotion at all?

От
Michael Paquier
Дата:
On Tue, Aug 6, 2013 at 3:24 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
> Hi all,
>
> We discussed the $SUBJECT in the following threads:
> http://www.postgresql.org/message-id/CA+TgmoZbR+WL8E7MF_KRp6fY4FD2pMr11TPiuyjMFX_Vtg1Wrw@mail.gmail.com
> http://www.postgresql.org/message-id/CAHGQGwEBUvgcx8X+Z0Hh+VdwYcJ8KCuRuLt1jSsxeLxPcX=0_w@mail.gmail.com
>
> Our consensus seems to remove "not fast" promotion at all
> because there is no use case for that promotion.
Indeed, if two modes of promotion are available, it is not that
user-friendly if pg_ctl does not support both directly.
       struct stat stat_buf;

-       if (stat(PROMOTE_SIGNAL_FILE, &stat_buf) == 0 ||
-               stat(FAST_PROMOTE_SIGNAL_FILE, &stat_buf) == 0)
+       /*
+        * In 9.1 and 9.2 the postmaster unlinked the promote file inside the
+        * signal handler. We now leave the file in place and let the Startup
+        * process do the unlink. This is the infrastructure for supporting
+        * various promotion modes in the future. This allows Startup to know
+        * the mode from the promote signal file that the postmaster left.
+        */
+       if (stat(PROMOTE_SIGNAL_FILE, &stat_buf) == 0)               return true;
Why not reshuffle this comment and remove references to 9.1 and 9.2?
Something like that perhaps:
Leave the promote signal file in place and let the Startup do the
unlink. This infrastructure permits Startup to know the mode from the
promote signal file that postmaster left, keeping the door open for
support of multiple promotion modes in the future.

-               /*
-                * In 9.1 and 9.2 the postmaster unlinked the promote
file inside the
-                * signal handler. We now leave the file in place and
let the Startup
-                * process do the unlink. This allows Startup to know
whether we're
-                * doing fast or normal promotion. Fast promotion
takes precedence.
-                */
-               if (stat(FAST_PROMOTE_SIGNAL_FILE, &stat_buf) == 0)
-               {
-                       unlink(FAST_PROMOTE_SIGNAL_FILE);
-                       unlink(PROMOTE_SIGNAL_FILE);
-                       fast_promote = true;
-               }
-               else if (stat(PROMOTE_SIGNAL_FILE, &stat_buf) == 0)
-               {
-                       unlink(PROMOTE_SIGNAL_FILE);
-                       fast_promote = false;
-               }
-               ereport(LOG, (errmsg("received promote request")));
-
+               unlink(PROMOTE_SIGNAL_FILE);
Wouldn't it make sense to keep the call to stat() to check the file
status before unlinking it?
-- 
Michael



Re: Should we remove "not fast" promotion at all?

От
Andres Freund
Дата:
Hi,

On 2013-08-06 03:24:58 +0900, Fujii Masao wrote:
> Hi all,
> 
> We discussed the $SUBJECT in the following threads:
> http://www.postgresql.org/message-id/CA+TgmoZbR+WL8E7MF_KRp6fY4FD2pMr11TPiuyjMFX_Vtg1Wrw@mail.gmail.com
> http://www.postgresql.org/message-id/CAHGQGwEBUvgcx8X+Z0Hh+VdwYcJ8KCuRuLt1jSsxeLxPcX=0_w@mail.gmail.com
> 
> Our consensus seems to remove "not fast" promotion at all
> because there is no use case for that promotion.
> 
> Attached patch removes "not fast" promotion. Barring any objections,
> I will commit this patch.

FWIW I'd rather keep plain promotion for a release or two. TBH, I have a
bit of trust issues regarding the new method, and I'd like to be able to
test potential issues against a stock postgres by doing a normal instead
of a fast promotion.

Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services



Re: Should we remove "not fast" promotion at all?

От
Fujii Masao
Дата:
On Tue, Aug 6, 2013 at 11:20 AM, Michael Paquier
<michael.paquier@gmail.com> wrote:
> On Tue, Aug 6, 2013 at 3:24 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
>> Hi all,
>>
>> We discussed the $SUBJECT in the following threads:
>> http://www.postgresql.org/message-id/CA+TgmoZbR+WL8E7MF_KRp6fY4FD2pMr11TPiuyjMFX_Vtg1Wrw@mail.gmail.com
>> http://www.postgresql.org/message-id/CAHGQGwEBUvgcx8X+Z0Hh+VdwYcJ8KCuRuLt1jSsxeLxPcX=0_w@mail.gmail.com
>>
>> Our consensus seems to remove "not fast" promotion at all
>> because there is no use case for that promotion.
> Indeed, if two modes of promotion are available, it is not that
> user-friendly if pg_ctl does not support both directly.
>
>         struct stat stat_buf;
>
> -       if (stat(PROMOTE_SIGNAL_FILE, &stat_buf) == 0 ||
> -               stat(FAST_PROMOTE_SIGNAL_FILE, &stat_buf) == 0)
> +       /*
> +        * In 9.1 and 9.2 the postmaster unlinked the promote file inside the
> +        * signal handler. We now leave the file in place and let the Startup
> +        * process do the unlink. This is the infrastructure for supporting
> +        * various promotion modes in the future. This allows Startup to know
> +        * the mode from the promote signal file that the postmaster left.
> +        */
> +       if (stat(PROMOTE_SIGNAL_FILE, &stat_buf) == 0)
>                 return true;
> Why not reshuffle this comment and remove references to 9.1 and 9.2?

I just left the old comment as it is. I'm OK with your suggestion.

> -               /*
> -                * In 9.1 and 9.2 the postmaster unlinked the promote
> file inside the
> -                * signal handler. We now leave the file in place and
> let the Startup
> -                * process do the unlink. This allows Startup to know
> whether we're
> -                * doing fast or normal promotion. Fast promotion
> takes precedence.
> -                */
> -               if (stat(FAST_PROMOTE_SIGNAL_FILE, &stat_buf) == 0)
> -               {
> -                       unlink(FAST_PROMOTE_SIGNAL_FILE);
> -                       unlink(PROMOTE_SIGNAL_FILE);
> -                       fast_promote = true;
> -               }
> -               else if (stat(PROMOTE_SIGNAL_FILE, &stat_buf) == 0)
> -               {
> -                       unlink(PROMOTE_SIGNAL_FILE);
> -                       fast_promote = false;
> -               }
> -
>                 ereport(LOG, (errmsg("received promote request")));
> -
> +               unlink(PROMOTE_SIGNAL_FILE);
> Wouldn't it make sense to keep the call to stat() to check the file
> status before unlinking it?

Why do we need to check the existence of the file before removing it
here?

Regards,

-- 
Fujii Masao



Re: Should we remove "not fast" promotion at all?

От
Fujii Masao
Дата:
On Tue, Aug 6, 2013 at 11:40 AM, Andres Freund <andres@2ndquadrant.com> wrote:
> Hi,
>
> On 2013-08-06 03:24:58 +0900, Fujii Masao wrote:
>> Hi all,
>>
>> We discussed the $SUBJECT in the following threads:
>> http://www.postgresql.org/message-id/CA+TgmoZbR+WL8E7MF_KRp6fY4FD2pMr11TPiuyjMFX_Vtg1Wrw@mail.gmail.com
>> http://www.postgresql.org/message-id/CAHGQGwEBUvgcx8X+Z0Hh+VdwYcJ8KCuRuLt1jSsxeLxPcX=0_w@mail.gmail.com
>>
>> Our consensus seems to remove "not fast" promotion at all
>> because there is no use case for that promotion.
>>
>> Attached patch removes "not fast" promotion. Barring any objections,
>> I will commit this patch.
>
> FWIW I'd rather keep plain promotion for a release or two. TBH, I have a
> bit of trust issues regarding the new method, and I'd like to be able to
> test potential issues against a stock postgres by doing a normal instead
> of a fast promotion.

So we should add new option specifying the promotion mode, into pg_ctl?
Currently pg_ctl cannot trigger the normal promotion.

Or, instead of normal promotion, it might be better to use another promotion
technique like shutdown + remove recovery.conf + restart for that purpose?

Regards,

-- 
Fujii Masao



Re: Should we remove "not fast" promotion at all?

От
Andres Freund
Дата:

Fujii Masao <masao.fujii@gmail.com> schrieb:
>On Tue, Aug 6, 2013 at 11:40 AM, Andres Freund <andres@2ndquadrant.com>
>wrote:
>> Hi,
>>
>> On 2013-08-06 03:24:58 +0900, Fujii Masao wrote:
>>> Hi all,
>>>
>>> We discussed the $SUBJECT in the following threads:
>>>
>http://www.postgresql.org/message-id/CA+TgmoZbR+WL8E7MF_KRp6fY4FD2pMr11TPiuyjMFX_Vtg1Wrw@mail.gmail.com
>>>
>http://www.postgresql.org/message-id/CAHGQGwEBUvgcx8X+Z0Hh+VdwYcJ8KCuRuLt1jSsxeLxPcX=0_w@mail.gmail.com
>>>
>>> Our consensus seems to remove "not fast" promotion at all
>>> because there is no use case for that promotion.
>>>
>>> Attached patch removes "not fast" promotion. Barring any objections,
>>> I will commit this patch.
>>
>> FWIW I'd rather keep plain promotion for a release or two. TBH, I
>have a
>> bit of trust issues regarding the new method, and I'd like to be able
>to
>> test potential issues against a stock postgres by doing a normal
>instead
>> of a fast promotion.
>
>So we should add new option specifying the promotion mode, into pg_ctl?
>Currently pg_ctl cannot trigger the normal promotion.

I am fine with only supporting doing the promotion in the old fashioned way, but I wouldn't protest against an option
either.

>Or, instead of normal promotion, it might be better to use another
>promotion
>technique like shutdown + remove recovery.conf + restart for that
>purpose?

That's a very bad thing to do since it suppresses the timeline increase...

Regards,

Andres


Please excuse brevity and formatting - I am writing this on my mobile phone.

Andres Freund                       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: Should we remove "not fast" promotion at all?

От
Michael Paquier
Дата:
On Tue, Aug 6, 2013 at 12:41 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
>> -
>> +               unlink(PROMOTE_SIGNAL_FILE);
>> Wouldn't it make sense to keep the call to stat() to check the file
>> status before unlinking it?
>
> Why do we need to check the existence of the file before removing it
> here?
Forget what I said, I had in mind that it might have been better to
put in silence errors of unlink here. This is not mandatory.
-- 
Michael



Re: Should we remove "not fast" promotion at all?

От
Tom Lane
Дата:
Fujii Masao <masao.fujii@gmail.com> writes:
> On Tue, Aug 6, 2013 at 11:40 AM, Andres Freund <andres@2ndquadrant.com> wrote:
>> FWIW I'd rather keep plain promotion for a release or two. TBH, I have a
>> bit of trust issues regarding the new method, and I'd like to be able to
>> test potential issues against a stock postgres by doing a normal instead
>> of a fast promotion.

> So we should add new option specifying the promotion mode, into pg_ctl?
> Currently pg_ctl cannot trigger the normal promotion.

It would be silly to add such an option if we want to remove the old mode
in a release or two.

I think what Andres is suggesting is to leave it as-is for 9.4 and then
remove the old code in 9.5 or 9.6.  Which seems prudent to me.
        regards, tom lane



Re: Should we remove "not fast" promotion at all?

От
Michael Paquier
Дата:
On Tue, Aug 6, 2013 at 12:52 PM, Andres Freund <andres@2ndquadrant.com> wrote:
>
>
> Fujii Masao <masao.fujii@gmail.com> schrieb:
>>On Tue, Aug 6, 2013 at 11:40 AM, Andres Freund <andres@2ndquadrant.com>
>>wrote:
>>> Hi,
>>>
>>> On 2013-08-06 03:24:58 +0900, Fujii Masao wrote:
>>>> Hi all,
>>>>
>>>> We discussed the $SUBJECT in the following threads:
>>>>
>>http://www.postgresql.org/message-id/CA+TgmoZbR+WL8E7MF_KRp6fY4FD2pMr11TPiuyjMFX_Vtg1Wrw@mail.gmail.com
>>>>
>>http://www.postgresql.org/message-id/CAHGQGwEBUvgcx8X+Z0Hh+VdwYcJ8KCuRuLt1jSsxeLxPcX=0_w@mail.gmail.com
>>>>
>>>> Our consensus seems to remove "not fast" promotion at all
>>>> because there is no use case for that promotion.
>>>>
>>>> Attached patch removes "not fast" promotion. Barring any objections,
>>>> I will commit this patch.
>>>
>>> FWIW I'd rather keep plain promotion for a release or two. TBH, I
>>have a
>>> bit of trust issues regarding the new method, and I'd like to be able
>>to
>>> test potential issues against a stock postgres by doing a normal
>>instead
>>> of a fast promotion.
>>
>>So we should add new option specifying the promotion mode, into pg_ctl?
>>Currently pg_ctl cannot trigger the normal promotion.
>
> I am fine with only supporting doing the promotion in the old fashioned way, but I wouldn't protest against an option
either.
Yeah an additional option would be the way to go especially if new
promotion modes are supported in the future. Btw, what I like about
this patch is that it opens the door for easier support of additional
promotion modes. Could it be possible to use this advantage to support
both the fast and non-fast promotions now with a new fresh structure?

>>Or, instead of normal promotion, it might be better to use another
>>promotion
>>technique like shutdown + remove recovery.conf + restart for that
>>purpose?
>
> That's a very bad thing to do since it suppresses the timeline increase...
Agreed with Andres. This is unsafe as it avoids as well all the safety
checks at the end of archive recovery.
-- 
Michael



Re: Should we remove "not fast" promotion at all?

От
Tomonari Katsumata
Дата:
Hi,

2013/8/6 Tom Lane <tgl@sss.pgh.pa.us>
Fujii Masao <masao.fujii@gmail.com> writes:
> On Tue, Aug 6, 2013 at 11:40 AM, Andres Freund <andres@2ndquadrant.com> wrote:
>> FWIW I'd rather keep plain promotion for a release or two. TBH, I have a
>> bit of trust issues regarding the new method, and I'd like to be able to
>> test potential issues against a stock postgres by doing a normal instead
>> of a fast promotion.

> So we should add new option specifying the promotion mode, into pg_ctl?
> Currently pg_ctl cannot trigger the normal promotion.

It would be silly to add such an option if we want to remove the old mode
in a release or two.

I think what Andres is suggesting is to leave it as-is for 9.4 and then
remove the old code in 9.5 or 9.6.  Which seems prudent to me.

How about giving trigger_file an ability to chose "fast promote" and "normal promote"
like the triggerfile of pg_standby.
It means that if the contents of the trigger_file is empty or 'smart' then do "normal promote",
and it's 'fast' then do "fast promote".

I think this change would be smaller than change to pg_ctl.
And this would allow us to treat ${PGDATA}/promote and trigger_file only.
(because ${PGDATA}/fast_promote is not created automatically)

regards,
---------------
Tomonari Katsumata

Re: Should we remove "not fast" promotion at all?

От
Fujii Masao
Дата:
On Tue, Aug 6, 2013 at 1:07 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Fujii Masao <masao.fujii@gmail.com> writes:
>> On Tue, Aug 6, 2013 at 11:40 AM, Andres Freund <andres@2ndquadrant.com> wrote:
>>> FWIW I'd rather keep plain promotion for a release or two. TBH, I have a
>>> bit of trust issues regarding the new method, and I'd like to be able to
>>> test potential issues against a stock postgres by doing a normal instead
>>> of a fast promotion.
>
>> So we should add new option specifying the promotion mode, into pg_ctl?
>> Currently pg_ctl cannot trigger the normal promotion.
>
> It would be silly to add such an option if we want to remove the old mode
> in a release or two.

Without such an option, a user cannot easily trigger the "normal" promotion
when we find some problems in fast promotion. In this case, a user needs to
create the "promote" file and send the SIGUSR1 signal to postmaster by hand.
Or needs to execute pg_ctl promote by using old version (e.g., 9.2) of pg_ctl.
Seems confusing.

Regards,

-- 
Fujii Masao



Re: Should we remove "not fast" promotion at all?

От
Andres Freund
Дата:
On 2013-08-07 22:26:53 +0900, Fujii Masao wrote:
> On Tue, Aug 6, 2013 at 1:07 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > Fujii Masao <masao.fujii@gmail.com> writes:
> >> On Tue, Aug 6, 2013 at 11:40 AM, Andres Freund <andres@2ndquadrant.com> wrote:
> >>> FWIW I'd rather keep plain promotion for a release or two. TBH, I have a
> >>> bit of trust issues regarding the new method, and I'd like to be able to
> >>> test potential issues against a stock postgres by doing a normal instead
> >>> of a fast promotion.
> >
> >> So we should add new option specifying the promotion mode, into pg_ctl?
> >> Currently pg_ctl cannot trigger the normal promotion.
> >
> > It would be silly to add such an option if we want to remove the old mode
> > in a release or two.
> 
> Without such an option, a user cannot easily trigger the "normal" promotion
> when we find some problems in fast promotion. In this case, a user needs to
> create the "promote" file and send the SIGUSR1 signal to postmaster by hand.
> Or needs to execute pg_ctl promote by using old version (e.g., 9.2) of pg_ctl.
> Seems confusing.

Seems fine for debugging to me.

Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services



Re: Should we remove "not fast" promotion at all?

От
Michael Paquier
Дата:
On Tue, Aug 6, 2013 at 8:05 PM, Tomonari Katsumata
<t.katsumata1122@gmail.com> wrote:
> Hi,
>
> 2013/8/6 Tom Lane <tgl@sss.pgh.pa.us>
>>
>> Fujii Masao <masao.fujii@gmail.com> writes:
>> > On Tue, Aug 6, 2013 at 11:40 AM, Andres Freund <andres@2ndquadrant.com>
>> > wrote:
>> >> FWIW I'd rather keep plain promotion for a release or two. TBH, I have
>> >> a
>> >> bit of trust issues regarding the new method, and I'd like to be able
>> >> to
>> >> test potential issues against a stock postgres by doing a normal
>> >> instead
>> >> of a fast promotion.
>>
>> > So we should add new option specifying the promotion mode, into pg_ctl?
>> > Currently pg_ctl cannot trigger the normal promotion.
>>
>> It would be silly to add such an option if we want to remove the old mode
>> in a release or two.
>>
>> I think what Andres is suggesting is to leave it as-is for 9.4 and then
>> remove the old code in 9.5 or 9.6.  Which seems prudent to me.
>>
> How about giving trigger_file an ability to chose "fast promote" and "normal
> promote"
> like the triggerfile of pg_standby.
> It means that if the contents of the trigger_file is empty or 'smart' then
> do "normal promote",
> and it's 'fast' then do "fast promote".
> I think this change would be smaller than change to pg_ctl.
> And this would allow us to treat ${PGDATA}/promote and trigger_file only.
> (because ${PGDATA}/fast_promote is not created automatically)
Indeed, this would be the way to go to have an extensible format for
other promotion modes or other actions that could be triggered by a
standby. So why not taking the approach suggested by Katsumata-san
now? One single file to rule them all, in this case called promote,
including a keyword indicating the promotion action to take. This
could be controlled by pg_ctl entirely, and opens the door to extra
possible modes.
-- 
Michael



Re: Should we remove "not fast" promotion at all?

От
Andres Freund
Дата:
On 2013-08-08 06:40:00 +0900, Michael Paquier wrote:
> On Tue, Aug 6, 2013 at 8:05 PM, Tomonari Katsumata
> <t.katsumata1122@gmail.com> wrote:
> > Hi,
> >
> > 2013/8/6 Tom Lane <tgl@sss.pgh.pa.us>
> >>
> >> Fujii Masao <masao.fujii@gmail.com> writes:
> >> > On Tue, Aug 6, 2013 at 11:40 AM, Andres Freund <andres@2ndquadrant.com>
> >> > wrote:
> >> >> FWIW I'd rather keep plain promotion for a release or two. TBH, I have
> >> >> a
> >> >> bit of trust issues regarding the new method, and I'd like to be able
> >> >> to
> >> >> test potential issues against a stock postgres by doing a normal
> >> >> instead
> >> >> of a fast promotion.
> >>
> >> > So we should add new option specifying the promotion mode, into pg_ctl?
> >> > Currently pg_ctl cannot trigger the normal promotion.
> >>
> >> It would be silly to add such an option if we want to remove the old mode
> >> in a release or two.
> >>
> >> I think what Andres is suggesting is to leave it as-is for 9.4 and then
> >> remove the old code in 9.5 or 9.6.  Which seems prudent to me.
> >>
> > How about giving trigger_file an ability to chose "fast promote" and "normal
> > promote"
> > like the triggerfile of pg_standby.
> > It means that if the contents of the trigger_file is empty or 'smart' then
> > do "normal promote",
> > and it's 'fast' then do "fast promote".
> > I think this change would be smaller than change to pg_ctl.
> > And this would allow us to treat ${PGDATA}/promote and trigger_file only.
> > (because ${PGDATA}/fast_promote is not created automatically)
> Indeed, this would be the way to go to have an extensible format for
> other promotion modes or other actions that could be triggered by a
> standby. So why not taking the approach suggested by Katsumata-san
> now? One single file to rule them all, in this case called promote,
> including a keyword indicating the promotion action to take. This
> could be controlled by pg_ctl entirely, and opens the door to extra
> possible modes.

Why are we suddenly trying to make this even more complicated? It's too
late to redesign stuff without very good evidence that it's
needed. Renaming trigger files and changing their format certainly
doesn't seem appropriate post-beta.

Let's just leave this as is, and remove the code in 9.4/9.5.

Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services



Re: Should we remove "not fast" promotion at all?

От
Michael Paquier
Дата:
On Thu, Aug 8, 2013 at 12:24 PM, Andres Freund <andres@2ndquadrant.com> wrote:
> On 2013-08-08 06:40:00 +0900, Michael Paquier wrote:
>> On Tue, Aug 6, 2013 at 8:05 PM, Tomonari Katsumata
>> <t.katsumata1122@gmail.com> wrote:
>> > Hi,
>> >
>> > 2013/8/6 Tom Lane <tgl@sss.pgh.pa.us>
>> >>
>> >> Fujii Masao <masao.fujii@gmail.com> writes:
>> >> > On Tue, Aug 6, 2013 at 11:40 AM, Andres Freund <andres@2ndquadrant.com>
>> >> > wrote:
>> >> >> FWIW I'd rather keep plain promotion for a release or two. TBH, I have
>> >> >> a
>> >> >> bit of trust issues regarding the new method, and I'd like to be able
>> >> >> to
>> >> >> test potential issues against a stock postgres by doing a normal
>> >> >> instead
>> >> >> of a fast promotion.
>> >>
>> >> > So we should add new option specifying the promotion mode, into pg_ctl?
>> >> > Currently pg_ctl cannot trigger the normal promotion.
>> >>
>> >> It would be silly to add such an option if we want to remove the old mode
>> >> in a release or two.
>> >>
>> >> I think what Andres is suggesting is to leave it as-is for 9.4 and then
>> >> remove the old code in 9.5 or 9.6.  Which seems prudent to me.
>> >>
>> > How about giving trigger_file an ability to chose "fast promote" and "normal
>> > promote"
>> > like the triggerfile of pg_standby.
>> > It means that if the contents of the trigger_file is empty or 'smart' then
>> > do "normal promote",
>> > and it's 'fast' then do "fast promote".
>> > I think this change would be smaller than change to pg_ctl.
>> > And this would allow us to treat ${PGDATA}/promote and trigger_file only.
>> > (because ${PGDATA}/fast_promote is not created automatically)
>> Indeed, this would be the way to go to have an extensible format for
>> other promotion modes or other actions that could be triggered by a
>> standby. So why not taking the approach suggested by Katsumata-san
>> now? One single file to rule them all, in this case called promote,
>> including a keyword indicating the promotion action to take. This
>> could be controlled by pg_ctl entirely, and opens the door to extra
>> possible modes.
>
> Why are we suddenly trying to make this even more complicated? It's too
> late to redesign stuff without very good evidence that it's
> needed. Renaming trigger files and changing their format certainly
> doesn't seem appropriate post-beta.
>
> Let's just leave this as is, and remove the code in 9.4/9.5.
Sorry. I should have been clearer. I meant that for 9.4~ only. For 9.3
yes it's too late.
-- 
Michael



Re: Should we remove "not fast" promotion at all?

От
Bruce Momjian
Дата:
On Thu, Aug  8, 2013 at 01:27:35PM +0900, Michael Paquier wrote:
> > Why are we suddenly trying to make this even more complicated? It's too
> > late to redesign stuff without very good evidence that it's
> > needed. Renaming trigger files and changing their format certainly
> > doesn't seem appropriate post-beta.
> >
> > Let's just leave this as is, and remove the code in 9.4/9.5.
> Sorry. I should have been clearer. I meant that for 9.4~ only. For 9.3
> yes it's too late.

We seem to be all over the map with the fast promotion code --- some
people don't trust it, some people want an option to enable the old
method, and some people want the old method removed.

This has left us in an odd situation where we are going to ship 9.3
old-method code with no way to enable it in case we need it.  Adding the
ability to enable it in 9.4 makes no sense --- effectively, if we need
the old promotion code, we are going to have to enable it in a 9.3 minor
release, while if we get to 9.4 final without needing it, we can assume
the fast promotion code is good and we don't need the old code.

I think a prudent plan would be to remove the old promotion code just
before 9.4 beta as we would know by then that the code is no longer
needed.  I have added this as a 9.4 open item:
https://wiki.postgresql.org/wiki/PostgreSQL_9.4_Open_Items

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + It's impossible for everything to be true. +



Re: Should we remove "not fast" promotion at all?

От
Andres Freund
Дата:
On 2013-08-08 12:50:31 -0400, Bruce Momjian wrote:
> On Thu, Aug  8, 2013 at 01:27:35PM +0900, Michael Paquier wrote:
> > > Why are we suddenly trying to make this even more complicated? It's too
> > > late to redesign stuff without very good evidence that it's
> > > needed. Renaming trigger files and changing their format certainly
> > > doesn't seem appropriate post-beta.
> > >
> > > Let's just leave this as is, and remove the code in 9.4/9.5.
> > Sorry. I should have been clearer. I meant that for 9.4~ only. For 9.3
> > yes it's too late.
> 
> We seem to be all over the map with the fast promotion code --- some
> people don't trust it, some people want an option to enable the old
> method, and some people want the old method removed.
> 
> This has left us in an odd situation where we are going to ship 9.3
> old-method code with no way to enable it in case we need it.

That's not true. It's relatively easy to trigger it. You just can't use
pg_ctl. Which seems completely fine for debugging.

Imo there's no need to do anything. Which is what we've concluded on
before...

Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services



Re: Should we remove "not fast" promotion at all?

От
Josh Berkus
Дата:
Bruce, all:

> We seem to be all over the map with the fast promotion code --- some
> people don't trust it, some people want an option to enable the old
> method, and some people want the old method removed.

Having read over this thread, the only reason given for retaining any
ability to use "old" promotion code is because people are worried about
"fast" promotion being buggy.  This seems wrong.

Either we have confidence is fast promotion, or we don't.  If we don't
have confidence, then either (a) more testing is needed, or (b) it
shouldn't be the default.  Again, here, we are coming up against our
lack of any kind of broad replication failure testing.

Of course, even if we have confidence, bugs are always possible, and
leaving the old promotion code in there would make it somewhat easier to
ship a 9.3.2 update which reverts the behavior.  But maybe we should
focus on shipping a version which is relatively bug-free instead?

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com



Re: Should we remove "not fast" promotion at all?

От
Andres Freund
Дата:
On 2013-08-08 10:15:14 -0700, Josh Berkus wrote:
> Bruce, all:
> 
> > We seem to be all over the map with the fast promotion code --- some
> > people don't trust it, some people want an option to enable the old
> > method, and some people want the old method removed.
> 
> Having read over this thread, the only reason given for retaining any
> ability to use "old" promotion code is because people are worried about
> "fast" promotion being buggy.  This seems wrong.

Well, it's touching one of the more complex parts of pg.

> Either we have confidence is fast promotion, or we don't.  If we don't
> have confidence, then either (a) more testing is needed, or (b) it
> shouldn't be the default.  Again, here, we are coming up against our
> lack of any kind of broad replication failure testing.

While I think we definitely miss out there I don't think any regression
suite would help much here. I am wary of unknown problems, not ones
we already have tests for. The subtle ones aren't easy to test, even
with a regression suite.

> Of course, even if we have confidence, bugs are always possible, and
> leaving the old promotion code in there would make it somewhat easier to
> ship a 9.3.2 update which reverts the behavior.  But maybe we should
> focus on shipping a version which is relatively bug-free instead?

The problem is that, especially involving HS, there's lots of subtle
corner cases. And those are pretty hard to forsee and thus hard to
test. Being able to tell somebody to touch some file and kill a certain
process instead of pg_ctl triggering is certainly better than to have
them apply complex patches which then only exhibit the old behaviour.
It's not about letting people regularly use it or such. It's about being
able to verify problems.

Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services



Re: Should we remove "not fast" promotion at all?

От
Josh Berkus
Дата:
On 08/08/2013 10:34 AM, Andres Freund wrote:
> On 2013-08-08 10:15:14 -0700, Josh Berkus wrote:
>> Either we have confidence is fast promotion, or we don't.  If we don't
>> have confidence, then either (a) more testing is needed, or (b) it
>> shouldn't be the default.  Again, here, we are coming up against our
>> lack of any kind of broad replication failure testing.
> 
> While I think we definitely miss out there I don't think any regression
> suite would help much here. I am wary of unknown problems, not ones
> we already have tests for. The subtle ones aren't easy to test, even
> with a regression suite.

Yeah, that's why we have to get beyond the mentality that regression
testing is the only kind of testing.  We need a destruction test for
replication, and that's NOT going to be a regression test.  Among other
things, we'll probably need to run it on cloud hosting.

> The problem is that, especially involving HS, there's lots of subtle
> corner cases. And those are pretty hard to forsee and thus hard to
> test. 

It would be useful to assemble a list of corner cases we *do* know
about.  This could become a test suite, and we could keep adding to it.

> Being able to tell somebody to touch some file and kill a certain
> process instead of pg_ctl triggering is certainly better than to have
> them apply complex patches which then only exhibit the old behaviour.
> It's not about letting people regularly use it or such. It's about being
> able to verify problems.

The problem is, if failover fails badly, the user is probably facing a
corrupt database, downtime, loss of data, and restore from backup.  So
if we don't think that fast failover is rock-solid trustworthy --- or at
least as trustworthy as slow failover was -- then we should be making it
a non-default option for 9.3.  We shouldn't be exposing people who don't
need fast failover to new risks without their knowledge.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com



Re: Should we remove "not fast" promotion at all?

От
Andres Freund
Дата:
On 2013-08-08 10:51:45 -0700, Josh Berkus wrote:
> On 08/08/2013 10:34 AM, Andres Freund wrote:
> > On 2013-08-08 10:15:14 -0700, Josh Berkus wrote:
> >> Either we have confidence is fast promotion, or we don't.  If we don't
> >> have confidence, then either (a) more testing is needed, or (b) it
> >> shouldn't be the default.  Again, here, we are coming up against our
> >> lack of any kind of broad replication failure testing.
> > 
> > While I think we definitely miss out there I don't think any regression
> > suite would help much here. I am wary of unknown problems, not ones
> > we already have tests for. The subtle ones aren't easy to test, even
> > with a regression suite.
> 
> Yeah, that's why we have to get beyond the mentality that regression
> testing is the only kind of testing.  We need a destruction test for
> replication, and that's NOT going to be a regression test.  Among other
> things, we'll probably need to run it on cloud hosting.

The point is, that will still mostly produce scenarios we know of.

> > The problem is that, especially involving HS, there's lots of subtle
> > corner cases. And those are pretty hard to forsee and thus hard to
> > test. 
> 
> It would be useful to assemble a list of corner cases we *do* know
> about.  This could become a test suite, and we could keep adding to it.

The first thing would be to build the infrastructure for HS
testing. Unfortunately I don't have the time/energy for that atm. Unless
somebody steps up, this won't happen :(

> > Being able to tell somebody to touch some file and kill a certain
> > process instead of pg_ctl triggering is certainly better than to have
> > them apply complex patches which then only exhibit the old behaviour.
> > It's not about letting people regularly use it or such. It's about being
> > able to verify problems.
> 
> The problem is, if failover fails badly, the user is probably facing a
> corrupt database, downtime, loss of data, and restore from backup.  So
> if we don't think that fast failover is rock-solid trustworthy --- or at
> least as trustworthy as slow failover was -- then we should be making it
> a non-default option for 9.3.  We shouldn't be exposing people who don't
> need fast failover to new risks without their knowledge.

I don't think anybody working on related areas of the code thinks it's
rock solid.
But anyway, I just don't see the downside of allowing problem
analysis. You're free to do more testing, review, whatever before the
release.

Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services



Re: Should we remove "not fast" promotion at all?

От
Josh Berkus
Дата:
On 08/08/2013 11:01 AM, Andres Freund wrote:

> I don't think anybody working on related areas of the code thinks it's
> rock solid.
> But anyway, I just don't see the downside of allowing problem
> analysis. You're free to do more testing, review, whatever before the
> release.

I'm 100% with you that we ought to keep the slow failover code around
and accessible to debugging.  What I'm asking is whether it should still
be explicitly available to users, and the default.  Based on your
feedback, it's sounding like it should be.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com



Re: Should we remove "not fast" promotion at all?

От
Tomonari Katsumata
Дата:
Hi,

I understood it's too late to change the feature.
I hope it will be revised in 9.4!

(2013/08/09 4:13), Josh Berkus wrote:> On 08/08/2013 11:01 AM, Andres Freund wrote:>>> I don't think anybody working on
relatedareas of the code thinks it's>> rock solid.>> But anyway, I just don't see the downside of allowing problem>>
analysis.You're free to do more testing, review, whatever before the>> release.>> I'm 100% with you that we ought to
keepthe slow failover code around> and accessible to debugging.  What I'm asking is whether it should still> be
explicitlyavailable to users, and the default.  Based on your> feedback, it's sounding like it should be.>
 

In my opinion, the default behavior "fast promote" is ok.
Because we don't have any explicit problem with it for now.

And we shouldn't mention about "normal promote" in document.
I think it will make user confused if do so.


By the way, I'm thinking about when these trigger files(*)
are unlinked.
(*)Now I treat these three files
1) a file specified in trigger_file
2) ${PGDATA}/promote
3) ${PGDATA}/fast_promote

Current source has a problem in some corner cases.
It occurs by the timing of detecting these files.

for example:
------
case1:
createing 1) and executing "pg_ctl promote" occur at the same time.

case2:
creating 1) and promoting at recovery-end(by recovery_target_xxx)
occur at the same time.
------

1) is created, but promoting completes by another trigger.
Both cases 1) remains on the server.
If user doesn't know it and make a standby on the server,
the standby will promote soon.

I think this is not so big problem, but not user-friendly.
Against this, I'm thinking unlinking these files before
starting recovery.

This should be fixed in 9.4 too?

---------
Tomonari Katsumata





Re: Should we remove "not fast" promotion at all?

От
Heikki Linnakangas
Дата:
On 08.08.2013 20:15, Josh Berkus wrote:
> Bruce, all:
>
>> We seem to be all over the map with the fast promotion code --- some
>> people don't trust it, some people want an option to enable the old
>> method, and some people want the old method removed.
>
> Having read over this thread, the only reason given for retaining any
> ability to use "old" promotion code is because people are worried about
> "fast" promotion being buggy.  This seems wrong.
>
> Either we have confidence is fast promotion, or we don't.  If we don't
> have confidence, then either (a) more testing is needed, or (b) it
> shouldn't be the default.  Again, here, we are coming up against our
> lack of any kind of broad replication failure testing.

Well, I don't see much harm in keeping the old behavior as an 
undocumented escape hatch, as it is now. The way I'd phrase the current 
situation is this: 9.3 now always does "fast promotion". However, for 
debugging and testing purposes, you can still trigger the old behavior 
by manually creating a file in $PGDATA. That should never be necessary 
in the field, however.

There's one thing that irks me with the current situation, however: if 
you use 9.2 version of pg_ctl against a 9.3 server, it will 
inadvertently trigger slow promotion, because it creates the "promote" 
file. Since fast mode is the default, and not only the default but the 
only documented mode, it's confusing if you can accidentally trigger the 
old behavior like that.

And it's even worse if you use 9.3 pg_ctl against a 9.2 server: it will 
create a filed called "fast_promote" and return success, but it won't 
actually do anything.

I think "promote" file should trigger the fast promotion, and the 
filename to trigger the slow mode should be called "fallback_promote" or 
"safe_promote" or something like that. There wasn't any good reason to 
change the filename primarily used. It might even break people's scripts 
for no good reason, if people are creating the $PGDATA/promote file 
themselves without using pg_ctl.

(I raised this back in April, but Simon argued strongly for the current 
situation. I never understood why. 
http://www.postgresql.org/message-id/517798AE.30203@vmware.com)

- Heikki



Re: Should we remove "not fast" promotion at all?

От
Robert Haas
Дата:
On Mon, Aug 19, 2013 at 4:20 AM, Heikki Linnakangas
<hlinnakangas@vmware.com> wrote:
> Well, I don't see much harm in keeping the old behavior as an undocumented
> escape hatch, as it is now. The way I'd phrase the current situation is
> this: 9.3 now always does "fast promotion". However, for debugging and
> testing purposes, you can still trigger the old behavior by manually
> creating a file in $PGDATA. That should never be necessary in the field,
> however.
>
> There's one thing that irks me with the current situation, however: if you
> use 9.2 version of pg_ctl against a 9.3 server, it will inadvertently
> trigger slow promotion, because it creates the "promote" file. Since fast
> mode is the default, and not only the default but the only documented mode,
> it's confusing if you can accidentally trigger the old behavior like that.
>
> And it's even worse if you use 9.3 pg_ctl against a 9.2 server: it will
> create a filed called "fast_promote" and return success, but it won't
> actually do anything.
>
> I think "promote" file should trigger the fast promotion, and the filename
> to trigger the slow mode should be called "fallback_promote" or
> "safe_promote" or something like that. There wasn't any good reason to
> change the filename primarily used. It might even break people's scripts for
> no good reason, if people are creating the $PGDATA/promote file themselves
> without using pg_ctl.

+1.

...Robert



Re: Should we remove "not fast" promotion at all?

От
Kevin Grittner
Дата:
Robert Haas <robertmhaas@gmail.com> wrote:
> Heikki Linnakangas <hlinnakangas@vmware.com> wrote:

>> I think "promote" file should trigger the fast promotion, and
>> the filename to trigger the slow mode should be called
>> "fallback_promote" or "safe_promote" or something like that.
>> There wasn't any good reason to change the filename primarily
>> used. It might even break people's scripts for no good reason,
>> if people are creating the $PGDATA/promote file themselves
>> without using pg_ctl.
>
> +1.

+1  Changing it has risk with no associated benefit that I can see.
 
--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Should we remove "not fast" promotion at all?

От
Bruce Momjian
Дата:
On Mon, Aug 19, 2013 at 11:20:42AM +0300, Heikki Linnakangas wrote:
> And it's even worse if you use 9.3 pg_ctl against a 9.2 server: it
> will create a filed called "fast_promote" and return success, but it
> won't actually do anything.
> 
> I think "promote" file should trigger the fast promotion, and the
> filename to trigger the slow mode should be called
> "fallback_promote" or "safe_promote" or something like that. There
> wasn't any good reason to change the filename primarily used. It
> might even break people's scripts for no good reason, if people are
> creating the $PGDATA/promote file themselves without using pg_ctl.
> 
> (I raised this back in April, but Simon argued strongly for the
> current situation. I never understood why.
> http://www.postgresql.org/message-id/517798AE.30203@vmware.com)

+1

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + It's impossible for everything to be true. +



Re: Should we remove "not fast" promotion at all?

От
Tom Lane
Дата:
Bruce Momjian <bruce@momjian.us> writes:
> On Mon, Aug 19, 2013 at 11:20:42AM +0300, Heikki Linnakangas wrote:
>> I think "promote" file should trigger the fast promotion, and the
>> filename to trigger the slow mode should be called
>> "fallback_promote" or "safe_promote" or something like that. There
>> wasn't any good reason to change the filename primarily used. It
>> might even break people's scripts for no good reason, if people are
>> creating the $PGDATA/promote file themselves without using pg_ctl.
>> 
>> (I raised this back in April, but Simon argued strongly for the
>> current situation. I never understood why.
>> http://www.postgresql.org/message-id/517798AE.30203@vmware.com)

> +1

If we're going to change this in 9.3, it needs to happen *now*, as in
the next couple hours, because I plan to wrap rc1 this afternoon.
Please stop discussing and commit something.
        regards, tom lane



Re: Should we remove "not fast" promotion at all?

От
Bruce Momjian
Дата:
On Mon, Aug 19, 2013 at 01:27:29PM -0400, Tom Lane wrote:
> Bruce Momjian <bruce@momjian.us> writes:
> > On Mon, Aug 19, 2013 at 11:20:42AM +0300, Heikki Linnakangas wrote:
> >> I think "promote" file should trigger the fast promotion, and the
> >> filename to trigger the slow mode should be called
> >> "fallback_promote" or "safe_promote" or something like that. There
> >> wasn't any good reason to change the filename primarily used. It
> >> might even break people's scripts for no good reason, if people are
> >> creating the $PGDATA/promote file themselves without using pg_ctl.
> >> 
> >> (I raised this back in April, but Simon argued strongly for the
> >> current situation. I never understood why.
> >> http://www.postgresql.org/message-id/517798AE.30203@vmware.com)
> 
> > +1
> 
> If we're going to change this in 9.3, it needs to happen *now*, as in
> the next couple hours, because I plan to wrap rc1 this afternoon.
> Please stop discussing and commit something.

FYI, Tom, I think Heikki is working on it now.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + It's impossible for everything to be true. +



Re: Should we remove "not fast" promotion at all?

От
Heikki Linnakangas
Дата:
On 19.08.2013 20:27, Tom Lane wrote:
> Bruce Momjian<bruce@momjian.us>  writes:
>> On Mon, Aug 19, 2013 at 11:20:42AM +0300, Heikki Linnakangas wrote:
>>> I think "promote" file should trigger the fast promotion, and the
>>> filename to trigger the slow mode should be called
>>> "fallback_promote" or "safe_promote" or something like that. There
>>> wasn't any good reason to change the filename primarily used. It
>>> might even break people's scripts for no good reason, if people are
>>> creating the $PGDATA/promote file themselves without using pg_ctl.
>>>
>>> (I raised this back in April, but Simon argued strongly for the
>>> current situation. I never understood why.
>>> http://www.postgresql.org/message-id/517798AE.30203@vmware.com)
>
>> +1
>
> If we're going to change this in 9.3, it needs to happen *now*, as in
> the next couple hours, because I plan to wrap rc1 this afternoon.
> Please stop discussing and commit something.

Ok, committed. The promote trigger file is now called "promote", like it 
was in previous versions. For the non-fast promotion, create a file 
called "fallback_promote".

- Heikki



Re: Should we remove "not fast" promotion at all?

От
Simon Riggs
Дата:
On 19 August 2013 09:20, Heikki Linnakangas <hlinnakangas@vmware.com> wrote:
> On 08.08.2013 20:15, Josh Berkus wrote:
>>
>> Bruce, all:
>>
>>> We seem to be all over the map with the fast promotion code --- some
>>> people don't trust it, some people want an option to enable the old
>>> method, and some people want the old method removed.
>>
>>
>> Having read over this thread, the only reason given for retaining any
>> ability to use "old" promotion code is because people are worried about
>> "fast" promotion being buggy.  This seems wrong.
>>
>> Either we have confidence is fast promotion, or we don't.  If we don't
>> have confidence, then either (a) more testing is needed, or (b) it
>> shouldn't be the default.  Again, here, we are coming up against our
>> lack of any kind of broad replication failure testing.
>
>
> Well, I don't see much harm in keeping the old behavior as an undocumented
> escape hatch, as it is now. The way I'd phrase the current situation is
> this: 9.3 now always does "fast promotion". However, for debugging and
> testing purposes, you can still trigger the old behavior by manually
> creating a file in $PGDATA. That should never be necessary in the field,
> however.

+1, again.

I have removed this item from the 9.4 open items list, since this
issue has already been resolved.

-- Simon Riggs                   http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services