Обсуждение: Should we remove a fallback promotion? take 2

Поиск
Список
Период
Сортировка

Should we remove a fallback promotion? take 2

От
Fujii Masao
Дата:
Hi,

We discussed the $SUBJECT six years ago at the following thread.
https://postgr.es/m/CAHGQGwGYkF+CvpOMdxaO=+aNAzc1Oo9O4LqWo50MxpvFj+0VOw@mail.gmail.com

Seems our consensus at that discussion was to leave a fallback
promotion for a release or two for debugging purpose or as
an emergency method because fast promotion might have
some issues, and then to remove it later. Now, more than six years
have already passed since that discussion. Is there still
any reason to keep a fallback promotion? If nothing, I'd like to
drop it from v13.

Regards,

-- 
Fujii Masao
NTT DATA CORPORATION
Advanced Platform Technology Group
Research and Development Headquarters



Re: Should we remove a fallback promotion? take 2

От
Robert Haas
Дата:
On Thu, Mar 5, 2020 at 8:48 AM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
> We discussed the $SUBJECT six years ago at the following thread.
> https://postgr.es/m/CAHGQGwGYkF+CvpOMdxaO=+aNAzc1Oo9O4LqWo50MxpvFj+0VOw@mail.gmail.com
>
> Seems our consensus at that discussion was to leave a fallback
> promotion for a release or two for debugging purpose or as
> an emergency method because fast promotion might have
> some issues, and then to remove it later. Now, more than six years
> have already passed since that discussion. Is there still
> any reason to keep a fallback promotion? If nothing, I'd like to
> drop it from v13.

Seems reasonable, but it would be better if people proposed these
kinds of changes closer to the beginning of the release cycle rather
than in the crush at the end.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Should we remove a fallback promotion? take 2

От
Michael Paquier
Дата:
On Thu, Mar 05, 2020 at 09:40:54AM -0500, Robert Haas wrote:
> Seems reasonable, but it would be better if people proposed these
> kinds of changes closer to the beginning of the release cycle rather
> than in the crush at the end.

+1, to both points.
--
Michael

Вложения

Re: Should we remove a fallback promotion? take 2

От
Fujii Masao
Дата:

On 2020/03/06 10:40, Michael Paquier wrote:
> On Thu, Mar 05, 2020 at 09:40:54AM -0500, Robert Haas wrote:
>> Seems reasonable, but it would be better if people proposed these
>> kinds of changes closer to the beginning of the release cycle rather
>> than in the crush at the end.
> 
> +1, to both points.

Ok, I'm fine to do that in v14.

Regards,

-- 
Fujii Masao
NTT DATA CORPORATION
Advanced Platform Technology Group
Research and Development Headquarters



Re: Should we remove a fallback promotion? take 2

От
Alvaro Herrera
Дата:
On 2020-Mar-06, Michael Paquier wrote:

> On Thu, Mar 05, 2020 at 09:40:54AM -0500, Robert Haas wrote:
> > Seems reasonable, but it would be better if people proposed these
> > kinds of changes closer to the beginning of the release cycle rather
> > than in the crush at the end.
> 
> +1, to both points.

Why?  Are you saying that there's some actual risk of breaking
something?  We're not even near beta or feature freeze yet.

I'm not seeing the reason for the "please propose this sooner in the
cycle" argument.  It has already been proposed sooner -- seven years
sooner.  We're not waiting for users to complain anymore; clearly nobody
cared.

I think dragging things forever serves no purpose.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Should we remove a fallback promotion? take 2

От
Andres Freund
Дата:
Hi,

On 2020-03-06 16:33:18 -0300, Alvaro Herrera wrote:
> On 2020-Mar-06, Michael Paquier wrote:
> > On Thu, Mar 05, 2020 at 09:40:54AM -0500, Robert Haas wrote:
> > > Seems reasonable, but it would be better if people proposed these
> > > kinds of changes closer to the beginning of the release cycle rather
> > > than in the crush at the end.
> > 
> > +1, to both points.
> 
> Why?  Are you saying that there's some actual risk of breaking
> something?  We're not even near beta or feature freeze yet.
> 
> I'm not seeing the reason for the "please propose this sooner in the
> cycle" argument.  It has already been proposed sooner -- seven years
> sooner.  We're not waiting for users to complain anymore; clearly nobody
> cared.

Yea. There are changes that are so invasive that it's useful to go very
early, but in this case I'm not seeing it?

+1 for removing non-fast promotions.

FWIW, I find "fallback promotion" a confusing description.


Btw, I'd really like to make the crash recovery environment more like
the replication environment. I.e. have checkpointer, bgwriter running,
and have an 'end-of-recovery' record instead of a checkpoint at the end.


Greetings,

Andres Freund



Remove non-fast promotion Re: Should we remove a fallback promotion?take 2

От
Fujii Masao
Дата:

On 2020/03/10 6:56, Andres Freund wrote:
> Hi,
> 
> On 2020-03-06 16:33:18 -0300, Alvaro Herrera wrote:
>> On 2020-Mar-06, Michael Paquier wrote:
>>> On Thu, Mar 05, 2020 at 09:40:54AM -0500, Robert Haas wrote:
>>>> Seems reasonable, but it would be better if people proposed these
>>>> kinds of changes closer to the beginning of the release cycle rather
>>>> than in the crush at the end.
>>>
>>> +1, to both points.
>>
>> Why?  Are you saying that there's some actual risk of breaking
>> something?  We're not even near beta or feature freeze yet.
>>
>> I'm not seeing the reason for the "please propose this sooner in the
>> cycle" argument.  It has already been proposed sooner -- seven years
>> sooner.  We're not waiting for users to complain anymore; clearly nobody
>> cared.
> 
> Yea. There are changes that are so invasive that it's useful to go very
> early, but in this case I'm not seeing it?
> 
> +1 for removing non-fast promotions.

Patch attached. I will add this into the first CF for v14.

> FWIW, I find "fallback promotion" a confusing description.

Yeah, so I changed the subject.

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

Вложения

Re: Remove non-fast promotion Re: Should we remove a fallbackpromotion? take 2

От
Michael Paquier
Дата:
On Mon, Apr 20, 2020 at 03:26:16PM +0900, Fujii Masao wrote:
> Patch attached. I will add this into the first CF for v14.

Thanks!

> -    if (IsPromoteSignaled())
> +    /*
> +     * In 9.1 and 9.2 the postmaster unlinked the promote file inside the
> +     * signal handler. It now leaves the file in place and lets the
> +     * Startup process do the unlink.
> +     */
> +    if (IsPromoteSignaled() && stat(PROMOTE_SIGNAL_FILE, &stat_buf) == 0)
>      {
> -        /*
> -         * In 9.1 and 9.2 the postmaster unlinked the promote file inside the
> -         * signal handler. It now leaves the file in place and lets the
> -         * Startup process do the unlink. This allows Startup to know whether
> -         * it should create a full checkpoint before starting up (fallback
> -         * mode). Fast promotion takes precedence.
> -         */
> -        if (stat(PROMOTE_SIGNAL_FILE, &stat_buf) == 0)
> -        {
> -            unlink(PROMOTE_SIGNAL_FILE);
> -            unlink(FALLBACK_PROMOTE_SIGNAL_FILE);
> -            fast_promote = true;
> -        }
> -        else if (stat(FALLBACK_PROMOTE_SIGNAL_FILE, &stat_buf) == 0)
> -        {
> -            unlink(FALLBACK_PROMOTE_SIGNAL_FILE);
> -            fast_promote = false;
> -        }
> -
>          ereport(LOG, (errmsg("received promote request")));
> -
> +        unlink(PROMOTE_SIGNAL_FILE);

On HEAD, this code means that it is possible to end recovery just by
sending SIGUSR2 to the startup process.  With your patch, this code
now means that in order to finish recovery you need to send SIGUSR2 to
the startup process *and* to create the promote signal file.  Is that
really what you want?
--
Michael

Вложения

Re: Remove non-fast promotion Re: Should we remove a fallbackpromotion? take 2

От
Fujii Masao
Дата:

On 2020/04/21 10:59, Michael Paquier wrote:
> On Mon, Apr 20, 2020 at 03:26:16PM +0900, Fujii Masao wrote:
>> Patch attached. I will add this into the first CF for v14.
> 
> Thanks!
> 
>> -    if (IsPromoteSignaled())
>> +    /*
>> +     * In 9.1 and 9.2 the postmaster unlinked the promote file inside the
>> +     * signal handler. It now leaves the file in place and lets the
>> +     * Startup process do the unlink.
>> +     */
>> +    if (IsPromoteSignaled() && stat(PROMOTE_SIGNAL_FILE, &stat_buf) == 0)
>>       {
>> -        /*
>> -         * In 9.1 and 9.2 the postmaster unlinked the promote file inside the
>> -         * signal handler. It now leaves the file in place and lets the
>> -         * Startup process do the unlink. This allows Startup to know whether
>> -         * it should create a full checkpoint before starting up (fallback
>> -         * mode). Fast promotion takes precedence.
>> -         */
>> -        if (stat(PROMOTE_SIGNAL_FILE, &stat_buf) == 0)
>> -        {
>> -            unlink(PROMOTE_SIGNAL_FILE);
>> -            unlink(FALLBACK_PROMOTE_SIGNAL_FILE);
>> -            fast_promote = true;
>> -        }
>> -        else if (stat(FALLBACK_PROMOTE_SIGNAL_FILE, &stat_buf) == 0)
>> -        {
>> -            unlink(FALLBACK_PROMOTE_SIGNAL_FILE);
>> -            fast_promote = false;
>> -        }
>> -
>>           ereport(LOG, (errmsg("received promote request")));
>> -
>> +        unlink(PROMOTE_SIGNAL_FILE);

Thanks for reviewing the patch!

> On HEAD, this code means that it is possible to end recovery just by
> sending SIGUSR2 to the startup process.

Yes, in this case, non-fast promotion is triggered.

> With your patch, this code
> now means that in order to finish recovery you need to send SIGUSR2 to
> the startup process *and* to create the promote signal file.

Yes, but isn't this the same as the way to trigger fast promotion in HEAD?

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION



Re: Remove non-fast promotion Re: Should we remove a fallbackpromotion? take 2

От
Michael Paquier
Дата:
On Tue, Apr 21, 2020 at 02:27:20PM +0900, Fujii Masao wrote:
> On 2020/04/21 10:59, Michael Paquier wrote:
>> With your patch, this code
>> now means that in order to finish recovery you need to send SIGUSR2 to
>> the startup process *and* to create the promote signal file.
>
> Yes, but isn't this the same as the way to trigger fast promotion in HEAD?

Yep, but my point is that some users who have been relying only on
SIGUSR2 sent to the startup process for a promotion may be surprised
to see that doing the same operation does not trigger a promotion
anymore.
--
Michael

Вложения

Re: Remove non-fast promotion Re: Should we remove a fallbackpromotion? take 2

От
Fujii Masao
Дата:

On 2020/04/21 14:54, Michael Paquier wrote:
> On Tue, Apr 21, 2020 at 02:27:20PM +0900, Fujii Masao wrote:
>> On 2020/04/21 10:59, Michael Paquier wrote:
>>> With your patch, this code
>>> now means that in order to finish recovery you need to send SIGUSR2 to
>>> the startup process *and* to create the promote signal file.
>>
>> Yes, but isn't this the same as the way to trigger fast promotion in HEAD?
> 
> Yep, but my point is that some users who have been relying only on
> SIGUSR2 sent to the startup process for a promotion may be surprised
> to see that doing the same operation does not trigger a promotion
> anymore.

Yeah, but that's not documented. So I don't think that we need to keep
the backward-compatibility for that.

Also in that case, non-fast promotion is triggered. Since my patch
tries to remove non-fast promotion, it's intentional to prevent them
from doing that. But you think that we should not drop that because
there are still some users for that?

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION



Re: Remove non-fast promotion Re: Should we remove a fallbackpromotion? take 2

От
Michael Paquier
Дата:
On Tue, Apr 21, 2020 at 03:29:54PM +0900, Fujii Masao wrote:
> Yeah, but that's not documented. So I don't think that we need to keep
> the backward-compatibility for that.
>
> Also in that case, non-fast promotion is triggered. Since my patch
> tries to remove non-fast promotion, it's intentional to prevent them
> from doing that. But you think that we should not drop that because
> there are still some users for that?

It would be good to ask around to folks maintaining HA solutions about
that change at least, as there could be a point in still letting
promotion to happen in this case, but switch silently to the fast
path.
--
Michael

Вложения

Re: Remove non-fast promotion Re: Should we remove a fallbackpromotion? take 2

От
Fujii Masao
Дата:

On 2020/04/21 15:36, Michael Paquier wrote:
> On Tue, Apr 21, 2020 at 03:29:54PM +0900, Fujii Masao wrote:
>> Yeah, but that's not documented. So I don't think that we need to keep
>> the backward-compatibility for that.
>>
>> Also in that case, non-fast promotion is triggered. Since my patch
>> tries to remove non-fast promotion, it's intentional to prevent them
>> from doing that. But you think that we should not drop that because
>> there are still some users for that?
> 
> It would be good to ask around to folks maintaining HA solutions about
> that change at least, as there could be a point in still letting
> promotion to happen in this case, but switch silently to the fast
> path.

*If* there are some HA solutions doing that, IMO that they should be changed
so that the documented official way to trigger promotion (i.e., pg_ctl promote,
pg_promote or trigger_file) is used instead.

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION



Re: Remove non-fast promotion Re: Should we remove a fallbackpromotion? take 2

От
Kyotaro Horiguchi
Дата:
At Tue, 21 Apr 2020 15:48:02 +0900, Fujii Masao <masao.fujii@oss.nttdata.com> wrote in 
> 
> 
> On 2020/04/21 15:36, Michael Paquier wrote:
> > On Tue, Apr 21, 2020 at 03:29:54PM +0900, Fujii Masao wrote:
> >> Yeah, but that's not documented. So I don't think that we need to keep
> >> the backward-compatibility for that.
> >>
> >> Also in that case, non-fast promotion is triggered. Since my patch
> >> tries to remove non-fast promotion, it's intentional to prevent them
> >> from doing that. But you think that we should not drop that because
> >> there are still some users for that?
> > It would be good to ask around to folks maintaining HA solutions about
> > that change at least, as there could be a point in still letting
> > promotion to happen in this case, but switch silently to the fast
> > path.
> 
> *If* there are some HA solutions doing that, IMO that they should be
> *changed
> so that the documented official way to trigger promotion (i.e., pg_ctl
> promote,
> pg_promote or trigger_file) is used instead.

The difference between fast and non-fast promotions is far trivial
than the difference between promotion happens or not.  I think
everyone cares about the new version actually promotes by the steps
they have been doing, but few of them even notices the difference
between the fast and non-fast.  If those who are using non-fast
promotion for a certain reason should notice the change of promotion
behavior in release notes.

This is similar to the change of the default waiting behvaior of
pg_ctl at PG10.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



Re: Remove non-fast promotion Re: Should we remove a fallbackpromotion? take 2

От
Kyotaro Horiguchi
Дата:
At Mon, 20 Apr 2020 15:26:16 +0900, Fujii Masao <masao.fujii@oss.nttdata.com> wrote in 
> Patch attached. I will add this into the first CF for v14.

-            if (!fast_promoted)
+            if (!promoted)
                 RequestCheckpoint(CHECKPOINT_END_OF_RECOVERY |
                                   CHECKPOINT_IMMEDIATE |
                                   CHECKPOINT_WAIT);

If we don't find the checkpoint record just before, we don't insert
End-Of-Recovery record then run an immediate chekpoint.  I think if we
nuke the non-fast promotion, shouldn't we insert the EOR record even
in that case?

Or, as Andres suggested upthread, do we always insert it?

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



Re: Remove non-fast promotion Re: Should we remove a fallbackpromotion? take 2

От
Fujii Masao
Дата:

On 2020/04/21 17:15, Kyotaro Horiguchi wrote:
> At Mon, 20 Apr 2020 15:26:16 +0900, Fujii Masao <masao.fujii@oss.nttdata.com> wrote in
>> Patch attached. I will add this into the first CF for v14.
> 
> -            if (!fast_promoted)
> +            if (!promoted)
>                   RequestCheckpoint(CHECKPOINT_END_OF_RECOVERY |
>                                     CHECKPOINT_IMMEDIATE |
>                                     CHECKPOINT_WAIT);
> 
> If we don't find the checkpoint record just before, we don't insert
> End-Of-Recovery record then run an immediate chekpoint.  I think if we
> nuke the non-fast promotion, shouldn't we insert the EOR record even
> in that case?

I'm not sure if that's safe. What if the server crashes before the checkpoint
completes in that case? Since the last checkpoint record is not available,
the subsequent crash recovery will fail. This would lead to that the server
will never start up. Right? Currently ISTM that end-of-recovery-checkpoint
is executed to avoid such trouble in that case.

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION



Re: Remove non-fast promotion Re: Should we remove a fallbackpromotion? take 2

От
Jehan-Guillaume de Rorthais
Дата:
Hello,

On Tue, 21 Apr 2020 15:36:22 +0900
Michael Paquier <michael@paquier.xyz> wrote:

> On Tue, Apr 21, 2020 at 03:29:54PM +0900, Fujii Masao wrote:
> > Yeah, but that's not documented. So I don't think that we need to keep
> > the backward-compatibility for that.
> > 
> > Also in that case, non-fast promotion is triggered. Since my patch
> > tries to remove non-fast promotion, it's intentional to prevent them
> > from doing that. But you think that we should not drop that because
> > there are still some users for that?  
> 
> It would be good to ask around to folks maintaining HA solutions about
> that change at least, as there could be a point in still letting
> promotion to happen in this case, but switch silently to the fast
> path.

FWIW, PAF relies on pg_ctl promote. No need for non-fast promotion.

Regards,



Re: Remove non-fast promotion Re: Should we remove a fallbackpromotion? take 2

От
Alvaro Herrera
Дата:
On 2020-Apr-21, Jehan-Guillaume de Rorthais wrote:

> On Tue, 21 Apr 2020 15:36:22 +0900
> Michael Paquier <michael@paquier.xyz> wrote:

> > > Also in that case, non-fast promotion is triggered. Since my patch
> > > tries to remove non-fast promotion, it's intentional to prevent them
> > > from doing that. But you think that we should not drop that because
> > > there are still some users for that?  
> > 
> > It would be good to ask around to folks maintaining HA solutions about
> > that change at least, as there could be a point in still letting
> > promotion to happen in this case, but switch silently to the fast
> > path.
> 
> FWIW, PAF relies on pg_ctl promote. No need for non-fast promotion.

AFAICT repmgr uses 'pg_ctl promote', and has since version 3.0 (released
in mid 2015).  It was only 3.3.2 (mid 2017) that supported Postgres 10,
so it seems fairly safe to assume that the removal won't be a problem.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Remove non-fast promotion Re: Should we remove a fallbackpromotion? take 2

От
Alvaro Herrera
Дата:
On 2020-Apr-20, Fujii Masao wrote:

> +    /*
> +     * In 9.1 and 9.2 the postmaster unlinked the promote file inside the
> +     * signal handler. It now leaves the file in place and lets the
> +     * Startup process do the unlink.
> +     */
> +    if (IsPromoteSignaled() && stat(PROMOTE_SIGNAL_FILE, &stat_buf) == 0)
>      {
> -        /*
> -         * In 9.1 and 9.2 the postmaster unlinked the promote file inside the
> -         * signal handler. It now leaves the file in place and lets the
> -         * Startup process do the unlink. This allows Startup to know whether
> -         * it should create a full checkpoint before starting up (fallback
> -         * mode). Fast promotion takes precedence.
> -         */

It seems pointless to leave a very old comment that documents what the
code no longer does.  I thikn it would be better to reword it indicating
what the code does do, ie. something like "Leave the signal file in
place; it will be removed by the startup process when ..."

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Remove non-fast promotion Re: Should we remove a fallbackpromotion? take 2

От
Kyotaro Horiguchi
Дата:
At Tue, 21 Apr 2020 22:08:56 +0900, Fujii Masao <masao.fujii@oss.nttdata.com> wrote in 
> 
> 
> On 2020/04/21 17:15, Kyotaro Horiguchi wrote:
> > At Mon, 20 Apr 2020 15:26:16 +0900, Fujii Masao
> > <masao.fujii@oss.nttdata.com> wrote in
> >> Patch attached. I will add this into the first CF for v14.
> > -            if (!fast_promoted)
> > +            if (!promoted)
> >                   RequestCheckpoint(CHECKPOINT_END_OF_RECOVERY |
> >                                     CHECKPOINT_IMMEDIATE |
> >                                     CHECKPOINT_WAIT);
> > If we don't find the checkpoint record just before, we don't insert
> > End-Of-Recovery record then run an immediate chekpoint.  I think if we
> > nuke the non-fast promotion, shouldn't we insert the EOR record even
> > in that case?
> 
> I'm not sure if that's safe. What if the server crashes before the
> checkpoint
> completes in that case? Since the last checkpoint record is not
> available,
> the subsequent crash recovery will fail. This would lead to that the
> server
> will never start up. Right? Currently ISTM that

Yes, that's right.

> end-of-recovery-checkpoint
> is executed to avoid such trouble in that case.

I meant that we always have EOR at the end of recovery.  So in the
missing latest checkpoint (and crash recovery) case, we insert EOR
after the immediate checkpoint. That also means we no longer set
CHECKPOINT_END_OF_RECOVERY to the checkpoint, too.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



Re: Remove non-fast promotion Re: Should we remove a fallbackpromotion? take 2

От
Ian Barwick
Дата:
On 2020/04/22 6:53, Alvaro Herrera wrote:
> On 2020-Apr-21, Jehan-Guillaume de Rorthais wrote:
> 
>> On Tue, 21 Apr 2020 15:36:22 +0900
>> Michael Paquier <michael@paquier.xyz> wrote:
> 
>>>> Also in that case, non-fast promotion is triggered. Since my patch
>>>> tries to remove non-fast promotion, it's intentional to prevent them
>>>> from doing that. But you think that we should not drop that because
>>>> there are still some users for that?
>>>
>>> It would be good to ask around to folks maintaining HA solutions about
>>> that change at least, as there could be a point in still letting
>>> promotion to happen in this case, but switch silently to the fast
>>> path.
>>
>> FWIW, PAF relies on pg_ctl promote. No need for non-fast promotion.
> 
> AFAICT repmgr uses 'pg_ctl promote', and has since version 3.0 (released
> in mid 2015).  It was only 3.3.2 (mid 2017) that supported Postgres 10,
> so it seems fairly safe to assume that the removal won't be a problem.

Correct, repmgr uses "pg_ctl promote" or pg_promote() (if available), and
won't be affected by this change.


Regards

Ian Barwick


-- 
Ian Barwick                   https://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services



Re: Remove non-fast promotion Re: Should we remove a fallbackpromotion? take 2

От
Kyotaro Horiguchi
Дата:
At Wed, 22 Apr 2020 10:28:07 +0900, Ian Barwick <ian.barwick@2ndquadrant.com> wrote in 
> On 2020/04/22 6:53, Alvaro Herrera wrote:
> > On 2020-Apr-21, Jehan-Guillaume de Rorthais wrote:
> > 
> >> On Tue, 21 Apr 2020 15:36:22 +0900
> >> Michael Paquier <michael@paquier.xyz> wrote:
> > 
> >>>> Also in that case, non-fast promotion is triggered. Since my patch
> >>>> tries to remove non-fast promotion, it's intentional to prevent them
> >>>> from doing that. But you think that we should not drop that because
> >>>> there are still some users for that?
> >>>
> >>> It would be good to ask around to folks maintaining HA solutions about
> >>> that change at least, as there could be a point in still letting
> >>> promotion to happen in this case, but switch silently to the fast
> >>> path.
> >>
> >> FWIW, PAF relies on pg_ctl promote. No need for non-fast promotion.
> > AFAICT repmgr uses 'pg_ctl promote', and has since version 3.0
> > (released
> > in mid 2015).  It was only 3.3.2 (mid 2017) that supported Postgres
> > 10,
> > so it seems fairly safe to assume that the removal won't be a problem.
> 
> Correct, repmgr uses "pg_ctl promote" or pg_promote() (if available),
> and
> won't be affected by this change.

For the record, the pgsql resource agent uses "pg_ctl promote" and
working with fast-promote.  Auxiliary tools for it is assuming
fast-promote.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



Re: Remove non-fast promotion Re: Should we remove a fallbackpromotion? take 2

От
Fujii Masao
Дата:

On 2020/04/22 6:57, Alvaro Herrera wrote:
> On 2020-Apr-20, Fujii Masao wrote:
> 
>> +    /*
>> +     * In 9.1 and 9.2 the postmaster unlinked the promote file inside the
>> +     * signal handler. It now leaves the file in place and lets the
>> +     * Startup process do the unlink.
>> +     */
>> +    if (IsPromoteSignaled() && stat(PROMOTE_SIGNAL_FILE, &stat_buf) == 0)
>>       {
>> -        /*
>> -         * In 9.1 and 9.2 the postmaster unlinked the promote file inside the
>> -         * signal handler. It now leaves the file in place and lets the
>> -         * Startup process do the unlink. This allows Startup to know whether
>> -         * it should create a full checkpoint before starting up (fallback
>> -         * mode). Fast promotion takes precedence.
>> -         */
> 
> It seems pointless to leave a very old comment that documents what the
> code no longer does.  I thikn it would be better to reword it indicating
> what the code does do, ie. something like "Leave the signal file in
> place; it will be removed by the startup process when ..."

Agreed. And, while reading the related code, I thought that it's more proper
to place this comment in CheckPromoteSignal() rather than
CheckForStandbyTrigger(). Because CheckPromoteSignal() actually does
what the comment says, i.e., leaves the promote signal file in place and
lets the startup process do the unlink.

Attached is the updated version of the patch.

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

Вложения

Re: Remove non-fast promotion Re: Should we remove a fallbackpromotion? take 2

От
Fujii Masao
Дата:

On 2020/04/22 10:53, Kyotaro Horiguchi wrote:
> At Wed, 22 Apr 2020 10:28:07 +0900, Ian Barwick <ian.barwick@2ndquadrant.com> wrote in
>> On 2020/04/22 6:53, Alvaro Herrera wrote:
>>> On 2020-Apr-21, Jehan-Guillaume de Rorthais wrote:
>>>
>>>> On Tue, 21 Apr 2020 15:36:22 +0900
>>>> Michael Paquier <michael@paquier.xyz> wrote:
>>>
>>>>>> Also in that case, non-fast promotion is triggered. Since my patch
>>>>>> tries to remove non-fast promotion, it's intentional to prevent them
>>>>>> from doing that. But you think that we should not drop that because
>>>>>> there are still some users for that?
>>>>>
>>>>> It would be good to ask around to folks maintaining HA solutions about
>>>>> that change at least, as there could be a point in still letting
>>>>> promotion to happen in this case, but switch silently to the fast
>>>>> path.
>>>>
>>>> FWIW, PAF relies on pg_ctl promote. No need for non-fast promotion.
>>> AFAICT repmgr uses 'pg_ctl promote', and has since version 3.0
>>> (released
>>> in mid 2015).  It was only 3.3.2 (mid 2017) that supported Postgres
>>> 10,
>>> so it seems fairly safe to assume that the removal won't be a problem.
>>
>> Correct, repmgr uses "pg_ctl promote" or pg_promote() (if available),
>> and
>> won't be affected by this change.
> 
> For the record, the pgsql resource agent uses "pg_ctl promote" and
> working with fast-promote.  Auxiliary tools for it is assuming
> fast-promote.

Thanks all for checking whether the change affects each HA solution!

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION



Re: Remove non-fast promotion Re: Should we remove a fallbackpromotion? take 2

От
Fujii Masao
Дата:

On 2020/04/22 9:13, Kyotaro Horiguchi wrote:
> At Tue, 21 Apr 2020 22:08:56 +0900, Fujii Masao <masao.fujii@oss.nttdata.com> wrote in
>>
>>
>> On 2020/04/21 17:15, Kyotaro Horiguchi wrote:
>>> At Mon, 20 Apr 2020 15:26:16 +0900, Fujii Masao
>>> <masao.fujii@oss.nttdata.com> wrote in
>>>> Patch attached. I will add this into the first CF for v14.
>>> -            if (!fast_promoted)
>>> +            if (!promoted)
>>>                    RequestCheckpoint(CHECKPOINT_END_OF_RECOVERY |
>>>                                      CHECKPOINT_IMMEDIATE |
>>>                                      CHECKPOINT_WAIT);
>>> If we don't find the checkpoint record just before, we don't insert
>>> End-Of-Recovery record then run an immediate chekpoint.  I think if we
>>> nuke the non-fast promotion, shouldn't we insert the EOR record even
>>> in that case?
>>
>> I'm not sure if that's safe. What if the server crashes before the
>> checkpoint
>> completes in that case? Since the last checkpoint record is not
>> available,
>> the subsequent crash recovery will fail. This would lead to that the
>> server
>> will never start up. Right? Currently ISTM that
> 
> Yes, that's right.
> 
>> end-of-recovery-checkpoint
>> is executed to avoid such trouble in that case.
> 
> I meant that we always have EOR at the end of recovery.  So in the
> missing latest checkpoint (and crash recovery) case, we insert EOR
> after the immediate checkpoint. That also means we no longer set
> CHECKPOINT_END_OF_RECOVERY to the checkpoint, too.

Could you tell me what the benefit by this change is? Even with this change,
the server still needs to wait for the checkpoint to complete before
becoming the master and starting the service, unlike fast promotion. No?

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION



Re: Remove non-fast promotion Re: Should we remove a fallbackpromotion? take 2

От
Kyotaro Horiguchi
Дата:
At Wed, 22 Apr 2020 11:51:42 +0900, Fujii Masao <masao.fujii@oss.nttdata.com> wrote in 
> > I meant that we always have EOR at the end of recovery.  So in the
> > missing latest checkpoint (and crash recovery) case, we insert EOR
> > after the immediate checkpoint. That also means we no longer set
> > CHECKPOINT_END_OF_RECOVERY to the checkpoint, too.
> 
> Could you tell me what the benefit by this change is? Even with this
> change,
> the server still needs to wait for the checkpoint to complete before
> becoming the master and starting the service, unlike fast
> promotion. No?

There's no benefit of performance.  It's just for simplicity by
signalling end-of-recovery in a unified way.

Long ago, we had only non-fast promotion, which is marked by
CHECKPOINT_END_OF_RECOVERY.  When we introduced fast-promotion, it is
marked by the END_OF_RECOVERY record since checkpoint record is not
inserted at the promotion time. However, we internally fall back to
non-fast promotion when we need to make a checkpoint immediately.
If we remove non-fast checkpoint, we don't need two means to signal
end-of-recovery.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



Re: Remove non-fast promotion Re: Should we remove a fallbackpromotion? take 2

От
Jehan-Guillaume de Rorthais
Дата:
On Wed, 22 Apr 2020 11:51:15 +0900
Fujii Masao <masao.fujii@oss.nttdata.com> wrote:

> On 2020/04/22 10:53, Kyotaro Horiguchi wrote:
>  [...]  
>  [...]  
>  [...]  
>  [...]  
>  [...]  
>  [...]  
>  [...]  
>  [...]  
>  [...]  
>  [...]  
>  [...]  
> 
> Thanks all for checking whether the change affects each HA solution!

Unless I'm wrong, we don't have feedback from Patroni team.

I did some quick grep and it seems to rely on "pg_ctl promote" as well.
Moreover, the latest commit 80fbe9005 force a checkpoint right after the
promote. So I suppose they don't use non-fast promote.

I CC'ed Alexander Kukushkin to this discussion, so at least he is aware of
this topic.

Regards,



Re: Remove non-fast promotion Re: Should we remove a fallbackpromotion? take 2

От
Fujii Masao
Дата:

On 2020/04/23 3:56, Jehan-Guillaume de Rorthais wrote:
> On Wed, 22 Apr 2020 11:51:15 +0900
> Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
> 
>> On 2020/04/22 10:53, Kyotaro Horiguchi wrote:
>>   [...]
>>   [...]
>>   [...]
>>   [...]
>>   [...]
>>   [...]
>>   [...]
>>   [...]
>>   [...]
>>   [...]
>>   [...]
>>
>> Thanks all for checking whether the change affects each HA solution!
> 
> Unless I'm wrong, we don't have feedback from Patroni team.
> 
> I did some quick grep and it seems to rely on "pg_ctl promote" as well.
> Moreover, the latest commit 80fbe9005 force a checkpoint right after the
> promote. So I suppose they don't use non-fast promote.

Thanks for checking that!

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION



Re: Should we remove a fallback promotion? take 2

От
Hamid Akhtar
Дата:
The following review has been posted through the commitfest application:
make installcheck-world:  tested, passed
Implements feature:       not tested
Spec compliant:           not tested
Documentation:            not tested

I've applied the v2 patch on the master branch. There some hunks, but the patch got applied. So, I ran make
installcheck-worldand everything looks fine to me with this patch. Though, I do have a few suggestions in general:
 

(1) I see two functions being used (a) CheckPromoteSignal and (b) IsPromoteSignaled in the code. Should these be
combinedinto a single function and perhaps check for "promote_signaled" and the "PROMOTE_SIGNAL_FILE". Not sure if
doingthis will break "sigusr1_handler" in postmaster.c though.
 

(2) CheckPromoteSignal is checking for "PROMOTE_SIGNAL_FILE" file. So, perhaps, rather than calling stat on
"PROMOTE_SIGNAL_FILE"in if statements, I would suggest to use CheckPromoteSignal function instead as it does nothing
butstat on "PROMOTE_SIGNAL_FILE" (after applying your patch). 

The new status of this patch is: Waiting on Author

Re: Should we remove a fallback promotion? take 2

От
Fujii Masao
Дата:

On 2020/06/03 3:38, Hamid Akhtar wrote:
> The following review has been posted through the commitfest application:
> make installcheck-world:  tested, passed
> Implements feature:       not tested
> Spec compliant:           not tested
> Documentation:            not tested
> 
> I've applied the v2 patch on the master branch. There some hunks, but the patch got applied. So, I ran make
installcheck-worldand everything looks fine to me with this patch. Though, I do have a few suggestions in general:
 

Thanks for the test and review!

> (1) I see two functions being used (a) CheckPromoteSignal and (b) IsPromoteSignaled in the code. Should these be
combinedinto a single function and perhaps check for "promote_signaled" and the "PROMOTE_SIGNAL_FILE". Not sure if
doingthis will break "sigusr1_handler" in postmaster.c though.
 

I don't think we can do that simply. CheckPromoteSignal() can be called by
both postmaster and the startup process. OTOH, IsPromoteSignaled()
accesses the flag that can be set only in the startup process' signal handler,
i.e., it's intended to be called only by the startup process.

> (2) CheckPromoteSignal is checking for "PROMOTE_SIGNAL_FILE" file. So, perhaps, rather than calling stat on
"PROMOTE_SIGNAL_FILE"in if statements, I would suggest to use CheckPromoteSignal function instead as it does nothing
butstat on "PROMOTE_SIGNAL_FILE" (after applying your patch).
 

Yes, that's good idea. Attached is the updated version of the patch.
I replaced that stat() with CheckPromoteSignal(). Also I replaced
unlink(PROMOTE_SIGNAL_FILE) with RemovePromoteSignalFiles().

> The new status of this patch is: Waiting on Author

I will change the status back to Needs Review.

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

Вложения

Re: Should we remove a fallback promotion? take 2

От
Kyotaro Horiguchi
Дата:
At Wed, 3 Jun 2020 09:43:17 +0900, Fujii Masao <masao.fujii@oss.nttdata.com> wrote in 
> I will change the status back to Needs Review.

         record = ReadCheckpointRecord(xlogreader, checkPointLoc, 1, false);
         if (record != NULL)
         {
-          fast_promoted = true;
+          promoted = true;

Even if we missed the last checkpoint record, we don't give up
promotion and continue fall-back promotion but the variable "promoted"
stays false. That is confusiong.

How about changing it to fallback_promotion, or some names with more
behavior-specific name like immediate_checkpoint_needed?

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



Re: Should we remove a fallback promotion? take 2

От
Fujii Masao
Дата:

On 2020/06/03 12:06, Kyotaro Horiguchi wrote:
> At Wed, 3 Jun 2020 09:43:17 +0900, Fujii Masao <masao.fujii@oss.nttdata.com> wrote in
>> I will change the status back to Needs Review.

Thanks for the review!

>           record = ReadCheckpointRecord(xlogreader, checkPointLoc, 1, false);
>           if (record != NULL)
>           {
> -          fast_promoted = true;
> +          promoted = true;
> 
> Even if we missed the last checkpoint record, we don't give up
> promotion and continue fall-back promotion but the variable "promoted"
> stays false. That is confusiong.
> 
> How about changing it to fallback_promotion, or some names with more
> behavior-specific name like immediate_checkpoint_needed?


I like doEndOfRecoveryCkpt or something, but I have no strong opinion
about that flag naming. So I'm ok with immediate_checkpoint_needed
if others also like that, too.

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION



Re: Should we remove a fallback promotion? take 2

От
Hamid Akhtar
Дата:
Applying the patch to the current master branch throws 9 hunks. AFAICT, the patch is good otherwise.

On Wed, Jun 3, 2020 at 3:20 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:


On 2020/06/03 12:06, Kyotaro Horiguchi wrote:
> At Wed, 3 Jun 2020 09:43:17 +0900, Fujii Masao <masao.fujii@oss.nttdata.com> wrote in
>> I will change the status back to Needs Review.

Thanks for the review!

>           record = ReadCheckpointRecord(xlogreader, checkPointLoc, 1, false);
>           if (record != NULL)
>           {
> -          fast_promoted = true;
> +          promoted = true;
>
> Even if we missed the last checkpoint record, we don't give up
> promotion and continue fall-back promotion but the variable "promoted"
> stays false. That is confusiong.
>
> How about changing it to fallback_promotion, or some names with more
> behavior-specific name like immediate_checkpoint_needed?


I like doEndOfRecoveryCkpt or something, but I have no strong opinion
about that flag naming. So I'm ok with immediate_checkpoint_needed
if others also like that, too.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION


--
Highgo Software (Canada/China/Pakistan)
URL : www.highgo.ca
ADDR: 10318 WHALLEY BLVD, Surrey, BC
CELL:+923335449950  EMAIL: mailto:hamid.akhtar@highgo.ca
SKYPE: engineeredvirus

Re: Should we remove a fallback promotion? take 2

От
Fujii Masao
Дата:

On 2020/07/27 17:53, Hamid Akhtar wrote:
> Applying the patch to the current master branch throws 9 hunks. AFAICT, the patch is good otherwise.

So you think that the patch can be marked as Ready for Committer?

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION



Re: Should we remove a fallback promotion? take 2

От
Hamid Akhtar
Дата:
There have been no real objections on this patch and it seems to work. So, IMHO, the only thing that needs to be done is perhaps update the patch so that it applies clearly on the master branch.

On Mon, Jul 27, 2020 at 9:31 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:


On 2020/07/27 17:53, Hamid Akhtar wrote:
> Applying the patch to the current master branch throws 9 hunks. AFAICT, the patch is good otherwise.

So you think that the patch can be marked as Ready for Committer?

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION


--
Highgo Software (Canada/China/Pakistan)
URL : www.highgo.ca
ADDR: 10318 WHALLEY BLVD, Surrey, BC
CELL:+923335449950  EMAIL: mailto:hamid.akhtar@highgo.ca
SKYPE: engineeredvirus

Re: Should we remove a fallback promotion? take 2

От
Fujii Masao
Дата:

On 2020/07/28 20:35, Hamid Akhtar wrote:
> There have been no real objections on this patch and it seems to work.

Thanks! So I pushed the patch.

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION