Обсуждение: Function to promote standby servers

Поиск
Список
Период
Сортировка

Function to promote standby servers

От
Laurenz Albe
Дата:
Providing SQL access for administrative tasks seems to be a
good thing, see ALTER SYSTEM and pg_reload_conf().

In that vein, I propose a function pg_promote() to promote
physical standby servers.

If there are no fundamental objections, I'll add it to the
next commitfest.

Yours,
Laurenz Albe

Вложения

Re: Function to promote standby servers

От
Michael Paquier
Дата:
On Thu, Sep 20, 2018 at 07:59:02AM +0200, Laurenz Albe wrote:
> Providing SQL access for administrative tasks seems to be a
> good thing, see ALTER SYSTEM and pg_reload_conf().
>
> In that vein, I propose a function pg_promote() to promote
> physical standby servers.

No fundamental issues from me regarding the concept of being able to
trigger a promotion remotely, so +1.  Do we want this capability as well
for fallback_promote?  My gut tells me no, and that we ought to drop
this option at some point in the future, still that's worth mentioning.

You may want to move PROMOTE_SIGNAL_FILE in a position where xlogfuncs.c
could use it.
--
Michael

Вложения

Re: Function to promote standby servers

От
Laurenz Albe
Дата:
Michael Paquier wrote:
> > In that vein, I propose a function pg_promote() to promote
> > physical standby servers.
> 
> No fundamental issues from me regarding the concept of being able to
> trigger a promotion remotely, so +1.  Do we want this capability as well
> for fallback_promote?  My gut tells me no, and that we ought to drop
> this option at some point in the future, still that's worth mentioning.
> 
> You may want to move PROMOTE_SIGNAL_FILE in a position where xlogfuncs.c
> could use it.

Good, idea; updated patch attached.

Yours,
Laurenz Albe

Вложения

Re: Function to promote standby servers

От
Masahiko Sawada
Дата:
On Thu, Oct 4, 2018 at 8:26 PM Laurenz Albe <laurenz.albe@cybertec.at> wrote:
>
> Michael Paquier wrote:
> > > In that vein, I propose a function pg_promote() to promote
> > > physical standby servers.

+1

> >
> > No fundamental issues from me regarding the concept of being able to
> > trigger a promotion remotely, so +1.  Do we want this capability as well
> > for fallback_promote?  My gut tells me no, and that we ought to drop
> > this option at some point in the future, still that's worth mentioning.
> >
> > You may want to move PROMOTE_SIGNAL_FILE in a position where xlogfuncs.c
> > could use it.
>
> Good, idea; updated patch attached.
>

Maybe the patch needs regression tests for the new function. And I'd
suggest to make the function name more clear by changing to
pg_promote_server(), pg_promote_standby() and so on.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


Re: Function to promote standby servers

От
Laurenz Albe
Дата:
Masahiko Sawada wrote:
> Maybe the patch needs regression tests for the new function. And I'd
> suggest to make the function name more clear by changing to
> pg_promote_server(), pg_promote_standby() and so on.

Thanks for the review.

The attached patch has regression tests - I though it would be good
to change some of the existing tests that run standby promotion to
use the SQL function instead of pg_ctl.

I have left the name though -- as far as I can tell, "promote" has
no other meaning in PostgreSQL than standby promotion, and I believe
it is only good to be more verbose if that avoids confusion.

Yours,
Laurenz Albe

Re: Function to promote standby servers

От
Laurenz Albe
Дата:
Masahiko Sawada wrote:
> Maybe the patch needs regression tests for the new function. And I'd
> suggest to make the function name more clear by changing to
> pg_promote_server(), pg_promote_standby() and so on.

Thanks for the review.

The attached patch has regression tests - I though it would be good
to change some of the existing tests that run standby promotion to
use the SQL function instead of pg_ctl.

I have left the name though -- as far as I can tell, "promote" has
no other meaning in PostgreSQL than standby promotion, and I believe
it is only good to be more verbose if that avoids confusion.

Yours,
Laurenz Albe

Вложения

Re: Function to promote standby servers

От
Michael Paquier
Дата:
On Mon, Oct 08, 2018 at 07:36:50PM +0200, Laurenz Albe wrote:
> The attached patch has regression tests - I though it would be good
> to change some of the existing tests that run standby promotion to
> use the SQL function instead of pg_ctl.
>
> I have left the name though -- as far as I can tell, "promote" has
> no other meaning in PostgreSQL than standby promotion, and I believe
> it is only good to be more verbose if that avoids confusion.

I am fine with pg_promote.

> diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
> index 7375a78ffc..3a1f49e83a 100644
> --- a/src/backend/access/transam/xlog.c
> +++ b/src/backend/access/transam/xlog.c
> @@ -81,8 +81,6 @@ extern uint32 bootstrap_data_checksum_version;
>  /* File path names (all relative to $PGDATA) */
>  #define RECOVERY_COMMAND_FILE    "recovery.conf"
>  #define RECOVERY_COMMAND_DONE    "recovery.done"
> -#define PROMOTE_SIGNAL_FILE        "promote"
> -#define FALLBACK_PROMOTE_SIGNAL_FILE "fallback_promote"

Perhaps we could just move the whole set to xlog.h.

> +Datum
> +pg_promote(PG_FUNCTION_ARGS)
> +{
> +    FILE *promote_file;
> +
> +    if (!superuser())
> +        ereport(ERROR,
> +                (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
> +                 errmsg("must be superuser to promote standby servers")));

Let's remove this restriction at code level, and instead use
function-level ACLs so as it is possible to allow non-superusers to
trigger a promotion as well, say a dedicated role.  We could add a
system role for this purpose, but I am not sure that it is worth the
addition yet.

> +    while ($self->safe_psql('postgres', "SELECT pg_is_in_recovery()") != 'f')
> +    {
> +         sleep 1;
> +    }
> +    return;

This could go on infinitely, locking down buildfarm machines silently if
a commit is not careful.  What I would suggest to do instead is to not
touch PostgresNode.pm at all, and to just modify one test to trigger
promotion with the SQL function.  Then from the test, you should add a
comment that triggerring promotion from SQL is wanted instead of the
internal routine, and then please add a call to poll_query_until() in
the same way as what 6deb52b2 has removed.

As of now, this function returns true if promotion has been triggered,
and false if not.  However it seems to me that we should have something
more consistent with "pg_ctl promote", so there are actually three
possible states:
1) Node is in recovery, with promotion triggered.  pg_promote() returns
true in case of success in creating the trigger file.
2) Node is in recovery, with promotion triggered.  pg_promote() returns
false in case of failure in creating the trigger file.
3) Node is not in recovery, where I think a call to pg_promote should
trigger an error.  This is not handled in the patch.

An other thing which has value is to implement a "wait" mode for this
feature, or actually a nowait mode.  You could simply do what pg_ctl
does with its logic in get_control_dbstate() by looking at the control
file state.  I think that waiting for the promotion to happen should be
the default.

At the end this patch needs a bit more work.
--
Michael

Re: Function to promote standby servers

От
Michael Paquier
Дата:
On Mon, Oct 08, 2018 at 07:36:50PM +0200, Laurenz Albe wrote:
> The attached patch has regression tests - I though it would be good
> to change some of the existing tests that run standby promotion to
> use the SQL function instead of pg_ctl.
>
> I have left the name though -- as far as I can tell, "promote" has
> no other meaning in PostgreSQL than standby promotion, and I believe
> it is only good to be more verbose if that avoids confusion.

I am fine with pg_promote.

> diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
> index 7375a78ffc..3a1f49e83a 100644
> --- a/src/backend/access/transam/xlog.c
> +++ b/src/backend/access/transam/xlog.c
> @@ -81,8 +81,6 @@ extern uint32 bootstrap_data_checksum_version;
>  /* File path names (all relative to $PGDATA) */
>  #define RECOVERY_COMMAND_FILE    "recovery.conf"
>  #define RECOVERY_COMMAND_DONE    "recovery.done"
> -#define PROMOTE_SIGNAL_FILE        "promote"
> -#define FALLBACK_PROMOTE_SIGNAL_FILE "fallback_promote"

Perhaps we could just move the whole set to xlog.h.

> +Datum
> +pg_promote(PG_FUNCTION_ARGS)
> +{
> +    FILE *promote_file;
> +
> +    if (!superuser())
> +        ereport(ERROR,
> +                (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
> +                 errmsg("must be superuser to promote standby servers")));

Let's remove this restriction at code level, and instead use
function-level ACLs so as it is possible to allow non-superusers to
trigger a promotion as well, say a dedicated role.  We could add a
system role for this purpose, but I am not sure that it is worth the
addition yet.

> +    while ($self->safe_psql('postgres', "SELECT pg_is_in_recovery()") != 'f')
> +    {
> +         sleep 1;
> +    }
> +    return;

This could go on infinitely, locking down buildfarm machines silently if
a commit is not careful.  What I would suggest to do instead is to not
touch PostgresNode.pm at all, and to just modify one test to trigger
promotion with the SQL function.  Then from the test, you should add a
comment that triggerring promotion from SQL is wanted instead of the
internal routine, and then please add a call to poll_query_until() in
the same way as what 6deb52b2 has removed.

As of now, this function returns true if promotion has been triggered,
and false if not.  However it seems to me that we should have something
more consistent with "pg_ctl promote", so there are actually three
possible states:
1) Node is in recovery, with promotion triggered.  pg_promote() returns
true in case of success in creating the trigger file.
2) Node is in recovery, with promotion triggered.  pg_promote() returns
false in case of failure in creating the trigger file.
3) Node is not in recovery, where I think a call to pg_promote should
trigger an error.  This is not handled in the patch.

An other thing which has value is to implement a "wait" mode for this
feature, or actually a nowait mode.  You could simply do what pg_ctl
does with its logic in get_control_dbstate() by looking at the control
file state.  I think that waiting for the promotion to happen should be
the default.

At the end this patch needs a bit more work.
--
Michael

Вложения

Re: Function to promote standby servers

От
Laurenz Albe
Дата:
Michael Paquier wrote:
> > diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
> > index 7375a78ffc..3a1f49e83a 100644
> > --- a/src/backend/access/transam/xlog.c
> > +++ b/src/backend/access/transam/xlog.c
> > @@ -81,8 +81,6 @@ extern uint32 bootstrap_data_checksum_version;
> >  /* File path names (all relative to $PGDATA) */
> >  #define RECOVERY_COMMAND_FILE    "recovery.conf"
> >  #define RECOVERY_COMMAND_DONE    "recovery.done"
> > -#define PROMOTE_SIGNAL_FILE        "promote"
> > -#define FALLBACK_PROMOTE_SIGNAL_FILE "fallback_promote"
> 
> Perhaps we could just move the whole set to xlog.h.

Done.

> > +Datum
> > +pg_promote(PG_FUNCTION_ARGS)
> > +{
> > +    FILE *promote_file;
> > +
> > +    if (!superuser())
> > +        ereport(ERROR,
> > +                (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
> > +                 errmsg("must be superuser to promote standby servers")));
> 
> Let's remove this restriction at code level, and instead use
> function-level ACLs so as it is possible to allow non-superusers to
> trigger a promotion as well, say a dedicated role.  We could add a
> system role for this purpose, but I am not sure that it is worth the
> addition yet.

Agreed, done.

> > +    while ($self->safe_psql('postgres', "SELECT pg_is_in_recovery()") != 'f')
> > +    {
> > +         sleep 1;
> > +    }
> > +    return;
> 
> This could go on infinitely, locking down buildfarm machines silently if
> a commit is not careful.  What I would suggest to do instead is to not
> touch PostgresNode.pm at all, and to just modify one test to trigger
> promotion with the SQL function.  Then from the test, you should add a
> comment that triggerring promotion from SQL is wanted instead of the
> internal routine, and then please add a call to poll_query_until() in
> the same way as what 6deb52b2 has removed.

I have modified recovery/t/004_timeline_switch.pl and recovery/t/009_twophase.pl
accordingly: one calls the function with "wait => true", the other
uses "wait => false" and waits as you suggested.

> As of now, this function returns true if promotion has been triggered,
> and false if not.  However it seems to me that we should have something
> more consistent with "pg_ctl promote", so there are actually three
> possible states:
> 1) Node is in recovery, with promotion triggered.  pg_promote() returns
> true in case of success in creating the trigger file.
> 2) Node is in recovery, with promotion triggered.  pg_promote() returns
> false in case of failure in creating the trigger file.
> 3) Node is not in recovery, where I think a call to pg_promote should
> trigger an error.  This is not handled in the patch.

So far I had returned "false" in the last case, but I am fine with
throwing an error in that case.  Done.

> An other thing which has value is to implement a "wait" mode for this
> feature, or actually a nowait mode.  You could simply do what pg_ctl
> does with its logic in get_control_dbstate() by looking at the control
> file state.  I think that waiting for the promotion to happen should be
> the default.

I have implemented that.

> At the end this patch needs a bit more work.

Yes, it did.  Thanks for the thorough review!

Yours,
Laurenz Albe

Вложения

Re: Function to promote standby servers

От
Ashwin Agrawal
Дата:

Just curious to know, is promotion via function only allowed for hot-standby or works for any standby?

On Mon, Oct 15, 2018 at 7:16 AM Laurenz Albe <laurenz.albe@cybertec.at> wrote:
Michael Paquier wrote:
> > diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
> > index 7375a78ffc..3a1f49e83a 100644
> > --- a/src/backend/access/transam/xlog.c
> > +++ b/src/backend/access/transam/xlog.c
> > @@ -81,8 +81,6 @@ extern uint32 bootstrap_data_checksum_version;
> >  /* File path names (all relative to $PGDATA) */
> >  #define RECOVERY_COMMAND_FILE      "recovery.conf"
> >  #define RECOVERY_COMMAND_DONE      "recovery.done"
> > -#define PROMOTE_SIGNAL_FILE                "promote"
> > -#define FALLBACK_PROMOTE_SIGNAL_FILE "fallback_promote"
>
> Perhaps we could just move the whole set to xlog.h.

Done.

> > +Datum
> > +pg_promote(PG_FUNCTION_ARGS)
> > +{
> > +   FILE *promote_file;
> > +
> > +   if (!superuser())
> > +           ereport(ERROR,
> > +                           (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
> > +                            errmsg("must be superuser to promote standby servers")));
>
> Let's remove this restriction at code level, and instead use
> function-level ACLs so as it is possible to allow non-superusers to
> trigger a promotion as well, say a dedicated role.  We could add a
> system role for this purpose, but I am not sure that it is worth the
> addition yet.

Agreed, done.

> > +   while ($self->safe_psql('postgres', "SELECT pg_is_in_recovery()") != 'f')
> > +   {
> > +            sleep 1;
> > +   }
> > +   return;
>
> This could go on infinitely, locking down buildfarm machines silently if
> a commit is not careful.  What I would suggest to do instead is to not
> touch PostgresNode.pm at all, and to just modify one test to trigger
> promotion with the SQL function.  Then from the test, you should add a
> comment that triggerring promotion from SQL is wanted instead of the
> internal routine, and then please add a call to poll_query_until() in
> the same way as what 6deb52b2 has removed.

I have modified recovery/t/004_timeline_switch.pl and recovery/t/009_twophase.pl
accordingly: one calls the function with "wait => true", the other
uses "wait => false" and waits as you suggested.

> As of now, this function returns true if promotion has been triggered,
> and false if not.  However it seems to me that we should have something
> more consistent with "pg_ctl promote", so there are actually three
> possible states:
> 1) Node is in recovery, with promotion triggered.  pg_promote() returns
> true in case of success in creating the trigger file.
> 2) Node is in recovery, with promotion triggered.  pg_promote() returns
> false in case of failure in creating the trigger file.
> 3) Node is not in recovery, where I think a call to pg_promote should
> trigger an error.  This is not handled in the patch.

So far I had returned "false" in the last case, but I am fine with
throwing an error in that case.  Done.

> An other thing which has value is to implement a "wait" mode for this
> feature, or actually a nowait mode.  You could simply do what pg_ctl
> does with its logic in get_control_dbstate() by looking at the control
> file state.  I think that waiting for the promotion to happen should be
> the default.

I have implemented that.

> At the end this patch needs a bit more work.

Yes, it did.  Thanks for the thorough review!

Yours,
Laurenz Albe

Re: Function to promote standby servers

От
Laurenz Albe
Дата:
Ashwin Agrawal wrote:
> Just curious to know, is promotion via function only allowed for
> hot-standby or works for any standby?

Only for hot standby - how do you want to execute a function on a standby
server that doesn't allow connections?

Yours,
Laurenz Albe



Re: Function to promote standby servers

От
Michael Paquier
Дата:
On Tue, Oct 16, 2018 at 09:49:20AM +0200, Laurenz Albe wrote:
> Only for hot standby - how do you want to execute a function on a standby
> server that doesn't allow connections?

In most deployments hot standby will be enabled, and wal_level uses the
same value for archive and hot_standby for some time now, so that does
not sound like a huge issue to me.
--
Michael

Вложения

Re: Function to promote standby servers

От
Michael Paquier
Дата:
On Mon, Oct 15, 2018 at 04:16:02PM +0200, Laurenz Albe wrote:
> Michael Paquier wrote:
>> Let's remove this restriction at code level, and instead use
>> function-level ACLs so as it is possible to allow non-superusers to
>> trigger a promotion as well, say a dedicated role.  We could add a
>> system role for this purpose, but I am not sure that it is worth the
>> addition yet.
>
> Agreed, done.

>> As of now, this function returns true if promotion has been triggered,
>> and false if not.  However it seems to me that we should have something
>> more consistent with "pg_ctl promote", so there are actually three
>> possible states:
>> 1) Node is in recovery, with promotion triggered.  pg_promote() returns
>> true in case of success in creating the trigger file.
>> 2) Node is in recovery, with promotion triggered.  pg_promote() returns
>> false in case of failure in creating the trigger file.
>> 3) Node is not in recovery, where I think a call to pg_promote should
>> trigger an error.  This is not handled in the patch.
>
> So far I had returned "false" in the last case, but I am fine with
> throwing an error in that case.  Done.

Thanks, that looks correct to me.  I think that consistency with pg_ctl
is quite important, as this is a feature present for many releases now.

>> At the end this patch needs a bit more work.
>
> Yes, it did.  Thanks for the thorough review!

+       /* wait for up to a minute for promotion */
+       for (i = 0; i < WAITS_PER_SECOND * WAIT_SECONDS; ++i)
+       {
+               if (!RecoveryInProgress())
+                       PG_RETURN_BOOL(true);
+
+               pg_usleep(1000000L / WAITS_PER_SECOND);
+       }
I would recommend to avoid pg_usleep and instead use a WaitLatch() or
similar to generate a wait event.  The wait can then also be seen in
pg_stat_activity, which is useful for monitoring purposes.  Using
RecoveryInProgress is indeed doable, and that's more simple than what I
thought first.

Something I missed to mention in the previous review: the timeout should
be manually enforceable, with a default at 60s.

Is the function marked as strict?  Per the code it should be, I am not
able to test now though.

You are missing REVOKE EXECUTE ON FUNCTION pg_promote() in
system_views.sql, or any users could trigger a promotion, no?
--
Michael

Вложения

Re: Function to promote standby servers

От
Laurenz Albe
Дата:
Michael Paquier wrote:
> +       /* wait for up to a minute for promotion */
> +       for (i = 0; i < WAITS_PER_SECOND * WAIT_SECONDS; ++i)
> +       {
> +               if (!RecoveryInProgress())
> +                       PG_RETURN_BOOL(true);
> +
> +               pg_usleep(1000000L / WAITS_PER_SECOND);
> +       }
> I would recommend to avoid pg_usleep and instead use a WaitLatch() or
> similar to generate a wait event.  The wait can then also be seen in
> pg_stat_activity, which is useful for monitoring purposes.  Using
> RecoveryInProgress is indeed doable, and that's more simple than what I
> thought first.

Agreed, done.

I have introduced a new wait event, because I couldn't find one that fit.

> Something I missed to mention in the previous review: the timeout should
> be manually enforceable, with a default at 60s.

Ok, added as a new parameter "wait_seconds".

> Is the function marked as strict?  Per the code it should be, I am not
> able to test now though.

Yes, it is.

> You are missing REVOKE EXECUTE ON FUNCTION pg_promote() in
> system_views.sql, or any users could trigger a promotion, no?

You are right *blush*.
Fixed.

Yours,
Laurenz Albe

Вложения

Re: Function to promote standby servers

От
Laurenz Albe
Дата:
I wrote:
> Fixed.

Here is another version, with a fix in pg_proc.dat, an improved comment
and "wait_seconds" exercised in the regression test.

Yours,
Laurenz Albe

Вложения

Re: Function to promote standby servers

От
Michael Paquier
Дата:
On Sat, Oct 20, 2018 at 06:24:28AM +0200, Laurenz Albe wrote:
> Here is another version, with a fix in pg_proc.dat, an improved comment
> and "wait_seconds" exercised in the regression test.

Thanks for the new version.  This looks pretty good to me.  I'll see if
I can review it once and then commit.

> -    WAIT_EVENT_SYNC_REP
> +    WAIT_EVENT_SYNC_REP,
> +    WAIT_EVENT_PROMOTE
>  } WaitEventIPC;

Those are kept in alphabetical order.  Individual wait events are also
documented with a description.
--
Michael

Вложения

Re: Function to promote standby servers

От
Michael Paquier
Дата:
On Sat, Oct 20, 2018 at 01:48:56PM +0900, Michael Paquier wrote:
> On Sat, Oct 20, 2018 at 06:24:28AM +0200, Laurenz Albe wrote:
>> Here is another version, with a fix in pg_proc.dat, an improved comment
>> and "wait_seconds" exercised in the regression test.
>
> Thanks for the new version.  This looks pretty good to me.  I'll see if
> I can review it once and then commit.
>
>> -    WAIT_EVENT_SYNC_REP
>> +    WAIT_EVENT_SYNC_REP,
>> +    WAIT_EVENT_PROMOTE
>>  } WaitEventIPC;
>
> Those are kept in alphabetical order.  Individual wait events are also
> documented with a description.

Regarding the documentation, wouldn't it be more adapted to list the new
function under the section "Recovery Control Functions"?  Not only does
the new function signal the postmaster, but it also creates the
promotion trigger file.

Anyway, I have been looking in depth at the patch, and I finish with the
attached.  Here are some notes:
- pg_ctl returns an error if it cannot create the promote trigger file
and if it doesn't close it.  pg_promote should do the same.
- WL_TIMEOUT could have been reached, leading to no further retries
(remove for example the part related to the trigger file creation and
try to trigger the promotion, the wait time is incorrect).
- Documentation has been reworked.
- The new wait event is documented.
- a WARNING is returned if the signal to the postmaster is not sent,
which is something I think makes most sense as we do the same for other
signals.
- position including unistd.h was not correct in xlogfuncs.c.
- Timeouts for the tests are made a tad longer.  Some buildfarm machines
don't like a promotion wait of 10s.
- a catalog version bump is included, just a personal reminder..
- Indentatio has been run.

Laurenz, what do you think?
--
Michael

Вложения

Re: Function to promote standby servers

От
Masahiko Sawada
Дата:
On Mon, Oct 22, 2018 at 3:01 PM Michael Paquier <michael@paquier.xyz> wrote:
>
> On Sat, Oct 20, 2018 at 01:48:56PM +0900, Michael Paquier wrote:
> > On Sat, Oct 20, 2018 at 06:24:28AM +0200, Laurenz Albe wrote:
> >> Here is another version, with a fix in pg_proc.dat, an improved comment
> >> and "wait_seconds" exercised in the regression test.
> >
> > Thanks for the new version.  This looks pretty good to me.  I'll see if
> > I can review it once and then commit.
> >
> >> -    WAIT_EVENT_SYNC_REP
> >> +    WAIT_EVENT_SYNC_REP,
> >> +    WAIT_EVENT_PROMOTE
> >>  } WaitEventIPC;
> >
> > Those are kept in alphabetical order.  Individual wait events are also
> > documented with a description.
>
> Regarding the documentation, wouldn't it be more adapted to list the new
> function under the section "Recovery Control Functions"?  Not only does
> the new function signal the postmaster, but it also creates the
> promotion trigger file.
>
> Anyway, I have been looking in depth at the patch, and I finish with the
> attached.  Here are some notes:
> - pg_ctl returns an error if it cannot create the promote trigger file
> and if it doesn't close it.  pg_promote should do the same.
> - WL_TIMEOUT could have been reached, leading to no further retries
> (remove for example the part related to the trigger file creation and
> try to trigger the promotion, the wait time is incorrect).
> - Documentation has been reworked.
> - The new wait event is documented.
> - a WARNING is returned if the signal to the postmaster is not sent,
> which is something I think makes most sense as we do the same for other
> signals.
> - position including unistd.h was not correct in xlogfuncs.c.
> - Timeouts for the tests are made a tad longer.  Some buildfarm machines
> don't like a promotion wait of 10s.
> - a catalog version bump is included, just a personal reminder..
> - Indentatio has been run.
>

Thank you for workig on this. There is one review comment for the latest patch.

+     if (FreeFile(promote_file))
+               ereport(ERROR,
+                               (errcode_for_file_access(),
+                                errmsg("could not write file \"%s\": %m",
+                                               PROMOTE_SIGNAL_FILE)));

Maybe we should unlink PROMOTE_SIGNAL_FILE before erroring.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


Re: Function to promote standby servers

От
Laurenz Albe
Дата:
Masahiko Sawada wrote:
> On Mon, Oct 22, 2018 at 3:01 PM Michael Paquier <michael@paquier.xyz> wrote:
> > Regarding the documentation, wouldn't it be more adapted to list the new
> > function under the section "Recovery Control Functions"?  Not only does
> > the new function signal the postmaster, but it also creates the
> > promotion trigger file.

Hmm, yes, that's probably the first place where people would look.
I guess the implementation lead me to categorize it as a "signaling function",
and it also wouldn't be wrong there.

> > Anyway, I have been looking in depth at the patch, and I finish with the
> > attached.  Here are some notes:
> > [...]
> > - WL_TIMEOUT could have been reached, leading to no further retries
> > (remove for example the part related to the trigger file creation and
> > try to trigger the promotion, the wait time is incorrect).

Ok.

> > - Documentation has been reworked.
> > - The new wait event is documented.

Thanks.

> > - position including unistd.h was not correct in xlogfuncs.c.
> > - Timeouts for the tests are made a tad longer.  Some buildfarm machines
> > don't like a promotion wait of 10s.
> > - a catalog version bump is included, just a personal reminder..
> > - Indentatio has been run.

Thanks.

> Thank you for workig on this. There is one review comment for the latest patch.
> 
> +     if (FreeFile(promote_file))
> +               ereport(ERROR,
> +                               (errcode_for_file_access(),
> +                                errmsg("could not write file \"%s\": %m",
> +                                               PROMOTE_SIGNAL_FILE)));
> 
> Maybe we should unlink PROMOTE_SIGNAL_FILE before erroring.

Yes, that cannot hurt.

Yours,
Laurenz Albe



Re: Function to promote standby servers

От
Michael Paquier
Дата:
On Mon, Oct 22, 2018 at 11:45:30AM +0200, Laurenz Albe wrote:
> Masahiko Sawada wrote:
>> Thank you for workig on this. There is one review comment for the latest patch.
>>
>> +     if (FreeFile(promote_file))
>> +               ereport(ERROR,
>> +                               (errcode_for_file_access(),
>> +                                errmsg("could not write file \"%s\": %m",
>> +                                               PROMOTE_SIGNAL_FILE)));
>>
>> Maybe we should unlink PROMOTE_SIGNAL_FILE before erroring.
>
> Yes, that cannot hurt.

If FreeFile() fails, unlink() would most likely fail for the same
reason.  Please note that if unlink() happens before issuing the ERROR,
saving errno would be necessary.  That's not a huge issue anyway, if a
failure happens, the operator would retry the operation.  If there is a
crash, the file gets removed at the end of recovery.  If there are no
objections, I'll look at this patch again by the end of this week in
order to get it committed.
--
Michael

Вложения

Re: Function to promote standby servers

От
Laurenz Albe
Дата:
Michael Paquier wrote:
> If there are no
> objections, I'll look at this patch again by the end of this week in
> order to get it committed.

No objections from me; on the contrary, I would like to thank you for
your effort here.  I appreciate that you probably spent more time
tutoring me than it would have taken you to write this yourself.

Yours,
Laurenz Albe



Re: Function to promote standby servers

От
Michael Paquier
Дата:
On Tue, Oct 23, 2018 at 09:42:16AM +0200, Laurenz Albe wrote:
> No objections from me; on the contrary, I would like to thank you for
> your effort here.  I appreciate that you probably spent more time
> tutoring me than it would have taken you to write this yourself.

You're welcome.  Happy that it helped.
--
Michael

Вложения

Re: Function to promote standby servers

От
Michael Paquier
Дата:
On Wed, Oct 24, 2018 at 08:50:43AM +0900, Michael Paquier wrote:
> On Tue, Oct 23, 2018 at 09:42:16AM +0200, Laurenz Albe wrote:
> > No objections from me; on the contrary, I would like to thank you for
> > your effort here.  I appreciate that you probably spent more time
> > tutoring me than it would have taken you to write this yourself.
>
> You're welcome.  Happy that it helped.

And committed.  I double-checked the code, and tweaked a bit the tests
so as the test using wait_mode = false is removed as it did not seem
worth the extra cycles.  I also added a check on the return value of
pg_promote when using the wait mode.  Another thing was that the
function needs to be parallel-restricted.
--
Michael

Вложения

Re: Function to promote standby servers

От
Ian Barwick
Дата:
Hi

On 10/25/2018 09:47 AM, Michael Paquier wrote:
> And committed.  I double-checked the code, and tweaked a bit the tests
> so as the test using wait_mode = false is removed as it did not seem
> worth the extra cycles.  I also added a check on the return value of
> pg_promote when using the wait mode.  Another thing was that the
> function needs to be parallel-restricted.

Documentation for this [*] says "Returns true if promotion is successful and false otherwise",
which is not correct if "wait" is false, as it will always return TRUE.

[*] https://www.postgresql.org/docs/devel/static/functions-admin.html#FUNCTIONS-RECOVERY-CONTROL

Attached patch contains a suggested rewording to clarify this.


Regards

Ian Barwick

-- 
  Ian Barwick                   http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services

Вложения

Re: Function to promote standby servers

От
Ian Barwick
Дата:
Hi

On 10/25/2018 09:47 AM, Michael Paquier wrote:
> And committed.  I double-checked the code, and tweaked a bit the tests
> so as the test using wait_mode = false is removed as it did not seem
> worth the extra cycles.  I also added a check on the return value of
> pg_promote when using the wait mode.  Another thing was that the
> function needs to be parallel-restricted.

Documentation for this [*] says "Returns true if promotion is successful and false otherwise",
which is not correct if "wait" is false, as it will always return TRUE.

[*] https://www.postgresql.org/docs/devel/static/functions-admin.html#FUNCTIONS-RECOVERY-CONTROL

Attached patch contains a suggested rewording to clarify this.


Regards

Ian Barwick

-- 
   Ian Barwick                   http://www.2ndQuadrant.com/
   PostgreSQL Development, 24x7 Support, Training & Services


Вложения

Re: Function to promote standby servers

От
Michael Paquier
Дата:
On Fri, Oct 26, 2018 at 11:36:00AM +0900, Ian Barwick wrote:
> Documentation for this [*] says "Returns true if promotion is
> successful and false otherwise", which is not correct if "wait" is
> false, as it will always return TRUE.

Yes, in the case where the promotion has been initiated.

> +        <literal>false</literal> otherwise. If <parameter>wait</parameter>
> +        is set to <literal>false</literal>, the function returns <literal>true</literal>
> +        immediately after sending the promotion signal to the postmaster.

Or we could use "the function returns true immediately after initiating
the promotion by sending the promotion signal to the postmaster"?  As a
native speaker which one feels more natural to you?
--
Michael

Вложения

Re: Function to promote standby servers

От
Michael Paquier
Дата:
On Fri, Oct 26, 2018 at 01:51:24PM +0900, Michael Paquier wrote:
> Or we could use "the function returns true immediately after initiating
> the promotion by sending the promotion signal to the postmaster"?  As a
> native speaker which one feels more natural to you?

Sorry for the time it took.  After pondering on it, I have committed the
improvements from your version.
--
Michael

Вложения

Re: Function to promote standby servers

От
Ian Barwick
Дата:
On 11/19/2018 01:22 PM, Michael Paquier wrote:
> On Fri, Oct 26, 2018 at 01:51:24PM +0900, Michael Paquier wrote:
>> Or we could use "the function returns true immediately after initiating
>> the promotion by sending the promotion signal to the postmaster"?  As a
>> native speaker which one feels more natural to you?
> 
> Sorry for the time it took.  After pondering on it, I have committed the
> improvements from your version.

Thanks, looks good (and apologies for the delay in responding from my
side).


Regards


Ian Barwick


-- 
  Ian Barwick                   http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services


Re: Function to promote standby servers

От
Michael Paquier
Дата:
On Wed, Nov 28, 2018 at 10:06:34AM +0900, Ian Barwick wrote:
> Thanks, looks good (and apologies for the delay in responding from my
> side).

Thanks for double-checking, Ian.  I took my time as well ;)

(Hope to see you face-to-face in a couple of days around Akihabara.)
--
Michael

Вложения