Re: Proposal for changes to recovery.conf API

Поиск
Список
Период
Сортировка
От Andres Freund
Тема Re: Proposal for changes to recovery.conf API
Дата
Msg-id 20161112160949.xso4is7r5n2t25vq@alap3.anarazel.de
обсуждение исходный текст
Ответ на Re: Proposal for changes to recovery.conf API  (Abhijit Menon-Sen <ams@2ndQuadrant.com>)
Ответы Re: Proposal for changes to recovery.conf API  (Robert Haas <robertmhaas@gmail.com>)
Список pgsql-hackers
Hi,

On 2016-11-01 05:14:58 +0530, Abhijit Menon-Sen wrote:
> Subject: Convert recovery.conf settings to GUCs

I really want to get this in, we've been back and forth about this for
far too long.

>     <para>
>-     Create a recovery command file <filename>recovery.conf</> in the cluster
>-     data directory (see <xref linkend="recovery-config">). You might
>+     Set up recovery parameters in <filename>postgresql.conf</> (see
>+     <xref linkend="runtime-config-wal-archive-recovery"> and
>+     <xref linkend="runtime-config-wal-recovery-target">).
>+    </para>
>+   </listitem>
>+   <listitem>
>+    <para>
>+     Create a recovery trigger file <filename>recovery.trigger</>
>+     in the cluster data directory. You might
>      also want to temporarily modify <filename>pg_hba.conf</> to prevent
>      ordinary users from connecting until you are sure the recovery was successful.
>     </para>

I think this means we need to rephrase a number of docs and code
references to trigger files, because they'll currently often refer to
the promotion trigger, no?


> diff --git a/src/backend/access/transam/recovery.conf.sample b/src/backend/access/transam/recovery.conf.sample
> index 7a16751..15ce784 100644
> --- a/src/backend/access/transam/recovery.conf.sample
> +++ b/src/backend/access/transam/recovery.conf.sample
> @@ -2,23 +2,14 @@
>  # PostgreSQL recovery config file
>  # -------------------------------
>  #
> -# Edit this file to provide the parameters that PostgreSQL needs to
> -# perform an archive recovery of a database, or to act as a replication
> -# standby.
> +# PostgreSQL 10.0 or later, recovery.conf is no longer used. Instead,
> +# specify all recovery parameters in postgresql.conf and create
> +# recovery.trigger to enter archive recovery or standby mode.
>  #
> -# If "recovery.conf" is present in the PostgreSQL data directory, it is
> -# read on postmaster startup.  After successful recovery, it is renamed
> -# to "recovery.done" to ensure that we do not accidentally re-enter
> -# archive recovery or standby mode.
> +# If you must use recovery.conf, specify "include directives" in
> +# postgresql.conf like this:
>  #
> -# This file consists of lines of the form:
> -#
> -#   name = value
> -#
> -# Comments are introduced with '#'.
> -#
> -# The complete list of option names and allowed values can be found
> -# in the PostgreSQL documentation.
> +#   include 'recovery.conf'

This should probably warn about the differences in "trigger" behaviour
of recovery.conf being present.

>  #---------------------------------------------------------------------------
>  # ARCHIVE RECOVERY PARAMETERS
> @@ -131,14 +122,6 @@
>  #
>  #primary_slot_name = ''

I don't think it makes much sense to still list all this?


> +bool        standby_mode = false;

I'm very tempted to rename this during the move to GUCs.


> +char       *primary_conninfo = NULL;
> +char       *primary_slot_name = NULL;

Slightly less so, but still tempted to also rename these. They're not
actually necessarily pointing towards a primary, and namespace-wise
they're not grouped with recovery_*, which has become more important now
that recovery.conf isn't a separate namespace anymore.

> -static int    recovery_min_apply_delay = 0;
>  static TimestampTz recoveryDelayUntilTime;
>
> +static TimestampTz recoveryDelayUntilTime;
> +

Isn't this a repeated definition?

>  /*
>   * During normal operation, the only timeline we care about is ThisTimeLineID.
>   * During recovery, however, things are more complicated.  To simplify life
> @@ -577,10 +578,10 @@ typedef struct XLogCtlData
>      TimeLineID    PrevTimeLineID;
>
>      /*
> -     * archiveCleanupCommand is read from recovery.conf but needs to be in
> -     * shared memory so that the checkpointer process can access it.
> +     * archive_cleanup_command is read from recovery.conf but needs to
> +     * be in shared memory so that the checkpointer process can access it.
>       */

References recovery.conf. It'd be a good idea to grep for all remaining
references to recovery.conf.


> +static bool
> +check_recovery_target(char **newval, void **extra, GucSource source)
> +{
> +    RecoveryTargetType *myextra;
> +    RecoveryTargetType rt = RECOVERY_TARGET_UNSET;
> +
> +    if (strcmp(*newval, "xid") == 0)
> +        rt = RECOVERY_TARGET_XID;
> +    else if (strcmp(*newval, "time") == 0)
> +        rt = RECOVERY_TARGET_TIME;
> +    else if (strcmp(*newval, "name") == 0)
> +        rt = RECOVERY_TARGET_NAME;
> +    else if (strcmp(*newval, "lsn") == 0)
> +        rt = RECOVERY_TARGET_LSN;
> +    else if (strcmp(*newval, "immediate") == 0)
> +        rt = RECOVERY_TARGET_IMMEDIATE;
> +    else if (strcmp(*newval, "") != 0)
> +    {
> +        GUC_check_errdetail("recovery_target is not valid: \"%s\"", *newval);
> +        return false;
> +    }
> +
> +    myextra = (RecoveryTargetType *) guc_malloc(ERROR, sizeof(RecoveryTargetType));
> +    *myextra = rt;
> +    *extra = (void *) myextra;
> +
> +    return true;
> +}

Shouldn't this instead be an enum type GUC?

> +static bool
> +check_recovery_target_time(char **newval, void **extra, GucSource source)
> +{
> +    TimestampTz     time;
> +    TimestampTz     *myextra;
> +    MemoryContext oldcontext = CurrentMemoryContext;
> +
> +    PG_TRY();
> +    {
> +        time = (strcmp(*newval, "") == 0) ?
> +            0 :
> +            DatumGetTimestampTz(DirectFunctionCall3(timestamptz_in,
> +                                                    CStringGetDatum(*newval),
> +                                                    ObjectIdGetDatum(InvalidOid),
> +                                                    Int32GetDatum(-1)));
> +    }
> +    PG_CATCH();
> +    {
> +        ErrorData  *edata;
> +
> +        /* Save error info */
> +        MemoryContextSwitchTo(oldcontext);
> +        edata = CopyErrorData();
> +        FlushErrorState();
> +
> +        /* Pass the error message */
> +        GUC_check_errdetail("%s", edata->message);
> +        FreeErrorData(edata);
> +        return false;
> +    }
> +    PG_END_TRY();

Hm, I'm not happy to do that kind of thing. What if there's ever any
database interaction added to timestamptz_in?

It's also problematic because the parsing of timestamps depends on the
timezone GUC - which might not yet have taken effect...


> +static bool
> +check_recovery_target_lsn(char **newval, void **extra, GucSource source)
> +{
> +    XLogRecPtr        lsn;
> +    XLogRecPtr       *myextra;
> +    MemoryContext oldcontext = CurrentMemoryContext;
> +
> +    PG_TRY();
> +    {
> +        lsn = (strcmp(*newval, "") == 0) ?
> +            InvalidXLogRecPtr :
> +            DatumGetLSN(DirectFunctionCall3(pg_lsn_in,
> +                                            CStringGetDatum(*newval),
> +                                            ObjectIdGetDatum(InvalidOid),
> +                                            Int32GetDatum(-1)));
> +    }
> +    PG_CATCH();
> +    {
> +        ErrorData  *edata;
> +
> +        /* Save error info */
> +        MemoryContextSwitchTo(oldcontext);
> +        edata = CopyErrorData();
> +        FlushErrorState();
> +
> +        /* Pass the error message */
> +        GUC_check_errdetail("%s", edata->message);
> +        FreeErrorData(edata);
> +        return false;
> +    }
> +    PG_END_TRY();

That seems like a pretty pointless use of pg_lsn_in, given the problems
that PG_CATCHing an error without rethrowing brings.


> +static bool
> +check_recovery_target_action(char **newval, void **extra, GucSource source)
> +{
> +    RecoveryTargetAction rta = RECOVERY_TARGET_ACTION_PAUSE;
> +    RecoveryTargetAction *myextra;
> +
> +    if (strcmp(*newval, "pause") == 0)
> +        rta = RECOVERY_TARGET_ACTION_PAUSE;
> +    else if (strcmp(*newval, "promote") == 0)
> +        rta = RECOVERY_TARGET_ACTION_PROMOTE;
> +    else if (strcmp(*newval, "shutdown") == 0)
> +        rta = RECOVERY_TARGET_ACTION_SHUTDOWN;
> +    else if (strcmp(*newval, "") != 0)
> +    {
> +        GUC_check_errdetail("recovery_target_action is not valid: \"%s\"", *newval);
> +        return false;
> +    }
> +
> +    myextra = (RecoveryTargetAction *) guc_malloc(ERROR, sizeof(RecoveryTargetAction));
> +    *myextra = rta;
> +    *extra = (void *) myextra;
> +
> +    return true;
> +}

Should be an enum.


> +static bool
> +check_primary_slot_name(char **newval, void **extra, GucSource source)
> +{
> +    if (strcmp(*newval,"") != 0 &&
> +        !ReplicationSlotValidateName(*newval, WARNING))
> +    {
> +        GUC_check_errdetail("primary_slot_name is not valid: \"%s\"", *newval);
> +        return false;
> +    }
> +
> +    return true;
> +}

ReplicationSlotValidateName will emit WARNINGs this way - hm. That'll
often loose detail about what the problem actually is, e.g. at server
startup.

> +#standby_mode = off            # "on" starts the server as a standby
> +                    # (change requires restart)
> +#primary_conninfo = ''            # connection string to connect to the master
> +#trigger_file = ''            # trigger file to promote the standby

Too general a name imo.


Greetings,

Andres Freund



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Andres Freund
Дата:
Сообщение: Re: move collation import to backend
Следующее
От: Tom Lane
Дата:
Сообщение: Re: Re: [COMMITTERS] pgsql: Change the way that LWLocks for extensions are allocated.