Re: Patch: add recovery_timeout option to control timeout of restore_command nonzero status code

Поиск
Список
Период
Сортировка
От Michael Paquier
Тема Re: Patch: add recovery_timeout option to control timeout of restore_command nonzero status code
Дата
Msg-id CAB7nPqQX+mvOQL9_eBS2Bi4wN8zxiO_8UQoGgNFviRTdG3KBjA@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Patch: add recovery_timeout option to control timeout of restore_command nonzero status code  (Andres Freund <andres@2ndquadrant.com>)
Ответы Re: Patch: add recovery_timeout option to control timeout of restore_command nonzero status code  (Michael Paquier <michael.paquier@gmail.com>)
Re: Patch: add recovery_timeout option to control timeout of restore_command nonzero status code  (Michael Paquier <michael.paquier@gmail.com>)
Список pgsql-hackers
On Sat, Jan 17, 2015 at 2:44 AM, Andres Freund <andres@2ndquadrant.com> wrote:
> On 2015-01-05 17:16:43 +0900, Michael Paquier wrote:
>> +     <varlistentry id="restore-command-retry-interval" xreflabel="restore_command_retry_interval">
>> +      <term><varname>restore_command_retry_interval</varname> (<type>integer</type>)
>> +      <indexterm>
>> +        <primary><varname>restore_command_retry_interval</> recovery parameter</primary>
>> +      </indexterm>
>> +      </term>
>> +      <listitem>
>> +       <para>
>> +        If <varname>restore_command</> returns nonzero exit status code, retry
>> +        command after the interval of time specified by this parameter,
>> +        measured in milliseconds if no unit is specified. Default value is
>> +        <literal>5s</>.
>> +       </para>
>> +       <para>
>> +        This command is particularly useful for warm standby configurations
>> +        to leverage the amount of retries done to restore a WAL segment file
>> +        depending on the activity of a master node.
>> +       </para>
>
> Don't really like this paragraph. What's "leveraging the amount of
> retries done" supposed to mean?
Actually I have reworked the whole paragraph with the renaming of the
parameter. Hopefully that's more clear.

>> +# restore_command_retry_interval
>> +#
>> +# specifies an optional retry interval for restore_command command, if
>> +# previous command returned nonzero exit status code. This can be useful
>> +# to increase or decrease the number of calls of restore_command.
> It's not really the number  but frequency, right?
Sure.

>> +             else if (strcmp(item->name, "restore_command_retry_interval") == 0)
>> +             {
>> +                     const char *hintmsg;
>> +
>> +                     if (!parse_int(item->value, &restore_command_retry_interval, GUC_UNIT_MS,
>> +                                                &hintmsg))
>> +                             ereport(ERROR,
>> +                                             (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
>> +                                              errmsg("parameter \"%s\" requires a temporal value",
>> +                                                             "restore_command_retry_interval"),
>> +                                              hintmsg ? errhint("%s", _(hintmsg)) : 0));
>
> "temporal value" sounds odd to my ears. I'd rather keep in line with
> what guc.c outputs in such a case.
Yeah, I have been pondering about that a bit and I think that
wal_availability_check_interval is the closest thing as we want to
check if WAL is available for a walreceiver at record level and at
segment level for a restore_command.

> Am I missing something or does this handle/affect streaming failures
> just the same? That might not be a bad idea, because the current
> interval is far too long for e.g. a sync replication setup. But it
> certainly makes the name a complete misnomer.
Hm.

> Not this patch's fault, but I'm getting a bit tired seeing the above open
> coded. How about adding a function that does the sleeping based on a
> timestamptz and a ms interval?
You mean in plugins, right? I don't recall seeing similar patterns in
other code paths of backend. But I think that we can use something
like that in timestamp.c then because we need to leverage that between
two timestamps, the last failure and now():
TimestampSleepDifference(start_time, stop_time, internal_ms);
Perhaps you have something else in mind?

Attached is an updated patch.
--
Michael

Вложения

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Pavel Stehule
Дата:
Сообщение: Re: proposal: disallow operator "=>" and use it for named parameters
Следующее
От: Michael Paquier
Дата:
Сообщение: Re: Patch: add recovery_timeout option to control timeout of restore_command nonzero status code