Re: Patch: add recovery_timeout option to control timeout of restore_command nonzero status code

Поиск
Список
Период
Сортировка
От Andres Freund
Тема Re: Patch: add recovery_timeout option to control timeout of restore_command nonzero status code
Дата
Msg-id 20150119165655.GG23811@alap3.anarazel.de
обсуждение исходный текст
Ответ на Re: Patch: add recovery_timeout option to control timeout of restore_command nonzero status code  (Michael Paquier <michael.paquier@gmail.com>)
Ответы Re: Patch: add recovery_timeout option to control timeout of restore_command nonzero status code  (Michael Paquier <michael.paquier@gmail.com>)
Список pgsql-hackers
On 2015-01-19 17:16:11 +0900, Michael Paquier wrote:
> On Mon, Jan 19, 2015 at 4:10 PM, Michael Paquier
> <michael.paquier@gmail.com> wrote:
> > On Sat, Jan 17, 2015 at 2:44 AM, Andres Freund <andres@2ndquadrant.com> wrote:
> >> Not this patch's fault, but I'm getting a bit tired seeing the above open
> >> coded. How about adding a function that does the sleeping based on a
> >> timestamptz and a ms interval?
> > You mean in plugins, right? I don't recall seeing similar patterns in
> > other code paths of backend. But I think that we can use something
> > like that in timestamp.c then because we need to leverage that between
> > two timestamps, the last failure and now():
> > TimestampSleepDifference(start_time, stop_time, internal_ms);
> > Perhaps you have something else in mind?
> >
> > Attached is an updated patch.

> Actually I came with better than last patch by using a boolean flag as
> return value of TimestampSleepDifference and use
> TimestampDifferenceExceeds directly inside it.

> Subject: [PATCH] Add wal_availability_check_interval to control WAL fetching
>  on failure

I think that name isn't a very good. And its isn't very accurate
either.

How about wal_retrieve_retry_interval? Not very nice, but I think it's
still better than the above.


> +     <varlistentry id="wal-availability-check-interval" xreflabel="wal_availability_check_interval">
> +      <term><varname>wal_availability_check_interval</varname> (<type>integer</type>)
> +      <indexterm>
> +        <primary><varname>wal_availability_check_interval</> recovery parameter</primary>
> +      </indexterm>
> +      </term>
> +      <listitem>
> +       <para>
> +        This parameter specifies the amount of time to wait when
> +        WAL is not available for a node in recovery. Default value is
> +        <literal>5s</>.
> +       </para>
> +       <para>
> +        A node in recovery will wait for this amount of time if
> +        <varname>restore_command</> returns nonzero exit status code when
> +        fetching new WAL segment files from archive or when a WAL receiver
> +        is not able to fetch a WAL record when using streaming replication.
> +       </para>
> +      </listitem>
> +     </varlistentry>
> +
>      </variablelist>

Walreceiver doesn't wait that amount, but rather how long the connection
is intact. And restore_command may or may not retry.

>      /*-------
>       * Standby mode is implemented by a state machine:
> @@ -10490,15 +10511,13 @@ WaitForWALToBecomeAvailable(XLogRecPtr RecPtr, bool randAccess,
>                       * machine, so we've exhausted all the options for
>                       * obtaining the requested WAL. We're going to loop back
>                       * and retry from the archive, but if it hasn't been long
> -                     * since last attempt, sleep 5 seconds to avoid
> -                     * busy-waiting.
> +                     * since last attempt, sleep the amount of time specified
> +                     * by wal_availability_check_interval to avoid busy-waiting.
>                       */
> -                    now = (pg_time_t) time(NULL);
> -                    if ((now - last_fail_time) < 5)
> -                    {
> -                        pg_usleep(1000000L * (5 - (now - last_fail_time)));
> -                        now = (pg_time_t) time(NULL);
> -                    }
> +                    now = GetCurrentTimestamp();
> +                    if (TimestampSleepDifference(last_fail_time, now,
> +                                            wal_availability_check_interval))
> +                        now = GetCurrentTimestamp();

Not bad, that's much easier to read imo.

Greetings,

Andres Freund



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Stephen Frost
Дата:
Сообщение: Re: WITH CHECK and Column-Level Privileges
Следующее
От: Tom Lane
Дата:
Сообщение: Re: Re: Better way of dealing with pgstat wait timeout during buildfarm runs?