Re: Allow users to choose what happens when recovery target is not reached

From Bharath Rupireddy
Subject Re: Allow users to choose what happens when recovery target is not reached
Msg-id CALj2ACW_=Q48pD7SoQr--o-5Md82ZjWUCD-7NBpJ52pEGGXYaw@mail.gmail.com
In reply to Re: Allow users to choose what happens when recovery target is not reached  (Julien Rouhaud <rjuju123@gmail.com>)
Responses Re: Allow users to choose what happens when recovery target is not reached  ("Euler Taveira" <euler@eulerto.com>)
Re: Allow users to choose what happens when recovery target is not reached  (Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>)
List pgsql-hackers
On Sat, Nov 13, 2021 at 9:45 AM Julien Rouhaud <rjuju123@gmail.com> wrote:
>
> On Sat, Nov 13, 2021 at 11:00 AM Bharath Rupireddy
> <bharath.rupireddyforpostgres@gmail.com> wrote:
> >
> > Users will always be optimistic and set a recovery target and try to
> > reach it, but somehow the few of the WAL files haven't arrived (for
> > whatever the reasons) the PITR target server, imagine if their primary
> > isn't available too, then with the proposal I made, they can choose to
> > have at least an available target server rather than a FATALly failed
> > one.
>
> If your primary server isn't available, why would you want a recovery
> target in the first place?  I just don't understand in which case
> someone would want to setup a recovery target and wouldn't care if the
> recovery wasn't reached, especially if it can be off by GB / days of
> data.
>
> It seems like it could have the opposite effect of what you want most
> of the time.  What if for some reason the restore_command is flawed,
> and you end up starting your server because it couldn't restore WAL
> that are actually available?  You would have to restart from scratch
> and waste more time than if you didn't use this.

Firstly, the proposed patch adds no new behaviour as such; it just
restores the ability that exists today on v12 and below (prior to
commit dc78866, which went into v13 and later).

I think performing PITR is the user's wish - whether the primary is
available or not, it is completely the user's choice. The user might
start the PITR while the primary is available, expecting it to send
all the WAL files required to reach the recovery target. But imagine
a disaster happens and the primary server crashes: say the recovery
has replayed a huge amount of WAL (maybe a TB), and the primary
failed without sending the last one or few WAL files. Should the PITR
target server fail in this case, after replaying all that WAL? The
user might want the target server to be available instead of FATALly
shutting down. This is the exact problem the proposed patch is trying
to solve.

With the proposed GUC, the user can choose what to do in these
scenarios. The user will be fully aware of what she needs when she
chooses to set the new GUC recovery_end_before_target_action to
'promote' instead of the default 'shutdown'.
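
The choice described above can be sketched roughly as follows. This is
only an illustration of the intended semantics, not code from the
patch: the enum, the function, and the returned strings are
hypothetical names; only the GUC name
recovery_end_before_target_action and its two values, 'shutdown' and
'promote', come from the proposal.

```python
from enum import Enum


class EndBeforeTargetAction(Enum):
    """Values of the proposed GUC recovery_end_before_target_action."""
    SHUTDOWN = "shutdown"  # proposed default: fail, as v13 and later do today
    PROMOTE = "promote"    # opt-in: come up anyway, as v12 and below do


def end_of_recovery_outcome(target_reached: bool,
                            action: EndBeforeTargetAction) -> str:
    """Hypothetical helper: what happens once WAL replay runs out of WAL."""
    if target_reached:
        # Not affected by the proposal; the existing settings apply.
        return "target reached"
    if action is EndBeforeTargetAction.PROMOTE:
        # Server becomes available; missing WAL is the user's responsibility.
        return "promote"
    # Server stops with a FATAL error instead of becoming available.
    return "fatal"
```

With the default, a recovery that stops short of its target still
fails; only a user who explicitly sets 'promote' gets an available
server in that case.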

> It look like what you actually want is some kind of a target window,
> but the window you currently propose is a hardcoded (consistency,
> given target], and it seems too dangerous to be useful.

As I said earlier, the behaviour is not too dangerous, as it is not
something new that the patch is proposing; it exists today in v12 and
below. In fact, it gives a way out of a "dangerous situation" if the
user ever gets stuck in one, without wasting recovery cycles and
compute resources, by quickly making the database available (of
course, the responsibility lies with the user to deal with the missing
WAL files).

Regards,
Bharath Rupireddy.


