Re: pg_rewind: warn when checkpoint hasn't happened after promotion

From Kyotaro Horiguchi
Subject Re: pg_rewind: warn when checkpoint hasn't happened after promotion
Date
Msg-id 20220607.165401.749444271469074557.horikyota.ntt@gmail.com
In response to pg_rewind: warn when checkpoint hasn't happened after promotion  (James Coleman <jtc331@gmail.com>)
List pgsql-hackers
At Tue, 7 Jun 2022 16:16:09 +0900, Michael Paquier <michael@paquier.xyz> wrote in 
> On Tue, Jun 07, 2022 at 12:39:38PM +0900, Kyotaro Horiguchi wrote:
> > At Mon, 6 Jun 2022 08:32:01 -0400, James Coleman <jtc331@gmail.com> wrote in 
> >> To confirm I'm following you correctly, you're envisioning a situation like:
> >> 
> >> - Primary A
> >> - Replica B replicating from primary
> >> - Replica C replicating from replica B
> >> 
> >> then on failover from A to B you end up with:
> >> 
> >> - Primary B
> >> - Replica C replication from primary
> >> - [needs rewind] A
> >> 
> >> and you try to rewind A from C as the source?
> > 
> > Yes. I think it is a legit use case.  That being said, like other
> > points, it might be acceptable.
> 
> This configuration is a case supported by pg_rewind, meaning that your
> patch to check after minRecoveryPointTLI would be confusing when using
> a standby as a source because the checkpoint needs to apply on its
> primary to allow the TLI of the standby to go up.  If you want to

Yeah, that's what I meant.

> provide to the user more context, a more meaningful way may be to rely
> on an extra check for ControlFileData.state, I guess, as a promoted 
> cluster is marked as DB_IN_PRODUCTION before recoveryMinPoint is
> cleared by the first post-promotion checkpoint, with
> DB_IN_ARCHIVE_RECOVERY for a cascading standby.

Right. However, IIUC, the checkpoint LSN/TLI has not been updated yet
at that point. The point of the minRecoveryPoint check is to confirm
that we can read the timeline ID of the promoted source cluster from
checkPointCopy.ThisTimeLineID. But we cannot do that yet at the time
the cluster state moves to DB_IN_PRODUCTION.  And a cascading standby
has been in DB_IN_ARCHIVE_RECOVERY since before its upstream was
promoted, so that state doesn't signal the reliability of
checkPointCopy.ThisTimeLineID either.
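
To make the argument above concrete, here is a rough toy model in C. It
is only a sketch under stated assumptions: the struct, the enum, the
function names and the timeline values (1 before promotion, 2 after) are
all hypothetical stand-ins, not the real pg_control definitions or
pg_rewind code. It just tabulates which check would pass in each
situation discussed in this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Toy model of the control-file fields discussed above.  These are NOT
 * the real PostgreSQL definitions; every name here is hypothetical.
 */
typedef enum
{
    MODEL_IN_ARCHIVE_RECOVERY,  /* stands in for DB_IN_ARCHIVE_RECOVERY */
    MODEL_IN_PRODUCTION         /* stands in for DB_IN_PRODUCTION */
} ModelState;

typedef struct
{
    const char *label;
    ModelState  state;            /* models ControlFileData.state */
    uint32_t    checkpoint_tli;   /* models checkPointCopy.ThisTimeLineID */
    bool        min_recovery_set; /* models minRecoveryPoint still being set */
    uint32_t    actual_tli;       /* timeline the cluster is really on */
} ModelControlFile;

/* Illustrative values only: timeline 1 before promotion, 2 after. */
const ModelControlFile model_scenarios[] = {
    {"promoted, before first checkpoint",
     MODEL_IN_PRODUCTION, 1, true, 2},
    {"promoted, after first checkpoint",
     MODEL_IN_PRODUCTION, 2, false, 2},
    {"cascading standby, before restartpoint on new timeline",
     MODEL_IN_ARCHIVE_RECOVERY, 1, true, 2},
    {"cascading standby, after restartpoint on new timeline",
     MODEL_IN_ARCHIVE_RECOVERY, 2, true, 2},
};

const size_t model_scenario_count =
    sizeof(model_scenarios) / sizeof(model_scenarios[0]);

/* True when the checkpoint copy already reflects the real timeline. */
bool
checkpoint_tli_is_reliable(const ModelControlFile *cf)
{
    assert(cf != NULL);
    return cf->checkpoint_tli == cf->actual_tli;
}

/* Hypothetical state-based check: "is the source in production?" */
bool
state_check_passes(const ModelControlFile *cf)
{
    assert(cf != NULL);
    return cf->state == MODEL_IN_PRODUCTION;
}

/* Hypothetical minRecoveryPoint-based check: "has it been cleared?" */
bool
min_recovery_check_passes(const ModelControlFile *cf)
{
    assert(cf != NULL);
    return !cf->min_recovery_set;
}
```

In this model the state-based check gives the same answer for both
promoted rows, although only the second one has a reliable checkpoint
TLI, and the same answer for both standby rows, although only the last
one is reliable. The minRecoveryPoint-based check separates the two
promoted rows, but it rejects both standby rows, which is the cascading
case raised upthread.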

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center


