Re: pg_rewind: warn when checkpoint hasn't happened after promotion

From James Coleman
Subject Re: pg_rewind: warn when checkpoint hasn't happened after promotion
Date
Msg-id CAAaqYe8gaooWYRS=gN-BQuJMf7UtgGUqXS+5QxjD6h7YzAM3xg@mail.gmail.com
In response to Re: pg_rewind: warn when checkpoint hasn't happened after promotion  (Kyotaro Horiguchi <horikyota.ntt@gmail.com>)
Responses Re: pg_rewind: warn when checkpoint hasn't happened after promotion  (Kyotaro Horiguchi <horikyota.ntt@gmail.com>)
List pgsql-hackers
On Mon, Jun 6, 2022 at 1:26 AM Kyotaro Horiguchi
<horikyota.ntt@gmail.com> wrote:
>
> At Sat, 4 Jun 2022 19:09:41 +0530, Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote in
> > On Sat, Jun 4, 2022 at 6:29 PM James Coleman <jtc331@gmail.com> wrote:
> > >
> > > A few weeks back I sent a bug report [1] directly to the -bugs mailing
> > > list, and I haven't seen any activity on it (maybe this is because I
> > > emailed directly instead of using the form?), but I got some time to
> > > take a look and concluded that a first-level fix is pretty simple.
> > >
> > > A quick background refresher: after promoting a standby, rewinding the
> > > former primary requires that a checkpoint have been completed on the
> > > new primary after promotion. This is correctly documented. However,
> > > pg_rewind incorrectly reports to the user that a rewind isn't
> > > necessary because the source and target are on the same timeline.
> ...
> > > Attached is a patch that detects this condition and reports it as an
> > > error to the user.
>
> I have some random thoughts on this.
>
> There could be a problem in the case of a gracefully shut down
> old primary, so I think it is worth doing something if it can be done
> in a simple way.
>
> However, I don't think we can simply rely on minRecoveryPoint to
> detect that situation, since it won't be reset on a standby. A standby
> also still can be the upstream of a cascading standby.  So, as
> discussed in the thread for the comment [2], what we can do here would be
> simply waiting for the timelineID to advance, maybe having a timeout.
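
To check my understanding of the "wait for the timelineID to advance,
maybe having a timeout" idea, here is a rough sketch in Python (not
pg_rewind's C code, and not part of the attached patch). The names
wait_for_timeline_advance and read_source_tli, and the default timeout
and polling interval, are all made up for illustration; how the source's
timeline ID would actually be read is left abstract.

```python
import time
from typing import Callable


def wait_for_timeline_advance(
    read_source_tli: Callable[[], int],  # hypothetical: returns the source's current timeline ID
    target_tli: int,
    timeout_s: float = 60.0,             # assumed default, not from the thread
    poll_interval_s: float = 1.0,        # assumed default, not from the thread
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll until the source reports a timeline ID greater than target_tli.

    Returns True if the timeline advanced, False if the timeout expired first.
    """
    deadline = clock() + timeout_s
    while True:
        if read_source_tli() > target_tli:
            return True
        if clock() >= deadline:
            return False
        sleep(poll_interval_s)
```

The clock and sleep parameters are only there so the loop can be
exercised without real waiting.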

To confirm I'm following you correctly, you're envisioning a situation like:

- Primary A
- Replica B replicating from primary
- Replica C replicating from replica B

then on failover from A to B you end up with:

- Primary B
- Replica C replicating from primary
- [needs rewind] A

and you try to rewind A from C as the source?

> In a case of single-step replication set, a checkpoint request to the
> primary makes the end-of-recovery checkpoint fast.  It won't work as
> expected in cascading replicas, but it might be acceptable.

"Won't work as expected" because there's no way to guarantee
replication is caught up or even advancing?

> > > In the spirit of the new-ish "ensure shutdown" functionality I could
> > > imagine extending this to automatically issue a checkpoint when this
> > > situation is detected. I haven't started to code that up, however,
> > > wanting to first get buy-in on that.
> > >
> > > 1: https://www.postgresql.org/message-id/CAAaqYe8b2DBbooTprY4v=BiZEd9qBqVLq+FD9j617eQFjk1KvQ@mail.gmail.com
> >
> > Thanks. I had a quick look over the issue and patch - just a thought -
> > can't we let pg_rewind issue a checkpoint on the new primary instead
> > of erroring out, maybe optionally? It might sound like too much, but it
> > helps pg_rewind to be self-reliant, i.e. it avoids needing an external
> > actor to detect the error and issue a checkpoint on the new primary
> > before pg_rewind can successfully run on the old primary and repair it
> > for use as a new standby.
>
> At the time of the discussion [2], the hindrance was that this
> required superuser privileges.  Now that has been narrowed down
> to the pg_checkpoint privileges.
>
> If we know that the timeline IDs are different, we don't need to wait
> for a checkpoint.

Correct.

> It seems to me that the exit status is significant. pg_rewind exits
> with 1 when an invalid option is given. I don't think it is great if
> we report this state by the same code.

I'm happy to change that; I only chose "1" as a placeholder for
"non-zero exit status".

> I don't think we always want to request a non-spreading checkpoint.

I'm not familiar with the terminology "non-spreading checkpoint".

> [2] https://www.postgresql.org/message-id/flat/CABUevEz5bpvbwVsYCaSMV80CBZ5-82nkMzbb%2BBu%3Dh1m%3DrLdn%3Dg%40mail.gmail.com

I read through that thread, and one interesting idea stuck out to me:
making "timeline IDs are the same" an error exit status. On the one
hand that makes a certain amount of sense because it's unexpected. But
on the other hand there are entirely legitimate situations where upon
failover the timeline IDs happen to match (e.g., for us it happens
some percentage of the time naturally, as we are using sync replication
and failovers often involve STONITHing the original primary, so it's
entirely possible that the promoted replica begins with exactly the
same WAL ending LSN from the primary before it stopped).
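
To make the cases above concrete, here is a simplified sketch in Python
(not pg_rewind's actual C code) of the three outcomes being discussed.
NodeState, Outcome, decide, and the promoted_without_checkpoint flag are
all hypothetical names; how to set that flag reliably is exactly the
open question in this thread.

```python
from dataclasses import dataclass
from enum import Enum


class Outcome(Enum):
    PROCEED = "timelines differ; continue with the normal divergence search"
    NO_REWIND_NEEDED = "same timeline; nothing to do"
    CHECKPOINT_NEEDED = "source promoted, but no checkpoint has completed yet"


@dataclass
class NodeState:
    # Timeline ID as recorded in the node's control file.
    control_file_tli: int
    # Hypothetical flag: the node was promoted but its control file is stale
    # because no checkpoint has completed since promotion.
    promoted_without_checkpoint: bool = False


def decide(source: NodeState, target: NodeState) -> Outcome:
    # Different timeline IDs: no need to wait for a checkpoint.
    if source.control_file_tli != target.control_file_tli:
        return Outcome.PROCEED
    # Same timeline ID only because the source has not checkpointed yet.
    if source.promoted_without_checkpoint:
        return Outcome.CHECKPOINT_NEEDED
    # Same timeline ID for legitimate reasons.
    return Outcome.NO_REWIND_NEEDED
```

Only the middle case is the one I think should be reported as an error
(or trigger a checkpoint); the last case should keep exiting cleanly.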

Thanks,
James Coleman


