Re: [PoC] pg_upgrade: allow to upgrade publisher node

From Amit Kapila
Subject Re: [PoC] pg_upgrade: allow to upgrade publisher node
Date
Msg-id CAA4eK1+cAfBzfTr8cACeZL68CA8BizkKxs06gaw5Fx=MhNXWtw@mail.gmail.com
In response to Re: [PoC] pg_upgrade: allow to upgrade publisher node  (Masahiko Sawada <sawada.mshk@gmail.com>)
Responses Re: [PoC] pg_upgrade: allow to upgrade publisher node  (Masahiko Sawada <sawada.mshk@gmail.com>)
List pgsql-hackers
On Thu, Aug 10, 2023 at 6:46 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> On Wed, Aug 9, 2023 at 1:15 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Wed, Aug 9, 2023 at 8:01 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >
> > I feel it would be a good idea to provide such a tool for users to
> > avoid getting errors during upgrade but I think the upgrade code still
> > needs to ensure that there are no WAL records between
> > confirmed_flush_lsn and SHUTDOWN_CHECKPOINT other than the required
> > ones. Or, do you want to say that we don't do any verification check
> > during the upgrade and let data loss happen if the user didn't ensure
> > that by running such a tool?
>
> I meant that if we can check the slot state file while the old cluster
> is stopped, we can ensure there are no WAL records between the slot's
> confirmed_flush_lsn (in the state file) and the latest checkpoint (in
> the control file).
>

Are you suggesting doing this before we start the old cluster or after
we stop the old cluster? I was thinking about the pros and cons of
doing this check when the server is 'on' (along with the other upgrade
checks, as the patch does now) versus when the server is 'off'. I think
the advantage of doing it when the server is 'off' (after
check_and_dump_old_cluster()) is that it ensures no extra WAL can be
generated during the upgrade without being verified against the
confirmed_flush_lsn location. But OTOH, to retrieve slot information
when the server is 'off', we need a separate utility, or probably
equivalent functionality in pg_upgrade, plus some WAL-reading code,
which sounds to me like a larger change that may not be warranted
here. In any case, I think the extra WAL (if any is generated during
the upgrade) won't be required after the upgrade, so I am not
convinced we should make such a check while the server is 'off'. Are
there reasons that make it better to do this while the old cluster is
'off'?

> >
> > We can do that if we think so. We have two ways to make this check
> > optional: (a) have a switch like --include-logical-replication-slots,
> > as the proposed patch has, which means by default we won't try to
> > upgrade slots; (b) have a switch like
> > --exclude-logical-replication-slots, as Jonathan proposed, which means
> > we will exclude slots only if specified by the user. One thing to note
> > is that we don't seem to have any include/exclude switch in the
> > upgrade, which I think indicates users by default prefer to upgrade
> > everything. Even if we decide not to provide any switch initially, and
> > add one only if there is user demand for it, users will still have a
> > way to proceed with an upgrade, namely by dropping such slots. Do you
> > have any preference?
>
> TBH I'm not sure if there is a use case where the user wants to
> exclude replication slots during the upgrade. Including replication
> slots by default seems to be better to me, at least for now. I
> initially thought asking users to drop replication slots that
> possibly have not consumed all WAL records would not be a good idea,
> but since we already do such things in check.c I now think it would
> not be a problem. I guess it would be great if we can check WAL
> records between slots' confirmed_flush_lsn and the latest LSN, and if
> there are no meaningful WAL records there we can upgrade the
> replication slots.
>

Agreed.
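
To make the position comparison discussed above concrete, here is a rough
sketch in Python. It is not pg_upgrade code and not what the patch does: the
function names parse_lsn and slots_behind, the slot names, and the LSN values
are all made up for illustration. It only relies on the textual LSN notation
(two hexadecimal numbers separated by a slash, high 32 bits first) and reports
slots whose confirmed_flush_lsn is behind a given latest LSN. Note that a
nonzero gap by itself would not prove anything is missing, since only
meaningful WAL records in that range matter; deciding that would require
reading the WAL, which this sketch does not attempt.

```python
def parse_lsn(text: str) -> int:
    """Convert an LSN written as 'hi/lo' in hex into a 64-bit integer."""
    hi, lo = text.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def slots_behind(slots: dict[str, str], latest_lsn: str) -> dict[str, int]:
    """Return {slot_name: gap_in_bytes} for every slot whose
    confirmed_flush_lsn is behind latest_lsn.

    `slots` maps a (hypothetical) slot name to its confirmed_flush_lsn.
    An empty result means no slot has a gap to inspect.
    """
    target = parse_lsn(latest_lsn)
    gaps = {}
    for name, confirmed in slots.items():
        gap = target - parse_lsn(confirmed)
        if gap > 0:
            gaps[name] = gap
    return gaps
```

For example, slots_behind({"sub_a": "0/3000060", "sub_b": "0/3000000"},
"0/3000060") returns {'sub_b': 96}: sub_a has caught up, while the 96 bytes of
WAL after sub_b's confirmed_flush_lsn would still need to be examined.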

--
With Regards,
Amit Kapila.


