Re: Impact of checkpointer during pg_upgrade

From: Amit Kapila
Subject: Re: Impact of checkpointer during pg_upgrade
Message-ID: CAA4eK1LLik2818uzYqS73O+He5LK_+=kthyZ6hwT6oe9TuxycA@mail.gmail.com
In reply to: Re: Impact of checkpointer during pg_upgrade (Dilip Kumar <dilipbalaut@gmail.com>)
Responses: Re: Impact of checkpointer during pg_upgrade
List: pgsql-hackers

On Sat, Sep 2, 2023 at 6:12 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Sat, Sep 2, 2023 at 10:09 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> > The other possibilities apart from not allowing an upgrade in such a
> > case could be (a) Before starting the old cluster, we fetch the slots
> > directly from the disk using some tool like [2] and make the decisions
> > based on that state;
>
> Okay, so IIUC along with dumping the slot data we also need to dump
> the latest checkpoint LSN because during upgrade we do check that the
> confirmed flush lsn for all the slots should be the same as the latest
> checkpoint.  Yeah but I think we could work this out.
>

We already have the latest checkpoint LSN from pg_controldata. I think
we can use that, as the patch proposed in thread [1] does now. Do you
have something else in mind?
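
To make the check discussed above concrete, here is a minimal sketch (not
the patch's implementation) that reads the "Latest checkpoint location"
line from pg_controldata output and compares it with each slot's
confirmed flush LSN. The function names, the slot dictionary, and the
sample LSN values are assumptions made up for illustration.

```python
def parse_lsn(text):
    """Convert an LSN written as 'hi/lo' in hex into a 64-bit integer."""
    hi, lo = text.strip().split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def latest_checkpoint_lsn(controldata_output):
    """Extract the latest checkpoint LSN from pg_controldata text output."""
    for line in controldata_output.splitlines():
        label, sep, value = line.partition(":")
        if sep and label.strip() == "Latest checkpoint location":
            return parse_lsn(value)
    raise ValueError("no 'Latest checkpoint location' line found")


def slots_not_caught_up(slots, checkpoint_lsn):
    """Return names of slots whose confirmed flush LSN differs from the
    latest checkpoint LSN. `slots` maps slot name -> LSN string
    (hypothetical input, e.g. collected from pg_replication_slots)."""
    return [name for name, lsn in slots.items()
            if parse_lsn(lsn) != checkpoint_lsn]


# Made-up sample values, for illustration only.
sample = (
    "Latest checkpoint location:           0/15D5D60\n"
    "Latest checkpoint's REDO location:    0/15D5D28\n"
)
checkpoint = latest_checkpoint_lsn(sample)
lagging = slots_not_caught_up({"sub_a": "0/15D5D60", "sub_b": "0/15D5D00"},
                              checkpoint)
```

With these sample values only sub_b would be reported, since its
confirmed flush LSN is behind the latest checkpoint.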

> > (b) During the upgrade, we don't allow WAL to be
> > removed if it can invalidate slots; (c) Copy/Migrate the invalid slots
> > as well but for that, we need to expose an API to invalidate the
> > slots;
>
> > (d) somehow distinguish the slots that are invalidated during
> > an upgrade and then simply copy such slots because anyway we ensure
> > that all the WAL required by slot is sent before shutdown.
>
> Yeah, this could also be an option, although we need to make sure the
> mechanism for distinguishing those slots is clean and fits well with
> the rest of the architecture.
>

If we want to do this, we probably need to maintain a flag in the slot
indicating that it was invalidated during an upgrade, and then use that
flag during the upgrade to check the validity of slots. I think such
a flag needs to be maintained at the same level as
ReplicationSlotInvalidationCause to avoid any inconsistency between
the two.

> Alternatively, can't we just ignore all the invalidated slots and not
> migrate them at all?  Scenarios where some of the segments get dropped
> just during the upgrade, and that too from the old cluster, are very
> rare, so in such cases not migrating the invalidated slots should be
> fine, no?
>

I also think that such a scenario would be very rare, but are you
suggesting we ignore all invalidated slots or just the slots that got
invalidated during an upgrade? BTW, if we simply ignore invalidated
slots, then users won't be able to drop the corresponding subscriptions
after an upgrade. They first need to disassociate the slot by using
ALTER SUBSCRIPTION ... SET (slot_name = NONE) and then drop the
subscription, similar to what we suggest in other cases as described in
the Replication Slot Management section of the docs [2]. Also, if users
really want to continue that subscription by syncing the corresponding
tables, they can recreate the slots manually and then continue
with replication. So, if we want to do this, we will just rely on
the state of the slots at the time we query for them in the old
cluster, and even if they later get invalidated during the upgrade,
we will just ignore such invalidations, as the required WAL is
already copied anyway.
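
As a rough illustration of the cleanup sequence mentioned above, the
sketch below builds the statements a user would run to drop a
subscription whose slot was not migrated. The helper names are
hypothetical. The DISABLE step is not spelled out above; it is included
because PostgreSQL refuses to set slot_name = NONE on an enabled
subscription.

```python
def quote_ident(name):
    """Double-quote an SQL identifier, doubling any embedded quotes."""
    return '"' + name.replace('"', '""') + '"'


def drop_subscription_statements(subscription):
    """Return the statements, in order, for dropping a subscription
    whose remote slot no longer exists."""
    sub = quote_ident(subscription)
    return [
        # A subscription must be disabled before its slot is detached.
        "ALTER SUBSCRIPTION {} DISABLE;".format(sub),
        # Disassociate the slot so DROP does not contact the publisher.
        "ALTER SUBSCRIPTION {} SET (slot_name = NONE);".format(sub),
        "DROP SUBSCRIPTION {};".format(sub),
    ]


# "mysub" is a made-up subscription name.
statements = drop_subscription_statements("mysub")
```

The resulting statements can be fed to psql or any client; nothing here
connects to a server.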


[1] -
https://www.postgresql.org/message-id/TYAPR01MB58664C81887B3AF2EB6B16E3F5939%40TYAPR01MB5866.jpnprd01.prod.outlook.com
[2] - https://www.postgresql.org/docs/devel/logical-replication-subscription.html

--
With Regards,
Amit Kapila.


