On Thu, Sep 14, 2023 at 10:37 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Thu, Sep 14, 2023 at 10:00 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Thu, Sep 14, 2023 at 9:21 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> > > > -----------
> > > >
> > > > 3) Introduce a new pg_upgrade option (e.g. skip_slot_check), and suggest that
> > > > users who have already run the upgrade check against the stopped server use
> > > > this option when they perform the upgrade later.
> > > >
> > > > Pros: Saves the user the effort of advancing each slot's LSN.
> > > >
> > > > Cons: I didn't see any similar option in pg_upgrade, so this might need some agreement.
> > >
> > > Yeah right, in fact during the --check command we can give that
> > > suggestion as well.
> > >
> >
> > Hmm, we can't mandate users to skip checking slots because that is the
> > whole point of --check slots.
>
> I don't mean mandating any skipping in the --check command. But once
> the check command has already verified the slots, we can suggest to
> the user that, since the slots are already checked, slot checking can
> be skipped during the actual upgrade. So a user who has already run
> the check command and is now following up with an upgrade can skip
> slot checking, if we provide such an option.
>

Oh, okay. We can document this and ask the user to follow the steps you
suggest, but I guess it will be more work for the user and is also less
intuitive.

> > > Option 2 looks best to me unless there is some design issue with
> > > it; as of now I do not see any. Let's see what others think.
> > >
> >
> > By the way, did you consider the previous approach this patch was
> > using? Basically, instead of getting the last checkpoint location from
> > the control file, we read the WAL starting from the confirmed_flush
> > location of a slot, and if we find any WAL records other than the
> > expected ones (shutdown checkpoint, running_xacts, etc.), we error
> > out.
>
> So basically, while scanning from confirmed_flush, we must ensure that
> the first record we find is a SHUTDOWN CHECKPOINT record at that same
> LSN, and that after it we see no WAL other than the expected records
> you mentioned (shutdown checkpoint, running_xacts). That way we ensure
> both aspects: that the confirmed_flush LSN is at the shutdown
> checkpoint, and that there is no real activity in the system after it.
>
Right.
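
To make sure we mean the same thing, here is a rough, self-contained
sketch of that check in Python. It is only an illustration: WalRecord,
slot_is_caught_up and the record-kind strings are made-up names, not the
patch's code or any PostgreSQL API, and the allowed set lists only the two
record types named above (the real list would likely be longer).

```python
from dataclasses import dataclass
from typing import Iterable

# Hypothetical record kinds; real code would inspect the WAL record's
# resource manager and info bits instead of comparing strings.
SHUTDOWN_CHECKPOINT = "CHECKPOINT_SHUTDOWN"
RUNNING_XACTS = "RUNNING_XACTS"

# Assumption: only these kinds may follow the shutdown checkpoint.
ALLOWED_AFTER_SHUTDOWN = {SHUTDOWN_CHECKPOINT, RUNNING_XACTS}


@dataclass(frozen=True)
class WalRecord:
    lsn: int   # start position of the record (simplified to an integer)
    kind: str  # simplified record type


def slot_is_caught_up(confirmed_flush: int, wal: Iterable[WalRecord]) -> bool:
    """Return True if the slot has consumed everything that matters.

    Conditions checked:
      1. The first record at or after confirmed_flush starts exactly at
         confirmed_flush and is a shutdown checkpoint.
      2. Every later record is one of the expected, harmless kinds.
    """
    records = sorted(
        (r for r in wal if r.lsn >= confirmed_flush),
        key=lambda r: r.lsn,
    )
    if not records:
        return False  # nothing to read at confirmed_flush

    first = records[0]
    if first.lsn != confirmed_flush or first.kind != SHUTDOWN_CHECKPOINT:
        return False

    return all(r.kind in ALLOWED_AFTER_SHUTDOWN for r in records[1:])
```

With this, a slot whose confirmed_flush points at the shutdown checkpoint
passes, while one that still has, say, an INSERT record after that point
(or whose confirmed_flush is before the checkpoint) would make pg_upgrade
error out.
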
> To me, this seems like the best available option so far.
>
Yeah, let's see if someone else has a different opinion or has a better idea.
--
With Regards,
Amit Kapila.