Re: [PoC] pg_upgrade: allow to upgrade publisher node

Поиск
Список
Период
Сортировка
От Masahiko Sawada
Тема Re: [PoC] pg_upgrade: allow to upgrade publisher node
Дата
Msg-id CAD21AoDkyyC=wa2=1Ruo_L8g16xf_W5Xyhp-=3j9urT916b9gA@mail.gmail.com
обсуждение исходный текст
Ответ на Re: [PoC] pg_upgrade: allow to upgrade publisher node  (Amit Kapila <amit.kapila16@gmail.com>)
Ответы RE: [PoC] pg_upgrade: allow to upgrade publisher node  ("Hayato Kuroda (Fujitsu)" <kuroda.hayato@fujitsu.com>)
Список pgsql-hackers
On Tue, Nov 28, 2023 at 6:50 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Tue, Nov 28, 2023 at 1:32 PM Bharath Rupireddy
> <bharath.rupireddyforpostgres@gmail.com> wrote:
> >
> > On Tue, Nov 28, 2023 at 11:06 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > >
> > > One month has already been passed since this main patch got committed
> > > but reading this change, I have some questions on new
> > > binary_upgrade_logical_slot_has_caught_up() function:
> > >
> > > Is there any reason why this function can be executed only in binary
> > > upgrade mode? It seems to me that other functions in
> > > pg_upgrade_support.c must be called only in binary upgrade mode
> > > because it does some hacky changes internally. On the other hand,
> > > binary_upgrade_logical_slot_has_caught_up() just calls
> > > LogicalReplicationSlotHasPendingWal(), which doesn't change anything
> > > internally. If we make this function usable in normal mode, the user
> > > would be able to  check each slot's upgradability without pg_upgrade
> > > --check command (or without stopping the server if the user can ensure
> > > no more meaningful WAL records are generated).
> >
> > It may happen that such a user-facing function tells there's no
> > unconsumed WAL, but later on the WAL gets generated during pg_upgrade.
> > Therefore, the information the function gives turns out to be
> > incorrect. I don't see a real-world use-case for such a function right
> > now. If there's one, it's not a big change to turn it into a
> > user-facing function.
> >
>
> Yeah, as of now, I don't see a use case for it and in fact, it could
> lead to unpredictable results. Immediately after calling the function,
> there could be more activity on the server which could make the
> results incorrect. I think to check the slot's upgradeability, one can
> rely on the results of the pg_upgrade --check functionality.

Fair point.

This function is already a user-executable function as it's in
pg_catalog but is restricted to be executed only in binary upgrade
even though it doesn't change anything internally. So it wasn't clear
to me why we put such a restriction.

>
> > > ---
> > > Also, the function checks if the user has the REPLICATION privilege
> > > but I think that only superuser can connect to the server in binary
> > > upgrade mode in the first place.
> >
> > If that were true, I don't see a problem in having
> > CheckSlotPermissions() there, in fact it can act as an assertion.
> >
>
> I think we can change it to assertion or may elog(ERROR, ...) with a
> comment as to why we don't expect this can happen.

+1 for an assertion, to match other checks in the function.

>
> > > ---
> > > The following error message doesn't match the function name:
> > >
> > >     /* We must check before dereferencing the argument */
> > >     if (PG_ARGISNULL(0))
> > >         elog(ERROR, "null argument to
> > > binary_upgrade_validate_wal_records is not allowed");
> > >
>
> This should be fixed.
>
> > > ---
> > > { oid => '8046', descr => 'for use by pg_upgrade',
> > >   proname => 'binary_upgrade_logical_slot_has_caught_up', proisstrict => 'f',
> > >   provolatile => 'v', proparallel => 'u', prorettype => 'bool',
> > >   proargtypes => 'name',
> > >   prosrc => 'binary_upgrade_logical_slot_has_caught_up' },
> > >
> > > The function is not a strict function but we check in the function if
> > > the passed argument is not null. I think it would be clearer to make
> > > it a strict function.
> >
> > I think it has been done that way similar to
> > binary_upgrade_create_empty_extension().

binary_upgrade_create_empty_extension() needs to be a non-strict
function since it needs to accept NULL in some arguments such as
extConfig. On the other hand,
binary_upgrade_logical_slot_has_caught_up() doesn't handle NULL and
it's conventional to make such a function a strict function.

> >
> > > ---
> > > LogicalReplicationSlotHasPendingWal() is defined in logical.c but I
> > > guess it's more suitable to be in slotfunc.s where similar functions
> > > such as pg_logical_replication_slot_advance() is also defined.
> >
> > Why not in logicalfuncs.c?
> >
>
> I am not sure if either of those is better than logical.c. IIRC, I
> thought it was okay to keep in logical.c as others primarily deal with
> exposed SQL functions and I felt it somewhat matches with the intent
> of logical.c ("The goal is to encapsulate most of the internal
> complexity for consumers of logical decoding, so they can create and
> consume a changestream with a low amount of code..").

I see your point. To me it looks that the functions in logical.c are
APIs and internal functions to manage logical decoding context and
replication slot (e.g., restart_lsn). On the other hand,
LogicalReplicationSlotHasPendingWal() seems to be a user of the
logical decoding. But anyway, it seems that three hackers have
different opinions. So we can keep it unless someone has a good reason
to change it.

On Tue, Nov 28, 2023 at 7:04 PM Hayato Kuroda (Fujitsu)
<kuroda.hayato@fujitsu.com> wrote:
>
>
> Yeah, we followed binary_upgrade_create_empty_extension(). Also, we set as
> un-strict to keep a caller function simpler.
>
> Currently get_old_cluster_logical_slot_infos() executes a query and it contains
> binary_upgrade_logical_slot_has_caught_up(). In pg_upgrade layer, we assumed
> either true or false is returned.
>
> But if proisstrict is changed true, we must handle the case when NULL is returned.
> It is small but backseat operation.

Which cases are you concerned pg_upgrade could pass NULL to
binary_upgrade_logical_slot_has_caught_up()?

I've not tested it yet but even if it returns NULL, perhaps
get_old_cluster_logical_slot_infos() would still set curr->caught_up
to false, no?

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Tomas Vondra
Дата:
Сообщение: Re: Optimizing nbtree ScalarArrayOp execution, allowing multi-column ordered scans, skip scan
Следующее
От: Heikki Linnakangas
Дата:
Сообщение: Re: Streaming I/O, vectored I/O (WIP)