Re: [PoC] pg_upgrade: allow to upgrade publisher node

Поиск
Список
Период
Сортировка
От Bharath Rupireddy
Тема Re: [PoC] pg_upgrade: allow to upgrade publisher node
Дата
Msg-id CALj2ACXJnAziT7KiNxoW2BS5TaYcg8w55Ukwn13+uNi+QzOg-Q@mail.gmail.com
обсуждение исходный текст
Ответ на Re: [PoC] pg_upgrade: allow to upgrade publisher node  (Amit Kapila <amit.kapila16@gmail.com>)
Ответы Re: [PoC] pg_upgrade: allow to upgrade publisher node  (Amit Kapila <amit.kapila16@gmail.com>)
Re: [PoC] pg_upgrade: allow to upgrade publisher node  (Amit Kapila <amit.kapila16@gmail.com>)
Список pgsql-hackers
On Thu, Sep 21, 2023 at 5:45 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> > Thanks for the patch. I have some comments on v42:
>
> Probably trying to keep it similar with
> binary_upgrade_create_empty_extension(). I think it depends what
> behaviour we expect for NULL input.

confirmed_flush_lsn for a logical slot can be null (for instance,
before confirmed_flush is updated for a newly created logical slot if
someone calls pg_stat_replication -> pg_get_replication_slots) and
when it is so, the binary_upgrade_create_empty_extension errors out.
Is this behaviour wanted? I think the function returning null on null
input is a better behaviour here.

> > 2.
> > +Datum
> > +binary_upgrade_validate_wal_records(PG_FUNCTION_ARGS)
> >
> > The function name looks too generic in the sense that it validates WAL
> > records for correctness/corruption, but it is not. Can it be something
> > like binary_upgrade_{check_for_wal_logical_end,
> > check_for_logical_end_of_wal} or such?
> >
>
> How about slightly modified version like
> binary_upgrade_validate_wal_logical_end?

Works for me.

> > 3.
> > +    /* Quick exit if the given lsn is larger than current one */
> > +    if (start_lsn >= GetFlushRecPtr(NULL))
> > +        PG_RETURN_BOOL(false);
> > +
> >
> > An LSN that doesn't exists yet is an error IMO, may be an error better here?
> >
>
> It will anyway lead to error at later point but we will provide more
> information about all the slots that have invalid value of
> confirmed_flush LSN.

I disagree with the function returning false for non-existing LSN.
IMO, failing fast when an LSN that doesn't exist yet is supplied to
the function is the right approach. We never know, the slots on disk
content can get corrupted for some reason and confirmed_flush_lsn is
'FFFFFFFF/FFFFFFFF' or a non-existing LSN.

> > 4.
> > + * This function is used to verify that there are no WAL records (except some
> > + * types) after confirmed_flush_lsn of logical slots, which means all the
> > + * changes were replicated to the subscriber. There is a possibility that some
> > + * WALs are inserted during upgrade, so such types would be ignored.
> > + *
> >
> > This comment before the function better be at the callsite of the
> > function, because as far as this function is concerned, it checks if
> > there are any WAL records that are not "certain" types after the given
> > LSN, it doesn't know logical slots or confirmed_flush_lsn or such.
> >
>
> Yeah, we should give information at the callsite but I guess we need
> to give some context atop this function as well so that it is easier
> to explain the functionality.

At the callsite a detailed description is good. At the function
definition just a reference to the callsite is good.

> > 5. Trying to understand the interaction of this feature with custom
> > WAL records that a custom WAL resource manager puts in. Is it okay to
> > have custom WAL records after the "logical WAL end"?
> > +        /*
> > +         * There is a possibility that following records may be generated
> > +         * during the upgrade.
> > +         */
> >
>
> I don't think so. The only valid records for the checks in this
> function are probably the ones that can get generated by the upgrade
> process because we ensure that walsender sends all the records before
> it exits at shutdown time.

Can you help me understand how the list of WAL records that pg_upgrade
can generate is put up? Identified them after running some tests?

> > 10.
> > Why just wal_level and max_replication_slots, why not
> > max_worker_processes and max_wal_senders too?
>
> Isn't it sufficient to check the parameters that are required to
> create a slot aka what we check in the function
> CheckLogicalDecodingRequirements()? We are only creating logical slots
> here so I think that should be sufficient.

Ah, that makes sense.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Michael Paquier
Дата:
Сообщение: Re: Bug fix in vacuumdb --buffer-usage-limit xxx -Z
Следующее
От: Peter Eisentraut
Дата:
Сообщение: Re: information_schema and not-null constraints