RE: [PoC] pg_upgrade: allow to upgrade publisher node

Поиск
Список
Период
Сортировка
От Hayato Kuroda (Fujitsu)
Тема RE: [PoC] pg_upgrade: allow to upgrade publisher node
Дата
Msg-id TYAPR01MB586642D33208D190F67CDD7BF5F2A@TYAPR01MB5866.jpnprd01.prod.outlook.com
обсуждение исходный текст
Ответ на Re: [PoC] pg_upgrade: allow to upgrade publisher node  (Dilip Kumar <dilipbalaut@gmail.com>)
Ответы Re: [PoC] pg_upgrade: allow to upgrade publisher node  (Peter Smith <smithpb2250@gmail.com>)
RE: [PoC] pg_upgrade: allow to upgrade publisher node  ("Zhijie Hou (Fujitsu)" <houzj.fnst@fujitsu.com>)
Список pgsql-hackers
Dear Dilip,

Thank you for reviewing! PSA new version.

> 
> 1.
> Note that slot restoration must be done after the final pg_resetwal command
> during the upgrade because pg_resetwal will remove WALs that are required by
> the slots. Due to this restriction, the timing of restoring replication slots is
> different from other objects.
> 
> This comment in the commit message is confusing.  I understand the
> reason but from this, it is not very clear that if resetwal removes
> the WAL we needed then why it is good to create after the resetwal.  I
> think we should make it clear that creating new slot will set the
> restart lsn to current WAL location and after that resetwal can remove
> those WAL where slot restart lsn is pointing....

Just to confirm - WAL records must not be removed in any time if it is referred
as restart_lsn. The reason why the slot creation is done after pg_restwal is that
required WALs are not removed by the command. See [1].
Moreover, clarified more in the commit message.

> 2.
> 
> +    <itemizedlist>
> +     <listitem>
> +      <para>
> +       All slots on the old cluster must be usable, i.e., there are no slots
> +       whose
> +       <link
> linkend="view-pg-replication-slots">pg_replication_slots</link>.<structfield>
> wal_status</structfield>
> +       is <literal>lost</literal>.
> +      </para>
> +     </listitem>
> +     <listitem>
> +      <para>
> +       <link
> linkend="view-pg-replication-slots">pg_replication_slots</link>.<structfield>c
> onfirmed_flush_lsn</structfield>
> +       of all slots on the old cluster must be the same as the latest
> +       checkpoint location.
> +      </para>
> +     </listitem>
> +     <listitem>
> +      <para>
> +       The output plugins referenced by the slots on the old cluster must be
> +       installed in the new PostgreSQL executable directory.
> +      </para>
> +     </listitem>
> +     <listitem>
> +      <para>
> +       The new cluster must have
> +       <link
> linkend="guc-max-replication-slots"><varname>max_replication_slots</varna
> me></link>
> +       configured to a value greater than or equal to the number of slots
> +       present in the old cluster.
> +      </para>
> +     </listitem>
> +     <listitem>
> +      <para>
> +       The new cluster must have
> +       <link
> linkend="guc-wal-level"><varname>wal_level</varname></link> as
> +       <literal>logical</literal>.
> +      </para>
> +     </listitem>
> +    </itemizedlist>
> 
> I think we should also add that the new slot should not have any
> permanent existing logical replication slot.

Hmm, I wondered it should be really needed. Tables are required not to be in the
new cluster too, but not documented. It might be a trivial thing. Anyway, added.

FYI - the restriction was not introduced by the patch. I reported independently [2],
but no one has responded since now...

> 3.
> -       with the primary.)  Replication slots are not copied and must
> -       be recreated.
> +       with the primary.)  Replication slots on the old standby are not copied.
> +       Only logical slots on the primary are migrated to the new standby,
> +       and other slots must be recreated.
> 
> This paragraph should be rephrased.  I mean first stating that
> "Replication slots on the old standby are not copied" and then saying
> Only logical slots are migrated doesn't seem like the best way.  Maybe
> we can just say "Only logical slots on the primary are migrated to the
> new standby, and other slots must be recreated."

Per discussion on [3], I used another words. Thanks for suggesting.

> 4.
> + /*
> + * Raise an ERROR if the logical replication slot is invalidating. It
> + * would not happen because max_slot_wal_keep_size is set to -1 during
> + * the upgrade, but it stays safe.
> + */
> + if (*invalidated && SlotIsLogical(s) && IsBinaryUpgrade)
> + elog(ERROR, "Replication slots must not be invalidated during the upgrade.");
> 
> Rephrase the first line as ->  Raise an ERROR if the logical
> replication slot is invalidating during an upgrade.

Per discussion on [3], I used another words. Thanks for suggesting.

> 5.
> + /* Logical slots can be migrated since PG17. */
> + if (GET_MAJOR_VERSION(old_cluster.major_version) <= 1600)
> + return;
> 
> 
> For readability change this to if
> (GET_MAJOR_VERSION(old_cluster.major_version) < 1700), because in most
> of the checks related to this, we are using 1700 so better be
> consistent in this.

Per discussion on [3], I did not change here.

> 6.
> + if (nslots_on_new)
> + pg_fatal(ngettext("New cluster must not have logical replication
> slots but found %d slot.",
> +   "New cluster must not have logical replication slots but found %d slots.",
> +   nslots_on_new),
> + nslots_on_new);
> ...
> + if (PQntuples(res) != 1)
> + pg_fatal("could not determine wal_level.");
> +
> + wal_level = PQgetvalue(res, 0, 0);
> +
> + if (strcmp(wal_level, "logical") != 0)
> + pg_fatal("wal_level must be \"logical\", but is set to \"%s\"",
> + wal_level);
> 
> 
> I have noticed that the case of the first letter in the pg_fatal
> message is not consistent.

Actually there are some inconsistency even in the check.c file, so I devised
below rules. How do you think?

* Non-complete sentence starts with the lower case.
  (e.g., "could not open", "could not determine")
* proper nouns are always noted with the lower cases
  (e.g., "template0 must not allow...", "wal_level must be...").
* Other than above, the sentence starts with the upper case.

> 7.
> +
> + /* Is the slot still usable? */
> + if (slot->invalid)
> + {
> 
> Why comment says "Is the slot still usable?" I think it should be "Is
> the slot usable?" otherwise it appears that we have first fetched the
> slots and now we are refetching it and checking whether it is still
> usable.

Changed.

[1]:
https://www.postgresql.org/message-id/TYAPR01MB58664C81887B3AF2EB6B16E3F5939%40TYAPR01MB5866.jpnprd01.prod.outlook.com
[2]:
https://www.postgresql.org/message-id/TYAPR01MB5866D277F6BEDEA4223B3559F5E6A@TYAPR01MB5866.jpnprd01.prod.outlook.com
[3]: https://www.postgresql.org/message-id/CAFiTN-vs53SqZiZN1GcSuKLmMY%3D0d14wJDDm1aKmoBONwnqaGg%40mail.gmail.com

Best Regards,
Hayato Kuroda
FUJITSU LIMITED


Вложения

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Himanshu Upadhyaya
Дата:
Сообщение: Re: CHECK Constraint Deferrable
Следующее
От: "Hayato Kuroda (Fujitsu)"
Дата:
Сообщение: RE: [PoC] pg_upgrade: allow to upgrade publisher node