RE: [PoC] pg_upgrade: allow to upgrade publisher node

Поиск
Список
Период
Сортировка
От Hayato Kuroda (Fujitsu)
Тема RE: [PoC] pg_upgrade: allow to upgrade publisher node
Дата
Msg-id TYCPR01MB5870AC61339753C7E3F0F1B0F51EA@TYCPR01MB5870.jpnprd01.prod.outlook.com
обсуждение исходный текст
Ответ на Re: [PoC] pg_upgrade: allow to upgrade publisher node  (Peter Smith <smithpb2250@gmail.com>)
Ответы Re: [PoC] pg_upgrade: allow to upgrade publisher node
Список pgsql-hackers
Dear Peter,

Thank you for reviewing! The patch can be available in [1].

> Commit Message
> 
> 1.
> This commit allows nodes with logical replication slots to be upgraded. While
> reading information from the old cluster, a list of logical replication slots is
> newly extracted. At the later part of upgrading, pg_upgrade revisits the list
> and restores slots by using the pg_create_logical_replication_slots() on the new
> clushter.
> 
> ~
> 
> 1a
> /is newly extracted/is fetched/

Fixed.

> 1b.
> /using the pg_create_logical_replication_slots()/executing
> pg_create_logical_replication_slots()/

Fixed.

> 1c.
> /clushter/cluster/

Fixed.

> 2.
> Note that it must be done after the final pg_resetwal command during the upgrade
> because pg_resetwal will remove WALs that are required by the slots. Due to the
> restriction, the timing of restoring replication slots is different from other
> objects.
> 
> ~
> 
> 2a.
> /it must/slot restoration/

You meant to say s/it must/slot restoration must/, right? Fixed.

> 2b.
> /the restriction/this restriction/
> 
> ======
> doc/src/sgml/ref/pgupgrade.sgml
> 
> 3.
> +    <para>
> +     <application>pg_upgrade</application> attempts to migrate logical
> +     replication slots. This helps avoid the need for manually defining the
> +     same replication slot on the new publisher.
> +    </para>
> 
> /same replication slot/same replication slots/

Fixed.

> 4.
> +    <para>
> +     Before you start upgrading the publisher cluster, ensure that the
> +     subscription is temporarily disabled, by executing
> +     <link linkend="sql-altersubscription"><command>ALTER
> SUBSCRIPTION ... DISABLE</command></link>.
> +     After the upgrade is complete, execute the
> +     <command>ALTER SUBSCRIPTION ... CONNECTION</command>
> command to update the
> +     connection string, and then re-enable the subscription.
> +    </para>
> 
> On the rendered page, it looks a bit strange that DISABLE has a link
> but COMMENTION does not have a link.

Added.

> 5.
> +    <para>
> +     There are some prerequisites for
> <application>pg_upgrade</application> to
> +     be able to upgrade the replication slots. If these are not met an error
> +     will be reported.
> +    </para>
> +
> +    <itemizedlist>
> 
> +1 to use all the itemizedlist changes that Amit suggested [1] in his
> attachment.

Yeah, I agreed it is nice. Applied.

> src/bin/pg_upgrade/check.c
> 
> 6.
> +static void check_for_logical_replication_slots(ClusterInfo *new_cluster);
> 
> IMO the arg name should not shadow a global with the same name. See
> other review comment for this function signature.

OK, fixed.

> 7.
> + /* Extract a list of logical replication slots */
> + get_logical_slot_infos(&old_cluster, live_check);
> 
> But 'live_check' is never used?

It is needed for 0003, moved.

> 8. check_for_logical_replication_slots
> +
> +/*
> + * Verify the parameter settings necessary for creating logical replication
> + * slots.
> + */
> +static void
> +check_for_logical_replication_slots(ClusterInfo *new_cluster)
> 
> IMO the arg name should not shadow a global with the same name. If
> this is never going to be called with any param other than
> &new_cluster then probably it is better not even to pass have that
> argument at all. Just refer to the global new_cluster inside the
> function.
> 
> You can't say that 'check_for_new_tablespace_dir' does it already so
> it must be OK -- I think that the existing function has the same issue
> and it also ought to be fixed to avoid shadowing!

Fixed.

> 9. check_for_logical_replication_slots
> 
> + /* logical replication slots can be migrated since PG17. */
> + if (GET_MAJOR_VERSION(new_cluster->major_version) <= 1600)
> + return;
> 
> IMO the code matches the comment better if you say < 1700 instead of <= 1600.

Changed.

> src/bin/pg_upgrade/function.c
> 
> 10. get_loadable_libraries
>   /*
> - * Fetch all libraries containing non-built-in C functions in this DB.
> + * Fetch all libraries containing non-built-in C functions or referred
> + * by logical replication slots in this DB.
>   */
>   ress[dbnum] = executeQueryOrDie(conn,
> ~
> 
> /referred by/referred to by/

Fixed.

> src/bin/pg_upgrade/info.c
> 
> 11.
> +/*
> + * get_logical_slot_infos()
> + *
> + * Higher level routine to generate LogicalSlotInfoArr for all databases.
> + */
> +void
> +get_logical_slot_infos(ClusterInfo *cluster, bool live_check)
> +{
> + int dbnum;
> + int slot_count = 0;
> +
> + if (cluster == &old_cluster)
> + pg_log(PG_VERBOSE, "\nsource databases:");
> + else
> + pg_log(PG_VERBOSE, "\ntarget databases:");
> +
> + for (dbnum = 0; dbnum < cluster->dbarr.ndbs; dbnum++)
> + {
> + DbInfo    *pDbInfo = &cluster->dbarr.dbs[dbnum];
> +
> + get_logical_slot_infos_per_db(cluster, pDbInfo);
> + slot_count += pDbInfo->slot_arr.nslots;
> +
> + if (log_opts.verbose)
> + {
> + pg_log(PG_VERBOSE, "Database: \"%s\"", pDbInfo->db_name);
> + print_slot_infos(&pDbInfo->slot_arr);
> + }
> + }
> +}
> +
> 
> 11a.
> Now the variable 'slot_count' is no longer being returned so it seems redundant.
> 
> ~
> 
> 11b.
> What is the 'live_check' parameter for? Nobody is using it.

These are needed for 0003, moved.

> 12. count_logical_slots
> 
> +int
> +count_logical_slots(ClusterInfo *cluster)
> +{
> + int dbnum;
> + int slotnum = 0;
> +
> + for (dbnum = 0; dbnum < cluster->dbarr.ndbs; dbnum++)
> + slotnum += cluster->dbarr.dbs[dbnum].slot_arr.nslots;
> +
> + return slotnum;
> +}
> 
> IMO this variable should be called something like 'slot_count'. This
> is the same review comment also made in a previous review. (See [2]
> comment#12).

Changed.

> 13. print_slot_infos
> 
> +
> +static void
> +print_slot_infos(LogicalSlotInfoArr *slot_arr)
> +{
> + int slotnum;
> +
> + for (slotnum = 0; slotnum < slot_arr->nslots; slotnum++)
> + pg_log(PG_VERBOSE, "slotname: \"%s\", plugin: \"%s\", two_phase: %d",
> +    slot_arr->slots[slotnum].slotname,
> +    slot_arr->slots[slotnum].plugin,
> +    slot_arr->slots[slotnum].two_phase);
> +}
> 
> It might be nicer to introduce a variable, instead of all those array
> dereferences:
> 
> LogicalSlotInfo *slot_info = &slot_arr->slots[slotnum];

Changed.

> 14.
> + for (slotnum = 0; slotnum < slot_arr->nslots; slotnum++)
> + {
> + /*
> + * Constructs query for creating logical replication slots.
> + *
> + * XXX: For simplification, pg_create_logical_replication_slot() is
> + * used. Is it sufficient?
> + */
> + appendPQExpBuffer(query, "SELECT
> pg_catalog.pg_create_logical_replication_slot(");
> + appendStringLiteralConn(query, slot_arr->slots[slotnum].slotname,
> + conn);
> + appendPQExpBuffer(query, ", ");
> + appendStringLiteralConn(query, slot_arr->slots[slotnum].plugin,
> + conn);
> + appendPQExpBuffer(query, ", false, %s);",
> +   slot_arr->slots[slotnum].two_phase ? "true" : "false");
> +
> + PQclear(executeQueryOrDie(conn, "%s", query->data));
> +
> + resetPQExpBuffer(query);
> + }
> +
> + PQfinish(conn);
> +
> + destroyPQExpBuffer(query);
> + }
> +
> + end_progress_output();
> + check_ok();
> 
> 14a
> Similar to the previous comment (#13). It might be nicer to introduce
> a variable, instead of all those array dereferences:
> 
> LogicalSlotInfo *slot_info = &slot_arr->slots[slotnum];

Changed.

> 14b.
> It was not clear to me why this command is not being built using
> executeQueryOrDie directly instead of using the query buffer. Is there
> some reason?

I wanted to take care the encoding, that was the reason I used PQExpBuffer
functions, especially appendStringLiteralConn(). IIUC executeQueryOrDie() could
not take care it.

> 14c.
> I think it would be cleaner to have a separate res variable like you
> used elsewhere:
> res = executeQueryOrDie(...)
> 
> instead of doing PQclear(executeQueryOrDie(conn, "%s", query->data));
> in one line

Hmm, there are some use cases for PQclear(executeQueryOrDie(...)) style, e.g.,
set_locale_and_encoding() and set_frozenxids(). I do not think your style is good
if the result of the query is not used. please tell me if you find a case that
res = executeQueryOrDie(...) is used but result is not checked.

> src/bin/pg_upgrade/pg_upgrade.
> 
> 15.
> +void get_logical_slot_infos(ClusterInfo *cluster, bool live_check);
> 
> I didn't see a reason for that 'live_check' parameter.

It was needed for 0003, moved.

> .../pg_upgrade/t/003_logical_replication_slots.pl
> 
> 16.
> IMO this would be much easier to read if there were BIG comments
> between the actual TEST parts
> 
> For example
> 
> # ------------------------------
> # TEST: Confirm pg_upgrade fails is new node wal_level is not 'logical'
> <preparation>
> <test>
> <cleanup>
> 
> # ------------------------------
> # TEST: Confirm pg_upgrade fails max_replication_slots on new node is too low
> <preparation>
> <test>
> <cleanup>
> 
> # ------------------------------
> # TEST: Successful upgrade
> <preparation>
> <test>
> <cleanup>

Added. 0003 also followed the style.

> 17.
> +# Cause a failure at the start of pg_upgrade because wal_level is replica
> +command_fails(
> + [
> + 'pg_upgrade', '--no-sync',
> + '-d',         $old_publisher->data_dir,
> + '-D',         $new_publisher->data_dir,
> + '-b',         $bindir,
> + '-B',         $bindir,
> + '-s',         $new_publisher->host,
> + '-p',         $old_publisher->port,
> + '-P',         $new_publisher->port,
> + $mode,
> + ],
> + 'run of pg_upgrade of old node with wrong wal_level');
> +ok( -d $new_publisher->data_dir . "/pg_upgrade_output.d",
> + "pg_upgrade_output.d/ not removed after pg_upgrade failure");
> 
> The message is ambiguous
> 
> BEFORE
> 'run of pg_upgrade of old node with wrong wal_level'
> 
> SUGGESTION
> 'run of pg_upgrade where the new node has the wrong wal_level'

Changed.

> 18.
> +# Create an unnecessary slot on old node
> +$old_publisher->start;
> +$old_publisher->safe_psql(
> + 'postgres', qq[
> + SELECT pg_create_logical_replication_slot('test_slot2',
> 'test_decoding', false, true);
> +]);
> +
> +$old_publisher->stop;
> +
> +# Preparations for the subsequent test. max_replication_slots is set to
> +# smaller than existing slots on old node
> +$new_publisher->append_conf('postgresql.conf', "wal_level = 'logical'");
> +$new_publisher->append_conf('postgresql.conf', "max_replication_slots = 1");
> 
> 
> IMO the comment is misleading. It is not an "unnecessary slot", it is
> just a 2nd slot. And this is all part of the preparation for the next
> test so it should be under the other comment.
> 
> For example SUGGESTION changes like this:
> 
> # Preparations for the subsequent test.
> # 1. Create an unnecessary slot on the old node
> $old_publisher->start;
> $old_publisher->safe_psql(
> 'postgres', qq[
> SELECT pg_create_logical_replication_slot('test_slot2',
> 'test_decoding', false, true);
> ]);
> $old_publisher->stop;
> # 2. max_replication_slots is set to smaller than the number of slots
> (2) present on the old node
> $new_publisher->append_conf('postgresql.conf', "max_replication_slots = 1");
> # 3. new node wal_level is set correctly
> $new_publisher->append_conf('postgresql.conf', "wal_level = 'logical'");

Followed the style.

> 19.
> +# Remove an unnecessary slot and consume WAL records
> +$old_publisher->start;
> +$old_publisher->safe_psql(
> + 'postgres', qq[
> + SELECT pg_drop_replication_slot('test_slot2');
> + SELECT count(*) FROM pg_logical_slot_get_changes('test_slot1', NULL,
> NULL)
> +]);
> +$old_publisher->stop;
> +
> 
> This comment should say more like:
> 
> # Preparations for the subsequent test.

Followed above style.

> 20.
> +# Actual run, pg_upgrade_output.d is removed at the end
> 
> This comment should mention that "successful upgrade is expected"
> because all the other prerequisites are now satisfied.

The suggestion was added to the comment

> 21.
> +$new_publisher->start;
> +my $result = $new_publisher->safe_psql('postgres',
> + "SELECT slot_name, two_phase FROM pg_replication_slots");
> +is($result, qq(test_slot1|t), 'check the slot exists on new node');
> 
> Should there be a matching new_pulisher->stop;?

Not sure it is really needed, but added.
Also, the word "node" was replaced to "cluster" because the later word is used
in the doc.

[1]:
https://www.postgresql.org/message-id/TYCPR01MB5870B5C0FE0C61CD04CBD719F51EA%40TYCPR01MB5870.jpnprd01.prod.outlook.com

Best Regards,
Hayato Kuroda
FUJITSU LIMITED


В списке pgsql-hackers по дате отправления:

Предыдущее
От: "Hayato Kuroda (Fujitsu)"
Дата:
Сообщение: RE: [PoC] pg_upgrade: allow to upgrade publisher node
Следующее
От: "Hayato Kuroda (Fujitsu)"
Дата:
Сообщение: RE: [PoC] pg_upgrade: allow to upgrade publisher node