RE: [PoC] pg_upgrade: allow to upgrade publisher node
От | Hayato Kuroda (Fujitsu) |
---|---|
Тема | RE: [PoC] pg_upgrade: allow to upgrade publisher node |
Дата | |
Msg-id | TYCPR01MB5870AC61339753C7E3F0F1B0F51EA@TYCPR01MB5870.jpnprd01.prod.outlook.com обсуждение исходный текст |
Ответ на | Re: [PoC] pg_upgrade: allow to upgrade publisher node (Peter Smith <smithpb2250@gmail.com>) |
Ответы |
Re: [PoC] pg_upgrade: allow to upgrade publisher node
|
Список | pgsql-hackers |
Dear Peter, Thank you for reviewing! The patch can be available in [1]. > Commit Message > > 1. > This commit allows nodes with logical replication slots to be upgraded. While > reading information from the old cluster, a list of logical replication slots is > newly extracted. At the later part of upgrading, pg_upgrade revisits the list > and restores slots by using the pg_create_logical_replication_slots() on the new > clushter. > > ~ > > 1a > /is newly extracted/is fetched/ Fixed. > 1b. > /using the pg_create_logical_replication_slots()/executing > pg_create_logical_replication_slots()/ Fixed. > 1c. > /clushter/cluster/ Fixed. > 2. > Note that it must be done after the final pg_resetwal command during the upgrade > because pg_resetwal will remove WALs that are required by the slots. Due to the > restriction, the timing of restoring replication slots is different from other > objects. > > ~ > > 2a. > /it must/slot restoration/ You meant to say s/it must/slot restoration must/, right? Fixed. > 2b. > /the restriction/this restriction/ > > ====== > doc/src/sgml/ref/pgupgrade.sgml > > 3. > + <para> > + <application>pg_upgrade</application> attempts to migrate logical > + replication slots. This helps avoid the need for manually defining the > + same replication slot on the new publisher. > + </para> > > /same replication slot/same replication slots/ Fixed. > 4. > + <para> > + Before you start upgrading the publisher cluster, ensure that the > + subscription is temporarily disabled, by executing > + <link linkend="sql-altersubscription"><command>ALTER > SUBSCRIPTION ... DISABLE</command></link>. > + After the upgrade is complete, execute the > + <command>ALTER SUBSCRIPTION ... CONNECTION</command> > command to update the > + connection string, and then re-enable the subscription. > + </para> > > On the rendered page, it looks a bit strange that DISABLE has a link > but COMMENTION does not have a link. Added. > 5. > + <para> > + There are some prerequisites for > <application>pg_upgrade</application> to > + be able to upgrade the replication slots. If these are not met an error > + will be reported. > + </para> > + > + <itemizedlist> > > +1 to use all the itemizedlist changes that Amit suggested [1] in his > attachment. Yeah, I agreed it is nice. Applied. > src/bin/pg_upgrade/check.c > > 6. > +static void check_for_logical_replication_slots(ClusterInfo *new_cluster); > > IMO the arg name should not shadow a global with the same name. See > other review comment for this function signature. OK, fixed. > 7. > + /* Extract a list of logical replication slots */ > + get_logical_slot_infos(&old_cluster, live_check); > > But 'live_check' is never used? It is needed for 0003, moved. > 8. check_for_logical_replication_slots > + > +/* > + * Verify the parameter settings necessary for creating logical replication > + * slots. > + */ > +static void > +check_for_logical_replication_slots(ClusterInfo *new_cluster) > > IMO the arg name should not shadow a global with the same name. If > this is never going to be called with any param other than > &new_cluster then probably it is better not even to pass have that > argument at all. Just refer to the global new_cluster inside the > function. > > You can't say that 'check_for_new_tablespace_dir' does it already so > it must be OK -- I think that the existing function has the same issue > and it also ought to be fixed to avoid shadowing! Fixed. > 9. check_for_logical_replication_slots > > + /* logical replication slots can be migrated since PG17. */ > + if (GET_MAJOR_VERSION(new_cluster->major_version) <= 1600) > + return; > > IMO the code matches the comment better if you say < 1700 instead of <= 1600. Changed. > src/bin/pg_upgrade/function.c > > 10. get_loadable_libraries > /* > - * Fetch all libraries containing non-built-in C functions in this DB. > + * Fetch all libraries containing non-built-in C functions or referred > + * by logical replication slots in this DB. > */ > ress[dbnum] = executeQueryOrDie(conn, > ~ > > /referred by/referred to by/ Fixed. > src/bin/pg_upgrade/info.c > > 11. > +/* > + * get_logical_slot_infos() > + * > + * Higher level routine to generate LogicalSlotInfoArr for all databases. > + */ > +void > +get_logical_slot_infos(ClusterInfo *cluster, bool live_check) > +{ > + int dbnum; > + int slot_count = 0; > + > + if (cluster == &old_cluster) > + pg_log(PG_VERBOSE, "\nsource databases:"); > + else > + pg_log(PG_VERBOSE, "\ntarget databases:"); > + > + for (dbnum = 0; dbnum < cluster->dbarr.ndbs; dbnum++) > + { > + DbInfo *pDbInfo = &cluster->dbarr.dbs[dbnum]; > + > + get_logical_slot_infos_per_db(cluster, pDbInfo); > + slot_count += pDbInfo->slot_arr.nslots; > + > + if (log_opts.verbose) > + { > + pg_log(PG_VERBOSE, "Database: \"%s\"", pDbInfo->db_name); > + print_slot_infos(&pDbInfo->slot_arr); > + } > + } > +} > + > > 11a. > Now the variable 'slot_count' is no longer being returned so it seems redundant. > > ~ > > 11b. > What is the 'live_check' parameter for? Nobody is using it. These are needed for 0003, moved. > 12. count_logical_slots > > +int > +count_logical_slots(ClusterInfo *cluster) > +{ > + int dbnum; > + int slotnum = 0; > + > + for (dbnum = 0; dbnum < cluster->dbarr.ndbs; dbnum++) > + slotnum += cluster->dbarr.dbs[dbnum].slot_arr.nslots; > + > + return slotnum; > +} > > IMO this variable should be called something like 'slot_count'. This > is the same review comment also made in a previous review. (See [2] > comment#12). Changed. > 13. print_slot_infos > > + > +static void > +print_slot_infos(LogicalSlotInfoArr *slot_arr) > +{ > + int slotnum; > + > + for (slotnum = 0; slotnum < slot_arr->nslots; slotnum++) > + pg_log(PG_VERBOSE, "slotname: \"%s\", plugin: \"%s\", two_phase: %d", > + slot_arr->slots[slotnum].slotname, > + slot_arr->slots[slotnum].plugin, > + slot_arr->slots[slotnum].two_phase); > +} > > It might be nicer to introduce a variable, instead of all those array > dereferences: > > LogicalSlotInfo *slot_info = &slot_arr->slots[slotnum]; Changed. > 14. > + for (slotnum = 0; slotnum < slot_arr->nslots; slotnum++) > + { > + /* > + * Constructs query for creating logical replication slots. > + * > + * XXX: For simplification, pg_create_logical_replication_slot() is > + * used. Is it sufficient? > + */ > + appendPQExpBuffer(query, "SELECT > pg_catalog.pg_create_logical_replication_slot("); > + appendStringLiteralConn(query, slot_arr->slots[slotnum].slotname, > + conn); > + appendPQExpBuffer(query, ", "); > + appendStringLiteralConn(query, slot_arr->slots[slotnum].plugin, > + conn); > + appendPQExpBuffer(query, ", false, %s);", > + slot_arr->slots[slotnum].two_phase ? "true" : "false"); > + > + PQclear(executeQueryOrDie(conn, "%s", query->data)); > + > + resetPQExpBuffer(query); > + } > + > + PQfinish(conn); > + > + destroyPQExpBuffer(query); > + } > + > + end_progress_output(); > + check_ok(); > > 14a > Similar to the previous comment (#13). It might be nicer to introduce > a variable, instead of all those array dereferences: > > LogicalSlotInfo *slot_info = &slot_arr->slots[slotnum]; Changed. > 14b. > It was not clear to me why this command is not being built using > executeQueryOrDie directly instead of using the query buffer. Is there > some reason? I wanted to take care the encoding, that was the reason I used PQExpBuffer functions, especially appendStringLiteralConn(). IIUC executeQueryOrDie() could not take care it. > 14c. > I think it would be cleaner to have a separate res variable like you > used elsewhere: > res = executeQueryOrDie(...) > > instead of doing PQclear(executeQueryOrDie(conn, "%s", query->data)); > in one line Hmm, there are some use cases for PQclear(executeQueryOrDie(...)) style, e.g., set_locale_and_encoding() and set_frozenxids(). I do not think your style is good if the result of the query is not used. please tell me if you find a case that res = executeQueryOrDie(...) is used but result is not checked. > src/bin/pg_upgrade/pg_upgrade. > > 15. > +void get_logical_slot_infos(ClusterInfo *cluster, bool live_check); > > I didn't see a reason for that 'live_check' parameter. It was needed for 0003, moved. > .../pg_upgrade/t/003_logical_replication_slots.pl > > 16. > IMO this would be much easier to read if there were BIG comments > between the actual TEST parts > > For example > > # ------------------------------ > # TEST: Confirm pg_upgrade fails is new node wal_level is not 'logical' > <preparation> > <test> > <cleanup> > > # ------------------------------ > # TEST: Confirm pg_upgrade fails max_replication_slots on new node is too low > <preparation> > <test> > <cleanup> > > # ------------------------------ > # TEST: Successful upgrade > <preparation> > <test> > <cleanup> Added. 0003 also followed the style. > 17. > +# Cause a failure at the start of pg_upgrade because wal_level is replica > +command_fails( > + [ > + 'pg_upgrade', '--no-sync', > + '-d', $old_publisher->data_dir, > + '-D', $new_publisher->data_dir, > + '-b', $bindir, > + '-B', $bindir, > + '-s', $new_publisher->host, > + '-p', $old_publisher->port, > + '-P', $new_publisher->port, > + $mode, > + ], > + 'run of pg_upgrade of old node with wrong wal_level'); > +ok( -d $new_publisher->data_dir . "/pg_upgrade_output.d", > + "pg_upgrade_output.d/ not removed after pg_upgrade failure"); > > The message is ambiguous > > BEFORE > 'run of pg_upgrade of old node with wrong wal_level' > > SUGGESTION > 'run of pg_upgrade where the new node has the wrong wal_level' Changed. > 18. > +# Create an unnecessary slot on old node > +$old_publisher->start; > +$old_publisher->safe_psql( > + 'postgres', qq[ > + SELECT pg_create_logical_replication_slot('test_slot2', > 'test_decoding', false, true); > +]); > + > +$old_publisher->stop; > + > +# Preparations for the subsequent test. max_replication_slots is set to > +# smaller than existing slots on old node > +$new_publisher->append_conf('postgresql.conf', "wal_level = 'logical'"); > +$new_publisher->append_conf('postgresql.conf', "max_replication_slots = 1"); > > > IMO the comment is misleading. It is not an "unnecessary slot", it is > just a 2nd slot. And this is all part of the preparation for the next > test so it should be under the other comment. > > For example SUGGESTION changes like this: > > # Preparations for the subsequent test. > # 1. Create an unnecessary slot on the old node > $old_publisher->start; > $old_publisher->safe_psql( > 'postgres', qq[ > SELECT pg_create_logical_replication_slot('test_slot2', > 'test_decoding', false, true); > ]); > $old_publisher->stop; > # 2. max_replication_slots is set to smaller than the number of slots > (2) present on the old node > $new_publisher->append_conf('postgresql.conf', "max_replication_slots = 1"); > # 3. new node wal_level is set correctly > $new_publisher->append_conf('postgresql.conf', "wal_level = 'logical'"); Followed the style. > 19. > +# Remove an unnecessary slot and consume WAL records > +$old_publisher->start; > +$old_publisher->safe_psql( > + 'postgres', qq[ > + SELECT pg_drop_replication_slot('test_slot2'); > + SELECT count(*) FROM pg_logical_slot_get_changes('test_slot1', NULL, > NULL) > +]); > +$old_publisher->stop; > + > > This comment should say more like: > > # Preparations for the subsequent test. Followed above style. > 20. > +# Actual run, pg_upgrade_output.d is removed at the end > > This comment should mention that "successful upgrade is expected" > because all the other prerequisites are now satisfied. The suggestion was added to the comment > 21. > +$new_publisher->start; > +my $result = $new_publisher->safe_psql('postgres', > + "SELECT slot_name, two_phase FROM pg_replication_slots"); > +is($result, qq(test_slot1|t), 'check the slot exists on new node'); > > Should there be a matching new_pulisher->stop;? Not sure it is really needed, but added. Also, the word "node" was replaced to "cluster" because the later word is used in the doc. [1]: https://www.postgresql.org/message-id/TYCPR01MB5870B5C0FE0C61CD04CBD719F51EA%40TYCPR01MB5870.jpnprd01.prod.outlook.com Best Regards, Hayato Kuroda FUJITSU LIMITED
В списке pgsql-hackers по дате отправления:
Предыдущее
От: "Hayato Kuroda (Fujitsu)"Дата:
Сообщение: RE: [PoC] pg_upgrade: allow to upgrade publisher node
Следующее
От: "Hayato Kuroda (Fujitsu)"Дата:
Сообщение: RE: [PoC] pg_upgrade: allow to upgrade publisher node