RE: [PoC] pg_upgrade: allow to upgrade publisher node
От | Hayato Kuroda (Fujitsu) |
---|---|
Тема | RE: [PoC] pg_upgrade: allow to upgrade publisher node |
Дата | |
Msg-id | TYAPR01MB5866562EF047F2C9DDD1F9DEF51BA@TYAPR01MB5866.jpnprd01.prod.outlook.com обсуждение исходный текст |
Ответ на | Re: [PoC] pg_upgrade: allow to upgrade publisher node (Peter Smith <smithpb2250@gmail.com>) |
Ответы |
Re: [PoC] pg_upgrade: allow to upgrade publisher node
(Amit Kapila <amit.kapila16@gmail.com>)
Re: [PoC] pg_upgrade: allow to upgrade publisher node (Masahiko Sawada <sawada.mshk@gmail.com>) Re: [PoC] pg_upgrade: allow to upgrade publisher node (Peter Smith <smithpb2250@gmail.com>) RE: [PoC] pg_upgrade: allow to upgrade publisher node ("Zhijie Hou (Fujitsu)" <houzj.fnst@fujitsu.com>) Re: [PoC] pg_upgrade: allow to upgrade publisher node (Peter Smith <smithpb2250@gmail.com>) |
Список | pgsql-hackers |
Dear Peter, PSA new version patch set. > Here are some review comments for the patch v21-0003 > > ====== > Commit message > > 1. > pg_upgrade fails if the old node has slots which status is 'lost' or they do not > consume all WAL records. These are needed for prevent the data loss. > > ~ > > Maybe some minor brush-up like: > > SUGGESTION > In order to prevent data loss, pg_upgrade will fail if the old node > has slots with the status 'lost', or with unconsumed WAL records. Improved. > src/bin/pg_upgrade/check.c > > 2. check_for_confirmed_flush_lsn > > + /* Check that all logical slots are not in 'lost' state. */ > + res = executeQueryOrDie(conn, > + "SELECT slot_name FROM pg_catalog.pg_replication_slots " > + "WHERE temporary = false AND wal_status = 'lost';"); > + > + ntups = PQntuples(res); > + i_slotname = PQfnumber(res, "slot_name"); > + > + for (i = 0; i < ntups; i++) > + { > + is_error = true; > + > + pg_log(PG_WARNING, > + "\nWARNING: logical replication slot \"%s\" is obsolete.", > + PQgetvalue(res, i, i_slotname)); > + } > + > + PQclear(res); > + > + if (is_error) > + pg_fatal("logical replication slots not to be in 'lost' state."); > + > > 2a. (GENERAL) > The above code for checking lost state seems out of place in this > function which is meant for checking confirmed flush lsn. > > Maybe you jammed both kinds of logic into one function to save on the > extra PGconn or something but IMO two separate functions would be > better. e.g. > - check_for_lost_slots > - check_for_confirmed_flush_lsn Separated into check_for_lost_slots and check_for_confirmed_flush_lsn. > 2b. > + /* Check that all logical slots are not in 'lost' state. */ > > SUGGESTION > /* Check there are no logical replication slots with a 'lost' state. */ Changed. > 2c. > + res = executeQueryOrDie(conn, > + "SELECT slot_name FROM pg_catalog.pg_replication_slots " > + "WHERE temporary = false AND wal_status = 'lost';"); > > This SQL fragment is very much like others in previous patches. Be > sure to make all the cases and clauses consistent with all those > similar SQL fragments. Unified the order. Note that they could not be the completely the same. > 2d. > + is_error = true; > > That doesn't need to be in the loop. Better to just say: > is_error = (ntups > 0); Removed the variable. > 2e. > There is a mix of terms in the WARNING and in the pg_fatal -- e.g. > "obsolete" versus "lost". Is it OK? Unified to 'lost'. > 2f. > + pg_fatal("logical replication slots not to be in 'lost' state."); > > English? And maybe it should be much more verbose... > > "Upgrade of this installation is not allowed because one or more > logical replication slots with a state of 'lost' were detected." I checked other pg_fatal() and the statement like "Upgrade of this installation is not allowed" could not be found. So I used later part. > 3. check_for_confirmed_flush_lsn > > + /* > + * Check that all logical replication slots have reached the latest > + * checkpoint position (SHUTDOWN_CHECKPOINT record). This checks cannot > be > + * done in case of live_check because the server has not been written the > + * SHUTDOWN_CHECKPOINT record yet. > + */ > + if (!live_check) > + { > + res = executeQueryOrDie(conn, > + "SELECT slot_name FROM pg_catalog.pg_replication_slots " > + "WHERE confirmed_flush_lsn != '%X/%X' AND temporary = false;", > + old_cluster.controldata.chkpnt_latest_upper, > + old_cluster.controldata.chkpnt_latest_lower); > + > + ntups = PQntuples(res); > + i_slotname = PQfnumber(res, "slot_name"); > + > + for (i = 0; i < ntups; i++) > + { > + is_error = true; > + > + pg_log(PG_WARNING, > + "\nWARNING: logical replication slot \"%s\" has not consumed WALs yet", > + PQgetvalue(res, i, i_slotname)); > + } > + > + PQclear(res); > + PQfinish(conn); > + > + if (is_error) > + pg_fatal("All logical replication slots consumed all the WALs."); > > ~ > > 3a. > /This checks/This check/ The comment was no longer needed, because the caller checks live_check variable. More detail, please see my another post [1]. > 3b. > I don't think the separation of > chkpnt_latest_upper/chkpnt_latest_lower is needed like this. AFAIK > there is an LSN_FORMAT_ARGS(lsn) macro designed for handling exactly > this kind of parameter substitution. Fixed to use the macro. Previously I considered that the header "access/xlogdefs.h" could not be included from pg_upgrade, and it was the reason why I did not use. But it seemed my misunderstanding - I could include the file. > 3c. > + is_error = true; > > That doesn't need to be in the loop. Better to just say: > is_error = (ntups > 0); Removed. > 3d. > + pg_fatal("All logical replication slots consumed all the WALs."); > > The message seems backward. shouldn't it say something like: > "Upgrade of this installation is not allowed because one or more > logical replication slots still have unconsumed WAL records." I used only later part, see above reply. > src/bin/pg_upgrade/controldata.c > > 4. get_control_data > > + /* > + * Upper and lower part of LSN must be read and stored > + * separately because it is reported as %X/%X format. > + */ > + cluster->controldata.chkpnt_latest_upper = > + strtoul(p, &slash, 16); > + cluster->controldata.chkpnt_latest_lower = > + strtoul(++slash, NULL, 16); > > I felt that this field separation code is maybe not necessary. Please > refer to other review comments in this post. Hmm. I thought they must be read separately even if we stored as XLogRecPtr (uint64). This is because the pg_controldata reports the LSN as %X/%X style. Am I missing something? ``` $ pg_controldata -D data_N1/ | grep "Latest checkpoint location" Latest checkpoint location: 0/153C8D0 ``` > src/bin/pg_upgrade/pg_upgrade.h > > 5. ControlData > > + > + uint32 chkpnt_latest_upper; > + uint32 chkpnt_latest_lower; > } ControlData; > > ~ > > Actually, I did not recognise the reason why this cannot be stored > properly as a single XLogRecPtr field. Please see other review > comments in this post. Changed to use XLogRecPtr. See above comment. > .../t/003_logical_replication_slots.pl > > 6. GENERAL > > Many of the changes to this file are just renaming the > 'old_node'/'new_node' to 'old_publisher'/'new_publisher'. > > This seems a basic change not really associated with this patch 0003. > To reduce the code churn, this change should be moved into the earlier > patch where this test file (003_logical_replication_slots.pl) was > first introduced, Moved these renaming to 0002. > 7. > > # Cause a failure at the start of pg_upgrade because slot do not finish > # consuming all the WALs > > ~ > > Can you give a more detailed explanation in the comment of how this > test case achieves what it says? Slightly reworded above and this comment. How do you think? > src/test/regress/sql/misc_functions.sql > > 8. > @@ -236,4 +236,4 @@ SELECT * FROM pg_split_walfile_name('invalid'); > SELECT segment_number > 0 AS ok_segment_number, timeline_id > FROM pg_split_walfile_name('000000010000000100000000'); > SELECT segment_number > 0 AS ok_segment_number, timeline_id > - FROM pg_split_walfile_name('ffffffFF00000001000000af'); > + FROM pg_split_walfile_name('ffffffFF00000001000000af'); > \ No newline at end of file > > ~ > > What is this change for? It looks like maybe some accidental > whitespace change happened. It was unexpected, removed. [1]: https://www.postgresql.org/message-id/TYAPR01MB5866691219B9CB280B709600F51BA%40TYAPR01MB5866.jpnprd01.prod.outlook.com Best Regards, Hayato Kuroda FUJITSU LIMITED
Вложения
В списке pgsql-hackers по дате отправления:
Предыдущее
От: "Hayato Kuroda (Fujitsu)"Дата:
Сообщение: RE: [PoC] pg_upgrade: allow to upgrade publisher node