RE: Synchronizing slots from primary to standby

Поиск
Список
Период
Сортировка
От Zhijie Hou (Fujitsu)
Тема RE: Synchronizing slots from primary to standby
Дата
Msg-id OS0PR01MB571693E14A4460D9BFC23FF19483A@OS0PR01MB5716.jpnprd01.prod.outlook.com
обсуждение исходный текст
Ответ на Re: Synchronizing slots from primary to standby  (Peter Smith <smithpb2250@gmail.com>)
Список pgsql-hackers
On Wednesday, November 29, 2023 11:04 AM Peter Smith <smithpb2250@gmail.com> wrote:


Thanks for the comments.

> ======
> 1. General.
> 
> Previously (see [1] #0) I asked a question about if there is some documentation
> missing. Seems not yet answered.

The document was add in V39-0002 in logicaldecoding.sgml
because some necessary GUCs for slotsync are not in 0001.

> 
> ======
> Commit message
> 
> 2.
> Users can set this flag during CREATE SUBSCRIPTION or during
> pg_create_logical_replication_slot API. Examples:
> 
> CREATE SUBSCRIPTION mysub CONNECTION '..' PUBLICATION mypub WITH
> (failover = true);
> 
> (failover is the last arg)
> SELECT * FROM pg_create_logical_replication_slot('myslot',
> 'pgoutput', false, true, true);
> 
> ~
> 
> I felt it is better to say "Ex1" / "Ex2" (or "1" / "2" or something
> similar) to indicate better where these examples start and finish, otherwise they
> just sort of get lost among the text.

Changed.

> 
> ======
> doc/src/sgml/catalogs.sgml
> 
> 3.
> From previous review ([1] #6) I suggested reordering fields. Hous-san
> wrote: "but I think the functionality of two fields are different and I didn’t find
> the correlation between two fields except for the type of value."
> 
> Yes, that is true. OTOH, I felt grouping the attributes by the same types made
> the docs easier to read.

The document's order should be same as the pg_subscription catalog, and I
prefer not to move the new subfailoverstate in the middle of catalog as the
functionality of them is different.

> 
> ======
> src/backend/commands/subscriptioncmds.c
> 
> 4. CreateSubscription
> 
> + /*
> + * If only the slot_name is specified (without create_slot option),
> + * it is possible that the user intends to use an existing slot on
> + * the publisher, so here we enable failover for the slot if
> + * requested.
> + */
> + else if (opts.slot_name && failover_enabled) {
> 
> Unanswered question from previous review (see [1] #11a). i.e. How does this
> condition ensure that *only* the slot name was specified (like the comment is
> saying)?

It is the else part of 'if (opts.create_slot)', so it means create_slot is not
specified while only slot_name is specified. I have improved the comment.


> 
> ~~~
> 
> 5. AlterSubscription
> 
>   errmsg("ALTER SUBSCRIPTION with refresh and copy_data is not allowed
> when two_phase is enabled"),
>   errhint("Use ALTER SUBSCRIPTION ... SET PUBLICATION with refresh = false,
> or with copy_data = false, or use DROP/CREATE SUBSCRIPTION.")));
> 
> + if (sub->failoverstate == LOGICALREP_FAILOVER_STATE_ENABLED &&
> + opts.copy_data) ereport(ERROR,
> + (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
> + errmsg("ALTER SUBSCRIPTION with refresh and copy_data is not allowed
> when failover is enabled"),
> + errhint("Use ALTER SUBSCRIPTION ... SET PUBLICATION with refresh =
> false, or with copy_data = false, or use DROP/CREATE SUBSCRIPTION.")));
> +
> 
> There are translations issues same as reported in my previous review (see [1]
> #12b and also several other places as noted in [1]). Hou-san replied that I "can
> start a separate thread to change the twophase related messages, and we can
> change accordingly if it's accepted.", but that's not right IMO because it is only
> the fact that this sysncslot patch is reusing a similar message that warrants the
> need to extract a "common" message part in the first place. So I think it is
> responsibility if this sycslot patch to make this change.

OK, changed.

> 
> ======
> src/backend/replication/logical/tablesync.c
> 
> 6. process_syncing_tables_for_apply
> 
> + if (MySubscription->twophasestate ==
> + LOGICALREP_TWOPHASE_STATE_PENDING)
> + ereport(LOG,
> + (errmsg("logical replication apply worker for subscription \"%s\"
> will restart so that two_phase can be enabled",
> + MySubscription->name)));
> +
> + if (MySubscription->failoverstate ==
> + LOGICALREP_FAILOVER_STATE_PENDING)
> + ereport(LOG,
> + (errmsg("logical replication apply worker for subscription \"%s\"
> will restart so that failover can be enabled",
> + MySubscription->name)));
> 
> 6a.
> You may end up log 2 restart messages for the same restart. Is it OK?

I think it's OK as it can provide complete information.

> 
> ~
> 
> 6b.
> This is another example where you should share the same common message
> (for less translations)

I adjusted the message there.

> 
> ======
> src/backend/replication/logical/worker.c
> 
> 7.
> + * The logical slot on the primary can be synced to the standby by
> + specifying
> + * the failover = true when creating the subscription. Enabling
> + failover allows
> + * us to smoothly transition to the standby in case the primary gets
> + promoted,
> + * ensuring that we can subscribe to the new primary without losing any data.
> 
> /the failover = true/the failover = true option/
> 
> or
> 
> /the failover = true/failover = true/
> 

Changed.

> ~~~
> 
> 8.
> +
>  #include "postgres.h"
> 
> Unnecessary extra blank line

Removed.

> 
> ======
> src/backend/replication/slot.c
> 
> 9. validate_standby_slots
> 
> There was no reply to the comment in my previous review (see [1] #27).
> Maybe you disagree or maybe accidentally overlooked?

The error message has already been adjusted in V39.
I adjusted the check in this version as well to be consistent.

> 
> ~~~
> 
> 10. check_standby_slot_names
> 
> In previous review I asked ([1] #28) why a special check was needed for "*".
> Hou-san replied that "SplitIdentifierString() does not give error for '*' and '*'
> can be considered as valid value which if accepted can mislead user".
> 
> Sure, but won't the code then just try to find if there is a replication slot called
> "*" and that will fail. That was my point, if the slot name lookup is going to fail
> anyway then why have the extra code for the special "*" case up-front? Note --
> I haven't tried it, so maybe code doesn't work like I think it does.

I think allowing "*" can mislead user because it normally means every slot, but
we don't want to support the "every slot" option as mentioned in the comment.
So I think reject it here is fine. Reporting ERROR because the slot named '*' was not
there may look confusing.

> 
> ======
> src/backend/replication/walsender.c
> 
> 11. PhysicalWakeupLogicalWalSnd
> 
> No reply to my previous review comment ([1] #33). Not done? Disagreed, or
> accidentally missed?

The function mentioned in your previous comment has been removed in
previous version, so I am not sure are you pointing to some other codes
that has similar issues ?

> 
> ~~~
> 
> 12. WalSndFilterStandbySlots
> 
> + /*
> + * If logical slot name is given in standby_slot_names, give WARNING
> + * and skip it. Since it is harmless, so WARNING should be enough, no
> + * need to error-out.
> + */
> + else if (SlotIsLogical(slot))
> + warningfmt = _("cannot have logical replication slot \"%s\" in
> parameter \"%s\", ignoring");
> 
> I previously raised an issue (see [1] #35) thinking this could not happen.
> Hou-san explained how it might happen ("user could drop the logical slot and
> recreate a physical slot with the same name without changing the GUC.") so this
> code was necessary. That is OK, but I think your same explanation in the code
> commen.

OK, I have added comments here.

> 
> ~~~
> 
> 13. WalSndFilterStandbySlots
> 
> + standby_slots_cpy = foreach_delete_current(standby_slots_cpy, lc);
> 
> I previously raised issue (see [1] #36). Hou-san replied "I think it's OK to remove
> slots if it's invalidated, dropped, or was changed to logical one as we don't
> need to wait for these slots to catch up anymore."
> 
> Sure, maybe code is fine, but my point was that the code is removing elements
> *more* scenarios than are mentioned by the function comment, so maybe
> update that function comment for all the removal scenarios.

Updated the comments.

> 
> ~~~
> 
> 14. WalSndWaitForStandbyConfirmation
> 
> The comment change from my previous review ([1] #37) not done.
> Disagreed, or accidentally missed?

Thanks for pointing, this was missed.

> 
> ~~~
> 
> 15. WalSndWaitForStandbyConfirmation
> 
> The question about calling ConditionVariablePrepareToSleep in my previous
> review ([1] #39) not answered. Accidentally missed?

I think V39 has already adjusted the order of reload and NIL check in this function.

> 
> ~~~
> 
> 16. WalSndWaitForWal
> 
>   if (RecentFlushPtr != InvalidXLogRecPtr &&
>   loc <= RecentFlushPtr)
> - return RecentFlushPtr;
> + {
> + WalSndFilterStandbySlots(RecentFlushPtr, &standby_slots);
> 
> It is better to use XLogRecPtrIsInvalid macro here. I know it was not strictly
> added by your patch, but so much else changed nearby so I thought this should
> be fixed at the same time.

Changed.

> 
> ======
> src/bin/pg_upgrade/info.c
> 
> 17. get_old_cluster_logical_slot_infos
> 
> +
>   slotinfos = (LogicalSlotInfo *) pg_malloc(sizeof(LogicalSlotInfo) * num_slots);
> 
> Excessive whitespace.

Removed.

Best Regards,
Hou zj


В списке pgsql-hackers по дате отправления:

Предыдущее
От: "Zhijie Hou (Fujitsu)"
Дата:
Сообщение: RE: Synchronizing slots from primary to standby
Следующее
От: "Zhijie Hou (Fujitsu)"
Дата:
Сообщение: RE: Synchronizing slots from primary to standby