Re: Improve pg_sync_replication_slots() to wait for primary to advance
От | Ashutosh Bapat |
---|---|
Тема | Re: Improve pg_sync_replication_slots() to wait for primary to advance |
Дата | |
Msg-id | CAExHW5syhJm6qhXCtmNAP-=d9qp+SZMtniYCoW7C1dzT7pMJSw@mail.gmail.com обсуждение исходный текст |
Ответ на | Re: Improve pg_sync_replication_slots() to wait for primary to advance (shveta malik <shveta.malik@gmail.com>) |
Список | pgsql-hackers |
On Thu, Aug 14, 2025 at 12:14 PM shveta malik <shveta.malik@gmail.com> wrote: > > > 8) > + /* If refreshing, free the original list structures */ > + if (is_refresh) > + { > + foreach(lc, remote_slot_list) > + { > + RemoteSlot *old_slot = (RemoteSlot *) lfirst(lc); > + pfree(old_slot); > + } > + list_free(remote_slot_list); > + } > > We can get rid of 'is_refresh' and can simply check if > 'target_slot_list != NIL', free it. We can use list_free_deep instead > of freeing each element. Having said that, it looks slightly odd to > free the list in this function, I will think more here. Meanwhile, we > can do this. > +1. The function prologue doesn't mention that the original list is deep freed. So a caller may try to access it after this call, which will lead to a crash. As a safe programming practice we should let the caller free the original list if it is not needed anymore OR modify the input list in-place and return it for the convenience of the caller like all list_* interfaces. At least we should document this behavior in the function prologue. You could also use foreach_ptr instead of foreach. > 13) > - errmsg("could not synchronize replication slot \"%s\"", > - remote_slot->name), > - errdetail("Synchronization could lead to data loss, because the > remote slot needs WAL at LSN %X/%08X and catalog xmin %u, but the > standby has LSN %X/%08X and catalog xmin %u.", > - LSN_FORMAT_ARGS(remote_slot->restart_lsn), > - remote_slot->catalog_xmin, > - LSN_FORMAT_ARGS(slot->data.restart_lsn), > - slot->data.catalog_xmin)); > + errmsg("Replication slot \"%s\" is not sync ready; will keep retrying", > + remote_slot->name), > + errdetail("Attempting Synchronization could lead to data loss, > because the remote slot needs WAL at LSN %X/%08X and catalog xmin %u, > but the standby has LSN %X/%08X and catalog xmin %u.", > + LSN_FORMAT_ARGS(remote_slot->restart_lsn), > + remote_slot->catalog_xmin, > + LSN_FORMAT_ARGS(slot->data.restart_lsn), > + slot->data.catalog_xmin)); > > We can retain the same message as it was put after a lot of > discussion. We can attempt to change if others comment. The idea is > since a worker dumps it in each subsequent cycle (if such a situation > arises), on the same basis now the API can also do so because it is > also performing multiple cycles now. Earlier I had suggested changing > it for API based on messages 'continuing to wait..' which are no > longer there now. Also we usually don't use capital letters at the start of the error message. Any reason this is different? Some more + * When called from pg_sync_replication_slots, use a fixed 2 + * second wait time. Function prologue doesn't mention this. Probably the prologue should contain only the first sentence there. Rest of the prologues just repeat comments in the function. The function is small enough that a reader could read the details from the function instead of the prologue. + wait_time = SLOTSYNC_API_NAPTIME_MS; + } else { } else and { should be on separate lines. -- Best Wishes, Ashutosh Bapat
В списке pgsql-hackers по дате отправления: