Re: Synchronizing slots from primary to standby
От | Amit Kapila |
---|---|
Тема | Re: Synchronizing slots from primary to standby |
Дата | |
Msg-id | CAA4eK1J5zTmm4NE4os59WgU4AZPNb74X-n67pY8SkoDfzsN_jA@mail.gmail.com обсуждение исходный текст |
Ответ на | Re: Synchronizing slots from primary to standby (shveta malik <shveta.malik@gmail.com>) |
Список | pgsql-hackers |
On Tue, Dec 19, 2023 at 5:17 PM shveta malik <shveta.malik@gmail.com> wrote: > > On Mon, Dec 18, 2023 at 4:22 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > On Fri, Dec 15, 2023 at 11:03 AM shveta malik <shveta.malik@gmail.com> wrote: > > > > > > Sorry, I missed attaching the patch. PFA v48. > > > > > > > Few comments on v48_0002 > > ======================== > > Thanks for reviewing. These are addressed in v50. > I was still reviewing the v48 version and have a few comments as below. If some of these are already addressed or not relevant, feel free to ignore them. 1. + /* + * Slot sync worker can be stopped at any time. + * Use exit status 1 so the background worker is restarted. We don't need to start the second line of comment in a separate line. 2. + * The assumption is that these dropped local invalidated slots will get + * recreated in next sync-cycle and it is okay to drop and recreate such slots In the above line '.. local invalidated ..' sounds redundant. Shall we remove it? 3. + if (remote_slot->confirmed_lsn > WalRcv->latestWalEnd) + { + SpinLockRelease(&WalRcv->mutex); + elog(ERROR, "skipping sync of slot \"%s\" as the received slot sync " This error message looks odd to me. At least, it should be exiting instead of skipping because we won't continue after this. 4. + /* User created slot with the same name exists, raise ERROR. */ + if (sync_state == SYNCSLOT_STATE_NONE) + ereport(ERROR, + (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), + errmsg("skipping sync of slot \"%s\" as it is a user created" + " slot", remote_slot->name), + errdetail("This slot has failover enabled on the primary and" + " thus is sync candidate but user created slot with" + " the same name already exists on the standby"))); Same problem as above. The skipping in error message doesn't seem to be suitable for the purpose. Additionally, errdetail message should end with a full stop. 5. + /* Slot ready for sync, so sync it. */ + if (sync_state == SYNCSLOT_STATE_READY) + { + /* + * Sanity check: With hot_standby_feedback enabled and + * invalidations handled appropriately as above, this should never + * happen. + */ + if (remote_slot->restart_lsn < slot->data.restart_lsn) + elog(ERROR, + "not synchronizing local slot \"%s\" LSN(%X/%X)" + " to remote slot's LSN(%X/%X) as synchronization " The start of the error message sounds odd. Shall we say 'cannot synchronize ...'? 6. All except one of the callers of local_slot_update() marks the slot dirty and the same is required as well. I think the remaining caller should also mark it dirty and we should move ReplicationSlotMarkDirty() in the caller space. 7. +synchronize_one_slot(WalReceiverConn *wrconn, RemoteSlot *remote_slot) { ... + /* + * Copy the invalidation cause from remote only if local slot is not + * invalidated locally, we don't want to overwrite existing one. + */ + if (slot->data.invalidated == RS_INVAL_NONE) + { + SpinLockAcquire(&slot->mutex); + slot->data.invalidated = remote_slot->invalidated; + SpinLockRelease(&slot->mutex); + } ... It doesn't seem that after changing the invalidated flag, we always mark the slot dirty. Am, I missing something? 8. + /* + * Drop local slots that no longer need to be synced. Do it before + * synchronize_one_slot to allow dropping of slots before actual sync + * which are invalidated locally while still valid on the primary server. + */ + drop_obsolete_slots(remote_slot_list); The second part of the above comment seems redundant as that is obvious. 9. +static WalReceiverConn * +remote_connect(void) +{ + WalReceiverConn *wrconn = NULL; + char *err; + + wrconn = walrcv_connect(PrimaryConnInfo, true, false, + cluster_name[0] ? cluster_name : "slotsyncworker", + &err); + if (wrconn == NULL) + ereport(ERROR, + (errcode(ERRCODE_CONNECTION_FAILURE), + errmsg("could not connect to the primary server: %s", err))); + return wrconn; +} Do we need a function for this? It appears to be called from just one place, so not sure if it is helpful to have a function for this. -- With Regards, Amit Kapila.
В списке pgsql-hackers по дате отправления:
Предыдущее
От: "Hayato Kuroda (Fujitsu)"Дата:
Сообщение: RE: Synchronizing slots from primary to standby
Следующее
От: Karina LitskevichДата:
Сообщение: Correct the documentation of extension building infrastructure