Обсуждение: Re: Improve pg_sync_replication_slots() to wait for primary to advance

Поиск
Список
Период
Сортировка

Re: Improve pg_sync_replication_slots() to wait for primary to advance

От
Japin Li
Дата:
On Thu, 06 Nov 2025 at 18:53, Ajin Cherian <itsajin@gmail.com> wrote:
> On Tue, Nov 4, 2025 at 5:23 PM shveta malik <shveta.malik@gmail.com> wrote:
>>
>> >
>> > I have addressed the above comments in patch v21.
>> >
>>
>> Thank You. Please find a few comments:
>>
>> 1)
>> + fparams.slot_names = slot_names = NIL;
>>
>> I think it is not needed to set slot_names to NIL.
>>
>> 2)
>> -    WAIT_EVENT_REPLICATION_SLOTSYNC_MAIN);
>> +    WAIT_EVENT_REPLICATION_SLOTSYNC_PRIMARY_CATCHUP);
>>
>> The new name does not seem appropriate. For the slotsync-worker case,
>> even when the primary is not behind, the worker still waits but it is
>> not waiting for primary to catch-up. I could not find a better name
>> except the original one 'WAIT_EVENT_REPLICATION_SLOTSYNC_MAIN'. We can
>> change the explanation to :
>>
>> "Waiting in main loop of slot sync worker and slot sync API."
>> Or
>> "Waiting in main loop of slot synchronization."
>>
>> If anyone has any better name suggestions, we can consider changing.
>
> Changed as suggested above.
>
>>
>> 3)
>>
>> +# Attempt to synchronize slots using API. The API will continue retrying
>> +# synchronization until the remote slot catches up with the locally reserved
>> +# position. The API will not return until this happens, to be able to make
>> +# further calls, call the API in a background process.
>>
>> Shall we remove 'with the locally reserved position', it’s already
>> explained in the test header and the comment is good enough even
>> without it.
>>
>
> Changed.
>
>> 4)
>> +# Confirm log that the slot has been synced after becoming sync-ready.
>>
>> Shall we just say:
>> Confirm from the log that the slot is sync-ready now.
>>
>> 5)
>>  # Synchronize the primary server slots to the standby.
>>  $standby1->safe_psql('postgres', "SELECT pg_sync_replication_slots();");
>> @@ -945,6 +1007,7 @@ $subscriber1->safe_psql('postgres',
>>  is( $standby1->safe_psql(
>>   'postgres',
>>   q{SELECT count(*) = 2 FROM pg_replication_slots WHERE slot_name IN
>> ('lsub1_slot', 'snap_test_slot') AND synced AND NOT temporary;}
>> +
>>   ),
>>
>> Redundant change.
>
> Removed.
>
> Attaching patch v22 addressing the above comments.

@@ -62,8 +62,8 @@ LOGICAL_APPLY_MAIN    "Waiting in main loop of logical replication apply process."
 LOGICAL_LAUNCHER_MAIN    "Waiting in main loop of logical replication launcher process."
 LOGICAL_PARALLEL_APPLY_MAIN    "Waiting in main loop of logical replication parallel apply process."
 RECOVERY_WAL_STREAM    "Waiting in main loop of startup process for WAL to arrive, during streaming recovery."
-REPLICATION_SLOTSYNC_MAIN    "Waiting in main loop of slot sync worker."
 REPLICATION_SLOTSYNC_SHUTDOWN    "Waiting for slot sync worker to shut down."
+REPLICATION_SLOTSYNC_MAIN    "Waiting in main loop of slot synchronization."
 SYSLOGGER_MAIN    "Waiting in main loop of syslogger process."
 WAL_RECEIVER_MAIN    "Waiting in main loop of WAL receiver process."
 WAL_SENDER_MAIN    "Waiting in main loop of WAL sender process."

I've noticed that all events are sorted alphabetically.  I think we should keep
the order of REPLICATION_SLOTSYNC_MAIN unchanged.

--
Regards,
Japin Li
ChengDu WenWu Information Technology Co., Ltd.



Re: Improve pg_sync_replication_slots() to wait for primary to advance

От
shveta malik
Дата:
On Fri, Nov 7, 2025 at 10:36 AM Japin Li <japinli@hotmail.com> wrote:
>
> >
> > Attaching patch v22 addressing the above comments.
>
> @@ -62,8 +62,8 @@ LOGICAL_APPLY_MAIN    "Waiting in main loop of logical replication apply process."
>  LOGICAL_LAUNCHER_MAIN  "Waiting in main loop of logical replication launcher process."
>  LOGICAL_PARALLEL_APPLY_MAIN    "Waiting in main loop of logical replication parallel apply process."
>  RECOVERY_WAL_STREAM    "Waiting in main loop of startup process for WAL to arrive, during streaming recovery."
> -REPLICATION_SLOTSYNC_MAIN      "Waiting in main loop of slot sync worker."
>  REPLICATION_SLOTSYNC_SHUTDOWN  "Waiting for slot sync worker to shut down."
> +REPLICATION_SLOTSYNC_MAIN      "Waiting in main loop of slot synchronization."
>  SYSLOGGER_MAIN "Waiting in main loop of syslogger process."
>  WAL_RECEIVER_MAIN      "Waiting in main loop of WAL receiver process."
>  WAL_SENDER_MAIN        "Waiting in main loop of WAL sender process."
>
> I've noticed that all events are sorted alphabetically.  I think we should keep
> the order of REPLICATION_SLOTSYNC_MAIN unchanged.
>

+1.

Few trivial comments:

1)
Since we have always used the term 'SQL function' rather than API in
existing code, shall we change all references of API to 'SQL function'
in current patch:

+ * If the pg_sync_replication API is used to sync the slots, and if the slots
"If the SQL function pg_sync_replication_slots() is used.."

+ * the reasons mentioned above, then the API also waits and retries until the
API --> SQL function

+ * persist. It is utilized by the pg_sync_replication_slots() API.
 pg_sync_replication_slots() API --> SQL function pg_sync_replication_slots()

+ * the API can retry.
API --> SQL function

+ /* Set this, so that API can retry */
API --> SQL function

+ * persist. It is utilized by the pg_sync_replication_slots() API.
pg_sync_replication_slots() API --> SQL function pg_sync_replication_slots()

+ * slot_persistence_pending - boolean used by pg_sync_replication_slots
+ *   API to track if any slots could not be
pg_sync_replication_slots  API --> SQL function pg_sync_replication_slots()

+ * Interrupt handler for pg_sync_replication_slots() API.
pg_sync_replication_slots() API --> SQL function pg_sync_replication_slots()


2)
ProcessSlotSyncAPIInterrupts
slotsync_api_reread_config
-- These also have API in it, but I do not have any better name
suggestions here, we can retain the current ones and see what others
say.

3)
/*
 * Re-read the config file.
 *
 * Exit if any of the slot sync GUCs have changed. The postmaster will
 * restart it.
 */
static void
slotsync_reread_config(void)

Shall we change this existing comment to: Re-read the config file for
slot sync worker.

4)

+/*
+ * Re-read the config file and check for critical parameter changes.
+ *
+ */
+static void
+slotsync_api_reread_config(void)

Shall we change comment to:
/*
 * Re-read the config file for SQL function pg_sync_replication_slots()
 *
 * Emit error if any of the slot sync GUCs have changed.
 */

thanks
Shveta



Re: Improve pg_sync_replication_slots() to wait for primary to advance

От
Ashutosh Bapat
Дата:
On Mon, Nov 10, 2025 at 3:01 PM shveta malik <shveta.malik@gmail.com> wrote:
>
>
>
> 2)
> ProcessSlotSyncAPIInterrupts
> slotsync_api_reread_config
> -- These also have API in it, but I do not have any better name
> suggestions here, we can retain the current ones and see what others
> say.

ProcessSlotSyncInterrupts() handles shutdown waiting,
ProcessSlotSyncAPIInterrupts doesn't. Why is this difference? It will
be good to explain why we need two different functions for worker and
SQL function and also explain the difference between them.

$primary->psql('postgres', "SELECT pg_drop_replication_slot('lsub2_slot');");
+$primary->psql('postgres', "SELECT pg_drop_replication_slot('lsub1_slot');");

I think, the intention behind dropping the slot is to be able to
create it again for the next test. But there is no comment here
explaining that. There is also no comment explaining why we are
dropping both slots here; when the test only needs dropping one.
That's going to create confusion. One might think that all the slots
need to be dropped at this stage, and drop and create any future slots
that are used by prior code, for example. At the end of this test, we
recreate the slot using pg_create_logical_replication_slot(), which is
different method of creating slot than this test does. Though I can
understand the reason, it's not apparent. Generally reusing slot names
across multiple tests (in this file) is a source of confusion. But at
least for the test you are adding, you could use a different slot name
to avoid confusion.
+# Create some DDL on the primary so that the slot lags behind the standby
+$primary->safe_psql('postgres', "CREATE TABLE push_wal (a int);");
+
+# Attempt to synchronize slots using API. The API will continue retrying
+# synchronization until the remote slot catches up.
+# The API will not return until this happens, to be able to make
+# further calls, call the API in a background process.
+my $log_offset = -s $standby1->logfile;
+
+my $h = $standby1->background_psql('postgres', on_error_stop => 0);
+
+$h->query_until(qr//, "SELECT pg_sync_replication_slots();\n");

If the standby does not receive the WAL corresponding to the DDL
before this function is executed, the slot will get synchronized
immediately. I think we have to make sure that the standby has
received the DDL before executing this function.

Also most of the code which uses query_until has pattern like this:
$h->query_until(qr/start/, q{\echo start
SQL command});
But we expect an empty string here. Why this difference?

I think we need a similar test to test promotion while the function is
waiting for the slot to become sync-ready.

SyncReplicationSlots() and the main loop in ReplSlotSyncWorkerMain()
are similar with some functional differences. Some part of their code
needs to be kept in sync in future. How do we achieve that? At least
we need a comment saying so in each of those patches and keep those
two codes in proximity.

--
Best Wishes,
Ashutosh Bapat



Re: Improve pg_sync_replication_slots() to wait for primary to advance

От
Ajin Cherian
Дата:
On Mon, Nov 10, 2025 at 8:31 PM shveta malik <shveta.malik@gmail.com> wrote:
>
> On Fri, Nov 7, 2025 at 10:36 AM Japin Li <japinli@hotmail.com> wrote:
> >
> > >
> > > Attaching patch v22 addressing the above comments.
> >
> > @@ -62,8 +62,8 @@ LOGICAL_APPLY_MAIN    "Waiting in main loop of logical replication apply process."
> >  LOGICAL_LAUNCHER_MAIN  "Waiting in main loop of logical replication launcher process."
> >  LOGICAL_PARALLEL_APPLY_MAIN    "Waiting in main loop of logical replication parallel apply process."
> >  RECOVERY_WAL_STREAM    "Waiting in main loop of startup process for WAL to arrive, during streaming recovery."
> > -REPLICATION_SLOTSYNC_MAIN      "Waiting in main loop of slot sync worker."
> >  REPLICATION_SLOTSYNC_SHUTDOWN  "Waiting for slot sync worker to shut down."
> > +REPLICATION_SLOTSYNC_MAIN      "Waiting in main loop of slot synchronization."
> >  SYSLOGGER_MAIN "Waiting in main loop of syslogger process."
> >  WAL_RECEIVER_MAIN      "Waiting in main loop of WAL receiver process."
> >  WAL_SENDER_MAIN        "Waiting in main loop of WAL sender process."
> >
> > I've noticed that all events are sorted alphabetically.  I think we should keep
> > the order of REPLICATION_SLOTSYNC_MAIN unchanged.
> >
>
> +1.
>

Yes, changed.

> Few trivial comments:
>
> 1)
> Since we have always used the term 'SQL function' rather than API in
> existing code, shall we change all references of API to 'SQL function'
> in current patch:
>
> + * If the pg_sync_replication API is used to sync the slots, and if the slots
> "If the SQL function pg_sync_replication_slots() is used.."
>
> + * the reasons mentioned above, then the API also waits and retries until the
> API --> SQL function
>
> + * persist. It is utilized by the pg_sync_replication_slots() API.
>  pg_sync_replication_slots() API --> SQL function pg_sync_replication_slots()
>
> + * the API can retry.
> API --> SQL function
>
> + /* Set this, so that API can retry */
> API --> SQL function
>
> + * persist. It is utilized by the pg_sync_replication_slots() API.
> pg_sync_replication_slots() API --> SQL function pg_sync_replication_slots()
>
> + * slot_persistence_pending - boolean used by pg_sync_replication_slots
> + *   API to track if any slots could not be
> pg_sync_replication_slots  API --> SQL function pg_sync_replication_slots()
>
> + * Interrupt handler for pg_sync_replication_slots() API.
> pg_sync_replication_slots() API --> SQL function pg_sync_replication_slots()
>
>
> 2)
> ProcessSlotSyncAPIInterrupts
> slotsync_api_reread_config
> -- These also have API in it, but I do not have any better name
> suggestions here, we can retain the current ones and see what others
> say.
>

Changed.

> 3)
> /*
>  * Re-read the config file.
>  *
>  * Exit if any of the slot sync GUCs have changed. The postmaster will
>  * restart it.
>  */
> static void
> slotsync_reread_config(void)
>
> Shall we change this existing comment to: Re-read the config file for
> slot sync worker.
>

> 4)
>
> +/*
> + * Re-read the config file and check for critical parameter changes.
> + *
> + */
> +static void
> +slotsync_api_reread_config(void)
>
> Shall we change comment to:
> /*
>  * Re-read the config file for SQL function pg_sync_replication_slots()
>  *
>  * Emit error if any of the slot sync GUCs have changed.
>  */
>
Changed.




On Mon, Nov 10, 2025 at 9:44 PM Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:
>
> On Mon, Nov 10, 2025 at 3:01 PM shveta malik <shveta.malik@gmail.com> wrote:
> >
> >
> >
> > 2)
> > ProcessSlotSyncAPIInterrupts
> > slotsync_api_reread_config
> > -- These also have API in it, but I do not have any better name
> > suggestions here, we can retain the current ones and see what others
> > say.
>
> ProcessSlotSyncInterrupts() handles shutdown waiting,
> ProcessSlotSyncAPIInterrupts doesn't. Why is this difference? It will
> be good to explain why we need two different functions for worker and
> SQL function and also explain the difference between them.
>

I've updated the function header to explain this. The slot sync worker
is a specific background worker while the API runs in the regular
backend, so special handling is not needed.


> $primary->psql('postgres', "SELECT pg_drop_replication_slot('lsub2_slot');");
> +$primary->psql('postgres', "SELECT pg_drop_replication_slot('lsub1_slot');");
>
> I think, the intention behind dropping the slot is to be able to
> create it again for the next test. But there is no comment here
> explaining that. There is also no comment explaining why we are
> dropping both slots here; when the test only needs dropping one.
> That's going to create confusion. One might think that all the slots
> need to be dropped at this stage, and drop and create any future slots
> that are used by prior code, for example. At the end of this test, we
> recreate the slot using pg_create_logical_replication_slot(), which is
> different method of creating slot than this test does. Though I can
> understand the reason, it's not apparent. Generally reusing slot names
> across multiple tests (in this file) is a source of confusion. But at
> least for the test you are adding, you could use a different slot name
> to avoid confusion.

I've added a comment there that dropping both the slots is required
for the next test. Also I cannot change the name of the slot as the
next tests need the same slot synced.

> +# Create some DDL on the primary so that the slot lags behind the standby
> +$primary->safe_psql('postgres', "CREATE TABLE push_wal (a int);");
> +
> +# Attempt to synchronize slots using API. The API will continue retrying
> +# synchronization until the remote slot catches up.
> +# The API will not return until this happens, to be able to make
> +# further calls, call the API in a background process.
> +my $log_offset = -s $standby1->logfile;
> +
> +my $h = $standby1->background_psql('postgres', on_error_stop => 0);
> +
> +$h->query_until(qr//, "SELECT pg_sync_replication_slots();\n");
>
> If the standby does not receive the WAL corresponding to the DDL
> before this function is executed, the slot will get synchronized
> immediately. I think we have to make sure that the standby has
> received the DDL before executing this function.

Yes, I've added a line to make sure that the stanbdy has caught up.

>
> Also most of the code which uses query_until has pattern like this:
> $h->query_until(qr/start/, q{\echo start
> SQL command});
> But we expect an empty string here. Why this difference?
>

I've modified it as suggested.

> I think we need a similar test to test promotion while the function is
> waiting for the slot to become sync-ready.
>

Unfortunately, that will make this test too long if I add one more
wait loop for slot sync.

> SyncReplicationSlots() and the main loop in ReplSlotSyncWorkerMain()
> are similar with some functional differences. Some part of their code
> needs to be kept in sync in future. How do we achieve that? At least
> we need a comment saying so in each of those patches and keep those
> two codes in proximity.
>

I've added a comment in the header of ReplSlotSyncWorkerMain to suggest this.

Attaching patch v23 addressing these comments.

regards,
Ajin Cherian
Fujitsu Australia

Вложения

Re: Improve pg_sync_replication_slots() to wait for primary to advance

От
Amit Kapila
Дата:
On Wed, Nov 12, 2025 at 1:54 PM Ajin Cherian <itsajin@gmail.com> wrote:
>
> Attaching patch v23 addressing these comments.
>

Few comments:
=============
1.
In contrast, automatic synchronization
       via <varname>sync_replication_slots</varname> provides continuous slot
       updates, enabling seamless failover and supporting high availability.
-      Therefore, it is the recommended method for synchronizing slots.

I think a slotsync worker should still be a recommended method. So, we
shouldn't remove the last line.

2. I think we can unify slotsync_api_reread_config() and
ProcessSlotSyncAPIInterrupts() with corresponding existing functions
for slotsync worker. Having separate functions for API and worker to
handle interrupts looks odd and bug-prone w.r.t future changes in this
area.

3.
+/*
+ * Helper function to extract slot names from a list of remote slots
+ */
+static List *
+extract_slot_names(List *remote_slots)
+{
+ List *slot_names = NIL;
+ MemoryContext oldcontext;
+
+ /* Switch to long-lived TopMemoryContext to store slot names */
+ oldcontext = MemoryContextSwitchTo(TopMemoryContext);
+
+ foreach_ptr(RemoteSlot, remote_slot, remote_slots)

Why did we allocate this memory in TopMemoryContext here? We only need
till this function executes, isn't CurrentMemoryContext (which I think
is ExprContext) sufficient, if not, why? If we use some
function/query-level context then we don't need to make it part of
SlotSyncApiFailureParams. If we can get rid of slot_names from
SlotSyncApiFailureParams then we probably don't need struct
SlotSyncApiFailureParams.

--
With Regards,
Amit Kapila.



Re: Improve pg_sync_replication_slots() to wait for primary to advance

От
shveta malik
Дата:
On Wed, Nov 12, 2025 at 1:54 PM Ajin Cherian <itsajin@gmail.com> wrote:
>
>
> Attaching patch v23 addressing these comments.
>

Thanks for the patch.

I observed that if the API is taking a nap in between slot sync cycles
and a promotion is triggered during that time, the promotion has to
wait for the entire nap period to finish before slot-sync stops and
the process can continue. There should be a mechanism to wake up the
backend so the API can exit early once stopSignaled is set. How about
doing SetLatch for the process doing synchronization in
ShutDownSlotSync()?

thanks
Shveta



Re: Improve pg_sync_replication_slots() to wait for primary to advance

От
Ajin Cherian
Дата:
On Wed, Nov 19, 2025 at 5:10 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Wed, Nov 12, 2025 at 1:54 PM Ajin Cherian <itsajin@gmail.com> wrote:
> >
> > Attaching patch v23 addressing these comments.
> >
>
> Few comments:
> =============
> 1.
> In contrast, automatic synchronization
>        via <varname>sync_replication_slots</varname> provides continuous slot
>        updates, enabling seamless failover and supporting high availability.
> -      Therefore, it is the recommended method for synchronizing slots.
>
> I think a slotsync worker should still be a recommended method. So, we
> shouldn't remove the last line.
>

Changed.

> 2. I think we can unify slotsync_api_reread_config() and
> ProcessSlotSyncAPIInterrupts() with corresponding existing functions
> for slotsync worker. Having separate functions for API and worker to
> handle interrupts looks odd and bug-prone w.r.t future changes in this
> area.
>

I’ve refactored and unified slotsync_api_reread_config() and
ProcessSlotSyncAPIInterrupts() with their existing counterparts. As
part of this, I switched the shutdown signal for slot-syncing workers
and backends from SIGINT to SIGUSR1, so that all slot-synchronization
processes use a consistent signaling model. Background workers handle
SIGINT via StatementCancelHandler, which is not suitable for
coordinated shutdowns, but SIGUSR1 reliably sets the process latch and
wakes up both worker types. Once awakened, these processes invoke
ProcessSlotSyncInterrupts(), which now checks
SlotSyncCtx->stopSignaled to perform a clean shutdown with appropriate
logs.

> 3.
> +/*
> + * Helper function to extract slot names from a list of remote slots
> + */
> +static List *
> +extract_slot_names(List *remote_slots)
> +{
> + List *slot_names = NIL;
> + MemoryContext oldcontext;
> +
> + /* Switch to long-lived TopMemoryContext to store slot names */
> + oldcontext = MemoryContextSwitchTo(TopMemoryContext);
> +
> + foreach_ptr(RemoteSlot, remote_slot, remote_slots)
>
> Why did we allocate this memory in TopMemoryContext here? We only need
> till this function executes, isn't CurrentMemoryContext (which I think
> is ExprContext) sufficient, if not, why? If we use some
> function/query-level context then we don't need to make it part of
> SlotSyncApiFailureParams. If we can get rid of slot_names from
> SlotSyncApiFailureParams then we probably don't need struct
> SlotSyncApiFailureParams.
>

In further testing, I've found that a Transaction is always started
when pg_sync_replication_slots() is called, and there was no need to
start new transactions and there was no need for a seperate memory
context for remote_slots. I think the confusion was probably from an
earlier bug in my code. I have added an assert to make sure that
transactions are started when in the API (should I make it an error
instead?).

On Wed, Nov 19, 2025 at 6:05 PM shveta malik <shveta.malik@gmail.com> wrote:
>
> On Wed, Nov 12, 2025 at 1:54 PM Ajin Cherian <itsajin@gmail.com> wrote:
> >
> >
> > Attaching patch v23 addressing these comments.
> >
>
> Thanks for the patch.
>
> I observed that if the API is taking a nap in between slot sync cycles
> and a promotion is triggered during that time, the promotion has to
> wait for the entire nap period to finish before slot-sync stops and
> the process can continue. There should be a mechanism to wake up the
> backend so the API can exit early once stopSignaled is set. How about
> doing SetLatch for the process doing synchronization in
> ShutDownSlotSync()?
>

Yes. To address this, I now also store the background worker pid which
is calling pg_sync_replication_slots() in SlotSyncCtx->pid, and I've
modified the ShutDownSlotSync logic to issue a SIGUSR1 which will
SetLatch on SlotSyncCtx->pid.

Attaching patch v24, addressing the above comments.

regards,
Ajin Cherian
Fujitsu Australia

Вложения

Re: Improve pg_sync_replication_slots() to wait for primary to advance

От
shveta malik
Дата:
On Fri, Nov 21, 2025 at 9:14 AM Ajin Cherian <itsajin@gmail.com> wrote:
>
>
> Attaching patch v24, addressing the above comments.
>

Thanks for the patch. Please find a few comments:


1)
Instead of passing an argument to slotsync_reread_config and
ProcessSlotSyncInterrupts, we can use 'AmLogicalSlotSyncWorkerProcess'
to distinguish the worker and API.

2)
Also, since we are not using a separate memory context, we don't need
to make structure 'SlotSyncApiFailureParams' for slot_names failure.
slot_names will be freed with the memory-context itself when
exec_simple_query finishes.

3)
- if (old_sync_replication_slots != sync_replication_slots)
+ /* Worker-specific check for sync_replication_slots change */
+ if (!from_api && old_sync_replication_slots != sync_replication_slots)
  {
  ereport(LOG,
- /* translator: %s is a GUC variable name */
- errmsg("replication slot synchronization worker will shut down
because \"%s\" is disabled", "sync_replication_slots"));
+ /* translator: %s is a GUC variable name */
+ errmsg("replication slot synchronization worker will shut down
because \"%s\" is disabled",
+    "sync_replication_slots"));
  proc_exit(0);
  }

Here, we need not to have different flow for api and worker. Both can
quit sync when this parameter is changed. The idea is if someone
enables 'sync_replication_slots' when API is working, that means we
need to start slot-sync worker, so it is okay if the API notices this
and exits too.

4)
+ if (from_api)
+ elevel = ERROR;
+ else
+ elevel = LOG;

- proc_exit(0);
+ ereport(elevel,
+ errmsg("replication slot synchronization will stop because of a
parameter change"));
+

We can do:
ereport(AmLogicalSlotSyncWorkerProcess() ? LOG : ERROR, ...);

5)
 SlotSyncCtx->syncing = true;

+ /* The worker pid must not be already assigned in SlotSyncCtx */
+ Assert(SlotSyncCtx->pid == InvalidPid);
+

We can shift Assert before we set the shared-memory flag 'SlotSyncCtx->syncing'.

thanks
Shveta



Re: Improve pg_sync_replication_slots() to wait for primary to advance

От
Ajin Cherian
Дата:
On Fri, Nov 21, 2025 at 8:10 PM shveta malik <shveta.malik@gmail.com> wrote:
>
> On Fri, Nov 21, 2025 at 9:14 AM Ajin Cherian <itsajin@gmail.com> wrote:
> >
> >
> > Attaching patch v24, addressing the above comments.
> >
>
> Thanks for the patch. Please find a few comments:
>
>
> 1)
> Instead of passing an argument to slotsync_reread_config and
> ProcessSlotSyncInterrupts, we can use 'AmLogicalSlotSyncWorkerProcess'
> to distinguish the worker and API.
>

Changed as such.

> 2)
> Also, since we are not using a separate memory context, we don't need
> to make structure 'SlotSyncApiFailureParams' for slot_names failure.
> slot_names will be freed with the memory-context itself when
> exec_simple_query finishes.
>

Removed.

> 3)
> - if (old_sync_replication_slots != sync_replication_slots)
> + /* Worker-specific check for sync_replication_slots change */
> + if (!from_api && old_sync_replication_slots != sync_replication_slots)
>   {
>   ereport(LOG,
> - /* translator: %s is a GUC variable name */
> - errmsg("replication slot synchronization worker will shut down
> because \"%s\" is disabled", "sync_replication_slots"));
> + /* translator: %s is a GUC variable name */
> + errmsg("replication slot synchronization worker will shut down
> because \"%s\" is disabled",
> +    "sync_replication_slots"));
>   proc_exit(0);
>   }
>
> Here, we need not to have different flow for api and worker. Both can
> quit sync when this parameter is changed. The idea is if someone
> enables 'sync_replication_slots' when API is working, that means we
> need to start slot-sync worker, so it is okay if the API notices this
> and exits too.
>

changed but used a different error message.

> 4)
> + if (from_api)
> + elevel = ERROR;
> + else
> + elevel = LOG;
>
> - proc_exit(0);
> + ereport(elevel,
> + errmsg("replication slot synchronization will stop because of a
> parameter change"));
> +
>
> We can do:
> ereport(AmLogicalSlotSyncWorkerProcess() ? LOG : ERROR, ...);
>

changed as such.

> 5)
>  SlotSyncCtx->syncing = true;
>
> + /* The worker pid must not be already assigned in SlotSyncCtx */
> + Assert(SlotSyncCtx->pid == InvalidPid);
> +
>
> We can shift Assert before we set the shared-memory flag 'SlotSyncCtx->syncing'.
>

Done.

Attaching patch v25 addressing the above comments.

regards,
Ajin Cherian
Fujitsu Australia

Вложения

Re: Improve pg_sync_replication_slots() to wait for primary to advance

От
shveta malik
Дата:
On Mon, Nov 24, 2025 at 12:12 PM Ajin Cherian <itsajin@gmail.com> wrote:
>
> Attaching patch v25 addressing the above comments.
>

The patch needs rebase due to recent commit 76b78721.

thanks
Shveta



Re: Improve pg_sync_replication_slots() to wait for primary to advance

От
shveta malik
Дата:
 A few comments:

1)
+/*
+ * Structure holding parameters that need to be freed on error in
+ * pg_sync_replication_slots()
+ */
+typedef struct SlotSyncApiFailureParams
+{
+ WalReceiverConn *wrconn;
+ List *slot_names;
+} SlotSyncApiFailureParams;
+

We can get rid of it now as we do not use it.

2)
ProcessSlotSyncInterrupts():

+ if (!AmLogicalSlotSyncWorkerProcess())
+ ereport(ERROR,
+ errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ errmsg("cannot continue replication slots synchronization"
+    " as standby promotion is triggered"));
+ else
+ {

Can we please reverse the if-else i.e. first worker and then API.
Negated if-condition can be avoided in this case.

3)

slotsync_reread_config():
+ /* Worker-specific check for sync_replication_slots change */

Now since we check for both API and worker, this comment is not needed.

4)
- ereport(LOG,
- errmsg("replication slot synchronization worker will restart because
of a parameter change"));

- /*
- * Reset the last-start time for this worker so that the postmaster
- * can restart it without waiting for SLOTSYNC_RESTART_INTERVAL_SEC.
- */
- SlotSyncCtx->last_start_time = 0;
+ ereport(AmLogicalSlotSyncWorkerProcess() ? LOG : ERROR,
+ errmsg("replication slot synchronization will stop because of a
parameter change"));

Here, we should retain the same old message for worker i.e. 'worker
will restart..'. instead of 'synchronization will stop'. I find the
old message better in this case.

5)
slotsync_reread_config() is slightly difficult to follow.
I think in the case of API, we can display a common error message
instead of 2 different messages for 'sync_replication_slot change' and
the rest of the parameters. We can mark if any of the parameters
changed in both 'if' blocks and if the current process has not exited,
then at the end based on 'parameter-changed', we can deal with API by
giving a common message. Something like:

/*
 * If we have reached here with a parameter change, we must be running
 * in SQL function, emit error in such a case.
 */
if (parameter_changed (new variable))
{
Assert (!AmLogicalSlotSyncWorkerProcess);
ereport(ERROR,
errmsg("replication slot synchronization will stop because of a
parameter change"));
}

thanks
Shveta



Re: Improve pg_sync_replication_slots() to wait for primary to advance

От
Ajin Cherian
Дата:
On Wed, Nov 26, 2025 at 7:42 PM shveta malik <shveta.malik@gmail.com> wrote:
>
>  A few comments:
>
> 1)
> +/*
> + * Structure holding parameters that need to be freed on error in
> + * pg_sync_replication_slots()
> + */
> +typedef struct SlotSyncApiFailureParams
> +{
> + WalReceiverConn *wrconn;
> + List *slot_names;
> +} SlotSyncApiFailureParams;
> +
>
> We can get rid of it now as we do not use it.
>

Removed.

> 2)
> ProcessSlotSyncInterrupts():
>
> + if (!AmLogicalSlotSyncWorkerProcess())
> + ereport(ERROR,
> + errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
> + errmsg("cannot continue replication slots synchronization"
> +    " as standby promotion is triggered"));
> + else
> + {
>
> Can we please reverse the if-else i.e. first worker and then API.
> Negated if-condition can be avoided in this case.
>

Changed.

> 3)
>
> slotsync_reread_config():
> + /* Worker-specific check for sync_replication_slots change */
>
> Now since we check for both API and worker, this comment is not needed.
>

Removed.

> 4)
> - ereport(LOG,
> - errmsg("replication slot synchronization worker will restart because
> of a parameter change"));
>
> - /*
> - * Reset the last-start time for this worker so that the postmaster
> - * can restart it without waiting for SLOTSYNC_RESTART_INTERVAL_SEC.
> - */
> - SlotSyncCtx->last_start_time = 0;
> + ereport(AmLogicalSlotSyncWorkerProcess() ? LOG : ERROR,
> + errmsg("replication slot synchronization will stop because of a
> parameter change"));
>
> Here, we should retain the same old message for worker i.e. 'worker
> will restart..'. instead of 'synchronization will stop'. I find the
> old message better in this case.
>
> 5)
> slotsync_reread_config() is slightly difficult to follow.
> I think in the case of API, we can display a common error message
> instead of 2 different messages for 'sync_replication_slot change' and
> the rest of the parameters. We can mark if any of the parameters
> changed in both 'if' blocks and if the current process has not exited,
> then at the end based on 'parameter-changed', we can deal with API by
> giving a common message. Something like:
>
> /*
>  * If we have reached here with a parameter change, we must be running
>  * in SQL function, emit error in such a case.
>  */
> if (parameter_changed (new variable))
> {
> Assert (!AmLogicalSlotSyncWorkerProcess);
> ereport(ERROR,
> errmsg("replication slot synchronization will stop because of a
> parameter change"));
> }
>

Fixed as above.

I've addressed the above comments as well as rebased the patch based
on changes in commit 76b7872 in patch v26

regards,
Ajin Cherian
Fujitsu Australia

Вложения

Re: Improve pg_sync_replication_slots() to wait for primary to advance

От
shveta malik
Дата:
On Fri, Nov 28, 2025 at 10:16 AM Ajin Cherian <itsajin@gmail.com> wrote:
>
>
> Fixed as above.
>
> I've addressed the above comments as well as rebased the patch based
> on changes in commit 76b7872 in patch v26
>

Thanks for the patch. Please find a few trivial comments:

1)
+ if (AmLogicalSlotSyncWorkerProcess())
+ Assert(sync_replication_slots);

Here too we can use 'worker'.

2)
+ /* check for sync_replication_slots change */

check --> Check

3)
Assert (!worker)
Extra space in between.

4)
check_and_set_sync_info() and ShutDownSlotSync() refers to the pid as
worker_pid. But now it could be backend-pid as well.
Using 'worker' in this variable could be misleading. Shall we make it
sync_process_pid?

5)
/*
 * Interrupt handler for main loop of slot sync worker.
 */
static void
ProcessSlotSyncInterrupts()

We can modify the comment to include API as well.

6)
/*
 * Shut down the slot sync worker.
 *
 * This function sends signal to shutdown slot sync worker, if required. It
 * also waits till the slot sync worker has exited or
 * pg_sync_replication_slots() has finished.
 */
void
ShutDownSlotSync(void)

We should change comments to give details on API as well.

7)
+# Remove the standby from the synchronized_standby_slots list and reload the
+# configuration.
+$primary->adjust_conf('postgresql.conf', 'synchronized_standby_slots', "''");
+$primary->reload;

We can update the comment to below for better clarity:
Remove the dropped sb1_slot from the ...

8)
+# Attempt to synchronize slots using API. The API will continue retrying
+# synchronization until the remote slot catches up.
+# The API will not return until this happens, to be able to make
+# further calls, call the API in a background process.

We can move these comments atop:
my $h = $standby2->background_psql('postgres', on_error_stop => 0);

thanks
Shveta



Re: Improve pg_sync_replication_slots() to wait for primary to advance

От
Ajin Cherian
Дата:
On Fri, Nov 28, 2025 at 5:03 PM Japin Li <japinli@hotmail.com> wrote:
>
> 1.
> Initialize slot_persistence_pending to false (to avoid uninitialized values, or
> initialize to true by mistaken) in update_and_persist_local_synced_slot(). This
> aligns with the handling of found_consistent_snapshot and remote_slot_precedes
> in update_local_synced_slot().
>
> diff --git a/src/backend/replication/logical/slotsync.c b/src/backend/replication/logical/slotsync.c
> index 20eada3393..c55ba11f17 100644
> --- a/src/backend/replication/logical/slotsync.c
> +++ b/src/backend/replication/logical/slotsync.c
> @@ -617,6 +617,9 @@ update_and_persist_local_synced_slot(RemoteSlot *remote_slot, Oid remote_dbid,
>         bool            found_consistent_snapshot = false;
>         bool            remote_slot_precedes = false;
>
> +       if (slot_persistence_pending)
> +               *slot_persistence_pending = false;
> +
>         /* Slotsync skip stats are handled in function update_local_synced_slot() */
>         (void) update_local_synced_slot(remote_slot, remote_dbid,
>                                                                         &found_consistent_snapshot,
>

I don't understand what the comment is here.

> 2.
> This change seems unnecessary。
>
>  static void
> -slotsync_reread_config(void)
> +slotsync_reread_config()
>
>  static void
> -ProcessSlotSyncInterrupts(void)
> +ProcessSlotSyncInterrupts()
>

Fixed.

> 3.
> Since we are already caching the result of AmLogicalSlotSyncWorkerProcess() in
> a local worker variable, how about applying this replacement:
> s/if (AmLogicalSlotSyncWorkerProcess())/if (worker)/g?
>
> +       bool            worker = AmLogicalSlotSyncWorkerProcess();
> +       bool            parameter_changed = false;
>
> -       Assert(sync_replication_slots);
> +       if (AmLogicalSlotSyncWorkerProcess())
> +               Assert(sync_replication_slots);
>

Fixed.



On Fri, Nov 28, 2025 at 5:25 PM shveta malik <shveta.malik@gmail.com> wrote:
>
> Thanks for the patch. Please find a few trivial comments:
>
> 1)
> + if (AmLogicalSlotSyncWorkerProcess())
> + Assert(sync_replication_slots);
>
> Here too we can use 'worker'.
>

Fixed.

> 2)
> + /* check for sync_replication_slots change */
>
> check --> Check
>

Fixed.

> 3)
> Assert (!worker)
> Extra space in between.
>

Fixed

> 4)
> check_and_set_sync_info() and ShutDownSlotSync() refers to the pid as
> worker_pid. But now it could be backend-pid as well.
> Using 'worker' in this variable could be misleading. Shall we make it
> sync_process_pid?
>

Changed.

> 5)
> /*
>  * Interrupt handler for main loop of slot sync worker.
>  */
> static void
> ProcessSlotSyncInterrupts()
>
> We can modify the comment to include API as well.

Changed.

>
> 6)
> /*
>  * Shut down the slot sync worker.
>  *
>  * This function sends signal to shutdown slot sync worker, if required. It
>  * also waits till the slot sync worker has exited or
>  * pg_sync_replication_slots() has finished.
>  */
> void
> ShutDownSlotSync(void)
>
> We should change comments to give details on API as well.
>

changed.

> 7)
> +# Remove the standby from the synchronized_standby_slots list and reload the
> +# configuration.
> +$primary->adjust_conf('postgresql.conf', 'synchronized_standby_slots', "''");
> +$primary->reload;
>
> We can update the comment to below for better clarity:
> Remove the dropped sb1_slot from the ...
>

Changed.

> 8)
> +# Attempt to synchronize slots using API. The API will continue retrying
> +# synchronization until the remote slot catches up.
> +# The API will not return until this happens, to be able to make
> +# further calls, call the API in a background process.
>
> We can move these comments atop:
> my $h = $standby2->background_psql('postgres', on_error_stop => 0);
>

Changed.

Attached patch v27 addresses the above comments.

regards,
Ajin Cherian
Fujitsu Australia

Вложения

Re: Improve pg_sync_replication_slots() to wait for primary to advance

От
shveta malik
Дата:
On Tue, Dec 2, 2025 at 1:58 PM Ajin Cherian <itsajin@gmail.com> wrote:
>
>
> Attached patch v27 addresses the above comments.
>

Thanks for the patch. Please find a few comments:

1)
+ /* The worker pid must not be already assigned in SlotSyncCtx */
+ Assert(SlotSyncCtx->pid == InvalidPid);
+

We can mention just 'pid' here instead of 'worker pid'

2)
+ /*
+ * The syscache access in fetch_or_refresh_remote_slots() needs a
+ * transaction env.
+ */
fetch_or_refresh_remote_slots --> fetch_remote_slots

3)
 SyncReplicationSlots(WalReceiverConn *wrconn)
 {
+

We can get rid of this blank line at the start of the function.

4)
 /*
  * Shut down the slot sync worker.
  *
- * This function sends signal to shutdown slot sync worker, if required. It
+ * This function sends signal to shutdown the slot sync process, if
required. It
  * also waits till the slot sync worker has exited or
  * pg_sync_replication_slots() has finished.
  */
Shall we change comment to something like (rephrase if required):

Shut down the slot synchronization.
This function wakes up the slot sync process (either worker or backend
running SQL function) and sets stopSignaled=true
so that worker can exit or SQL function pg_sync_replication_slots()
can finish. It also waits till the slot sync worker has exited or
pg_sync_replication_slots() has finished.

5)
We should change the comment atop 'SlotSyncCtxStruct' as well to
mention that this pid is either the slot sync worker's pid or
backend's pid running the SQL function. It is needed by the startup
process to wake these up, so that they can stop synchronization on
seeing stopSignaled.  <please rephrase as needed>

6)
+ ereport(LOG,
+ errmsg("replication slot synchronization worker is shutting down on
receiving SIGUSR1"));

SIGUSR1 was actually just a wake-up signal. We may change the comment to:
replication slot synchronization worker is shutting down as promotion
is triggered.

7)
update_synced_slots_inactive_since:
/* The slot sync worker or SQL function mustn't be running by now */
- Assert((SlotSyncCtx->pid == InvalidPid) && !SlotSyncCtx->syncing);
+ Assert(!SlotSyncCtx->syncing);

Regarding this, I see that 'update_synced_slots_inactive_since' is
only called when we are sure that 'syncing' is false. So shouldn't pid
be also Invalid by that time? Even if it was backend's pid to start
with, but since backend has stopped syncing (finished or error-ed
out),
pid should be reset to Invalid in such a case. And this Assert need
not to be changed.

8)

+ if (sync_process_pid != InvalidPid)
+ kill(sync_process_pid, SIGUSR1);

We can write comments to say wake-up slot sync process.

thanks
Shveta



Re: Improve pg_sync_replication_slots() to wait for primary to advance

От
Ajin Cherian
Дата:
On Tue, Dec 2, 2025 at 8:35 PM shveta malik <shveta.malik@gmail.com> wrote:
>
> On Tue, Dec 2, 2025 at 1:58 PM Ajin Cherian <itsajin@gmail.com> wrote:
> >
> >
> > Attached patch v27 addresses the above comments.
> >
>
> Thanks for the patch. Please find a few comments:
>
> 1)
> + /* The worker pid must not be already assigned in SlotSyncCtx */
> + Assert(SlotSyncCtx->pid == InvalidPid);
> +
>
> We can mention just 'pid' here instead of 'worker pid'
>

Changed.

> 2)
> + /*
> + * The syscache access in fetch_or_refresh_remote_slots() needs a
> + * transaction env.
> + */
> fetch_or_refresh_remote_slots --> fetch_remote_slots
>
> 3)
>  SyncReplicationSlots(WalReceiverConn *wrconn)
>  {
> +
>
> We can get rid of this blank line at the start of the function.
>

Fixed.

> 4)
>  /*
>   * Shut down the slot sync worker.
>   *
> - * This function sends signal to shutdown slot sync worker, if required. It
> + * This function sends signal to shutdown the slot sync process, if
> required. It
>   * also waits till the slot sync worker has exited or
>   * pg_sync_replication_slots() has finished.
>   */
> Shall we change comment to something like (rephrase if required):
>
> Shut down the slot synchronization.
> This function wakes up the slot sync process (either worker or backend
> running SQL function) and sets stopSignaled=true
> so that worker can exit or SQL function pg_sync_replication_slots()
> can finish. It also waits till the slot sync worker has exited or
> pg_sync_replication_slots() has finished.
>

Changed.

> 5)
> We should change the comment atop 'SlotSyncCtxStruct' as well to
> mention that this pid is either the slot sync worker's pid or
> backend's pid running the SQL function. It is needed by the startup
> process to wake these up, so that they can stop synchronization on
> seeing stopSignaled.  <please rephrase as needed>
>

Changed.

> 6)
> + ereport(LOG,
> + errmsg("replication slot synchronization worker is shutting down on
> receiving SIGUSR1"));
>
> SIGUSR1 was actually just a wake-up signal. We may change the comment to:
> replication slot synchronization worker is shutting down as promotion
> is triggered.
>

Changed.

> 7)
> update_synced_slots_inactive_since:
> /* The slot sync worker or SQL function mustn't be running by now */
> - Assert((SlotSyncCtx->pid == InvalidPid) && !SlotSyncCtx->syncing);
> + Assert(!SlotSyncCtx->syncing);
>
> Regarding this, I see that 'update_synced_slots_inactive_since' is
> only called when we are sure that 'syncing' is false. So shouldn't pid
> be also Invalid by that time? Even if it was backend's pid to start
> with, but since backend has stopped syncing (finished or error-ed
> out),
> pid should be reset to Invalid in such a case. And this Assert need
> not to be changed.
>

Fixed.

> 8)
>
> + if (sync_process_pid != InvalidPid)
> + kill(sync_process_pid, SIGUSR1);
>
> We can write comments to say wake-up slot sync process.
>

Added comments.

Attaching patch v28 addressing these comments.

regards,
Ajin Cherian
Fujitsu Australia

Вложения

Re: Improve pg_sync_replication_slots() to wait for primary to advance

От
shveta malik
Дата:
On Wed, Dec 3, 2025 at 8:51 AM Ajin Cherian <itsajin@gmail.com> wrote:
>
> On Tue, Dec 2, 2025 at 8:35 PM shveta malik <shveta.malik@gmail.com> wrote:
> >
> > On Tue, Dec 2, 2025 at 1:58 PM Ajin Cherian <itsajin@gmail.com> wrote:
> > >
> > >
> > > Attached patch v27 addresses the above comments.
> > >
> >
> > Thanks for the patch. Please find a few comments:
> >
> > 1)
> > + /* The worker pid must not be already assigned in SlotSyncCtx */
> > + Assert(SlotSyncCtx->pid == InvalidPid);
> > +
> >
> > We can mention just 'pid' here instead of 'worker pid'
> >
>
> Changed.
>
> > 2)
> > + /*
> > + * The syscache access in fetch_or_refresh_remote_slots() needs a
> > + * transaction env.
> > + */
> > fetch_or_refresh_remote_slots --> fetch_remote_slots
> >
> > 3)
> >  SyncReplicationSlots(WalReceiverConn *wrconn)
> >  {
> > +
> >
> > We can get rid of this blank line at the start of the function.
> >
>
> Fixed.
>
> > 4)
> >  /*
> >   * Shut down the slot sync worker.
> >   *
> > - * This function sends signal to shutdown slot sync worker, if required. It
> > + * This function sends signal to shutdown the slot sync process, if
> > required. It
> >   * also waits till the slot sync worker has exited or
> >   * pg_sync_replication_slots() has finished.
> >   */
> > Shall we change comment to something like (rephrase if required):
> >
> > Shut down the slot synchronization.
> > This function wakes up the slot sync process (either worker or backend
> > running SQL function) and sets stopSignaled=true
> > so that worker can exit or SQL function pg_sync_replication_slots()
> > can finish. It also waits till the slot sync worker has exited or
> > pg_sync_replication_slots() has finished.
> >
>
> Changed.
>
> > 5)
> > We should change the comment atop 'SlotSyncCtxStruct' as well to
> > mention that this pid is either the slot sync worker's pid or
> > backend's pid running the SQL function. It is needed by the startup
> > process to wake these up, so that they can stop synchronization on
> > seeing stopSignaled.  <please rephrase as needed>
> >
>
> Changed.
>
> > 6)
> > + ereport(LOG,
> > + errmsg("replication slot synchronization worker is shutting down on
> > receiving SIGUSR1"));
> >
> > SIGUSR1 was actually just a wake-up signal. We may change the comment to:
> > replication slot synchronization worker is shutting down as promotion
> > is triggered.
> >
>
> Changed.
>
> > 7)
> > update_synced_slots_inactive_since:
> > /* The slot sync worker or SQL function mustn't be running by now */
> > - Assert((SlotSyncCtx->pid == InvalidPid) && !SlotSyncCtx->syncing);
> > + Assert(!SlotSyncCtx->syncing);
> >
> > Regarding this, I see that 'update_synced_slots_inactive_since' is
> > only called when we are sure that 'syncing' is false. So shouldn't pid
> > be also Invalid by that time? Even if it was backend's pid to start
> > with, but since backend has stopped syncing (finished or error-ed
> > out),
> > pid should be reset to Invalid in such a case. And this Assert need
> > not to be changed.
> >
>
> Fixed.
>
> > 8)
> >
> > + if (sync_process_pid != InvalidPid)
> > + kill(sync_process_pid, SIGUSR1);
> >
> > We can write comments to say wake-up slot sync process.
> >
>
> Added comments.
>
> Attaching patch v28 addressing these comments.
>

Thanks for the patch. A few trivial comments:

1)
+ ereport(ERROR,
+ errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ errmsg("cannot continue replication slots synchronization"
+    " as standby promotion is triggered"));

slots->slot, since in every error-message we use 'replication slot
synchronization'

2)
+ * The pid is either the slot sync worker's pid or the backend's pid running
+ * the SQL function pg_sync_replication_slots(). It is needed by the startup
+ * process to wake these up, so that they can stop synchronization on seeing
+ * stopSignaled on promotion.
+ * Setting stopSignaled is also used to handle the race condition when the

Can we rephrase slightly to indicate clearly that it is the startup
process which sets 'stopSignaled' during promotion. Suggestion:

The pid is either the slot sync worker’s pid or the backend’s pid running
the SQL function pg_sync_replication_slots(). When the startup process
sets stopSignaled during promotion, it uses this pid to wake the
currently synchronizing process so that the process can immediately stop its
synchronization work upon seeing stopSignaled set to true.
Setting stopSignaled....

thanks
Shveta



Re: Improve pg_sync_replication_slots() to wait for primary to advance

От
Amit Kapila
Дата:
On Wed, Dec 3, 2025 at 8:51 AM Ajin Cherian <itsajin@gmail.com> wrote:
>
> Attaching patch v28 addressing these comments.
>

Can we extract the part of the patch that handles SIGUSR1 signal
separately as a first patch and the remaining as a second patch?
Please do mention the reason in the commit message as to why we are
changing the signal for SIGINT to SIGUSR1.

--
With Regards,
Amit Kapila.