Re: Logical Replication of sequences

Поиск
Список
Период
Сортировка
От vignesh C
Тема Re: Logical Replication of sequences
Дата
Msg-id CALDaNm0QCp8ZP2n5rn3c4uB6YSpGQvWAxRTF9JYqk7hFjD9yPw@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Logical Replication of sequences  (shveta malik <shveta.malik@gmail.com>)
Список pgsql-hackers
On Fri, 11 Jul 2025 at 14:26, shveta malik <shveta.malik@gmail.com> wrote:
>
> On Wed, Jul 9, 2025 at 4:11 PM vignesh C <vignesh21@gmail.com> wrote:
>
> >
> > > 3)
> > > SyncFetchRelationStates:
> > > Earlier the name was FetchTableStates. If we really want to  use the
> > > 'Sync' keyword, we can name it FetchRelationSyncStates, as we are
> > > fetching sync-status only. Thoughts?
> >
> > Instead of FetchRelationSyncStates, I preferred FetchRelationStates,
> > and changed it to FetchRelationStates.
> >
>
> Okay, LGTM.
>
> > > 5)
> > > + if (!MyLogicalRepWorker->sequencesync_failure_time ||
> > > + TimestampDifferenceExceeds(MyLogicalRepWorker->sequencesync_failure_time,
> > > +    now, wal_retrieve_retry_interval))
> > > + {
> > > + MyLogicalRepWorker->sequencesync_failure_time = 0;
> > > +
> > > + logicalrep_worker_launch(WORKERTYPE_SEQUENCESYNC,
> > > + MyLogicalRepWorker->dbid,
> > > + MySubscription->oid,
> > > + MySubscription->name,
> > > + MyLogicalRepWorker->userid,
> > > + InvalidOid,
> > > + DSM_HANDLE_INVALID);
> > > + break;
> > > + }
> > >
> > > We set sequencesync_failure_time to 0, but if logicalrep_worker_launch
> > > is not able to launch the worker due to some reason, next time it will
> > > not even wait for 'wal_retrieve_retry_interval time' to attempt
> > > restarting it again. Is that intentional?
> > >
> > > In other workflows such as while launching table-sync or apply worker,
> > >  this scenario does not arise. This is because we maintain start_time
> > > there which can never be 0 instead of failure time and before
> > > attempting to start the workers, we set start_time to current time.
> > > The seq-sync failure-time OTOH is only set to non-null in
> > > logicalrep_seqsyncworker_failure() and it is not necessary that we
> > > will hit that function as the  logicalrep_worker_launch() may fail
> > > before that itself. Do you think we shall maintain start-time instead
> > > of failure-time for seq-sync worker as well? Or is there any other way
> > > to handle it?
> >
> > I preferred the suggestion from [1]. Modified it accordingly.
>
> Okay, works for me.
>
> > The attached v20250709 version patch has the changes for the same.
> >
>
> Thank You for the patches. Please find a few comments:
>
> 1)
> Shall we update pg_publication doc as well to indicate that pubinsert,
> pubupdate, pubdelete , pubtruncate, pubviaroot are meaningful only
> when publishing tables. For sequences, these have no meaning.

Since it is clearly mentioned it is for tables, I felt no need to
mention again it is not applicable for sequences.

> 2)
> Shall we have walrcv_disconnect() after copy is done in
> LogicalRepSyncSequences()

There is a cleanup function registered for the worker to handle this
at the worker exit. So this is not required.

> 3)
> Do we really need for-loop in ProcessSyncingSequencesForApply? I think
> this function is inspired from ProcessSyncingTablesForApply() but
> there we need different tablesync workers for different tables. For
> sequences, that is not the case and thus for-loop can be omitted. If
> we do so, we can amend the comments too where it says " Walk over all
> subscription sequences....."

Handled in the previous version

> 4)
> +# Confirm that the warning for parameters differing is logged.
> +$node_subscriber->wait_for_log(
>
> We can drop regress_seq_sub on the publisher now and check for missing
> warnings as the next step.

Modified

> 5)
> I am revisiting the test given in [1], I see there is some document change as:
>
> +     Incremental sequence changes are not replicated.  Although the data in
> +     serial or identity columns backed by sequences will be replicated as part
> +     of the table, the sequences themselves do not replicate ongoing changes.
> +     On the subscriber, a sequence will retain the last value it synchronized
> +     from the publisher. If the subscriber is used as a read-only database,
> +     then this should typically not be a problem.  If, however, some kind of
> +     switchover or failover to the subscriber database is intended, then the
> +     sequences would need to be updated to the latest values, either by
> +     executing <link
> linkend="sql-altersubscription-params-refresh-publication-sequences">
> +     <command>ALTER SUBSCRIPTION ... REFRESH PUBLICATION
> SEQUENCES</command></link>
> +     or by copying the current data from the publisher (perhaps using
> +     <command>pg_dump</command>) or by determining a sufficiently high value
> +     from the tables themselves.
>
> But this doc specifically mentions a failover case. It does not
> mention the case presented in [1] i.e. if user is trying to use
> sequence to populate identity column of a "subscribed" table where the
> sequence is also synced originally from publisher, then he may end up
> with corrupted state
> of IDENTITY column, and thus such cases should be used with caution.
> Please review once and see if we need to mention this and the example
> too.

In this case, the identity column data—as well as the non-identity
columns—will be sent by the publisher as part of the row data. This
behavior appears consistent with how non-sequence objects are handled
in a publication.
The following documentation note should be sufficient, as it already
clarifies that "it will retain the last value it synchronized from the
publisher":
On the subscriber, a sequence will retain the last value it
synchronized from the publisher. If the subscriber is used as a
read-only database, this should typically not pose a problem.
But if you have something in mind which should be added let me know.

The attached patch has the changes for the same.

Regards,
Vignesh

Вложения

В списке pgsql-hackers по дате отправления: