Re: Logical Replication of sequences

Поиск
Список
Период
Сортировка
От	vignesh C
Тема	Re: Logical Replication of sequences
Дата	21 августа 19:38:42
Msg-id	CALDaNm0KvDnXENHy6F7=HXyprSpgTKN3DPdDLZ+fzk+=0yOw-Q@mail.gmail.com обсуждение исходный текст
Ответ на	Re: Logical Replication of sequences (shveta malik <shveta.malik@gmail.com>)
Список	pgsql-hackers
Дерево обсуждения
On Thu, 21 Aug 2025 at 11:49, shveta malik <shveta.malik@gmail.com> wrote:
>
> On Wed, Aug 20, 2025 at 2:25 PM vignesh C <vignesh21@gmail.com> wrote:
> >
> > On Tue, 19 Aug 2025 at 23:33, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > >
> > > I imagined something like case 2. For logical replication of tables,
> > > if we support DDL replication (i.e., CREATE/ALTER/DROP TABLE), all
> > > changes the apply worker executes are serialized in commit LSN order.
> > > Therefore, users would not have to be concerned about schema changes
> > > that happened to the publisher. On the other hand, for sequence
> > > replication, even if we support DDL replication for sequences (i.e.,
> > > CREATE/ALTER/DROP SEQUENCES), users would have to execute REFRESH
> > > PUBLICATION SEQUENCES command after "ALTER SEQUENCE s1 MAXVALUE 12;"
> > > has been replicated on the subscriber. Otherwise, REFRESH PUBLICATION
> > > SEQUENCE command would fail because the sequence parameters no longer
> > > match.
> >
> > I am summarizing the challenges identified so far (assuming we have
> > DDL replication implemented through WAL support)
> > 1) Lack of sequence-synchronization resulting in DDL replication
> > failure/conflict.
> > On the subscriber, the sequence has advanced to 14:
> > SELECT currval('s1');
> > currval
> > ---------
> >      14
> >
> > On the publisher, the sequence is reset to 11 and MAXVALUE is changed to 12:
> > SELECT setval('s1', 11);
> > ALTER SEQUENCE s1 MAXVALUE 12;
> > If the subscriber did not execute REFRESH PUBLICATION SEQUENCES, DDL
> > replication will fail with error.
> > ERROR:  RESTART value (14) cannot be greater than MAXVALUE (12)
> >
> > 2) Manual DDL on subscriber resulting in sequence synchronization failure.
> > On the subscriber, the sequence maxvalue is changed:
> > ALTER SEQUENCE s1 MAXVALUE 12;
> >
> > On the publisher, the sequence has advanced to 14:
> > SELECT currval('s1');
> > currval
> > ---------
> >      14
> >
> > REFRESH PUBLICATION SEQUENCES will fail because setting currvalue to
> > 14 is greater than the changed maxvalue 12 in the subscriber.
> >
> > 3) Out of order DDL and REFRESH resulting in synchronization failure.
> > Initially we have the same sequence on pub and sub. Then lets say pub
> > has done parameter change:
> > ALTER SEQUENCE s1 MAXVALUE 12;
> > Now if this DDL is somehow not replicated on sub, REFRESH PUBLICATION
> > SEQUENCES will fail initially and may work once DDL is replicated.
> > ~~
> > Problems 1 and 2 exist in both designs. While the WAL-based REFRESH
> > may seem slightly better for Problem 3 since REFRESH on the subscriber
> > will execute only after prior DDLs are replicated—even with the
> > sequence-sync worker, this isn't a major issue. If a user triggers
> > REFRESH before the DDL is replicated, the worker will refresh all
> > sequences except the mismatched one, and keep restarting and retrying
> > until the DDL is applied. Once that happens, the sequence sync
> > completes automatically, without the user doing another REFRESH.
> > Furthermore, the likelihood of a user executing REFRESH exactly during
> > the window between the DDL execution on the publisher and its
> > application on the subscriber seems relatively low.
> >
> > WAL-based approach OTOH introduces several additional challenges that
> > may outweigh its potential benefits:
> > 1)  Increases load on WAL sender to collect sequence values. We are
> > talking about all the sequences here which could be huge in number.
> > 2)  Table replication may stall until sequence conflicts are resolved.
> > The  chances of hitting any conflict/error could be more here as
> > compared to tables specially when sequence synchronization is not
> > incremental and the number of sequences are huge. The continuous and
> > more frequent errors if not handled by users may even end up
> > invalidating the slot on primary.
> >
> > The worker approach neither blocks the apply worker in case of errors
> > nor adds extra load on the WAL sender. On its own, Case 3 doesn’t seem
> > significant enough to justify switching to a WAL-based design.
> > Overall, the worker-based approach appears to be less complex and a
> > better option.
> >
>
> Agree on this. Please find a few comments on the previous patch:
>
>
> 1)
> +        Returns information about the sequence. <literal>last_value</literal>
> +        last sequence value set in sequence by nextval or setval,
>
> <literal>last_value</literal> indicates ....

Modified

> 2)
> + * If 'resync_all_sequences' is true:
> + *     Perform the above operation only for sequences.
>
> Shall we update:
>   Perform the above operation only for sequences and resync all the
>   sequences including existing ones.
>
> <old comment, I think somehow missed to be addressed.>

This code is removed now

> 3)
> + check_and_launch_sync_worker(nsyncworkers, InvalidOid,
> + &MyLogicalRepWorker->last_seqsync_start_time);
>
> Shall we simply name it as 'launch_sync_worker'.
> 'check' looks a little odd. All such functions (ex:
> logicalrep_worker_launch) has internal checks but the name need not to
> have 'check' keyword

Modified

> 4)
> + * Attempt to launch a sync worker (sequence or table) if there is a worker
> + * available and the retry interval has elapsed.
>
> shall we say:
> 'if there is a sync worker slot available' instead of 'if there is a
> worker available'

Modified

> 5)
> copy_sequences:
> + if (!sequence_rel || !HeapTupleIsValid(tup))
> + {
> + elog(LOG, "skip synchronization of sequence \"%s.%s\" because it has
> been dropped concurrently",
> + seqinfo->nspname, seqinfo->seqname);
> +
> + batch_skipped_count++;
> + continue;
> + }
>
> Is it possible that sequence_rel is valid while tuple is not? If
> possible, then do we need table_close before continuing?

I felt it is not possible as the lock on sequence has been acquired
successfully.

> 6)
> In copy_sequences() wherever we are using seqinfo->nspname,
> seqinfo->seqname; shall we directly use local vars nspname, seqname.

Modified

> 7)
> LogicalRepSyncSequences:
> + /* Skip if sequence was dropped concurrently */
> + sequence_rel = try_table_open(subrel->srrelid, RowExclusiveLock);
> + if (!sequence_rel)
> + continue;
>
> Here we are not checking tuple-validity like we did in copy_sequences
> (comment 5 above). I think this alone should suffice even in
> copy_sequences(). What do you think?

In this case we just need the sequence and schema name which is
available in sequence_rel whereas in case of copy_sequences we need
other sequence parameters like min, max, cycle etc which requires the
tuple. I felt the existing code is ok.

I have also addressed all the comments from [1] in the attached
v20250823 version patch.
[1] - https://www.postgresql.org/message-id/CAA4eK1%2BoVQW8oP%3DLo1X8qac6dzg-fgGQ6R_F_psfokUEqe%2Ba6w%40mail.gmail.com

Regards,
Vignesh
Вложения

В списке pgsql-hackers по дате отправления:
Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Re: Logical Replication of sequences

Вложения