Re: Logical Replication of sequences
От | vignesh C |
---|---|
Тема | Re: Logical Replication of sequences |
Дата | |
Msg-id | CALDaNm0KvDnXENHy6F7=HXyprSpgTKN3DPdDLZ+fzk+=0yOw-Q@mail.gmail.com обсуждение исходный текст |
Ответ на | Re: Logical Replication of sequences (shveta malik <shveta.malik@gmail.com>) |
Список | pgsql-hackers |
On Thu, 21 Aug 2025 at 11:49, shveta malik <shveta.malik@gmail.com> wrote: > > On Wed, Aug 20, 2025 at 2:25 PM vignesh C <vignesh21@gmail.com> wrote: > > > > On Tue, 19 Aug 2025 at 23:33, Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > > > > > I imagined something like case 2. For logical replication of tables, > > > if we support DDL replication (i.e., CREATE/ALTER/DROP TABLE), all > > > changes the apply worker executes are serialized in commit LSN order. > > > Therefore, users would not have to be concerned about schema changes > > > that happened to the publisher. On the other hand, for sequence > > > replication, even if we support DDL replication for sequences (i.e., > > > CREATE/ALTER/DROP SEQUENCES), users would have to execute REFRESH > > > PUBLICATION SEQUENCES command after "ALTER SEQUENCE s1 MAXVALUE 12;" > > > has been replicated on the subscriber. Otherwise, REFRESH PUBLICATION > > > SEQUENCE command would fail because the sequence parameters no longer > > > match. > > > > I am summarizing the challenges identified so far (assuming we have > > DDL replication implemented through WAL support) > > 1) Lack of sequence-synchronization resulting in DDL replication > > failure/conflict. > > On the subscriber, the sequence has advanced to 14: > > SELECT currval('s1'); > > currval > > --------- > > 14 > > > > On the publisher, the sequence is reset to 11 and MAXVALUE is changed to 12: > > SELECT setval('s1', 11); > > ALTER SEQUENCE s1 MAXVALUE 12; > > If the subscriber did not execute REFRESH PUBLICATION SEQUENCES, DDL > > replication will fail with error. > > ERROR: RESTART value (14) cannot be greater than MAXVALUE (12) > > > > 2) Manual DDL on subscriber resulting in sequence synchronization failure. > > On the subscriber, the sequence maxvalue is changed: > > ALTER SEQUENCE s1 MAXVALUE 12; > > > > On the publisher, the sequence has advanced to 14: > > SELECT currval('s1'); > > currval > > --------- > > 14 > > > > REFRESH PUBLICATION SEQUENCES will fail because setting currvalue to > > 14 is greater than the changed maxvalue 12 in the subscriber. > > > > 3) Out of order DDL and REFRESH resulting in synchronization failure. > > Initially we have the same sequence on pub and sub. Then lets say pub > > has done parameter change: > > ALTER SEQUENCE s1 MAXVALUE 12; > > Now if this DDL is somehow not replicated on sub, REFRESH PUBLICATION > > SEQUENCES will fail initially and may work once DDL is replicated. > > ~~ > > Problems 1 and 2 exist in both designs. While the WAL-based REFRESH > > may seem slightly better for Problem 3 since REFRESH on the subscriber > > will execute only after prior DDLs are replicated—even with the > > sequence-sync worker, this isn't a major issue. If a user triggers > > REFRESH before the DDL is replicated, the worker will refresh all > > sequences except the mismatched one, and keep restarting and retrying > > until the DDL is applied. Once that happens, the sequence sync > > completes automatically, without the user doing another REFRESH. > > Furthermore, the likelihood of a user executing REFRESH exactly during > > the window between the DDL execution on the publisher and its > > application on the subscriber seems relatively low. > > > > WAL-based approach OTOH introduces several additional challenges that > > may outweigh its potential benefits: > > 1) Increases load on WAL sender to collect sequence values. We are > > talking about all the sequences here which could be huge in number. > > 2) Table replication may stall until sequence conflicts are resolved. > > The chances of hitting any conflict/error could be more here as > > compared to tables specially when sequence synchronization is not > > incremental and the number of sequences are huge. The continuous and > > more frequent errors if not handled by users may even end up > > invalidating the slot on primary. > > > > The worker approach neither blocks the apply worker in case of errors > > nor adds extra load on the WAL sender. On its own, Case 3 doesn’t seem > > significant enough to justify switching to a WAL-based design. > > Overall, the worker-based approach appears to be less complex and a > > better option. > > > > Agree on this. Please find a few comments on the previous patch: > > > 1) > + Returns information about the sequence. <literal>last_value</literal> > + last sequence value set in sequence by nextval or setval, > > <literal>last_value</literal> indicates .... Modified > 2) > + * If 'resync_all_sequences' is true: > + * Perform the above operation only for sequences. > > Shall we update: > Perform the above operation only for sequences and resync all the > sequences including existing ones. > > <old comment, I think somehow missed to be addressed.> This code is removed now > 3) > + check_and_launch_sync_worker(nsyncworkers, InvalidOid, > + &MyLogicalRepWorker->last_seqsync_start_time); > > Shall we simply name it as 'launch_sync_worker'. > 'check' looks a little odd. All such functions (ex: > logicalrep_worker_launch) has internal checks but the name need not to > have 'check' keyword Modified > 4) > + * Attempt to launch a sync worker (sequence or table) if there is a worker > + * available and the retry interval has elapsed. > > shall we say: > 'if there is a sync worker slot available' instead of 'if there is a > worker available' Modified > 5) > copy_sequences: > + if (!sequence_rel || !HeapTupleIsValid(tup)) > + { > + elog(LOG, "skip synchronization of sequence \"%s.%s\" because it has > been dropped concurrently", > + seqinfo->nspname, seqinfo->seqname); > + > + batch_skipped_count++; > + continue; > + } > > Is it possible that sequence_rel is valid while tuple is not? If > possible, then do we need table_close before continuing? I felt it is not possible as the lock on sequence has been acquired successfully. > 6) > In copy_sequences() wherever we are using seqinfo->nspname, > seqinfo->seqname; shall we directly use local vars nspname, seqname. Modified > 7) > LogicalRepSyncSequences: > + /* Skip if sequence was dropped concurrently */ > + sequence_rel = try_table_open(subrel->srrelid, RowExclusiveLock); > + if (!sequence_rel) > + continue; > > Here we are not checking tuple-validity like we did in copy_sequences > (comment 5 above). I think this alone should suffice even in > copy_sequences(). What do you think? In this case we just need the sequence and schema name which is available in sequence_rel whereas in case of copy_sequences we need other sequence parameters like min, max, cycle etc which requires the tuple. I felt the existing code is ok. I have also addressed all the comments from [1] in the attached v20250823 version patch. [1] - https://www.postgresql.org/message-id/CAA4eK1%2BoVQW8oP%3DLo1X8qac6dzg-fgGQ6R_F_psfokUEqe%2Ba6w%40mail.gmail.com Regards, Vignesh
Вложения
- v20250823-0004-Update-ALTER-SUBSCRIPTION-REFRESH-to-ALTER.patch
- v20250823-0001-Enhance-pg_get_sequence_data-function.patch
- v20250823-0002-Introduce-ALL-SEQUENCES-support-for-Postgr.patch
- v20250823-0003-Reorganize-tablesync-Code-and-Introduce-sy.patch
- v20250823-0005-Introduce-REFRESH-PUBLICATION-SEQUENCES-fo.patch
- v20250823-0007-Documentation-for-sequence-synchronization.patch
- v20250823-0006-New-worker-for-sequence-synchronization-du.patch
В списке pgsql-hackers по дате отправления: