Re: Logical Replication of sequences

Поиск
Список
Период
Сортировка
От Amit Kapila
Тема Re: Logical Replication of sequences
Дата
Msg-id CAA4eK1+ep00WMG6sW70RQXHJSaXE_fC5Zg-Yocnxo0HuU9W8+A@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Logical Replication of sequences  (Michael Paquier <michael@paquier.xyz>)
Список pgsql-hackers
On Fri, Jun 14, 2024 at 5:16 AM Michael Paquier <michael@paquier.xyz> wrote:
>
> On Thu, Jun 13, 2024 at 03:36:05PM +0530, Amit Kapila wrote:
> > Fair enough. However, this raises the question Dilip and Vignesh are
> > discussing whether we need a new relfilenode for sequence update even
> > during initial sync? As per my understanding, the idea is that similar
> > to tables, the CREATE SUBSCRIPTION command (with copy_data = true)
> > will create the new sequence entries in pg_subscription_rel with the
> > state as 'i'. Then the sequence-sync worker would start a transaction
> > and one-by-one copy the latest sequence values for each sequence (that
> > has state as 'i' in pg_subscription_rel) and mark its state as ready
> > 'r' and commit the transaction. Now if there is an error during this
> > operation it will restart the entire operation.
>
> Hmm.  You mean to use only one transaction for all the sequences?
> I've heard about deployments with a lot of them.  Could it be a
> problem to process them in batches, as well?

I don't think so. We can even sync one sequence per transaction but
then it would be resource and time consuming without much gain. As
mentioned in a previous email, we might want to sync 100 or some other
threshold number of sequences per transaction. The other possibility
is to make a subscription-level option for this batch size but I don't
see much advantage in doing so as it won't be convenient for users to
set it. I feel we should pick some threshold number that is neither
too low nor too high and if we later see any problem with it, we can
make it a configurable knob.

>
> > The idea of creating a
> > new relfilenode is to handle the error so that if there is a rollback,
> > the sequence state will be rolled back to 'i' and the sequence value
> > will also be rolled back. The other option could be that we update the
> > sequence value without a new relfilenode and if the transaction rolled
> > back then only the sequence's state will be rolled back to 'i'. This
> > would work with a minor inconsistency that sequence values will be
> > up-to-date even when the sequence state is 'i' in pg_subscription_rel.
> > I am not sure if that matters because anyway, they can quickly be
> > out-of-sync with the publisher again.
>
> Seeing a mention to relfilenodes specifically for sequences freaks me
> out a bit, because there's some work I have been doing in this area
> and sequences may not have a need for a physical relfilenode at all.
> But I guess that you refer to the fact that like tables, relfilenodes
> would only be created as required because anything you'd do in the
> apply worker path would just call some of the routines of sequence.h,
> right?
>

Yes, I think so. The only thing the patch expects is a way to rollback
the sequence changes if the transaction rolls back during the initial
sync. But I am not sure if we need such a behavior. The discussion for
the same is in progress. Let's wait for the outcome.

> > Now, say we don't want to maintain the state of sequences for initial
> > sync at all then after the error how will we detect if there are any
> > pending sequences to be synced? One possibility is that we maintain a
> > subscription level flag 'subsequencesync' in 'pg_subscription' to
> > indicate whether sequences need sync. This flag would indicate whether
> > to sync all the sequences in pg_susbcription_rel. This would mean that
> > if there is an error while syncing the sequences we will resync all
> > the sequences again. This could be acceptable considering the chances
> > of error during sequence sync are low.
>
> There could be multiple subscriptions to a single database that point
> to the same set of sequences.  Is there any conflict issue to worry
> about here?
>

I don't think so. In the worst case, the same value would be copied
twice. The same scenario in case of tables could lead to duplicate
data or unique key violation ERRORs which is much worse. So, I expect
users to be careful about the same.

> > The benefit is that both the
> > REFRESH PUBLICATION SEQUENCES and CREATE SUBSCRIPTION can use the same
> > idea and sync all sequences without needing a new relfilenode. Users
> > can always refer 'subsequencesync' flag in 'pg_subscription' to see if
> > all the sequences are synced after executing the command.
>
> That would be cheaper, indeed.  Isn't a boolean too limiting?
>

In this idea, we only need a flag to say whether the sequence sync is
required or not.

> Isn't that something you'd want to track with a LSN as "the point in
> WAL where all the sequences have been synced"?
>

It won't be any better for the required purpose because after CREATE
SUBSCRIPTION, if REFERESH wants to toggle the flag to indicate the
sequences need sync again then using LSN would mean we need to set it
to Invalid value.

> The approach of doing all the sync work from the subscriber, while
> having a command that can be kicked from the subscriber side is a good
> user experience.
>

Thank you for endorsing the idea.

--
With Regards,
Amit Kapila.



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Dmitry Dolgov
Дата:
Сообщение: Re: libpq contention due to gss even when not using gss
Следующее
От: Michail Nikolaev
Дата:
Сообщение: Re: Issues with ON CONFLICT UPDATE and REINDEX CONCURRENTLY