Re: Skipping logical replication transactions on subscriber side

Поиск
Список
Период
Сортировка
От Masahiko Sawada
Тема Re: Skipping logical replication transactions on subscriber side
Дата
Msg-id CAD21AoDBDycb3FL4dF-xDSr-yNk40ETpBB0_xiYp-ZTN6s4wTA@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Skipping logical replication transactions on subscriber side  (vignesh C <vignesh21@gmail.com>)
Список pgsql-hackers
On Mon, Jan 10, 2022 at 6:27 PM vignesh C <vignesh21@gmail.com> wrote:
>
> On Fri, Jan 7, 2022 at 11:23 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >
> > On Fri, Jan 7, 2022 at 10:04 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > >
> > > On Wed, Jan 5, 2022 at 12:31 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > >
> > > > On Mon, Dec 27, 2021 at 9:54 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > > > >
> > > > > On Thu, Dec 16, 2021 at 2:42 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > > > > >
> > > > > > On Thu, Dec 16, 2021 at 2:21 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > > > > >
> > > > > > > On Thu, Dec 16, 2021 at 10:37 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > > > > > > >
> > > > > > > > On Thu, Dec 16, 2021 at 11:43 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > > > > > > >
> > > > > > > > > I thought we just want to lock before clearing the skip_xid something
> > > > > > > > > like take the lock, check if the skip_xid in the catalog is the same
> > > > > > > > > as we have skipped, if it is the same then clear it, otherwise, leave
> > > > > > > > > it as it is. How will that disallow users to change skip_xid when we
> > > > > > > > > are skipping changes?
> > > > > > > >
> > > > > > > > Oh I thought we wanted to keep holding the lock while skipping changes
> > > > > > > > (changing skip_xid requires acquiring the lock).
> > > > > > > >
> > > > > > > > So if skip_xid is already changed, the apply worker would do
> > > > > > > > replorigin_advance() with WAL logging, instead of committing the
> > > > > > > > catalog change?
> > > > > > > >
> > > > > > >
> > > > > > > Right. BTW, how are you planning to advance the origin? Normally, a
> > > > > > > commit transaction would do it but when we are skipping all changes,
> > > > > > > the commit might not do it as there won't be any transaction id
> > > > > > > assigned.
> > > > > >
> > > > > > I've not tested it yet but replorigin_advance() with wal_log = true
> > > > > > seems to work for this case.
> > > > >
> > > > > I've tested it and realized that we cannot use replorigin_advance()
> > > > > for this purpose without changes. That is, the current
> > > > > replorigin_advance() doesn't allow to advance the origin by the owner:
> > > > >
> > > > >         /* Make sure it's not used by somebody else */
> > > > >         if (replication_state->acquired_by != 0)
> > > > >         {
> > > > >             ereport(ERROR,
> > > > >                     (errcode(ERRCODE_OBJECT_IN_USE),
> > > > >                      errmsg("replication origin with OID %d is already
> > > > > active for PID %d",
> > > > >                             replication_state->roident,
> > > > >                             replication_state->acquired_by)));
> > > > >         }
> > > > >
> > > > > So we need to change it so that the origin owner can advance its
> > > > > origin, which makes sense to me.
> > > > >
> > > > > Also, when we have to update the origin instead of committing the
> > > > > catalog change while updating the origin, we cannot record the origin
> > > > > timestamp.
> > > > >
> > > >
> > > > Is it because we currently update the origin timestamp with commit record?
> > >
> > > Yes.
> > >
> > > >
> > > > > This behavior makes sense to me because we skipped the
> > > > > transaction. But ISTM it’s not good if we emit the origin timestamp
> > > > > only when directly updating the origin. So probably we need to always
> > > > > omit origin timestamp.
> > > > >
> > > >
> > > > Do you mean to say that you want to omit it even when we are
> > > > committing the changes?
> > >
> > > Yes, it would be better to record only origin lsn in terms of consistency.
> > >
> > > >
> > > > > Apart from that, I'm vaguely concerned that the logic seems to be
> > > > > getting complex. Probably it comes from the fact that we store
> > > > > skip_xid in the catalog and update the catalog to clear/set the
> > > > > skip_xid. It might be worth revisiting the idea of storing skip_xid on
> > > > > shmem (e.g., ReplicationState)?
> > > > >
> > > >
> > > > IIRC, the problem with that idea was that we won't remember skip_xid
> > > > information after server restart and the user won't even know that it
> > > > has to set it again.
> > >
> > > Right, I agree that it’s not convenient when the server restarts or
> > > crashes, but these problems could not be critical in the situation
> > > where users have to use this feature; the subscriber already entered
> > > an error loop so they can know xid again and it’s an uncommon case
> > > that they need to restart during skipping changes.
> > >
> > > Anyway, I'll submit an updated patch soon so we can discuss complexity
> > > vs. convenience.
> >
> > Attached an updated patch. Please review it.

Thank you for the comments!

>
> Thanks for the updated patch, few comments:
> 1) Should this be case insensitive to support NONE too:
> +                       /* Setting xid = NONE is treated as resetting xid */
> +                       if (strcmp(xid_str, "none") == 0)
> +                               xid = InvalidTransactionId;

I think the string value is always small cases so we don't need to do
strcacsecmp here.

>
> 2) Can we have an option to specify last_error_xid of
> pg_stat_subscription_workers. Something like:
> alter subscription sub1 skip ( XID = 'last_subscription_error');
>
> When the user specified last_subscription_error, it should pick
> last_error_xid from pg_stat_subscription_workers.
> As this operation is a critical operation, if there is an option which
> could automatically pick and set from pg_stat_subscription_workers, it
> would be useful.

As I mentioned before in another mail, I think we can do that in a
separate patch.

>
> 3) Currently the following syntax is being supported, I felt this
> should throw an error:
> postgres=# alter subscription sub1 set ( XID = 100);
> ALTER SUBSCRIPTION

Fixed.

>
> 4) You might need to rebase the patch:
> git am v2-0001-Add-ALTER-SUBSCRIPTION-.-SKIP-to-skip-the-transac.patch
> Applying: Add ALTER SUBSCRIPTION ... SKIP to skip the transaction on
> subscriber nodes
> error: patch failed: doc/src/sgml/logical-replication.sgml:333
> error: doc/src/sgml/logical-replication.sgml: patch does not apply
> Patch failed at 0001 Add ALTER SUBSCRIPTION ... SKIP to skip the
> transaction on subscriber nodes
> hint: Use 'git am --show-current-patch=diff' to see the failed patch
>
> 5) You might have to rename 027_skip_xact to 028_skip_xact as
> 027_nosuperuser.pl already exists
> diff --git a/src/test/subscription/t/027_skip_xact.pl
> b/src/test/subscription/t/027_skip_xact.pl
> new file mode 100644
> index 0000000000..a63c9c345e
> --- /dev/null
> +++ b/src/test/subscription/t/027_skip_xact.pl

I've resolved these conflicts.

These comments are incorporated into the latest v3 patch I just submitted[1].

Regards,

[1] https://www.postgresql.org/message-id/CAD21AoD9JXah2V8uFURUpZbK_ewsut%2Bjb1ESm6YQkrhQm3nJRg%40mail.gmail.com

--
Masahiko Sawada
EDB:  https://www.enterprisedb.com/



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Masahiko Sawada
Дата:
Сообщение: Re: Skipping logical replication transactions on subscriber side
Следующее
От: Julien Rouhaud
Дата:
Сообщение: Re: Add checkpoint and redo LSN to LogCheckpointEnd log message