Re: Skipping logical replication transactions on subscriber side

Поиск
Список
Период
Сортировка
От David G. Johnston
Тема Re: Skipping logical replication transactions on subscriber side
Дата
Msg-id CAKFQuwaLfgeForu2XYgaCuzRg5=8=5vPe13kKwxGbwu0LyeJgQ@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Skipping logical replication transactions on subscriber side  (Amit Kapila <amit.kapila16@gmail.com>)
Ответы Re: Skipping logical replication transactions on subscriber side  ("David G. Johnston" <david.g.johnston@gmail.com>)
Re: Skipping logical replication transactions on subscriber side  (Amit Kapila <amit.kapila16@gmail.com>)
Список pgsql-hackers
On Sat, Jan 22, 2022 at 2:41 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Sat, Jan 22, 2022 at 12:41 PM David G. Johnston
<david.g.johnston@gmail.com> wrote:
>
> On Fri, Jan 21, 2022 at 10:30 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>>
>> On Fri, Jan 21, 2022 at 10:00 PM David G. Johnston
>> <david.g.johnston@gmail.com> wrote:
>> >
>> > On Fri, Jan 21, 2022 at 4:55 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
>> >>

>
> I agree with incorporating "reset" into the paragraph somehow - does not have to mention NONE, just that ALTER SUBSCRIPTION SKIP (not a family friendly abbreviation...) is what does it.
>

It is not clear to me what you have in mind here but to me in this
context saying "Setting <literal>NONE</literal> resets the transaction
ID." seems quite reasonable.

OK

Yeah, we will error in that situation. The only invalid values are
system reserved values (1,2).

So long as the ALTER command errors when asked to skip those IDs there isn't any reason for an end-user, who likely doesn't know or care that 1 and 2 are special, to be concerned about them (the only two invalid values) while reading the docs.


> Or, why even force them to specify a number instead of just saying SKIP and if there is a current error we skip its transaction, otherwise we warn them that nothing happened because there is no last error.
>

The idea is that we might extend this feature to skip specific
operations on relations or maybe by having other identifiers.

Again, you've already got syntax reserved that lets you add more features to this command in the future; and removing warnings or errors because new features make them moot is easy.  Lets document and code what we are willing to implement today.  A single top-level transaction xid that is presently blocking the worker from applying any more WAL.

One idea
we discussed was to automatically fetch the last error xid but then
decided it can be done as a later patch.

This seems backwards.  The user-friendly approach is to not make them type in anything at all.  That said, this particular UX seems like it could use some safety.  Thus I would propose at this time that attempting to set the skip_option to anything but THE active_error_xid for the named subscription results in an error.  Once you add new features the user can set the skip_option to other things without provoking errors.  Again, I consider this a safety feature since the user now has to accurately match the xid to the name in the SQL in order to perform a successful skip - and the to-be affected transaction has to be one that is preventing replication from moving forward.  I'm not interested in providing a foot-gun where an arbitrary future transaction can be scheduled to be skipped.  Running the command twice with the same values should provoke an error since the first run should be allowed to finish (?).  Also, we handle the situation where the state of the worker changes between when the user saw the error and wrote down the xid to skip and the actual execution of the alter command.  Maybe not highly anticipated scenarios but this is an easy win to deal with them.


> Additionally, the description for pg_stat_subscription_workers should describe what happens once the transaction represented by last_error_xid has either been successfully processed or skipped.  Does this "last error" stick around until another error happens (which is hopefully very rare) or does it reset to blanks?
>

It will be reset only on subscription drop, otherwise, it will stick
around until another error happens.
 
I really dislike the user experience this provides, and given it is new in v15 (and right now this table seems to exist solely to support this feature) changing this seems within the realm of possibility. I have to imagine these workers have a sense of local state that would just be "no errors, no need to touch pg_stat_subscription_workers at the end of this transaction's commit".  It would save a local state of the error_xid and if a successfully committed transaction has that xid it would clear the error.  The skip code path would also check for and see the matching xid value and clear the error.  Even if the local state thing doesn't work, one catalog lookup per transaction seems like potentially reasonable overhead to incur here.

David J.

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Andrew Dunstan
Дата:
Сообщение: Re: fairywren is generating bogus BASE_BACKUP commands
Следующее
От: "David G. Johnston"
Дата:
Сообщение: Re: Skipping logical replication transactions on subscriber side