Re: Skipping logical replication transactions on subscriber side

From: David G. Johnston
Subject: Re: Skipping logical replication transactions on subscriber side
Msg-id: CAKFQuwYqGOai6JKBYO4zr6xs30TfxQwiiksU55styFerWbG8Lg@mail.gmail.com
In reply to: Re: Skipping logical replication transactions on subscriber side (Amit Kapila <amit.kapila16@gmail.com>)
Replies: Re: Skipping logical replication transactions on subscriber side (Masahiko Sawada <sawada.mshk@gmail.com>)
List: pgsql-hackers
On Sun, Jan 23, 2022 at 8:35 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>> I really dislike the user experience this provides, and given it is new in v15 (and right now this table seems to exist solely to support this feature), changing this seems within the realm of possibility. I have to imagine these workers have a sense of local state that would just be "no errors, no need to touch pg_stat_subscription_workers at the end of this transaction's commit". It would save a local state of the error_xid, and if a successfully committed transaction has that xid it would clear the error. The skip code path would also check for and see the matching xid value and clear the error. Even if the local state thing doesn't work, one catalog lookup per transaction seems like potentially reasonable overhead to incur here.
>>

> Are you suggesting we update the catalog to save error_xid when an error
> occurs? If so, that has many challenges, like the fact that we are not
> supposed to perform any such operations when the transaction is in an
> error state. We discussed this and other ideas at the beginning. I don't
> find any of your arguments convincing enough to change the basic approach
> here, but I would like to see what others think on this matter.


Then how does the table get updated to that state in the first place since it doesn't know the error details until there is an error?

In any case, clearing out the entries in the table would not happen while it is applying the replication stream, in an error state or otherwise.

in = while streaming
out = not streaming

1(in). replication stream is working
2(in). replication stream fails; capture error information
3(in->out). stop replication stream; perform rollback on xid
4(out). update pg_stat_subscription_workers to report the failure, including the xid of the transaction
5(out). wait for the user to manually restart the replication stream
[if they do so by skipping the xid, save the xid from pg_stat_subscription_workers into pg_subscription.subskipxid, possibly requiring the user to confirm the xid]
[the user has now done their thing and requested that the replication stream resume]
6(out). clear the error information from pg_stat_subscription_workers; it is no longer useful (or no longer exists) because the user just took action to avoid that very error, one way (skipping its transaction) or another.
7(out->in). resume the replication stream, return to step 1

You are already doing steps 1-5 and 7 today; however, you are forced to deal with transactions and catalog access. I am just adding step 6, which turns last_error_xid into current_error_xid, because it is the current value of the error in the stream during step 5, when the user needs to decide how to recover from the error. Once the user decides and the stream resumes, that error information has no value (go look in the logs if you want history). Thus, when step 7 comes around and the stream is restarted, the error info in pg_stat_subscription_workers is empty, waiting for the next error to happen. If the user did nothing in step 5, then when that same WAL is replayed at step 2 the error will come back.
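A minimal Python sketch of the proposed lifecycle, to make the ordering concrete. It assumes hypothetical stand-ins: `ErrorStats` for one pg_stat_subscription_workers row, `Subscription.skip_xid` for pg_subscription.subskipxid, and a list of (xid, succeeds) pairs for the replication stream. None of these names come from the PostgreSQL source; the sketch only shows the error being cleared (step 6) before the stream resumes (step 7).

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class ErrorStats:
    """Hypothetical stand-in for one pg_stat_subscription_workers row."""
    error_xid: Optional[int] = None
    error_message: Optional[str] = None


@dataclass
class Subscription:
    """Hypothetical stand-in for pg_subscription.subskipxid."""
    skip_xid: Optional[int] = None


def apply_until_error(txns: List[Tuple[int, bool]], start: int,
                      stats: ErrorStats, sub: Subscription,
                      applied: List[int]) -> int:
    """Steps 1-4: apply transactions from position `start`.

    On the first failing transaction, stop (standing in for the rollback),
    record its xid in `stats`, and return its position so the stream can be
    restarted from the same point. Returns len(txns) if all were applied.
    """
    for pos in range(start, len(txns)):
        xid, succeeds = txns[pos]
        if sub.skip_xid == xid:
            sub.skip_xid = None  # skip exactly once, then forget the xid
            continue
        if not succeeds:
            stats.error_xid = xid
            stats.error_message = f"could not apply transaction {xid}"
            return pos
        applied.append(xid)
    return len(txns)


def clear_error(stats: ErrorStats) -> None:
    """Step 6: the user has acted, so the error is no longer current."""
    stats.error_xid = None
    stats.error_message = None


if __name__ == "__main__":
    stream = [(100, True), (101, False), (102, True)]
    stats, sub, applied = ErrorStats(), Subscription(), []
    pos = apply_until_error(stream, 0, stats, sub, applied)  # fails at 101
    sub.skip_xid = stats.error_xid  # step 5: the user chooses to skip it
    clear_error(stats)              # step 6
    pos = apply_until_error(stream, pos, stats, sub, applied)  # step 7
    print(applied, stats.error_xid)  # prints: [100, 102] None
```

If the user sets no skip_xid in step 5, the second call stops at xid 101 again and re-populates error_xid, which is the "error will come back" case.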

The main thing is to work out how many ways the user can exit step 5, and to make sure that, whichever way they exit, step 6 happens before step 7.
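One way to guarantee that ordering is to funnel every exit from step 5 through a single function that clears the error before it resumes the stream. The Python sketch below assumes two hypothetical exit actions, skipping the failing xid or retrying it unchanged (for example after fixing conflicting data on the subscriber); `StreamState` and `ExitAction` are illustrative names, not PostgreSQL APIs.

```python
from enum import Enum, auto
from typing import Optional


class ExitAction(Enum):
    """Hypothetical ways a user might leave step 5."""
    SKIP_XID = auto()  # skip the failing transaction
    RETRY = auto()     # e.g. fix conflicting data, then replay it unchanged


class StreamState:
    """Hypothetical per-worker state: the current error (what
    pg_stat_subscription_workers would show) plus an xid to skip, if any."""

    def __init__(self) -> None:
        self.streaming = True  # step 1
        self.current_error_xid: Optional[int] = None
        self.skip_xid: Optional[int] = None

    def report_error(self, xid: int) -> None:
        """Steps 2-4: stop the stream and report the failing xid."""
        self.streaming = False
        self.current_error_xid = xid

    def exit_step5(self, action: ExitAction) -> None:
        """The only way out of step 5, so step 6 always precedes step 7."""
        if self.current_error_xid is None:
            raise RuntimeError("no error to recover from")
        if action is ExitAction.SKIP_XID:
            # Take the xid from the reported error, not from free-form input.
            self.skip_xid = self.current_error_xid
        # Step 6: whichever action was taken, the error is no longer current.
        self.current_error_xid = None
        # Step 7: resume the replication stream.
        self.streaming = True
```

Copying the xid from the reported error, rather than accepting an arbitrary value, mirrors the idea in step 5 of saving the xid from pg_stat_subscription_workers into subskipxid.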

David J.
