Re: Design of pg_stat_subscription_workers vs pgstats

From: David G. Johnston
Subject: Re: Design of pg_stat_subscription_workers vs pgstats
Msg-id: CAKFQuwaTr6wszUiBjf+0u-nhPx3w1j=gRiXLWH6oGJZ93O1bCQ@mail.gmail.com
In response to: Re: Design of pg_stat_subscription_workers vs pgstats (Amit Kapila <amit.kapila16@gmail.com>)
List: pgsql-hackers
On Wed, Feb 2, 2022 at 5:08 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Wed, Feb 2, 2022 at 1:06 PM David G. Johnston
<david.g.johnston@gmail.com> wrote:

...
>
> I already explained that the concept of err_cnt is not useful.  The fact that you include it here makes me think you are still thinking that this all somehow is meant to keep track of history.  It is not.  The workers are state machines and "error" is one of the states - with relevant attributes to display to the user, and system, while in that state.  The state machine reporting does not care about historical states nor does it report on them.  There is some uncertainty if we continue with the automatic re-launch;
>

I think automatic retry will help to allow some transient errors say
like network glitches that can be resolved on retry and will keep the
behavior transparent. This is also consistent with what we do in
standby mode where if there is an error on primary due to which
standby is not able to fetch some data it will just retry. We can't
fix any error that occurred on the server-side, so the way is to retry
which is true for both standby and subscribers.

Good points.  In short, there are two subsets of problems to deal with here.  We should address them separately, though the pg_subscription_worker table should provide relevant information for both cases.  In a retry situation, relevant information such as next_scheduled_retry (estimated) should be provided, if there is some kind of delay involved.  In a situation like "unique constraint violation", next_scheduled_retry would be null; alternatively, make the field a text field and print "Manual Intervention Required".  Likewise, the XID/LSN would be null in a retry situation, since we haven't received a wholly intact transaction from the publisher: we may know of such an ID, but if the final COMMIT message is never seen before the feed dies, we should not expose that incomplete information to the user.
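
To make the two cases concrete, here is a rough sketch in Python rather than anything resembling a patch.  All names in it (ErrorKind, WorkerErrorRow, report_error, failed_xid, failed_lsn) are made up for illustration; only next_scheduled_retry and the XID/LSN idea come from the paragraph above, and the retry delay is an assumed input.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from enum import Enum
from typing import Optional


class ErrorKind(Enum):
    """Hypothetical split of the two problem subsets."""
    TRANSIENT = "transient"     # e.g. the feed from the publisher died
    DATA_CONFLICT = "conflict"  # e.g. a unique constraint violation on apply


@dataclass(frozen=True)
class WorkerErrorRow:
    """Hypothetical row shown while a worker is in the error state."""
    error_kind: ErrorKind
    next_scheduled_retry: Optional[datetime]  # None: manual intervention required
    failed_xid: Optional[int]                 # None: no wholly intact transaction seen
    failed_lsn: Optional[str]                 # None: same as above


def report_error(kind: ErrorKind, now: datetime, retry_delay: timedelta,
                 xid: Optional[int] = None,
                 lsn: Optional[str] = None) -> WorkerErrorRow:
    """Build the row for the current error state; no history is kept."""
    if kind is ErrorKind.TRANSIENT:
        # Retry case: expose the estimated retry time, but hide any
        # XID/LSN we may know of, since the COMMIT never arrived.
        return WorkerErrorRow(kind, now + retry_delay, None, None)
    # Conflict case: retrying cannot help, so no retry is scheduled and
    # the failed transaction is identified for the user.
    return WorkerErrorRow(kind, None, xid, lsn)
```

The point of the sketch is only that one row shape can serve both cases, with nulls marking which fields do not apply in the current state.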

A standby is not expected to encounter any user data constraint problems, so even a system that requires manual intervention for such problems will work for standbys, because they will never hit that code path.  And you cannot simply skip applying the failed transaction and move on to the next one - that data also never came over.

David J.
