Re: Design of pg_stat_subscription_workers vs pgstats

Поиск
Список
Период
Сортировка
От David G. Johnston
Тема Re: Design of pg_stat_subscription_workers vs pgstats
Дата
Msg-id CAKFQuwaGo9zOhr_E3eYm+Pevvzi=MCGYo5nkzvkuX35wiyysYw@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Design of pg_stat_subscription_workers vs pgstats  (Amit Kapila <amit.kapila16@gmail.com>)
Ответы Re: Design of pg_stat_subscription_workers vs pgstats  (Andres Freund <andres@anarazel.de>)
Re: Design of pg_stat_subscription_workers vs pgstats  (Amit Kapila <amit.kapila16@gmail.com>)
Список pgsql-hackers
On Thu, Jan 27, 2022 at 5:08 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Thu, Jan 27, 2022 at 11:16 AM Andres Freund <andres@anarazel.de> wrote:
>
> On 2022-01-25 20:27:07 +0900, Masahiko Sawada wrote:
>
> > There will be some challenges in a case where updating pg_subscription_rel
> > also failed too (what to report to the user, etc.). And moreover, we don't
> > want to consume space for temporary information in the system catalog.
>
> You're consuming resources in a *WAY* worse way right now. The stats file gets
> constantly written out, and quite often read back by backends. In contrast to
> parts of pg_subscription_rel or such that data can't be removed from
> shared_buffers under pressure.
>

I don't think pg_subscription_rel is the right place to store error
info as the error can happen say while processing some message type
like BEGIN where we can't map it to pg_subscription_rel entry. There
could be other cases as well where we won't be able to map it to
pg_subscription_rel like some error related to some other table while
processing trigger function.

In general, there doesn't appear to be much advantage in storing all
the error info in system catalogs as we don't want it to be persistent
(crash-safe). Also, this information is not about any system object
that future operations can use, so won't map from that angle as well.
 
Repeating myself here to try and keep complaints regarding pg_stat_subscription_worker in one place.

This is my specific email with respect to the pg_stat_scription_workers design.


Specifically,

pg_stat_subscription_workers is defined as showing:
"will contain one row per subscription
worker on which errors have occurred, for workers applying logical
replication changes and workers handling the initial data copy of the
subscribed tables."

The fact that these errors remain (last_error_*) even after they are no longer relevant is my main gripe regarding this feature.  The information itself is generally useful though last_error_count is not.  These fields should auto-clear and be named current_error_* as they exist for purposes of describing the current state of any error-encountering logical replication worker so that ALTER SUBSCRIPTION SKIP, or someone other manual intervention, can be done with that knowledge without having to scan the subscriber's server logs.

This is my email trying to understand reality better in order to figure out what exactly is causing the limitations that are negatively impacting the design of this feature.


In short, it was convenient to use the statistics collector here even if doing so resulted in a non-user friendly (IMO) design.  Given all of the limitations to the statistics collection infrastructure, and the fact that this data is not statistical in the usual usage of the term, I find that to be less than satisfying.  To the point that I'd be inclined to revert this feature and hold up the ALTER SUBSCRIPTION SET patch until a more user-friendly design can be done using proper IPC techniques. (I also noted in the first email that pg_stat_archiver, solely by observing the column names it exposes, shares this same abuse of the statistics collector for non-statistical data).

In my second email I did some tracing and ended up at the PG_CATCH() block in src/backend/replication/logical/worker.c:L3629.  When mentioning trying to get rid of the PG_RE_THROW() there apparently doing so completely is unwarranted due to fatal/panic errors.  I am curious that the addition of the statistic reporting logic doesn't seem to care about the same.  And in any case, while maybe PG_RE_THROW() cannot go away it could maybe be done conditionally, and the worker still allowed to exit should that be more desirable than making the routine safe for looping after an error.

Andres, I do not know how to be more precise than your comment "But: You don't need to. Just abort the current transaction, start a new one, and update the state.".  When I suggested that idea it didn't seem to resonate with anyone on the other thread.  Starting at the main PG_TRY() loop in worker.c noted above, could you maybe please explain in a bit more detail whether, and how hard, it would be to go from "just PG_RE_THROW();" to "abort and start a new transaction"?

David J.

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Nathan Bossart
Дата:
Сообщение: Re: Two noncritical bugs of pg_waldump
Следующее
От: Peter Geoghegan
Дата:
Сообщение: Why is INSERT-driven autovacuuming based on pg_class.reltuples?