Re: Skipping logical replication transactions on subscriber side

Поиск
Список
Период
Сортировка
От Masahiko Sawada
Тема Re: Skipping logical replication transactions on subscriber side
Дата
Msg-id CAD21AoA8_ArnMhnKTAH+VN0mEOQWaCYYL9zVWGdSx+u5mAJ2uA@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Skipping logical replication transactions on subscriber side  ("David G. Johnston" <david.g.johnston@gmail.com>)
Список pgsql-hackers
On Wed, Jan 26, 2022 at 7:05 AM David G. Johnston
<david.g.johnston@gmail.com> wrote:
>
> On Tue, Jan 25, 2022 at 8:33 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>>
>> Given that we cannot use rely on the pg_stat_subscription_workers view
>> for this purpose, we would need either a new sub-system that tracks
>> each logical replication status so the system can set the error XID to
>> subskipxid, or to wait for shared-memory based stats collector.
>
>
> I'm reading over the monitoring-stats page to try and get my head around all of this.  First of all, it defines two
kindsof views: 
>
> 1. PostgreSQL's statistics collector is a subsystem that supports collection and reporting of information about
serveractivity. 
> 2. PostgreSQL also supports reporting dynamic information ... This facility is independent of the collector process.
>
> In then has two tables:
>
> 28.1 Dynamic Statistics Views (describing #2 above)
> 28.2 Collected Statistics Views (describing #1 above)
>
> Apparently the "collector process" is UDP-like, not reliable.  The documentation fails to mention this fact.  I'd
arguethat this is a documentation bug. 
>
> I do see that the pg_stat_subscription_workers view is correctly placed in Table 28.2
>
> Reviewing the other views listed in that table only pg_stat_archiver abuses the statistics collector in a similar
fashion. All of the others are actually metric oriented. 
>
> I don't care for the specification: "will contain one row per subscription worker on which errors have occurred, for
workersapplying logical replication changes and workers handling the initial data copy of the subscribed tables." 
>
> I would much rather have this behave similar to pg_stat_activity (which, of course, is a Dynamic Statistics View...)
inthat it shows only and all workers that are presently working. 

I have no objection against having a dynamic statistics view showing
the status of each running worker but I think it should be implemented
in a separate view and not be something that replaces the
pg_stat_subscription_workers. I think pg_stat_subscription would be
the right place for it.

> The tablesync workers should go away when they have finished synchronizing.  I should not have to manually intervene
toget rid of unreliable expired data.  The log file feels like a superior solution to this monitoring view. 
>
> Alternatively, if the tablesync workers are done but we've been accumulating real statistics for them, then by all
meanskeep them included in the view - but regardless of whether they encountered an error.  But maybe the view can
rightjoin in pg_stat_subscription as show a column for "(pid is not null) AS is_active". 
>
> Maybe we need to add a track_finished_tablesync_workers GUC so the DBA can decide whether to devote storage and
processingresources to that historical information. 
>
> If you had kept the original view name, "pg_stat_subscription_error", this whole issue goes away.  But you decided to
makeit more generic and call it "pg_stat_subscription_workers" - which means you need to get rid of the error-specific
conditionin the WHERE clause for the view.  Show all workers - I can filter on is_active.  Showing only active workers
isalso acceptable.  You won't get to change your mind so decide whether this wants to show only current and running
stateor whether historical statistics for now defunct tablesync workers are desired.  Personally, I would just show
activeworkers and if someone wants to add the feature they can add a track_tablesync_worker_stats GUC and a matching
view.

We plan to clear/remove table sync entries who finished synchronization.

It’s better not to merge dynamic statistics such as pid and is_active
and accumulative statistics into one view. I think we can have both
views: pg_stat_subscription_workers view with some changes based on
the review comments (e.g., removing defunct tablesync entry), and
another view showing dynamic statistics such as the worker status.

> From that, every apply worker should be sending a statistics message to the collector periodically.  If error info is
notpresent and the state is "all is well", clear out any existing error info from the view.  The attempt to include an
actualstatistic field here doesn't seem useful nor redeeming.  I would add a "state" field in its place (well, after
subrelid). And I would still rename the columns to current_error_* and note that these should be null unless the status
fieldshows error (there may be some additional complexity here).  Just get rid of last_error_count. 
>

I don't think that using the stats collector to show the current
status of each worker is a good idea because of 500ms lag, UDP
connection etc. Even if error info is not present and the state is
good according to the view, it might be out-of-date or simply not
true. If we want to do that, it’s much better to prepare something on
shmem so each worker can store its status (running or error, error
xid, etc.) and have pg_stat_subscription (or another view)  show the
information. One thing we need to consider is that it needs to leave
the status even after exiting apply/tablesync worker but we don't know
how many statuses for workers we need to allocate on the shmem at
startup time.

Regards,

--
Masahiko Sawada
EDB:  https://www.enterprisedb.com/



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Masahiko Sawada
Дата:
Сообщение: Re: Skipping logical replication transactions on subscriber side
Следующее
От: Amit Kapila
Дата:
Сообщение: Re: Skipping logical replication transactions on subscriber side