Re: Failed transaction statistics to measure the logical replication progress

From Amit Kapila
Subject Re: Failed transaction statistics to measure the logical replication progress
Msg-id CAA4eK1LWYc15=ASj1tMTEFsXtxu=02aGoMwq9YanUVr9-QMhdQ@mail.gmail.com
In response to RE: Failed transaction statistics to measure the logical replication progress  ("tanghy.fnst@fujitsu.com" <tanghy.fnst@fujitsu.com>)
Responses RE: Failed transaction statistics to measure the logical replication progress  ("osumi.takamichi@fujitsu.com" <osumi.takamichi@fujitsu.com>)
List pgsql-hackers
On Tue, Feb 22, 2022 at 6:45 AM tanghy.fnst@fujitsu.com
<tanghy.fnst@fujitsu.com> wrote:
>
> I found a problem when using it. When a replication worker exits, the
> transaction stats should be sent to stats collector if they were not sent before
> because it didn't reach PGSTAT_STAT_INTERVAL. But I saw that the stats weren't
> updated as expected.
>
> I looked into it and found that the replication worker would send the
> transaction stats (if any) before it exits. But it got invalid subid in
> pgstat_send_subworker_xact_stats(), which led to the following result:
>
> postgres=# select pg_stat_get_subscription_worker(0, null);
>  pg_stat_get_subscription_worker
> ---------------------------------
>  (0,,2,0,0,,,,0,"",)
> (1 row)
>
> I think that's because subid has already been cleaned when trying to send the
> stats. I printed the value of before_shmem_exit_list, the functions in this list
> would be called in shmem_exit() when the worker exits.
> logicalrep_worker_onexit() would clean up the worker info (including subid), and
> pgstat_shutdown_hook() would send stats if any.  logicalrep_worker_onexit() was
> called before calling pgstat_shutdown_hook().
>
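
The ordering described above can be reproduced with a small stand-alone
sketch. It relies on one fact about PostgreSQL, that shmem_exit() runs the
before_shmem_exit callbacks in reverse order of registration, and on the
assumption that the stats hook is registered earlier in startup than the
worker's exit callback. Every name in the code (exit_list,
register_exit_callback, stats_shutdown, worker_onexit,
simulate_worker_exit) is a hypothetical stand-in, not the real backend
code.

```c
#include <assert.h>

/* Hypothetical stand-ins; none of these are the real PostgreSQL definitions. */
#define MAX_EXIT_CALLBACKS 8
#define INVALID_SUBID 0u

typedef void (*exit_callback) (void);

static exit_callback exit_list[MAX_EXIT_CALLBACKS];
static int exit_list_len = 0;

static unsigned int worker_subid = INVALID_SUBID;   /* subscription served by the worker */
static unsigned int reported_subid = INVALID_SUBID; /* subid seen when stats were flushed */
static int stats_flushed = 0;

/* Append a callback to the list, like before_shmem_exit() does. */
static void register_exit_callback(exit_callback cb)
{
    assert(exit_list_len < MAX_EXIT_CALLBACKS);
    exit_list[exit_list_len++] = cb;
}

/* Run callbacks last-registered-first, mirroring shmem_exit(). */
static void run_exit_callbacks(void)
{
    while (exit_list_len > 0)
        exit_list[--exit_list_len] ();
}

/* Plays the role of pgstat_shutdown_hook(): flush whatever is pending. */
static void stats_shutdown(void)
{
    reported_subid = worker_subid;
    stats_flushed = 1;
}

/* Plays the role of logicalrep_worker_onexit(): clear the worker info. */
static void worker_onexit(void)
{
    worker_subid = INVALID_SUBID;
}

/* Returns the subid that the final stats flush would report. */
static unsigned int simulate_worker_exit(unsigned int subid)
{
    exit_list_len = 0;
    stats_flushed = 0;
    reported_subid = INVALID_SUBID;

    register_exit_callback(stats_shutdown); /* assumed: registered early */
    worker_subid = subid;
    register_exit_callback(worker_onexit);  /* registered later, so it runs first */

    run_exit_callbacks();
    return reported_subid;
}
```

Calling simulate_worker_exit() with any valid subid returns INVALID_SUBID,
because the cleanup callback has already run by the time the stats callback
looks at the subid. That matches the row keyed by subid 0 in the output
above.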

Yeah, I think that is a problem. We could solve it by sending the stats
from logicalrep_worker_onexit before subid is cleared, but I am not sure
that is a good idea. I feel we need to go back to the idea of v21 for
sending stats instead of using pgstat_report_stat. The one thing I think
we could improve there is to avoid trying to send the stats before
receiving each message via walrcv_receive, and instead try to send them
just before we wait (WaitLatchOrSocket). Trying after each message doesn't
seem to be required and could add some overhead as well. What do you think?
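
To make the suggestion concrete, here is a rough stand-alone sketch of the
loop shape I have in mind. It is not the apply worker code: ApplySim,
flush_stats and run_apply_loop are hypothetical names, each message is
treated as one finished transaction, and the wait itself is only marked by
a comment. It shows one send attempt per wakeup, made just before the
point where the worker would block.

```c
#include <assert.h>
#include <stddef.h>

/* All names below are hypothetical stand-ins, not PostgreSQL APIs. */
typedef struct ApplySim
{
    const int  *batches;        /* batches[i] = messages available on wakeup i */
    size_t      nbatches;
    int         pending_xacts;  /* applied but not yet reported */
    int         reported_xacts; /* already sent to the stats side */
    int         send_calls;     /* number of stats messages sent */
} ApplySim;

/* Send accumulated stats, if any, and reset the pending counter. */
static void flush_stats(ApplySim *sim)
{
    if (sim->pending_xacts == 0)
        return;                 /* nothing new, so send nothing */

    sim->reported_xacts += sim->pending_xacts;
    sim->pending_xacts = 0;
    sim->send_calls++;
}

static void run_apply_loop(ApplySim *sim)
{
    assert(sim != NULL);

    for (size_t i = 0; i < sim->nbatches; i++)
    {
        /* Drain everything currently available without touching stats. */
        for (int m = 0; m < sim->batches[i]; m++)
            sim->pending_xacts++;   /* one message = one transaction here */

        /* About to block: this is the only place we try to send stats. */
        flush_stats(sim);

        /* The real worker would call WaitLatchOrSocket() at this point. */
    }
}
```

With batches of 3, 0 and 5 messages, this makes two sends for eight
transactions, where trying after every message would make eight attempts.
Nothing is left pending once the loop goes idle.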

-- 
With Regards,
Amit Kapila.


