Re: Failed transaction statistics to measure the logical replication progress

Поиск
Список
Период
Сортировка
От vignesh C
Тема Re: Failed transaction statistics to measure the logical replication progress
Дата
Msg-id CALDaNm1T3qnUnCiTytACUFiw=vktBPi9dvy8iNXCNAyPALOP_g@mail.gmail.com
обсуждение исходный текст
Ответ на RE: Failed transaction statistics to measure the logical replication progress  ("osumi.takamichi@fujitsu.com" <osumi.takamichi@fujitsu.com>)
Ответы RE: Failed transaction statistics to measure the logical replication progress  ("osumi.takamichi@fujitsu.com" <osumi.takamichi@fujitsu.com>)
Список pgsql-hackers
On Sat, Dec 4, 2021 at 6:32 PM osumi.takamichi@fujitsu.com
<osumi.takamichi@fujitsu.com> wrote:
>
> On Friday, December 3, 2021 3:12 PM vignesh C <vignesh21@gmail.com> wrote:
> > Thanks for the updated patch.
> > Currently we are storing the commit count, error_count and abort_count for
> > each table of the table sync operation. If we have thousands of tables, we will
> > be storing the information for each of the tables.
> > Shouldn't we be storing the consolidated information in this case.
> > diff --git a/src/backend/replication/logical/tablesync.c
> > b/src/backend/replication/logical/tablesync.c
> > index f07983a..02e9486 100644
> > --- a/src/backend/replication/logical/tablesync.c
> > +++ b/src/backend/replication/logical/tablesync.c
> > @@ -1149,6 +1149,11 @@ copy_table_done:
> >         MyLogicalRepWorker->relstate_lsn = *origin_startpos;
> >         SpinLockRelease(&MyLogicalRepWorker->relmutex);
> >
> > +       /* Report the success of table sync. */
> > +       pgstat_report_subworker_xact_end(MyLogicalRepWorker->subid,
> > +
> >   MyLogicalRepWorker->relid,
> > +
> >   0 /* no logical message type */ );
> Okay.
>
> I united all stats into that of apply worker.
> In line with this change, I fixed the TAP tests as well
> to cover the updates of stats done by table sync workers.
>
> Also, during my self-review, I noticed that
> I should call pgstat_report_subworker_xact_end() before
> process_syncing_tables() because it can lead to process
> exit, which results in missing one increment of the stats columns.
> I noted this point in a comment as well.

Thanks for the updated patch, few comments:
1) We can keep the documentation similar to mention the count includes
both table sync worker / main apply worker in case of
commit_count/error_count and abort_count to keep it consistent.
+       <structfield>commit_count</structfield> <type>bigint</type>
+      </para>
+      <para>
+       Number of transactions successfully applied in this subscription.
+       COMMIT and COMMIT PREPARED increments this counter.
+      </para></entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>error_count</structfield> <type>bigint</type>
+      </para>
+      <para>
+       Number of transactions that failed to be applied by the table
+       sync worker or main apply worker in this subscription.
+       </para></entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>abort_count</structfield> <type>bigint</type>
+      </para>
+      <para>
+       Number of transactions aborted in this subscription.
+       ROLLBACK PREPARED increments this counter.
+      </para></entry>
+     </row>

2) Can this be changed:
+       /*
+        * If this is a new error reported by table sync worker,
consolidate this
+        * error count into the entry of apply worker.
+        */
+       if (OidIsValid(msg->m_subrelid))
+       {
+               /* Gain the apply worker stats */
+               subwentry = pgstat_get_subworker_entry(dbentry, msg->m_subid,
+
                    InvalidOid, true);
+               subwentry->error_count++;
+       }
+       else
+               subwentry->error_count++;       /* increment the apply
worker's counter. */
To:
+       /*
+        * If this is a new error reported by table sync worker,
consolidate this
+        * error count into the entry of apply worker.
+        */
+       if (OidIsValid(msg->m_subrelid))
+               /* Gain the apply worker stats */
+               subwentry = pgstat_get_subworker_entry(dbentry, msg->m_subid,
+
                    InvalidOid, true);
+
+       subwentry->error_count++;       /* increment the apply
worker's counter. */

3) Since both 026_worker_stats and 027_worker_xact_stats.pl are
testing pg_stat_subscription_workers, can we move the tests to
026_worker_stats.pl. If possible the error_count validation can be
combined with the existing tests.
diff --git a/src/test/subscription/t/027_worker_xact_stats.pl
b/src/test/subscription/t/027_worker_xact_stats.pl
new file mode 100644
index 0000000..31dbea1
--- /dev/null
+++ b/src/test/subscription/t/027_worker_xact_stats.pl
@@ -0,0 +1,162 @@
+
+# Copyright (c) 2021, PostgreSQL Global Development Group
+
+# Tests for subscription worker statistics during apply.
+use strict;
+use warnings;
+use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::Utils;
+use Test::More tests => 1;
+
+# Create publisher node

Regards,
Vignesh



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Masahiko Sawada
Дата:
Сообщение: Re: Make pg_waldump report replication origin ID, LSN, and timestamp.
Следующее
От: Amit Langote
Дата:
Сообщение: Re: pg_get_publication_tables() output duplicate relid