Re: Report bytes and transactions actually sent downtream

Поиск
Список
Период
Сортировка
От Ashutosh Bapat
Тема Re: Report bytes and transactions actually sent downtream
Дата
Msg-id CAExHW5v8yBQXDLP2vPCE7rxQAP0hFCrG=ou=G8kVtBM5squYUg@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Report bytes and transactions actually sent downtream  (Amit Kapila <amit.kapila16@gmail.com>)
Ответы Re: Report bytes and transactions actually sent downtream
Список pgsql-hackers
On Sun, Jul 13, 2025 at 4:34 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Tue, Jul 1, 2025 at 7:35 PM Ashutosh Bapat
> <ashutosh.bapat.oss@gmail.com> wrote:
> >
> > On Tue, Jul 1, 2025 at 4:23 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > >
> > > On Mon, Jun 30, 2025 at 3:24 PM Ashutosh Bapat
> > > <ashutosh.bapat.oss@gmail.com> wrote:
> > > >
> > > > Hi All,
> > > > In a recent logical replication issue, there were multiple replication
> > > > slots involved, each using a different publication. Thus the amount of
> > > > data that was replicated through each slot was expected to be
> > > > different. However, total_bytes and total_txns were reported the same
> > > > for all the replication slots as expected. One of the slots started
> > > > lagging and we were trying to figure out whether its the WAL sender
> > > > slowing down or the consumer  (in this case Debezium). The lagging
> > > > slot then showed total_txns and total_bytes lesser than other slots
> > > > giving an impression that the WAL sender is processing the data
> > > > slowly. Had pg_stat_replication_slot reported the amount of data
> > > > actually sent downstream, it would have been easier to compare it with
> > > > the amount of data received by the consumer and thus pinpoint the
> > > > bottleneck.
> > > >
> > > > Here's a patch to do the same. It adds two columns
> > > >     - sent_txns: The total number of transactions sent downstream.
> > > >     - sent_bytes: The total number of bytes sent downstream in data messages
> > > > to pg_stat_replication_slots. sent_bytes includes only the bytes sent
> > > > as part of 'd' messages and does not include keep alive messages or
> > > > CopyDone messages for example. But those are very few and can be
> > > > ignored. If others feel that those are important to be included, we
> > > > can make that change.
> > > >
> > > > Plugins may choose not to send an empty transaction downstream. It's
> > > > better to increment sent_txns counter in the plugin code when it
> > > > actually sends a BEGIN message, for example in pgoutput_send_begin()
> > > > and pg_output_begin(). This means that every plugin will need to be
> > > > modified to increment the counter for it to reported correctly.
> > > >
> > >
> > > What if some plugin didn't implemented it or does it incorrectly?
> > > Users will then complain that PG view is showing incorrect value.
> >
> > That is right.
> >
> > To fix the problem of plugins not implementing the counter increment
> > logic we could use logic similar to how we track whether
> > OutputPluginPrepareWrite() has been called or not. In
> > ReorderBufferTxn, we add a new member sent_status which would be an
> > enum with 3 values UNKNOWN, SENT, NOT_SENT. Initially the sent_status
> > = UNKNOWN. We provide a function called
> > plugin_sent_txn(ReorderBufferTxn txn, sent bool) which will set
> > sent_status = SENT when sent = true and sent_status = NOT_SENT when
> > sent = false. In all the end transaction callback wrappers like
> > commit_cb_wrapper(), prepare_cb_wrapper(), stream_abort_cb_wrapper(),
> > stream_commit_cb_wrapper() and stream_prepare_cb_wrapper(), if
> > tsent_status = UNKNOWN, we throw an error.
> >
>
> I think we don't want to make it mandatory for plugins to implement
> these stats, so instead of throwing ERROR, the view should show that
> the plugin doesn't provide stats. How about having OutputPluginStats
> similar to OutputPluginCallbacks and OutputPluginOptions members in
> LogicalDecodingContext? It will have members like stats_available,
> txns_sent or txns_skipped, txns_filtered, etc.

Not making mandatory looks useful. I can try your suggestion. Rather
than having stats_available as a member of OutputPluginStats, it's
better to have a NULL value for the corresponding member in
LogicalDecodingContext. We don't want an output plugin to reset
stats_available once set. Will that work?

> I am thinking it will
> be better to provide this information in a separate view like
> pg_stat_plugin_stats or something like that, here we can report
> slot_name, plugin_name, then the other stats we want to implement part
> of OutputPluginStats.

As you have previously pointed out, the view should make it explicit
that the new stats are maintained by the plugin and not core. I agree
with that intention. However, already have three views
pg_replication_slots (which has slot name and plugin name), then
pg_replication_stats which is about stats maintained by a WAL sender
or running replication and then pg_stat_replication_slots, which is
about accumulated statistics for a replication through a given
replication slot. It's already a bit hard to keep track of who's who
when debugging an issue. Adding one more view will add to confusion.

Instead of adding a new view how about
a. name the columns as plugin_sent_txns, plugin_sent_bytes,
plugin_filtered_change_bytes to make it clear that these columns are
maintained by plugin
b. report these NULL if stats_available = false OR OutputPluginStats
is not set in LogicalDecodingContext
c. Document that NULL value for these columns indicates that the
plugin is not maintaining/reporting these stats
d. adding plugin name to pg_stat_replication_slots, that will make it
easy for users to know which plugin they should look at in case of
dubious or unavailable stats

--
Best Wishes,
Ashutosh Bapat



В списке pgsql-hackers по дате отправления: