Re: Monitoring gaps in XLogWalRcvWrite() for the WAL receiver
| From | Bertrand Drouvot | 
|---|---|
| Subject | Re: Monitoring gaps in XLogWalRcvWrite() for the WAL receiver | 
| Date | |
| Msg-id | Z8gFnH4o3jBm5BRz@ip-10-97-1-34.eu-west-3.compute.internal | 
| In response to | Monitoring gaps in XLogWalRcvWrite() for the WAL receiver (Michael Paquier <michael@paquier.xyz>) | 
| Responses | Re: Monitoring gaps in XLogWalRcvWrite() for the WAL receiver Re: Monitoring gaps in XLogWalRcvWrite() for the WAL receiver | 
| List | pgsql-hackers | 
Hi,

On Wed, Mar 05, 2025 at 12:35:26PM +0900, Michael Paquier wrote:
> Hi all,
> 
> While doing some monitoring of a replication setup for a stable
> branch, I have been surprised by the fact that we have never tracked
> WAL statistics for the WAL receiver in pg_stat_wal because we have
> never bothered to update its code so as WAL stats are reported.

Nice catch!

> This
> is relevant for the write and sync counts and timings.

Also for sync? Sync looks fine, since issue_xlog_fsync() is called in
XLogWalRcvFlush(), no?

> As of f4694e0f35b2, the situation is better thanks to the addition of
> a pgstat_report_wal() in the WAL receiver main loop, so we have some
> data.  However, we are only able to gather the data for segment syncs
> and initializations, not the writes themselves as these are managed by
> an independent code path, XLogWalRcvWrite().
> 
> A second thing that lacks in XLogWalRcvWrite() is a wait event around
> the pg_pwrite() call, which is useful as the WAL receiver is listed in
> pg_stat_activity.  Note that it is possible to re-use the same wait
> event as XLogWrite() for the WAL receiver, WAL_WRITE, because the WAL
> receiver does not rely on the write and flush calls from xlog.c when
> doing its work, and both have the same meaning, aka they write WAL.
> The fsync calls use issue_xlog_fsync() and the segment inits happen in
> XLogFileInit().
> 
> Perhaps there's a point in backpatching a portion of what's in the
> attached patch (the wait event?), but I am not planning to bother much
> with the stable branches based on the lack of complaints.

Some statistics are simply not emitted, so I think it's hard for users to
complain about something they can't see.

> If you
> have an opinion about that, please feel free.

I'm tempted to say that the WAL receiver part of f4694e0f35b2 should be
backpatched, along with what you're doing here.

+               /*
+                * Measure I/O timing to write WAL data, for pg_stat_io.
+                */
+               start = pgstat_prepare_io_time(track_wal_io_timing);
+
+               pgstat_report_wait_start(WAIT_EVENT_WAL_WRITE);
                byteswritten = pg_pwrite(recvFile, buf, segbytes, (off_t) startoff);
+               pgstat_report_wait_end();

This is the same logic as in XLogWrite(), and I don't think a dedicated
wait event is needed, so LGTM.
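
For readers less familiar with the pattern in the hunk above, here is a minimal standalone sketch of "start a timer, flag a wait event, write, clear the wait event, account for the I/O". It is not PostgreSQL code: every `demo_` name and the `DemoWalStats` struct are hypothetical stand-ins for the pgstat machinery, the "WAL_WRITE" string is only a label, and the plain POSIX pwrite() stands in for pg_pwrite().

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical counters, loosely mirroring write count, bytes and timing. */
typedef struct DemoWalStats
{
    uint64_t writes;        /* number of successful write calls */
    uint64_t bytes;         /* total bytes written */
    double   write_time_us; /* accumulated write time, if tracked */
} DemoWalStats;

/* Hypothetical stand-in for the "current wait event" of a process. */
static const char *demo_current_wait = NULL;

static void demo_wait_start(const char *event) { demo_current_wait = event; }
static void demo_wait_end(void) { demo_current_wait = NULL; }

/* Monotonic clock in microseconds. */
static double
demo_now_us(void)
{
    struct timespec ts;

    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double) ts.tv_sec * 1e6 + (double) ts.tv_nsec / 1e3;
}

/*
 * Write "len" bytes at offset "off", flagging a wait event around the
 * system call and updating the counters afterwards. Timing is only
 * measured when "track_timing" is nonzero, so the clock calls can be
 * skipped when the caller does not want them.
 */
static ssize_t
demo_timed_pwrite(int fd, const void *buf, size_t len, off_t off,
                  int track_timing, DemoWalStats *stats)
{
    double  start = track_timing ? demo_now_us() : 0.0;
    ssize_t written;

    assert(stats != NULL);

    demo_wait_start("WAL_WRITE");
    written = pwrite(fd, buf, len, off);
    demo_wait_end();

    /* Only successful writes are counted in this sketch. */
    if (written > 0)
    {
        stats->writes++;
        stats->bytes += (uint64_t) written;
        if (track_timing)
            stats->write_time_us += demo_now_us() - start;
    }
    return written;
}
```

The wait event brackets only the system call itself, so a monitoring query would see the process as waiting on the write and nothing else. Whether failed or partial writes should be counted is a choice this sketch makes for simplicity.
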
Regards,
-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com