Обсуждение: [PATCH] Expose checkpoint timestamp and duration in pg_stat_checkpointer
Hi all,
While debugging checkpointer write behavior, I recently found some of the enhancements related to extending pg_stat_checkpointer by including checkpoint type (manual/timed/immediate), last_checkpoint_time and checkpoint_total_time information to checkpoint completion logs through SQL when `log_checkpoints` is enabled. I am attaching my observations, screenshots and patch in support for this.
1. Log for type of checkpoint occured:
2025-11-20 11:51:06.128 IST [18026] LOG: checkpoint complete
(immediate): wrote 7286 buffers (44.5%), wrote 4 SLRU buffers; 0 WAL
file(s) added, 0 removed, 27 recycled; write=0.095 s, sync=0.034 s,
total=0.279 s; sync files=17, longest=0.004 s, average=0.002 s;
distance=447382 kB, estimate=531349 kB; lsn=0/7F4EDED8, redo
lsn=0/7F4EDE80
2. Log for the checkpoint_total_time and last_checkpoint_time:
checkpoint_total_time | last_checkpoint_time
-----------------------+----------------------------------
175138 | 2025-11-20 11:58:02.879149+05:30
(1 row)
While debugging checkpointer write behavior, I recently found some of the enhancements related to extending pg_stat_checkpointer by including checkpoint type (manual/timed/immediate), last_checkpoint_time and checkpoint_total_time information to checkpoint completion logs through SQL when `log_checkpoints` is enabled. I am attaching my observations, screenshots and patch in support for this.
1. Log for type of checkpoint occured:
2025-11-20 11:51:06.128 IST [18026] LOG: checkpoint complete
(immediate): wrote 7286 buffers (44.5%), wrote 4 SLRU buffers; 0 WAL
file(s) added, 0 removed, 27 recycled; write=0.095 s, sync=0.034 s,
total=0.279 s; sync files=17, longest=0.004 s, average=0.002 s;
distance=447382 kB, estimate=531349 kB; lsn=0/7F4EDED8, redo
lsn=0/7F4EDE80
2. Log for the checkpoint_total_time and last_checkpoint_time:
checkpoint_total_time | last_checkpoint_time
-----------------------+----------------------------------
175138 | 2025-11-20 11:58:02.879149+05:30
(1 row)
2025-11-20 11:58:02.879 IST [18026] LOG: checkpoint complete
(immediate): wrote 0 buffers (0.0%), wrote 0 SLRU buffers; 0 WAL
file(s) added, 0 removed, 0 recycled; write=0.001 s, sync=0.001 s,
total=0.019 s; sync files=0, longest=0.000 s, average=0.000 s;
distance=0 kB, estimate=478214 kB; lsn=0/7F4EDFE0, redo lsn=0/7F4EDF88
(immediate): wrote 0 buffers (0.0%), wrote 0 SLRU buffers; 0 WAL
file(s) added, 0 removed, 0 recycled; write=0.001 s, sync=0.001 s,
total=0.019 s; sync files=0, longest=0.000 s, average=0.000 s;
distance=0 kB, estimate=478214 kB; lsn=0/7F4EDFE0, redo lsn=0/7F4EDF88
Looking forward to more feedback.
Regards,
Soumya
Regards,
Soumya
Вложения
Hi, On Mon, Nov 24, 2025 at 11:40:44AM +0530, Soumya S Murali wrote: > While debugging checkpointer write behavior, I recently found some of the > enhancements related to extending pg_stat_checkpointer by including > checkpoint type (manual/timed/immediate), last_checkpoint_time and > checkpoint_total_time information to checkpoint completion logs through SQL > when `log_checkpoints` is enabled. I am attaching my observations, > screenshots and patch in support for this. > > 1. Log for type of checkpoint occured: > > 2025-11-20 11:51:06.128 IST [18026] LOG: checkpoint complete > (immediate): wrote 7286 buffers (44.5%), wrote 4 SLRU buffers; 0 WAL > file(s) added, 0 removed, 27 recycled; write=0.095 s, sync=0.034 s, > total=0.279 s; sync files=17, longest=0.004 s, average=0.002 s; > distance=447382 kB, estimate=531349 kB; lsn=0/7F4EDED8, redo > lsn=0/7F4EDE80 I think that'd be useful; the checkpoint complete log line clearly has the more interesting output, and having it state the type would make it easier to answer question like "how many buffers did the last wal-based checkpoint write? > 2. Log for the checkpoint_total_time and last_checkpoint_time: > > checkpoint_total_time | last_checkpoint_time > -----------------------+---------------------------------- > 175138 | 2025-11-20 11:58:02.879149+05:30 > (1 row) Reading throught the patch, it looks like checkpoint_total_time is the total time of the last checkpoint? > + proparallel => 'r', prorettype => 'float8', proargtypes => '', > + prosrc => 'pg_stat_get_checkpointer_checkpoint_total_time' }, If so, the naming is pretty confusing, last_checkpoint_duration or something might be clearer. In general I doubt how much those gauges (as oppposed to counters) only pertaining to the last checkpoint are useful in pg_stat_checkpointer. What would be the use case for those two values? Also, as a nitpick, your patch adds unnecessary newlines and I think stats_reset should be kept as last column in pg_stat_checkpointer as usual. Michael
Re: [PATCH] Expose checkpoint timestamp and duration in pg_stat_checkpointer
От
Álvaro Herrera
Дата:
On 2025-Nov-24, Michael Banck wrote: > In general I doubt how much those gauges (as oppposed to counters) only > pertaining to the last checkpoint are useful in pg_stat_checkpointer. > What would be the use case for those two values? I think it's useful to know how long checkpoint has to work. It's a bit lame to have only one duration (the last one), but at least with this arrangement you can have external monitoring software connect to the server, extract that value and save it somewhere else. Monitoring systems do this all the time, and we've been waiting for a better implementation to store monitoring data inside Postgres for years. I think we shouldn't block this proposal just because of this issue, because it can clearly be useful. However, I'm not sure I'm very interested in knowing only the duration of the checkpoint. I mean, much of the time the duration is going to be whatever fraction of the checkpoint timeout you have as checkpoint_completion_target, right? Which includes sleeps. So I think you really want two durations: one is the duration itself, and the other is what fraction of that did the checkpointer sleep in order to achieve that duration. So you know how much time checkpointer spent trying to get the operating system do stuff rather than just sit there waiting. We already have that data, kinda, in write_time and sync_time, but those are cumulative rather than just for the last one. (I guess you can have the monitoring system compute the deltas as it finds each new checkpoint.) I'm not sure how good this system is. In the past, I looked at a couple of monitoring dashboards offered by cloud vendors, searching for anything valuable in terms of checkpoints. What I saw was very disappointing -- mostly just "how many checkpoints per minute", which is mostly flat zero with periodic spikes. Totally useless. Does anybody know if some vendor has good charts for this? Also, if we were to add this new proposed duration, how could these charts improve? -- Álvaro Herrera Breisgau, Deutschland — https://www.EnterpriseDB.com/ "How strange it is to find the words "Perl" and "saner" in such close proximity, with no apparent sense of irony. I doubt that Larry himself could have managed it." (ncm, http://lwn.net/Articles/174769/)
Hi, On Mon, Nov 24, 2025 at 11:07:41AM +0100, Álvaro Herrera wrote: > On 2025-Nov-24, Michael Banck wrote: > > > In general I doubt how much those gauges (as oppposed to counters) only > > pertaining to the last checkpoint are useful in pg_stat_checkpointer. > > What would be the use case for those two values? > > I think it's useful to know how long checkpoint has to work. It's a bit > lame to have only one duration (the last one), but at least with this > arrangement you can have external monitoring software connect to the > server, extract that value and save it somewhere else. Monitoring > systems do this all the time, and we've been waiting for a better > implementation to store monitoring data inside Postgres for years. I > think we shouldn't block this proposal just because of this issue, > because it can clearly be useful. I don't know - what happens if the monitoring systems reads those values every minute, but then suddenly Postgres checkpoints every 20 seconds due to a traffic spike? It would just not see those additional checkpoints in this case, no? What monitoring systems do (have to do) is query write_time + sync_time as total_time in pg_stat_checkpointer and store that along with the timestamp of the query. Then you (maybe awkwardly) generate a graph of the checkpoint durations over time. > However, I'm not sure I'm very interested in knowing only the duration > of the checkpoint. I mean, much of the time the duration is going to be > whatever fraction of the checkpoint timeout you have as > checkpoint_completion_target, right? Which includes sleeps. Yeah, that is the other thing I was wondering about, but did not mention in my mail, good point. > So I think you really want two durations: one is the duration itself, > and the other is what fraction of that did the checkpointer sleep in > order to achieve that duration. So you know how much time > checkpointer spent trying to get the operating system do stuff rather > than just sit there waiting. We already have that data, kinda, in > write_time and sync_time, but those are cumulative rather than just > for the last one. I think that we either have "last timestamp whatever" or "total", but I think we don't have "last duration" anywhere? > (I guess you can have the monitoring system compute > the deltas as it finds each new checkpoint.) I'm not sure how good > this system is. Right, this is what I meant above. But from what I see on PG18, total_time just seems tbe write_time + sync_time, do we have the sleep somewhere? > In the past, I looked at a couple of monitoring dashboards offered by > cloud vendors, searching for anything valuable in terms of checkpoints. > What I saw was very disappointing -- mostly just "how many checkpoints > per minute", which is mostly flat zero with periodic spikes. Totally > useless. Does anybody know if some vendor has good charts for this? > Also, if we were to add this new proposed duration, how could these > charts improve? I don't have a good answer here. Michael