Обсуждение: [PATCH] Expose checkpoint timestamp and duration in pg_stat_checkpointer

Поиск
Список
Период
Сортировка

[PATCH] Expose checkpoint timestamp and duration in pg_stat_checkpointer

От
Soumya S Murali
Дата:
Hi all,

While debugging checkpointer write behavior, I recently found some of the enhancements related to extending pg_stat_checkpointer by including checkpoint type (manual/timed/immediate), last_checkpoint_time and checkpoint_total_time information to checkpoint completion logs through SQL when `log_checkpoints` is enabled. I am attaching my observations, screenshots and patch in support for this.

1. Log for type of checkpoint occured:

2025-11-20 11:51:06.128 IST [18026] LOG:  checkpoint complete
(immediate): wrote 7286 buffers (44.5%), wrote 4 SLRU buffers; 0 WAL
file(s) added, 0 removed, 27 recycled; write=0.095 s, sync=0.034 s,
total=0.279 s; sync files=17, longest=0.004 s, average=0.002 s;
distance=447382 kB, estimate=531349 kB; lsn=0/7F4EDED8, redo
lsn=0/7F4EDE80

2. Log for the checkpoint_total_time and last_checkpoint_time:

 checkpoint_total_time |       last_checkpoint_time
-----------------------+----------------------------------
                175138 | 2025-11-20 11:58:02.879149+05:30
(1 row)

2025-11-20 11:58:02.879 IST [18026] LOG:  checkpoint complete
(immediate): wrote 0 buffers (0.0%), wrote 0 SLRU buffers; 0 WAL
file(s) added, 0 removed, 0 recycled; write=0.001 s, sync=0.001 s,
total=0.019 s; sync files=0, longest=0.000 s, average=0.000 s;
distance=0 kB, estimate=478214 kB; lsn=0/7F4EDFE0, redo lsn=0/7F4EDF88

Looking forward to more feedback.

Regards,
Soumya

Вложения

Re: [PATCH] Expose checkpoint timestamp and duration in pg_stat_checkpointer

От
Michael Banck
Дата:
Hi,

On Mon, Nov 24, 2025 at 11:40:44AM +0530, Soumya S Murali wrote:
> While debugging checkpointer write behavior, I recently found some of the
> enhancements related to extending pg_stat_checkpointer by including
> checkpoint type (manual/timed/immediate), last_checkpoint_time and
> checkpoint_total_time information to checkpoint completion logs through SQL
> when `log_checkpoints` is enabled. I am attaching my observations,
> screenshots and patch in support for this.
>
> 1. Log for type of checkpoint occured:
>
> 2025-11-20 11:51:06.128 IST [18026] LOG:  checkpoint complete
> (immediate): wrote 7286 buffers (44.5%), wrote 4 SLRU buffers; 0 WAL
> file(s) added, 0 removed, 27 recycled; write=0.095 s, sync=0.034 s,
> total=0.279 s; sync files=17, longest=0.004 s, average=0.002 s;
> distance=447382 kB, estimate=531349 kB; lsn=0/7F4EDED8, redo
> lsn=0/7F4EDE80

I think that'd be useful; the checkpoint complete log line clearly has
the more interesting output, and having it state the type would make it
easier to answer question like "how many buffers did the last wal-based
checkpoint write?

> 2. Log for the checkpoint_total_time and last_checkpoint_time:
>
>  checkpoint_total_time |       last_checkpoint_time
> -----------------------+----------------------------------
>                 175138 | 2025-11-20 11:58:02.879149+05:30
> (1 row)

Reading throught the patch, it looks like checkpoint_total_time is the
total time of the last checkpoint?

> +  proparallel => 'r', prorettype => 'float8', proargtypes => '',
> +  prosrc => 'pg_stat_get_checkpointer_checkpoint_total_time' },

If so, the naming is pretty confusing, last_checkpoint_duration or
something might be clearer.

In general I doubt how much those gauges (as oppposed to counters) only
pertaining to the last checkpoint are useful in pg_stat_checkpointer.
What would be the use case for those two values?

Also, as a nitpick, your patch adds unnecessary newlines and I think
stats_reset should be kept as last column in pg_stat_checkpointer as
usual.


Michael



Re: [PATCH] Expose checkpoint timestamp and duration in pg_stat_checkpointer

От
Álvaro Herrera
Дата:
On 2025-Nov-24, Michael Banck wrote:

> In general I doubt how much those gauges (as oppposed to counters) only
> pertaining to the last checkpoint are useful in pg_stat_checkpointer.
> What would be the use case for those two values?

I think it's useful to know how long checkpoint has to work.  It's a bit
lame to have only one duration (the last one), but at least with this
arrangement you can have external monitoring software connect to the
server, extract that value and save it somewhere else.  Monitoring
systems do this all the time, and we've been waiting for a better
implementation to store monitoring data inside Postgres for years.  I
think we shouldn't block this proposal just because of this issue,
because it can clearly be useful.

However, I'm not sure I'm very interested in knowing only the duration
of the checkpoint.  I mean, much of the time the duration is going to be
whatever fraction of the checkpoint timeout you have as
checkpoint_completion_target, right?  Which includes sleeps.  So I think
you really want two durations: one is the duration itself, and the other
is what fraction of that did the checkpointer sleep in order to achieve
that duration.  So you know how much time checkpointer spent trying to
get the operating system do stuff rather than just sit there waiting.
We already have that data, kinda, in write_time and sync_time, but those
are cumulative rather than just for the last one.  (I guess you can have
the monitoring system compute the deltas as it finds each new
checkpoint.)  I'm not sure how good this system is.

In the past, I looked at a couple of monitoring dashboards offered by
cloud vendors, searching for anything valuable in terms of checkpoints.
What I saw was very disappointing -- mostly just "how many checkpoints
per minute", which is mostly flat zero with periodic spikes.  Totally
useless.  Does anybody know if some vendor has good charts for this?
Also, if we were to add this new proposed duration, how could these
charts improve?

-- 
Álvaro Herrera        Breisgau, Deutschland  —  https://www.EnterpriseDB.com/
"How strange it is to find the words "Perl" and "saner" in such close
proximity, with no apparent sense of irony. I doubt that Larry himself
could have managed it."         (ncm, http://lwn.net/Articles/174769/)



Re: [PATCH] Expose checkpoint timestamp and duration in pg_stat_checkpointer

От
Michael Banck
Дата:
Hi,

On Mon, Nov 24, 2025 at 11:07:41AM +0100, Álvaro Herrera wrote:
> On 2025-Nov-24, Michael Banck wrote:
>
> > In general I doubt how much those gauges (as oppposed to counters) only
> > pertaining to the last checkpoint are useful in pg_stat_checkpointer.
> > What would be the use case for those two values?
>
> I think it's useful to know how long checkpoint has to work.  It's a bit
> lame to have only one duration (the last one), but at least with this
> arrangement you can have external monitoring software connect to the
> server, extract that value and save it somewhere else.  Monitoring
> systems do this all the time, and we've been waiting for a better
> implementation to store monitoring data inside Postgres for years.  I
> think we shouldn't block this proposal just because of this issue,
> because it can clearly be useful.

I don't know - what happens if the monitoring systems reads those values
every minute, but then suddenly Postgres checkpoints every 20 seconds
due to a traffic spike? It would just not see those additional
checkpoints in this case, no?

What monitoring systems do (have to do) is query write_time + sync_time
as total_time in pg_stat_checkpointer and store that along with the
timestamp of the query. Then you (maybe awkwardly) generate a graph of
the checkpoint durations over time.

> However, I'm not sure I'm very interested in knowing only the duration
> of the checkpoint.  I mean, much of the time the duration is going to be
> whatever fraction of the checkpoint timeout you have as
> checkpoint_completion_target, right? Which includes sleeps.

Yeah, that is the other thing I was wondering about, but did not mention
in my mail, good point.

> So I think you really want two durations: one is the duration itself,
> and the other is what fraction of that did the checkpointer sleep in
> order to achieve that duration.  So you know how much time
> checkpointer spent trying to get the operating system do stuff rather
> than just sit there waiting.  We already have that data, kinda, in
> write_time and sync_time, but those are cumulative rather than just
> for the last one.

I think that we either have "last timestamp whatever" or "total", but I
think we don't have "last duration" anywhere?

> (I guess you can have the monitoring system compute
> the deltas as it finds each new checkpoint.)  I'm not sure how good
> this system is.

Right, this is what I meant above. But from what I see on PG18,
total_time just seems tbe write_time + sync_time, do we have the sleep
somewhere?

> In the past, I looked at a couple of monitoring dashboards offered by
> cloud vendors, searching for anything valuable in terms of checkpoints.
> What I saw was very disappointing -- mostly just "how many checkpoints
> per minute", which is mostly flat zero with periodic spikes.  Totally
> useless.  Does anybody know if some vendor has good charts for this?
> Also, if we were to add this new proposed duration, how could these
> charts improve?

I don't have a good answer here.


Michael