Re: [PATCH] Expose checkpoint timestamp and duration in pg_stat_checkpointer
| From | Álvaro Herrera |
|---|---|
| Subject | Re: [PATCH] Expose checkpoint timestamp and duration in pg_stat_checkpointer |
| Date | |
| Msg-id | 202511240955.vt3fjrb4ksrs@alvherre.pgsql |
| In response to | Re: [PATCH] Expose checkpoint timestamp and duration in pg_stat_checkpointer (Michael Banck <mbanck@gmx.net>) |
| Responses | Re: [PATCH] Expose checkpoint timestamp and duration in pg_stat_checkpointer |
| List | pgsql-hackers |
On 2025-Nov-24, Michael Banck wrote:

> In general I doubt how much those gauges (as opposed to counters) only
> pertaining to the last checkpoint are useful in pg_stat_checkpointer.
> What would be the use case for those two values?

I think it's useful to know how long a checkpoint has to work. It's a bit lame to have only one duration (the last one), but at least with this arrangement you can have external monitoring software connect to the server, extract that value and save it somewhere else. Monitoring systems do this all the time, and we've been waiting for a better implementation to store monitoring data inside Postgres for years. I think we shouldn't block this proposal just because of this issue, because it can clearly be useful.

However, I'm not sure I'm very interested in knowing only the duration of the checkpoint. I mean, much of the time the duration is going to be whatever fraction of the checkpoint timeout you have as checkpoint_completion_target, right? Which includes sleeps. So I think you really want two durations: one is the duration itself, and the other is what fraction of that the checkpointer spent sleeping in order to achieve that duration. That way you know how much time the checkpointer spent trying to get the operating system to do stuff rather than just sitting there waiting. We already have that data, kinda, in write_time and sync_time, but those are cumulative rather than just for the last checkpoint. (I guess you can have the monitoring system compute the deltas as it finds each new checkpoint.)

I'm not sure how good this system is. In the past, I looked at a couple of monitoring dashboards offered by cloud vendors, searching for anything valuable in terms of checkpoints. What I saw was very disappointing -- mostly just "how many checkpoints per minute", which is mostly flat zero with periodic spikes. Totally useless. Does anybody know if some vendor has good charts for this? Also, if we were to add this newly proposed duration, how could these charts improve?
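To make the "compute the deltas" idea concrete, here is a minimal sketch of what a monitoring system might do with successive polls of pg_stat_checkpointer. The `Sample` and `Delta` types and the `checkpoint_deltas` function are hypothetical names for illustration, not part of any patch. The sketch assumes the caller fills `checkpoints` from a counter that advances when a checkpoint completes; which column suits that best depends on the server version.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Optional


@dataclass(frozen=True)
class Sample:
    """One poll of pg_stat_checkpointer (hypothetical container)."""
    checkpoints: int    # assumed: a counter that advances per completed checkpoint
    write_time: float   # cumulative write_time, in milliseconds
    sync_time: float    # cumulative sync_time, in milliseconds


@dataclass(frozen=True)
class Delta:
    """Time attributed to the checkpoints seen between two polls."""
    checkpoints: int
    write_ms: float
    sync_ms: float


def checkpoint_deltas(samples: Iterable[Sample]) -> Iterator[Delta]:
    """Yield one Delta each time the checkpoint counter advances."""
    prev: Optional[Sample] = None
    for cur in samples:
        if prev is None:
            prev = cur  # first poll only establishes the baseline
            continue
        n = cur.checkpoints - prev.checkpoints
        if n < 0:
            # Counter went backwards: statistics were reset, so start over.
            prev = cur
        elif n > 0:
            yield Delta(n,
                        cur.write_time - prev.write_time,
                        cur.sync_time - prev.sync_time)
            prev = cur
        # n == 0: no new checkpoint yet, keep the old baseline


if __name__ == "__main__":
    polls = [Sample(10, 1000, 50), Sample(10, 1000, 50), Sample(11, 1300, 60)]
    for d in checkpoint_deltas(polls):
        print(d)
```

If the polling interval is longer than the gap between checkpoints, a single `Delta` covers several checkpoints, which is exactly the information a "last checkpoint only" gauge would not lose.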
-- Álvaro Herrera Breisgau, Deutschland — https://www.EnterpriseDB.com/ "How strange it is to find the words "Perl" and "saner" in such close proximity, with no apparent sense of irony. I doubt that Larry himself could have managed it." (ncm, http://lwn.net/Articles/174769/)