Discussion: [PATCH] Add last_executed timestamp to pg_stat_statements
Hello,

I would like to propose adding a last_executed timestamptz column to pg_stat_statements. This column records when each tracked statement was most recently executed.

The motivation comes from real-world experience with monitoring tools like pgwatch that poll pg_stat_statements regularly. Currently, these tools must fetch and store statistics for all statements, even those that haven't executed recently. This creates significant storage overhead. For a database with around 3400 statements polled every 3 minutes, storing full query text requires roughly 2.5 MB per snapshot. Over two weeks, this accumulates to about 17 GB. Even without query text, storage reaches 10 GB.

With a last_executed timestamptz, monitoring tools can simply filter statements by "last_executed > NOW() - polling_interval" to fetch only statements that have been executed since the last poll. This eliminates the need for complex workarounds that some tools currently use to identify changed statements (https://github.com/cybertec-postgresql/pgwatch/blob/759df3a149cbbe973165547186068aa7b5332f9d/internal/metrics/metrics.yaml#L2605-L2766).

Beyond monitoring efficiency, the timestamp enables other useful queries. You can find statements that haven't executed in 30 days to identify deprecated code paths. You can correlate statement execution with specific time windows during incident investigation. You can also make informed decisions about which statistics to reset.

The implementation is straightforward. The timestamp is stored in the Counters structure and updated on every statement execution, protected by the existing spinlock. The overhead is minimal: just a single timestamp assignment per execution. The timestamp persists with other statistics across server restarts. I've bumped the stats file format version to handle the structure change cleanly. The patch includes a new pg_stat_statements_1_14 function, the upgrade script from 1.13 to 1.14, and regression tests.
All existing tests continue to pass. I believe this is a simple addition that addresses a real pain point for database monitoring and provides useful functionality for understanding query patterns over time. Thanks in advance! Attached patch applies cleanly to the current master.
Attachments
Hi,

Thanks for raising this. I did not look at the patch, but I have some high-level comments.

> I would like to propose adding a last_executed timestamptz column to
> pg_stat_statements. This column records when each tracked statement
> was most recently executed.

I do think there is value in adding a last_executed timestamp. I actually think last_executed should be the timestamp the query started, so we should actually create an entry at ExecutorStart, along with calls_started and calls_completed. This is great for tracking cancelled queries. The issue is the extra overhead of tracking the query at ExecutorStart, but that should be less of an issue once we move pg_stat_statements to the cumulative statistics system, which will be possible once we get some prerequisite work to make this happen [0].

Another concern is the width of the current view. I think before we add any new attribute, the pg_stat_statements fields should be split. This was discussed in [1].

> The motivation comes from real world experience with monitoring tools
> like pgwatch that poll pg_stat_statements regularly. Currently, these
> tools must fetch and store statistics for all statements, even those
> that haven't executed recently. This creates significant storage
> overhead. For a database with around 3400 statements polled every 3
> minutes, storing full query text requires roughly 2.5 MB per snapshot.
> Over two weeks, this accumulates to about 17 GB. Even without query
> text, storage reaches 10 GB.
>
> With a last_executed timestamptz, monitoring tools can simply filter
> statements by "last_executed > NOW() - polling_interval" to fetch only
> statements that have been executed since the last poll. This
> eliminates the need for complex workarounds that some tools currently
> use to identify changed statements
> (https://github.com/cybertec-postgresql/pgwatch/blob/759df3a149cbbe973165547186068aa7b5332f9d/internal/metrics/metrics.yaml#L2605-L2766).
Can pg_stat_statements.stats_since help here? For example, "where stats_since > last_poll_timestamp"? The client does have to track the last_poll_timestamp in that case.

[0] https://www.postgresql.org/message-id/flat/CAA5RZ0s9SDOu+Z6veoJCHWk+kDeTktAtC-KY9fQ9Z6BJdDUirQ@mail.gmail.com
[1] https://www.postgresql.org/message-id/03f82e6f-66a3-4c4d-935c-ea4d93871dc1%40gmail.com

--
Sami Imseih
Amazon Web Services (AWS)
Hi,

> Hi,
>
> Thanks for raising this. I did not look at the patch, but I have some high
> level comments.
>
>> I would like to propose adding a last_executed timestamptz column to
>> pg_stat_statements. This column records when each tracked statement
>> was most recently executed.
>
> I do think there is value in adding a last_executed timestamp.

Thanks for your support!

> Can pg_stat_statements.stats_since help here?
>
> for example "where stats_since > last_poll_timestamp" ?

Actually no, monitoring tools fetch snapshots to find the difference between snapshots. Data for every statement changes after each execution. But stats_since is inserted only once, when the new statement first appears, and is never updated during subsequent executions.

> The client does have to track the last_poll_timestamp in that
> case.
>
> [0] https://www.postgresql.org/message-id/flat/CAA5RZ0s9SDOu+Z6veoJCHWk+kDeTktAtC-KY9fQ9Z6BJdDUirQ@mail.gmail.com
> [1] https://www.postgresql.org/message-id/03f82e6f-66a3-4c4d-935c-ea4d93871dc1%40gmail.com
>
> --
> Sami Imseih
> Amazon Web Services (AWS)
> >Can pg_stat_statements.stats_since help here?
> >
> >for example "where stats_since > last_poll_timestamp" ?
>
> Actually no, monitoring tools fetch snapshots to find the difference
> between snapshots.
> Data for every statement changes after each execution.
>
> But stats_since is inserted only once, when the new statement first
> appears, and is never updated during subsequent executions.
I was thinking of using stats_since to avoid fetching query text,
since that does not change. But you are talking about avoiding all
the stats if they have not changed. I see that now.
FWIW, this was discussed back in 2017 [0], and at that time there was
some support for last_executed, but the patch did not go anywhere.
After looking at the patch, I have a few comments:
1/ There are whitespace errors when applying.
2/ Calling GetCurrentTimestamp() while holding a spinlock is
not a good idea and should be avoided. This was also a point
raised in [0]. Even when we move pg_stat_statements
to cumulative stats and are no longer at the mercy of the spinlock for
updating entries, I would still hesitate to add an additional
GetCurrentTimestamp() for every call.
I wonder if we can use GetCurrentStatementStartTimestamp()
instead?
```
/*
* GetCurrentStatementStartTimestamp
*/
TimestampTz
GetCurrentStatementStartTimestamp(void)
{
return stmtStartTimestamp;
}
```
stmtStartTimestamp is the time the query started, which seems OK for
the use-case you are mentioning. But also, stmtStartTimestamp gets
set at the top level, so nested entries (toplevel = false) will just
inherit the timestamp of the top-level entry.
IMO, this is the most important point in the patch for now.
3/ last_executed, or maybe last_toplevel_start if we go with #2, should not
be added under pgssEntry->Counters, but rather directly under pgssEntry.
@@ -213,6 +214,7 @@ typedef struct Counters
 				 * launched */
 	int64		generic_plan_calls; /* number of calls using a generic plan */
 	int64		custom_plan_calls;	/* number of calls using a custom plan */
+	TimestampTz	last_executed;	/* timestamp of last statement execution */
 } Counters;
4/ Instead of a separate "last_executed" test, maybe the tests should be
added to entry_timestamp.sql?
[0] https://www.postgresql.org/message-id/flat/CA%2BTgmoZgZMeuN8t9pawSt6M%3DmvxKiAZ4CvPofBWwwVWeZwHe4w@mail.gmail.com#beeebe3ca4a3dcda4ed625f7c15bb2d8
--
Sami Imseih
Amazon Web Services (AWS)
Re: Sami Imseih

> I wonder if we can use GetCurrentStatementStartTimestamp()
> instead?

The main use case for this column is being able to retrieve the stats that were updated since the last time one was looking. That only works if it's the statement end time, or else long-running statements spanning more than one poll interval would be missed.

Perhaps the column should rather be called "stats_last_updated" to match "stats_since", and be moved to the very end? Similarly, nested statements would also have to get that stamp.

Oh and, yes, I'm definitely +1 on this feature.

Christoph
> > I wonder if we can use GetCurrentStatementStartTimestamp()
> > instead?
>
> The main use case for this column is being able to retrieve the stats
> that were updated since the last time one was looking. That only works
> if it's the statement end time, or else long-running statements
> spanning more than one poll interval would be missed.

Sure, I get that it is not perfect for the polling use-case due to the scenario you mention, but I don't think it will be acceptable to call GetCurrentTimestamp() at the end of every execution, especially with a spinlock held. This will probably be worse for nested tracking as well, due to multiple GetCurrentTimestamp() calls.

What do you think?

--
Sami Imseih
Amazon Web Services (AWS)
Hi,

On Tue, Feb 03, 2026 at 08:37:31PM -0600, Sami Imseih wrote:
> > > I wonder if we can use GetCurrentStatementStartTimestamp()
> > > instead?
> >
> > The main use case for this column is being able to retrieve the stats
> > that were updated since the last time one was looking. That only works
> > if it's the statement end time, or else long-running statements
> > spanning more than one poll interval would be missed.
>
> Sure, I get it is not perfect for the polling use-case due to the scenario
> you mention, but I don't think it will be acceptable to call
> GetCurrentTimeStamp() at the end of every execution and especially
> with a SpinLock held.

I think the same; that would not match (lmgr/README):

"
* Spinlocks. These are intended for *very* short-term locks. If a lock
is to be held more than a few dozen instructions, or across any sort of
kernel call
"

Out of curiosity I looked for GetCurrentTimestamp() calls while holding a spinlock and found one in WalReceiverMain(). But I guess it's less of an issue since it's only called when the walreceiver starts.

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
Hi all,

Thank you to Sami, Christoph, and Bertrand for the thorough review and valuable feedback on v1. I've prepared a v2 patch that addresses all the concerns raised.

You're absolutely right that calling `GetCurrentTimestamp()` while holding a spinlock is unacceptable. I've adopted Sami's suggestion to use `GetCurrentStatementStartTimestamp()` instead. While Christoph correctly points out that this means we're capturing statement start time rather than end time, I believe this is an acceptable trade-off for several reasons:

- the vast majority of statements complete quickly (well under typical polling intervals of 1-5 minutes);
- long-running queries will appear in the next poll anyway;
- deferring timestamp updates adds significant complexity;
- and most importantly, zero performance overhead vs a kernel call per execution.

For the monitoring use case, what matters is knowing: has this statement been active since my last poll? The start time serves this purpose effectively.

I've renamed the column to `stats_last_updated` as Christoph suggested. This matches the existing "stats_since" column for consistency. Following Christoph's suggestion, I've also moved it to the end of the view.

Per Sami's feedback, I've moved the timestamp from the `Counters` struct directly to `pgssEntry`. This is more semantically correct since it's metadata about the entry rather than a statistical counter. I've updated `PGSS_FILE_HEADER` to 0x20260205.

Among other changes:

- fixed all whitespace errors
- moved tests to `entry_timestamp.sql` as suggested
- added a documentation notice about nested statement behavior
- the timestamp is now set outside the spinlock-protected section

Sami mentioned this was discussed in 2017 [0]. Looking at that thread, the main concerns were overhead and use cases. I believe we've addressed both:

- zero overhead by using the existing statement start timestamp
- a clear, documented use case from production monitoring tools (pgwatch, etc.)
[0] https://www.postgresql.org/message-id/flat/CA+TgmoZgZMeuN8t9pawSt6M=mvxKiAZ4CvPofBWwwVWeZwHe4w@mail.gmail.com

I've attached the v2 patch. All existing tests pass. I'd appreciate further review and feedback.

Thanks,
Pavlo Golub
Attachments
> Thank you to Sami, Christoph, and Bertrand for the thorough review and valuable
> feedback on v1. I've prepared a v2 patch that addresses all the concerns raised.

Thanks for the patch! I have not looked at v2 in detail yet. Did take a quick peek at the doc. Some comments:

> I've renamed the column to `stats_last_updated` as Christoph suggested. This
> matches the existing "stats_since" column for consistency. Following Christoph's
> suggestion, I've also moved it to the end of the view.

I still wonder if "stats_last_updated" is a good name here. What about "last_execution_start", since that is exactly what this timestamp is.

+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>stats_last_updated</structfield> <type>timestamp with time zone</type>
+      </para>
+      <para>
+       Time at which the statement statistics were last updated (specifically,
+       the time when the statement most recently started execution).

Here I think we can just say: "The start time of the most recent execution of the statement that completed."

+       This is useful for monitoring tools to identify which statements
+       have been executed since their last poll.

I am not sure we need this part for the docs. Others may disagree.

+       For nested statements (when <varname>pg_stat_statements.track</varname>
+       is set to <literal>all</literal>), this reflects the start time of the
+       parent top-level statement.
+      </para></entry>

Maybe this is better, as it mentions "toplevel":

"For nested statements (toplevel = false), this reflects the start time of the top-level statement."

What do you think?

--
Sami Imseih
Amazon Web Services (AWS)
Hey,

> I still wonder if "stats_last_updated" is a good name here. What about
> "last_execution_start", since that is exactly what this timestamp is.

Yeah, sounds better really. Thanks.

> Here I think we can just say:
>
> "The start time of the most recent execution of the statement that completed."

+1

> + This is useful for monitoring tools to identify which statements
> + have been executed since their last poll.
>
> I am not sure we need this part for the docs. Others may disagree.

Agreed, too much detail.

> Maybe this is better, as it mentions "toplevel":
>
> "For nested statements (toplevel = false), this reflects the start time
> of the top-level statement."

I like it. Looks easier to read.

> What do you think?

Thanks, I'll prepare an updated patch.

> --
> Sami Imseih
> Amazon Web Services (AWS)
Re: Pavlo Golub
> > I still wonder if "stats_last_updated" is a good name here. What about
> > "last_execution_start", since that is exactly what this timestamp is.
>
> Yeah, sounds better really. Thanks
I still believe that storing execution start time is the wrong thing
to do as it will miss all long-running statements. Consider this
timeline:
09:55 get all stats changed since 09:50
-> doesn't see the statement because it hasn't started yet
09:57 start long-running statement
10:00 get all stats changed since 09:55
-> doesn't see the statement because it's still running
10:02 long-running statement terminates, storing 09:57 as timestamp
10:05 get all stats changed since 10:00
-> doesn't see the statement because it's too old
Christoph
> I still believe that storing execution start time is the wrong thing
> to do as it will miss all long-running statements. Consider this
> timeline:
>
> 09:55 get all stats changed since 09:50
>   -> doesn't see the statement because it hasn't started yet
> 09:57 start long-running statement
> 10:00 get all stats changed since 09:55
>   -> doesn't see the statement because it's still running
> 10:02 long-running statement terminates, storing 09:57 as timestamp
> 10:05 get all stats changed since 10:00
>   -> doesn't see the statement because it's too old

Thanks for sharing this. I spent a sleepless night and it seems I found a solution for the issue. If we do

WHERE last_execution_start + max_exec_time * INTERVAL '1 ms' > NOW() - polling_interval

we will grab all long-running statements. The worst thing that might happen, if some query has significant run-time deviation, is that we grab it several times. But again, this is far better than re-fetching thousands of rows every time. For short queries it will just work as expected. "Issues" might only arise with long queries whose deviation exceeds the polling interval. But fetching one or two extra rows once in a while is not a problem.
Re: Pavlo Golub

> If we do
>
> WHERE last_execution_start + max_exec_time * INTERVAL '1 ms' > NOW() -
> polling_interval

Is this extra complexity worth one saved GetCurrentTimestamp()?

src/backend/access/transam/xact.c is calling GetCurrentTimestamp a lot already, so I don't really buy the argument that it should be avoided at all cost in pg_stat_statements. Just storing the statement end time would make this use case much nicer.

Christoph
> Is this extra complexity worth one saved GetCurrentTimestamp()?

I'm not sure really. But I really need a new column in one way or another, so I can agree to whatever decision we'll make.

> src/backend/access/transam/xact.c is calling GetCurrentTimestamp a
> lot already, so I don't really buy the argument it should be avoided
> at all cost in pg_stat_statements. Just storing the statement end time
> would make this use case much nicer.

+1

Maybe just moving GetCurrentTimestamp() outside of the spinlock is enough already?

> Christoph
OK, here is one more try. I discovered the `total_time` argument to the `pgss_store()` function! So we can calculate the finish time without calling `GetCurrentTimestamp()`.

This is version 3 of the patch adding a `stats_last_updated` column (yes, again) to pg_stat_statements. Based on feedback, this version improves the implementation with better performance and correctness. The main improvement uses `statement_start + execution_duration`, with `rint(total_time * 1000.0)` to convert milliseconds to microseconds with proper rounding. The calculation is performed before acquiring the spinlock and only the assignment happens within the locked scope.

I'm wondering how we all missed this trick from the very beginning and started arguing about whether `GetCurrentTimestamp()` is heavy or not. :)

Best regards,
Pavlo
Attachments
> > WHERE last_execution_start + max_exec_time * INTERVAL '1 ms' > NOW() -
> > polling_interval
>
> Is this extra complexity worth one saved GetCurrentTimestamp()?
>
> src/backend/access/transam/xact.c is calling GetCurrentTimestamp a
> lot already, so I don't really buy the argument it should be avoided
> at all cost in pg_stat_statements. Just storing the statement end time
> would make this use case much nicer.
We do call GetCurrentTimestamp() to set timestamps for xact_start, etc.
But, we also take special care to avoid extra calls.
```
	/*
	 * set transaction_timestamp() (a/k/a now()).  Normally, we want this to
	 * be the same as the first command's statement_timestamp(), so don't do a
	 * fresh GetCurrentTimestamp() call (which'd be expensive anyway).  But
	 * for transactions started inside procedures (i.e., nonatomic SPI
	 * contexts), we do need to advance the timestamp.  Also, in a parallel
	 * worker, the timestamp should already have been provided by a call to
	 * SetParallelStartTimestamps().
	 */
	if (!IsParallelWorker())
	{
		if (!SPI_inside_nonatomic_context())
			xactStartTimestamp = stmtStartTimestamp;
		else
			xactStartTimestamp = GetCurrentTimestamp();
```
> OK, here is one more try. I discovered the `total_time` argument to
> the `pgss_store()` function! So we can calculate the finish time
> without calling `GetCurrentTimestamp()`.
>
> This is version 3 of the patch adding a `stats_last_updated` column
> (yes, again) to pg_stat_statements. Based on feedback, this version
> improves the implementation with better performance and correctness.
>
> The main improvement uses `statement_start + execution_duration` with
> `rint(total_time * 1000.0)` to convert milliseconds to microseconds
> with proper rounding. The calculation performed BEFORE acquiring
> spinlock and assigned within locked scope.
Correct, this also crossed my mind. Although I would consider doing things a bit
differently. Instead of calculating the end time into a single column, I still
think it's worth having a last_execution_start and a last_execution_time
(duration), and the user can calculate the end time on the SQL layer. I think
it's better because last_execution_start is already a known timestamp in
pg_stat_activity.query_start, and some tool that finds a long-running query in
pg_stat_activity, knowing the query_start, could then go look it up in
pg_stat_statements. What I'm really getting at is that separating these fields
will open up more use cases, IMO. Of course, we are adding 2 new columns instead
of just 1, but these values do have benefits.
--
Sami Imseih
Amazon Web Services (AWS)
Re: Pavlo Golub

> How about
>
> TimestampTz stmt_end = TimestampTzPlusMilliseconds(
>     GetCurrentStatementStartTimestamp(),
>     (int64) total_time
> );
>
> We have total_time as an argument already! No kernel calls, sweet and easy!

Cool idea!

Re: Sami Imseih

> I think it's better because last_execution_start is already a known
> timestamp in pg_stat_activity.query_start and some tool that finds a
> long running query in pg_stat_activity, knowing the
> query_start they could then go look it up in pg_stat_statements.

That only works if a) the query was not yet overwritten in pg_stat_activity and b) neither in pg_stat_statements. Optimizing for that use case seems pretty narrow.

> What I'm really getting at is separating these fields will open up
> more use cases, IMO.

Generally, I think pgss should be about cumulative statistics, and less about individual executions, so I'm not really sure what practical problem "last start" and "last runtime" would solve. The last_stats_update column we are talking about here is different in the sense that it's not about an individual execution, but infrastructure for retrieving the stats sensibly.

Christoph
> > How about
> > TimestampTz stmt_end = TimestampTzPlusMilliseconds(
> > GetCurrentStatementStartTimestamp(),
> > (int64) total_time
> > );
> > We have total_time as an argument already! No kernel calls, sweet and easy!
>
> Cool idea!
This calculation could be wrong for very common cases in the extended query
protocol. Here is a script to test with:
```
select pg_stat_statements_reset();
BEGIN;
select now() as now, clock_timestamp() as clock_timestamp,
pg_sleep($1) \bind 10 \g
\! sleep 10
SELECT now() as now, clock_timestamp() as clock_timestamp, $1 \bind 1 \g
END;
select stats_last_updated, total_exec_time, substr(query, 1, 150) as
query from pg_stat_statements;
```
With v3 applied, notice the output is calculating a stats_last_updated
that is beyond the current time
```
pg_stat_statements_reset
-------------------------------
2026-02-09 16:13:35.188849+00
(1 row)
BEGIN
now | clock_timestamp | pg_sleep
------------------------------+-------------------------------+----------
2026-02-09 16:13:35.18911+00 | 2026-02-09 16:13:35.189397+00 |
(1 row)
now | clock_timestamp | ?column?
------------------------------+-------------------------------+----------
2026-02-09 16:13:35.18911+00 | 2026-02-09 16:13:55.193443+00 | 1
(1 row)
COMMIT
      stats_last_updated       | total_exec_time |                                  query
-------------------------------+-----------------+-------------------------------------------------------------------------
 2026-02-09 16:13:55.19367+00  |        0.007401 | SELECT now() as now, clock_timestamp() as clock_timestamp, $1
 2026-02-09 16:13:55.193664+00 |         0.00103 | END
 2026-02-09 16:13:35.189111+00 |         0.00098 | BEGIN
 2026-02-09 16:13:35.188584+00 |        0.090183 | select pg_stat_statements_reset()
 2026-02-09 16:14:05.194134+00 |    10000.751122 | select now() as now, clock_timestamp() as clock_timestamp, pg_sleep($1)
(5 rows)
```
This happens because in the case of the extended query protocol, ExecutorEnd
is called at the next query. This has been discussed in [1] [2].

So, for this to work, we will likely need to store the query start time in the
queryDesc, actually queryDesc->totaltime, and set the query start time at
ExecutorStart, during InstrAlloc.
> > I think it's better because last_execution_start is already a known
> > timestamp in pg_stat_activity.query_start and some tool that finds a
> > long running query in pg_stat_activity, knowing the
> > query_start they could then go look it up in pg_stat_statements.
>
> That only works if a) the query was not yet overwritten in
> pg_stat_activity and b) neither in pg_stat_statements. Optimizing for
> that use case seems pretty narrow.
>
> > What I'm really getting at is separating these fields will open up
> > more use cases, IMO.
Maybe this is a bad use case. But I felt separating these 2 fields would
be more flexible.
> Generally, I think pgss should have cumulative statistics, and less
> about individual executions, so I'm not really sure what practical
> problem "last start" and "last runtime" would solve. The
> last_stats_update column we are talking about here is different in the
> sense that it's not about an individual execution, but infrastructure
> for retrieving the stats sensibly.
Sure, generally pg_stat_statements is for cumulative stats, but we also
do have computed stats such as max/min/stddev, etc. But it's not without
precedent that we track timestamps of the last time some operation occurred.
We do that in views whose purpose is tracking cumulative data, because
these timestamps are useful. See pg_stat_all_tables.last_seq_scan or
last_autovacuum as an example.

Maybe having the last-runtime column is not that valuable if we can correctly
calculate the last execution time. Also, I will be a strong -1 on calling this
field "stats_last_updated" instead of "last_execution_time".
[1] https://www.postgresql.org/message-id/3d07ee43-8855-42db-97e0-bad5db82d972@dalibo.com
[2] https://www.postgresql.org/message-id/CAA5RZ0t2+GLnE_55L2cfCay+L8yPFpdPRVQo-JswUFgXy-EK5Q@mail.gmail.com
--
Sami Imseih
Amazon Web Services (AWS)
Hi hackers,

This is v4 of the patch adding a stats_last_updated column to pg_stat_statements. The thread has somehow shifted to things that are not related to the patch's initial idea, so I decided to return to the beginning. I simplified the implementation to use GetCurrentTimestamp() directly instead of calculating the timestamp from GetCurrentStatementStartTimestamp() + total_time. The previous optimization was premature; benchmark testing shows GetCurrentTimestamp() adds no measurable overhead.

Test environment: Dockerized Linux x86_64 under a Windows host (the worst possible combination), PostgreSQL 19devel, gcc-14.2.0
Test: 1,000,000 iterations of PERFORM 1 (fastest possible statement)

Unpatched PostgreSQL:
  track='none' (baseline): 562.07 ms average
  track='all' (tracking):  719.17 ms average
  Overhead: 157.10 ms (27.9%)

Patched PostgreSQL (with stats_last_updated):
  track='none' (baseline): 548.95 ms average
  track='all' (tracking):  732.50 ms average
  Overhead: 183.55 ms (33.4%)

Direct comparison (what matters):
  Unpatched track='all': 719.17 ms
  Patched track='all':   732.50 ms
  Difference: +13.33 ms (+1.85%)
  Per statement: 13.33 ms / 1,000,000 = 13 nanoseconds

The baseline comparison shows patched as actually faster (-13 ms), which is impossible. This suggests the 13 ms difference is measurement noise, not real overhead. The real-world impact for a 100 ms query is 0.000013% overhead.

GetCurrentTimestamp() is the standard approach used throughout PostgreSQL for monitoring features. The measured overhead of 13 nanoseconds per statement is negligible for any realistic workload and well within measurement noise. The implementation captures GetCurrentTimestamp() before acquiring the spinlock, so no syscall occurs while holding the lock. This is simple, correct, and has no measurable performance impact.
Changes from v2-v3:
- Simplified implementation to use GetCurrentTimestamp() directly
- Removed the complex calculation with GetCurrentStatementStartTimestamp() and total_time (premature optimization)
- Added comprehensive benchmark testing (unpatched vs patched)
- Benchmark testing shows no measurable overhead (<2% in synthetic tests)
- Measured overhead: 13 ns per statement (1.85% for the 1M-iteration test, negligible for real queries >0.1 ms)

Changes from v1:
- Renamed column from last_executed to stats_last_updated (Christoph Berg)
- Moved timestamp from the Counters struct to pgssEntry for better semantics
- Placed column at the end of the view to match the stats_since naming convention
- Fixed whitespace errors
- Moved tests to entry_timestamp.sql (Sami Imseih)
- Updated PGSS_FILE_HEADER to handle the structure change

The patch, benchmark script, and raw results are attached.

Best regards,
Pavlo Golub
Attachments
Hi,
> I simplified the implementation to use GetCurrentTimestamp() directly
> instead of calculating the timestamp from
I don't think it will be acceptable to add GetCurrentTimestamp() at the end of
every execution. We take extra measures to avoid such overhead in core, and
I don't think we should impose this on pg_stat_statements users as well.
For example, in c037471832e which tracks lastscan:
"
To make it easier to detect the last time a relation has been scanned, track
that time in each relation's pgstat entry. To minimize overhead a) the
timestamp is updated only when the backend pending stats entry is flushed to
shared stats b) the last transaction's stop timestamp is used as the
timestamp.
"
My opinion is still that using the last start time, albeit with some
complexity involved for the monitoring client as you demonstrated [0],
will be the best and safest way forward. Others may disagree.
> GetCurrentStatementStartTimestamp() + total_time. The previous optimization was
> premature, benchmark testing proves GetCurrentTimestamp() adds no
> measurable overhead.
I am not sure this benchmark is enough to build confidence in this approach.
Workloads with nested queries and track = 'all' are now paying an even
heavier cost.

Also, quickly looking at v4, you probably want to call GetCurrentTimestamp()
only when kind == PGSS_EXEC.
Assert(kind == PGSS_PLAN || kind == PGSS_EXEC);
+ /*
+ * Get current timestamp before acquiring spinlock to avoid holding
+ * the lock during syscall.
+ */
+ stats_updated_at = GetCurrentTimestamp();
+
/*
[0] https://www.postgresql.org/message-id/ema0100e49-522e-4b8f-9c25-9257af1f0793%40cybertec.at
--
Sami Imseih
Amazon Web Services (AWS)
On Mon, Feb 16, 2026 at 04:06:50PM -0600, Sami Imseih wrote: >> I simplified the implementation to use GetCurrentTimestamp() directly >> instead of calculating the timestamp from > > I don't think it will be acceptable to add GetCurrentTimestamp() at the end of > every execution. We take extra measures to avoid such overhead in core, and > I don't think we should impose this on pg_stat_statements users as well. The posted v4 adds a GetCurrentTimestamp(), synonym of gettimeofday(), for each call of pgss_store(). I suspect that this choice most likely makes it a no-go on performance ground, and there have been complaints about PGSS becoming noticeably slower with more fields added for the last couple of releases. A timestamp call at reset or allocation is a different matter, as it is capped naturally by a workload and the number of entries configured. A timestamp call done each time an entry is updated will make this code show higher in profiles. -- Michael
Attachments
Hi!

> I am not sure this benchmark is enough to build confidence in this approach.
> Workloads with nested queries and track = 'all' are now paying an even heavier
> cost.

I would love to see and test SQL that would reveal the bigger overhead. Do you have something specific in mind? I'd love to share results; we can use them later when considering further updates to pgss.

> Also, quickly looking at v4, you probably want to call GetCurrentTimestamp()
> only when kind == PGSS_EXEC.

Fair enough! Thanks for noticing!
> > I am not sure this benchmark is enough to build confidence in this approach.
> > Workloads with nested queries and track = 'all' are now paying an even heavier
> > cost.
>
> I would love to see and test sql that will reveal the bigger overhead.
> Do you have something specific on your mind? I'd love to share
> results, we can use them later considering further updates to pgss.
Here is something I was experimenting with today. I ran 2
benchmarks; one on HEAD and one with GetCurrentTimestamp()
added when we are accumulating stats.
"""
/* Increment the counts, except when jstate is not NULL */
if (!jstate)
{
Assert(kind == PGSS_PLAN || kind == PGSS_EXEC);
GetCurrentTimestamp();
"""
The benchmark script is a series of "select;" statements:
# select_tx.sql
"""
begin;
select;
select;
select;
select;
select;
select;
select;
select;
select;
select;
select;
select;
select;
select;
select;
end;
"""
The benchmark was run on Ubuntu on an EC2 c5a.12xlarge,
with default pg_stat_statements settings (no plan tracking
and top-level tracking only).
pgbench command:
```
pgbench -c48 -j16 -P1 -f select_tx.sql -T120
```
Results for 3 runs
## HEAD
tps = 29351.794589 (without initial connection time)
tps = 29470.287111 (without initial connection time)
tps = 29902.245458 (without initial connection time)
## with GetCurrentTimestamp()
tps = 28569.471891 (without initial connection time)
tps = 28013.051778 (without initial connection time)
tps = 28518.468843 (without initial connection time)
I see around 4-5% performance degradation.
--
Sami Imseih
Amazon Web Services (AWS)