Обсуждение: [PATCH] Add last_executed timestamp to pg_stat_statements

Поиск
Список
Период
Сортировка

[PATCH] Add last_executed timestamp to pg_stat_statements

От
Pavlo Golub
Дата:
Hello,

I would like to propose adding a last_executed timestamptz column to
pg_stat_statements. This column records when each tracked statement
was most recently executed.

The motivation comes from real world experience with monitoring tools
like pgwatch that poll pg_stat_statements regularly. Currently, these
tools must fetch and store statistics for all statements, even those
that haven't executed recently. This creates significant storage
overhead. For a database with around 3400 statements polled every 3
minutes, storing full query text requires roughly 2.5 MB per snapshot.
Over two weeks, this accumulates to about 17 GB. Even without query
text, storage reaches 10 GB.

With a last_executed timestamptz, monitoring tools can simply filter
statements by "last_executed > NOW() - polling_interval" to fetch only
statements that have been executed since the last poll. This
eliminates the need for complex workarounds that some tools currently
use to identify changed statements

(https://github.com/cybertec-postgresql/pgwatch/blob/759df3a149cbbe973165547186068aa7b5332f9d/internal/metrics/metrics.yaml#L2605-L2766).

Beyond monitoring efficiency, the timestamp enables other useful
queries. You can find statements that haven't executed in 30 days to
identify deprecated code paths. You can correlate statement execution
with specific time windows during incident investigation. You can also
make informed decisions about which statistics to reset.

The implementation is straightforward. The timestamp is stored in the
Counters structure and updated on every statement execution, protected
by the existing spinlock. The overhead is minimal, just a single
timestamp assignment per execution. The timestamp persists with other
statistics across server restarts. I've bumped the stats file format
version to handle the structure change cleanly.

The patch includes a new pg_stat_statements_1_14 function, the upgrade
script from 1.13 to 1.14, and regression tests. All existing tests
continue to pass.

I believe this is a simple addition that addresses a real pain point
for database monitoring and provides useful functionality for
understanding query patterns over time.

Thanks in advance!

Attached patch applies cleanly to the current master.

Вложения

Re: [PATCH] Add last_executed timestamp to pg_stat_statements

От
Sami Imseih
Дата:
Hi,

Thanks for raising this. I did not look at the patch, but I have some high
level comments.

> I would like to propose adding a last_executed timestamptz column to
> pg_stat_statements. This column records when each tracked statement
> was most recently executed.

I do think there is value in adding a last_executed timestamp. I actually think
last_executed should be the time the query started timestamp, so we should
actually create an entry at ExecutorStart, along with calls_started and
calls_completed. This is great for tracking cancelled queries.
The issue is the extra overhead of tracking the query on
EcecutorStart, but that
should be less of an issue once we move pg_stat_statements to the cumulative
statistics system, which will be possible once we get some prerequisite
work to make this happen [0].

Another concern is the width of the current view. I think before we add
any new attribute, pg_stat_statements fields should be split.
This was discussed in [1].

> The motivation comes from real world experience with monitoring tools
> like pgwatch that poll pg_stat_statements regularly. Currently, these
> tools must fetch and store statistics for all statements, even those
> that haven't executed recently. This creates significant storage
> overhead. For a database with around 3400 statements polled every 3
> minutes, storing full query text requires roughly 2.5 MB per snapshot.
> Over two weeks, this accumulates to about 17 GB. Even without query
> text, storage reaches 10 GB.
>
> With a last_executed timestamptz, monitoring tools can simply filter
> statements by "last_executed > NOW() - polling_interval" to fetch only
> statements that have been executed since the last poll. This
> eliminates the need for complex workarounds that some tools currently
> use to identify changed statements
>
(https://github.com/cybertec-postgresql/pgwatch/blob/759df3a149cbbe973165547186068aa7b5332f9d/internal/metrics/metrics.yaml#L2605-L2766).

Can pg_stat_statements.stats_since help here?

for example "where stats_since > last_poll_timestamp" ?

The client does have to track the last_poll_timestamp in that
case.

[0] https://www.postgresql.org/message-id/flat/CAA5RZ0s9SDOu+Z6veoJCHWk+kDeTktAtC-KY9fQ9Z6BJdDUirQ@mail.gmail.com
[1] https://www.postgresql.org/message-id/03f82e6f-66a3-4c4d-935c-ea4d93871dc1%40gmail.com

--
Sami Imseih
Amazon Web Services (AWS)



Re[2]: [PATCH] Add last_executed timestamp to pg_stat_statements

От
"Pavlo Golub"
Дата:
Hi


>Hi,
>
>Thanks for raising this. I did not look at the patch, but I have some high
>level comments.
>
>>  I would like to propose adding a last_executed timestamptz column to
>>  pg_stat_statements. This column records when each tracked statement
>>  was most recently executed.
>
>I do think there is value in adding a last_executed timestamp.

Thanks for your support!

>Can pg_stat_statements.stats_since help here?
>
>for example "where stats_since > last_poll_timestamp" ?

Actually no, monitoring tools fetch snapshots to find the difference
between snapshots.
Data for every statement is changes after each execution.

But stats_since is inserted only once when the new statement execution
appears and is never updated during next executions.

>
>
>The client does have to track the last_poll_timestamp in that
>case.
>
>[0] https://www.postgresql.org/message-id/flat/CAA5RZ0s9SDOu+Z6veoJCHWk+kDeTktAtC-KY9fQ9Z6BJdDUirQ@mail.gmail.com
>[1] https://www.postgresql.org/message-id/03f82e6f-66a3-4c4d-935c-ea4d93871dc1%40gmail.com
>
>--
>Sami Imseih
>Amazon Web Services (AWS)



Re: Re[2]: [PATCH] Add last_executed timestamp to pg_stat_statements

От
Sami Imseih
Дата:
> >Can pg_stat_statements.stats_since help here?
> >
> >for example "where stats_since > last_poll_timestamp" ?
>
> Actually no, monitoring tools fetch snapshots to find the difference
> between snapshots.
> Data for every statement is changes after each execution.
>
> But stats_since is inserted only once when the new statement execution
> appears and is never updated during next executions.

I was thinking of using stats_since to avoid fetching query text,
since that does not change. But you are talking about avoiding all
the stats if they have not changed. I see that now.

FWIW, this was discussed back in 2017 [0], and at that time there was
some support for last_executed, but the patch did not go anywhere.

After looking at the patch, I have a few comments:

1/ There are whitespace errors when applying.

2/ Calling GetCurrentTimestamp while holding a spinlock is
not a good idea and should be avoided. This was also a point
raised in [0]. Even when we move pg_stat_statements
to cumulative stats and not at the mercy of the spinlock for updating
entries, i would still hesitate to add an additional GetCurrentTimestamp()
for every call.

I wonder if we can use GetCurrentStatementStartTimestamp()
instead?

```
/*
* GetCurrentStatementStartTimestamp
*/
TimestampTz
GetCurrentStatementStartTimestamp(void)
{
return stmtStartTimestamp;
}
```

stmtStartTimestamp is the time the query started, which seems OK for
the use-case you are mentioning. But also, stmtStartTimestamp gets
set at the top-level so nested entries (toplevel = false ) will just
inherit the timestamp of the top-level entry.

IMO, this is the most important point in the patch for now.

3/ last_executed, or maybe (last_toplevel_start) if we go with #2 should not
be added under pgssEntry->Counters, but rather directory under pgssEntry.

@@ -213,6 +214,7 @@ typedef struct Counters
                                              * launched */
     int64        generic_plan_calls; /* number of calls using a generic plan */
     int64        custom_plan_calls;    /* number of calls using a
custom plan */
+    TimestampTz last_executed;    /* timestamp of last statement execution */
 } Counters;

4/ instead of a " last_executed" maybe the tests should be added to
entry_timestamp.sql?


[0]
https://www.postgresql.org/message-id/flat/CA%2BTgmoZgZMeuN8t9pawSt6M%3DmvxKiAZ4CvPofBWwwVWeZwHe4w%40mail.gmail.com#beeebe3ca4a3dcda4ed625f7c15bb2d8

--
Sami Imseih
Amazon Web Services (AWS)



Re: Re[2]: [PATCH] Add last_executed timestamp to pg_stat_statements

От
Christoph Berg
Дата:
Re: Sami Imseih
> I wonder if we can use GetCurrentStatementStartTimestamp()
> instead?

The main use case for this column is being able to retrieve the stats
that were updated since the last time one was looking. That only works
if it's the statement end time, or else long-running statements
spanning more than one poll interval would be missed.

Perhaps the column should rather be called "stats_last_updated" to
match "stats_since", and be moved to the very end? Similarly, nested
statements would also have to get that stamp.

Oh and, yes, I'm definitely +1 on this feature.

Christoph



Re: Re[2]: [PATCH] Add last_executed timestamp to pg_stat_statements

От
Sami Imseih
Дата:
> > I wonder if we can use GetCurrentStatementStartTimestamp()
> > instead?
>
> The main use case for this column is being able to retrieve the stats
> that were updated since the last time one was looking. That only works
> if it's the statement end time, or else long-running statements
> spanning more than one poll interval would be missed.

Sure, I get it is not perfect for the polling use-case due to the scenario
you mention,  but I don't think it will be acceptable to call
GetCurrentTimeStamp() at the end of every execution and especially
with a SpinLock held.

This will probably be worse for nested_tacking as well, due to
multiple GetCurrentTimeStamp() calls.

What do you think?

--
Sami Imseih
Amazon Web Services (AWS)



Re: Re[2]: [PATCH] Add last_executed timestamp to pg_stat_statements

От
Bertrand Drouvot
Дата:
Hi,

On Tue, Feb 03, 2026 at 08:37:31PM -0600, Sami Imseih wrote:
> > > I wonder if we can use GetCurrentStatementStartTimestamp()
> > > instead?
> >
> > The main use case for this column is being able to retrieve the stats
> > that were updated since the last time one was looking. That only works
> > if it's the statement end time, or else long-running statements
> > spanning more than one poll interval would be missed.
> 
> Sure, I get it is not perfect for the polling use-case due to the scenario
> you mention,  but I don't think it will be acceptable to call
> GetCurrentTimeStamp() at the end of every execution and especially
> with a SpinLock held.

I think the same, that would not match (lmgr/README):

"
* Spinlocks.  These are intended for *very* short-term locks.  If a lock
is to be held more than a few dozen instructions, or across any sort of
kernel call
"

Out of curiosity I looked for GetCurrentTimeStamp() calls while holding a spinlock
and found one in WalReceiverMain(). But I guess it's less of an issue since
it's only called when the walreceiver starts.

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com