Обсуждение: [PROPOSAL] timestamp informations to pg_stat_statements

Поиск
Список
Период
Сортировка

[PROPOSAL] timestamp informations to pg_stat_statements

От
Jun Cheol Gim
Дата:
Hi hackers!

Following is a proposal to add timestamp informations to `pg_stat_statements`.

# Use case
- If we want to gather list and stats for queries executed at least once last 1 hour, we had to reset a hours ago. There is no way if we didn't.
- If we found some strange query from `pg_stat_statments`, we might want to identify when it ran firstly.

If we have timestamp of first and last executed, we can easily gather thess informations and there are tons of more use cases.

# Implementations
Firstly, I added API version 1.5 to add additional fields and I added two fields to Counters structure. Now it has 25 fields in total.

```
@@ -156,6 +158,8 @@ typedef struct Counters
  double    blk_read_time;  /* time spent reading, in msec */
  double    blk_write_time; /* time spent writing, in msec */
  double    usage;      /* usage factor */
+ TimestampTz   created;  /* timestamp of created time */
+ TimestampTz   last_updated; /* timestamp of last updated */
 } Counters;

 /*
```

The `created` field is filled at the first time the entry will added to hash table.

```
@@ -1663,6 +1690,8 @@ entry_alloc(pgssHashKey *key, Size query_offset, int query_len, int encoding,

    /* reset the statistics */
    memset(&entry->counters, 0, sizeof(Counters));
+   /* set the created timestamp */
+    entry->counters.created = GetCurrentTimestamp();
    /* set the appropriate initial usage count */
    entry->counters.usage = sticky ? pgss->cur_median_usage : USAGE_INIT;
    /* re-initialize the mutex each time ... we assume no one using it */
```

The `last_updated` will be updated every time `pgss_store()` updates stats.

```
@@ -1251,6 +1256,7 @@ pgss_store(const char *query, uint32 queryId,
    e->counters.blk_read_time += INSTR_TIME_GET_MILLISEC(bufusage->blk_read_time);
    e->counters.blk_write_time += INSTR_TIME_GET_MILLISEC(bufusage->blk_write_time);
    e->counters.usage += USAGE_EXEC(total_time);
+   e->counters.last_updated = GetCurrentTimestamp();

    SpinLockRelease(&e->mutex);
  }
```

The attached is my first implementation.

Regards,
Jason Kim.
Вложения

Re: [PROPOSAL] timestamp informations to pg_stat_statements

От
Peter Geoghegan
Дата:
On Sun, Jul 17, 2016 at 12:22 AM, Jun Cheol Gim <dialogbox@gmail.com> wrote:
> If we have timestamp of first and last executed, we can easily gather thess
> informations and there are tons of more use cases.

-1 from me.

I think that this is the job of a tool that aggregates things from
pg_stat_statements. It's unfortunate that there isn't a good
third-party tool that does that, but there is nothing that prevents
it.

-- 
Peter Geoghegan



Re: [PROPOSAL] timestamp informations to pg_stat_statements

От
Tom Lane
Дата:
Peter Geoghegan <pg@heroku.com> writes:
> On Sun, Jul 17, 2016 at 12:22 AM, Jun Cheol Gim <dialogbox@gmail.com> wrote:
>> If we have timestamp of first and last executed, we can easily gather thess
>> informations and there are tons of more use cases.

> -1 from me.

> I think that this is the job of a tool that aggregates things from
> pg_stat_statements. It's unfortunate that there isn't a good
> third-party tool that does that, but there is nothing that prevents
> it.

The concern I've got about this proposal is that the results get very
questionable as soon as we start dropping statement entries for lack
of space.  last_executed would be okay, perhaps, but first_executed
not so much.
        regards, tom lane



Re: [PROPOSAL] timestamp informations to pg_stat_statements

От
Peter Geoghegan
Дата:
On Sun, Jul 17, 2016 at 4:15 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> The concern I've got about this proposal is that the results get very
> questionable as soon as we start dropping statement entries for lack
> of space.  last_executed would be okay, perhaps, but first_executed
> not so much.

Agreed.

Also, for what it's worth, I should point out to Jun that
GetCurrentTimestamp() should definitely not be called when a spinlock
is held like that.

-- 
Peter Geoghegan



Re: [PROPOSAL] timestamp informations to pg_stat_statements

От
Julien Rouhaud
Дата:
On 18/07/2016 01:06, Peter Geoghegan wrote:
> On Sun, Jul 17, 2016 at 12:22 AM, Jun Cheol Gim <dialogbox@gmail.com> wrote:
>> If we have timestamp of first and last executed, we can easily gather thess
>> informations and there are tons of more use cases.
> 
> -1 from me.
> 
> I think that this is the job of a tool that aggregates things from
> pg_stat_statements. It's unfortunate that there isn't a good
> third-party tool that does that, but there is nothing that prevents
> it.
> 

FWIW there's https://github.com/dalibo/powa-archivist which does that,
assuming you're using a 9.4+ server.

-- 
Julien Rouhaud
http://dalibo.com - http://dalibo.org



Re: [PROPOSAL] timestamp informations to pg_stat_statements

От
Jason Kim
Дата:
Hi guys,
Thank you for feedbacks.

> I think that this is the job of a tool that aggregates things from 
> pg_stat_statements. It's unfortunate that there isn't a good 
> third-party tool that does that, but there is nothing that prevents 
> it. 

Right. We can do this if we aggregate it frequently enough. However,
third-parties can do it better if we have more informations.
I think these are fundamental informations to strong third-parties could be.

> The concern I've got about this proposal is that the results get very 
> questionable as soon as we start dropping statement entries for lack 
> of space.  last_executed would be okay, perhaps, but first_executed 
> not so much. 

If we set pg_stat_statements.max to large enough, there could be long
lived entries and short lived ones simultaneously. In this case, per call 
statistics could be accurate but per seconds stats can not.
The idea of I named it as created and last_updated (not
first_executed and last_executed) was this. It represents timestamp ofstats are created and updated, so we can
calculateper second stats with
 
simple math.

> Also, for what it's worth, I should point out to Jun that 
> GetCurrentTimestamp() should definitely not be called when a spinlock 
> is held like that. 

I moved it outside of spinlock.

@@ -1204,6 +1209,11 @@ pgss_store(const char *query, uint32 queryId,    */   volatile pgssEntry *e = (volatile
pgssEntry*) entry;
 

+   /*
+    * Read a timestamp before grab the spinlock
+    */
+   TimestampTz last_updated = GetCurrentTimestamp();
+   SpinLockAcquire(&e->mutex);
   /* "Unstick" entry if it was previously sticky */
@@ -1251,6 +1261,7 @@ pgss_store(const char *query, uint32 queryId,   e->counters.blk_read_time +=
INSTR_TIME_GET_MILLISEC(bufusage->blk_read_time);   e->counters.blk_write_time +=
INSTR_TIME_GET_MILLISEC(bufusage->blk_write_time);   e->counters.usage += USAGE_EXEC(total_time);
+   e->counters.last_updated = last_updated;
   SpinLockRelease(&e->mutex); }

pg_stat_statements_with_timestamp_v2.patch
<http://postgresql.nabble.com/file/n5912461/pg_stat_statements_with_timestamp_v2.patch>  

Regards,
Jason Kim.



--
View this message in context:
http://postgresql.nabble.com/PROPOSAL-timestamp-informations-to-pg-stat-statements-tp5912306p5912461.html
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.



Re: [HACKERS] [PROPOSAL] timestamp informations to pg_stat_statements

От
Robert Haas
Дата:
On Sun, Jul 17, 2016 at 7:15 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Peter Geoghegan <pg@heroku.com> writes:
>> On Sun, Jul 17, 2016 at 12:22 AM, Jun Cheol Gim <dialogbox@gmail.com> wrote:
>>> If we have timestamp of first and last executed, we can easily gather thess
>>> informations and there are tons of more use cases.
>
>> -1 from me.
>
>> I think that this is the job of a tool that aggregates things from
>> pg_stat_statements. It's unfortunate that there isn't a good
>> third-party tool that does that, but there is nothing that prevents
>> it.
>
> The concern I've got about this proposal is that the results get very
> questionable as soon as we start dropping statement entries for lack
> of space.  last_executed would be okay, perhaps, but first_executed
> not so much.

ISTM that last_executed is useful - you can then see for yourself
which of the statements that you see in the pg_stat_statements output
have been issued recently, and which are older.  I mean, you could
also do that, as Peter says, with an additional tool that takes
periodic snapshots of the data and then figures out an approximate
last_executed time, but if you had this, you wouldn't need an
additional tool, at least not for simple cases.  Better yet, the data
would be more exact.  I dunno what's not to like about that, unless
we're worried that tracking it will incur too much overhead.

first_executed doesn't seem as useful as last_executed, but it isn't
useless either.  It can't properly be read as the first time that
statement was ever executed, but it can be properly read as the amount
of time that has passed during which that statement has been executed
frequently enough to stay in the hash table, which is something that
someone might want to know.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [PROPOSAL] timestamp informations to pg_stat_statements

От
legrand legrand
Дата:
+1
Having the time of first occurence of a statement is very usefull for
trouble shouting,
it permits for exemple to retrieve the order of operations in some complex
cases (and thoses informations aren't taken by any third party collecting
tool, that will only be able to provide a time range of occurence).

I thougth that pgss rows where removed randomly when max rows was reached,
wouldn't having last_executed information permit a kind of LRU removal ?

Regards
PAscal





--
Sent from: http://www.postgresql-archive.org/PostgreSQL-hackers-f1928748.html


Re: [PROPOSAL] timestamp informations to pg_stat_statements

От
Tomas Vondra
Дата:

On 03/10/2018 04:43 PM, legrand legrand wrote:
> +1
> Having the time of first occurence of a statement is very usefull
> for trouble shouting, it permits for exemple to retrieve the order of
> operations in some complex cases (and thoses informations aren't
> taken by any third party collecting tool, that will only be able to
> provide a time range of occurence).
> 

I really don't see how this would be useful in reconstructing order of
operations, particularly in complex cases where I'd expect the queries
to be executed repeatedly / interleaved in different ways. Furthermore,
it would only work for the very first execution of all statements, so
you would probably have to reset the stats over and over - which seems
to directly contradict the purpose of pg_stat_statements (aggregation of
data over longer periods of time).

So unfortunately this seems rather useless, and log_statements=all seems
like a much better / reliable approach to achieve that.

I also doubt it really allows computation of averages, e.g. queries per
second, because practical workloads usually have daily/weekly patterns
(and different queries may follow different patterns).

So I agree with Peter Geoghegan that tools regularly snapshotting
pg_stat_statements and processing that are a better approach. I've seen
a bunch of simple scripts doing just that, actually.

> I thougth that pgss rows where removed randomly when max rows was
> reached, wouldn't having last_executed information permit a kind of
> LRU removal ?
> 

No, the entries are not removed randomly. We track "usage" for each
entry (essentially +1 for each time the query got executed, with a decay
factor applied on each eviction (and we evict 5% at a time).

It's not immediately obvious why something based on time of the
first/last execution would be better than the current algorithm.

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: [PROPOSAL] timestamp informations to pg_stat_statements

От
legrand legrand
Дата:
> No, the entries are not removed randomly. We track "usage" for each
> entry (essentially +1 for each time the query got executed, with a decay
> factor applied on each eviction (and we evict 5% at a time).

OK I apologize, I hadn't looked in source code in details, and effectively
the "Usage" algorithm based on execution_time and calls will keep the 
longest queries (and that's a good thing). In some cases (after an 
application release or a postgres patch for exemple) we may be interested 
in leastiests ones.

I'm still convinced that adding those kinds of informations 
(with a planid as done in pg_stat_plans) would be an improvement.

Maybe because having used v$sql, first_load_time, last_active_time 
and plan_hash_value for (too) many years).

It's always important to know when a new plan (good or bad) occurs, and pgss
could
give this kind of informations even without aggregation collector.

Last point (for developer), after testing this patch, in most cases when
calls=1: created and 
last_updated values are identical, they don't even differ to reflect 
execution (nor planing) duration. Is that a precision or coding problem ?

Regards
PAscal



--
Sent from: http://www.postgresql-archive.org/PostgreSQL-hackers-f1928748.html