Обсуждение: Exposing the stats snapshot timestamp to SQL

Поиск
Список
Период
Сортировка

Exposing the stats snapshot timestamp to SQL

От
Matt Kelly
Дата:
In a previous thread Tom Lane said:

(I'm also wondering if it'd make sense to expose the stats timestamp
as a callable function, so that the case could be dealt with
programmatically as well.  But that's future-feature territory.)
 

It seemed the appropriate scope for my first submission, and that feature has been on my wish list for a while, so I thought I'd grab it.

Main purpose of this patch is to expose the timestamp of the current stats snapshot so that it can be leveraged by monitoring code.  The obvious case is alerting if the stats collector becomes unresponsive.  The common use case is to smooth out graphs which are built from high frequency monitoring of the stats collector.

The timestamp is already available as part of PgStat_GlobalStats.  This patch is just the boilerplate (+docs & tests) needed to expose that value to SQL.  It shouldn't impact anything else in the server.

I'm not particularly attached to the function name, but I didn't have a better idea.

The patch should apply cleanly to master.

- Matt K
Вложения

Re: Exposing the stats snapshot timestamp to SQL

От
Robert Haas
Дата:
On Thu, Jan 29, 2015 at 12:18 AM, Matt Kelly <mkellycs@gmail.com> wrote:
> In a previous thread Tom Lane said:
>
>> (I'm also wondering if it'd make sense to expose the stats timestamp
>> as a callable function, so that the case could be dealt with
>> programmatically as well.  But that's future-feature territory.)
>
>
> (http://www.postgresql.org/message-id/27251.1421684169@sss.pgh.pa.us)
>
> It seemed the appropriate scope for my first submission, and that feature
> has been on my wish list for a while, so I thought I'd grab it.
>
> Main purpose of this patch is to expose the timestamp of the current stats
> snapshot so that it can be leveraged by monitoring code.  The obvious case
> is alerting if the stats collector becomes unresponsive.  The common use
> case is to smooth out graphs which are built from high frequency monitoring
> of the stats collector.
>
> The timestamp is already available as part of PgStat_GlobalStats.  This
> patch is just the boilerplate (+docs & tests) needed to expose that value to
> SQL.  It shouldn't impact anything else in the server.
>
> I'm not particularly attached to the function name, but I didn't have a
> better idea.
>
> The patch should apply cleanly to master.

Please add your patch here so we don't forget about it:

https://commitfest.postgresql.org/action/commitfest_view/open

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Exposing the stats snapshot timestamp to SQL

От
Jim Nasby
Дата:
On 1/28/15 11:18 PM, Matt Kelly wrote:
> In a previous thread Tom Lane said:
>
>     (I'm also wondering if it'd make sense to expose the stats timestamp
>     as a callable function, so that the case could be dealt with
>     programmatically as well.  But that's future-feature territory.)
>
> (http://www.postgresql.org/message-id/27251.1421684169@sss.pgh.pa.us)
>
> It seemed the appropriate scope for my first submission, and that feature has been on my wish list for a while, so I
thoughtI'd grab it.
 

I've reviewed the patch (though haven't tested it myself) and it looks good. The only thing I'm not sure of is this:

+ /* Get the timestamp of the current statistics snapshot */
+ Datum
+ pg_stat_snapshot_timestamp(PG_FUNCTION_ARGS)
+ {
+     PG_RETURN_TIMESTAMPTZ(pgstat_fetch_global()->stats_timestamp);
+ }

Is the community OK with referencing stats_timestamp that way?
-- 
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com



Re: Exposing the stats snapshot timestamp to SQL

От
Matt Kelly
Дата:
Robert, I'll add it to the commitfest.

Jim, I'm not sure I understand what you mean?  This new function follows the same conventions as everything else in the file.  TimestampTz is just a typedef for int64.  Functions like pg_stat_get_buf_alloc follow the exact same pattern on the int64 fields of the global stats struct.

- Matt K.

On Thu, Jan 29, 2015 at 6:49 PM, Jim Nasby <Jim.Nasby@bluetreble.com> wrote:
On 1/28/15 11:18 PM, Matt Kelly wrote:
In a previous thread Tom Lane said:

    (I'm also wondering if it'd make sense to expose the stats timestamp
    as a callable function, so that the case could be dealt with
    programmatically as well.  But that's future-feature territory.)

(http://www.postgresql.org/message-id/27251.1421684169@sss.pgh.pa.us)

It seemed the appropriate scope for my first submission, and that feature has been on my wish list for a while, so I thought I'd grab it.

I've reviewed the patch (though haven't tested it myself) and it looks good. The only thing I'm not sure of is this:

+ /* Get the timestamp of the current statistics snapshot */
+ Datum
+ pg_stat_snapshot_timestamp(PG_FUNCTION_ARGS)
+ {
+       PG_RETURN_TIMESTAMPTZ(pgstat_fetch_global()->stats_timestamp);
+ }

Is the community OK with referencing stats_timestamp that way?
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com

Re: Exposing the stats snapshot timestamp to SQL

От
Matt Kelly
Дата:

On Thu, Jan 29, 2015 at 7:01 PM, Matt Kelly <mkellycs@gmail.com> wrote:
Robert, I'll add it to the commitfest.

Jim, I'm not sure I understand what you mean?  This new function follows the same conventions as everything else in the file.  TimestampTz is just a typedef for int64.  Functions like pg_stat_get_buf_alloc follow the exact same pattern on the int64 fields of the global stats struct.

- Matt K.

On Thu, Jan 29, 2015 at 6:49 PM, Jim Nasby <Jim.Nasby@bluetreble.com> wrote:
On 1/28/15 11:18 PM, Matt Kelly wrote:
In a previous thread Tom Lane said:

    (I'm also wondering if it'd make sense to expose the stats timestamp
    as a callable function, so that the case could be dealt with
    programmatically as well.  But that's future-feature territory.)

(http://www.postgresql.org/message-id/27251.1421684169@sss.pgh.pa.us)

It seemed the appropriate scope for my first submission, and that feature has been on my wish list for a while, so I thought I'd grab it.

I've reviewed the patch (though haven't tested it myself) and it looks good. The only thing I'm not sure of is this:

+ /* Get the timestamp of the current statistics snapshot */
+ Datum
+ pg_stat_snapshot_timestamp(PG_FUNCTION_ARGS)
+ {
+       PG_RETURN_TIMESTAMPTZ(pgstat_fetch_global()->stats_timestamp);
+ }

Is the community OK with referencing stats_timestamp that way?
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com


Re: Exposing the stats snapshot timestamp to SQL

От
Tom Lane
Дата:
Matt Kelly <mkellycs@gmail.com> writes:
> Jim, I'm not sure I understand what you mean?  This new function follows
> the same conventions as everything else in the file.  TimestampTz is just a
> typedef for int64.

... or double.  Have you checked that the code behaves properly with
--disable-integer-timestamps?
        regards, tom lane



Re: Exposing the stats snapshot timestamp to SQL

От
Matt Kelly
Дата:
On Thu, Jan 29, 2015 at 8:42 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Matt Kelly <mkellycs@gmail.com> writes:
> Jim, I'm not sure I understand what you mean?  This new function follows
> the same conventions as everything else in the file.  TimestampTz is just a
> typedef for int64.

... or double.  Have you checked that the code behaves properly with
--disable-integer-timestamps?

                        regards, tom lane


Well, yes int or double.  I should have been more clear about that.  Its a good point though that I should run the server with disable for completeness.
I presume you meant --disable-integer-datetimes.  I just ran that test case now, all set.

For my own edification, was there really any risk of something so trivial breaking due to that setting, or was it just for completeness? (which is a perfectly valid reason)

Thanks,
- Matt K.

Re: Exposing the stats snapshot timestamp to SQL

От
Tom Lane
Дата:
Matt Kelly <mkellycs@gmail.com> writes:
> On Thu, Jan 29, 2015 at 8:42 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> ... or double.  Have you checked that the code behaves properly with
>> --disable-integer-timestamps?

> I presume you meant --disable-integer-datetimes.  I just ran that test case
> now, all set.

Right, my own sloppiness there.

> For my own edification, was there really any risk of something so trivial
> breaking due to that setting, or was it just for completeness? (which is a
> perfectly valid reason)

I hadn't looked at your code at that point, and now that I have, I agree
it's unlikely to have had a problem with this.  Still, it's a good thing
to check, particularly considering the lack of buildfarm coverage of this
option :-(.
        regards, tom lane



Re: Exposing the stats snapshot timestamp to SQL

От
Matt Kelly
Дата:
​Actually, I just did one more code review of myself, and somehow missed that I submitted the version with the wrong oid.  The oid used in the first version is wrong (10000) and was from before I read the docs on properly picking one.

Attached is the fixed version. (hopefully with the right mime-type and wrong extension.  Alas, gmail doesn't let you set mime-types; time to find a new email client...)

- Matt K.

Вложения

Re: Exposing the stats snapshot timestamp to SQL

От
David Fetter
Дата:
On Fri, Jan 30, 2015 at 12:07:22AM -0500, Tom Lane wrote:
> Matt Kelly <mkellycs@gmail.com> writes:
> > On Thu, Jan 29, 2015 at 8:42 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >> ... or double.  Have you checked that the code behaves properly with
> >> --disable-integer-timestamps?
> 
> > I presume you meant --disable-integer-datetimes.  I just ran that test case
> > now, all set.
> 
> Right, my own sloppiness there.
> 
> > For my own edification, was there really any risk of something so trivial
> > breaking due to that setting, or was it just for completeness? (which is a
> > perfectly valid reason)
> 
> I hadn't looked at your code at that point, and now that I have, I agree
> it's unlikely to have had a problem with this.  Still, it's a good thing
> to check, particularly considering the lack of buildfarm coverage of this
> option :-(.

Given that this has been deprecated for years, maybe it's time we took
it out in 9.5.  That the interested parties haven't bothered to put
buildfarm members in that use the option tells me that they're not all
that interested.

Cheers,
David.
-- 
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter      XMPP: david.fetter@gmail.com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate



Re: Exposing the stats snapshot timestamp to SQL

От
Tomas Vondra
Дата:
Hi there,

On 30.1.2015 06:27, Matt Kelly wrote:
> ​Actually, I just did one more code review of myself, and somehow missed
> that I submitted the version with the wrong oid.  The oid used in the
> first version is wrong (10000) and was from before I read the docs on
> properly picking one.
> 
> Attached is the fixed version. (hopefully with the right mime-type and
> wrong extension.  Alas, gmail doesn't let you set mime-types; time to
> find a new email client...)

I do have a question regarding the patch, although I see the patch is
already marked as 'ready for committer' (sorry, somehow managed to miss
the patch until now).

I see the patch only works with the top-level snapshot timestamp, stored
in globalStats, but since 9.3 (when the stats were split into per-db
files) we track per-database timestamps too.

Shouldn't we make those timestamps accessible too? It's not necessary
for detecting unresponsive statistics collector (if it's stuck it's not
writing anything, so the global timestamp will be old too), but it seems
more appropriate for querying database-level stats to query
database-level timestamp too.

But maybe that's not necessary, because to query database stats you have
to be connected to that particular database and that should write fresh
stats, so the timestamps should not be very different.

regards

-- 
Tomas Vondra                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Exposing the stats snapshot timestamp to SQL

От
Tom Lane
Дата:
Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:
> I see the patch only works with the top-level snapshot timestamp, stored
> in globalStats, but since 9.3 (when the stats were split into per-db
> files) we track per-database timestamps too.

> Shouldn't we make those timestamps accessible too? It's not necessary
> for detecting unresponsive statistics collector (if it's stuck it's not
> writing anything, so the global timestamp will be old too), but it seems
> more appropriate for querying database-level stats to query
> database-level timestamp too.

I'm inclined to say not; I think that's exposing an implementation detail
that we might regret exposing, later.  (IOW, it's not hard to think of
rearrangements that might mean there wasn't a per-database stamp anymore.)

> But maybe that's not necessary, because to query database stats you have
> to be connected to that particular database and that should write fresh
> stats, so the timestamps should not be very different.

Yeah.  The only use-case that's been suggested is detecting an
unresponsive stats collector, and the main timestamp should be plenty for
that.
        regards, tom lane



Re: Exposing the stats snapshot timestamp to SQL

От
Tomas Vondra
Дата:
On 20.2.2015 02:58, Tom Lane wrote:
> Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:
>> I see the patch only works with the top-level snapshot timestamp,
>> stored in globalStats, but since 9.3 (when the stats were split
>> into per-db files) we track per-database timestamps too.
> 
>> Shouldn't we make those timestamps accessible too? It's not
>> necessary for detecting unresponsive statistics collector (if it's
>> stuck it's not writing anything, so the global timestamp will be
>> old too), but it seems more appropriate for querying database-level
>> stats to query database-level timestamp too.
> 
> I'm inclined to say not; I think that's exposing an implementation 
> detail that we might regret exposing, later. (IOW, it's not hard to 
> think of rearrangements that might mean there wasn't a per-database 
> stamp anymore.)

Fair point, but isn't the global timestamp an implementation detail too?
Although we're less likely to remove the global timestamp, no doubt
about that ...

>> But maybe that's not necessary, because to query database stats you
>> have to be connected to that particular database and that should
>> write fresh stats, so the timestamps should not be very different.
> 
> Yeah.  The only use-case that's been suggested is detecting an 
> unresponsive stats collector, and the main timestamp should be plenty
> for that.

Well, the patch also does this:


*** 28,34 **** SELECT pg_sleep_for('2 seconds'); CREATE TEMP TABLE prevstats AS SELECT t.seq_scan, t.seq_tup_read,
t.idx_scan,t.idx_tup_fetch,        (b.heap_blks_read + b.heap_blks_hit) AS heap_blks,
 
!        (b.idx_blks_read + b.idx_blks_hit) AS idx_blks   FROM pg_catalog.pg_stat_user_tables AS t,
pg_catalog.pg_statio_user_tablesAS b  WHERE t.relname='tenk2' AND b.relname='tenk2';
 
--- 28,35 ---- CREATE TEMP TABLE prevstats AS SELECT t.seq_scan, t.seq_tup_read, t.idx_scan, t.idx_tup_fetch,
(b.heap_blks_read+ b.heap_blks_hit) AS heap_blks,
 
!        (b.idx_blks_read + b.idx_blks_hit) AS idx_blks,
!        pg_stat_snapshot_timestamp() as snap_ts   FROM pg_catalog.pg_stat_user_tables AS t,
pg_catalog.pg_statio_user_tablesAS b  WHERE t.relname='tenk2' AND b.relname='tenk2';
 
***************

i.e. it adds the timestamp into queries selecting from database-level
catalogs. But OTOH we're usually using now() here, so the global
snapshot is an improvement here, and if the collector is stuck it's not
going to update of the timestamps.

So I'm OK with this patch.

-- 
Tomas Vondra                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Exposing the stats snapshot timestamp to SQL

От
Tom Lane
Дата:
Matt Kelly <mkellycs@gmail.com> writes:
> Attached is the fixed version. (hopefully with the right mime-type and
> wrong extension.  Alas, gmail doesn't let you set mime-types; time to find
> a new email client...)

Committed with a couple of changes:

* I changed the function name from pg_stat_snapshot_timestamp to
pg_stat_get_snapshot_timestamp, which seemed to me to be the style
in general use in the stats stuff for inquiry functions.

* The function should be marked stable not volatile, since its value
doesn't change any faster than all the other stats inquiry functions.
        regards, tom lane



Re: Exposing the stats snapshot timestamp to SQL

От
Tom Lane
Дата:
Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:
> Well, the patch also does this:


> *** 28,34 **** SELECT pg_sleep_for('2 seconds');
>   CREATE TEMP TABLE prevstats AS
>   SELECT t.seq_scan, t.seq_tup_read, t.idx_scan, t.idx_tup_fetch,
>          (b.heap_blks_read + b.heap_blks_hit) AS heap_blks,
> !        (b.idx_blks_read + b.idx_blks_hit) AS idx_blks
>     FROM pg_catalog.pg_stat_user_tables AS t,
>          pg_catalog.pg_statio_user_tables AS b
>    WHERE t.relname='tenk2' AND b.relname='tenk2';
> --- 28,35 ----
>   CREATE TEMP TABLE prevstats AS
>   SELECT t.seq_scan, t.seq_tup_read, t.idx_scan, t.idx_tup_fetch,
>          (b.heap_blks_read + b.heap_blks_hit) AS heap_blks,
> !        (b.idx_blks_read + b.idx_blks_hit) AS idx_blks,
> !        pg_stat_snapshot_timestamp() as snap_ts
>     FROM pg_catalog.pg_stat_user_tables AS t,
>          pg_catalog.pg_statio_user_tables AS b
>    WHERE t.relname='tenk2' AND b.relname='tenk2';
> ***************


That's merely a regression test to verify that the value appears to
advance from time to time ... I don't think it has any larger meaning.
(It will be interesting to see what this new test query reports in the
transient buildfarm failures we still occasionally see in the stats
test.)
        regards, tom lane



Re: Exposing the stats snapshot timestamp to SQL

От
Matt Kelly
Дата:
Yeah.  The only use-case that's been suggested is detecting an
unresponsive stats collector, and the main timestamp should be plenty for
that.
Lately, I've spent most of my time doing investigation into increasing qps.  Turned out we've been able to triple our throughput by monitoring experiments at highly granular time steps (1 to 2 seconds).  Effects that were invisible with 30 second polls of the stats were obvious with 2 second polls.

The problem with doing highly granular snapshots is that the postgres counters are monotonically increasing, but only when stats are published.  Currently you have no option except to divide by the delta of now() between the polling intervals. If you poll every 2 seconds the max error is about .5/2 or 25%.  It makes reading those numbers a bit noisy.  Using (snapshot_timestamp_new - snapshot_timestamp_old) as the denominator in that calculation should help to smooth out that noise and show a clearer picture.

However, I'm happy with the committed version. Thanks Tom.

- Matt K.


On Thu, Feb 19, 2015 at 9:40 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Matt Kelly <mkellycs@gmail.com> writes:
> Attached is the fixed version. (hopefully with the right mime-type and
> wrong extension.  Alas, gmail doesn't let you set mime-types; time to find
> a new email client...)

Committed with a couple of changes:

* I changed the function name from pg_stat_snapshot_timestamp to
pg_stat_get_snapshot_timestamp, which seemed to me to be the style
in general use in the stats stuff for inquiry functions.

* The function should be marked stable not volatile, since its value
doesn't change any faster than all the other stats inquiry functions.

                        regards, tom lane

Re: Exposing the stats snapshot timestamp to SQL

От
Tom Lane
Дата:
Matt Kelly <mkellycs@gmail.com> writes:
>> Yeah.  The only use-case that's been suggested is detecting an
>> unresponsive stats collector, and the main timestamp should be plenty for
>> that.

> The problem with doing highly granular snapshots is that the postgres
> counters are monotonically increasing, but only when stats are published.
> Currently you have no option except to divide by the delta of now() between
> the polling intervals. If you poll every 2 seconds the max error is about
> .5/2 or 25%.  It makes reading those numbers a bit noisy.  Using
> (snapshot_timestamp_new
> - snapshot_timestamp_old) as the denominator in that calculation should
> help to smooth out that noise and show a clearer picture.

Ah, interesting!  Thanks for pointing out another use case.
        regards, tom lane