Обсуждение: Additional stats for Relations

Поиск
Список
Период
Сортировка

Additional stats for Relations

От
NikhilS
Дата:
Hi, <br /><br /> Currently a "select * from pg_statio_user_tables;" displays only heap_blks_read, heap_blks_hit stats
amongstothers for the main relation. It would be good to have the following stats collected too. I think these stats
canbe used to better statistically analyze/understand the block I/O activity on the relation:<br /><br />
heap_blks_reused:The number of buffers returned by the FSM for use to store a new tuple in<br /><br />
heap_blks_extend:The number of times file extend was invoked on the relation<br /><br /> heap_blks_truncate: The total
numberof blocks that have been truncated due to vacuum activity e.g.<br /><br /> As an addendum to the truncate stats
above,we can also have the additional following stats:<br /><br /> heap_blks_maxtruncate: The max block of buffers
truncatedin one go<br /><br /> heap_blks_ntruncate: The number of times truncate was called on this relation<br
clear="all"/><br /> I can come up with a patch (already have one) for the above. Any thought/comments?<br /><br />
Regards,<br/> Nikhils <br /> (<a href="http://www.enterprisedb.com">www.enterprisedb.com</a>)<br />-- <br />All the
world'sa stage, and most of us are desperately unrehearsed.<br /> 

Re: Additional stats for Relations

От
"Jim C. Nasby"
Дата:
On Fri, Oct 13, 2006 at 06:17:47PM +0530, NikhilS wrote:
> Currently a "select * from pg_statio_user_tables;" displays only
> heap_blks_read, heap_blks_hit stats amongst others for the main relation. It
> would be good to have the following stats collected too. I think these stats
> can be used to better statistically analyze/understand the block I/O
> activity on the relation:
> 
> heap_blks_reused: The number of buffers returned by the FSM for use to store
> a new tuple in
The description on this is misleading... FSM doesn't return buffers, it
returns pages that have free space on them.

I'm also not sure if this metric is what you actually want, since a
single page can be returned many times from the FSM even between
vacuums. Tracking how many pages for a relation have been put into the
FSM might be more useful...

> heap_blks_extend: The number of times file extend was invoked on the
> relation
> 
> heap_blks_truncate: The total number of blocks that have been truncated due
> to vacuum activity e.g.
> 
> As an addendum to the truncate stats above, we can also have the additional
> following stats:
> 
> heap_blks_maxtruncate: The max block of buffers truncated in one go
> 
> heap_blks_ntruncate: The number of times truncate was called on this
> relation
> 
> I can come up with a patch (already have one) for the above. Any
> thought/comments?

Do you have a use-case for this info? I can see where it might be neat
to know, but I'm not sure how you'd actually use it in the real world.
-- 
Jim Nasby                                            jim@nasby.net
EnterpriseDB      http://enterprisedb.com      512.569.9461 (cell)


Re: Additional stats for Relations

От
NikhilS
Дата:
Hi Jim,

On 10/13/06, Jim C. Nasby <jim@nasby.net> wrote:
On Fri, Oct 13, 2006 at 06:17:47PM +0530, NikhilS wrote:
> Currently a "select * from pg_statio_user_tables;" displays only
> heap_blks_read, heap_blks_hit stats amongst others for the main relation. It
> would be good to have the following stats collected too. I think these stats
> can be used to better statistically analyze/understand the block I/O
> activity on the relation:
>
> heap_blks_reused: The number of buffers returned by the FSM for use to store
> a new tuple in

The description on this is misleading... FSM doesn't return buffers, it
returns pages that have free space on them.
 
<Nikhils>
FSM returns the block number from which we fetch the buffer. This is similar to the way we track buffer_read stats in ReadBuffer.
<Nikhils>

 
I'm also not sure if this metric is what you actually want, since a
single page can be returned many times from the FSM even between
vacuums. Tracking how many pages for a relation have been put into the
FSM might be more useful...
 
<Nikhils>
Pages might be put into the FSM, but by this metric don't we get the actual usage of the pages from the FSM? Agreed a single page can be returned multiple times, but since it serves a new tuple, shouldn't we track it?
<Nikhils>

 
> heap_blks_extend: The number of times file extend was invoked on the
> relation
>
> heap_blks_truncate: The total number of blocks that have been truncated due
> to vacuum activity e.g.
>
> As an addendum to the truncate stats above, we can also have the additional
> following stats:
>
> heap_blks_maxtruncate: The max block of buffers truncated in one go
>
> heap_blks_ntruncate: The number of times truncate was called on this
> relation
>
> I can come up with a patch (already have one) for the above. Any
> thought/comments?

Do you have a use-case for this info? I can see where it might be neat
to know, but I'm not sure how you'd actually use it in the real world.
 
<Nikhils>
The use-case according to me is that these stats help prove the effectiveness of autovacuum/vacuum operations. By varying some autovac guc variables, and doing subsequent (pgbench e.g.) runs, one can find out the optimum values for these variables using these stats.
<Nikhils>
 
Regards,
Nikhils
EnterpriseDB         http://www.enterprisedb.com
--
All the world's a stage, and most of us are desperately unrehearsed.
 

Re: Additional stats for Relations

От
"Simon Riggs"
Дата:
On Sat, 2006-10-14 at 11:32 +0530, NikhilS wrote:

> On 10/13/06, Jim C. Nasby <jim@nasby.net> wrote:


>         I'm also not sure if this metric is what you actually want,
>         since a
>         single page can be returned many times from the FSM even
>         between 
>         vacuums. Tracking how many pages for a relation have been put
>         into the
>         FSM might be more useful...
>  
> <Nikhils>
> Pages might be put into the FSM, but by this metric don't we get the
> actual usage of the pages from the FSM? Agreed a single page can be
> returned multiple times, but since it serves a new tuple, shouldn't we
> track it?
> <Nikhils>

This makes sense for indexes, but only makes sense for heaps when we
know that the backend will keep re-accessing the block until it is full
- so only of interest in steady-state workloads.

IMHO Jim's proposal makes more sense for general use.

>         > heap_blks_extend: The number of times file extend was
>         invoked on the
>         > relation

Sounds good

>         > heap_blks_truncate: The total number of blocks that have
>         been truncated due 
>         > to vacuum activity e.g.

Sounds good

>         > As an addendum to the truncate stats above, we can also have
>         the additional
>         > following stats:
>         >
>         > heap_blks_maxtruncate: The max block of buffers truncated in
>         one go 
>         >
>         > heap_blks_ntruncate: The number of times truncate was called
>         on this
>         > relation

Those last 2 sound too complex for normal use and ntruncate is most
likely the same as number of vacuums anyway. Hmmm...Perhaps nvacuums is
a more interesting metric? We've got last vacuum date, but no indication
of how frequently a vacuum has run.

>         Do you have a use-case for this info? I can see where it might
>         be neat 
>         to know, but I'm not sure how you'd actually use it in the
>         real world.
>  
> <Nikhils>
> The use-case according to me is that these stats help prove the
> effectiveness of autovacuum/vacuum operations. By varying some autovac
> guc variables, and doing subsequent (pgbench e.g.) runs, one can find
> out the optimum values for these variables using these stats. 
> <Nikhils>

This should be useful for tuning space allocation/deallocation. If we
get this patch in early it should help get feedback on this area.

--  Simon Riggs              EnterpriseDB   http://www.enterprisedb.com




Re: Additional stats for Relations

От
NikhilS
Дата:
Hi,

So:
heap_blks_reused (with Jim's semantics), heap_blks_extend, heap_blks_truncate are the "interesting" stats? Will try to work up a patch for this.

Regards,
Nikhils
EnterpriseDB   http://www.enterprisedb.com
On 10/15/06, Simon Riggs <simon@2ndquadrant.com> wrote:
On Sat, 2006-10-14 at 11:32 +0530, NikhilS wrote:

> On 10/13/06, Jim C. Nasby <jim@nasby.net> wrote:


>         I'm also not sure if this metric is what you actually want,
>         since a
>         single page can be returned many times from the FSM even
>         between
>         vacuums. Tracking how many pages for a relation have been put
>         into the
>         FSM might be more useful...
>
> <Nikhils>
> Pages might be put into the FSM, but by this metric don't we get the
> actual usage of the pages from the FSM? Agreed a single page can be
> returned multiple times, but since it serves a new tuple, shouldn't we
> track it?
> <Nikhils>

This makes sense for indexes, but only makes sense for heaps when we
know that the backend will keep re-accessing the block until it is full
- so only of interest in steady-state workloads.

IMHO Jim's proposal makes more sense for general use.

>         > heap_blks_extend: The number of times file extend was
>         invoked on the
>         > relation

Sounds good

>         > heap_blks_truncate: The total number of blocks that have
>         been truncated due
>         > to vacuum activity e.g.

Sounds good

>         > As an addendum to the truncate stats above, we can also have
>         the additional
>         > following stats:
>         >
>         > heap_blks_maxtruncate: The max block of buffers truncated in
>         one go
>         >
>         > heap_blks_ntruncate: The number of times truncate was called
>         on this
>         > relation

Those last 2 sound too complex for normal use and ntruncate is most
likely the same as number of vacuums anyway. Hmmm...Perhaps nvacuums is
a more interesting metric? We've got last vacuum date, but no indication
of how frequently a vacuum has run.

>         Do you have a use-case for this info? I can see where it might
>         be neat
>         to know, but I'm not sure how you'd actually use it in the
>         real world.
>
> <Nikhils>
> The use-case according to me is that these stats help prove the
> effectiveness of autovacuum/vacuum operations. By varying some autovac
> guc variables, and doing subsequent (pgbench e.g.) runs, one can find
> out the optimum values for these variables using these stats.
> <Nikhils>

This should be useful for tuning space allocation/deallocation. If we
get this patch in early it should help get feedback on this area.

--
  Simon Riggs
  EnterpriseDB   http://www.enterprisedb.com





--
All the world's a stage, and most of us are desperately unrehearsed.

Re: Additional stats for Relations

От
"Jim C. Nasby"
Дата:
Also how many times a relation has been vacuumed (which puts all the
other numbers in more perspective... good catch Simon). And I think
number of pages that could not be added to the FSM would also be
extremely valuable.

On Wed, Oct 18, 2006 at 11:27:39AM +0530, NikhilS wrote:
> Hi,
> 
> So:
> heap_blks_reused (with Jim's semantics), heap_blks_extend,
> heap_blks_truncate are the "interesting" stats? Will try to work up a patch
> for this.
> 
> Regards,
> Nikhils
> EnterpriseDB   http://www.enterprisedb.com
> On 10/15/06, Simon Riggs <simon@2ndquadrant.com> wrote:
> >
> >On Sat, 2006-10-14 at 11:32 +0530, NikhilS wrote:
> >
> >> On 10/13/06, Jim C. Nasby <jim@nasby.net> wrote:
> >
> >
> >>         I'm also not sure if this metric is what you actually want,
> >>         since a
> >>         single page can be returned many times from the FSM even
> >>         between
> >>         vacuums. Tracking how many pages for a relation have been put
> >>         into the
> >>         FSM might be more useful...
> >>
> >> <Nikhils>
> >> Pages might be put into the FSM, but by this metric don't we get the
> >> actual usage of the pages from the FSM? Agreed a single page can be
> >> returned multiple times, but since it serves a new tuple, shouldn't we
> >> track it?
> >> <Nikhils>
> >
> >This makes sense for indexes, but only makes sense for heaps when we
> >know that the backend will keep re-accessing the block until it is full
> >- so only of interest in steady-state workloads.
> >
> >IMHO Jim's proposal makes more sense for general use.
> >
> >>         > heap_blks_extend: The number of times file extend was
> >>         invoked on the
> >>         > relation
> >
> >Sounds good
> >
> >>         > heap_blks_truncate: The total number of blocks that have
> >>         been truncated due
> >>         > to vacuum activity e.g.
> >
> >Sounds good
> >
> >>         > As an addendum to the truncate stats above, we can also have
> >>         the additional
> >>         > following stats:
> >>         >
> >>         > heap_blks_maxtruncate: The max block of buffers truncated in
> >>         one go
> >>         >
> >>         > heap_blks_ntruncate: The number of times truncate was called
> >>         on this
> >>         > relation
> >
> >Those last 2 sound too complex for normal use and ntruncate is most
> >likely the same as number of vacuums anyway. Hmmm...Perhaps nvacuums is
> >a more interesting metric? We've got last vacuum date, but no indication
> >of how frequently a vacuum has run.
> >
> >>         Do you have a use-case for this info? I can see where it might
> >>         be neat
> >>         to know, but I'm not sure how you'd actually use it in the
> >>         real world.
> >>
> >> <Nikhils>
> >> The use-case according to me is that these stats help prove the
> >> effectiveness of autovacuum/vacuum operations. By varying some autovac
> >> guc variables, and doing subsequent (pgbench e.g.) runs, one can find
> >> out the optimum values for these variables using these stats.
> >> <Nikhils>
> >
> >This should be useful for tuning space allocation/deallocation. If we
> >get this patch in early it should help get feedback on this area.
> >
> >--
> >  Simon Riggs
> >  EnterpriseDB   http://www.enterprisedb.com
> >
> >
> >
> 
> 
> -- 
> All the world's a stage, and most of us are desperately unrehearsed.

-- 
Jim Nasby                                            jim@nasby.net
EnterpriseDB      http://enterprisedb.com      512.569.9461 (cell)


Re: Additional stats for Relations

От
NikhilS
Дата:
Hi Jim,

On 10/18/06, Jim C. Nasby <jim@nasby.net> wrote:
Also how many times a relation has been vacuumed (which puts all the
other numbers in more perspective... good catch Simon). And I think
number of pages that could not be added to the FSM would also be
extremely valuable.

By the above, do you mean the number of pages that could not be added to the FSM because they had freespace which was less than the threshold for this particular relation?

Regards,
Nikhils
EnterpriseDB   http://www.enterprisedb.com
 

On Wed, Oct 18, 2006 at 11:27:39AM +0530, NikhilS wrote:
> Hi,
>
> So:
> heap_blks_reused (with Jim's semantics), heap_blks_extend,
> heap_blks_truncate are the "interesting" stats? Will try to work up a patch
> for this.
>
> Regards,
> Nikhils
> EnterpriseDB   http://www.enterprisedb.com
> On 10/15/06, Simon Riggs <simon@2ndquadrant.com> wrote:
> >
> >On Sat, 2006-10-14 at 11:32 +0530, NikhilS wrote:
> >
> >> On 10/13/06, Jim C. Nasby <jim@nasby.net> wrote:
> >
> >
> >>         I'm also not sure if this metric is what you actually want,
> >>         since a
> >>         single page can be returned many times from the FSM even
> >>         between
> >>         vacuums. Tracking how many pages for a relation have been put
> >>         into the
> >>         FSM might be more useful...
> >>
> >> <Nikhils>
> >> Pages might be put into the FSM, but by this metric don't we get the
> >> actual usage of the pages from the FSM? Agreed a single page can be
> >> returned multiple times, but since it serves a new tuple, shouldn't we
> >> track it?
> >> <Nikhils>
> >
> >This makes sense for indexes, but only makes sense for heaps when we
> >know that the backend will keep re-accessing the block until it is full
> >- so only of interest in steady-state workloads.
> >
> >IMHO Jim's proposal makes more sense for general use.
> >
> >>         > heap_blks_extend: The number of times file extend was
> >>         invoked on the
> >>         > relation
> >
> >Sounds good
> >
> >>         > heap_blks_truncate: The total number of blocks that have
> >>         been truncated due
> >>         > to vacuum activity e.g.
> >
> >Sounds good
> >
> >>         > As an addendum to the truncate stats above, we can also have
> >>         the additional
> >>         > following stats:
> >>         >
> >>         > heap_blks_maxtruncate: The max block of buffers truncated in
> >>         one go
> >>         >
> >>         > heap_blks_ntruncate: The number of times truncate was called
> >>         on this
> >>         > relation
> >
> >Those last 2 sound too complex for normal use and ntruncate is most
> >likely the same as number of vacuums anyway. Hmmm...Perhaps nvacuums is
> >a more interesting metric? We've got last vacuum date, but no indication
> >of how frequently a vacuum has run.
> >
> >>         Do you have a use-case for this info? I can see where it might
> >>         be neat
> >>         to know, but I'm not sure how you'd actually use it in the
> >>         real world.
> >>
> >> <Nikhils>
> >> The use-case according to me is that these stats help prove the
> >> effectiveness of autovacuum/vacuum operations. By varying some autovac
> >> guc variables, and doing subsequent (pgbench e.g.) runs, one can find
> >> out the optimum values for these variables using these stats.
> >> <Nikhils>
> >
> >This should be useful for tuning space allocation/deallocation. If we
> >get this patch in early it should help get feedback on this area.
> >
> >--
> >  Simon Riggs
> >  EnterpriseDB   http://www.enterprisedb.com
> >
> >
> >
>
>
> --
> All the world's a stage, and most of us are desperately unrehearsed.

--
Jim Nasby                                            jim@nasby.net
EnterpriseDB      http://enterprisedb.com      512.569.9461 (cell)



--
All the world's a stage, and most of us are desperately unrehearsed.

Re: Additional stats for Relations

От
"Jim C. Nasby"
Дата:
On Thu, Oct 19, 2006 at 04:10:46PM +0530, NikhilS wrote:
> Hi Jim,
> 
> On 10/18/06, Jim C. Nasby <jim@nasby.net> wrote:
> >
> >Also how many times a relation has been vacuumed (which puts all the
> >other numbers in more perspective... good catch Simon). And I think
> >number of pages that could not be added to the FSM would also be
> >extremely valuable.
> 
> 
> By the above, do you mean the number of pages that could not be added to the
> FSM because they had freespace which was less than the threshold for this
> particular relation?

Yes... but... :)

We want to ignore pages that have less than the average request size,
because vacuum will never try and put them in the FSM anyway. We only
care about pages that were dropped because MaxFSMPages was less than
DesiredFSMPages (see freespace.c for more info).

It would also be useful to keep track of what relations have been bumped
out of the FSM (or never got recorded, though I'm not sure if that's
possible) because we've run into the MaxFSMRelations limit.

BTW, if we add these counters we'll be up to 7 stats dealing with vacuum
and analyze, and NikhilS has a patch we're finalizing that would add 3
more. Right now there's 4 slated to go into pg_stat_* in 8.2, but maybe
we should have a separate view for vacuum/FSM statistics?
-- 
Jim Nasby                                            jim@nasby.net
EnterpriseDB      http://enterprisedb.com      512.569.9461 (cell)


Re: Additional stats for Relations

От
Tom Lane
Дата:
"Jim C. Nasby" <jim@nasby.net> writes:
> BTW, if we add these counters we'll be up to 7 stats dealing with vacuum
> and analyze, and NikhilS has a patch we're finalizing that would add 3
> more. Right now there's 4 slated to go into pg_stat_* in 8.2, but maybe
> we should have a separate view for vacuum/FSM statistics?

I've seen no demonstration of a need for *any* of them, actually,
and am pretty dubious that we want to add so much collection overhead.
At least not without a major redesign of the stats reporting mechanism.
If we just drop in another seven counters, we'll create an immediate 50%
increase in the stats-file I/O volume, even when no vacuuming is
happening at all.
        regards, tom lane


Re: Additional stats for Relations

От
"Jim C. Nasby"
Дата:
On Thu, Oct 19, 2006 at 11:47:53AM -0400, Tom Lane wrote:
> "Jim C. Nasby" <jim@nasby.net> writes:
> > BTW, if we add these counters we'll be up to 7 stats dealing with vacuum
> > and analyze, and NikhilS has a patch we're finalizing that would add 3
> > more. Right now there's 4 slated to go into pg_stat_* in 8.2, but maybe
> > we should have a separate view for vacuum/FSM statistics?
> 
> I've seen no demonstration of a need for *any* of them, actually,
> and am pretty dubious that we want to add so much collection overhead.
> At least not without a major redesign of the stats reporting mechanism.
> If we just drop in another seven counters, we'll create an immediate 50%
> increase in the stats-file I/O volume, even when no vacuuming is
> happening at all.

Yeah, for stuff like vacuuming the current stats system may not make any
sense. Almost anything dealing with vacuum can really just be put into a
table, because it doesn't happen all that often.

My concern is that there's enough useful data to collect about vacuuming
and the FSM that it should probably get it's own set of tables/views,
rather than piggy-backing on pg_stat_*. But that ship has pretty much
sailed, so we're probably stuck with at least the last_* stuff in
pg_stat_* for the immediate future.
-- 
Jim Nasby                                            jim@nasby.net
EnterpriseDB      http://enterprisedb.com      512.569.9461 (cell)