Обсуждение: Re: Stats collector performance improvement
[ moving to -hackers ]
Bruce Momjian <pgman@candle.pha.pa.us> writes:
> I did some research on this because the numbers Tom quotes indicate there
> is something wrong in the way we process stats_command_string
> statistics.
> [ ... proposed patch that seems pretty klugy to me ... ]
I wonder whether we shouldn't consider something more drastic, like
getting rid of the intermediate stats buffer process entirely.
The original design for the stats communication code was based on the
premise that it's better to drop data than to make backends wait on
the stats collector.  However, as things have turned out I think this
notion is a flop: the people who are using stats at all want the stats
to be reliable.  We've certainly seen plenty of gripes from people who
are unhappy that backend-exit messages got dropped, and anyone who's
using autovacuum would really like the tuple update counts to be pretty
solid too.
If we abandoned the unreliable-communication approach, could we build
something with less overhead?
        regards, tom lane
			
		"Tom Lane" <tgl@sss.pgh.pa.us> wrote > > I wonder whether we shouldn't consider something more drastic, like > getting rid of the intermediate stats buffer process entirely. > > The original design for the stats communication code was based on the > premise that it's better to drop data than to make backends wait on > the stats collector. However, as things have turned out I think this > notion is a flop: the people who are using stats at all want the stats > to be reliable. We've certainly seen plenty of gripes from people who > are unhappy that backend-exit messages got dropped, and anyone who's > using autovacuum would really like the tuple update counts to be pretty > solid too. > AFAICS if we can maintain the stats counts solid, then it may hurt performance dramatically. Think if we maintain pgstat_count_heap_insert()/pgstat_count_heap_delete() pretty well, then we get a replacement of count(*). To do so, I believe that will add another lock contention on the target table stats. Regards, Qingqing
Ühel kenal päeval, E, 2006-01-02 kell 15:20, kirjutas Tom Lane: > [ moving to -hackers ] > > Bruce Momjian <pgman@candle.pha.pa.us> writes: > > I did some research on this because the numbers Tom quotes indicate there > > is something wrong in the way we process stats_command_string > > statistics. > > [ ... proposed patch that seems pretty klugy to me ... ] > > I wonder whether we shouldn't consider something more drastic, like > getting rid of the intermediate stats buffer process entirely. > > The original design for the stats communication code was based on the > premise that it's better to drop data than to make backends wait on > the stats collector. However, as things have turned out I think this > notion is a flop: the people who are using stats at all want the stats > to be reliable. We've certainly seen plenty of gripes from people who > are unhappy that backend-exit messages got dropped, and anyone who's > using autovacuum would really like the tuple update counts to be pretty > solid too. > > If we abandoned the unreliable-communication approach, could we build > something with less overhead? Weell, at least it should be non-WAL, and probably non-fsync, at least optionally . Maybe also inserts inserts + offline aggregator (instead of updates) to avoid lock contention. Something that collects data in blocks of local or per-backend shared memory in each backend and then gives complete blocks to aggregator process. Maybe use 2 alternating blocks per backend - 1 for ongoing stats collection and another given to aggregator. this has a little time shift, but will deliver accurate starts in the end. Things that need up-to-date stats (like pg_stat_activity), should look (and lock) also the ongoing satas collection blocks if needed (how do we know know the *if*) and delay each backend process momentaryly by looking. ----------------- Hannu
"Qingqing Zhou" <zhouqq@cs.toronto.edu> writes:
> AFAICS if we can maintain the stats counts solid, then it may hurt 
> performance dramatically. Think if we maintain 
> pgstat_count_heap_insert()/pgstat_count_heap_delete() pretty well, then we 
> get a replacement of count(*).
Not at all.  For one thing, the stats don't attempt to maintain
per-transaction state, so they don't have the MVCC issues of count(*).
I'm not suggesting any fundamental changes in what is counted or when.
The two compromises that were made in the original stats design to make
it fast were (1) stats updates lag behind reality, and (2) some updates
may be missed entirely.  Now that we have a couple of years' field
experience with the code, it seems that (1) is acceptable for real usage
but (2) not so much.  And it's not even clear that we are buying any
performance gain from (2), considering that it's adding the overhead of
passing the data through an extra process.
        regards, tom lane
			
		On 1/2/2006 3:20 PM, Tom Lane wrote: > [ moving to -hackers ] > > Bruce Momjian <pgman@candle.pha.pa.us> writes: >> I did some research on this because the numbers Tom quotes indicate there >> is something wrong in the way we process stats_command_string >> statistics. >> [ ... proposed patch that seems pretty klugy to me ... ] > > I wonder whether we shouldn't consider something more drastic, like > getting rid of the intermediate stats buffer process entirely. > > The original design for the stats communication code was based on the > premise that it's better to drop data than to make backends wait on The original design was geared towards searching for useless/missing indexes and tuning activity like that. This never happened, but instead people tried to use it as a reliable debugging or access statistics aid ... which is fine but not what it originally was intended for. So yes, I think looking at what it usually is used for, a message passing system like SysV message queues (puke) or similar would do a better job. Jan > the stats collector. However, as things have turned out I think this > notion is a flop: the people who are using stats at all want the stats > to be reliable. We've certainly seen plenty of gripes from people who > are unhappy that backend-exit messages got dropped, and anyone who's > using autovacuum would really like the tuple update counts to be pretty > solid too. > > If we abandoned the unreliable-communication approach, could we build > something with less overhead? > > regards, tom lane -- #======================================================================# # It's easier to get forgiveness for being wrong than for being right. # # Let's break this rule - forgive me. # #================================================== JanWieck@Yahoo.com #
On Mon, 2006-01-02 at 16:48 -0500, Tom Lane wrote: > The two compromises that were made in the original stats design to make > it fast were (1) stats updates lag behind reality, and (2) some updates > may be missed entirely. Now that we have a couple of years' field > experience with the code, it seems that (1) is acceptable for real usage > but (2) not so much. We decided that the stats update had to occur during execution, in case the statement aborted and row versions were not notified. That means we must notify things as they happen, yet could use a reliable queuing system that could suffer a delay in the stats becoming available. But how often do we lose a backend? Could we simply buffer that a little better? i.e. don't send message to stats unless we have altered at least 10 rows? So we would buffer based upon the importance of the message, not the actual size of the message. That way singleton-statements won't generate the same stats traffic, but we risk losing a buffers worth of row changes should we crash - everything would still work if we lost a few small row change notifications. We can also save lots of cycles on the current statement overhead, which is currently the worst part of the stats, performance-wise. That definitely needs redesign. AFAICS we only ever need to know the SQL statement via the stats system if the statement has been running for more than a few minutes - the main use case is for an admin to be able to diagnose a rogue or hung statement. Pushing the statement to stats every time is just a big overhead. That suggests we should either have a pull or a deferred push (longer-than-X-secs) approach. Best Regards, Simon Riggs
On Tue, Jan 03, 2006 at 09:40:53AM +0000, Simon Riggs wrote: > On Mon, 2006-01-02 at 16:48 -0500, Tom Lane wrote: > We can also save lots of cycles on the current statement overhead, which > is currently the worst part of the stats, performance-wise. That > definitely needs redesign. AFAICS we only ever need to know the SQL > statement via the stats system if the statement has been running for > more than a few minutes - the main use case is for an admin to be able > to diagnose a rogue or hung statement. Pushing the statement to stats > every time is just a big overhead. That suggests we should either have a > pull or a deferred push (longer-than-X-secs) approach. I would argue that minutes is too long, but of course this could be user-adjustable. I suspect that even waiting just a second could be a huge win, since this only matters if you're executing a lot of statements and you won't be doing that if those statements are taking more than a second or two to execute. -- Jim C. Nasby, Sr. Engineering Consultant jnasby@pervasive.com Pervasive Software http://pervasive.com work: 512-231-6117 vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461
"Jim C. Nasby" <jnasby@pervasive.com> writes: > I would argue that minutes is too long, but of course this could be > user-adjustable. I suspect that even waiting just a second could be a > huge win, since this only matters if you're executing a lot of > statements and you won't be doing that if those statements are taking > more than a second or two to execute. That's not necessarily true at all. You could just as easily have a performance problem caused by a quick statement that is being executed many times as a slow statement that is being executed few times. That is, you could be executing dozens of queries that take seconds or minutes once a second but none of those might be the problem. The problem might be the query that's taking only 300ms that you're executing hundreds of of times a minute. Moreover, if you're not gathering stats for queries that are fast then how will you know whether they're performing properly when you look at them when they do show up? -- greg
Ühel kenal päeval, T, 2006-01-03 kell 09:40, kirjutas Simon Riggs: > We can also save lots of cycles on the current statement overhead, which > is currently the worst part of the stats, performance-wise. That > definitely needs redesign. AFAICS we only ever need to know the SQL > statement via the stats system if the statement has been running for > more than a few minutes - the main use case is for an admin to be able > to diagnose a rogue or hung statement. Interestingly I use pg_stat_activity view to watch for stuck backends, "stuck" in the sense that they have not noticed when client want away and are now waitin the TCP timeout to happen. I query for backends which have been in "<IDLE>" state for longer than XX seconds. I guess that at least some kind of indication for this should be available. Of course this would be much less of a problem if there was a possibility for sime kind of keepalive system to detect when client/frontend goes away. > Pushing the statement to stats > every time is just a big overhead. That suggests we should either have a > pull I could live with "push", where pg_stat_activity would actually ask each live backend for its "current query". This surely happens less often than queries are performed (up to few thousand per sec) ------------- Hannu
Hannu Krosing <hannu@skype.net> writes: > Interestingly I use pg_stat_activity view to watch for stuck backends, > "stuck" in the sense that they have not noticed when client want away > and are now waitin the TCP timeout to happen. I query for backends which > have been in "<IDLE>" state for longer than XX seconds. I guess that at > least some kind of indication for this should be available. You mean like the tcp_keepalives_idle option? http://www.postgresql.org/docs/8.1/interactive/runtime-config-connection.html#GUC-TCP-KEEPALIVES-IDLE -- greg
Ühel kenal päeval, P, 2006-01-08 kell 11:49, kirjutas Greg Stark: > Hannu Krosing <hannu@skype.net> writes: > > > Interestingly I use pg_stat_activity view to watch for stuck backends, > > "stuck" in the sense that they have not noticed when client want away > > and are now waitin the TCP timeout to happen. I query for backends which > > have been in "<IDLE>" state for longer than XX seconds. I guess that at > > least some kind of indication for this should be available. > > You mean like the tcp_keepalives_idle option? > > http://www.postgresql.org/docs/8.1/interactive/runtime-config-connection.html#GUC-TCP-KEEPALIVES-IDLE > Kind of, only I'd like to be able to set timeouts less than 120 minutes. from: http://developer.apple.com/documentation/mac/NetworkingOT/NetworkingWOT-390.html#HEADING390-0 kp_timeout Set the requested timeout value, in minutes. Specify a value of T_UNSPEC to use the default value.You may specify any positive value for this field of 120 minutes or greater. The timeout value is not anabsolute requirement; if you specify a value less than 120 minutes, TCP will renegotiate a timeout of 120 minutes. ----------- Hannu