Thread: Configurable Additional Stats
I've got a requirement to produce some additional stats from the server
while it executes. Specifically, I'm looking at table interaction stats
to make it easier to determine replication sets accurately for a given
transaction mix.

On brief discussion, seems like a good approach would be to put in a
user exit/plugin/hook in AtEOXact_PgStat(). That way we don't need to
add more log_* parameters for this and every additional need. Something
like this...

    if (stats_hook)
        (*stats_hook)(pgStatTabList);

Any objections to sliding this in?

BTW, do we have a section in the docs on what plugin points are now
available?

-- 
Simon Riggs
EnterpriseDB   http://www.enterprisedb.com
"Simon Riggs" <simon@2ndquadrant.com> writes:
> if (stats_hook)
>     (*stats_hook)(pgStatTabList);

> Any objections to sliding this in?

Only that it's useless.  What are you going to do in such a hook?
Not send more info to the stats collector, because the message format
is predetermined.  AFAICS, if you want to extend the set of stats
collected, you need much more invasive changes than this, and are
really going to have to resort to changing the server source code.

			regards, tom lane
"Tom Lane" <tgl@sss.pgh.pa.us> writes:
> "Simon Riggs" <simon@2ndquadrant.com> writes:
>> if (stats_hook)
>>     (*stats_hook)(pgStatTabList);
>
>> Any objections to sliding this in?
>
> Only that it's useless.  What are you going to do in such a hook?

Simon left for camping before you sent this. My understanding is he
wants to gather the data elsewhere. So either he'll just elog it and
grovel the logs, or send it via some other method. Alternatively he
could aggregate the data himself and save it in a file, gprof-style.

-- 
  Gregory Stark
  EnterpriseDB          http://www.enterprisedb.com
Gregory Stark wrote:
> "Tom Lane" <tgl@sss.pgh.pa.us> writes:
>> "Simon Riggs" <simon@2ndquadrant.com> writes:
>>> if (stats_hook)
>>>     (*stats_hook)(pgStatTabList);
>>> Any objections to sliding this in?
>> Only that it's useless.  What are you going to do in such a hook?
>
> Simon left for camping before you sent this. My understanding is he
> wants to gather the data elsewhere. So either he'll just elog it and
> grovel the logs or send it via some other method. Alternatively he
> could aggregate the data himself and save it in a file, gprof-style.

Yes, it's not intended to insert more stats, but to get the raw data out
for external analysis during development and testing of applications and
systems etc.

Regards, Dave.
Dave Page <dpage@postgresql.org> writes:
> Yes, it's not intended to insert more stats, but to get the raw data out
> for external analysis during development and testing of applications and
> systems etc.

Mph --- the proposal was very poorly titled then.  In any case, it still
sounds like a one-off hack that would be equally well served by a local
patch.

			regards, tom lane
Tom Lane wrote:
> Dave Page <dpage@postgresql.org> writes:
>> Yes, it's not intended to insert more stats, but to get the raw data out
>> for external analysis during development and testing of applications and
>> systems etc.
>
> Mph --- the proposal was very poorly titled then.  In any case, it still
> sounds like a one-off hack that would be equally well served by a local
> patch.

It's certainly not intended as a one-off hack, but as a way of analysing
the behaviour of existing and new applications. It would certainly be
useful for us in the future, and most likely for others as well once a
plugin or two has been released.

Regards, Dave.
Dave Page <dpage@postgresql.org> writes:
> Tom Lane wrote:
>> Mph --- the proposal was very poorly titled then.  In any case, it still
>> sounds like a one-off hack that would be equally well served by a local
>> patch.

> It's certainly not intended as a one-off hack, but as a way of analysing
> the behaviour of existing and new applications. It would certainly be
> useful for us in the future, and most likely others as well once a
> plugin or two has been released.

I'd be more interested if I saw a believable spec for a plugin using
this.  So far it sounds terribly badly designed --- for starters,
apparently it's intending to ignore the stats aggregation/reporting
infrastructure and reinvent that wheel in some unspecified way, for
unspecified reasons.  If you think you can improve on that mechanism
(which I'm prepared to believe, though I'd like to see some details)
why wouldn't the proposed hook include a way to turn off the overhead
of the regular stats collector?  If that's not the point, then what
is the point?

			regards, tom lane
"Tom Lane" <tgl@sss.pgh.pa.us> writes:
> So far it sounds terribly badly designed --- for starters, apparently
> it's intending to ignore the stats aggregation/reporting infrastructure
> and reinvent that wheel in some unspecified way, for unspecified
> reasons.

One way to accomplish the original goal using the stats
aggregation/reporting infrastructure directly would be to add a
stats_domain GUC with which stats messages get tagged. You could then
have each application domain set the GUC to a different value and have
the stats collector keep table stats separately for each
stats_domain/table pair.

That could be interesting for other purposes, such as marking a single
transaction with a new stats_domain so you can look at the stats for
just that transaction.

-- 
  Gregory Stark
  EnterpriseDB          http://www.enterprisedb.com
Tom Lane wrote:
> Dave Page <dpage@postgresql.org> writes:
>> Tom Lane wrote:
>>> Mph --- the proposal was very poorly titled then.  In any case, it
>>> still sounds like a one-off hack that would be equally well served
>>> by a local patch.
>
>> It's certainly not intended as a one-off hack, but as a way of analysing
>> the behaviour of existing and new applications. It would certainly be
>> useful for us in the future, and most likely others as well once a
>> plugin or two has been released.
>
> I'd be more interested if I saw a believable spec for a plugin using
> this.  So far it sounds terribly badly designed --- for starters,
> apparently it's intending to ignore the stats aggregation/reporting
> infrastructure and reinvent that wheel in some unspecified way, for
> unspecified reasons.  If you think you can improve on that mechanism
> (which I'm prepared to believe, though I'd like to see some details)
> why wouldn't the proposed hook include a way to turn off the overhead
> of the regular stats collector?  If that's not the point, then what
> is the point?

For any kind of design we'll need to wait for Simon, but the initial
application we were discussing was to allow an unknown/closed-source
user app to be traced, to reveal its usage patterns for each relation in
the database. By monitoring at that point we can look at data for that
specific backend (on which we are appropriately exercising the app),
and IIRC Simon also said we could see the activity on a per-transaction
basis.

As a disclaimer, I reserve the right to have misremembered some of that
- someone (<cough>Magnus</cough>) kept distracting me on IM during the
conversation :-)

Regards, Dave.
Gregory Stark <stark@enterprisedb.com> writes:
> One way to accomplish the original goal using the stats
> aggregation/reporting infrastructure directly would be to add a
> stats_domain GUC with which stats messages get tagged. You could then
> have each application domain set the GUC to a different value and have
> the stats collector keep table stats separately for each
> stats_domain/table pair.

> That could be interesting for other purposes, such as marking a single
> transaction with a new stats_domain so you can look at the stats for
> just that transaction.

Hmm.  That has some possibilities.  You'd want a way to get the stats
collector to discard a domain when you didn't want those numbers
anymore, but that seems doable enough.  Also, most likely you'd want
the collector to keep both "global" and per-domain counters, else
resetting a domain loses state --- which'd be bad from autovac's point
of view, if nothing else.

			regards, tom lane
On Fri, 2007-06-29 at 14:43 -0400, Tom Lane wrote:
> Dave Page <dpage@postgresql.org> writes:
>> Yes, it's not intended to insert more stats, but to get the raw data out
>> for external analysis during development and testing of applications and
>> systems etc.
>
> Mph --- the proposal was very poorly titled then.  In any case, it still
> sounds like a one-off hack that would be equally well served by a local
> patch.

Well, I want it to a) be configurable and b) provide additional stats,
so the title was fine, but we can call this whatever you like; I don't
have a fancy name for it.

The purpose is to get access to the stats data while we still know the
username, transactionId and other information. Once it is sent to the
stats collector it is anonymised and summarised.

Examples of the potential uses of such plug-ins would be:

1) Which tables have been touched by this transaction? The purpose of
this is to profile table interactions to allow:

   i) an accurate assessment of the replication sets for use with
   Slony. If you define the replication set incorrectly then you may
   not be able to recover all of your data.

   ii) determining whether it is possible to split a database that
   serves two applications into two distinct databases (or not),
   allowing you to scale out the Data Tier in a Service Oriented
   Application.

2) Charge-back accounting. Keep track, by userid, user group, time of
access etc., of all accesses to the system, so we can provide
chargeback facilities to users. You can put your charging rules into
the plugin and have it spit out appropriate chargeback log records
when/if required, e.g. log a chargeback record every time a transaction
touches > 100 blocks, to keep track of heavy queries but ignore OLTP
workloads.

3) Tracing individual transaction types, as Greg suggests.

4) Anything else you might dream up...

-- 
Simon Riggs
EnterpriseDB   http://www.enterprisedb.com
"Simon Riggs" <simon@2ndquadrant.com> writes:
> 2) Charge-back accounting. Keep track by userid, user group, time of
> access etc of all accesses to the system, so we can provide chargeback
> facilities to users. You can put your charging rules into the plugin and
> have it spit out appropriate chargeback log records, when/if required.
> e.g. log a chargeback record every time a transaction touches > 100
> blocks, to keep track of heavy queries but ignore OLTP workloads.

Sure, but I think Tom's question is how do you get from the plugin to
wherever you want this data to be? There's not much you can do with the
data at that point. You would end up having to reconstruct the entire
stats collector infrastructure to ship the data you want out via some
communication channel and then aggregate it somewhere else.

Perhaps your plugin entry point is most useful *alongside* my
stats-domain idea. If you wanted to, you could write a plugin which
sets the stats domain based on whatever criteria you want, whether
that's time-of-day, userid, load on the system, etc.

-- 
  Gregory Stark
  EnterpriseDB          http://www.enterprisedb.com
Gregory Stark <stark@enterprisedb.com> writes:
> Sure, but I think Tom's question is how do you get from the plugin to
> wherever you want this data to be? There's not much you can do with the
> data at that point. You would end up having to reconstruct the entire
> stats collector infrastructure to ship the data you want out via some
> communication channel and then aggregate it somewhere else.

Right, and I don't see any reasonable way for a plug-in to establish
such an infrastructure --- how's it going to cause the postmaster to
shepherd a second stats collector process, for instance?  The proposal
seems to be in the very early handwaving stage, because issues like
this obviously haven't been thought about.  I would suggest building a
working prototype plugin, and then you'll really know what hooks you
need.  (Comparison point: we'd never have invented the correct hooks
for the index advisor if we'd tried to define them in advance of having
rough working code to look at.)

> Perhaps your plugin entry point is most useful *alongside* my
> stats-domain idea. If you wanted to, you could write a plugin which
> sets the stats domain based on whatever criteria you want, whether
> that's time-of-day, userid, load on the system, etc.

+1.  I'm also thinking that hooks inside the stats collector process
itself might be needed, though I have no idea exactly what.

			regards, tom lane
On Mon, 2007-07-02 at 17:41 +0100, Gregory Stark wrote:
> "Simon Riggs" <simon@2ndquadrant.com> writes:
>
>> 2) Charge-back accounting. Keep track by userid, user group, time of
>> access etc of all accesses to the system, so we can provide chargeback
>> facilities to users. You can put your charging rules into the plugin and
>> have it spit out appropriate chargeback log records, when/if required.
>> e.g. log a chargeback record every time a transaction touches > 100
>> blocks, to keep track of heavy queries but ignore OLTP workloads.
>
> Sure, but I think Tom's question is how do you get from the plugin to
> wherever you want this data to be? There's not much you can do with the
> data at that point. You would end up having to reconstruct the entire
> stats collector infrastructure to ship the data you want out via some
> communication channel and then aggregate it somewhere else.

I just want to LOG a few extra pieces of information in the simplest
possible way. <sigh/>

There are no more steps in that process than there are for using
log_min_duration_statement and a performance analysis tool.
Outside-the-dbms processing is already required to use PostgreSQL
effectively, so this can't be an argument against the logging of
additional stats. Logging to the dbms means we have to change table
definitions etc., which will ultimately not work as well.

> Perhaps your plugin entry point is most useful *alongside* my
> stats-domain idea. If you wanted to, you could write a plugin which
> sets the stats domain based on whatever criteria you want, whether
> that's time-of-day, userid, load on the system, etc.

Your stats-domain idea is great, but it doesn't solve my problem (1).
I don't just want this solved, I *need* it solved, since there's no
other way to get this done accurately with a large and complex
application.
We could just go back to having log_tables_in_transaction = on | off,
which would produce output like this:

LOG:  transaction-id: 3456 table list {32456, 37456, 85345, 19436}

I don't expect everybody to like that, but it's what I want, so I'm
proposing it in a way that is more acceptable. If somebody has a better
way of doing this, please say.

The plugin looks pretty darn simple to me... and hurts nobody.

-- 
Simon Riggs
EnterpriseDB   http://www.enterprisedb.com