Discussion: How could we make it simple to access the log as a table?
Hackers, There's a lot of great information in the postgres logs. While we eventually want to get more sophisticated about providing users with status and history information, for 9.3 it would be really nice to just offer the current logs in system view format. Certainly this can be done ad-hoc using CSV format and csv_fdw. However, such setups are fragile due to log rotation and other issues. It seems like we could come up with a better way. Ideas? -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
Josh, * Josh Berkus (josh@agliodbs.com) wrote: > Certainly this can be done ad-hoc using CSV format and csv_fdw. > However, such setups are fragile due to log rotation and other issues. > It seems like we could come up with a better way. Ideas? This is really where I was hoping to eventually get to with the logging changes that have been discussed over the past couple of years. We need to have a mechanism to allow logging to different places, based on information included in the log message and/or context, where one of those places could potentially be some kind of receiver process that then dumps the message into a table, either on the local system or on a remote system. What exactly all of that looks like and how it works is a pretty big discussion, but I'd definitely love to hear from others who are interested in changes to our logging infrastructure. Thanks, Stephen
On Mon, May 21, 2012 at 05:53:55PM -0700, Josh Berkus wrote: > Hackers, > > There's a lot of great information in the postgres logs. While we > eventually want to get more sophisticated about providing users with > status and history information, for 9.3 it would be really nice to > just offer the current logs in system view format. > > Certainly this can be done ad-hoc using CSV format and csv_fdw. > However, such setups are fragile due to log rotation and other > issues. It seems like we could come up with a better way. Ideas? Would it help to automate this stuff with that CSV format? The trick, as I see it, is to make those on-disk log files be partitions of a table. Stephen Frost's point is well-taken, but I'm not sure we need to make it a blocker for this. Cheers, David. -- David Fetter <david@fetter.org> http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fetter@gmail.com iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
Stephen Frost <sfrost@snowman.net> writes: > This is really where I was hoping to eventually get to with the logging > changes that have been discussed over the past couple of years. We need > to have a mechanism to allow logging to different places, based on > information included in the log message and/or context, where one of > those places could potentially be some kind of receiver process that > then dumps the message into a table, either on the local system or on a > remote system. That looks like syslog features? > What exactly all of that looks like and how it works is a pretty big > discussion, but I'd definitely love to hear from others who are > interested in changes to our logging infrastructure. I thought about exposing the log CSV files as partitions of a log table, each partition being a FDW able to read the file. Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
On Mon, May 28, 2012 at 11:39 AM, Dimitri Fontaine <dimitri@2ndquadrant.fr> wrote:
> Stephen Frost <sfrost@snowman.net> writes:
>> This is really where I was hoping to eventually get to with the logging changes that have been discussed over the past couple of years. We need to have a mechanism to allow logging to different places, based on information included in the log message and/or context, where one of those places could potentially be some kind of receiver process that then dumps the message into a table, either on the local system or on a remote system.
>
> That looks like syslog features?
>
>> What exactly all of that looks like and how it works is a pretty big discussion, but I'd definitely love to hear from others who are interested in changes to our logging infrastructure.
>
> I thought about exposing the log CSV files as partitions of a log table, each partition being a FDW able to read the file.

All interesting... We've got several models as to how logs are likely to be captured, which means that it'll be difficult to have one uniform answer.

1. An easy traditional default is to capture logs in a log directory. An FDW might be able to get at this, with the cost of some additional configuration. Unfortunately, that injures the simplicity of this way of logging.

2. Ask Syslog. My favorite way to configure *my* PG instances (e.g. - those that I use for testing) is for them to forward messages to syslog. That way they, and my Slony test instances, are all logging to one common place, rather than the logs getting strewn in a bunch of places. An FDW that could talk to syslog would be a nifty idea, though there are enough different syslog implementations around to, again, injure the simplicity of this. [Also, mumble, mumble, syslog might be forwarding to a remote server, further complications...]

3. Lossy logging is desired by some doing high-performance systems where they can't afford to capture everything: http://archives.postgresql.org/pgsql-hackers/2011-11/msg01437.php One approach that I know Theo has used has been to throw events onto a Spread channel, and have a listener pulling and aggregating the events on a best-efforts basis. I'm not sure if I should treat that as a separate answer, or as part of the same.

4. For a while, I had my syslog set up to capture logs into a Postgres table. Very cool, but pretty big slowdown.

What's notably messy, right now, is that we've got a bunch of logging targets where there's nothing resembling a uniform way of *accessing* the logs. It seems to me that the messiness and non-uniformity are the tough part of the problem.

-- When confronted by a difficult problem, solve it by reducing it to the question, "How would the Lone Ranger handle this?"
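[The "lossy logging" model in item 3 above can be sketched as a bounded buffer that drops messages rather than block the producer. This is only an illustration of the best-efforts idea, not how Spread itself works; the class and method names are hypothetical.]

```python
# Sketch of lossy, best-efforts log shipping: the producer never blocks,
# and messages are silently discarded when the consumer falls behind.
from collections import deque

class LossyLogBuffer:
    def __init__(self, capacity):
        self.capacity = capacity
        self.buffer = deque()
        self.dropped = 0  # count of messages discarded under load

    def publish(self, message):
        """Producer side: never blocks; drop the message if the buffer is full."""
        if len(self.buffer) >= self.capacity:
            self.dropped += 1
            return False
        self.buffer.append(message)
        return True

    def drain(self):
        """Consumer side: pull and aggregate everything currently buffered."""
        out = list(self.buffer)
        self.buffer.clear()
        return out
```

The design choice is the same trade-off described in the linked thread: under load you keep a bounded, recent sample of events instead of slowing down the system that produces them.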
On May 28, 2012, at 11:57 AM, Christopher Browne <cbbrowne@gmail.com> wrote: > All interesting... > > We've got several models as to how logs are likely to be captured, > which mean that it'll be difficult to have one uniform answer. > > 1. An easy traditional default is to capture logs in a log directory. > > An FDW might be able to get at this, with the cost of some additional > configuration. Unfortunately, that injures the simplicity of this way > of logging. > > 2. Ask Syslog > > My favorite way to configure *my* PG instances (e.g. - those that I > use for testing) is for them to forward messages to syslog. That way > they, and my Slony test instances, are all logging to one common > place, rather than the logs getting strewn in a bunch of places. > > An FDW that could talk to syslog would be a nifty idea, though there > are enough different syslog implementations around to, again, injure > the simplicity of this. What does "talk to syslog" mean in this context? Syslog doesn't store any data; it just routes it around. > [Also, mumble, mumble, syslog might be forwarding to a remote server, > further complications...] > > 3. Lossy logging is desired by some doing high performance systems > where they can't afford to capture everything > > http://archives.postgresql.org/pgsql-hackers/2011-11/msg01437.php > > One approach that I know Theo has used has been to throw events onto a > Spread channel, and have a listener pulling and aggregating the events > on a best-efforts basis. I'm not sure if I should treat that as a > separate answer, or as part of the same. > > 4. For a while, I had my syslog set up to capture logs into a > Postgres table. Very cool, but pretty big slowdown. > > What's notably messy, right now, is that we've got a bunch of logging > targets where there's nothing resembling a uniform way of *accessing* > the logs. It seems to me that the messiness and non-uniformity are > the tough part of the problem. Yeah, I agree. 
I think what is missing here is something that can be read (and maybe indexed?) like a table, but written by a pretty dumb process. It's not terribly workable to have PG log to PG, because there are too many situations where the problem you're trying to report would frustrate your attempt to report it. At the other end of the spectrum, our default log format is easy to generate but (a) impoverished, not even including a time stamp by default, and (b) hard to parse, especially because two customers with the same log_line_prefix is a rare nicety. The CSV format is both rich and machine-parseable (good start!) but it takes an unreasonable amount of work to make it usefully queryable. We need something that looks more like a big red button. ...Robert
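[The point that csvlog is "rich and machine-parseable" can be seen with stock Python: the csv module handles the quoting, and only the column names need supplying. The column list below follows the csvlog layout documented around this era; verify it against your server version. `parse_csvlog` is an illustrative helper, not an existing API.]

```python
# Parse PostgreSQL csvlog text into dicts, one per log record.
import csv
import io

CSVLOG_COLUMNS = [
    "log_time", "user_name", "database_name", "process_id", "connection_from",
    "session_id", "session_line_num", "command_tag", "session_start_time",
    "virtual_transaction_id", "transaction_id", "error_severity",
    "sql_state_code", "message", "detail", "hint", "internal_query",
    "internal_query_pos", "context", "query", "query_pos", "location",
    "application_name",
]

def parse_csvlog(text):
    """Yield one dict per record; csv.reader handles quoted, multi-line fields."""
    for row in csv.reader(io.StringIO(text)):
        yield dict(zip(CSVLOG_COLUMNS, row))
```

The "unreasonable amount of work" starts after this step: stitching rotated files together, keeping up with rotation, and making the result queryable is what the parsing itself does not solve.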
On Mon, May 28, 2012 at 1:45 PM, Robert Haas <robertmhaas@gmail.com> wrote: > On May 28, 2012, at 11:57 AM, Christopher Browne <cbbrowne@gmail.com> wrote: >> 2. Ask Syslog >> >> My favorite way to configure *my* PG instances (e.g. - those that I >> use for testing) is for them to forward messages to syslog. That way >> they, and my Slony test instances, are all logging to one common >> place, rather than the logs getting strewn in a bunch of places. >> >> An FDW that could talk to syslog would be a nifty idea, though there >> are enough different syslog implementations around to, again, injure >> the simplicity of this. > > What does "talk to syslog" mean in this context? Syslog doesn't store any data; it just routes it around. Right, I guess that's a bit like saying, "let's have something listening to procmail," when that's really just a filter. If there was some output form that was particularly amenable to our use, [e.g. - simple to configure via the "big red button" that you suggest], that would be nice. >> [Also, mumble, mumble, syslog might be forwarding to a remote server, >> further complications...] >> >> 3. Lossy logging is desired by some doing high performance systems >> where they can't afford to capture everything >> >> http://archives.postgresql.org/pgsql-hackers/2011-11/msg01437.php >> >> One approach that I know Theo has used has been to throw events onto a >> Spread channel, and have a listener pulling and aggregating the events >> on a best-efforts basis. I'm not sure if I should treat that as a >> separate answer, or as part of the same. >> >> 4. For a while, I had my syslog set up to capture logs into a >> Postgres table. Very cool, but pretty big slowdown. >> >> What's notably messy, right now, is that we've got a bunch of logging >> targets where there's nothing resembling a uniform way of *accessing* >> the logs. It seems to me that the messiness and non-uniformity are >> the tough part of the problem. > > Yeah, I agree. 
> I think what is missing here is something that can be read (and maybe indexed?) like a table, but written by a pretty dumb process. It's not terribly workable to have PG log to PG, because there are too many situations where the problem you're trying to report would frustrate your attempt to report it. At the other end of the spectrum, our default log format is easy to generate but (a) impoverished, not even including a time stamp by default, and (b) hard to parse, especially because two customers with the same log_line_prefix is a rare nicety. The CSV format is both rich and machine-parseable (good start!) but it takes an unreasonable amount of work to make it usefully queryable. We need something that looks more like a big red button.

There's a case to be made for some lossier "NoSQL-y" thing here. But I'm not sure what size fits enough. I hate the idea of requiring the deployment of *another* DBMS (however "lite"), but reading from text files isn't particularly nice either. Perhaps push the logs into an unlogged table on an extra PG instance, where an FDW tries to make that accessible? A fair bit of process needs to live behind that "big red button," and that's at least a plausible answer. What's needed is to figure out what restrictions are acceptable to impose to have something that's "button-worthy." -- When confronted by a difficult problem, solve it by reducing it to the question, "How would the Lone Ranger handle this?"
On Mon, May 28, 2012 at 2:21 PM, Christopher Browne <cbbrowne@gmail.com> wrote: >> Yeah, I agree. I think what is missing here is something that can be read (and maybe indexed?) like a table, but written by a pretty dumb process. It's not terribly workable to have PG log to PG, because there are too many situations where the problem you're trying to report would frustrate your attempt to report it. At the other end of the spectrum, our default log format is easy to generate but (a) impoverished, not even including a time stamp by default, and (b) hard to parse, especially because two customers with the same log_line_prefix is a rare nicety. The CSV format is both rich and machine-parseable (good start!) but it takes an unreasonable amount of work to make it usefully queryable. We need something that looks more like a big red button. > > There's a case to be made for some lossier "NoSQL-y" thing here. But > I'm not sure what size fits enough. I hate the idea of requiring the > deployment of *another* DBMS (however "lite"), but reading from text > files isn't particularly nice either. > > Perhaps push the logs into an unlogged table on an extra PG instance, > where an FDW tries to make that accessible? A fair bit of process > needs to live behind that "big red button," and that's at least a > plausible answer. > > What's needed is to figure out what restrictions are acceptable to > impose to have something that's "button-worthy." I am not fired up about needing a second instance of PG; it seems to me that that requirement by itself makes it considerably more involved than pushing a big red button. I agree with you that deploying another DBMS, even a lightweight one, is also not a good solution. As far as CSV goes, I think the biggest deficiency is that there's a mismatch between the way that log files are typically named (e.g. one per day, or one per hour) and the way that a CSV foreign table is created (you've got to point it at one particular file). 
Maybe we could have a CSV reader that understands PostgreSQL-format CSV logs, but you point it at a directory, rather than a single file, and it reads all the CSV files in the directory. And maybe it could also be smart enough that if you've got a WHERE clause that filters by date, it uses that to skip any files that can be proven irrelevant. So the user can just turn on CSV logging, point the FDW at the log directory, and away they go. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
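[The file-skipping idea above can be sketched in a few lines, assuming the date is recoverable from the file name; the filename pattern below follows the stock log_filename default (postgresql-%Y-%m-%d_%H%M%S), and the function name is an illustrative assumption.]

```python
# Prune log files by the date embedded in their names, given a date range
# derived from a WHERE clause. Files whose names don't match the pattern
# can't be proven irrelevant, so they must still be scanned.
import re
from datetime import date

LOGFILE_DATE = re.compile(r"postgresql-(\d{4})-(\d{2})-(\d{2})_\d{6}\.csv$")

def candidate_files(filenames, since, until):
    """Return only the files that might contain rows in [since, until]."""
    keep = []
    for name in filenames:
        m = LOGFILE_DATE.search(name)
        if m is None:
            keep.append(name)  # unknown date: cannot skip safely
            continue
        d = date(int(m.group(1)), int(m.group(2)), int(m.group(3)))
        if since <= d <= until:
            keep.append(name)
    return keep
```

Note the conservative default: pruning is an optimization, so anything unprovable is kept, which mirrors how constraint exclusion behaves on partitions.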
> The CSV format is both rich and machine-parseable (good start!) but it takes an unreasonable amount of work to make it usefully queryable. We need something that looks more like a big red button.

Hello,

The Pg logs are a rich soup of many different kinds of information. Having all that stuff sequentially available is comfortable for reading, but not as input for an ETL/reporting tool. Moreover, there is a will to group many different sources in single reporting tools, which may exclude Postgres as a solution in many cases. And everybody with more than one Postgres instance would like to have a single point of access for all of them. Having the logs available as a Postgres table is nice, but it is only a tiny part of the way toward reporting, and not so complicated to implement with the current facilities, except maybe for real-time monitoring. I think that more users would be satisfied with a utility tool that transforms the log content into a more ETL-friendly format. Otherwise every reporting or monitoring project will have to implement the "same" data parsing again and again.

Here a few loose ideas:

- This transformation could take place on the fly (e.g.: log_destination='pg_etl.sh'), or later on at some other place, using the csv output as source.
- This "pg_etl" tool could possibly generate more than one csv log, e.g. query_durations and checkpoint_logs. Currently getting those kinds of data requires string matching of the log messages. Alternatively, an additional "target" column in the log output would make sense.
- Some enhancements of the current format would also help. Examples: a "normed" format of the logged queries (http://postgresql.1045698.n5.nabble.com/pg-stat-statements-with-query-tree-based-normalization-td4989745.html); placeholders for the host, ip and port in log_filename, e.g. log_filename='pg-$HOST_$PORT-%Y%m%d_%H%M.log', and possibly in the log_line_prefix & csv content too.
- Use a constant logname for the active log and add the timestamp only when switching, e.g.:

  pg-myhost_3332-20120530_1200.csv
  pg-myhost_3332-20120530_1300.csv
  pg-myhost_3332.csv  <= current

best regards, Marc Mamin
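[The string matching mentioned above, which every reporting project currently reimplements, might look like the routing sketch below. The message prefixes and the stream names (query_durations, checkpoint_logs) are assumptions chosen for illustration; a real tool would need a much larger routing table.]

```python
# Route log records into per-category ETL streams by matching message text,
# approximating what a hypothetical "pg_etl" splitter would do.
def route_record(message):
    """Classify one log message into an output stream name."""
    if message.startswith("duration: "):
        return "query_durations"
    if message.startswith("checkpoint starting") or \
       message.startswith("checkpoint complete"):
        return "checkpoint_logs"
    return "other"

def split_streams(messages):
    """Group messages by stream, preserving their original order."""
    streams = {}
    for msg in messages:
        streams.setdefault(route_record(msg), []).append(msg)
    return streams
```

This also illustrates why an explicit "target" column in the log output would help: the classification above is only as reliable as the message wording it matches against.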
On 5/28/12 2:55 PM, Robert Haas wrote: > As far as CSV goes, I think the biggest deficiency is that there's a > mismatch between the way that log files are typically named (e.g. one > per day, or one per hour) and the way that a CSV foreign table is > created (you've got to point it at one particular file). Maybe we > could have a CSV reader that understands PostgreSQL-format CSV logs, > but you point it at a directory, rather than a single file, and it > reads all the CSV files in the directory. And maybe it could also be > smart enough that if you've got a WHERE clause that filters by date, it > uses that to skip any files that can be proven irrelevant. So the > user can just turn on CSV logging, point the FDW at the log directory, > and away they go. The idea is a really good one, except for one thing: this pattern is undoubtedly not unique to PG CSV logs. ISTM it would be extremely useful to have a generic ability to define a filename pattern and have all files matching that pattern be pulled in via FDW, possibly by treating each file as a partition, possibly complete with exclusion constraints. ISTM that what's really needed here are a bunch of separate improvements to our logging and/or FDW, each of which will add usefulness. -- Jim C. Nasby, Database Architect jim@nasby.net 512.569.9461 (cell) http://jim.nasby.net