Thread: How could we make it simple to access the log as a table?


How could we make it simple to access the log as a table?

From:
Josh Berkus
Date:
Hackers,

There's a lot of great information in the postgres logs.  While we
eventually want to get more sophisticated about providing users with
status and history information, for 9.3 it would be really nice to just
offer the current logs in system view format.

Certainly this can be done ad-hoc using CSV format and csv_fdw.
However, such setups are fragile due to log rotation and other issues.
It seems like we could come up with a better way.  Ideas?

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


Re: How could we make it simple to access the log as a table?

From:
Stephen Frost
Date:
Josh,

* Josh Berkus (josh@agliodbs.com) wrote:
> Certainly this can be done ad-hoc using CSV format and csv_fdw.
> However, such setups are fragile due to log rotation and other issues.
> It seems like we could come up with a better way.  Ideas?

This is really where I was hoping to eventually get to with the logging
changes that have been discussed over the past couple of years.  We need
to have a mechanism to allow logging to different places, based on
information included in the log message and/or context, where one of
those places could potentially be some kind of receiver process that
then dumps the message into a table, either on the local system or on a
remote system.

What exactly all of that looks like and how it works is a pretty big
discussion, but I'd definitely love to hear from others who are
interested in changes to our logging infrastructure.
Thanks,
    Stephen

Re: How could we make it simple to access the log as a table?

From:
David Fetter
Date:
On Mon, May 21, 2012 at 05:53:55PM -0700, Josh Berkus wrote:
> Hackers,
> 
> There's a lot of great information in the postgres logs.  While we
> eventually want to get more sophisticated about providing users with
> status and history information, for 9.3 it would be really nice to
> just offer the current logs in system view format.
> 
> Certainly this can be done ad-hoc using CSV format and csv_fdw.
> However, such setups are fragile due to log rotation and other
> issues.  It seems like we could come up with a better way.  Ideas?

Would it help to automate this stuff with that CSV format?  The trick,
as I see it, is to make those on-disk log files be partitions of a
table.

Stephen Frost's point is well-taken, but I'm not sure we need to make
it a blocker for this.

Cheers,
David.
-- 
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter      XMPP: david.fetter@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate


Re: How could we make it simple to access the log as a table?

From:
Dimitri Fontaine
Date:
Stephen Frost <sfrost@snowman.net> writes:
> This is really where I was hoping to eventually get to with the logging
> changes that have been discussed over the past couple of years.  We need
> to have a mechanism to allow logging to different places, based on
> information included in the log message and/or context, where one of
> those places could potentially be some kind of receiver process that
> then dumps the message into a table, either on the local system or on a
> remote system.

That sounds like syslog's feature set?

> What exactly all of that looks like and how it works is a pretty big
> discussion, but I'd definitely love to hear from others who are
> interested in changes to our logging infrastructure.

I thought about exposing the log CSV files as partitions of a log table,
each partition being a foreign table using an FDW to read the file.

Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr     PostgreSQL : Expertise, Formation et Support


Re: How could we make it simple to access the log as a table?

From:
Christopher Browne
Date:
On Mon, May 28, 2012 at 11:39 AM, Dimitri Fontaine
<dimitri@2ndquadrant.fr> wrote:
> Stephen Frost <sfrost@snowman.net> writes:
>> This is really where I was hoping to eventually get to with the logging
>> changes that have been discussed over the past couple of years.  We need
>> to have a mechanism to allow logging to different places, based on
>> information included in the log message and/or context, where one of
>> those places could potentially be some kind of receiver process that
>> then dumps the message into a table, either on the local system or on a
>> remote system.
>
> That sounds like syslog's feature set?
>
>> What exactly all of that looks like and how it works is a pretty big
>> discussion, but I'd definitely love to hear from others who are
>> interested in changes to our logging infrastructure.
>
> I thought about exposing the log CSV files as partitions of a log table,
> each partition being a foreign table using an FDW to read the file.

All interesting...

We've got several models as to how logs are likely to be captured,
which mean that it'll be difficult to have one uniform answer.

1.  An easy traditional default is to capture logs in a log directory.

An FDW might be able to get at this, with the cost of some additional
configuration.  Unfortunately, that injures the simplicity of this way
of logging.

2.  Ask Syslog

My favorite way to configure *my* PG instances (e.g. - those that I
use for testing) is for them to forward messages to syslog.  That way
they, and my Slony test instances, are all logging to one common
place, rather than the logs getting strewn in a bunch of places.

An FDW that could talk to syslog would be a nifty idea, though there
are enough different syslog implementations around to, again, injure
the simplicity of this.

[Also, mumble, mumble, syslog might be forwarding to a remote server,
further complications...]

3.  Lossy logging is desired by some doing high performance systems
where they can't afford to capture everything

http://archives.postgresql.org/pgsql-hackers/2011-11/msg01437.php

One approach that I know Theo has used has been to throw events onto a
Spread channel, and have a listener pulling and aggregating the events
on a best-efforts basis.  I'm not sure if I should treat that as a
separate answer, or as part of the same.

4.  For a while, I had my syslog set up to capture logs into a
Postgres table.  Very cool, but pretty big slowdown.

What's notably messy, right now, is that we've got a bunch of logging
targets where there's nothing resembling a uniform way of *accessing*
the logs.  It seems to me that the messiness and non-uniformity are
the tough part of the problem.
--
When confronted by a difficult problem, solve it by reducing it to the
question, "How would the Lone Ranger handle this?"


Re: How could we make it simple to access the log as a table?

From:
Robert Haas
Date:
On May 28, 2012, at 11:57 AM, Christopher Browne <cbbrowne@gmail.com> wrote:
> All interesting...
>
> We've got several models as to how logs are likely to be captured,
> which mean that it'll be difficult to have one uniform answer.
>
> 1.  An easy traditional default is to capture logs in a log directory.
>
> An FDW might be able to get at this, with the cost of some additional
> configuration.  Unfortunately, that injures the simplicity of this way
> of logging.
>
> 2.  Ask Syslog
>
> My favorite way to configure *my* PG instances (e.g. - those that I
> use for testing) is for them to forward messages to syslog.  That way
> they, and my Slony test instances, are all logging to one common
> place, rather than the logs getting strewn in a bunch of places.
>
> An FDW that could talk to syslog would be a nifty idea, though there
> are enough different syslog implementations around to, again, injure
> the simplicity of this.

What does "talk to syslog" mean in this context?  Syslog doesn't store any data; it just routes it around.

> [Also, mumble, mumble, syslog might be forwarding to a remote server,
> further complications...]
>
> 3.  Lossy logging is desired by some doing high performance systems
> where they can't afford to capture everything
>
> http://archives.postgresql.org/pgsql-hackers/2011-11/msg01437.php
>
> One approach that I know Theo has used has been to throw events onto a
> Spread channel, and have a listener pulling and aggregating the events
> on a best-efforts basis.  I'm not sure if I should treat that as a
> separate answer, or as part of the same.
>
> 4.  For a while, I had my syslog set up to capture logs into a
> Postgres table.  Very cool, but pretty big slowdown.
>
> What's notably messy, right now, is that we've got a bunch of logging
> targets where there's nothing resembling a uniform way of *accessing*
> the logs.  It seems to me that the messiness and non-uniformity are
> the tough part of the problem.

Yeah, I agree.  I think what is missing here is something that can be
read (and maybe indexed?) like a table, but written by a pretty dumb
process.  It's not terribly workable to have PG log to PG, because there
are too many situations where the problem you're trying to report would
frustrate your attempt to report it.  At the other end of the spectrum,
our default log format is easy to generate but (a) impoverished, not
even including a time stamp by default, and (b) hard to parse,
especially because two customers with the same log_line_prefix is a rare
nicety.  The CSV format is both rich and machine-parseable (good start!)
but it takes an unreasonable amount of work to make it usefully
queryable.  We need something that looks more like a big red button.

...Robert

Re: How could we make it simple to access the log as a table?

From:
Christopher Browne
Date:
On Mon, May 28, 2012 at 1:45 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> On May 28, 2012, at 11:57 AM, Christopher Browne <cbbrowne@gmail.com> wrote:
>> 2.  Ask Syslog
>>
>> My favorite way to configure *my* PG instances (e.g. - those that I
>> use for testing) is for them to forward messages to syslog.  That way
>> they, and my Slony test instances, are all logging to one common
>> place, rather than the logs getting strewn in a bunch of places.
>>
>> An FDW that could talk to syslog would be a nifty idea, though there
>> are enough different syslog implementations around to, again, injure
>> the simplicity of this.
>
> What does "talk to syslog" mean in this context?  Syslog doesn't store any data; it just routes it around.

Right, I guess that's a bit like saying, "let's have something
listening to procmail," when that's really just a filter.

If there was some output form that was particularly amenable to our
use, [e.g. - simple to configure via the "big red button" that you
suggest], that would be nice.

>> [Also, mumble, mumble, syslog might be forwarding to a remote server,
>> further complications...]
>>
>> 3.  Lossy logging is desired by some doing high performance systems
>> where they can't afford to capture everything
>>
>> http://archives.postgresql.org/pgsql-hackers/2011-11/msg01437.php
>>
>> One approach that I know Theo has used has been to throw events onto a
>> Spread channel, and have a listener pulling and aggregating the events
>> on a best-efforts basis.  I'm not sure if I should treat that as a
>> separate answer, or as part of the same.
>>
>> 4.  For a while, I had my syslog set up to capture logs into a
>> Postgres table.  Very cool, but pretty big slowdown.
>>
>> What's notably messy, right now, is that we've got a bunch of logging
>> targets where there's nothing resembling a uniform way of *accessing*
>> the logs.  It seems to me that the messiness and non-uniformity are
>> the tough part of the problem.
>
> Yeah, I agree.  I think what is missing here is something that can be
> read (and maybe indexed?) like a table, but written by a pretty dumb
> process.  It's not terribly workable to have PG log to PG, because
> there are too many situations where the problem you're trying to
> report would frustrate your attempt to report it.  At the other end of
> the spectrum, our default log format is easy to generate but (a)
> impoverished, not even including a time stamp by default, and (b) hard
> to parse, especially because two customers with the same
> log_line_prefix is a rare nicety.  The CSV format is both rich and
> machine-parseable (good start!) but it takes an unreasonable amount of
> work to make it usefully queryable.  We need something that looks more
> like a big red button.

There's a case to be made for some lossier "NoSQL-y" thing here.  But
I'm not sure what size fits enough.  I hate the idea of requiring the
deployment of *another* DBMS (however "lite"), but reading from text
files isn't particularly nice either.

Perhaps push the logs into an unlogged table on an extra PG instance,
where an FDW tries to make that accessible?  A fair bit of process
needs to live behind that "big red button," and that's at least a
plausible answer.

What's needed is to figure out what restrictions are acceptable to
impose to have something that's "button-worthy."
--
When confronted by a difficult problem, solve it by reducing it to the
question, "How would the Lone Ranger handle this?"


Re: How could we make it simple to access the log as a table?

From:
Robert Haas
Date:
On Mon, May 28, 2012 at 2:21 PM, Christopher Browne <cbbrowne@gmail.com> wrote:
>> Yeah, I agree.  I think what is missing here is something that can
>> be read (and maybe indexed?) like a table, but written by a pretty
>> dumb process.  It's not terribly workable to have PG log to PG,
>> because there are too many situations where the problem you're trying
>> to report would frustrate your attempt to report it.  At the other
>> end of the spectrum, our default log format is easy to generate but
>> (a) impoverished, not even including a time stamp by default, and
>> (b) hard to parse, especially because two customers with the same
>> log_line_prefix is a rare nicety.  The CSV format is both rich and
>> machine-parseable (good start!) but it takes an unreasonable amount
>> of work to make it usefully queryable.  We need something that looks
>> more like a big red button.
>
> There's a case to be made for some lossier "NoSQL-y" thing here.  But
> I'm not sure what size fits enough.  I hate the idea of requiring the
> deployment of *another* DBMS (however "lite"), but reading from text
> files isn't particularly nice either.
>
> Perhaps push the logs into an unlogged table on an extra PG instance,
> where an FDW tries to make that accessible?  A fair bit of process
> needs to live behind that "big red button," and that's at least a
> plausible answer.
>
> What's needed is to figure out what restrictions are acceptable to
> impose to have something that's "button-worthy."

I am not fired up about needing a second instance of PG; it seems to
me that that requirement by itself makes it considerably more involved
than pushing a big red button.  I agree with you that deploying
another DBMS, even a lightweight one, is also not a good solution.

As far as CSV goes, I think the biggest deficiency is that there's a
mismatch between the way that log files are typically named (e.g. one
per day, or one per hour) and the way that a CSV foreign table is
created (you've got to point it at one particular file).  Maybe we
could have a CSV reader that understands PostgreSQL-format CSV logs,
but you point it at a directory, rather than a single file, and it
reads all the CSV files in the directory.  And maybe it could also be
smart enough that if you've got a WHERE clause that filters by date, it
uses that to skip any files that can be proven irrelevant.  So the
user can just turn on CSV logging, point the FDW at the log directory,
and away they go.
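The directory-scanning idea above can be sketched outside the server:
walk a log directory, and use the date embedded in each filename to skip
files that a date filter proves irrelevant.  The naming scheme below
(one csvlog file per day, `postgresql-YYYY-MM-DD.csv`) is an assumption
for illustration, not something the server enforces.

```python
import csv
import datetime
import pathlib
import re

# Assumed naming scheme: one csvlog file per day, e.g.
# postgresql-2012-05-28.csv (log_filename = 'postgresql-%Y-%m-%d.log').
FILENAME_DATE = re.compile(r"postgresql-(\d{4}-\d{2}-\d{2})\.csv$")

def scan_logs(log_dir, since=None, until=None):
    """Yield csvlog rows from every file in log_dir, skipping files
    whose filename-encoded date proves them irrelevant -- the same
    pruning an FDW could derive from a WHERE clause on the timestamp."""
    for path in sorted(pathlib.Path(log_dir).glob("*.csv")):
        m = FILENAME_DATE.search(path.name)
        if m:
            day = datetime.date.fromisoformat(m.group(1))
            if since is not None and day < since:
                continue  # file provably ends before the window
            if until is not None and day > until:
                continue  # file provably starts after the window
        with path.open(newline="") as f:
            yield from csv.reader(f)
```

An FDW doing this server-side would additionally need to cope with the
active (still-growing) file and with rotation happening mid-scan.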

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: How could we make it simple to access the log as a table?

From:
"Marc Mamin"
Date:
> The  CSV format is both rich and
> machine-parseable (good start!) but it takes an unreasonable amount of
> work to make it usefully queryable.  We need something that looks more
> like a big red button.

Hello,

The Pg logs are a rich soup of many different kinds of information.
Having all that stuff sequentially available is comfortable for
reading, but not as input for an ETL/reporting tool.

Moreover, there is a desire to consolidate many different sources into
a single reporting tool, which may exclude Postgres as a solution in
many cases.

And everybody with more than one Postgres instance would like to have a
single point of access for all of them.

Having the logs available as a Postgres table is nice, but it is only a
tiny part of the road toward reporting, and not that complicated to
implement with the current facilities, except maybe for true real-time
monitoring.

I think that more users would be satisfied with a utility tool that
transforms the log content into a more ETL-friendly format.  Otherwise
every reporting or monitoring project will have to implement the "same"
data parsing again and again.

Here are a few loose ideas:

- This transformation could take place on the fly (e.g.
  log_destination='pg_etl.sh'), or later on at some other place, using
  the CSV output as source.

- This "pg_etl" tool could possibly generate more than one CSV log,
  e.g.:
    - query_durations
    - checkpoint_logs
  Currently, getting those kinds of data requires string matching on
  the log messages.  Alternatively, an additional "target" column in
  the log output would make sense.

- Some enhancements of the current format would also help.  Examples:
    - a "normed" format of the logged queries
      (http://postgresql.1045698.n5.nabble.com/pg-stat-statements-with-query-tree-based-normalization-td4989745.html)
    - add placeholders for the host, ip and port in log_filename, e.g.
      log_filename = 'pg-$HOST_$PORT-%Y%m%d_%H%M.log'
      (possibly in the log_line_prefix & CSV content too)
    - use a constant logname for the active log and add the timestamp
      only when switching, e.g.:
        pg-myhost_3332-20120530_1200.csv
        pg-myhost_3332-20120530_1300.csv
        pg-myhost_3332.csv    <= current
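The host/port placeholder idea suggested above is hypothetical (no such
log_filename placeholders exist today); a sketch of what the expansion
might look like, using brace-delimited names so that `${HOST}_` is not
misread as a single `$HOST_` identifier:

```python
import datetime
import string

def expand_log_filename(template, host, port, now=None):
    """Expand hypothetical ${HOST}/${PORT} placeholders in a
    log_filename-style template, then the usual strftime escapes."""
    now = now or datetime.datetime.now()
    # string.Template handles ${HOST}/${PORT}; strftime handles %Y etc.
    name = string.Template(template).safe_substitute(
        HOST=host, PORT=str(port))
    return now.strftime(name)

name = expand_log_filename(
    "pg-${HOST}_${PORT}-%Y%m%d_%H%M.log", "myhost", 3332,
    now=datetime.datetime(2012, 5, 30, 12, 0))
```

With the fixed timestamp above this yields a name in the
`pg-myhost_3332-20120530_1200.log` shape shown in the examples.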

best regards,

Marc Mamin



Re: How could we make it simple to access the log as a table?

From:
Jim Nasby
Date:
On 5/28/12 2:55 PM, Robert Haas wrote:
> As far as CSV goes, I think the biggest deficiency is that there's a
> mismatch between the way that log files are typically named (e.g. one
> per day, or one per hour) and the way that a CSV foreign table is
> created (you've got to point it at one particular file).  Maybe we
> could have a CSV reader that understands PostgreSQL-format CSV logs,
> but you point it at a directory, rather than a single file, and it
> reads all the CSV files in the directory.  And maybe it could also be
> smart enough that if you've got a WHERE clause that filter by date, it
> uses that to skip any files that can be proven irrelevant.  So the
> user can just turn on CSV logging, point the FDW at the log directory,
> and away they go.

The idea is a really good one, except for one thing: this pattern is undoubtedly not unique to PG CSV logs.

ISTM it would be extremely useful to have a generic ability to define a
filename pattern and have all files matching that pattern be pulled in
via FDW, possibly by treating each file as a partition, possibly
complete with exclusion constraints.

ISTM that what's really needed here are a bunch of separate
improvements to our logging and/or FDW, each of which will add
usefulness.
 
-- 
Jim C. Nasby, Database Architect                   jim@nasby.net
512.569.9461 (cell)                         http://jim.nasby.net