Discussion: On Logging


On Logging

From
David Fetter
Date:
Folks,

I've run into something that concerns me.  It's pretty much an 8.2
issue, but I'm hoping to stimulate some discussion on it.  It's
PostgreSQL's log files.  Right now, they're (sometimes just barely ;)
human-readable, but they take significant effort to parse.  For
example, pqa, a very clever piece of code, is mostly devoted to
parsing said files and works only with significant tweaking and
restrictions on log file formats in 8.0.

Simple logging is a default that should probably not change, but I'm
thinking that for people who want to find something out from the logs,
we could see about a kind of plugin architecture which would enable
things like:

* CSV
* YAML
* XML
* Piped logs, as Apache can do
* DB handle.  I know this one will be controversial.

I'm thinking that a GUC variable (or should there be a class of them?)
called log_format would be part of the user interface to this and
would be able to switch from the cheap default code path to one that's
more expensive, just as log_statement does.

So, a few questions:

1.  Am I the only one who wants an option for machine-readable logs?
2.  Am I way off with the idea for an architecture for same?
3.  What big things am I missing here?

Cheers,
D
-- 
David Fetter david@fetter.org http://fetter.org/
phone: +1 510 893 6100   mobile: +1 415 235 3778

Remember to vote!


Re: On Logging

From
Christopher Petrilli
Date:
On 9/26/05, David Fetter <david@fetter.org> wrote:
> I've run into something that concerns me.  It's pretty much an 8.2
> issue, but I'm hoping to stimulate some discussion on it.  It's
> PostgreSQL's log files.  Right now, they're (sometimes just barely ;)
> human-readable, but they take significant effort to parse.  For
> example, pqa, a very clever piece of code, is mostly devoted to
> parsing said files and works only with significant tweaking and
> restrictions on log file formats in 8.0.

In a previous life (oh, like 6 months ago), I spent all my time
working on parsing log files from dozens of different software
products, and I learned something that made parsing some files orders
of magnitude easier than others:
   Always use message codes.

Cisco does this, and it helps a lot. A few other vendors do this, and
it helps a lot. While this might seem an old mainframeism, it's
terribly useful to have something at the beginning that tells you what
the message is, what it means, and most importantly, how to parse the
rest.

I would be happy to help create this catalog, though it's definitely a
big step to implement. It would also require identifying every message
that could be generated -- something few open source projects do, but
it is critical to those of us who have to process the output!
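
A minimal Python sketch of the message-code idea, for illustration only. The codes ("PG-0101", "PG-0200") and the payload layouts are hypothetical and are not anything PostgreSQL emits; the point is only that a code at the start of the line tells the reader which parser to apply to the rest.

```python
import re

# Hypothetical line format: "<CODE> <payload>", e.g. "PG-0101 12.5 SELECT 1".
LINE_RE = re.compile(r"^(?P<code>[A-Z]+-\d{4}) (?P<rest>.*)$")


def parse_duration(rest):
    # Assumed payload: "<milliseconds> <statement text>"
    ms, statement = rest.split(" ", 1)
    return {"duration_ms": float(ms), "statement": statement}


def parse_connection(rest):
    # Assumed payload: space-separated key=value pairs
    return dict(pair.split("=", 1) for pair in rest.split())


# The catalog: one parser per message code (all codes are made up here).
PARSERS = {
    "PG-0101": parse_duration,
    "PG-0200": parse_connection,
}


def parse_line(line):
    """Return a dict for a coded line, or None if the line has no code."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    code = m.group("code")
    parser = PARSERS.get(code)
    # Unknown codes are kept, but the payload stays unparsed.
    fields = parser(m.group("rest")) if parser else {"raw": m.group("rest")}
    return {"code": code, **fields}
```

An unknown code still yields a usable record, so a consumer can count or route messages it does not yet understand.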

> Simple logging is a default that should probably not change, but I'm
> thinking that for people who want to find something out from the logs,
> we could see about a kind of plugin architecture which would enable
> things like:
>
> * CSV

CSV is the best format, ever. Trivially simple to parse, it requires
no extra processing so long as you abide by a few extra rules, such as
escaping.
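
As a small illustration of those "few extra rules", here is a sketch using Python's standard csv module. The column names are an assumption made for the example, not a proposed PostgreSQL layout. It shows that a message containing quotes, commas and a newline survives a round trip as long as both sides follow the same quoting rules.

```python
import csv
import io

# Hypothetical column set, chosen only for this example.
FIELDS = ["timestamp", "pid", "severity", "message"]


def write_records(records):
    """Serialize dict records to CSV text, header row first."""
    buf = io.StringIO()
    writer = csv.writer(buf, quoting=csv.QUOTE_MINIMAL, lineterminator="\n")
    writer.writerow(FIELDS)
    for rec in records:
        writer.writerow([rec[name] for name in FIELDS])
    return buf.getvalue()


def read_records(text):
    """Parse CSV text back into dicts; every value comes back as a string."""
    return list(csv.DictReader(io.StringIO(text, newline="")))
```

Note that a naive split on commas or newlines would break on such a message; the parser has to understand the quoting.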

> * YAML

Nice, but I think perhaps not the best format for logging.  It's more
of a configuration file format in my mind, and it requires a bit more
oomph to parse.  Not going to happen in AWK. :-)

> * Piped logs, as Apache can do

Useful, but doesn't create any new capabilities, just simplifies some
of them.  Focus on "new capabilities" first, then added functionality
if required.

> * DB handle.  I know this one will be controversial.

I can't imagine why. :-)

> 1.  Am I the only one who wants an option for machine-readable logs?

Not likely. I'd love it. It makes monitoring and reporting easier.

Chris
--
| Christopher Petrilli
| petrilli@gmail.com


Re: On Logging

From
Andrew Dunstan
Date:

David Fetter wrote:

>
>Simple logging is a default that should probably not change, but I'm
>thinking that for people who want to find something out from the logs,
>we could see about a kind of plugin architecture which would enable
>things like:
>
>* CSV
>* YAML
>* XML
>* Piped logs, as Apache can do
>* DB handle.  I know this one will be controversial.
>
>
>  
>

This list doesn't seem to me to be all in the same category. The first 3 
concern format, the last 2 concern destination (and as such probably 
don't belong in this discussion).

ISTM what we need is a proposal for an abstract structure that will 
account for all the possible logging messages. i.e. the important issue 
is not what structuring mechanism is used, but what structure it 
reflects. For example, we might decide that there are 10 message types 
and that each has certain fields.
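
To make the distinction concrete, here is a rough Python sketch in which one message type is defined once as a structure and then rendered in two formats. The type name and its fields are hypothetical; they are not a proposal for the actual set of message types.

```python
import csv
import io
from dataclasses import asdict, dataclass
from xml.sax.saxutils import escape


@dataclass
class StatementEvent:
    # Hypothetical message type with made-up fields.
    timestamp: str
    pid: int
    duration_ms: float
    statement: str


def to_csv(event):
    """Render the event's fields, in declaration order, as one CSV row."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerow(asdict(event).values())
    return buf.getvalue()


def to_xml(event):
    """Render the same fields as a flat XML element."""
    tag = type(event).__name__
    body = "".join(
        f"<{name}>{escape(str(value))}</{name}>"
        for name, value in asdict(event).items()
    )
    return f"<{tag}>{body}</{tag}>"
```

Both renderers read the same field list, so adding a format does not change the structure, and changing the structure updates every format at once.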

(And much as I know you like YAML, I don't think its use is sufficiently 
widespread to belong here anyway).

cheers

andrew


Re: On Logging

From
Bruce Momjian
Date:
Interesting. I am thinking we could put markers like '|' in the log
output, and then have some secondary process either remove them or add
special formatting to match the requested output format.
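
A rough Python sketch of what such a secondary process might look like. The four-field layout (timestamp|pid|severity|message) is an assumption made for the example. Because the message is the last field and the split is limited, a '|' inside the message text is left alone; markers inside earlier fields would still need some escaping rule.

```python
import csv
import io

# Hypothetical layout: timestamp|pid|severity|message
N_FIELDS = 4


def split_marked(line):
    """Split on the first N_FIELDS - 1 markers; the rest is the message."""
    return line.rstrip("\n").split("|", N_FIELDS - 1)


def to_plain(line):
    """Remove the markers for a human-readable line."""
    return " ".join(split_marked(line))


def to_csv(line):
    """Re-emit the same fields as one CSV row."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerow(split_marked(line))
    return buf.getvalue()
```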

---------------------------------------------------------------------------

David Fetter wrote:
> Folks,
> 
> I've run into something that concerns me.  It's pretty much an 8.2
> issue, but I'm hoping to stimulate some discussion on it.  It's
> PostgreSQL's log files.  Right now, they're (sometimes just barely ;)
> human-readable, but they take significant effort to parse.  For
> example, pqa, a very clever piece of code, is mostly devoted to
> parsing said files and works only with significant tweaking and
> restrictions on log file formats in 8.0.
> 
> Simple logging is a default that should probably not change, but I'm
> thinking that for people who want to find something out from the logs,
> we could see about a kind of plugin architecture which would enable
> things like:
> 
> * CSV
> * YAML
> * XML
> * Piped logs, as Apache can do
> * DB handle.  I know this one will be controversial.
> 
> I'm thinking that a GUC variable (or should there be a class of them?)
> called log_format would be part of the user interface to this and
> would be able to switch from the cheap default code path to one that's
> more expensive, just as log_statement does.
> 
> So, a few questions:
> 
> 1.  Am I the only one who wants an option for machine-readable logs?
> 2.  Am I way off with the idea for an architecture for same?
> 3.  What big things am I missing here?
> 
> Cheers,
> D
> -- 
> David Fetter david@fetter.org http://fetter.org/
> phone: +1 510 893 6100   mobile: +1 415 235 3778
> 
> Remember to vote!
> 
> 

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
 


Re: On Logging

From
Andreas Pflug
Date:
David Fetter wrote:
> Folks,
> 
> I've run into something that concerns me.  It's pretty much an 8.2
> issue, but I'm hoping to stimulate some discussion on it.  It's
> PostgreSQL's log files.  Right now, they're (sometimes just barely ;)
> human-readable, but they take significant effort to parse.  For
> example, pqa, a very clever piece of code, is mostly devoted to
> parsing said files and works only with significant tweaking and
> restrictions on log file formats in 8.0.
> 
> Simple logging is a default that should probably not change, but I'm
> thinking that for people who want to find something out from the logs,
> we could see about a kind of plugin architecture which would enable
> things like:

There are two other restrictions about the log files:
- There's no means of restricting logging on some patterns (e.g. 
specific backends only, certain clients, certain events except for 
log_duration)
- query is truncated due to UDP restrictions.

I'd call this not necessarily a logging issue, but a profiling issue. I 
regularly use MSSQL's profiler to tap an application's query traffic, to 
find out what's going on, and I'd like the same feature on pgsql.

This issue comes up on -hackers regularly, e.g. named logging to 
tables/logging as inserts, and several others (I can cite them if 
necessary).

What I'd like is an extended logging/profiling facility that can be 
en/disabled with finer granularity (performance/data volume issues), 
going to an intermediate file/whatever and regularly converted to table 
data for easier evaluation (which would fix the format question in the 
most pgsql like way).
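
One way to picture the finer granularity, as a Python sketch only. The filter fields (backend PID, client address, minimum duration) are assumptions chosen to mirror the patterns listed above, and events are modelled as plain dicts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LogFilter:
    # All fields are hypothetical. An empty set means "no restriction".
    pids: frozenset = frozenset()
    clients: frozenset = frozenset()
    min_duration_ms: float = 0.0

    def accepts(self, event):
        """Return True if the event should be written to the profile log."""
        if self.pids and event["pid"] not in self.pids:
            return False
        if self.clients and event["client"] not in self.clients:
            return False
        return event.get("duration_ms", 0.0) >= self.min_duration_ms
```

Events that pass the filter could then be written to the intermediate file and loaded into a table later.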

Regards,
Andreas


Re: On Logging

From
Andreas Pflug
Date:
Tom Lane wrote:
> Andreas Pflug <pgadmin@pse-consulting.de> writes:
> 
>>- query is truncated due to UDP restrictions.
> 
> 
> Are you confusing the logs with pg_stat_activity?

Not confused. I'm talking about the case where statement logging is 
enabled, I could have mentioned that...


Regards,
Andreas


Re: On Logging

From
Ron Mayer
Date:
David Fetter wrote:
> ...log file formats in 8.0....
> 
> * CSV
> * YAML
> * XML
> * Piped logs, as Apache can do
> * DB handle.  I know this one will be controversial.
> [...]
> 1.  Am I the only one who wants an option for machine-readable logs?

I'd very much like a format that can be easily loaded into
a database (not necessarily the same one producing the logs :-) )
in real time and/or be visible as a table through something
like dbi-link.

I suppose any of the first three formats you suggest could work
with dbi-link; or another alternate format
  * sql insert statements
would work if piped logs were supported by sending it to psql.


Re: On Logging

From
Tom Lane
Date:
Andreas Pflug <pgadmin@pse-consulting.de> writes:
> - query is truncated due to UDP restrictions.

Are you confusing the logs with pg_stat_activity?
        regards, tom lane


Re: On Logging

From
David Fetter
Date:
On Mon, Sep 26, 2005 at 01:13:08PM -0400, Christopher Petrilli wrote:
> On 9/26/05, David Fetter <david@fetter.org> wrote:
> > I've run into something that concerns me.  It's pretty much an 8.2
> > issue, but I'm hoping to stimulate some discussion on it.  It's
> > PostgreSQL's log files.  Right now, they're (sometimes just barely
> > ;) human-readable, but they take significant effort to parse.  For
> > example, pqa, a very clever piece of code, is mostly devoted to
> > parsing said files and works only with significant tweaking and
> > restrictions on log file formats in 8.0.
> 
> In a previous life (oh, like 6 months ago), I spent all my time
> working on parsing log files from dozens of different software
> products, and I learned something that made parsing some files
> orders of magnitude easier than others:
> 
>     Always use message codes.

Could you elucidate a bit on this as to how this might affect
PostgreSQL logging?

> Cisco does this, and it helps a lot.  A few other vendors do this,
> and it helps a lot.  While this might seem an old mainframeism,
                                                ^^^^^^^^^^^^^^^^^
You say that like it's a *bad* thing.  I think some fruitful
communication is possible and has been missed over the decades between
mainframe people and *n*x people.  The same applies to supercomputing
people and *n*x people, but that's a story for another day.

> it's terribly useful to have something at the beginning that tells
> you what the message is, what it means, and most importantly, how to
> parse the rest.

OK

> I would be happy to help create this catalog, though it's definitely
> a big step to implement.  It would also require identifying every
> message that could be generated -- something few open source
> projects do, but it is critical to those of us who have to process
> the output!

Right.  How big a project is this, and what kind of framework would we
need in order to assure that new messages come with new message codes?

> > Simple logging is a default that should probably not change, but
> > I'm thinking that for people who want to find something out from
> > the logs, we could see about a kind of plugin architecture which
> > would enable things like:
> >
> > * CSV
> 
> CSV is the best format, ever.  Trivially simple to parse, it
> requires no extra processing so long as you abide by a few extra
> rules, such as escaping.

I agree that it's nice, but seeing as how many smart people have
stubbed their toes on the various incarnations of "CSV," I must
disagree as to its simplicity.

> > * YAML
> 
> Nice, but I think perhaps not the best format for logging.  It's
> more of a configuration file format in my mind, and it requires a
> bit more oomph to parse.  Not going to happen in AWK. :-)

It's not bad for logging, partly because it's a lot fewer bytes than
XML or SGML, but it maintains a structure.  Of course, it's not as
"simple" in some sense as CSV.

> > * Piped logs, as Apache can do
> 
> Useful, but doesn't create any new capabilities, just simplifies
> some of them.  Focus on "new capabilities" first, then added
> functionality if required.

Fair enough :)

> > * DB handle.  I know this one will be controversial.
> 
> I can't imagine why. :-)

Heh

> > 1.  Am I the only one who wants an option for machine-readable
> > logs?
> 
> Not likely.  I'd love it.  It makes monitoring and reporting easier.

That's where I've run across this :)

Cheers,
D
-- 
David Fetter david@fetter.org http://fetter.org/
phone: +1 510 893 6100   mobile: +1 415 235 3778

Remember to vote!


Re: On Logging

From
"Jim C. Nasby"
Date:
On Mon, Sep 26, 2005 at 10:57:54AM -0700, Ron Mayer wrote:
> David Fetter wrote:
> >...log file formats in 8.0....
> >
> >* CSV
> >* YAML
> >* XML
> >* Piped logs, as Apache can do
> >* DB handle.  I know this one will be controversial.
> >[...]
> >1.  Am I the only one who wants an option for machine-readable logs?
> 
> I'd very much like a format that can be easily loaded into
> a database (not necessarily the same one producing the logs :-) )
> in real time and/or be visible as a table through something
> like dbi-link.
> 
> I suppose any of the first three formats you suggest could work
> with dbi-link; or another alternate format
>   * sql insert statements
> would work if piped logs were supported by sending it to psql.

Apache seems to have the best, most flexible logging of anything out
there, and should probably be used as a model. It's pretty easy to have
it actually log to a database.

Whatever method we decide on, I think it would be very useful if we
supported multiple logging streams. I certainly wouldn't want to give up
a human-readable log to get a CSV one.
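
As an illustration of multiple streams, here is a sketch using Python's standard logging module, where one logger feeds both a human-readable handler and a CSV handler. This is an analogy only and says nothing about how the server would implement it; the logger name and column choice are made up.

```python
import csv
import io
import logging


class CsvFormatter(logging.Formatter):
    """Format each record as a single CSV row (no trailing newline)."""

    def format(self, record):
        buf = io.StringIO()
        csv.writer(buf, lineterminator="\n").writerow(
            [record.levelname, record.name, record.getMessage()]
        )
        # The handler appends its own line terminator.
        return buf.getvalue().rstrip("\n")


def build_logger(human_stream, csv_stream):
    """Attach two handlers so every message goes to both streams."""
    logger = logging.getLogger("demo.multistream")  # hypothetical name
    logger.setLevel(logging.INFO)
    logger.handlers.clear()
    logger.propagate = False

    human = logging.StreamHandler(human_stream)
    human.setFormatter(logging.Formatter("%(levelname)s:  %(message)s"))
    machine = logging.StreamHandler(csv_stream)
    machine.setFormatter(CsvFormatter())

    logger.addHandler(human)
    logger.addHandler(machine)
    return logger
```

Each message is produced once and formatted per destination, so keeping the human-readable log costs only the extra formatting and I/O.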

Is a logging mechanism the best way to do profiling? Seems like it might
be better to have a more efficient, dedicated method. But I'm not
against adding capabilities like per-backend logging, etc.
-- 
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461


Re: On Logging

From
David Fetter
Date:
On Fri, Sep 30, 2005 at 05:54:49PM -0500, Jim C. Nasby wrote:
> On Mon, Sep 26, 2005 at 10:57:54AM -0700, Ron Mayer wrote:
> > David Fetter wrote:
> > >...log file formats in 8.0....
> > >
> > >* CSV
> > >* YAML
> > >* XML
> > >* Piped logs, as Apache can do
> > >* DB handle.  I know this one will be controversial.
> > >[...]
> > >1.  Am I the only one who wants an option for
> > >machine-readable logs?
> > 
> > I'd very much like a format that can be easily loaded into a
> > database (not necessarily the same one producing the logs :-) ) in
> > real time and/or be visible as a table through something like
> > dbi-link.
> > 
> > I suppose any of the first three formats you suggest could work
> > with dbi-link; or another alternate format
> >   * sql insert statements
> > would work if piped logs were supported by sending it to psql.
> 
> Apache seems to have the best, most flexible logging of anything out
> there, and should probably be used as a model.  It's pretty easy to
> have it actually log to a database.

Great :)

> Whatever method we decide on, I think it would be very useful if we
> supported multiple logging streams.  I certainly wouldn't want to
> give up a human-readable log to get a CSV one.

Excellent idea.

> Is a logging mechanism the best way to do profiling?  Seems like it
> might be better to have a more efficient, dedicated method.

I'm not totally confident in my ability to think up everything I'd
want to look at, every way I'd want to look at it in advance.  I'm
pretty sure I can't come up with every way everyone could want to do
profiling, though.

> But I'm not against adding capabilities like per-backend logging,
> etc.

How would per-backend logging work?

Cheers,
D
-- 
David Fetter david@fetter.org http://fetter.org/
phone: +1 510 893 6100   mobile: +1 415 235 3778

Remember to vote!


Re: On Logging

From
"Jim C. Nasby"
Date:
On Fri, Sep 30, 2005 at 06:24:17PM -0700, David Fetter wrote:
> How would per-backend logging work?

I'd suggest having settings for a per-backend 'debug' logging mode that
could be triggered either via a SQL command or a signal to the backend.
It would be useful to be able to log this to a separate area, based
either on PID or some identifier passed to the sql command. I think this
would cover two use cases:

long-running process that's on one back-end that you want info on (send
signal to that backend)

something using a connection pool. You'd have some way to tell the
application to enable logging based on some set of conditions. When
those conditions were met, logging would be turned on. When a connection
is first grabbed from the connection pool, logging would be forced to
off in case it had been turned on by a previous process (which might
have just disconnected suddenly).
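
The connection-pool case can be sketched in Python as a toy model. The Backend and Pool classes below are hypothetical stand-ins, not PostgreSQL or any real pooling library; the point is the reset of the debug flag on checkout.

```python
class Backend:
    """Hypothetical stand-in for one server backend."""

    def __init__(self, pid):
        self.pid = pid
        self.debug_logging = False
        self.log = []  # separate log area for this backend

    def execute(self, sql):
        # Only record the statement while debug logging is on.
        if self.debug_logging:
            self.log.append(f"[{self.pid}] {sql}")


class Pool:
    """Toy pool that forces debug logging off on every checkout."""

    def __init__(self, backends):
        self._idle = list(backends)

    def checkout(self):
        backend = self._idle.pop()
        # A previous user may have left logging on (or vanished mid-session).
        backend.debug_logging = False
        return backend

    def checkin(self, backend):
        self._idle.append(backend)
```

The application would set debug_logging to True after checkout when its own conditions are met; the pool never has to know what those conditions are.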
-- 
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461