Discussion: Proposal: Adding json logging


Proposal: Adding json logging

From:
David Arnold
Date:
Hello,

I'm new here. I'm David and would describe myself as an ambitious newbie, so please take my suggestion with a grain of salt.

Use case:
I find it difficult to properly parse postgres logs into some kind of log aggregator (I use fluent bit). My two standard options are the plain stderr format and csvlog.

I have reviewed some log samples and all of them DO contain multi-line log entries, which are very awkward to parse reliably in a log streamer.
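
To illustrate the shape of the problem (a made-up entry, assuming a log_line_prefix of '%t [%p] '): a single event spans several physical lines, and the statement itself can embed further newlines:

    2018-04-13 23:59:01 UTC [3781] ERROR:  relation "missing_table" does not exist at character 15
    2018-04-13 23:59:01 UTC [3781] STATEMENT:  SELECT *
        FROM missing_table
        WHERE id = 1;

A line-oriented streamer sees four independent lines here, and only the first two carry a parsable prefix.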

I asked Michael Paquier about his solution: https://github.com/michaelpq/pg_plugins/tree/master/jsonlog
He suggested that I take action and propose this extension again for inclusion in contrib:
https://github.com/michaelpq/pg_plugins/issues/24

He mentioned that the objection raised back then was that it takes up too much space.
This is true under the paradigm that logs are consumed by TTY or grep; however, if those logs are to be stored in a logging solution, this is not really a concern.

Please let me know if you need more context on my use case.

That being said, the proposal is to accept this library into postgres contrib.

Please let me know if I should prepare a patch.

Best Regards,

David A.

Re: Proposal: Adding json logging

From:
Michael Paquier
Date:
On Sat, Apr 14, 2018 at 12:00:16AM +0000, David Arnold wrote:
> I'm new here. I'm David and would describe myself as an ambitious newbie,
> so please take my suggestion with a grain of salt.

Welcome here.

> I asked Michael Paquier about his solution:
> https://github.com/michaelpq/pg_plugins/tree/master/jsonlog
> He suggested that I take action and propose this extension again for
> inclusion in contrib:
> https://github.com/michaelpq/pg_plugins/issues/24
>
> He mentioned that the objection raised back then was that it takes up
> too much space.
> This is true under the paradigm that logs are consumed by TTY or grep;
> however, if those logs are to be stored in a logging solution, this is
> not really a concern.

Here are the exact same words I used on this github thread to avoid
confusion:
"I proposed that a couple of years back, to be rejected as the key names
are too much repetitive and take too much place. I have personally plans
to work on other things, so if anybody wishes to take this code and send
a proposal upstream, feel free to! The code is under PostgreSQL license
and I am fine if a patch is proposed even with this code taken."

I am not sure that the concerns expressed back then on the community
side have changed.  I cannot put my finger on the -hackers thread where
this was discussed, by the way; the extra log volume caused by
repetitive key names was one of them.

> Please let me know if you need more context on my use case.
>
> That being said, the proposal is to accept this library into postgres
> contrib.
>
> Please let me know if I should prepare a patch.

It is better to gather opinions before delivering a patch.  If there is
consensus that people would like to have an in-core option to allow logs
in json format, for which I am sure that folks would *not* want a
contrib/ plugin but something as an extension of log_destination, then
of course you could move ahead and propose a patch.  Of course feel free
to reuse any code in my module if that helps!  It is released under
PostgreSQL license as well.

Please note two things though:
- Patch submission follows a particular flow, be sure to read those
notes:
https://wiki.postgresql.org/wiki/Submitting_a_Patch
- Once you have a patch, you need to send it to a commitfest; the next
one will likely be in September (the precise schedule will be finalized
at the end of May at PGCon) for the beginning of development of
Postgres 12.  The development of Postgres 11 has just finished, so the
focus is on stabilizing the release first, which consists of testing
and double-checking that everything which has been merged is stable.  So
please do not expect immediate feedback on any patch you send.

Thanks,
--
Michael


Re: Proposal: Adding json logging

From:
Craig Ringer
Date:
On 14 April 2018 at 11:24, Michael Paquier <michael@paquier.xyz> wrote:

> "I proposed that a couple of years back, to be rejected as the key names
> are too much repetitive and take too much place.

gzip is astonishingly good at dealing with that, so I think that's
actually a bit of a silly reason to block it.

Plus it's likely only a short-lived interchange format, not something
to be retained for a long period.
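
As a quick back-of-the-envelope check (synthetic payload, numbers indicative only), repeated key names all but vanish inside the deflate window:

    $ yes '{"user_name":"app","database_name":"prod","error_severity":"LOG","message":"checkpoint complete"}' \
        | head -n 10000 | gzip | wc -c

The compressed output is a tiny fraction of the roughly 1 MB of raw text.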

-- 
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: Proposal: Adding json logging

From:
David Arnold
Date:
> Plus it's likely only a short-lived interchange format, not something to be retained for a long period.

Absolutely.

There might be an argument that JSON is not easy on the eyes in the case it is actually consumed by a pair of them. That's absolutely valid. The Golang community has found a solution for that called logfmt, which I personally appreciate.

It's somewhat similar to JSON, but a lot easier on the eyes, so if logs go to the stdout of a docker container and are forwarded afterwards, you can still attach to the live container logs and actually understand something.

If it's for that reason, logfmt is possibly preferable, and there is already a lot of standard tooling available for it.
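
For a feel of it, here is a hypothetical Postgres event rendered as a single logfmt line (field names invented for illustration):

    time=2018-04-14T00:00:16Z level=error msg="relation \"missing_table\" does not exist" file=parse_relation.c line=3183

One event per line, trivially machine-parsable as key=value pairs, yet still scannable by eye.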

Any thoughts on that argument?

Best Regards



Re: Proposal: Adding json logging

From:
David Fetter
Date:
On Sat, Apr 14, 2018 at 03:27:58PM +0000, David Arnold wrote:
> > Plus it's likely only a short-lived interchange format, not something to be
> > retained for a long period.
>
> Absolutely.
>
> There might be an argument that JSON is not easy on the eyes in the case
> it is actually consumed by a pair of them. That's absolutely valid. The
> Golang community has found a solution for that called logfmt, which I
> personally appreciate.

I think a suite of json_to_* utilities would be a good bit more
helpful in this regard than changing our human-eye-consumable logs. We
already have human-eye-consumable logs by default.  What we don't
have, and increasingly do want, is a log format that's really easy on
machines.

As to logfmt in particular, the fact that it's not standardized is
probably a show-stopper.

Let's go with JSON.

Best,
David.
-- 
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate


Re: Proposal: Adding json logging

From:
Tom Lane
Date:
David Fetter <david@fetter.org> writes:
> I think a suite of json_to_* utilities would be a good bit more
> helpful in this regard than changing our human-eye-consumable logs. We
> already have human-eye-consumable logs by default.  What we don't
> have, and increasingly do want, is a log format that's really easy on
> machines.

I'm dubious that JSON is "easier on machines" than CSV.

            regards, tom lane


Re: Proposal: Adding json logging

From:
David Arnold
Date:
>I'm dubious that JSON is "easier on machines" than CSV.

Under common paradigms you are right, but if we talk about line-by-line streaming with subsequent processing, then it's a show-stopper. Of course, some log aggregators have buffers for that and can do multi-line parsing on that buffer, but
1. Not all log aggregators support it.
2. Building a parser which reliably detects multi-line logs AND is easy on resources is probably not something a normal person can achieve quickly.

So normally CSV is fine, but for log streaming it's neither the best nor the most standards-compliant way.


Re: Proposal: Adding json logging

From:
David Fetter
Date:
On Sat, Apr 14, 2018 at 11:51:17AM -0400, Tom Lane wrote:
> David Fetter <david@fetter.org> writes:
> > I think a suite of json_to_* utilities would be a good bit more
> > helpful in this regard than changing our human-eye-consumable
> > logs. We already have human-eye-consumable logs by default.  What
> > we don't have, and increasingly do want, is a log format that's
> > really easy on machines.
> 
> I'm dubious that JSON is "easier on machines" than CSV.

I've found the opposite.

CSV is very poorly specified, which makes it at best complicated to
build correct parsing libraries. JSON, whatever gripes I have about
the format[1] is extremely well specified, and hence has excellent
parsing libraries.
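
By way of illustration (record contents invented), the following is one legal CSV record per RFC 4180, because a single quoted field may contain commas, doubled quotes, and literal newlines - which is exactly what naive line-splitting parsers get wrong:

    2018-04-14 00:00:16 UTC,ERROR,"syntax error at or near ""FORM""
    LINE 1: SELECT *, FORM t",psql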

Best,
David.

[1] These are mostly the lack of comments and of some useful data
types like large integers, floats, and ISO-8601 dates.  PostgreSQL
continues to share that last.
-- 
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate


Re: Proposal: Adding json logging

From:
David Arnold
Date:
>As to logfmt in particular, the fact that it's not standardized is probably a show-stopper.
>Let's go with JSON.

I agree. Though I don't want to dismiss the idea of logfmt entirely, yet. In container infrastructure it's a de facto standard and it solves a real problem. But I'm in favor of stepping back from that idea in favour of prioritizing JSON.


Re: Proposal: Adding json logging

From:
David Arnold
Date:
Given that we have the following LOG_DESTINATION bits...


/* Log destination bitmap */
#define LOG_DESTINATION_STDERR 1
#define LOG_DESTINATION_SYSLOG 2
#define LOG_DESTINATION_EVENTLOG 4
#define LOG_DESTINATION_CSVLOG 8


Something confuses me about CSVLOG...
Aren't log destination and log formatting two different concerns? How do we deal with that mix?
I was somewhat expecting to find a log-formatting hook somewhere around, but it seems more complicated than that.
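
If csvlog already conflates format with destination, then presumably a json variant would simply claim the next bit in the same bitmap. A purely hypothetical sketch (not from any shipped header):

    /* hypothetical addition, following the csvlog precedent */
    #define LOG_DESTINATION_JSONLOG 16

That would sidestep the need for a formatting hook entirely, at the cost of perpetuating the destination/format mix-up.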

On Sat, Apr 14, 2018 at 11:51 AM, Chapman Flack (<chap@anastigmatix.net>) wrote:
On 04/14/18 12:05, David Fetter wrote:
> On Sat, Apr 14, 2018 at 11:51:17AM -0400, Tom Lane wrote:
>> I'm dubious that JSON is "easier on machines" than CSV.
>
> I've found the opposite.
>
> CSV is very poorly specified, which makes it at best complicated to
> build correct parsing libraries.

I was just about to say the same thing. Based on my experience, I can infer
the history of CSV as a format was something like this:

"we'll use commas to separate the values"

- some implementations released

"but what if a value has a comma?"

- some new implementations released

"what if it has a quote?"

- some newer implementations released

"a newline?"

- ...

> JSON, whatever gripes I have about
> the format[1] is extremely well specified, and hence has excellent
> parsing libraries.

It has, if nothing else, the benefit of coming around later and seeing
what happened with CSV.

-Chap

Re: Proposal: Adding json logging

From:
Andres Freund
Date:
On 2018-04-14 18:05:18 +0200, David Fetter wrote:
> On Sat, Apr 14, 2018 at 11:51:17AM -0400, Tom Lane wrote:
> > David Fetter <david@fetter.org> writes:
> > > I think a suite of json_to_* utilities would be a good bit more
> > > helpful in this regard than changing our human-eye-consumable
> > > logs. We already have human-eye-consumable logs by default.  What
> > > we don't have, and increasingly do want, is a log format that's
> > > really easy on machines.
> > 
> > I'm dubious that JSON is "easier on machines" than CSV.
> 
> I've found the opposite.
> 
> CSV is very poorly specified, which makes it at best complicated to
> build correct parsing libraries. JSON, whatever gripes I have about
> the format[1] is extremely well specified, and hence has excellent
> parsing libraries.

Worth noting that useful JSON formats for logging also kinda don't
follow standards. Either you end up with the entire logfile as one big
array, which most libraries won't parse incrementally and which makes
logrotate etc. really complicated, or you end up with some easy-to-parse
format where newlines have a non-standard record-separator meaning.
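
Concretely, the two shapes in question look like this (contents invented for illustration). Either the whole file is one valid JSON array, which cannot be tailed or rotated mid-stream:

    [{"severity":"LOG","message":"first event"},
     {"severity":"ERROR","message":"second event"}]

Or each line is its own JSON document, which streams trivially but is, strictly speaking, a sequence of JSON documents rather than one:

    {"severity":"LOG","message":"first event"}
    {"severity":"ERROR","message":"second event"}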

Greetings,

Andres Freund


Re: Proposal: Adding json logging

From:
Tom Lane
Date:
Andres Freund <andres@anarazel.de> writes:
> On 2018-04-14 18:05:18 +0200, David Fetter wrote:
>> CSV is very poorly specified, which makes it at best complicated to
>> build correct parsing libraries. JSON, whatever gripes I have about
>> the format[1] is extremely well specified, and hence has excellent
>> parsing libraries.

> Worth noting that useful JSON formats for logging also kinda don't
> follow standards. Either you end up with the entire logfile as one big
> array, which most libraries won't parse incrementally and which makes
> logrotate etc. really complicated, or you end up with some easy-to-parse
> format where newlines have a non-standard record-separator meaning.

Hmm .. that, actually, seems like a pretty serious objection.  If the beef
with CSV is that it's poorly specified and inconsistently implemented
(which is surely true), then using some nonstandard variant of JSON
doesn't seem like it's going to lead to a big step forward.

"The wonderful thing about standards is there are so many to choose from."
(variously attributed to Hopper, Tanenbaum, and others)

            regards, tom lane


Re: Proposal: Adding json logging

From:
David Fetter
Date:
On Sat, Apr 14, 2018 at 01:20:16PM -0700, Andres Freund wrote:
> On 2018-04-14 18:05:18 +0200, David Fetter wrote:
> > On Sat, Apr 14, 2018 at 11:51:17AM -0400, Tom Lane wrote:
> > > David Fetter <david@fetter.org> writes:
> > > > I think a suite of json_to_* utilities would be a good bit more
> > > > helpful in this regard than changing our human-eye-consumable
> > > > logs. We already have human-eye-consumable logs by default.  What
> > > > we don't have, and increasingly do want, is a log format that's
> > > > really easy on machines.
> > > 
> > > I'm dubious that JSON is "easier on machines" than CSV.
> > 
> > I've found the opposite.
> > 
> > CSV is very poorly specified, which makes it at best complicated to
> > build correct parsing libraries. JSON, whatever gripes I have about
> > the format[1] is extremely well specified, and hence has excellent
> > parsing libraries.
> 
> Worth noting that useful JSON formats for logging also kinda don't
> follow standards. Either you end up with the entire logfile as one big
> array, which most libraries won't parse incrementally and which makes
> logrotate etc. really complicated, or you end up with some easy-to-parse
> format where newlines have a non-standard record-separator meaning.

I don't see this as a big problem.  The smallest-lift thing is to put
something along these lines in the documentation:

    When you log as JSON, those logs are JSON objects, one per output
    event.  They are not guaranteed to break on newlines.

A slightly larger lift would include escaping newlines and ensuring
that JSON output is always single lines, however long.
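
With that escaping in place, even a multi-line statement stays on one physical line (field names invented for illustration):

    {"severity":"ERROR","message":"syntax error at or near \"FORM\"","statement":"SELECT *\nFORM t\nWHERE id = 1"}

A reader splits on newlines, feeds each line to any conforming JSON parser, and gets the original text back, embedded newlines included.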

Best,
David.
-- 
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate


Re: Proposal: Adding json logging

From:
Andres Freund
Date:
On 2018-04-15 00:31:14 +0200, David Fetter wrote:
> On Sat, Apr 14, 2018 at 01:20:16PM -0700, Andres Freund wrote:
> > On 2018-04-14 18:05:18 +0200, David Fetter wrote:
> > > CSV is very poorly specified, which makes it at best complicated to
> > > build correct parsing libraries. JSON, whatever gripes I have about
> > > the format[1] is extremely well specified, and hence has excellent
> > > parsing libraries.
> > 
> > Worth noting that useful JSON formats for logging also kinda don't
> > follow standards. Either you end up with the entire logfile as one big
> > array, which most libraries won't parse incrementally and which makes
> > logrotate etc. really complicated, or you end up with some easy-to-parse
> > format where newlines have a non-standard record-separator meaning.
> 
> I don't see this as a big problem.  The smallest-lift thing is to put
> something along the lines of:
> 
>     When you log as JSON, those logs are JSON objects, one per output
>     event.  They are not guaranteed to break on newlines.
> 
> A slightly larger lift would include escaping newlines and ensuring
> that JSON output is always single lines, however long.

Still obliterates your "standard standard standard" line of
argument. There seem to be valid arguments for adding json regardless,
but that line is just bogus.

Greetings,

Andres Freund


Re: Proposal: Adding json logging

From:
Ryan Pedela
Date:
On Sat, Apr 14, 2018, 4:33 PM Andres Freund <andres@anarazel.de> wrote:
> [...] or you end up with some easy-to-parse format where newlines
> have a non-standard record-separator meaning.

The format is known as JSON Lines.

Ryan

Re: Proposal: Adding json logging

From:
Jordan Deitch
Date:
I would suggest that the community consider whether postgres will log multidimensional data. That will weigh into the decision of json vs. another format quite significantly. I am a fan of the json5 spec (https://json5.org/), though adoption of this is quite poor.


---
Jordan Deitch
https://id.rsa.pub


Re: Proposal: Adding json logging

From:
David Arnold
Date:
>A slightly larger lift would include escaping newlines and ensuring that JSON output is always single lines, however long.

I think that's necessary; actually, I was implicitly assuming it as a prerequisite. I cannot imagine anything else being actually useful.

Alternatively, I'm sure logfmt has a well thought-through solution for that :-)

> I would suggest that the community consider whether postgres will log multidimensional data. That will weigh into the decision of json vs. another format quite significantly. I am a fan of the json5 spec (https://json5.org/), though adoption of this is quite poor.

What do you mean by multidimensional data? Arrays/maps?

I think there is no advantage of multidimensional vs prefixed flat logging unless the data structure gets really nastily nested.

What case were you thinking of?


Re: Proposal: Adding json logging

From:
Jordan Deitch
Date:
> > I would suggest that the community consider whether postgres will log
> > multidimensional data. That will weigh into the decision of json vs.
> > another format quite significantly. I am a fan of the json5 spec (
> > https://json5.org/), though adoption of this is quite poor.
>
> What do you mean by multidimensional data? Arrays/maps?
>
> I think there is no advantage of multidimensional vs prefixed flat logging
> unless the data structure gets really nastily nested.
>
> What case were you thinking of?

Exactly - arrays, maps, nested json objects. It's more organized and easier to reason about. As postgresql becomes more and more sophisticated over time, I see flat logging becoming more unwieldy. With tools like jq, reading and querying json on the command line is simple and user-friendly, and using json for log capture and aggregation is widely supported and embraced.
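
For instance (hypothetical file and field names), pulling the source file of every error out of a stream of one-object-per-line JSON is a one-liner:

    $ tail -f postgresql.json.log | jq -r 'select(.level == "ERROR") | .meta.file'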
 


Re: Proposal: Adding json logging

From:
Dave Cramer
Date:

On 15 April 2018 at 11:27, Jordan Deitch <jd@rsa.pub> wrote:
> Exactly - arrays, maps, nested json objects. It's more organized and easier to reason about. As postgresql becomes more and more sophisticated over time, I see flat logging becoming more unwieldy. [...]

Exactly what are you logging here??? Why would I need to see a multi-dimensional array in the log?


Re: Proposal: Adding json logging

From:
Jordan Deitch
Date:
> Exactly what are you logging here??? Why would I need to see a
> multi-dimensional array in the log?

If I wanted to capture in detail the location of errors my clients are encountering on their postgres clusters, I would need to parse the 'LOCATION' string in their log entries, parse out the filename by splitting on the ':' character of that same line, and parse out the line number. Essentially any programmatic analysis of logs, as it stands today, would require string parsing. I'd rather have an organized, logical representation of the information, which I suggest is not possible in a flat, single-dimensional structure.

{
  "level": "ERROR",
  "meta": {
    "line_number": 23,
    "file": "parse_relation.c"
  },
  "detail": {
    "condition_name": "...",
    "error_code": "..."
  },
  "time": "..."
}


Re: Proposal: Adding json logging

From:
David Arnold
Date:
> Exactly - arrays, maps, nested json objects. It's more organized and easier to reason about. As postgresql becomes more and more sophisticated over time, I see flat logging becoming more unwieldy. [...]

The existence and adoption of jq certainly makes a point here for structured data over grep-friendly but less structured data.

But I recommend reading this blog post: https://brandur.org/logfmt
It makes some strong arguments about what a good log format looks like.
Basically, a good log format is both human-readable (tail stdout) and machine-readable (log aggregators). Note how logrus solves that conflict.


Re: Proposal: Adding json logging

From:
David Arnold
Date:
Does everyone more or less agree with the following intermediate résumé?

1. Throughout this vivid discussion, a good portion of support has already been expressed for the need for a more structured (machine-readable) logging format. There has been no substantial objection to this need.

2. One JSON object per logging event has been proposed, and alternatively or complementarily one logfmt record per event (although with much less resonance).

3. Doubts about space have been addressed by the following arguments:
- The data is short-lived, as it is most likely consumed directly without much persistence.
- GZIP compression could easily bring the space requirements close to those of a deduplicated format.

4. Doubts about standards conformance are still under discussion; however, there does not yet seem to exist an ultimate logging standard which addresses all the shortcomings of existing approaches in a Pareto-optimal way. It will most likely require a compromise, or options for the user to choose from.

I hope this is a truthful account of the current state of this thread.

In order to move forward, I'd propose a one-week "call for objections" period, until next Friday, for any substantial objections to the direction of this discussion.

Afterwards, I'd like to get some hands dirty, hopefully not alone.

I hope that's OK for everyone; please let me know if you think this mail is not a legitimate attempt to take a step forward.

Best Regards, David Arnold


Re: Proposal: Adding json logging

From:
Christophe Pettus
Date:
> On Apr 15, 2018, at 09:51, David Arnold <dar@xoe.solutions> wrote:
>
> 1. Throughout this vivid discussion, a good portion of support has already been expressed for the need for a more
> structured (machine-readable) logging format. There has been no substantial objection to this need.

I'm afraid I don't see that.  While it's true that as a standard, CSV is relatively ill-defined, as a practical matter
in PostgreSQL it is very easy to write code that parses .csv format.

--
-- Christophe Pettus
   xof@thebuild.com



Re: Proposal: Adding json logging

From:
Christophe Pettus
Date:
> On Apr 15, 2018, at 10:07, Christophe Pettus <xof@thebuild.com> wrote:
>
>
>> On Apr 15, 2018, at 09:51, David Arnold <dar@xoe.solutions> wrote:
>>
>> 1. Throughout this vivid discussion, a good portion of support has already been expressed for the need for a more
>> structured (machine-readable) logging format. There has been no substantial objection to this need.
>
> I'm afraid I don't see that.  While it's true that as a standard, CSV is relatively ill-defined, as a practical
> matter in PostgreSQL it is very easy to write code that parses .csv format.

More specifically, JSON logging does seem to be a solution in search of a problem.  PostgreSQL's CSV logs are very easy
to machine-parse, and if there are corrupt lines being emitted there, the first step should be to fix those, rather than
introduce a new "this time, for sure" logging method.

It's a matter of a few lines of code to convert CSV logs to a JSON format, if you need JSON format for something else.

Remember, also, that every new logging format introduces a burden on downstream tools to support it.  This is (still)
an issue with JSON format plans, which had a much more compelling advantage over standard-format plans than JSON logs do
over CSV.

--
-- Christophe Pettus
   xof@thebuild.com



Re: Proposal: Adding json logging

From:
David Arnold
Date:
>More specifically, JSON logging does seem to be a solution in search of a problem.  PostgreSQL's CSV logs are very easy to machine-parse, and if there are corrupt lines being emitted there, the first step should be to fix those, rather than introduce a new "this time, for sure" logging method.

>It's a matter of a few lines of code to convert CSV logs to a JSON format, if you need JSON format for something else.

In light of the specific use case / problem that caused this thread to be born, what exactly would you suggest?

If it is fixing csv logs to guarantee one line per event, then that is equally a solution to the problem.
  • It would be preferable in the light of "minimal code change".
  • It would probably break some downstream parsers already in place.
  • It would not try to solve this problem: https://brandur.org/logfmt (machine AND human readable).
  • I don't know of a lot of libraries that concluded csv logging is the best way to move forward. (They could all be wrong, though.)
  • No off-the-shelf parser exists (you need to write code; as small as it might be, it becomes a component in your stack and therefore a SPOF).
The first point is definitely a strong one. Did I miss any additional arguments?



Re: Proposal: Adding json logging

From:
Christophe Pettus
Date:
> On Apr 15, 2018, at 10:39, David Arnold <dar@xoe.solutions> wrote:
>
> In light of the specific use case / problem that caused this thread to be born, what exactly would you suggest?

It looks like the thread skipped over the problem space for the solution space pretty fast; I see your note:

> I have reviewed some log samples and all of them DO contain multi-line log entries, which are very awkward to
> parse reliably in a log streamer.

... but I don't see any actual examples of those.  Can you elaborate?

--
-- Christophe Pettus
   xof@thebuild.com



Re: Proposal: Adding json logging

From:
John W Higgins
Date:


On Sun, Apr 15, 2018 at 10:39 AM, David Arnold <dar@xoe.solutions> wrote:
>More specifically, JSON logging does seem to be a solution in search of a problem.  PostgreSQL's CSV logs are very easy to machine-parse, and if there are corrupt lines being emitted there, the first step should be to fix those, rather than introduce a new "this time, for sure" logging method.

>It's a matter of a few lines of code to convert CSV logs to a JSON format, if you need JSON format for something else.

In light of the specific use case / problem that caused this thread to be born, what exactly would you suggest?

This would appear to solve multiline issues within Fluent.....

https://docs.fluentd.org/v0.12/articles/parser_multiline
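
A minimal sketch of such a configuration, assuming a stderr log_line_prefix of '%t [%p] ' (the regexes are illustrative, not tested against real logs):

    <source>
      @type tail
      path /var/log/postgresql/postgresql.log
      tag postgres
      format multiline
      format_firstline /^\d{4}-\d{2}-\d{2}/
      format1 /^(?<time>\d{4}-\d{2}-\d{2} [^ ]+ [^ ]+) \[(?<pid>\d+)\] (?<level>[A-Z]+): +(?<message>.*)/
    </source>

Continuation lines (DETAIL, STATEMENT, embedded query newlines) get folded into the preceding event because they do not match format_firstline.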

John

Re: Proposal: Adding json logging

From:
David Arnold
Date:
>It looks like the thread skipped over the problem space for the solution space pretty fast

OK, I apologize; it seemed to me from the feedback that the problem was already uncontested. Verifying or falsifying that was the objective of my previous mail :)

>Can you elaborate?


Sure.

CSV logs contain literal line breaks, and the stdout format puts ERROR/FATAL and their detail on different lines; neither is an easy problem to stream-parse reliably (without some kind of buffer, etc.)...



Re: Proposal: Adding json logging

From:
Andres Freund
Date:
On 2018-04-15 18:00:05 +0000, David Arnold wrote:
> CSV logs contain literal line breaks, and the stdout format puts
> ERROR/FATAL and their detail on different lines; neither is an easy
> problem to stream-parse reliably (without some kind of buffer, etc.)...

Why? The newlines aren't meaningfully different from other characters
you need to parse. The data isn't actually stored in a newline-separated
fashion; a newline is just one byte with that meaning.

- Andres


Re: Proposal: Adding json logging

From:
Christophe Pettus
Date:
> On Apr 15, 2018, at 11:00, David Arnold <dar@xoe.solutions> wrote:
> 
> CSV Logs: https://pastebin.com/uwfmRdU7

Is the issue that there are line breaks in things like lines 7-9?

--
-- Christophe Pettus
   xof@thebuild.com



Re: Proposal: Adding json logging

From:
David Arnold
Date:
>This would appear to solve multiline issues within Fluent.....
>https://docs.fluentd.org/v0.12/articles/parser_multiline

I definitely looked at that, but what guarantee do I have that the sequence is always ERROR/STATEMENT/DETAIL, and not the other way round?
And it only works when tailing from a log file, so I cannot use a native docker logging driver which streams event by event.
This again prohibits the use of a host-global docker logging driver configuration as my standard option for host provisioning.

>Is the issue that there are line breaks in things like lines 7-9?
No way to parse that cleanly with a line-by-line regex. Fluent's multiline support is not a real multi-line regex; it's just some logic to emulate one. I believe the reason might be that a true multi-line regex would be far too resource-demanding to run repeatedly on a moving window of lines of unknown cardinality.


Re: Proposal: Adding json logging

From:
David Arnold
Date:
>Why? The newlines aren't meaningfully different from other characters
>you need to parse. The data isn't actually stored in a newline-separated
>fashion; a newline is just one byte with that meaning.

I may be missing the details, but I believe that stdout is usually parsed and streamed simply line by line. As in:
"Important: do not attempt to add multiline support in your regular expressions if you are using Tail input plugin since each line is handled as a separated entity. Instead use Tail Multiline support configuration feature."


Re: Proposal: Adding json logging

From:
John W Higgins
Date:


On Sun, Apr 15, 2018 at 11:08 AM, David Arnold <dar@xoe.solutions> wrote:
>This would appear to solve multiline issues within Fluent.....
>https://docs.fluentd.org/v0.12/articles/parser_multiline

I definitely looked at that, but what guarantee do I have that the sequence is always ERROR/STATEMENT/DETAIL, and not the other way round?

Have you asked that question? You seem to at least have opened the source code - did you try to figure out what the logging format is?
 
And it only works when tailing from a log file, so I cannot use a native docker logging driver which streams event by event.
This again prohibits the use of a host-global docker logging driver configuration as my standard option for host provisioning.

What does JSON logging have to do with "event by event" streaming?

Docker also lists Fluent as a standard driver for logging [1] - along with syslog and a variety of others. It's notable that JSON is their default - but they seem perfectly happy to accommodate plenty of other options. I don't see how there is a conflict here.

I was also under the impression things like Fluent existed for the sole purpose of taking disparate logging solutions and bringing them under one roof - it would seem like it wants nothing more than for PostgreSQL to do as it pleases with the logs and they will pick it up and run with it.

If there is an actual ambiguity with how logs are produced - I'm certain plenty of folks on here would like to solve that issue immediately. But I don't see anything stopping Docker/Fluent from using what is currently on the table.

Re: Proposal: Adding json logging

From:
David Arnold
Date:
>Have you asked that question? You seem to at least have opened the source code - did you try to figure out what the logging format is?
1. -> No. 2. -> Yes.

I might be wrong, but something in my head tells me I have seen them the other way round. Unfortunately, I'm not experienced enough to tell from the code and execution context whether that guarantee exists. There is nothing about such guarantees in the docs either [1]. I let myself be guided by serverfault questions like this one [2]. The existence of a log id in the standard format (if I got that one right) enticed me to think such guarantees do NOT exist.

Other than that, this is still not "compliant" with one-event-one-line semantics and forces me to tail logs [3] from the file system, so no docker logging driver at all to that end, with all its implications for provisioning log aggregation in my clusters. Just to make it clear: that's primarily my problem and not the problem of postgres in general. But it doesn't help either.

>What does JSON logging have to do with "event by event" streaming?
It's a commonly chosen option in that problem space, just as its alternative logfmt is, but there is no direct causal chain; it's more like "wisdom of the multitude" (which can be terribly wrong at times). As said before, CSV without newlines would equally be a "fix" for my problem.

>Docker also lists Fluent as a standard driver for logging [1] - along with syslog and a variety of others. It's notable that JSON is their default - but they seem perfectly happy to accommodate plenty of other options. I don't see how there is a conflict here.
>I was also under the impression things like Fluent existed for the sole purpose of taking disparate logging solutions and bringing them under one roof - it would seem like it wants nothing more than for PostgreSQL to do as it pleases with the logs and they will pick it up and run with it.
>If there is an actual ambiguity with how logs are produced - I'm certain plenty of folks on here would like to solve that issue immediately. But I don't see anything stopping Docker/Fluent from using what is currently on the table.

While this describes my intended take on log aggregation well, multi-line output breaks things (or at least makes them a lot more complicated than they need to be) at the parsing stage, as in [4].

Do you think there is a chance of an alternative solution to the exposed problem? I'm happy to dig further.
json/logfmt still seems a promising option, thinking the problem through from its end. For now I would define the problem like so:

Core-Problem: "Multi-line logs are unnecessarily inconvenient to parse and are not compatible with the design of some (commonly used) logging aggregation flows."
2nd-order Problem: "The logging space increasingly moves towards the adoption of structured logging formats around json/logfmt. Compatibility options (plural!) with mainstream (not necessarily standard) tooling are a value proposition of their own kind. They help increase the odds of responsible deployments and improve the overall experience of adopting PostgreSQL."

Best, David




Re: Proposal: Adding json logging

From:
Christophe Pettus
Date:
> On Apr 15, 2018, at 12:16, David Arnold <dar@xoe.solutions> wrote:
>
> Core-Problem: "Multi-line logs are unnecessarily inconvenient to parse and are not compatible with the design of
> some (commonly used) logging aggregation flows."

I'd argue that the first line of attack on that should be to explain to those consumers of logs that they are making
some unwarranted assumptions about the kind of inputs they'll be seeing.  PostgreSQL's CSV log formats are not a
particularly bizarre format, or very difficult to parse.  The standard Python CSV library handles them just fine, for
example.  You have to handle newlines that are part of a log message somehow; a newline in a PostgreSQL query, for
example, needs to be emitted to the logs.

--
-- Christophe Pettus
   xof@thebuild.com



Re: Proposal: Adding json logging

From:
David Arnold
Date:
> I'd argue that the first line of attack on that should be to explain to those consumers of logs that they are making some unwarranted assumptions about the kind of inputs they'll be seeing.  PostgreSQL's CSV log formats are not a particularly bizarre format, or very difficult to parse.  The standard Python CSV library handles them just fine, for example.  You have to handle newlines that are part of a log message somehow; a newline in a PostgreSQL query, for example, needs to be emitted to the logs.

I believe the (valid) root reason behind this assumption is concern for resource consumption during stream processing. There are solutions to that problem (such as the multi-line mode of fluent bit mentioned earlier in this thread), but they are not especially reliable, nor easy to maintain. Not making this assumption implies parsing a rolling set of log lines of previously unknown cardinality. That's expensive in computing resources. I don't have actual numbers, but it doesn't seem too far-fetched, either.

I filed a question with the author of fluent-bit to that effect, which you can consult here: https://github.com/fluent/fluent-bit/issues/564
Let's see what Eduardo has to tell us about this...


Re: Proposal: Adding json logging

From:
Andrew Dunstan
Date:

On 04/15/2018 05:05 PM, Christophe Pettus wrote:
>> On Apr 15, 2018, at 12:16, David Arnold <dar@xoe.solutions> wrote:
>>
>> Core-Problem: "Multi-line logs are unnecessarily inconvenient to parse and are not compatible with the design of
>> some (commonly used) logging aggregation flows."

> I'd argue that the first line of attack on that should be to explain to those consumers of logs that they are making
> some unwarranted assumptions about the kind of inputs they'll be seeing.  PostgreSQL's CSV log formats are not a
> particularly bizarre format, or very difficult to parse.  The standard Python CSV library handles them just fine, for
> example.  You have to handle newlines that are part of a log message somehow; a newline in a PostgreSQL query, for
> example, needs to be emitted to the logs.
>


In JSON newlines would have to be escaped, since literal newlines are
not legal in JSON strings. Postgres' own CSV parser has no difficulty at
all in handling newlines embedded in the fields of CSV logs.

I'm not necessarily opposed to providing for JSON logs, but the overhead
of named keys could get substantial. Abbreviated keys might help, but
generally I think I would want to put such logs on a compressed ZFS
drive or some such.
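As a rough illustration of both points (escaped newlines and key-name overhead), here is a small Python sketch; the field names are illustrative, not the actual keys any existing module emits:

import json

# A log event whose statement contains a literal newline.
event = {
    "timestamp": "2018-04-16 10:06:29 UTC",
    "error_severity": "ERROR",
    "message": 'syntax error at or near "SELCT"',
    "statement": "SELCT 1\n  + 2;",
}

line = json.dumps(event)   # the newline is emitted as the two characters \n
assert "\n" not in line    # so the JSON entry stays on a single line

# Repeated key names inflate every entry; abbreviated keys shrink it.
abbrev = dict(zip(["ts", "sev", "msg", "stmt"], event.values()))
print(len(line), len(json.dumps(abbrev)))  # full vs. abbreviated size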

cheers

andrew

-- 
Andrew Dunstan                https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Proposal: Adding json logging

From
David Fetter
Date:
On Mon, Apr 16, 2018 at 10:06:29AM -0400, Andrew Dunstan wrote:
> On 04/15/2018 05:05 PM, Christophe Pettus wrote:
> >> On Apr 15, 2018, at 12:16, David Arnold <dar@xoe.solutions> wrote:
> >>
> >> Core-Problem: "Multi line logs are unnecessarily inconvenient to parse and are not compatible with the design of some (commonly used) logging aggregation flows."

> > I'd argue that the first line of attack on that should be to explain to those consumers of logs that they are making some unwarranted assumptions about the kind of inputs they'll be seeing.  PostgreSQL's CSV log format is not a particularly bizarre format, or very difficult to parse.  The standard Python CSV library handles it just fine, for example.  You have to handle newlines that are part of a log message somehow; a newline in a PostgreSQL query, for example, needs to be emitted to the logs.
> 
> 
> In JSON newlines would have to be escaped, since literal newlines are
> not legal in JSON strings. Postgres' own CSV parser has no difficulty at
> all in handling newlines embedded in the fields of CSV logs.

True, and anything that malloc()s in the process of doing that
escaping could fail on OOM, and hilarity would ensue. I don't see
these as show-stoppers, or even as super relevant to the vast majority
of users. If you're that close to the edge, you were going to crash
anyhow.

> I'm not necessarily opposed to providing for JSON logs, but the
> overhead of named keys could get substantial. Abbreviated keys might
> help, but generally I think I would want to put such logs on a
> compressed ZFS drive or some such.

Frequently at places I've worked, the end destination is of less
immediate concern than the ability to process those logs for
near-real-time monitoring.  This is where formats like JSON really
shine.

Best,
David.
-- 
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate


Re: Proposal: Adding json logging

From
David Arnold
Date:
Hi all,

This discussion has made big steps forward. It is very encouraging to see this amount of interest. It seems this topic has been at the back of many minds for some time already...

Thanks to Christophe's friendly reminder, I will try to define the problem space as concisely as possible.
I think what we have left to clarify is the root reason for the observation that log streaming/processing tools do not embrace multi-line records.
Then, hopefully, a final decision can be reached on whether this is an admissible problem worth addressing from the postgres side or not.

If it is, it can additionally be decided whether the UX should also be improved (-> JSON/logfmt) on the occasion of this opportunity.

Let's hope Eduardo, the maker of fluent-bit, soon finds time to tell us what he has to say about the multi-line problem in log parsing.

Best, David


Re: Proposal: Adding json logging

From
"Daniel Verite"
Date:
    David Arnold wrote:

> Giving up this assumption implies parsing a *rolling* set of log
> lines with *previously unknown cardinality*. That's expensive in
> computing resources. I don't have actual numbers, but that doesn't
> seem too far-fetched either.
> I filed a question with the author of fluent-bit to that end, which
> you can consult here:
> https://github.com/fluent/fluent-bit/issues/564 Let's see what
> Eduardo has to inform us about this...

fluent-bit does not appear to support CSV, as mentioned in
https://github.com/fluent/fluent-bit/issues/459
which got flagged as an enhancement request some time ago.

In CSV a line break inside a field is easy to process for
a parser, because (per https://tools.ietf.org/html/rfc4180):

  "Fields containing line breaks (CRLF), double quotes, and commas
    should be enclosed in double-quotes"

So there is no look-ahead to do. In a character-by-character loop,
when encountering a line break, either the current field started with
a double quote and the line break is part of the content, or it did
not start with a double quote and the line break ends the current
record.
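For illustration, that loop could look like this in Python (tracking only the quote state; a toy sketch, not a replacement for a real CSV parser):

from typing import Iterator

def split_records(chars: Iterator[str]) -> Iterator[str]:
    # A newline ends the record only outside double quotes; inside a
    # quoted field it is part of the content.  Note that an escaped
    # quote ("") toggles the state out and straight back in, which is
    # harmless for the purpose of finding record boundaries.
    record = []
    in_quotes = False
    for ch in chars:
        if ch == '"':
            in_quotes = not in_quotes
        if ch == "\n" and not in_quotes:
            yield "".join(record)
            record = []
        else:
            record.append(ch)
    if record:
        yield "".join(record)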

What doesn't quite work is to parse CSV with a regex; that's
discussed in some detail here, for instance:
https://softwareengineering.stackexchange.com/questions/166454/can-the-csv-format-be-defined-by-a-regex


Best regards,
--
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite


Re: Proposal: Adding json logging

From
David Arnold
Date:
> In CSV a line break inside a field is easy to process for
> a parser, because (per https://tools.ietf.org/html/rfc4180):
> "Fields containing line breaks (CRLF), double quotes, and commas should be enclosed in double-quotes"

Interesting, does that implicitly mean the whole log event gets transmitted as one "line" (with embedded CRLF) in CSV? I don't know how to confirm or falsify that...

In the affirmative scenario, this would then work for a true streaming aggregator (if CSV were supported).

Wouldn't it still be resource-expensive if for some reason someone wants to keep tailing a log file, which, as I understand it, has line-by-line semantics?

I do not plan to do that, but there are people who prefer to also have a local copy of their logfile, just in case.

But if CSV as emitted by postgres were supported by fluent-bit, I would be equally happy with that solution.

The second-order problem, after all, is just that: of second order...


Re: Proposal: Adding json logging

From
Peter Eisentraut
Date:
On 4/13/18 20:00, David Arnold wrote:
> I have reviewed some log samples and all DO contain some kind of multi
> line logs which are very uncomfortable to parse reliably in a log streamer.
> 
> I asked Michael Paquier about his
> solution: https://github.com/michaelpq/pg_plugins/tree/master/jsonlog
> He was suggestion to take action and propose this extension again to be
> included in contrib:
> https://github.com/michaelpq/pg_plugins/issues/24 

I have used https://github.com/mpihlak/pg_logforward in the past, which
seems to be about the same thing.

I have also had good success using syslog.  While syslog is not very
structured, the setting syslog_split_messages allows sending log entries
that include newlines in one piece, which works well if you have some
kind of full-text search engine at the receiving end.
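For reference, a sketch of the corresponding postgresql.conf settings (the facility and ident values are illustrative choices):

# postgresql.conf -- send logs to syslog and keep multi-line
# messages in one piece (syslog_split_messages exists since 9.6).
log_destination = 'syslog'
syslog_facility = 'LOCAL0'
syslog_ident = 'postgres'
syslog_split_messages = off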

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: Proposal: Adding json logging

From
Michael Paquier
Date:
On Mon, Apr 16, 2018 at 07:52:58PM -0400, Peter Eisentraut wrote:
> I have used https://github.com/mpihlak/pg_logforward in the past, which
> seems to be about the same thing.

Didn't know this one.  Thanks.

> I have also had good success using syslog.  While syslog is not very
> structured, the setting syslog_split_messages allows sending log entries
> that include newlines in one piece, which works well if you have some
> kind of full-text search engine at the receiving end.

syslog suffers from the possibility of losing messages if I recall
correctly, right?  This may matter for some critical environments.

Upstream code has escape_json() directly included, which is able to do
the job and makes sure that a single JSON entry is not broken into
multiple lines.  That's what my jsonlog uses to format the strings used,
and what I can see pg_logforward does as well with a custom copy.

As a whole model, producing one JSON object per line and per log-entry
is the most natural format in my opinion.

One thing which is perhaps sensitive for JSON is the timestamp format.
The JSON specification does not decide what the format of timestamps
should be; still, parser facilities are somewhat all pointing toward
using ISO 8601, with stuff like Javascript Date's toJSON method.  There
are some side issues with the use of UTC...  So the thing is sort of
messy.
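To make the shape concrete, here is a hedged Python sketch of one JSON object per line with an ISO 8601 UTC timestamp (the key names are illustrative, not jsonlog's actual ones):

import json
from datetime import datetime, timezone

def json_log_line(severity: str, message: str) -> str:
    # One log event rendered as exactly one line of JSON.
    entry = {
        # ISO 8601 in UTC, e.g. "2018-04-17T03:12:45.123456+00:00"
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "error_severity": severity,
        "message": message,  # json.dumps escapes any embedded newlines
    }
    return json.dumps(entry)

print(json_log_line("LOG", "statement: SELECT 1\n  + 2;"))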

However, as JSON entries are usually larger than normal log entries,
log entries are more easily broken into multiple lines when not using
logging_collector.  People normally don't do that, but I have received
complaints on the matter as well when using Postgres in a Docker
container, for example.  So documenting that logging_collector needs
to be enabled is important if this log format shows up in Postgres.
--
Michael

Attachments

Re: Proposal: Adding json logging

From
Peter Eisentraut
Date:
On 4/16/18 23:12, Michael Paquier wrote:
>> I have also had good success using syslog.  While syslog is not very
>> structured, the setting syslog_split_messages allows sending log entries
>> that include newlines in one piece, which works well if you have some
>> kind of full-text search engine at the receiving end.
> syslog suffers from the possibility of losing messages if I recall
> correctly, right?  This may matter for some critical environments.

Depends on whether you configure it to use TCP or UDP.
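With rsyslog, for instance, the transport is one character in the forwarding rule (host and port here are placeholders):

# /etc/rsyslog.conf -- forward the facility configured in postgresql.conf
local0.*  @loghost:514    # single @  = UDP, fire-and-forget
local0.*  @@loghost:514   # double @@ = TCP, connection-oriented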

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: Proposal: Adding json logging

From
"Daniel Verite"
Date:
    David Arnold wrote:

> Interesting, does that implicitly mean the whole log event gets
> transmitted as one "line" (with embedded CRLF) in CSV?

To me it's implied by the doc at:
https://www.postgresql.org/docs/current/static/runtime-config-logging.html#RUNTIME-CONFIG-LOGGING-CSVLOG

> In the affirmative scenario, this would then work for a true streaming
> aggregator (if CSV were supported).

Assuming a real CSV parser tailing the log, there shouldn't be any trouble
with that.


Best regards,
--
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite


Re: Proposal: Adding json logging

From
David Arnold
Date:

Additionally, this still depends on the way some middleware might choose to stream data. Can we really be sure the risk is minimal that any middleware would have chosen to treat a newline as an entity delimiter?

Can we even be sure that NO existing middleware treats newline as an entity delimiter?

I'm not that confident about that.

Anticipating the possible argument that "the others are wrong": this argument, though valid, sometimes reaches its limits against very mundane practicality and efficiency needs.


Re: Proposal: Adding json logging

From
David Arnold
Date:
This discussion is thriving, and lo and behold, we've got an opinion from Eduardo (fluent-bit):

https://github.com/fluent/fluent-bit/issues/564#issuecomment-381844419

>Also consider that in not all scenarios full multiline logs are flushed right away, sometimes there are delays.

I think this line supports the view that multi-line logging is a suboptimal strategy (assessed globally and embedded into a deployment ecosystem).

> Btw, we would be happy to work together with PostreSQL community to support an official parser for your multiline logs

Although fluent-bit seems willing to work with postgres logs as they are, I think the following problem definition remains valid against all contrary arguments:

Core-Problem: "Multi line logs are unnecessarily inconvenient to parse and are not compatible with the design of some (commonly used) logging aggregation flows."
2nd-order Problem: "Logging space increasingly moves towards the adoption of structured logging formats around json/logfmt. Compatibility options (plural!) with mainstream (not necessarily standard) tooling are a value proposition of their own kind. It helps increase the odds of responsible deployments and improves the overall experience of adopting PostgreSQL."

Please share your thoughts if you still feel there are material objections to the core problem. JSON or not JSON, as Christophe recalled, is then a question in the solution space.
Note that part of the problem definition is "unnecessarily", which implies judgment on responsibilities and ecosystems working together, rather than a broken system.


Re: Proposal: Adding json logging

From
Alvaro Herrera
Date:
One issue I haven't seen mentioned in this thread is the translation
status of the server message (as well as its encoding): it's possible to
receive messages in some random language if the lc_messages setting is
changed.  Requiring that lc_messages must always be set to some English
locale seems like a poor answer to this problem.
IMO the untranslated server message should be part of the event also.
I don't know what to think of %-expansions of the message.

The character encoding can be changed per database.  Log files where the
encoding differs across databases cannot be processed in any sane way.
You can try some heuristics (try to read each message as utf8 first, and
if that fails, then it must be Latin1!  If that doesn't work for you,
... tough luck), but that's a pretty poor answer too.  Not sure what is
a good solution to this problem.  Maybe ensure that these things are
always UTF8?
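That heuristic would look something like this in Python, and its weakness is visible: Latin1 accepts any byte sequence, so a wrong guess is undetectable:

def decode_log_line(raw: bytes) -> str:
    # Try UTF-8 first; if that fails, fall back to Latin1, which
    # never fails -- hence "tough luck" when the guess is wrong.
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return raw.decode("latin-1")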

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: Proposal: Adding json logging

From
David Arnold
Date:
Alvaro, just to clarify for me: do you refer to the messages generated by https://github.com/postgres/postgres/blob/master/src/backend/utils/error/elog.c, or to other messages?

Standardizing on UTF8 seems a good option. Assuming it is a problem, I would classify it as another second-order problem, though, because it is already an issue right now. So, to sum up:

Core-Problem: "Multi line logs are unnecessarily inconvenient to parse and are not compatible with the design of some (commonly used) logging aggregation flows."
2nd-order Problem 1: "Logging space increasingly moves towards the adoption of structured logging formats around json/logfmt. Compatibility options (plural!) with mainstream (not necessarily standard) tooling are a value proposition of their own kind. It helps increase the odds of responsible deployments and improves the overall experience of adopting PostgreSQL."
2nd-order Problem 2: "Encoding of logging can differ per database, which inhibits the objective of reliable log stream parsing."


Re: Proposal: Adding json logging

From
Robert Haas
Date:
On Sun, Apr 15, 2018 at 1:07 PM, Christophe Pettus <xof@thebuild.com> wrote:
>> On Apr 15, 2018, at 09:51, David Arnold <dar@xoe.solutions> wrote:
>> 1. Throughout this vivid discussion a good portion of support has already been manifested for the need of a more structured (machine readable) logging format. There has been no substantial objection to this need.

> I'm afraid I don't see that.  While it's true that as a standard, CSV is relatively ill-defined, as a practical matter in PostgreSQL it is very easy to write code that parses .csv format.
 

I'm not sure exactly how you intended this comment, but it seems to
me that whether CSV is easy or hard to parse, somebody might
legitimately find JSON more convenient.  For example, and as has been
discussed on this thread, if you have a system that is consuming the
logs that already knows how to parse JSON but does not know how to
parse CSV, then you will find the JSON format to be convenient.

For the record, I'm tentatively in favor of including something like
this in contrib.  I think it's useful to have more examples of how to
use our existing hooks in contrib, and I think this is useful on
principle.

I am a little concerned about this bit from the README, though:

====
Note that logging_collector should be enabled in postgresql.conf to
ensure consistent log outputs.  As JSON strings are longer than normal
logs generated by PostgreSQL, this module increases the odds of malformed
log entries.
====

I'm not sure I understand the issue, but I don't like malformed log entries.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: Proposal: Adding json logging

From
Christophe Pettus
Date:
> On Apr 18, 2018, at 11:59, Robert Haas <robertmhaas@gmail.com> wrote:
>
> I'm not sure exactly how you intended to this comment, but it seems to
> me that whether CSV is ease or hard to parse, somebody might
> legitimately find JSON more convenient.

Of course.  The specific comment I was replying to made a couple of jumps that I wanted to unwind: The first is that we don't have a machine-readable format for PostgreSQL (we do, CSV), and that there was "no substantial objection to this need."

If the requirement is: "There is a large class of log analysis tools out there that have trouble with multiline formats and we should be good ecosystem players," that's fine.  (I'm a bit sour about the number of tools being written with one-line-per-event baked into them and whose solution to any other format is "use regex," but that's neither here nor there, I suppose.)

My primary objection to creating new output formats is that it creates an implicit burden on downstream tools to adopt them.  For example, a lot of query analysis tools don't yet process JSON-format plans, and they've been around for a while.  By introducing a new format in core (which was the starting proposal), we're essentially telling all the tools (such as pgbadger) that might absorb them that we expect them to adopt that too.

> For the record, I'm tentatively in favor of including something like
> this in contrib.

I'm much less fussed by this in contrib/ (with the same concern you noted), at minimum as an example of how to do logging in other formats.

--
-- Christophe Pettus
   xof@thebuild.com



Re: Proposal: Adding json logging

From
David Arnold
Date:
Excellent phrasing (thanks to Christophe!): "There is a large class of log analysis tools out there that have trouble with multiline formats and we should be good ecosystem players"

> I'm much less fussed by this in contrib/ (with the same concern you noted), at minimum as an example of how to do logging in other formats.

This would be a very well-balanced compromise: almost every distribution flavour also packages contrib, so installing contrib and loading such a module as a conveniently pre-packaged shared library would be an excellent solution for the big majority of Postgres users.

Now we are moving into the solution space :) I want to wait out this week, though, to give sufficient time to comment on all aspects, and will do another wrap-up next weekend. This suggestion will definitely be part of it.


Re: Proposal: Adding json logging

From
Alvaro Herrera
Date:
John W Higgins wrote:
> On Sun, Apr 15, 2018 at 11:08 AM, David Arnold <dar@xoe.solutions> wrote:
> 
> > >This would appear to solve multiline issues within Fluent.....
> > >https://docs.fluentd.org/v0.12/articles/parser_multiline
> >
> > I definitely looked at that, but what guarantees do I have that the
> > sequence is always ERROR/STATEMENT/DETAIL? And not the other way round?
> 
> Have you asked that question? You seem to at least have opened the source
> code - did you try to figure out what the logging format is?

I looked at this a couple of days ago.  I think parsing with this
library is possible to a certain extent, and the problems stem from
limitations of the library.  So, it turns out that the firstline can be
set to a single regex that searches for PANIC, FATAL, ERROR, WARNING,
LOG, NOTICE, DEBUG.  That's always the first line in any postgres log
event.

A log event contains some subsequent lines.  Those start either with a
tab (which is a continuation of the previous line) or with one of
DETAIL, CONTEXT, HINT, STATEMENT, QUERY.  This seems very simple to
parse (just add lineN patterns for those), *except* that the messages
can be multiline too; and where would you assign the continuation lines
for each of those?  parser_multiline does not support that.
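A rough Python rendering of the classification just described, with the ambiguity visible in the tab branch (the log_line_prefix part of the first regex is a hypothetical placeholder):

import re

# First line of an event: severity after a hypothetical
# "timestamp [pid]" log_line_prefix.
FIRSTLINE = re.compile(
    r"^\d{4}-\d{2}-\d{2} [\d:.]+ \S+ \[\d+\] "
    r"(PANIC|FATAL|ERROR|WARNING|LOG|NOTICE|DEBUG):"
)
TAGLINE = re.compile(r"^(DETAIL|CONTEXT|HINT|STATEMENT|QUERY):")

def classify(line: str) -> str:
    if FIRSTLINE.match(line):
        return "new event"
    if TAGLINE.match(line):
        return "new field of the current event"
    if line.startswith("\t"):
        # Ambiguous: continuation of the message or of the last tagged
        # field?  This is the case parser_multiline cannot express.
        return "continuation"
    return "unknown"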

Another thing worth keeping in mind is that you need to change the regex
depending on log_line_prefix, which sounds very painful.

All in all, the best approach might be to create a specific
parser_postgresql.rb plugin.  Seems much easier to write two dozen lines
of Ruby than change all of PostgreSQL's logging infrastructure.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: Proposal: Adding json logging

From
Michael Paquier
Date:
On Wed, Apr 18, 2018 at 02:59:26PM -0400, Robert Haas wrote:
> ====
> Note that logging_collector should be enabled in postgresql.conf to
> ensure consistent log outputs.  As JSON strings are longer than normal
> logs generated by PostgreSQL, this module increases the odds of malformed
> log entries.
> ====
>
> I'm not sure I understand the issue, but I don't like malformed log entries.

If logging_collector is disabled, then all the log entries just go to
stderr (that's mentioned in the docs).  While that may be fine for low
volumes of logs, with many concurrent processes generating logs one
process can overwrite another process's log entry.  The logs generated
in the JSON format are longer, mainly because of the repetitive use of
the key names in each blob, which in turn increases the volume
generated.

So in this case this can cause JSON blobs to look broken.  I had a
report on github not long ago about that:
https://github.com/michaelpq/pg_plugins/issues/17
--
Michael

Attachments

Re: Proposal: Adding json logging

From
Michael Paquier
Date:
On Wed, Apr 18, 2018 at 12:10:47PM -0700, Christophe Pettus wrote:
> On Apr 18, 2018, at 11:59, Robert Haas <robertmhaas@gmail.com> wrote:
>> For the record, I'm tentatively in favor of including something like
>> this in contrib.
>
> I'm much less fussed by this in contrib/ (with the same concern you
> noted), at minimum as an example of how to do logging in other
> formats.

Using a contrib module for the logging format also has a side effect.
When the logging collector is disabled, all the log entries which are
created by the postmaster contain junk data, as it is sort of
impossible to make the loaded module know that the logging collector
is enabled in the configuration but that the log entries cannot use
the pipe protocol yet.  In short, you finish with a couple of entries
which are formatted for the pipe protocol used by the syslogger but
are redirected to stderr.  There are only a couple of entries which
fall into this category, like a misconfiguration of the server, or the
ports the server is listening on (look for redirection_done in
elog.c).  One simple fix would be to pass down the value of
redirection_done to emit_log_hook, and this requires patching the
server.
--
Michael

Вложения