Thread: Markdown format output for psql, design notes

Markdown format output for psql, design notes

From
Lætitia Avrot
Date:
Hi all,

# What I'd like to do
I've been working on the idea of a markdown format for psql, as I said in this thread: https://www.postgresql.org/message-id/flat/CAB_COdiiwTmBcrmjXWCKiqkcPgf_bLodrUyb4GYE6pfKeoK2eg%40mail.gmail.com

An attempt was made a year ago (see here: https://www.postgresql.org/message-id/flat/CAAYBy8bs%3D8vz6Ps_nLW24NJhqcxz4bsWBLawAiwWSPSLdWSmvA%40mail.gmail.com#eb7b6eb6daa60aac1f5fa001f934f89a), but it didn't end up with anything committable.

What's more, I disagree with using a `\pset linestyle markdown` option to get markdown output in psql; I prefer `\pset format markdown`.

# Some official doc about markdown
So here are my thoughts (before writing any code):
  1. "Official" markdown seems to be the Daring Fireball project (see Aaron Swartz's note here: http://www.aaronsw.com/weblog/001189)
  2. Official markdown doesn't support table formatting. Its authors said we could just use HTML inside markdown to do so (that's not very readable for a human, which is why I don't like this option) -> see here: https://daringfireball.net/projects/markdown/syntax#html
  3. Table markdown was introduced in "Markdown Extra", which was first implemented in PHP (see here: https://michelf.ca/projects/php-markdown/extra/#table)
  4. I want to make the patch as simple as possible, so I won't implement cell alignment
# The result I want
From points 3 and 4, here is what I'd like to see :

| Header 1 | Header 2 | Header 3 |
|----------|----------|----------|
| content | content | content |
| content | content | content |
(2 rows)

**The '|' at the beginning and end of each line is optional in Markdown Extra, but there seems to be a consensus to always add them. You may challenge this choice; I'm open to discussion.**

According to the Daring Fireball project (https://daringfireball.net/projects/markdown/syntax#backslash) and Markdown Extra (https://michelf.ca/projects/php-markdown/extra/#backslash), it seems we need to backslash-escape all of these characters:

~~~
\   backslash
`   backtick
*   asterisk
_   underscore
{}  curly braces
[]  square brackets
()  parentheses
#   hash mark
+   plus sign
-   minus sign (hyphen)
.   dot
!   exclamation mark
:   colon
|   pipe
~~~
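
A minimal sketch of this escaping rule in Python, assuming the character list above is complete. The names `escape_markdown` and `format_row` are hypothetical and only serve as an illustration; this is not psql code.

```python
# Characters from the list above that Markdown / Markdown Extra
# allow to be backslash-escaped.
SPECIAL_CHARS = set("\\`*_{}[]()#+-.!:|")


def escape_markdown(text: str) -> str:
    """Put a backslash in front of every special character."""
    return "".join("\\" + ch if ch in SPECIAL_CHARS else ch for ch in text)


def format_row(cells) -> str:
    """Render one table row with leading and trailing pipes."""
    return "| " + " | ".join(escape_markdown(str(c)) for c in cells) + " |"


print(format_row(["a|b", "snake_case", "1.5"]))
# prints: | a\|b | snake\_case | 1\.5 |
```

Escaping every listed character is the simplest rule, but it also touches plain numbers and dates (`1.5` becomes `1\.5`), so the raw output may be noisier than strictly needed.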

# psql syntax to get that
It feels to me that we should use the `\pset format` syntax (or `-P` or `--pset=` in batch mode) to tell psql we want markdown. So any of these should produce markdown output:
  • `psql -P format=markdown`
  • `psql --pset=format=markdown`
  • `\pset format markdown` (at the psql prompt)
# Code to change
To write that patch, here are the files I think I'll need to change:
  • Documentation
    • doc/src/sgml/ref/psql-ref.sgml
    • src/bin/psql/help.c
  • Tests
    • src/test/regress/expected/psql.out
    • src/test/regress/sql/psql.sql
  • Code
    • src/bin/psql/command.c
    • src/bin/psql/tab-complete.c
**You're welcome to add any other file that I missed in that list!**

# What I'd like you to do
First, thanks for reading this whole mail, and sorry, I didn't mean to make it so long...
Then I'd like to know **what you think about what I'm about to do** before heading in the wrong direction.

Have a nice day,

Lætitia
--
Think! Do you really need to print this email ?
There is no Planet B.

Re: Markdown format output for psql, design notes

From
Pavel Stehule
Date:


On Wed, 28 Nov 2018 at 9:59, Lætitia Avrot <laetitia.avrot@gmail.com> wrote:
> What's more, I disagree with using a `\pset linestyle markdown` option to get markdown output in psql; I prefer `\pset format markdown`.

sure +1


> # The result I want
> From points 3 and 4, here is what I'd like to see :
>
> | Header 1 | Header 2 | Header 3 |
> |----------|----------|----------|
> | content | content | content |
> | content | content | content |
> (2 rows)

+1



Re: Markdown format output for psql, design notes

From
"Daniel Verite"
Date:
    Lætitia Avrot wrote:

> # The result I want
> From points 3 and 4, here is what I'd like to see :
>
> | Header 1 | Header 2 | Header 3 |
> |----------|----------|----------|
> | content | content | content |
> | content | content | content |
> (2 rows)

What would it look like when a field or a header is made of multiple
lines?

Best regards,
--
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite


Re: Markdown format output for psql, design notes

From
Lætitia Avrot
Date:

On Wed, 28 Nov 2018 at 16:25, Daniel Verite <daniel@manitou-mail.org> wrote:
> Lætitia Avrot wrote:
>
>> # The result I want
>> From points 3 and 4, here is what I'd like to see :
>>
>> | Header 1 | Header 2 | Header 3 |
>> |----------|----------|----------|
>> | content | content | content |
>> | content | content | content |
>> (2 rows)
>
> What would it look like when a field or a header is made of multiple
> lines?

I suppose you mean in the standard output when the screen is too short to print the whole line?
Because if the output is redirected to a file (with `\o myfile` for example), the line ends naturally when the row ends.

That's a good question. Markdown Extra doesn't provide any solution in that case. Each newline means a new row.

I'd say that in that case the markdown syntax will be broken, and the user has to redirect the output to a file to get valid markdown syntax. That case could be explained in the documentation.

So if we try with an example, we'd have something like this:

| Header | Header | Header | Header | Header | Header |  Header | Header | Header | Header | Reallyreallyreallyreally
TooLongHeader | Header |
| content | content | content | content | content | content | content | content | content | content | Reallyreallyreallyreally
TooLongContent |content |
| content | content | content | content | content | content | content | content | content | content | Reallyreallyreallyreally
TooLongContent |content |
(2 rows)

I couldn't find a way to make it right. If you have a better idea, please share it :-)

Cheers,

Lætitia

Re: Markdown format output for psql, design notes

From
Vik Fearing
Date:
On 28/11/2018 09:59, Lætitia Avrot wrote:
> First, thanks for reading this whole mail, and sorry, I didn't mean to
> make it so long...
> Then I'd like to know ***what you think about what I'm about to do***
> before heading in the wrong direction.

I'm a little bit reluctant for us to write and maintain more and more
format styles, especially one as subjective and varied as markdown.  I
imagine we will constantly be bombarded with "this isn't quite right" or
"this isn't compatible with github".

What I personally use is the excellent pandoc tool (https://pandoc.org/)
which can convert formats we already output into a multitude of other
formats.

psql -qHc "values (E'hello\nworld', 42), ('single line', 5), ('another',
null)" | pandoc -f html -t markdown

  -----------------------
  column1         column2
  ------------- ---------
  hello\               42
  world

  single line           5

  another                
  -----------------------

(3 rows)\

This handles both column alignment and the multiline issue Daniel raised.
-- 
Vik Fearing                                          +33 6 46 75 15 36
http://2ndQuadrant.fr     PostgreSQL : Expertise, Formation et Support


Re: Markdown format output for psql, design notes

From
Lætitia Avrot
Date:
> I'm a little bit reluctant for us to write and maintain more and more
> format styles, especially one as subjective and varied as markdown.  I
> imagine we will constantly be bombarded with "this isn't quite right" or
> "this isn't compatible with github".


I understand your concern. It's a pretty good point.
 
> What I personally use is the excellent pandoc tool (https://pandoc.org/)
> which can convert formats we already output into a multitude of other
> formats.
>
> psql -qHc "values (E'hello\nworld', 42), ('single line', 5), ('another',
> null)" | pandoc -f html -t markdown
>
>   -----------------------
>   column1         column2
>   ------------- ---------
>   hello\               42
>   world
>
>   single line           5
>
>   another                
>   -----------------------
>
> (3 rows)\
>
> This handles both column alignment and the multiline issue Daniel raised.

Well, pandoc doesn't handle line breaks for Markdown Extra. 

~~~
psql -qHc "values (E'hello world', 42), ('single line', 5), ('another',
null)" log | pandoc -f html -t markdown_phpextra
| column1     | column2 |
|-------------|---------|
| hello world | 42      |
| single line | 5       |
| another     |         |
~~~
But with a `\n` in a cell, the output is simply HTML, left untransformed (even with the `--wrap=none` option)...

# What stays in my mind

* It's pretty difficult to handle line breaks
* Markdown is not standardised and several flavours exist for table implementation (so why favor one over the others?)

# The question I'd like to ask you
So now, I think we need to ask that fundamental question :

**Is it worth it ?**

Cheers,

Lætitia

Re: Markdown format output for psql, design notes

From
"Daniel Verite"
Date:
    Lætitia Avrot wrote:

> I suppose you mean in the standard output when the screen is too short to
> print the whole line?
> Because if the output is redirected to a file (with `\o myfile` for
> example), the line ends naturally when the row ends.

No I meant independently of the screen, if there's an LF character
in a cell. Or a '|' character, since that's the same problem: an
element of structure happening to be in the contents.
The specs mentioned upthread don't seem to give any indication
about that being supported.

Say we have:
 SELECT E'foo\nbar' as "Header1", 'foo|bar' as "Header2"

If the markdown output was produced for the sole purpose of being
converted to HTML in the end, which is often the case, it would work
to use HTML entities in the output, for instance:

Header1|Header2
---|---
foo<br>bar|foo&#x7C;bar

This piece seems to be properly processed and rendered by markdown
processors I can try (pandoc, grip, github).

But then we'd also need to convert < > and & in the original contents
to the equivalent HTML entities, and that would really be
markdown-for-html instead of just markdown, I guess.


Best regards,
--
Daniel Vérité


Re: Markdown format output for psql, design notes

From
Vik Fearing
Date:
On 29/11/2018 08:26, Lætitia Avrot wrote:

> # What stays in my mind
> 
> * It's pretty difficult to handle line breaks
> * Markdown is not standardised and several flavours exist for table
> implementation (so why favor one over the others?)
> 
> # The question I'd like to ask you
> So now, I think we need to ask that fundamental question :
> 
> ***Is it worth it ?***

And my answer to that is:

No.  Markdown isn't standardized enough to support and please everyone.
-- 
Vik Fearing


Re: Markdown format output for psql, design notes

From
Lætitia Avrot
Date:
Hi,

> No I meant independently of the screen, if there's an LF character
> in a cell. Or a '|' character, since that's the same problem: an
> element of structure happening to be in the contents.
> The specs mentioned upthread don't seem to give any indication
> about that being supported.

I've given a list of characters that need escaping, as stated in the Markdown Extra doc, and `|` is certainly one of them.

For the LF character, I'm totally OK with the fact that it will break the markdown output, and my answer to that is "KISS". I don't want to handle that case. Markdown Extra obviously decided that there is no such thing as a multiline row.
 
> Say we have:
>  SELECT E'foo\nbar' as "Header1", 'foo|bar' as "Header2"
>
> If the markdown output was produced for the sole purpose of being
> converted to HTML in the end, which is often the case, it would work
> to use HTML entities in the output

I don't use Markdown to create HTML output. I use it to generate PDFs for my customers.

But as Vik said earlier, maybe it's not worth it to provide a markdown output as pandoc can generate the markdown from the HTML output.
And if you need the markdown output to generate HTML why don't you use the HTML output ?

Cheers,

Lætitia

Re: Markdown format output for psql, design notes

From
"Daniel Verite"
Date:
    Lætitia Avrot wrote:

> But as Vik said earlier, maybe it's not worth it to provide a markdown
> output as pandoc can generate the markdown from the HTML output.
> And if you need the markdown output to generate HTML why don't you use the
> HTML output ?

The round-trip through pandoc does not do any miracle.
The end result is readable to the human eye but structurally
broken. If converted back to html, it's no longer a table.

Anyway I tend to agree with Vik on this:
 "Markdown isn't standardized enough to support and please everyone."

BTW github has independently started to support '|' in the cells
by accepting the quoted version '\|' :
https://help.github.com/articles/organizing-information-with-tables/

Now that we have csv as an output format, we can suggest
custom csv-to-markdown converters to produce markdown
rather than implementing one particular flavor of markdown
in psql, or several flavors through flags. The popular script
languages have solid CSV parsers that make this relatively easy
and safe.

Personally I'd use Perl with something like below, which looks
short/simple enough to be shared on wiki.postgresql.org,
along with versions in other languages.

#!/usr/bin/perl

# Usage
# inside psql:
#  \pset format csv
#  \o |csvtomarkdown >/tmp/output.md
#  SQL commands...
#  \o

# or psql --csv -c "...query..." | csvtomarkdown

use Text::CSV;
use open qw( :std :encoding(UTF-8) );

my $csv = Text::CSV->new({ binary => 1, eol => $/ });

sub do_format {
# customize to your needs
  s/&/&amp;/g;
  s/</&lt;/g;
  s/>/&gt;/g;
  s/\n/<br>/g;
  s/\|/&#x7C;/g;
  return $_;
}

my $header = $csv->getline(STDIN);
for (@{$header}) {
  $_ = do_format($_);
}
print join ('|', @{$header}), "\n";
print join ('|', map { "---" } @{$header}), "\n";

while (my $row = $csv->getline(STDIN)) {
  my @contents = map { do_format($_) } @{$row};
  print join('|', @contents), "\n";
}
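
For comparison, here is a rough Python counterpart of the Perl script above, as one possible "version in another language". It is only a sketch using the standard library; the names `format_cell` and `csv_to_markdown` are hypothetical, and the escaping choices simply mirror `do_format`.

```python
import csv
import html
import io


def format_cell(value: str) -> str:
    """Customize to your needs, like do_format in the Perl script."""
    value = html.escape(value, quote=False)  # & < > become entities
    value = value.replace("|", "&#x7C;")     # keep pipes out of the structure
    return value.replace("\n", "<br>")       # one table row per output line


def csv_to_markdown(stream) -> list:
    """Read CSV from a text stream and return markdown table lines."""
    reader = csv.reader(stream)
    try:
        header = next(reader)
    except StopIteration:
        return []  # empty input, nothing to print
    lines = [
        "|".join(format_cell(h) for h in header),
        "|".join("---" for _ in header),
    ]
    for row in reader:
        lines.append("|".join(format_cell(c) for c in row))
    return lines


# CSV as psql would emit it for the SELECT discussed upthread.
sample = io.StringIO('Header1,Header2\n"foo\nbar",foo|bar\n')
print("\n".join(csv_to_markdown(sample)))
```

To use it as a filter in the same way as `csvtomarkdown`, pass `sys.stdin` instead of the `io.StringIO` sample.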


Best regards,
--
Daniel Vérité


Re: Markdown format output for psql, design notes

From
Pavel Stehule
Date:


On Sat, 1 Dec 2018 at 22:11, Daniel Verite <daniel@manitou-mail.org> wrote:
> Now that we have csv as an output format, we can suggest
> custom csv-to-markdown converters to produce markdown
> rather than implementing one particular flavor of markdown
> in psql, or several flavors through flags. The popular script
> languages have solid CSV parsers that make this relatively easy
> and safe.

I agree with you about the importance of CSV. On the other hand, I don't see a reason why we should not support some very popular markdown formats, although there can be a discussion about which ones.

Maybe GitHub, JIRA, and Confluence.

Regards

Pavel



Re: Markdown format output for psql, design notes

From
Lætitia Avrot
Date:
To sum up, we're at two votes against writing such a patch and one for (with some changes regarding which markdown format to implement).

Does anyone else want to join the vote?

Cheers, 

Lætitia 
