Обсуждение: psql and tab-delimited output

Поиск
Список
Период
Сортировка

psql and tab-delimited output

От
Abelard Hoffman
Дата:
Hi.

Traditionally, to generate a TSV report, I've simply invoked psql with:
--no-align --field-separator '\t' --pset footer=off

That works in most cases, except when your column values contain tabs themselves.

I know that COPY() will escape tabs (as \t), and we can use that from psql with the \copy command, but that does not include a header row of the column names.

So, my question is, what's the simplest way to generate tab-escaped TSV-formatted reports with the first line containing the list of column names?


I also considered handling the escaping myself within the SELECT, and then sticking with the first approach above.

Suggestions?

Thanks.


Re: psql and tab-delimited output

От
Thomas Kellerer
Дата:
Abelard Hoffman wrote on 06.09.2014 09:32:
> Traditionally, to generate a TSV report, I've simply invoked psql
> with: --no-align --field-separator '\t' --pset footer=off
>
> That works in most cases, except when your column values contain tabs
> themselves.
>
> I know that COPY() will escape tabs (as \t), and we can use that from
> psql with the \copy command, but that does not include a header row
> of the column names.
>
> So, my question is, what's the simplest way to generate tab-escaped
> TSV-formatted reports with the first line containing the list of
> column names?

Do you have to do that using psql or can you use a different tool?

The tool I maintain - SQL Workbench/J - has a command to export tables (or queries) to a wide range of export formats.

WbExport can escape of special characters by using \t, \n etc notation for "control" characters

   http://sql-workbench.net/manual/command-export.html

especially

   http://sql-workbench.net/manual/command-export.html#text-escape-switch

This can all be done from a shell/batch script if needed.

The tool needs a Java runtime and the Postgres JDBC driver.

Thomas

Re: psql and tab-delimited output

От
Adrian Klaver
Дата:
On 09/06/2014 12:32 AM, Abelard Hoffman wrote:
> Hi.
>
> Traditionally, to generate a TSV report, I've simply invoked psql with:
> --no-align --field-separator '\t' --pset footer=off
>
> That works in most cases, except when your column values contain tabs
> themselves.
>
> I know that COPY() will escape tabs (as \t), and we can use that from
> psql with the \copy command, but that does not include a header row of
> the column names.
>
> So, my question is, what's the simplest way to generate tab-escaped
> TSV-formatted reports with the first line containing the list of column
> names?
>


create table tsv_test (id int, fld_1 varchar);

insert into tsv_test values (1, 'test    value');
insert into tsv_test values (2, 'test    value');
insert into tsv_test values (3, 'test    value');

\copy tsv_test to 'data.tsv'  with  csv header delimiter '       ';

aklaver@panda:~> cat data.tsv
id      fld_1
1       "test   value"
2       "test   value"
3       "test   value"



>
> I also considered handling the escaping myself within the SELECT, and
> then sticking with the first approach above.
>
> Suggestions?
>
> Thanks.
>
>


--
Adrian Klaver
adrian.klaver@aklaver.com


Re: psql and tab-delimited output

От
Abelard Hoffman
Дата:
On Sat, Sep 6, 2014 at 7:28 AM, Adrian Klaver <adrian.klaver@aklaver.com> wrote:
On 09/06/2014 12:32 AM, Abelard Hoffman wrote:
[snip] 
So, my question is, what's the simplest way to generate tab-escaped
TSV-formatted reports with the first line containing the list of column
names?



create table tsv_test (id int, fld_1 varchar);

insert into tsv_test values (1, 'test    value');
insert into tsv_test values (2, 'test    value');
insert into tsv_test values (3, 'test    value');

\copy tsv_test to 'data.tsv'  with  csv header delimiter '       ';

aklaver@panda:~> cat data.tsv
id      fld_1
1       "test   value"
2       "test   value"
3       "test   value"

Thanks, Adrian. That works, but since we're using quotes to embed the delimiter, we lose the simplicity of TSV. I can't just do a split on /\t/ to get the fields and then unescape the values. At that point it's probably simpler to just switch to standard CSV.

Using your example, the output I'd prefer is:

id    fld_1
1     test\tvalue
2     test\tvalue
3     test\tvalue

I looked at the options for COPY's CSV format, but I don't see a way to disable quoting but still have escaping.

This works, although it's not exactly simple:

DROP TABLE IF EXISTS tsv_test;

CREATE TABLE tsv_test (id int, fld_1 varchar);

INSERT INTO tsv_test VALUES (1, 'test value');
INSERT INTO tsv_test VALUES (2, 'test value');
INSERT INTO tsv_test VALUES (3, 'test value');

SELECT * FROM tsv_test WHERE FALSE; -- to generate header row
COPY tsv_test TO STDOUT;

And then run that through psql with the --no-align --field-separator '\t' --pset footer=off options.
With that, I'd probably generate the report into a temp table, and then run the above to actually export that table as TSV.

@Thomas, yes, I was hoping to stick with just psql, but I'll look at other tools if necessary.

Any other ideas?

Thanks.




Re: psql and tab-delimited output

От
Adrian Klaver
Дата:
On 09/06/2014 10:34 AM, Abelard Hoffman wrote:
> On Sat, Sep 6, 2014 at 7:28 AM, Adrian Klaver <adrian.klaver@aklaver.com
> <mailto:adrian.klaver@aklaver.com>> wrote:
>
>     On 09/06/2014 12:32 AM, Abelard Hoffman wrote:
>
> [snip]
>
>         So, my question is, what's the simplest way to generate tab-escaped
>         TSV-formatted reports with the first line containing the list of
>         column
>         names?
>
>
>
>     create table tsv_test (id int, fld_1 varchar);
>
>     insert into tsv_test values (1, 'test    value');
>     insert into tsv_test values (2, 'test    value');
>     insert into tsv_test values (3, 'test    value');
>
>     \copy tsv_test to 'data.tsv'  with  csv header delimiter '       ';
>
>     aklaver@panda:~> cat data.tsv
>     id      fld_1
>     1       "test   value"
>     2       "test   value"
>     3       "test   value"
>
>
> Thanks, Adrian. That works, but since we're using quotes to embed the
> delimiter, we lose the simplicity of TSV. I can't just do a split on
> /\t/ to get the fields and then unescape the values. At that point it's
> probably simpler to just switch to standard CSV.


>
> Using your example, the output I'd prefer is:
>
> id    fld_1
> 1     test\tvalue
> 2     test\tvalue
> 3     test\tvalue

I guess it depends on what you are using.

In Python:

In [30]: with open('data.tsv', 'rb') as c_file:
     c_reader = csv.reader(c_file, delimiter='\t', quotechar = '"')
     for row in c_reader:
         print row
    ....:
['id', 'fld_1']
['1', 'test\tvalue']
['2', 'test\tvalue']
['3', 'test\tvalue']


In [33]: with open('data.tsv', 'rb') as c_file:
     c_reader = csv.reader(c_file, delimiter='\t', quotechar = '"')
     for row in c_reader:
         print row[1]
    ....:
fld_1
test    value
test    value
test    value

The Postgres docs have a good note on the CSV format:

http://www.postgresql.org/docs/9.3/interactive/sql-copy.html

"Note: Many programs produce strange and occasionally perverse CSV
files, so the file format is more a convention than a standard. Thus you
might encounter some files that cannot be imported using this mechanism,
and COPY might produce files that other programs cannot process."


So it always an adventure:)

>
> I looked at the options for COPY's CSV format, but I don't see a way to
> disable quoting but still have escaping.
>
> This works, although it's not exactly simple:
>
> DROP TABLE IF EXISTS tsv_test;
>
> CREATE TABLE tsv_test (id int, fld_1 varchar);
>
> INSERT INTO tsv_test VALUES (1, 'test value');
> INSERT INTO tsv_test VALUES (2, 'test value');
> INSERT INTO tsv_test VALUES (3, 'test value');
>
> SELECT * FROM tsv_test WHERE FALSE; -- to generate header row
> COPY tsv_test TO STDOUT;
>
> And then run that through psql with the --no-align --field-separator
> '\t' --pset footer=off options.
> With that, I'd probably generate the report into a temp table, and then
> run the above to actually export that table as TSV.
>
> @Thomas, yes, I was hoping to stick with just psql, but I'll look at
> other tools if necessary.
>
> Any other ideas?
>
> Thanks.
>
>
>
>


--
Adrian Klaver
adrian.klaver@aklaver.com


Re: psql and tab-delimited output

От
Abelard Hoffman
Дата:



On Sat, Sep 6, 2014 at 11:43 AM, Adrian Klaver <adrian.klaver@aklaver.com> wrote:
On 09/06/2014 10:34 AM, Abelard Hoffman wrote:
On Sat, Sep 6, 2014 at 7:28 AM, Adrian Klaver <adrian.klaver@aklaver.com
<mailto:adrian.klaver@aklaver.com>> wrote:

    On 09/06/2014 12:32 AM, Abelard Hoffman wrote:

[snip]

        So, my question is, what's the simplest way to generate tab-escaped
        TSV-formatted reports with the first line containing the list of
        column
        names?



    create table tsv_test (id int, fld_1 varchar);

    insert into tsv_test values (1, 'test    value');
    insert into tsv_test values (2, 'test    value');
    insert into tsv_test values (3, 'test    value');

    \copy tsv_test to 'data.tsv'  with  csv header delimiter '       ';

    aklaver@panda:~> cat data.tsv
    id      fld_1
    1       "test   value"
    2       "test   value"
    3       "test   value"


Thanks, Adrian. That works, but since we're using quotes to embed the
delimiter, we lose the simplicity of TSV. I can't just do a split on
/\t/ to get the fields and then unescape the values. At that point it's
probably simpler to just switch to standard CSV.

Using your example, the output I'd prefer is:

id    fld_1
1     test\tvalue
2     test\tvalue
3     test\tvalue

I guess it depends on what you are using.

In Python:
[snip]

Yeah, I can parse CSV easily enough. My boss wants TSV though (I could parse CSV and split out TSV, of course). Even then, having to take the report query (which can be big), strip all the newlines and insert it into a \copy command is kind of a PITA.

I wrote this function, which does what I want. I can call this on each column that might include tabs and then safely generate TSV using just psql:

CREATE OR REPLACE FUNCTION public.tsv_escape(text) RETURNS text AS $$
  my %replace = (
    "\t" => 't',
    "\r" => 'r',
    "\n" => 'n',
    '\\' => '\\',
  );

  $_[0] =~ s/(.)/exists $replace{ $1 } ? "\\$replace{$1}" : $1/gse;
  return $_[0];
$$ LANGUAGE 'plperl';

Anyone see a way to rewrite that as an 'sql' function rather than plperl?

I also took a look at the psql source. It doesn't look like it would be very hard to add some backslash escaping logic as an option. Am I the only one that would use that? For reports, everyone else mostly uses other tools? I'd like to stay away from GUI-tools, if possible.

Thanks.

Re: psql and tab-delimited output

От
Alban Hertroys
Дата:
On 07 Sep 2014, at 10:45, Abelard Hoffman <abelardhoffman@gmail.com> wrote:

> For reports, everyone else mostly uses other tools? I'd like to stay away from GUI-tools, if possible.

For reporting, usually you use the data in the database directly.

A TSV or CSV file is not a report, it’s at best a data source for your report. Going through an intermediate format is
nota particularly effective approach to create reports, but if you have to (for example because you aren’t _allowed_
accessto the database), generally preferred formats seem to be CSV, XML or JSON; as long as it’s a standard format. 
TSV is not a common choice. Are you sure your boss actually cares that it’s TSV and not, for example, CSV?

Alban Hertroys
--
If you can't see the forest for the trees,
cut the trees and you'll find there is no forest.



Re: psql and tab-delimited output

От
David G Johnston
Дата:
Alban Hertroys-4 wrote
> On 07 Sep 2014, at 10:45, Abelard Hoffman <

> abelardhoffman@

> > wrote:
>
>> For reports, everyone else mostly uses other tools? I'd like to stay away
>> from GUI-tools, if possible.
>
> For reporting, usually you use the data in the database directly.
>
> A TSV or CSV file is not a report, it’s at best a data source for your
> report. Going through an intermediate format is not a particularly
> effective approach to create reports, but if you have to (for example
> because you aren’t _allowed_ access to the database), generally preferred
> formats seem to be CSV, XML or JSON; as long as it’s a standard format.
> TSV is not a common choice. Are you sure your boss actually cares that
> it’s TSV and not, for example, CSV?

TSV is generally chosen as an interchange format because data rarely has
tabs...if your does that advantage goes away.

Same question regarding the report aspect.  If the boss wants a human
readable text format report I'd suggest using the format function and output
the the resultant single column as-is to the screen.  Fixed-width columns
are better than any kind of delimited format in that use case.

David J.




--
View this message in context:
http://postgresql.1045698.n5.nabble.com/psql-and-tab-delimited-output-tp5818019p5818097.html
Sent from the PostgreSQL - general mailing list archive at Nabble.com.


Re: psql and tab-delimited output

От
Adrian Klaver
Дата:
On 09/07/2014 01:45 AM, Abelard Hoffman wrote:
>
>
>
> On Sat, Sep 6, 2014 at 11:43 AM, Adrian Klaver
> <adrian.klaver@aklaver.com <mailto:adrian.klaver@aklaver.com>> wrote:
>
>     On 09/06/2014 10:34 AM, Abelard Hoffman wrote:
>
>         On Sat, Sep 6, 2014 at 7:28 AM, Adrian Klaver
>         <adrian.klaver@aklaver.com <mailto:adrian.klaver@aklaver.com>
>         <mailto:adrian.klaver@aklaver.__com
>         <mailto:adrian.klaver@aklaver.com>>> wrote:
>
>              On 09/06/2014 12:32 AM, Abelard Hoffman wrote:
>
>         [snip]
>
>                  So, my question is, what's the simplest way to generate
>         tab-escaped
>                  TSV-formatted reports with the first line containing
>         the list of
>                  column
>                  names?
>
>
>
>              create table tsv_test (id int, fld_1 varchar);
>
>              insert into tsv_test values (1, 'test    value');
>              insert into tsv_test values (2, 'test    value');
>              insert into tsv_test values (3, 'test    value');
>
>              \copy tsv_test to 'data.tsv'  with  csv header delimiter '
>               ';
>
>              aklaver@panda:~> cat data.tsv
>              id      fld_1
>              1       "test   value"
>              2       "test   value"
>              3       "test   value"
>
>
>         Thanks, Adrian. That works, but since we're using quotes to
>         embed the
>         delimiter, we lose the simplicity of TSV. I can't just do a split on
>         /\t/ to get the fields and then unescape the values. At that
>         point it's
>         probably simpler to just switch to standard CSV.
>
>
>         Using your example, the output I'd prefer is:
>
>         id    fld_1
>         1     test\tvalue
>         2     test\tvalue
>         3     test\tvalue
>
>
>     I guess it depends on what you are using.
>
>     In Python:
>
> [snip]
>
> Yeah, I can parse CSV easily enough. My boss wants TSV though (I could
> parse CSV and split out TSV, of course). Even then, having to take the
> report query (which can be big), strip all the newlines and insert it
> into a \copy command is kind of a PITA.

I see CSV as a generic term that covers all separated value formats, so
TSV is just a variation. I am not sure exactly what the end point of all
this, so I am probably not going to be able to offer much more. One
heads up, in 9.3+ you have the option in \copy(COPY) to use an external
program to copy TO or FROM

>
> I also took a look at the psql source. It doesn't look like it would be
> very hard to add some backslash escaping logic as an option. Am I the
> only one that would use that? For reports, everyone else mostly uses
> other tools? I'd like to stay away from GUI-tools, if possible.

Not sure what you are asking for here. Something like this?:

http://www.postgresql.org/docs/9.3/static/sql-syntax-lexical.html

4.1.2.2. String Constants with C-style Escapes

test=> select E'a\tb';
  ?column?
-----------
  a       b
(1 row)


>
> Thanks.


--
Adrian Klaver
adrian.klaver@aklaver.com


Re: psql and tab-delimited output

От
Jeff Janes
Дата:
On Sat, Sep 6, 2014 at 12:32 AM, Abelard Hoffman <abelardhoffman@gmail.com> wrote:
Hi.

Traditionally, to generate a TSV report, I've simply invoked psql with:
--no-align --field-separator '\t' --pset footer=off

That works in most cases, except when your column values contain tabs themselves.

I know that COPY() will escape tabs (as \t), and we can use that from psql with the \copy command, but that does not include a header row of the column names.

Which is a shame.  \copy really should allow HEADER in the default format, not just CSV format.

And it on the to-do list, just hasn't be done yet:


It seems like it should be fairly easy to do.
 

So, my question is, what's the simplest way to generate tab-escaped TSV-formatted reports with the first line containing the list of column names?

I just assume that none of the column names need escaping, and so select and join them on tabs.  At one point I had a perl script that would do this for me, e.g. given a query, it would execute it once with a 'and 1=0' at the end (obviously can't be done legally/efficiently/safely with all queries) to get the column names, then again in a \COPY to get the data, but I seem to have misplaced it.    

It worked well as long as you understood it was a dirty hack and so had the limitations of one.


Cheers,

Jeff

Re: psql and tab-delimited output

От
Melvin Davidson
Дата:
>the output I'd prefer is:
> id fld_1
> 1 test\tvalue
> 2 test\tvalue
> 3 >test\tvalue


Does this work for you?

copy (SELECT id, replace(fld_1, '    ', '\t') FROM tsv_test) to stdout  with csv header delimiter '    ';

query returned copy data:
id    replace
1    test\tvalue
2    test\tvalue
3    test\tvalue

--
Melvin Davidson
I reserve the right to fantasize.  Whether or not you
wish to share my fantasy is entirely up to you.

Re: psql and tab-delimited output

От
Abelard Hoffman
Дата:


On Sun, Sep 7, 2014 at 12:28 PM, Jeff Janes <jeff.janes@gmail.com> wrote:
On Sat, Sep 6, 2014 at 12:32 AM, Abelard Hoffman <abelardhoffman@gmail.com> wrote:
[snip] 
I know that COPY() will escape tabs (as \t), and we can use that from psql with the \copy command, but that does not include a header row of the column names.

Which is a shame.  \copy really should allow HEADER in the default format, not just CSV format.

And it on the to-do list, just hasn't be done yet:


It seems like it should be fairly easy to do.

Ah, excellent. That would solve most of my issue.
 
So, my question is, what's the simplest way to generate tab-escaped TSV-formatted reports with the first line containing the list of column names?

I just assume that none of the column names need escaping, and so select and join them on tabs.  At one point I had a perl script that would do this for me, e.g. given a query, it would execute it once with a 'and 1=0' at the end (obviously can't be done legally/efficiently/safely with all queries) to get the column names, then again in a \COPY to get the data, but I seem to have misplaced it.    

It worked well as long as you understood it was a dirty hack and so had the limitations of one.

Yes, that's exactly what I need, although I'm loathe to run the queries twice just to get the headers.

Thanks to everyone for their comments and suggestions.

As additional background, I have a dozen or so "reports" that exist as plain text files, and they're just run by cron with psql and mailed to those who are interested in them. Historically, all of them have been TSV. It's worked fine since all of the data has been computed (e.g., sales counts, etc.). But with a recent report, we're including a customer feedback comment, which obviously can have tabs and newlines, etc.

On Sun, Sep 7, 2014 at 9:25 AM, Adrian Klaver <adrian.klaver@aklaver.com> wrote:
I also took a look at the psql source. It doesn't look like it would be
very hard to add some backslash escaping logic as an option. Am I the
only one that would use that? For reports, everyone else mostly uses
other tools? I'd like to stay away from GUI-tools, if possible.

Not sure what you are asking for here. Something like this?:
http://www.postgresql.org/docs/9.3/static/sql-syntax-lexical.html

I was meaning, it seems like it would be useful to have psql be able to behave like COPY but without having to call \copy.
So IOW, something like:

  psql  --no-align --field-separator '\t' --field-escape 

Where the --field-escape option would cause the data to be escaped exactly like COPY does. Having the HEADER option to COPY's default format will largely solve this though. Then it's just a matter of slurping in the report files, making sure all newlines are removed, and passing them to the appropriate \copy command.

Re: psql and tab-delimited output

От
Abelard Hoffman
Дата:
Hi Alban.

On Sun, Sep 7, 2014 at 4:18 AM, Alban Hertroys <haramrae@gmail.com> wrote:
On 07 Sep 2014, at 10:45, Abelard Hoffman <abelardhoffman@gmail.com> wrote:

> For reports, everyone else mostly uses other tools? I'd like to stay away from GUI-tools, if possible.

For reporting, usually you use the data in the database directly.

A TSV or CSV file is not a report, it’s at best a data source for your report. Going through an intermediate format is not a particularly effective approach to create reports, but if you have to (for example because you aren’t _allowed_ access to the database), generally preferred formats seem to be CSV, XML or JSON; as long as it’s a standard format.
TSV is not a common choice. Are you sure your boss actually cares that it’s TSV and not, for example, CSV?

Could you expand on that a bit? What sort of tools does management use to generate reports from the database directly?
You're meaning a database warehouse? We just have an OLTP db, so we've always generated reports periodically through cron jobs.

Or maybe "reports" is the wrong word. We generate a bunch of db stats which can then be used however they want (pulled into Excel, etc.). 
But would definitely be interested in learning about other approaches.

And yes, I'm sure we could convert everything over to CSV. Just an issue of inertia.

Thanks.

Re: psql and tab-delimited output

От
Abelard Hoffman
Дата:
On Sun, Sep 7, 2014 at 5:17 PM, Melvin Davidson <melvin6925@gmail.com> wrote:
>the output I'd prefer is:
> id fld_1
> 1 test\tvalue
> 2 test\tvalue
> 3 >test\tvalue


Does this work for you?

copy (SELECT id, replace(fld_1, '    ', '\t') FROM tsv_test) to stdout  with csv header delimiter '    ';

Thank you, Melvin. Yes, I think that's what I'll do for the time-being, until COPY's TSV format supports HEADER.


Re: psql and tab-delimited output

От
Adrian Klaver
Дата:
On 09/08/2014 11:45 AM, Abelard Hoffman wrote:
> Hi Alban.
>
> On Sun, Sep 7, 2014 at 4:18 AM, Alban Hertroys <haramrae@gmail.com
> <mailto:haramrae@gmail.com>> wrote:
>
>     On 07 Sep 2014, at 10:45, Abelard Hoffman <abelardhoffman@gmail.com
>     <mailto:abelardhoffman@gmail.com>> wrote:
>
>     > For reports, everyone else mostly uses other tools? I'd like to stay away from GUI-tools, if possible.
>
>     For reporting, usually you use the data in the database directly.
>
>     A TSV or CSV file is not a report, it’s at best a data source for
>     your report. Going through an intermediate format is not a
>     particularly effective approach to create reports, but if you have
>     to (for example because you aren’t _allowed_ access to the
>     database), generally preferred formats seem to be CSV, XML or JSON;
>     as long as it’s a standard format.
>     TSV is not a common choice. Are you sure your boss actually cares
>     that it’s TSV and not, for example, CSV?
>
>
> Could you expand on that a bit? What sort of tools does management use
> to generate reports from the database directly?
> You're meaning a database warehouse? We just have an OLTP db, so we've
> always generated reports periodically through cron jobs.
>
> Or maybe "reports" is the wrong word. We generate a bunch of db stats
> which can then be used however they want (pulled into Excel, etc.).
> But would definitely be interested in learning about other approaches.

Ways I have done it:

1) In Excel use the data tools to run the query and return data directly
to the spreadsheet. This assumes a version of Excel that has the data tools.

2) I use Python, so use psycopg2 to pull data from the database and then
xlwt to write out a spreadsheet or reportlab to create a pdf or the csv
module to create whatever flavor of CSV.

>
> And yes, I'm sure we could convert everything over to CSV. Just an issue
> of inertia.
>
> Thanks.
>


--
Adrian Klaver
adrian.klaver@aklaver.com