Обсуждение: Proposal: new pg_dump options --copy-delimiter and --copy-null

Поиск
Список
Период
Сортировка

Proposal: new pg_dump options --copy-delimiter and --copy-null

От
David Fetter
Дата:
Folks,

This came up at work...

I have seed database scripts quasi-generated from pg_dump which
include COPY statements, but the data is hard to edit (especially cut
& paste operations) when the COPY delimiter is some non-visible
character like \t.  So I thought it would be handy to be able to
control the DELIMITER and NULL options in COPY statements that pg_dump
uses.

Although it would be nice to make CSV and its dependencies one of the
options, I'm not sure how pg_dump would handle the end-of-line
problem, so I've skipped that part in the patch I've put together.
The other option, FORCE QUOTE, doesn't make sense to me as a pg_dump
option, but I'm not the arbiter of these things.

With the patch, pg_dump would work exactly as usual without options,
but it now has two extra options: --copy-delimiter and --copy-null.

If set, these will be incorporated in COPY commands as appropriate.
--copy-delimiter accepts any single byte other than '\r' or '\n'.
--copy-null accepts any input.  The patched pg_dump ignores --copy-*
options in cases where COPY wouldn't happen anyway.

What do you folks think?

Cheers,
D
-- 
David Fetter david@fetter.org http://fetter.org/
phone: +1 415 235 3778

Remember to vote!


Re: Proposal: new pg_dump options --copy-delimiter and --copy-null

От
Tom Lane
Дата:
David Fetter <david@fetter.org> writes:
> I have seed database scripts quasi-generated from pg_dump which
> include COPY statements, but the data is hard to edit (especially cut
> & paste operations) when the COPY delimiter is some non-visible
> character like \t.

This seems like an awfully weak use-case for adding to pg_dump's already
overly complicated feature set.  The difficulty of parsing COPY output
is not simplified by making the delimiter variable --- more likely the
reverse.  Furthermore, it's quite unclear why you'd use pg_dump at all
to generate a data file that you intend to feed to some other program.
Seems to me that "psql -c 'COPY ...'" is a more likely front-end for
such a process.
        regards, tom lane


Re: Proposal: new pg_dump options --copy-delimiter and --copy-null

От
David Fetter
Дата:
On Thu, Jan 26, 2006 at 10:17:05PM -0500, Tom Lane wrote:
> David Fetter <david@fetter.org> writes:
> > I have seed database scripts quasi-generated from pg_dump which
> > include COPY statements, but the data is hard to edit (especially
> > cut & paste operations) when the COPY delimiter is some
> > non-visible character like \t.
>
> This seems like an awfully weak use-case for adding to pg_dump's
> already overly complicated feature set.

Those who don't use it will never see it.

> The difficulty of parsing COPY output is not simplified by making
> the delimiter variable --- more likely the reverse.

It's fairly straight-forward.

> Furthermore, it's quite unclear why you'd use pg_dump at all to
> generate a data file that you intend to feed to some other program.

In my case, it's about being copy/paste friendly.

> Seems to me that "psql -c 'COPY ...'" is a more likely front-end for
> such a process.

Actually, it's not.  I'm attaching my preliminary patch, as I see I
haven't explained it well enough.

Cheers,
D
--
David Fetter david@fetter.org http://fetter.org/
phone: +1 415 235 3778

Remember to vote!

Вложения

Re: Proposal: new pg_dump options --copy-delimiter and --copy-null

От
Tom Lane
Дата:
David Fetter <david@fetter.org> writes:
> On Thu, Jan 26, 2006 at 10:17:05PM -0500, Tom Lane wrote:
>> Seems to me that "psql -c 'COPY ...'" is a more likely front-end for
>> such a process.

> Actually, it's not.  I'm attaching my preliminary patch, as I see I
> haven't explained it well enough.

The patch conveys nothing you didn't explain already, and I still don't
see why one would use pg_dump instead of psql.  If you use pg_dump then
you have to cope with a pile of random SQL and comments as well as the
actual data.
        regards, tom lane


Re: Proposal: new pg_dump options --copy-delimiter and

От
"Joshua D. Drake"
Дата:
> Those who don't use it will never see it.
>   
It does however add more maintenance to the code.


>> Furthermore, it's quite unclear why you'd use pg_dump at all to
>> generate a data file that you intend to feed to some other program.
>>     
>
> In my case, it's about being copy/paste friendly.
>   
David I don't get this... what are you copying from/to that would
wouldn't just script? If you throw into a script you can change
the delimiter on the fly using translation.

> Actually, it's not.  I'm attaching my preliminary patch, as I see I
> haven't explained it well enough.
>   
Perhaps a test case that shows the productivity of it? I am not
arguing whether or not this is a useful feature but I don't
see the purpose.

Sincerely,

Joshua D. Drake


> Cheers,
> D
>   

-- 
The PostgreSQL Company - Command Prompt, Inc. 1.503.667.4564
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Managed Services, Shared and Dedicated Hosting
Co-Authors: PLphp, PLperl - http://www.commandprompt.com/



Re: Proposal: new pg_dump options --copy-delimiter and

От
Greg Stark
Дата:
"Joshua D. Drake" <jd@commandprompt.com> writes:

> David I don't get this... what are you copying from/to that would
> wouldn't just script? If you throw into a script you can change
> the delimiter on the fly using translation.

I think what he's getting at is for things like, say, a contrib package with a
README that includes an example. He wants to be able to say "just paste these
commands into psql".

The problems are a) there's no guarantee the data is safe to put through your
hypothetical tab-destroying copy/paste anyways. There could be tabs or other
unsafe characters in the data. b) We have no way of knowing which characters
are or aren't safe in your hypothetical copy/paste system. Why is tab unsafe
in the first place?

Personally I find anything that would encourage people to use anything other
than tabs evil anyways. All those people who think | is somehow a reasonable
choice or want to use commas and then get all confused trying to escape them
and invent ever more confused syntaxes for escaping the escape characters.
Just use tab separated data like man was meant to.

-- 
greg



Re: Proposal: new pg_dump options --copy-delimiter and

От
Andrew Dunstan
Дата:
On Fri, 2006-01-27 at 13:12 -0500, Greg Stark wrote:
> 
> Personally I find anything that would encourage people to use anything other
> than tabs evil anyways. All those people who think | is somehow a reasonable
> choice or want to use commas and then get all confused trying to escape them
> and invent ever more confused syntaxes for escaping the escape characters.
> Just use tab separated data like man was meant to.
> 

I could not disagree more. The invisibility of tabs makes their use as a
delimiter wholly evil. I have lost count of the number of corrupted
makefiles etc. I have seen because a tab got converted to a space and it
was impossible to tell.

More tears have been shed over tabs used than tabs unused. (Apologies to
St Theresa).

We now return you to normal -hacking.

cheers

andrew



Re: Proposal: new pg_dump options --copy-delimiter and

От
Tom Lane
Дата:
Andrew Dunstan <andrew@dunslane.net> writes:
> I could not disagree more. The invisibility of tabs makes their use as a
> delimiter wholly evil. I have lost count of the number of corrupted
> makefiles etc. I have seen because a tab got converted to a space and it
> was impossible to tell.

> More tears have been shed over tabs used than tabs unused. (Apologies to
> St Theresa).

That line of argument leads to the suggestion that pg_dump should just
use something else (I'd vote for "|"), all the time, in order to produce
more robust dump files.  I still don't see the argument for exposing
a switch though.
        regards, tom lane


Re: Proposal: new pg_dump options --copy-delimiter and

От
Andrew Dunstan
Дата:
On Fri, 2006-01-27 at 13:43 -0500, Tom Lane wrote:
> Andrew Dunstan <andrew@dunslane.net> writes:
> > I could not disagree more. The invisibility of tabs makes their use as a
> > delimiter wholly evil. I have lost count of the number of corrupted
> > makefiles etc. I have seen because a tab got converted to a space and it
> > was impossible to tell.
> 
> > More tears have been shed over tabs used than tabs unused. (Apologies to
> > St Theresa).
> 
> That line of argument leads to the suggestion that pg_dump should just
> use something else (I'd vote for "|"), all the time, in order to produce
> more robust dump files.  I still don't see the argument for exposing
> a switch though.
> 


If we regard them as suitable for human editing for normal use, yes.

cheers

andrew



Re: Proposal: new pg_dump options --copy-delimiter and

От
Tom Lane
Дата:
Andrew Dunstan <andrew@dunslane.net> writes:
> On Fri, 2006-01-27 at 13:43 -0500, Tom Lane wrote:
>> That line of argument leads to the suggestion that pg_dump should just
>> use something else (I'd vote for "|"), all the time, in order to produce
>> more robust dump files.  I still don't see the argument for exposing
>> a switch though.

> If we regard them as suitable for human editing for normal use, yes.

No, that actually was no part of my point.  A pg_dump file that doesn't
use tabs is more likely to survive being emailed, for instance.  I'm not
sure whether that is a large enough consideration to justify a change,
but you don't have to assume that anyone's intending to edit anything
to make the argument for it.
        regards, tom lane


Re: Proposal: new pg_dump options --copy-delimiter and

От
Andrew Dunstan
Дата:
On Fri, 2006-01-27 at 14:08 -0500, Tom Lane wrote:
> Andrew Dunstan <andrew@dunslane.net> writes:
> > On Fri, 2006-01-27 at 13:43 -0500, Tom Lane wrote:
> >> That line of argument leads to the suggestion that pg_dump should just
> >> use something else (I'd vote for "|"), all the time, in order to produce
> >> more robust dump files.  I still don't see the argument for exposing
> >> a switch though.
> 
> > If we regard them as suitable for human editing for normal use, yes.
> 
> No, that actually was no part of my point.  A pg_dump file that doesn't
> use tabs is more likely to survive being emailed, for instance.  I'm not
> sure whether that is a large enough consideration to justify a change,
> but you don't have to assume that anyone's intending to edit anything
> to make the argument for it.
> 


Not sure how much that matters in these days of MIME, but it's a fair
point nevertheless.

cheers

andrew



Re: Proposal: new pg_dump options --copy-delimiter and

От
Greg Stark
Дата:
Tom Lane <tgl@sss.pgh.pa.us> writes:

> Andrew Dunstan <andrew@dunslane.net> writes:
> > On Fri, 2006-01-27 at 13:43 -0500, Tom Lane wrote:
> >> That line of argument leads to the suggestion that pg_dump should just
> >> use something else (I'd vote for "|"), all the time, in order to produce
> >> more robust dump files.  I still don't see the argument for exposing
> >> a switch though.
> 
> > If we regard them as suitable for human editing for normal use, yes.
> 
> No, that actually was no part of my point.  A pg_dump file that doesn't
> use tabs is more likely to survive being emailed, for instance.  

Except it's not at all. It's perhaps more likely to load but load incorrectly
though. There's no guarantee there aren't tabs in the data for example. Or
once we start being this paranoid, there's no guarantee that some other
character won't survive.

The only place this line of argument ends up is with pg_dump automatically
base64 encoding its entire output.

-- 
greg



Re: Proposal: new pg_dump options --copy-delimiter and --copy-null

От
Bruce Momjian
Дата:
David Fetter wrote:
> On Thu, Jan 26, 2006 at 10:17:05PM -0500, Tom Lane wrote:
> > David Fetter <david@fetter.org> writes:
> > > I have seed database scripts quasi-generated from pg_dump which
> > > include COPY statements, but the data is hard to edit (especially
> > > cut & paste operations) when the COPY delimiter is some
> > > non-visible character like \t.
> > 
> > This seems like an awfully weak use-case for adding to pg_dump's
> > already overly complicated feature set.
> 
> Those who don't use it will never see it.

Documentation itself is providing the option to the user and they have
to decide if it is useful.  No visible feature has zero cost for our
userbase.

--  Bruce Momjian                        |  http://candle.pha.pa.us pgman@candle.pha.pa.us               |  (610)
359-1001+  If your life is a hard drive,     |  13 Roberts Road +  Christ can be your backup.        |  Newtown Square,
Pennsylvania19073
 


Re: Proposal: new pg_dump options --copy-delimiter and

От
David Fetter
Дата:
On Fri, Jan 27, 2006 at 01:12:35PM -0500, Greg Stark wrote:
> 
> "Joshua D. Drake" <jd@commandprompt.com> writes:
> 
> > David I don't get this... what are you copying from/to that would
> > wouldn't just script? If you throw into a script you can change
> > the delimiter on the fly using translation.
> 
> The problems are a) there's no guarantee the data is safe to put through your
> hypothetical tab-destroying copy/paste anyways. There could be tabs or other
> unsafe characters in the data. b) We have no way of knowing which characters
> are or aren't safe in your hypothetical copy/paste system. Why is tab unsafe
> in the first place?
> 
> Personally I find anything that would encourage people to use anything other
> than tabs evil anyways. All those people who think | is somehow a reasonable
> choice or want to use commas and then get all confused trying to escape them
> and invent ever more confused syntaxes for escaping the escape characters.
> Just use tab separated data like man was meant to.

I'd say that the multiplicity of strong opinions on The Right
Delimiter is a pretty strong argument for having a switch to control
it.

Cheers,
D
-- 
David Fetter david@fetter.org http://fetter.org/
phone: +1 415 235 3778

Remember to vote!