Обсуждение: Add encoding support to COPY

Поиск
Список
Период
Сортировка

Add encoding support to COPY

От
David Blewett
Дата:
Today on IRC, someone was wondering what the preferred method of
exporting data in a specific encoding via COPY was. They reply was
wrapping the COPY command in "set client_encoding='foo';", which made
me wonder how hard it would be to add an additional WITH parameter to
the actual COPY statement to specify the encoding, a la:
[ [ WITH ]         [ BINARY ]         [ OIDS ]         [ DELIMITER [ AS ] 'delimiter' ]         [ ENCODING [ AS ]
'charset']         [ NULL [ AS ] 'null string' ]         [ CSV [ HEADER ]               [ QUOTE [ AS ] 'quote' ]
      [ ESCAPE [ AS ] 'escape' ]               [ FORCE QUOTE column [, ...] ]
 

Any objections? It seems like a cleaner solution client side than
issuing multiple calls to set the client_encoding. If there are no
objections, I can attempt to prepare a patch for the next commitfest.

David Blewett


Re: Add encoding support to COPY

От
Tom Lane
Дата:
David Blewett <david@dawninglight.net> writes:
> Today on IRC, someone was wondering what the preferred method of
> exporting data in a specific encoding via COPY was. They reply was
> wrapping the COPY command in "set client_encoding='foo';", which made
> me wonder how hard it would be to add an additional WITH parameter to
> the actual COPY statement to specify the encoding, a la:

What is the point?  You'd generally have client_encoding set correctly
for your usage anyway, and if you did not, the data could confuse your
client-side code terribly.  Offering an option to let the backend send
data in the "wrong" encoding does NOT seem like a good idea to me.
        regards, tom lane


Re: Add encoding support to COPY

От
David Blewett
Дата:
On Wed, Jul 15, 2009 at 12:04 PM, Tom Lane<tgl@sss.pgh.pa.us> wrote:
> What is the point?  You'd generally have client_encoding set correctly
> for your usage anyway, and if you did not, the data could confuse your
> client-side code terribly.  Offering an option to let the backend send
> data in the "wrong" encoding does NOT seem like a good idea to me.

The use case was that the client connection was using one encoding,
but needed to output the file in a different encoding. So they would
have to do the "set client_encoding" dance each time they wanted to
export the file. I don't see how it's "wrong", especially considering
there is already a method to do this, albeit cumbersome. I consider it
simply syntactic sugar over existing functionality.

David Blewett


Re: Add encoding support to COPY

От
Tom Lane
Дата:
David Blewett <david@dawninglight.net> writes:
> On Wed, Jul 15, 2009 at 12:04 PM, Tom Lane<tgl@sss.pgh.pa.us> wrote:
>> What is the point? �You'd generally have client_encoding set correctly
>> for your usage anyway, and if you did not, the data could confuse your
>> client-side code terribly. �Offering an option to let the backend send
>> data in the "wrong" encoding does NOT seem like a good idea to me.

> The use case was that the client connection was using one encoding,
> but needed to output the file in a different encoding. So they would
> have to do the "set client_encoding" dance each time they wanted to
> export the file.

Well, it might make sense to allow an ENCODING option attached to a COPY
with a file source/destination.  I remain of the opinion that overriding
client_encoding on a transfer to/from the client is a bad idea.
        regards, tom lane


Re: Add encoding support to COPY

От
Nagy Karoly Gabriel
Дата:
David Blewett wrote:
> Today on IRC, someone was wondering what the preferred method of
> exporting data in a specific encoding via COPY was. They reply was
> wrapping the COPY command in "set client_encoding='foo';", which made
> me wonder how hard it would be to add an additional WITH parameter to
> the actual COPY statement to specify the encoding, a la:
> [ [ WITH ]
>           [ BINARY ]
>           [ OIDS ]
>           [ DELIMITER [ AS ] 'delimiter' ]
>           [ ENCODING [ AS ] 'charset' ]
>           [ NULL [ AS ] 'null string' ]
>           [ CSV [ HEADER ]
>                 [ QUOTE [ AS ] 'quote' ]
>                 [ ESCAPE [ AS ] 'escape' ]
>                 [ FORCE QUOTE column [, ...] ]
>
> Any objections? It seems like a cleaner solution client side than
> issuing multiple calls to set the client_encoding. If there are no
> objections, I can attempt to prepare a patch for the next commitfest.
>
> David Blewett
>
I think that I was the one who wondered about that. Our use case is
related to moving data between different servers which have different
encodings. Ofcourse the encoding should be an option only when COPY
involves files.


--
Nagy Karoly Gabriel
Expert Software Group SRL

(o__  417495 Sanmartin nr. 205
//\'  Bihor, Romania
V_/_  Tel./Fax: +4 0259 317 142, +4 0259 317 143

Вложения

Re: Add encoding support to COPY

От
David Blewett
Дата:
Apologies to Tom for the duplicate...

On Wed, Jul 15, 2009 at 12:17 PM, Tom Lane<tgl@sss.pgh.pa.us> wrote:
> Well, it might make sense to allow an ENCODING option attached to a COPY
> with a file source/destination.  I remain of the opinion that overriding
> client_encoding on a transfer to/from the client is a bad idea.

I really don't see how it is any different from manually flipping the
client_encoding before/after the transfer. We could of course put a
warning sign in the docs, but it seems to me it's more error prone for
clients to set the client_encoding manually rather than include an
option for a single command. What happens if an exception is thrown
during the COPY process and the client doesn't handle things
correctly? The rest of their session could be in an unexpected
encoding, whereas with this method we know to return to the original
client_encoding before doing anything else. By including the encoding
option, their explicitly saying how they want to handle the data.

I could see a use case for remote client code to do a COPY to STDOUT,
that is actually being redirected to a file. If the consensus is for
local file-based operations only, however, I can structure the patch
that way.

David


Re: Add encoding support to COPY

От
Bernd Helmle
Дата:

--On 15. Juli 2009 19:59:56 +0300 Nagy Karoly Gabriel 
<nagy.karoly@expert-erp.net> wrote:

> I think that I was the one who wondered about that. Our use case is
> related to moving data between different servers which have different
> encodings. Ofcourse the encoding should be an option only when COPY
> involves files.

I find this rather confusing: can't you just tell via client_encoding the 
correct encoding your file contains during restore?

-- 
Thanks
Bernd


Re: Add encoding support to COPY

От
Alvaro Herrera
Дата:
David Blewett wrote:

> On Wed, Jul 15, 2009 at 12:17 PM, Tom Lane<tgl@sss.pgh.pa.us> wrote:
> > Well, it might make sense to allow an ENCODING option attached to a COPY
> > with a file source/destination.  I remain of the opinion that overriding
> > client_encoding on a transfer to/from the client is a bad idea.

> I could see a use case for remote client code to do a COPY to STDOUT,
> that is actually being redirected to a file. If the consensus is for
> local file-based operations only, however, I can structure the patch
> that way.

Yeah, the problem is that reading to/from files is only allowed to
superusers ...

(I'm not sure how this affects \copy in psql; probably something you
should investigate)

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


Re: Add encoding support to COPY

От
Tom Lane
Дата:
David Blewett <david@dawninglight.net> writes:
> On Wed, Jul 15, 2009 at 12:17 PM, Tom Lane<tgl@sss.pgh.pa.us> wrote:
>> Well, it might make sense to allow an ENCODING option attached to a COPY
>> with a file source/destination. �I remain of the opinion that overriding
>> client_encoding on a transfer to/from the client is a bad idea.

> I really don't see how it is any different from manually flipping the
> client_encoding before/after the transfer.

The difference is that the client-side code gets told that the encoding
changed if you do the latter.
        regards, tom lane


Re: Add encoding support to COPY

От
David Blewett
Дата:
On Wed, Jul 15, 2009 at 4:07 PM, Tom Lane<tgl@sss.pgh.pa.us> wrote:
> David Blewett <david@dawninglight.net> writes:
>> On Wed, Jul 15, 2009 at 12:17 PM, Tom Lane<tgl@sss.pgh.pa.us> wrote:
>>> Well, it might make sense to allow an ENCODING option attached to a COPY
>>> with a file source/destination.  I remain of the opinion that overriding
>>> client_encoding on a transfer to/from the client is a bad idea.
>
>> I really don't see how it is any different from manually flipping the
>> client_encoding before/after the transfer.
>
> The difference is that the client-side code gets told that the encoding
> changed if you do the latter.

Do you mean at the protocol level?

All I was planning on having the patch do is the equivalent of the set
client_encoding dance. Wouldn't that be sufficent to notify the client
of the encoding change?

David Blewett


Re: Add encoding support to COPY

От
Andrew Dunstan
Дата:

Nagy Karoly Gabriel wrote:
> David Blewett wrote:
>> Today on IRC, someone was wondering what the preferred method of
>> exporting data in a specific encoding via COPY was. They reply was
>> wrapping the COPY command in "set client_encoding='foo';", which made
>> me wonder how hard it would be to add an additional WITH parameter to
>> the actual COPY statement to specify the encoding, a la:
>>
> I think that I was the one who wondered about that. Our use case is 
> related to moving data between different servers which have different 
> encodings. Ofcourse the encoding should be an option only when COPY 
> involves files.
>


Well, that is the case that there seems to be consensus about, and it's 
also the case that can't be done via client encoding. We tend to have a 
bias against providing lots of ways to do the same thing, so let's go 
with this case (i.e. do it for cases other than STDIN/STDOUT).

cheers

andrew