Обсуждение: COPY command file name encoding issue (UTF8/WIN1252)

Поиск
Список
Период
Сортировка

COPY command file name encoding issue (UTF8/WIN1252)

От
Pujol Mathieu
Дата:
Hi,

I have a problem using COPY command with a file name containing non
ASCII characters.
I use Postgres 9.3.5 x64 on a Windows 7.
OS local encoding is WIN1252.
My database is encoded in UTF8.
I initiate client connection with libpq, connection encoding is set to UTF8.
I build properly my file path taking care of encoding.

When I run COPY "test" TO 'C:/tmp/é.bin' WITH BINARY
it creates a file named é.bin which is utf8 name interpreted as local8.
It could be reproduced convert_from(convert_to('é','UTF8'),'WIN1252').
é in UTF8 "\303\251"
é in WIN1252"\351"

This command works on a database encoded in WIN1252 (same as OS) .
So it seems that COPY command don't take care of file name encoding.
Is it a bug ? a limitation ?
Thanks for your help

Mathieu PUJOL



Re: COPY command file name encoding issue (UTF8/WIN1252)

От
Albe Laurenz
Дата:
Pujol Mathieu wrote:
> I have a problem using COPY command with a file name containing non
> ASCII characters.
> I use Postgres 9.3.5 x64 on a Windows 7.
> OS local encoding is WIN1252.
> My database is encoded in UTF8.
> I initiate client connection with libpq, connection encoding is set to UTF8.
> I build properly my file path taking care of encoding.
> 
> When I run COPY "test" TO 'C:/tmp/é.bin' WITH BINARY
> it creates a file named é.bin which is utf8 name interpreted as local8.
> It could be reproduced convert_from(convert_to('é','UTF8'),'WIN1252').
> é in UTF8 "\303\251"
> é in WIN1252"\351"
> 
> This command works on a database encoded in WIN1252 (same as OS) .
> So it seems that COPY command don't take care of file name encoding.
> Is it a bug ? a limitation ?
> Thanks for your help

I didn't look at the code, but I'd say that the database encoding is
used for the file name, which is why it works when database encoding
and OS locale are the same.

I guess that it would be possible for PostgreSQL to figure out with what
OS locale the postmaster is running and to convert file names accordingly,
but it's probably not trivial since it is OS dependent.

Yours,
Laurenz Albe

Re: COPY command file name encoding issue (UTF8/WIN1252)

От
Pujol Mathieu
Дата:
Maybe a new option could be added to let caller specifies the file name
encoding, it may know it because he create the source/destination file.
I tried to give him a WIN1252 text by doing COPY "test" TO
convert_from(convert_to('C:/tmp/é.bin','UTF8'),'WIN1252') WITH BINARY
but this call is not allowed. Sending him a text containing escaped
WIN1252 hex value fails, because query parser detect invalid UTF8
sequence (which is logical).
The problem is that I can't find any way to workaround this bug.

Regards

Mathieu Pujol

Le 23/03/2015 11:46, Albe Laurenz a écrit :
> Pujol Mathieu wrote:
>> I have a problem using COPY command with a file name containing non
>> ASCII characters.
>> I use Postgres 9.3.5 x64 on a Windows 7.
>> OS local encoding is WIN1252.
>> My database is encoded in UTF8.
>> I initiate client connection with libpq, connection encoding is set to UTF8.
>> I build properly my file path taking care of encoding.
>>
>> When I run COPY "test" TO 'C:/tmp/é.bin' WITH BINARY
>> it creates a file named é.bin which is utf8 name interpreted as local8.
>> It could be reproduced convert_from(convert_to('é','UTF8'),'WIN1252').
>> é in UTF8 "\303\251"
>> é in WIN1252"\351"
>>
>> This command works on a database encoded in WIN1252 (same as OS) .
>> So it seems that COPY command don't take care of file name encoding.
>> Is it a bug ? a limitation ?
>> Thanks for your help
> I didn't look at the code, but I'd say that the database encoding is
> used for the file name, which is why it works when database encoding
> and OS locale are the same.
>
> I guess that it would be possible for PostgreSQL to figure out with what
> OS locale the postmaster is running and to convert file names accordingly,
> but it's probably not trivial since it is OS dependent.
>
> Yours,
> Laurenz Albe
>