Re: Hex characters in COPY input

From: Melvin Call
Subject: Re: Hex characters in COPY input
Msg-id: CADGQN57rsoRL-tX7w1se0ysu2Zywsf=OJ-4GVVNRdgLNBA3LSQ@mail.gmail.com
In response to: Hex characters in COPY input (Melvin Call <melvincall979@gmail.com>)
List: pgsql-general

On 2/27/15, Adam Hooper <adam@adamhooper.com> wrote:
> On Thu, Feb 26, 2015 at 9:50 PM, Melvin Call <melvincall979@gmail.com>
> wrote:
>
>> So my question is, how do I sanitize the hex character in the middle of a
>> word
>> to be able to copy in Montreal with an accented e? Or am I going about
>> this at
>> the wrong point?
>
> Hi Melvin,
>
> This is not a Postgres problem, and it is not a regex problem. So yes,
> you're going about it at the wrong point: you're trying to modify a
> _character_ at a time, but you _should_ be trying to modify a _byte_
> at a time. Text replacement cannot do what you want it to do.
>
> If you're on Linux or Mac, iconv will work -- for instance, `iconv
> --from-code=windows-1252 --to-code=utf-8 < input-file.txt >
> output-file.txt`
>
> Otherwise, you can use a text editor. Be sure to open the file
> properly (such that é appears) and then save it as utf-8.
>
> Alternatively, you could tell Postgres to use your existing encoding
> -- judging from the \xe9, any of "windows-1252", "iso-8859-15" or
> "iso-8859-1" will be accurate. But I always prefer my data to be
> stored as "utf-8", and you should, too.
>
> Read up on character sets here:
> http://www.joelonsoftware.com/articles/Unicode.html
>
> Enjoy life,
> Adam


Thank you, Adam. I was able to make this work by adding the ENCODING 'latin1'
option to the COPY command, per Vic's suggestion and as you also pointed out.
However, iconv would probably do the trick too, now that I know where the
problem actually lies. I had assumed the input was UTF8 because the MySQL data
is encoded in UTF8, so I failed to realize I was dealing with a different
encoding; you saw what I wasn't seeing. Your suggested reading is also most
appreciated. Maybe one of these days I will actually make sense of this
encoding issue. Thanks for the link.
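
For anyone who finds this thread later, here is a minimal Python sketch of
what appears to have been going on. It is only an illustration: the sample
bytes and the helper names (is_valid_utf8, latin1_to_utf8) are assumptions
made up for this example, not taken from the actual dump. It shows that a
lone \xe9 byte is not valid UTF-8, and that decoding the same bytes as
latin1 and re-encoding them yields valid UTF-8, which is roughly what iconv
does for a whole file.

```python
# Hypothetical sample: "Montreal" with an accented e stored as the single
# latin1 byte 0xE9. The real input file may of course differ.
raw = b"Montr\xe9al"


def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def latin1_to_utf8(data: bytes) -> bytes:
    """Reinterpret latin1 (ISO-8859-1) bytes as text and re-encode as UTF-8."""
    return data.decode("latin-1").encode("utf-8")


converted = latin1_to_utf8(raw)

print(is_valid_utf8(raw))        # the raw latin1 bytes are rejected as UTF-8
print(is_valid_utf8(converted))  # the re-encoded bytes are accepted
print(converted)                 # the accented e is now two bytes: \xc3\xa9
```

Passing ENCODING 'latin1' to COPY has the server do the equivalent
conversion on load, so the file itself does not need to be rewritten.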

Regards,
Melvin

