Re: Need magic to clean strings from unconvertible UTF8

From: Andreas
Subject: Re: Need magic to clean strings from unconvertible UTF8
Msg-id: 4CD6B2BA.6060802@gmx.net
In reply to: Re: Need magic to clean strings from unconvertible UTF8 (John R Pierce <pierce@hogranch.com>)
List: pgsql-general
On 07.11.2010 06:54, John R Pierce wrote:
> On 11/06/10 9:35 PM, Andreas wrote:
>> somehow unconvertible characters have sneaked into my DB.
>> Very probably they came in via imports from MS Access.
>>
>> Access doesn't complain but when I try to export stuff with pgAdmin
>> to csv I get an error that some char is not representable in the
>> local charset.
>>
>> I can find the problematic rows.
>> How could I delete every char in a string that can't be converted to
>> WIN1252?
>
>
> One idea that comes to my mind....  issue a
>
>     SET CLIENT_ENCODING 'C';
>
> then find and fix any problems with SQL.     The C aka Posix encoding
> lets you directly manipulate the characters as binary.
>
> or set the client_encoding to whatever the database encoding is, and
> find the characters that you know aren't compatible with WIN1252 and
> change them

Actually, that's the problem.
How would I delete everything that is not WIN1252-compatible?
I'm certain those characters are junk in my case, so I'd rather get rid of
them than convert them to anything.
In some cases they are even illegal UTF-8 sequences, which must have been
created along the way when the data was passed through several file
formats, among them Excel and who knows what else.
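
For the illegal UTF-8 part, a minimal Python sketch, assuming the damaged values can be obtained as raw bytes (for example read from an export file); the function name `drop_invalid_utf8` is hypothetical. It discards byte sequences that are not valid UTF-8 instead of trying to convert them:

```python
def drop_invalid_utf8(raw: bytes) -> str:
    """Decode raw bytes as UTF-8, silently discarding invalid byte sequences."""
    # errors="ignore" skips undecodable bytes instead of raising UnicodeDecodeError
    return raw.decode("utf-8", errors="ignore")


if __name__ == "__main__":
    # b"\xff" is never valid in UTF-8, and the trailing b"\xc3" is a truncated sequence
    sample = b"Caf\xc3\xa9 \xff ok \xc3"
    print(repr(drop_invalid_utf8(sample)))
```

Valid multi-byte characters such as the "é" survive; only the broken bytes disappear.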

As I said, I can find the rows with such junk, but I don't know how to
clean up the text fields without doing it manually.
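
One way to drop everything that is not WIN1252-compatible without converting it, sketched in Python under the assumption that Python's `cp1252` codec matches PostgreSQL's WIN1252 mapping closely enough for this purpose. `keep_win1252` and `clean_rows` are hypothetical helpers, and the rows are assumed to arrive as `(id, text)` pairs; writing the cleaned values back with an UPDATE is left to whatever client you use:

```python
from typing import Iterable


def keep_win1252(text: str) -> str:
    # Encode to Windows-1252, silently dropping characters with no mapping,
    # then decode back so the result is an ordinary str again.
    return text.encode("cp1252", errors="ignore").decode("cp1252")


def clean_rows(rows: Iterable[tuple[int, str]]) -> list[tuple[int, str]]:
    # Return (id, cleaned_text) only for rows whose text actually changed,
    # i.e. the rows that would need an UPDATE.
    changed = []
    for row_id, text in rows:
        cleaned = keep_win1252(text)
        if cleaned != text:
            changed.append((row_id, cleaned))
    return changed


if __name__ == "__main__":
    # Hypothetical sample data: a CJK character and a snowman are not in WIN1252
    rows = [(1, "Müller"), (2, "Preis: 5 € \u4e2d"), (3, "ok\u2603")]
    for row_id, cleaned in clean_rows(rows):
        print(row_id, repr(cleaned))
```

Only rows that actually changed are returned, so the subsequent UPDATE touches as few rows as possible. Characters such as "ü" and "€" are kept because WIN1252 can represent them.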

