Обсуждение: Bad character data

Поиск
Список
Период
Сортировка

Bad character data

От
"Noah Davis"
Дата:
I posted this to the pgsql general list, but alas, I did not get any
responses. Perhaps someone here could be of assistance?

I have a database with some bad characters in it -- some users had entered
MS Word smart quotes, em dashes, foreign characters, and they look like
gibberish coming out of the database. Most important are the smart quotes I
guess.

What's the best way to replace these characters? I thought I might be able
to run a simple SQL UPDATE command, but some of the gibberish for different
characters looks the same (at least from my client it does), and it would
clobber them all.

I have a feeling there's some sort of ASCII code or unicode solution to this
problem, but I could use am little push in the right direction.

Thanks.

Noah.


Re: Bad character data

От
douglas morrison
Дата:
The lack of responses is prolly because this sort of thing is usually
handled by the client... The client for input should be
stripping/converting to ASCII/unicode whichever chars are unwanted and
notifying the user if anything is removed/unusable. The client for
display should then be able to parse the chars correctly...

You might be able to use your current data if you change your SELECT to
something like:

SELECT to_ascii(columnName, 'LATIN1') AS convertedColumn
    FROM tableName;


hth,
doug



On May 3, 2004, at 3:23 PM, Noah Davis wrote:

> I posted this to the pgsql general list, but alas, I did not get any
> responses. Perhaps someone here could be of assistance?
>
> I have a database with some bad characters in it -- some users had
> entered
> MS Word smart quotes, em dashes, foreign characters, and they look like
> gibberish coming out of the database. Most important are the smart
> quotes I
> guess.
>
> What's the best way to replace these characters? I thought I might be
> able
> to run a simple SQL UPDATE command, but some of the gibberish for
> different
> characters looks the same (at least from my client it does), and it
> would
> clobber them all.
>
> I have a feeling there's some sort of ASCII code or unicode solution
> to this
> problem, but I could use am little push in the right direction.
>
> Thanks.
>
> Noah.
>
>
> ---------------------------(end of
> broadcast)---------------------------
> TIP 2: you can get off all lists at once with the unregister command
>     (send "unregister YourEmailAddressHere" to
> majordomo@postgresql.org)
>


Re: Bad character data

От
"Noah Davis"
Дата:
Hi Doug,

Thanks for the response. I do realize it's not the ideal situation, but it's
the database I inherited, so not much I can do there :) . I tried the
to_ascii and it doesn't seem to help much. It may be that I can try a
different client and get more legible characters. Not sure.

-----Original Message-----
From: douglas morrison [mailto:luckycat@comcast.net]
Sent: Monday, May 03, 2004 4:38 PM
To: Noah Davis
Cc: pgsql-admin@postgresql.org
Subject: Re: [ADMIN] Bad character data

The lack of responses is prolly because this sort of thing is usually
handled by the client... The client for input should be stripping/converting
to ASCII/unicode whichever chars are unwanted and notifying the user if
anything is removed/unusable. The client for display should then be able to
parse the chars correctly...

You might be able to use your current data if you change your SELECT to
something like:

SELECT to_ascii(columnName, 'LATIN1') AS convertedColumn
    FROM tableName;


hth,
doug



On May 3, 2004, at 3:23 PM, Noah Davis wrote:

> I posted this to the pgsql general list, but alas, I did not get any
> responses. Perhaps someone here could be of assistance?
>
> I have a database with some bad characters in it -- some users had
> entered MS Word smart quotes, em dashes, foreign characters, and they
> look like gibberish coming out of the database. Most important are the
> smart quotes I guess.
>
> What's the best way to replace these characters? I thought I might be
> able to run a simple SQL UPDATE command, but some of the gibberish for
> different characters looks the same (at least from my client it does),
> and it would clobber them all.
>
> I have a feeling there's some sort of ASCII code or unicode solution
> to this problem, but I could use am little push in the right
> direction.
>
> Thanks.
>
> Noah.
>
>
> ---------------------------(end of
> broadcast)---------------------------
> TIP 2: you can get off all lists at once with the unregister command
>     (send "unregister YourEmailAddressHere" to
> majordomo@postgresql.org)
>



Re: Bad character data

От
douglas morrison
Дата:
no problem noah,

been there inheriting bad data, suXX0r. wish i could be of more help.
if all else fails look at the translate() function to replace the
offending chars... dunno what i can do, but which type of client are
you currently using?

--
doug


On May 3, 2004, at 4:55 PM, Noah Davis wrote:

> Hi Doug,
>
> Thanks for the response. I do realize it's not the ideal situation,
> but it's
> the database I inherited, so not much I can do there :) . I tried the
> to_ascii and it doesn't seem to help much. It may be that I can try a
> different client and get more legible characters. Not sure.
>
> -----Original Message-----
> From: douglas morrison [mailto:luckycat@comcast.net]
> Sent: Monday, May 03, 2004 4:38 PM
> To: Noah Davis
> Cc: pgsql-admin@postgresql.org
> Subject: Re: [ADMIN] Bad character data
>
> The lack of responses is prolly because this sort of thing is usually
> handled by the client... The client for input should be
> stripping/converting
> to ASCII/unicode whichever chars are unwanted and notifying the user if
> anything is removed/unusable. The client for display should then be
> able to
> parse the chars correctly...
>
> You might be able to use your current data if you change your SELECT to
> something like:
>
> SELECT to_ascii(columnName, 'LATIN1') AS convertedColumn
>     FROM tableName;
>
>
> hth,
> doug
>
>
>
> On May 3, 2004, at 3:23 PM, Noah Davis wrote:
>
>> I posted this to the pgsql general list, but alas, I did not get any
>> responses. Perhaps someone here could be of assistance?
>>
>> I have a database with some bad characters in it -- some users had
>> entered MS Word smart quotes, em dashes, foreign characters, and they
>> look like gibberish coming out of the database. Most important are the
>> smart quotes I guess.
>>
>> What's the best way to replace these characters? I thought I might be
>> able to run a simple SQL UPDATE command, but some of the gibberish for
>> different characters looks the same (at least from my client it does),
>> and it would clobber them all.
>>
>> I have a feeling there's some sort of ASCII code or unicode solution
>> to this problem, but I could use am little push in the right
>> direction.
>>
>> Thanks.
>>
>> Noah.
>>
>>
>> ---------------------------(end of
>> broadcast)---------------------------
>> TIP 2: you can get off all lists at once with the unregister command
>>     (send "unregister YourEmailAddressHere" to
>> majordomo@postgresql.org)
>>
>
>
>
> ---------------------------(end of
> broadcast)---------------------------
> TIP 8: explain analyze is your friend
>


Re: Bad character data

От
"Noah Davis"
Дата:
I'm using psql through putty (an ssh client). Someone reported using
MindTerm as their ssh client and saw the characters ok...? Weird. But they
are definitely screwed up when viewed through the ultimate client (web
browser).

-----Original Message-----
From: pgsql-admin-owner@postgresql.org
[mailto:pgsql-admin-owner@postgresql.org] On Behalf Of douglas morrison
Sent: Monday, May 03, 2004 7:11 PM
To: Noah Davis
Cc: pgsql-admin@postgresql.org
Subject: Re: [ADMIN] Bad character data

no problem noah,

been there inheriting bad data, suXX0r. wish i could be of more help.
if all else fails look at the translate() function to replace the offending
chars... dunno what i can do, but which type of client are you currently
using?

--
doug


On May 3, 2004, at 4:55 PM, Noah Davis wrote:

> Hi Doug,
>
> Thanks for the response. I do realize it's not the ideal situation,
> but it's the database I inherited, so not much I can do there :) . I
> tried the to_ascii and it doesn't seem to help much. It may be that I
> can try a different client and get more legible characters. Not sure.
>
> -----Original Message-----
> From: douglas morrison [mailto:luckycat@comcast.net]
> Sent: Monday, May 03, 2004 4:38 PM
> To: Noah Davis
> Cc: pgsql-admin@postgresql.org
> Subject: Re: [ADMIN] Bad character data
>
> The lack of responses is prolly because this sort of thing is usually
> handled by the client... The client for input should be
> stripping/converting to ASCII/unicode whichever chars are unwanted and
> notifying the user if anything is removed/unusable. The client for
> display should then be able to parse the chars correctly...
>
> You might be able to use your current data if you change your SELECT
> to something like:
>
> SELECT to_ascii(columnName, 'LATIN1') AS convertedColumn
>     FROM tableName;
>
>
> hth,
> doug
>
>
>
> On May 3, 2004, at 3:23 PM, Noah Davis wrote:
>
>> I posted this to the pgsql general list, but alas, I did not get any
>> responses. Perhaps someone here could be of assistance?
>>
>> I have a database with some bad characters in it -- some users had
>> entered MS Word smart quotes, em dashes, foreign characters, and they
>> look like gibberish coming out of the database. Most important are
>> the smart quotes I guess.
>>
>> What's the best way to replace these characters? I thought I might be
>> able to run a simple SQL UPDATE command, but some of the gibberish
>> for different characters looks the same (at least from my client it
>> does), and it would clobber them all.
>>
>> I have a feeling there's some sort of ASCII code or unicode solution
>> to this problem, but I could use am little push in the right
>> direction.
>>
>> Thanks.
>>
>> Noah.
>>
>>
>> ---------------------------(end of
>> broadcast)---------------------------
>> TIP 2: you can get off all lists at once with the unregister command
>>     (send "unregister YourEmailAddressHere" to
>> majordomo@postgresql.org)
>>
>
>
>
> ---------------------------(end of
> broadcast)---------------------------
> TIP 8: explain analyze is your friend
>


---------------------------(end of broadcast)---------------------------
TIP 9: the planner will ignore your desire to choose an index scan if your
      joining column's datatypes do not match


Re: Bad character data

От
douglas morrison
Дата:
well if that truly is the case, then maybe the different ssh clients
are expecting the chars to be in a specific encoding... The web browser
view can be fixed using php, jsp, coldfusion or whatever your
middleware tier is to translate the data properly.


On May 4, 2004, at 10:09 AM, Noah Davis wrote:

> I'm using psql through putty (an ssh client). Someone reported using
> MindTerm as their ssh client and saw the characters ok...? Weird. But
> they
> are definitely screwed up when viewed through the ultimate client (web
> browser).
>
> -----Original Message-----
> From: pgsql-admin-owner@postgresql.org
> [mailto:pgsql-admin-owner@postgresql.org] On Behalf Of douglas morrison
> Sent: Monday, May 03, 2004 7:11 PM
> To: Noah Davis
> Cc: pgsql-admin@postgresql.org
> Subject: Re: [ADMIN] Bad character data
>
> no problem noah,
>
> been there inheriting bad data, suXX0r. wish i could be of more help.
> if all else fails look at the translate() function to replace the
> offending
> chars... dunno what i can do, but which type of client are you
> currently
> using?
>
> --
> doug
>
>
> On May 3, 2004, at 4:55 PM, Noah Davis wrote:
>
>> Hi Doug,
>>
>> Thanks for the response. I do realize it's not the ideal situation,
>> but it's the database I inherited, so not much I can do there :) . I
>> tried the to_ascii and it doesn't seem to help much. It may be that I
>> can try a different client and get more legible characters. Not sure.
>>
>> -----Original Message-----
>> From: douglas morrison [mailto:luckycat@comcast.net]
>> Sent: Monday, May 03, 2004 4:38 PM
>> To: Noah Davis
>> Cc: pgsql-admin@postgresql.org
>> Subject: Re: [ADMIN] Bad character data
>>
>> The lack of responses is prolly because this sort of thing is usually
>> handled by the client... The client for input should be
>> stripping/converting to ASCII/unicode whichever chars are unwanted and
>> notifying the user if anything is removed/unusable. The client for
>> display should then be able to parse the chars correctly...
>>
>> You might be able to use your current data if you change your SELECT
>> to something like:
>>
>> SELECT to_ascii(columnName, 'LATIN1') AS convertedColumn
>>     FROM tableName;
>>
>>
>> hth,
>> doug
>>
>>
>>
>> On May 3, 2004, at 3:23 PM, Noah Davis wrote:
>>
>>> I posted this to the pgsql general list, but alas, I did not get any
>>> responses. Perhaps someone here could be of assistance?
>>>
>>> I have a database with some bad characters in it -- some users had
>>> entered MS Word smart quotes, em dashes, foreign characters, and they
>>> look like gibberish coming out of the database. Most important are
>>> the smart quotes I guess.
>>>
>>> What's the best way to replace these characters? I thought I might be
>>> able to run a simple SQL UPDATE command, but some of the gibberish
>>> for different characters looks the same (at least from my client it
>>> does), and it would clobber them all.
>>>
>>> I have a feeling there's some sort of ASCII code or unicode solution
>>> to this problem, but I could use am little push in the right
>>> direction.
>>>
>>> Thanks.
>>>
>>> Noah.
>>>
>>>
>>> ---------------------------(end of
>>> broadcast)---------------------------
>>> TIP 2: you can get off all lists at once with the unregister command
>>>     (send "unregister YourEmailAddressHere" to
>>> majordomo@postgresql.org)
>>>
>>
>>
>>
>> ---------------------------(end of
>> broadcast)---------------------------
>> TIP 8: explain analyze is your friend
>>
>
>
> ---------------------------(end of
> broadcast)---------------------------
> TIP 9: the planner will ignore your desire to choose an index scan if
> your
>       joining column's datatypes do not match
>