Обсуждение: converting curly apostrophes to standard apostrophes

Поиск
Список
Период
Сортировка

converting curly apostrophes to standard apostrophes

От
CSN
Дата:
Is there a way to replace all curly apostrophes with
standard apostrophes (presumably with replace(x,y,z))?
My database is SQL_ASCII and I can't find a character
code for curly apostrophes in ASCII here:
http://www.lookuptables.com, but nevertheless there
appear to be curly apostrophes in the database.

Thanks,
CSN



____________________________________________________
Start your day with Yahoo! - make it your home page
http://www.yahoo.com/r/hs


Re: converting curly apostrophes to standard apostrophes

От
Michael Fuhr
Дата:
On Mon, Aug 15, 2005 at 12:14:16PM -0700, CSN wrote:
> Is there a way to replace all curly apostrophes with
> standard apostrophes (presumably with replace(x,y,z))?
> My database is SQL_ASCII and I can't find a character
> code for curly apostrophes in ASCII here:
> http://www.lookuptables.com, but nevertheless there
> appear to be curly apostrophes in the database.

The "Extended ASCII Codes" section of that page might not match
what your system uses (it doesn't match mine).  Have you tried using
the ascii() function on the offending data?

CREATE TABLE foo (testval text);
INSERT INTO foo VALUES ('ABCDEFG');

SELECT substr(testval, 4, 1), ascii(substr(testval, 4, 1)) FROM foo;
 substr | ascii
--------+-------
 D      |    68
(1 row)

--
Michael Fuhr

Re: converting curly apostrophes to standard apostrophes

От
CSN
Дата:
--- Michael Fuhr <mike@fuhr.org> wrote:

> On Mon, Aug 15, 2005 at 12:14:16PM -0700, CSN wrote:
> > Is there a way to replace all curly apostrophes
> with
> > standard apostrophes (presumably with
> replace(x,y,z))?
> > My database is SQL_ASCII and I can't find a
> character
> > code for curly apostrophes in ASCII here:
> > http://www.lookuptables.com, but nevertheless
> there
> > appear to be curly apostrophes in the database.
>
> The "Extended ASCII Codes" section of that page
> might not match
> what your system uses (it doesn't match mine).  Have
> you tried using
> the ascii() function on the offending data?

I logged back in after changing putty's charset to
UTF-8 and am now able to paste an ’ (curly apostrophe)
into psql, however that character still appears as a
grey box in psql's select output. I've tried various
queries to find which rows and fields contain it,
without success (I know it's in some of them):

db=>select ascii('’');
 ascii
-------
   226

db=>select id from news where body ilike '%’%';
(0 rows)

db=>select id from news where body ilike '%' ||
chr(226) || '%';
db'>
db'>^C
db=>


CSN


__________________________________________________
Do You Yahoo!?
Tired of spam?  Yahoo! Mail has the best spam protection around
http://mail.yahoo.com

Re: converting curly apostrophes to standard apostrophes

От
Michael Fuhr
Дата:
On Mon, Aug 15, 2005 at 01:48:00PM -0700, CSN wrote:
> db=>select ascii('’');
>  ascii
> -------
>    226
>
> db=>select id from news where body ilike '%’%';
> (0 rows)
>
> db=>select id from news where body ilike '%' ||
> chr(226) || '%';
> db'>
> db'>^C
> db=>

What's going on with the last query?  The prompt change suggests
that psql is confused with quoting, and the ^C looks like you hit
Control-C to get the regular prompt back.  Did you ever run this
query?  If it produced no rows then you could widen the search.
Example:

SELECT id FROM news WHERE body ~ '[\200-\377]';

You could use the "string from pattern" variant of substring() to
extract characters in a specific range.  If you have PL/Perl then
it would be trivial to extract all of and only the special characters
along with their ASCII codes:

CREATE FUNCTION special_chars(text) RETURNS text AS '
return join(" ", map {"$_:" . ord($_)} $_[0] =~ /[\200-\377]/g);
' LANGUAGE plperl IMMUTABLE STRICT;

SELECT id, special_chars(body) FROM news WHERE body ~ '[\200-\377]';

--
Michael Fuhr

Re: converting curly apostrophes to standard apostrophes

От
CSN
Дата:
--- Michael Fuhr <mike@fuhr.org> wrote:

> On Mon, Aug 15, 2005 at 01:48:00PM -0700, CSN wrote:
> > db=>select ascii('’');
> >  ascii
> > -------
> >    226
> >
> > db=>select id from news where body ilike '%’%';
> > (0 rows)
> >
> > db=>select id from news where body ilike '%' ||
> > chr(226) || '%';
> > db'>
> > db'>^C
> > db=>
>
> What's going on with the last query?  The prompt
> change suggests
> that psql is confused with quoting, and the ^C looks
> like you hit
> Control-C to get the regular prompt back.  Did you
> ever run this
> query?  If it produced no rows then you could widen
> the search.

Hmm, I'm on another computer and I just tried that
last query and it worked without psql thinking it
needed another single quote. Appears the chr code is
146 not 226 (turns out chr(226) is â - why that
doesn't cause problems with iso-8859-1/utf-8 xml and
the single/double quotes and dashes do I don't know).
Anyhow, I ended up doing this:

update news set body=replace(body,chr(146),''''); --
left single quote
update news set body=replace(body,chr(145),''''); --
right single quote
update news set body=replace(body,chr(147),'"'); --
left double quote
update news set body=replace(body,chr(148),'"'); --
right double quote
update news set body=replace(body,chr(150),'-'); -- en
dash
update news set body=replace(body,chr(151),'-'); -- em
dash

and that seems to do the trick. Most places I found
online listed different chars for these codes, but
http://www.webopedia.com/quick_ref/asciicode.asp lists
them. Jeez, I'm so confused with encodings, charsets,
etc. now. :(

Thanks,
CSN


> Example:
>
> SELECT id FROM news WHERE body ~ '[\200-\377]';
>
> You could use the "string from pattern" variant of
> substring() to
> extract characters in a specific range.  If you have
> PL/Perl then
> it would be trivial to extract all of and only the
> special characters
> along with their ASCII codes:
>
> CREATE FUNCTION special_chars(text) RETURNS text AS
> '
> return join(" ", map {"$_:" . ord($_)} $_[0] =~
> /[\200-\377]/g);
> ' LANGUAGE plperl IMMUTABLE STRICT;
>
> SELECT id, special_chars(body) FROM news WHERE body
> ~ '[\200-\377]';
>
> --
> Michael Fuhr
>


__________________________________________________
Do You Yahoo!?
Tired of spam?  Yahoo! Mail has the best spam protection around
http://mail.yahoo.com