Thread: SQL-ASCII database cleanup

SQL-ASCII database cleanup

From
Mike Blackwell
Date:
I have an older database that was created with SQL-ASCII encoding.  Over time users have managed to enter all manner of interesting characters, mostly via cut and paste from Windows documents.  I'm attempting to clean up and eventually convert the database to UTF8.  I've managed to find most of the data that won't nicely convert from some-random-encoding to UTF8, but it seems the users are entering it as fast as I can find it. Is there a way the incoming data from a Perl CGI web application can be automatically limited to UTF8 even though the database is SQL-ASCII?


Mike

Re: SQL-ASCII database cleanup

From
Susan Cassidy
Date:

Use the Encode module to test/convert back and forth between UTF8 characters and bytes for the SQL ASCII database.  Assuming the input is already UTF-8:

 

use Encode qw(encode);

# connect to db, prepare insert statement, etc.

# encode() takes Perl characters and returns a UTF-8 byte string
my $bytes = encode('UTF-8', $utf8_text);

$sth->execute($bytes, $i)
    or errexit("execute of insert into public_suffixes tbl failed: ", $DBI::errstr);

 

If your input is not already UTF-8, you will have to use decode in an eval statement to convert to utf-8, then check for failure before re-converting and inserting into the database.  Or something similar.
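A minimal sketch of the decode-in-eval check described above. It assumes the raw form input arrives as a byte string. The helper name to_utf8_bytes and the cp1252 fallback are illustrative assumptions (cp1252 is only a guess at what pasted Windows text might be), not something confirmed for this application:

```perl
use strict;
use warnings;
use Encode qw(decode encode FB_CROAK LEAVE_SRC);

# Hypothetical helper: return the input as valid UTF-8 bytes, or undef
# if it cannot be decoded under any of the candidate encodings.
sub to_utf8_bytes {
    my ($raw) = @_;
    # Candidate list is an assumption: strict UTF-8 first, then cp1252.
    for my $enc ('UTF-8', 'cp1252') {
        # FB_CROAK makes decode() die on malformed input;
        # LEAVE_SRC keeps decode() from modifying $raw in place.
        my $chars = eval { decode($enc, $raw, FB_CROAK | LEAVE_SRC) };
        return encode('UTF-8', $chars) if defined $chars;
    }
    return undef;
}
```

A caller could reject the form submission, or log it for manual cleanup, whenever the helper returns undef, and pass the returned bytes to $sth->execute otherwise.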

 

This seems to work for me.  When I need to pull the data back out of the database, I have to reconvert from the byte string into UTF-8 characters before displaying the output.
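The read path might look something like the following sketch. The helper name from_db_bytes is hypothetical, and it assumes the value fetched through DBI is the raw byte string stored in the SQL-ASCII database:

```perl
use strict;
use warnings;
use Encode qw(decode FB_CROAK LEAVE_SRC);

# Hypothetical helper: turn bytes fetched from the database into Perl
# characters, or return undef if the bytes are not valid UTF-8.
sub from_db_bytes {
    my ($bytes) = @_;
    # Strict decode: dies on malformed input, which eval turns into undef.
    return eval { decode('UTF-8', $bytes, FB_CROAK | LEAVE_SRC) };
}
```

An undef result also flags rows that still need cleaning up. When printing the decoded characters, the output handle would need an encoding layer, for example binmode(STDOUT, ':encoding(UTF-8)').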

 

Susan


From: pgsql-general-owner@postgresql.org [mailto:pgsql-general-owner@postgresql.org] On Behalf Of Mike Blackwell
Sent: Thursday, July 21, 2011 7:49 AM
To: pgsql-general@postgresql.org
Subject: [GENERAL] SQL-ASCII database cleanup