Thread: MS ASCII characters in text field


MS ASCII characters in text field

From
"Kevin McCarthy"
Date:
I'm running into a problem that, from my investigation online, seems to be more common than I'd suspected. We host a site on Apache and PHP5 that accepts textual content through HTML forms; the form-processing code inserts that text into text fields in various database tables.

Users often copy and paste text directly from MS Word docs into the forms. That text invariably contains Microsoft's special characters, such as 'smart' quotes, trademark and copyright symbols, grave accents, etc. We've set the HTML pages to UTF-8 and the database connection to UTF-8. However, when we try to insert data that includes any of these characters into the database, the queries fail, complaining e.g. "[nativecode=ERROR:  character 0xe28093 of encoding "UTF8" has no equivalent in "LATIN9"]"

On the PHP end we've tried translating various characters from their literal values to specified replacements, but we have not been able to catch these anomalies. Any suggestions, recommendations, or experiences to relate?

TIA


--
Kevin McCarthy
kemccarthy1@gmail.com
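
The replacement approach described in the message above can be sketched as follows. One plausible reason such replacements fail to match is that these characters are not ASCII at all: in UTF-8 each one is a multi-byte sequence, so a byte-by-byte comparison never sees it. The sketch below works on decoded text instead. The table contents and the function name are illustrative assumptions, not anything from the original poster's code, and the mapping is lossy; it works around the symptom rather than fixing an encoding mismatch.

```python
# Hypothetical replacement table: typographic characters that Word
# commonly substitutes, mapped to plain ASCII stand-ins (lossy).
WORD_PUNCTUATION = {
    "\u2018": "'",     # left single quotation mark
    "\u2019": "'",     # right single quotation mark
    "\u201c": '"',     # left double quotation mark
    "\u201d": '"',     # right double quotation mark
    "\u2013": "-",     # en dash
    "\u2014": "--",    # em dash
    "\u2026": "...",   # horizontal ellipsis
    "\u2122": "(TM)",  # trade mark sign
}

# str.maketrans accepts single-character keys and string values.
_TABLE = str.maketrans(WORD_PUNCTUATION)


def asciify_word_punctuation(text: str) -> str:
    """Replace the characters listed above; leave everything else untouched."""
    return text.translate(_TABLE)


if __name__ == "__main__":
    sample = "\u201cquoted\u201d \u2013 it\u2019s"
    print(asciify_word_punctuation(sample))
```

Characters that are not in the table pass through unchanged, so accented letters and other legitimate non-ASCII text survive.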

Re: MS ASCII characters in text field

From
Tom Lane
Date:
"Kevin McCarthy" <kemccarthy1@gmail.com> writes:
> Often users will copy and paste text directly from MS Word docs into the
> forms which will invariably contain Microsoft's proprietary formatting of
> characters such as 'smart' quotes, trademark, copyright symbols, accent
> grave, etc. We've set the HTML pages as UTF-8 and the database connection to
> UTF-8. However when our calls to import the data that includes any of these
> characters into the database, the queries fail complaining that e.g.
> "[nativecode=ERROR:  character 0xe28093 of encoding "UTF8" has no equivalent
> in "LATIN9"]"

That error suggests that your database encoding is LATIN9, not UTF-8.
You need to change it.  Beware that you need the server's locale
settings to be in step, too.

            regards, tom lane
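
To see why the server rejects the value, it can help to decode the bytes quoted in the error message. The short sketch below uses only the Python standard library; the helper names are made up for illustration. It decodes `0xe28093` as UTF-8 and then checks whether the resulting character can be encoded in ISO 8859-15, the character set that PostgreSQL calls LATIN9.

```python
import unicodedata


def describe_utf8_bytes(hex_bytes: str) -> str:
    """Decode a hex string holding one UTF-8 character and name it."""
    char = bytes.fromhex(hex_bytes).decode("utf-8")
    return f"U+{ord(char):04X} {unicodedata.name(char)}"


def fits_in_latin9(text: str) -> bool:
    """Return True if every character has an ISO 8859-15 (LATIN9) equivalent."""
    try:
        text.encode("iso8859_15")
        return True
    except UnicodeEncodeError:
        return False


if __name__ == "__main__":
    print(describe_utf8_bytes("e28093"))   # U+2013 EN DASH
    print(fits_in_latin9("\u2013"))        # False
    print(fits_in_latin9("\u00e9\u20ac"))  # True: e-acute and the euro sign exist in LATIN9
```

The bytes turn out to be an en dash, which Word inserts automatically and which LATIN9 simply does not contain, so the failure comes from the target character set and not from anything malformed in the input.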

Re: MS ASCII characters in text field

From
"Kevin McCarthy"
Date:
Thanks, this is the case.

As for changing it, the docs seem to suggest that the encoding can be set only at database creation. The database we need to change is running in production, although with fairly low traffic at present. Would the best approach be to dump the current database and re-import it? If so, any hint on how to specify the encoding at creation? The docs probably state how, but I just thought I'd ask.

Thanks again.

On 3/26/07, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> "Kevin McCarthy" <kemccarthy1@gmail.com> writes:
> > Often users will copy and paste text directly from MS Word docs into the
> > forms which will invariably contain Microsoft's proprietary formatting of
> > characters such as 'smart' quotes, trademark, copyright symbols, accent
> > grave, etc. We've set the HTML pages as UTF-8 and the database connection to
> > UTF-8. However when our calls to import the data that includes any of these
> > characters into the database, the queries fail complaining that e.g.
> > "[nativecode=ERROR:  character 0xe28093 of encoding "UTF8" has no equivalent
> > in "LATIN9"]"
>
> That error suggests that your database encoding is LATIN9, not UTF-8.
> You need to change it.  Beware that you need the server's locale
> settings to be in step, too.
>
>             regards, tom lane



--
Kevin McCarthy
kemccarthy1@gmail.com
http://www.linkedin.com/in/kevinemccarthy
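
The dump-and-re-import route asked about in the last message could look roughly like the sketch below. It only builds and prints the command lines and does not run anything. The database names and the dump file name are placeholders, and the helper function is hypothetical. The options used (`pg_dump --encoding`, `createdb --encoding` and `--template`, `psql --file`) are standard client flags, but whether UTF8 is compatible with the server's locale, as Tom Lane cautions, depends on the installation and the server version and is not checked here.

```python
import shlex
from typing import List


def build_migration_commands(old_db: str, new_db: str, dump_file: str) -> List[List[str]]:
    """Return the three command lines for a dump / create / restore cycle."""
    return [
        # Dump the existing database, converting the data to UTF-8 on output.
        ["pg_dump", "--encoding=UTF8", "--file=" + dump_file, old_db],
        # Create the new database with UTF8 encoding. template0 is used
        # because newer releases require it when the encoding differs
        # from the template's own encoding.
        ["createdb", "--encoding=UTF8", "--template=template0", new_db],
        # Load the dump into the new database.
        ["psql", "--dbname=" + new_db, "--file=" + dump_file],
    ]


if __name__ == "__main__":
    # "sitedb", "sitedb_utf8" and "sitedb.sql" are placeholder names.
    for argv in build_migration_commands("sitedb", "sitedb_utf8", "sitedb.sql"):
        print(shlex.join(argv))
```

The SQL equivalent of the middle step is `CREATE DATABASE sitedb_utf8 WITH ENCODING 'UTF8' TEMPLATE template0;`. Running `SHOW server_encoding;` in the new database afterwards is a quick way to confirm the result before switching the application over.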