8.0, UTF8, and CLIENT_ENCODING
| From | Paul Ramsey |
|---|---|
| Subject | 8.0, UTF8, and CLIENT_ENCODING |
| Date | |
| Msg-id | 464CC160.4080401@refractions.net |
| Replies | Re: 8.0, UTF8, and CLIENT_ENCODING; Re: 8.0, UTF8, and CLIENT_ENCODING |
| List | pgsql-general |
I have a small database (PgSQL 8.0, database encoding UTF8) that folks are inserting into via a web form. The form itself is declared ISO-8859-1, and prior to inserting any data, pg_client_encoding is set to LATIN1.

Most of the high-bit characters are correctly translated from LATIN1 to UTF8, so for e-accent-aigu (é) I see the two-byte UTF8 value in the database.

Sometimes, in their wisdom, people cut'n'paste information out of MSWord and put that in the form. Instead of being mapped to their two-byte UTF8 equivalents, those characters go into the database directly as one-byte values > 127, that is, as illegal UTF8. When I try to dump'n'restore this database into PgSQL 8.2, that data can't make the transit.

Firstly, is this "kinda sorta" encoding handling expected in 8.0, or did I do something wrong?

Secondly, does anyone know of a useful tool to pipe a stream through that strips out illegal UTF8 bytes, so I can pipe my dump through it rather than hand-editing the dump?

Thanks,
Paul

-- 
Paul Ramsey
Refractions Research
http://www.refractions.net
pramsey@refractions.net
Phone: 250-383-3022
Cell: 250-885-0632
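For the second question, a small stream filter is one option. What follows is a minimal Python 3 sketch, not a PostgreSQL-supplied tool; the script name `utf8clean.py` and all function names are hypothetical. By default it drops any byte that is not part of a valid UTF-8 sequence. It also has an optional `--cp1252` mode that rests on an assumption: that the stray bytes are Windows-1252 "smart" punctuation pasted from MSWord (for example 0x92 as a right single quote). If that holds, re-decoding them as cp1252 keeps the characters instead of discarding them.

```python
#!/usr/bin/env python3
"""utf8clean (hypothetical name): make a byte stream valid UTF-8.

By default, bytes that are not part of a valid UTF-8 sequence are dropped.
With --cp1252 they are instead reinterpreted as Windows-1252 characters,
an ASSUMPTION that the stray bytes came from MSWord "smart" punctuation.
"""
import argparse
import codecs
import sys


def _drop(err):
    # Discard the invalid byte(s) and resume decoding right after them.
    return ("", err.end)


def _as_cp1252(err):
    # Decode the invalid byte(s) as Windows-1252 instead; the few byte
    # values cp1252 leaves undefined (e.g. 0x81) are dropped.
    bad = err.object[err.start:err.end]
    return (bytes(bad).decode("cp1252", errors="ignore"), err.end)


codecs.register_error("utf8clean-drop", _drop)
codecs.register_error("utf8clean-cp1252", _as_cp1252)


def _handler(cp1252):
    return "utf8clean-cp1252" if cp1252 else "utf8clean-drop"


def clean(data, cp1252=False):
    """Return `data` (bytes) with invalid UTF-8 dropped or repaired."""
    return data.decode("utf-8", errors=_handler(cp1252)).encode("utf-8")


def filter_stream(src, dst, cp1252=False, chunk_size=1 << 16):
    """Copy binary file object `src` to `dst`, cleaning as it goes.

    The incremental decoder holds back a multi-byte sequence that straddles
    two chunks, so chunk boundaries never corrupt valid characters.
    """
    decoder = codecs.getincrementaldecoder("utf-8")(errors=_handler(cp1252))
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        dst.write(decoder.decode(chunk).encode("utf-8"))
    # Flush (and clean) any incomplete sequence left at end of input.
    dst.write(decoder.decode(b"", final=True).encode("utf-8"))


def main():
    parser = argparse.ArgumentParser(
        description="Strip or repair invalid UTF-8 read from stdin.")
    parser.add_argument("--cp1252", action="store_true",
                        help="reinterpret invalid bytes as Windows-1252")
    args = parser.parse_args()
    filter_stream(sys.stdin.buffer, sys.stdout.buffer, cp1252=args.cp1252)
    sys.stdout.buffer.flush()


if __name__ == "__main__":
    main()
```

A plain-SQL dump could then be cleaned on its way to the new server, e.g. `pg_dump olddb | python3 utf8clean.py --cp1252 > clean.sql` followed by `psql -d newdb -f clean.sql`. Dropping bytes silently loses characters, so the cp1252 mode is worth trying first if the MSWord guess is right. Either way, spot-check the affected rows after the restore.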