Re: [GENERAL] Postgres, apps, special characters and UTF-8 encoding

From: Albe Laurenz
Subject: Re: [GENERAL] Postgres, apps, special characters and UTF-8 encoding
Date:
Message-ID: A737B7A37273E048B164557ADEF4A58B53A0696B@ntex2010i.host.magwien.gv.at
In reply to: [GENERAL] Postgres, apps, special characters and UTF-8 encoding  (Ken Tanzer <ken.tanzer@gmail.com>)
List: pgsql-general
Ken Tanzer wrote:
> Hi.  I've got a recurring problem with character encoding for a Postgres-based web PHP app, and am
> hoping someone can clue me in or at least point me in the right direction.  I'll confess upfront my
> understanding of encoding issues is extremely limited.  Here goes.
> 
> The app uses a Postgres database, UTF-8 encoded.  Through their browsers, users can add and edit
> records often including text.  Most of the time this works fine.  Though sometimes this will fail with
> Postgres complaining, for example, "Could query with ... , The error text was: ERROR: invalid byte
> sequence for encoding "UTF8": 0xe9 0x20 0x67"
> 
> So this generally happens when people copy and paste things out of their word documents and such.
> 
> As I understand it, those are likely encoded in something non-UTF-8, like WIN-1251 or something.  And
> that one way or another, the encoding needs to be translated before it can be placed into the
> database.  I'm not clear how this is supposed to happen though.  Automatically by the browser?  Done
> in the app?  Some other way?  And if in the app, how is one supposed to know what the incoming
> encoding is?
> 
> Thanks in advance for any help or pointers.

The byte sequence 0xe9 0x20 0x67 means "é g" in ISO-8859-1 and WINDOWS-1252,
so I think that your setup is as follows:

- The PHP application gets data encoded in ISO-8859-1 or WINDOWS-1252
  and tries to store it in a database.
- The PHP application has a database connection with client_encoding
  set to UTF8.

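Then the database thinks it gets UTF-8 and will choke if it gets something
different. In UTF-8, 0xe9 starts a three-byte sequence and must be followed
by two continuation bytes in the range 0x80 to 0xbf; 0x20 and 0x67 are not,
hence the "invalid byte sequence" error.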
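
The diagnosis above can be checked outside the database. The following is
a minimal Python sketch (not part of the original application, which is PHP);
the variable names are illustrative only. It decodes the three bytes from the
error message under each candidate encoding and records whether a strict
UTF-8 decode fails.

```python
# The three bytes reported in the PostgreSQL error message.
raw = bytes([0xE9, 0x20, 0x67])

# Both single-byte encodings map 0xE9 to "é", so both decodes succeed.
as_latin1 = raw.decode("iso-8859-1")
as_cp1252 = raw.decode("cp1252")

# A strict UTF-8 decode rejects the same bytes, because 0xE9 announces a
# three-byte sequence and 0x20 is not a valid continuation byte.
try:
    raw.decode("utf-8")
    utf8_error = None
except UnicodeDecodeError as exc:
    utf8_error = exc

print(repr(as_latin1), repr(as_cp1252), utf8_error)
```

Both single-byte decodes give the same text, which is why the bytes alone
cannot tell ISO-8859-1 and WINDOWS-1252 apart here.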

The solution:

- Make sure that your web application gets data in only one encoding.
- Set client_encoding to that encoding.
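
One possible way to get "only one encoding" is to normalize input at the
application boundary, so that UTF-8 is the only thing ever sent over a
connection whose client_encoding is UTF8. The sketch below is in Python
rather than PHP and rests on assumptions that are not in the original
message: the helper name to_utf8 is hypothetical, and WINDOWS-1252 as the
fallback is a guess based on the bytes in the error, not something the
application can know for certain.

```python
def to_utf8(raw: bytes, fallback: str = "cp1252") -> bytes:
    """Return raw re-encoded as UTF-8.

    Hypothetical helper: input that is already valid UTF-8 passes through
    unchanged; anything else is assumed to be in the fallback encoding.
    """
    try:
        text = raw.decode("utf-8")      # strict: raises on invalid sequences
    except UnicodeDecodeError:
        # Assumption: non-UTF-8 input is WINDOWS-1252. A few byte values
        # are undefined in that code page and will still raise here.
        text = raw.decode(fallback)
    return text.encode("utf-8")


already_utf8 = to_utf8("é g".encode("utf-8"))
from_cp1252 = to_utf8(bytes([0xE9, 0x20, 0x67]))
```

Guessing the encoding this way is a heuristic. The more direct reading of
the advice above is to have the application receive a single known encoding
in the first place and set client_encoding to match it.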

Yours,
Laurenz Albe
