Re: UTF-8 encoding problem w/ libpq

From: Andrew Dunstan
Subject: Re: UTF-8 encoding problem w/ libpq
Msg-id: 51ACD5F5.3030407@dunslane.net
In reply to: Re: UTF-8 encoding problem w/ libpq  (Heikki Linnakangas <hlinnakangas@vmware.com>)
List: pgsql-hackers
On 06/03/2013 12:22 PM, Heikki Linnakangas wrote:
> On 03.06.2013 18:27, ktm@rice.edu wrote:
>> On Mon, Jun 03, 2013 at 04:09:29PM +0100, Martin Schäfer wrote:
>>>
>>>>> If I change the strCreate query and add double quotes around the
>>>>> column name, then the problem disappears. But the original name is
>>>>> already in lowercase, so I think it should also work without
>>>>> quoting the column name.
>>>>> Am I missing some setup in either the database or in the use of
>>>>> libpq?
>>>>>
>>>>> I’m using PostgreSQL 9.2.1, compiled by Visual C++ build 1600, 64-bit
>>>>>
>>>>> The database uses:
>>>>> ENCODING = 'UTF8'
>>>>> LC_COLLATE = 'English_United Kingdom.1252'
>>>>> LC_CTYPE = 'English_United Kingdom.1252'
>>>>>
>>>>> Thanks for any help,
>>>>>
>>>>> Martin
>>>>>
>>>>
>>>> Hi Martin,
>>>>
>>>> If you do not want the lowercase behavior, you must put double-quotes
>>>> around the column name per the documentation:
>>>>
>>>> http://www.postgresql.org/docs/9.2/interactive/sql-syntax-lexical.html#SQL-SYNTAX-IDENTIFIERS
>>>>
>>>> section 4.1.1.
>>>>
>>>> Regards,
>>>> Ken
>>>
>>> The original name 'id_äß' is already in lowercase. The backend
>>> should leave it unchanged IMO.
>>
>> Only in utf-8 which needs to be double-quoted for a column name as you
>> have seen, otherwise the value will be lowercased per byte.
>
> He *is* using UTF-8. Or trying to, anyway :-). The downcasing in the
> backend is supposed to leave bytes with the high-bit set alone, ie. in
> UTF-8 encoding, it's supposed to leave ä and ß alone.
>
> I suspect that the conversion to UTF-8, before the string is sent to
> the server, is not being done correctly. I'm not sure what's wrong
> there, but I'd suggest printing the actual byte sequence sent to the
> server, to check if it's in fact valid UTF-8. ie. replace the PQexec()
> line with something like:
>
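>     /* Assuming ToUtf8() returns a std::string by value: keep it in a
>        named variable so that s does not point into a destroyed temporary. */
>     std::string utf8 = ToUtf8(strCreate.c_str());
>     const char *s = utf8.c_str();
>     size_t i;
>     /* print each byte in hex to see exactly what is sent to the server */
>     for (i = 0; s[i]; i++)
>       printf("%02x", (unsigned char) s[i]);
>     printf("\n");
>     pResult = PQexec(pConn, s);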
>
> That should contain the UTF-8 byte sequence for äß, "c3a4c39f"
>
>


Umm, no, the backend code doesn't do it right. Some time ago I suggested
a fix for this - see
<http://www.postgresql.org/message-id/50ACF7FA.7070108@dunslane.net>.
Tom thought there might be other places that need fixing, and I haven't
had time to look for them. But maybe we should just fix this one for now
at least.
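
To illustrate the difference being discussed, here is a rough, self-contained sketch. It is not the backend code. It contrasts an ASCII-only downcase, which leaves high-bit bytes alone, with a per-byte downcase that treats high-bit bytes as letters of a single-byte character set. The function names and the simplified Latin-1-style mapping (0xC0..0xDE mapped to byte + 0x20) are assumptions made for illustration only; a real locale-aware tolower() may map bytes differently.

```cpp
#include <cstdio>
#include <string>

// Hex-encode every byte of a string, to inspect the exact byte sequence.
std::string to_hex(const std::string &s) {
    std::string out;
    char buf[3];
    for (unsigned char ch : s) {
        std::snprintf(buf, sizeof buf, "%02x", static_cast<unsigned>(ch));
        out += buf;
    }
    return out;
}

// Fold only ASCII 'A'..'Z'. Bytes with the high bit set pass through
// untouched, so multi-byte UTF-8 sequences survive unchanged.
std::string downcase_ascii_only(const std::string &ident) {
    std::string out = ident;
    for (char &c : out) {
        if (c >= 'A' && c <= 'Z')
            c = static_cast<char>(c - 'A' + 'a');
    }
    return out;
}

// HYPOTHETICAL stand-in for a per-byte, locale-aware tolower() under a
// Latin-1-style single-byte locale: bytes 0xC0..0xDE (except 0xD7) are
// treated as uppercase letters and mapped to byte + 0x20. Applied to
// UTF-8 input, this rewrites lead bytes such as 0xC3.
std::string downcase_per_byte_latin1(const std::string &ident) {
    std::string out = ident;
    for (char &c : out) {
        unsigned char ch = static_cast<unsigned char>(c);
        if (ch >= 'A' && ch <= 'Z')
            ch = static_cast<unsigned char>(ch + ('a' - 'A'));
        else if (ch >= 0xC0 && ch <= 0xDE && ch != 0xD7)
            ch = static_cast<unsigned char>(ch + 0x20);
        c = static_cast<char>(ch);
    }
    return out;
}
```

With the UTF-8 bytes of id_äß ("id_\xc3\xa4\xc3\x9f") as input, downcase_ascii_only() returns the string unchanged, while downcase_per_byte_latin1() turns each 0xc3 lead byte into 0xe3, so the result is no longer the same name.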

cheers

andrew


