Re: UTF-8 encoding problem w/ libpq

From: Heikki Linnakangas
Subject: Re: UTF-8 encoding problem w/ libpq
Date:
Msg-id: 51ACC2E3.9020309@vmware.com
In reply to: Re: UTF-8 encoding problem w/ libpq ("ktm@rice.edu" <ktm@rice.edu>)
Responses: Re: UTF-8 encoding problem w/ libpq (Andrew Dunstan <andrew@dunslane.net>)
           Re: UTF-8 encoding problem w/ libpq (Tom Lane <tgl@sss.pgh.pa.us>)
List: pgsql-hackers
On 03.06.2013 18:27, ktm@rice.edu wrote:
> On Mon, Jun 03, 2013 at 04:09:29PM +0100, Martin Schäfer wrote:
>>
>>>> If I change the strCreate query and add double quotes around the column
>>>> name, then the problem disappears. But the original name is already in
>>>> lowercase, so I think it should also work without quoting the column name.
>>>> Am I missing some setup in either the database or in the use of libpq?
>>>>
>>>> I’m using PostgreSQL 9.2.1, compiled by Visual C++ build 1600, 64-bit
>>>>
>>>> The database uses:
>>>> ENCODING = 'UTF8'
>>>> LC_COLLATE = 'English_United Kingdom.1252'
>>>> LC_CTYPE = 'English_United Kingdom.1252'
>>>>
>>>> Thanks for any help,
>>>>
>>>> Martin
>>>>
>>>
>>> Hi Martin,
>>>
>>> If you do not want the lowercase behavior, you must put double-quotes
>>> around the column name per the documentation:
>>>
>>> http://www.postgresql.org/docs/9.2/interactive/sql-syntax-lexical.html#SQL-SYNTAX-IDENTIFIERS
>>>
>>> section 4.1.1.
>>>
>>> Regards,
>>> Ken
>>
>> The original name 'id_äß' is already in lowercase. The backend should leave it unchanged IMO.
>
> Only in UTF-8, which needs to be double-quoted for a column name, as you have
> seen; otherwise the value will be lowercased per byte.

He *is* using UTF-8. Or trying to, anyway :-). The downcasing in the
backend is supposed to leave bytes with the high bit set alone, i.e. in
UTF-8 encoding, it's supposed to leave ä and ß alone.
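A minimal C++ sketch of the downcasing rule described above, assuming a byte-by-byte pass that folds only ASCII A-Z and copies every byte with the high bit set through unchanged. The function name downcase_identifier_ascii is hypothetical and this is not the backend's actual implementation; it only illustrates why the UTF-8 bytes of ä (c3 a4) and ß (c3 9f) should survive in an unquoted identifier.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not PostgreSQL's code): downcase an unquoted
// identifier one byte at a time, folding only ASCII 'A'..'Z'.
std::string downcase_identifier_ascii(const std::string &ident)
{
    std::string out;
    out.reserve(ident.size());
    for (unsigned char ch : ident) {
        if (ch >= 'A' && ch <= 'Z')
            ch = static_cast<unsigned char>(ch + ('a' - 'A'));
        // Any byte >= 0x80 (e.g. the UTF-8 lead and continuation bytes
        // of ä and ß) is copied through untouched.
        out.push_back(static_cast<char>(ch));
    }
    return out;
}
```

Under this rule, ID_äß encoded as UTF-8 comes out as id_äß, with the four non-ASCII bytes identical to the input.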

I suspect that the conversion to UTF-8, before the string is sent to the
server, is not being done correctly. I'm not sure what's wrong there,
but I'd suggest printing the actual byte sequence sent to the server, to
check whether it's in fact valid UTF-8, i.e. replace the PQexec() line with
something like:
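    // Keep the converted string alive while s points into it; if ToUtf8()
    // returns a temporary, calling .c_str() on it directly leaves s dangling.
    auto utf8 = ToUtf8(strCreate.c_str());
    const char *s = utf8.c_str();
    for (int i = 0; s[i]; i++)          // dump the exact bytes in hex
        printf("%02x", (unsigned char) s[i]);
    printf("\n");
    pResult = PQexec(pConn, s);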

That should contain "c3a4c39f", the UTF-8 byte sequence for äß.
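The printf loop above dumps the bytes for eyeballing. The sketch below, using hypothetical helper names to_hex and is_valid_utf8, returns the same hex string and also checks UTF-8 well-formedness mechanically, following the standard well-formed byte ranges (rejecting stray continuation bytes, overlong forms, surrogates, code points above U+10FFFF and truncated sequences). For comparison, if the string went out in Windows-1252 without conversion, ä and ß would appear as the single bytes e4 and df, which do not form valid UTF-8.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>

// Hex-encode every byte of s, like the printf("%02x") loop.
std::string to_hex(const std::string &s)
{
    std::string out;
    char buf[3];
    for (unsigned char ch : s) {
        std::snprintf(buf, sizeof buf, "%02x", ch);
        out += buf;
    }
    return out;
}

// Return true if s is well-formed UTF-8.
bool is_valid_utf8(const std::string &s)
{
    std::size_t i = 0, n = s.size();
    while (i < n) {
        unsigned char c = static_cast<unsigned char>(s[i]);
        std::size_t len = 0;
        unsigned char lo = 0x80, hi = 0xBF;  // allowed range of 2nd byte
        if (c < 0x80) {
            i += 1;                          // plain ASCII
            continue;
        } else if (c >= 0xC2 && c <= 0xDF) {
            len = 2;
        } else if (c >= 0xE0 && c <= 0xEF) {
            len = 3;
            if (c == 0xE0) lo = 0xA0;        // reject overlong forms
            if (c == 0xED) hi = 0x9F;        // reject UTF-16 surrogates
        } else if (c >= 0xF0 && c <= 0xF4) {
            len = 4;
            if (c == 0xF0) lo = 0x90;        // reject overlong forms
            if (c == 0xF4) hi = 0x8F;        // reject > U+10FFFF
        } else {
            return false;                    // 0x80..0xC1 or 0xF5..0xFF
        }
        if (n - i < len)
            return false;                    // truncated sequence
        unsigned char c1 = static_cast<unsigned char>(s[i + 1]);
        if (c1 < lo || c1 > hi)
            return false;
        for (std::size_t k = 2; k < len; ++k) {
            unsigned char ck = static_cast<unsigned char>(s[i + k]);
            if (ck < 0x80 || ck > 0xBF)
                return false;                // not a continuation byte
        }
        i += len;
    }
    return true;
}
```

Calling both on the converted string before PQexec() would show at a glance whether ToUtf8() produced "c3a4c39f" or left the Windows-1252 bytes "e4df" in place.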

- Heikki