On Thu, Jan 11, 2007 at 10:19:38AM +0100, Alexander Farber wrote:
> Hello PostgreSQL users!
>
> I have this data stored in WIN1251 encoding, which
> is being fetched by a libpq application I'm developing:
<snip>
> phpbb=> select username, length(username), length(convert(username
> using windows_1251_to_utf8)) from phpbb_users where user_id=224;
> username | length | length
> -----------------+--------+--------
> ????????? ?. ?. | 15 | 26
> (1 row)
>
> My problem is that I need the username in the utf8 encoding.
> So I use the convert(username using windows_1251_to_utf8)
> which works fine except one thing:
If you need the string in UTF-8, why not just set "client_encoding"
to "utf8"? Then the server will send you all strings in UTF-8, no
conversion necessary.
> Is there please a way to know the length of the utf8 data?
> (I'm using a fixed char array in my C program)
UTF-8 is always variable length: up to 4 bytes per character.
Maybe you shouldn't be using a fixed-length array?
> How do you usually handle such cases?
Variable-length arrays, sized at runtime from the data.
In your next email you ask:
> Can I still be sure that the data returned in the
> convert(username using windows_1251_to_utf8)
> column will be 0-terminated or should I fetch
> the data length using PQgetlength and maintain
> that value in my C-program?
On the client end (as long as you're not doing binary transfers), the
strings are always null-terminated.
Have a nice day,
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to litigate.