Re: Selecting on non ASCII varchars

From Kevin Grittner
Subject Re: Selecting on non ASCII varchars
Msg-id s342a1aa.000@gwmta.wicourts.gov
In reply to Selecting on non ASCII varchars  (Jeremy LaCivita <jlacivita@broadrelay.com>)
List pgsql-jdbc
A String object doesn't contain an array of bytes; it contains an
array of characters.  Somehow you created String objects from
bytes using the wrong character encoding (not to be confused
with a character set).  Your str.getBytes() call uses the platform's
default encoding to convert the characters back to bytes.  In
this case, it seems that all the characters map back to
the original bytes, although that isn't guaranteed: it only works
if the default encoding can round-trip every byte.  By specifying
"utf-8" in the String constructor, you're telling it to use a
specific encoding to convert those bytes to characters.
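
To make the round trip concrete, here is a minimal sketch of what the
workaround does.  The class and method names are made up for
illustration, and ISO-8859-1 and US-ASCII are only stand-ins for
whatever the platform default encoding happens to be (the thread
doesn't say which one was in effect).

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of the JDBC driver or the original code.
class EncodingRoundTrip {

    // Simulates the situation in the thread: UTF-8 bytes are decoded with
    // some other charset (assumedCharset stands in for the platform default),
    // then "repaired" by re-encoding with that charset and decoding as UTF-8.
    static String garbleAndRepair(String original, Charset assumedCharset) {
        byte[] utf8 = original.getBytes(StandardCharsets.UTF_8);
        String garbled = new String(utf8, assumedCharset);       // wrong decode
        byte[] roundTripped = garbled.getBytes(assumedCharset);  // like str.getBytes()
        return new String(roundTripped, StandardCharsets.UTF_8); // like new String(bytes, "utf-8")
    }

    public static void main(String[] args) {
        String original = "caf\u00e9"; // "cafe" with an acute accent on the e

        // ISO-8859-1 maps every byte value to a character, so the original
        // bytes survive the round trip and the repair works.
        System.out.println(
            garbleAndRepair(original, StandardCharsets.ISO_8859_1).equals(original)); // true

        // US-ASCII cannot represent bytes above 0x7F; they are replaced on
        // decoding, so the original bytes are lost and the repair fails.
        System.out.println(
            garbleAndRepair(original, StandardCharsets.US_ASCII).equals(original)); // false
    }
}
```

So the workaround only helps when the default encoding happens to be
lossless for the bytes involved; decoding the bytes correctly at the
point where the Strings are first created avoids the problem entirely.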

There is nothing in a String object to "flag" it for any particular
encoding.  The encoding only comes into play when turning
bytes into characters or vice versa.
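
A small sketch of the same point, with illustrative names of my own
choosing: two Strings built from different byte sequences under
different encodings are equal, and the encoding only matters again
when you ask for bytes.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, used only to illustrate the point above.
class NoEncodingFlag {

    // Builds the same one-character String from two different byte
    // sequences and reports whether the results are equal.
    static boolean sameString() {
        byte[] latin1Bytes = {(byte) 0xE9};             // e-acute in ISO-8859-1
        byte[] utf8Bytes = {(byte) 0xC3, (byte) 0xA9};  // e-acute in UTF-8
        String fromLatin1 = new String(latin1Bytes, StandardCharsets.ISO_8859_1);
        String fromUtf8 = new String(utf8Bytes, StandardCharsets.UTF_8);
        return fromLatin1.equals(fromUtf8);
    }

    // Number of bytes the String occupies once encoded with the given charset.
    static int encodedLength(String s, Charset charset) {
        return s.getBytes(charset).length;
    }

    public static void main(String[] args) {
        System.out.println(sameString());                                          // true
        System.out.println(encodedLength("\u00e9", StandardCharsets.ISO_8859_1));  // 1
        System.out.println(encodedLength("\u00e9", StandardCharsets.UTF_8));       // 2
    }
}
```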

-Kevin


>>> Jeremy LaCivita <jlacivita@broadrelay.com> 10/04/05 3:16 PM >>>
Hmmm

So it turns out that if I take all my Strings and do this:

str = new String(str.getBytes(), "utf-8");

then it works.

Correct me if I'm wrong, but that says to me that the Strings were in
UTF-8 already, but Java didn't know it, so it couldn't send them to
Postgres properly.

Because str.getBytes() will return the same bytes that were used to
create the string, and new String(bytes, "utf-8") will repackage them
into a string using UTF-8, nothing has really changed at the byte
level; Java has just explicitly marked it as UTF-8.

Anyway, problem solved.  As to why my strings aren't flagged as
UTF-8, that's not a Postgres problem.

Thanks!

-jl

On Oct 2, 2005, at 9:41 PM, Oliver Jowett wrote:

> Jeremy LaCivita wrote:
>
>
>> PreparedStatement pst = conn.prepareStatement(
>>     "SELECT * from mytable m where m.title ~* ?");
>>
>
> If you use direct equality (=), does it work?
>
> There have been comments on pgsql-bugs recently that some areas of the
> backend code (case-insensitive comparison and regexp) do not work
> correctly in all cases when multibyte encodings are used. You might
> want to repost to -bugs if basic equality works correctly.
>
> Do you have a self-contained test case we can try? In particular we need
> to know the actual column values and regexp patterns you have
> problems with.
>
> -O
>


