On Aug 23, 2004, at 3:44 PM, Tom Lane wrote:
>> Yes, it means that = is doing the wrong thing!!
>
> I have seen this happen in situations where the strings contained
> character sequences that were illegal according to the encoding that the
> locale thought was in force. (It seems that strcoll() will return more
> or less random results in such cases...) In particular, given that you
> have
>
>> LC_COLLATE: en_US.UTF-8
>> LC_CTYPE: en_US.UTF-8
>
> you are at risk if the data is not legal UTF-8 strings.

But is it possible to store non-UTF-8 data in a UNICODE database?

> The real question therefore is whether you have the database encoding
> set correctly --- ie, is it UNICODE (== UTF8)? If not then it may well
> be that Postgres is presenting strings to strcoll() that the latter will
> choke on.

The database is UNICODE.

$ psql -U postgres -l
        List of databases
   Name    |  Owner   | Encoding
-----------+----------+-----------
 bric      | postgres | UNICODE
 template0 | postgres | SQL_ASCII
 template1 | postgres | SQL_ASCII
(3 rows)

I plan to dump it, run initdb with LC_COLLATE and LC_CTYPE both set to
"C", restore the database, and see if that helps.
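
Since the risk described above is data that is not legal UTF-8, it may be worth scanning the dump for such byte sequences before restoring it. The following is only a rough sketch in Python, assuming the dump is a plain-text SQL file; the function name and the file name in the comment are made up for illustration and are not part of PostgreSQL.

```python
def find_invalid_utf8_lines(data):
    """Return the 1-based numbers of the lines in `data` (bytes)
    that cannot be decoded as strict UTF-8."""
    bad = []
    # Splitting on b"\n" is safe: the byte 0x0A never occurs inside a
    # valid multi-byte UTF-8 sequence.
    for number, line in enumerate(data.split(b"\n"), start=1):
        try:
            line.decode("utf-8")
        except UnicodeDecodeError:
            bad.append(number)
    return bad


# Hypothetical usage against a plain-text dump named bric.sql:
#
#     with open("bric.sql", "rb") as f:
#         for n in find_invalid_utf8_lines(f.read()):
#             print("line %d: not valid UTF-8" % n)
```

If this reports any lines, those rows are candidates for the strings that strcoll() might be mishandling. An empty result would suggest the problem lies elsewhere.
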
Thanks,
David