Обсуждение: BUG #3737: lower/upper fails to match extended chars in LATIN1

Поиск
Список
Период
Сортировка

BUG #3737: lower/upper fails to match extended chars in LATIN1

От
"Jonas Forsman"
Дата:
The following bug has been logged online:

Bug reference:      3737
Logged by:          Jonas Forsman
Email address:      jonas@axier.org
PostgreSQL version: 8.1.10
Operating system:   Ubuntu 6.06 LTS
Description:        lower/upper fails to match extended chars in  LATIN1
Details:

Try:
select * from table where lower(address) like '%Ã¥%'

This select fails to find addresses including capital Å and similars in
LATIN1 (like Å, Ä, Ö).

The same bug is valid for upper.

Re: BUG #3737: lower/upper fails to match extended chars in LATIN1

От
Heikki Linnakangas
Дата:
Jonas Forsman wrote:
> The following bug has been logged online:
>
> Bug reference:      3737
> Logged by:          Jonas Forsman
> Email address:      jonas@axier.org
> PostgreSQL version: 8.1.10
> Operating system:   Ubuntu 6.06 LTS
> Description:        lower/upper fails to match extended chars in  LATIN1
> Details:
>
> Try:
> select * from table where lower(address) like '%Ã¥%'
>
> This select fails to find addresses including capital Å and similars in
> LATIN1 (like Å, Ä, Ö).

The behavior of lower() depends on your locale. What locale are you
running in? Make sure your locale matches the encoding.

Exact locale names and behavior are defined by the operating system, but
for example on Debian, which is probably the same as Ubuntu in this
case, use

initdb -D data --locale=sv_SE.iso88591 --encoding=LATIN1

for Swedish.

Also make sure that your client_encoding is set correctly. For example,
if you run the query in psql in a terminal that uses UTF-8 encoding, you
need to "SET client_encoding='UTF-8'"

See the manual for more details:
http://www.postgresql.org/docs/8.1/static/charset.html#LOCALE

--
   Heikki Linnakangas
   EnterpriseDB   http://www.enterprisedb.com

Re: BUG #3737: lower/upper fails to match extended chars in LATIN1

От
Heikki Linnakangas
Дата:
Jonas Forsman wrote:
> it is possibly a locale-error. may I ask:
> 1. How do I check the locale?

Within psql:

show lc_ctype; (and other lc_* variables as well)
show server_encoding;
show client_encoding;

> 2. Can I change this on already running db:s ?

Unfortunately you can't. You'll have to re-initdb. :-(

--
   Heikki Linnakangas
   EnterpriseDB   http://www.enterprisedb.com

Re: BUG #3737: lower/upper fails to match extended chars in LATIN1

От
Gregory Stark
Дата:
"Heikki Linnakangas" <heikki@enterprisedb.com> writes:

> Jonas Forsman wrote:
>
>> Try:
>> select * from table where lower(address) like '%=C3=A5%'
>>
>> This select fails to find addresses including capital =C3=85 and similar=
s in
>> LATIN1 (like =C3=85, =C3=84, =C3=96).=20

Isn't =C3 an upper-case letter? In which case lower(address) will never mat=
ch it
since by definition lower(address) will only contain characters which are
lower case. That is if address contains a "=C3" then lower(address) will co=
ntain
an "=E3" instead which won't match the "=C3" in the pattern.

I think you either need to put a lower() on both sides of the LIKE or use
ILIKE.

--=20
  Gregory Stark
  EnterpriseDB          http://www.enterprisedb.com
  Get trained by Bruce Momjian - ask me about EnterpriseDB's PostgreSQL tra=
ining!

Re: BUG #3737: lower/upper fails to match extended chars in LATIN1

От
Heikki Linnakangas
Дата:
(please keep the list cc'd)

Jonas Forsman wrote:
> if I cannot change charset within a postgres installation
> and then benefit from all the features in postgres isn't that to be seen
> as a bug
> or at least it should be a workaround easier than dumping everything
> which I think is a too large risk to take if not really neccessary.

Well, it's not a bug in the sense that it's documented and working as
designed.

But I agree that it sucks. Unfortunately changing the locale on-the-fly
isn't as easy as it might sound. The locale affects the sort-ordering,
for example, which means that you'd have to at least recreate all
indexes with text data in them anyway.

Changing the character encoding would be even worse. You'd have to
re-encode all data, which means rewriting all data in the database.

The best solution would be to support per-database and even per-table
and per-column locales and character encodings, so that you could do the
conversions piece by piece. Patches are welcome :-).

--
   Heikki Linnakangas
   EnterpriseDB   http://www.enterprisedb.com

Re: BUG #3737: lower/upper fails to match extended chars in LATIN1

От
Heikki Linnakangas
Дата:
Jonas Forsman wrote:
> Recreating indicies is maybe not great but can be done quit easily compared
> to reinitializing a db or recoding an entire database.
> Sometimes a bit of a solution is all what it is needed. Would it be
> possible
> to change the locale on the fly and only recreate indicies afterwards?

Hmm. I think you could change LC_CTYPE by just hacking the control file.
LC_COLLATE would require the reindexing. Perhaps you could hack that as
well, and then manually run REINDEX on all databases.

Did you confirm what the problem in your case really was? As Greg
pointed out, the example query you posted was in fact flawed..

BTW, 8.3 is more strict on this, and won't allow you to initdb a cluster
with mismatching locale and encoding.

--
   Heikki Linnakangas
   EnterpriseDB   http://www.enterprisedb.com

Re: BUG #3737: lower/upper fails to match extended chars in LATIN1

От
Jonas Forsman
Дата:
From my point of view (administrators view, not developers) it is such
a drawback
in postgres implementation that it seriously challenges the position as
a competitor
to the big dragons. (IF they suffer from the same missing feature)

Recreating indicies is maybe not great but can be done quit easily compared
to reinitializing a db or recoding an entire database.
Sometimes a bit of a solution is all what it is needed. Would it be possible
to change the locale on the fly and only recreate indicies afterwards?

/ Jonas

ps. I'd love to send you a patch, but there is no way it would be a good
thing
time/performance wise for me doing this. I am trying to be a great
ambassador though,
spreading the words everywhere. ds.

Heikki Linnakangas skrev:
> (please keep the list cc'd)
>
> Jonas Forsman wrote:
>> if I cannot change charset within a postgres installation
>> and then benefit from all the features in postgres isn't that to be
>> seen as a bug
>> or at least it should be a workaround easier than dumping everything
>> which I think is a too large risk to take if not really neccessary.
>
> Well, it's not a bug in the sense that it's documented and working as
> designed.
>
> But I agree that it sucks. Unfortunately changing the locale
> on-the-fly isn't as easy as it might sound. The locale affects the
> sort-ordering, for example, which means that you'd have to at least
> recreate all indexes with text data in them anyway.
>
> Changing the character encoding would be even worse. You'd have to
> re-encode all data, which means rewriting all data in the database.
>
> The best solution would be to support per-database and even per-table
> and per-column locales and character encodings, so that you could do
> the conversions piece by piece. Patches are welcome :-).
>


--
------------------------------------------------
Jonas Forsman
VD, Axier Technologies AB
Malmgatan 2
SE-703 54 Örebro,
SWEDEN
Tel.: +46 (0)19 12 00 90
Mobile: +46 (0)73 506 33 00
jonas.forsman@openit.se

--------------------------------------------------------------------
This message (including any attachments) contains confidential
information intended for a specific individual and purpose,
and is protected by law. If you are not the intended recipient,
you should delete this message and are hereby notified that any disclosure,
copying, or distribution of this message, or the taking of any action
based on it, is strictly prohibited.
--------------------------------------------------------------------