Обсуждение: BUG #3737: lower/upper fails to match extended chars in LATIN1
The following bug has been logged online: Bug reference: 3737 Logged by: Jonas Forsman Email address: jonas@axier.org PostgreSQL version: 8.1.10 Operating system: Ubuntu 6.06 LTS Description: lower/upper fails to match extended chars in LATIN1 Details: Try: select * from table where lower(address) like '%Ã¥%' This select fails to find addresses including capital à and similars in LATIN1 (like à , Ã, Ã). The same bug is valid for upper.
Jonas Forsman wrote: > The following bug has been logged online: > > Bug reference: 3737 > Logged by: Jonas Forsman > Email address: jonas@axier.org > PostgreSQL version: 8.1.10 > Operating system: Ubuntu 6.06 LTS > Description: lower/upper fails to match extended chars in LATIN1 > Details: > > Try: > select * from table where lower(address) like '%Ã¥%' > > This select fails to find addresses including capital à and similars in > LATIN1 (like à , Ã, Ã). The behavior of lower() depends on your locale. What locale are you running in? Make sure your locale matches the encoding. Exact locale names and behavior are defined by the operating system, but for example on Debian, which is probably the same as Ubuntu in this case, use initdb -D data --locale=sv_SE.iso88591 --encoding=LATIN1 for Swedish. Also make sure that your client_encoding is set correctly. For example, if you run the query in psql in a terminal that uses UTF-8 encoding, you need to "SET client_encoding='UTF-8'" See the manual for more details: http://www.postgresql.org/docs/8.1/static/charset.html#LOCALE -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Jonas Forsman wrote: > it is possibly a locale-error. may I ask: > 1. How do I check the locale? Within psql: show lc_ctype; (and other lc_* variables as well) show server_encoding; show client_encoding; > 2. Can I change this on already running db:s ? Unfortunately you can't. You'll have to re-initdb. :-( -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
"Heikki Linnakangas" <heikki@enterprisedb.com> writes: > Jonas Forsman wrote: > >> Try: >> select * from table where lower(address) like '%=C3=A5%' >> >> This select fails to find addresses including capital =C3=85 and similar= s in >> LATIN1 (like =C3=85, =C3=84, =C3=96).=20 Isn't =C3 an upper-case letter? In which case lower(address) will never mat= ch it since by definition lower(address) will only contain characters which are lower case. That is if address contains a "=C3" then lower(address) will co= ntain an "=E3" instead which won't match the "=C3" in the pattern. I think you either need to put a lower() on both sides of the LIKE or use ILIKE. --=20 Gregory Stark EnterpriseDB http://www.enterprisedb.com Get trained by Bruce Momjian - ask me about EnterpriseDB's PostgreSQL tra= ining!
(please keep the list cc'd) Jonas Forsman wrote: > if I cannot change charset within a postgres installation > and then benefit from all the features in postgres isn't that to be seen > as a bug > or at least it should be a workaround easier than dumping everything > which I think is a too large risk to take if not really neccessary. Well, it's not a bug in the sense that it's documented and working as designed. But I agree that it sucks. Unfortunately changing the locale on-the-fly isn't as easy as it might sound. The locale affects the sort-ordering, for example, which means that you'd have to at least recreate all indexes with text data in them anyway. Changing the character encoding would be even worse. You'd have to re-encode all data, which means rewriting all data in the database. The best solution would be to support per-database and even per-table and per-column locales and character encodings, so that you could do the conversions piece by piece. Patches are welcome :-). -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Jonas Forsman wrote: > Recreating indicies is maybe not great but can be done quit easily compared > to reinitializing a db or recoding an entire database. > Sometimes a bit of a solution is all what it is needed. Would it be > possible > to change the locale on the fly and only recreate indicies afterwards? Hmm. I think you could change LC_CTYPE by just hacking the control file. LC_COLLATE would require the reindexing. Perhaps you could hack that as well, and then manually run REINDEX on all databases. Did you confirm what the problem in your case really was? As Greg pointed out, the example query you posted was in fact flawed.. BTW, 8.3 is more strict on this, and won't allow you to initdb a cluster with mismatching locale and encoding. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
From my point of view (administrators view, not developers) it is such a drawback in postgres implementation that it seriously challenges the position as a competitor to the big dragons. (IF they suffer from the same missing feature) Recreating indicies is maybe not great but can be done quit easily compared to reinitializing a db or recoding an entire database. Sometimes a bit of a solution is all what it is needed. Would it be possible to change the locale on the fly and only recreate indicies afterwards? / Jonas ps. I'd love to send you a patch, but there is no way it would be a good thing time/performance wise for me doing this. I am trying to be a great ambassador though, spreading the words everywhere. ds. Heikki Linnakangas skrev: > (please keep the list cc'd) > > Jonas Forsman wrote: >> if I cannot change charset within a postgres installation >> and then benefit from all the features in postgres isn't that to be >> seen as a bug >> or at least it should be a workaround easier than dumping everything >> which I think is a too large risk to take if not really neccessary. > > Well, it's not a bug in the sense that it's documented and working as > designed. > > But I agree that it sucks. Unfortunately changing the locale > on-the-fly isn't as easy as it might sound. The locale affects the > sort-ordering, for example, which means that you'd have to at least > recreate all indexes with text data in them anyway. > > Changing the character encoding would be even worse. You'd have to > re-encode all data, which means rewriting all data in the database. > > The best solution would be to support per-database and even per-table > and per-column locales and character encodings, so that you could do > the conversions piece by piece. Patches are welcome :-). > -- ------------------------------------------------ Jonas Forsman VD, Axier Technologies AB Malmgatan 2 SE-703 54 Örebro, SWEDEN Tel.: +46 (0)19 12 00 90 Mobile: +46 (0)73 506 33 00 jonas.forsman@openit.se -------------------------------------------------------------------- This message (including any attachments) contains confidential information intended for a specific individual and purpose, and is protected by law. If you are not the intended recipient, you should delete this message and are hereby notified that any disclosure, copying, or distribution of this message, or the taking of any action based on it, is strictly prohibited. --------------------------------------------------------------------