Re: Regex code versus Unicode chars beyond codepoint 255
From | David Smith |
---|---|
Subject | Re: Regex code versus Unicode chars beyond codepoint 255 |
Date | |
Msg-id | Pine.LNX.4.44.1202152050100.2772-100000@localhost.localdomain |
In response to | Regex code versus Unicode chars beyond codepoint 255 (Tom Lane <tgl@sss.pgh.pa.us>) |
List | pgsql-hackers |
On 2010-11-24 at 15:56, Tom Lane wrote:

> Bug #5766 points out that we're still not there yet in terms of having
> sane behavior for locale-specific regex operations in Unicode
> encoding. The reason it's not working is that regc_locale does this to
> expand the set of characters that are considered to match [[:alnum:]]
> :

<SNIP>

and it would appear that nobody answered the email.

I am currently implementing a library system that needs to search by whole word. I am using \m...\M regexes, and the DB is utf8, which includes text in Hebrew, Greek, Arabic and various European character sets.

I need a solution to do whole word searches on the data, and this either means fixing the value of alnum for utf8 to include all character sets, or manually generating a list of all characters and reimplementing a word-start/end in regex myself. I would prefer to avoid the latter if at all possible!

What is the current status regarding a full character list for alnum for utf8, and is there anything I can do to help get it working?

Thanks,
David
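
To make the desired whole-word behaviour concrete, here is a minimal sketch of it outside the database. It uses Python's `re` module rather than PostgreSQL's regex engine, so it illustrates the intended semantics only and says nothing about the server. The function name `find_whole_word` is hypothetical, and the lookarounds `(?<!\w)` and `(?!\w)` are only an approximation of `\m` and `\M` that holds when the search term itself starts and ends with a word character. Python's `\w` does not cover combining marks, so pointed Hebrew or vocalised Arabic would need extra handling.

```python
import re
from typing import List, Tuple


def find_whole_word(word: str, text: str) -> List[Tuple[int, int]]:
    """Return (start, end) spans where `word` occurs as a whole word in `text`.

    Hypothetical helper for illustration. For str patterns, Python's \\w is
    Unicode-aware, so letters from Hebrew, Greek, Arabic, etc. count as
    word characters.
    """
    # (?<!\w): no word character immediately before the match (like \m).
    # (?!\w):  no word character immediately after the match (like \M).
    pattern = re.compile(r"(?<!\w)" + re.escape(word) + r"(?!\w)")
    return [m.span() for m in pattern.finditer(text)]


if __name__ == "__main__":
    # Hebrew: the word on its own matches, the prefixed form does not.
    print(find_whole_word("שלום", "שלום עולם"))
    print(find_whole_word("שלום", "השלום"))
    # Greek: matches as a separate word, not inside a longer word.
    print(find_whole_word("νερο", "το νερο"))
    print(find_whole_word("νερο", "νερομυλος"))
```

The same idea carries over to the "manual" route mentioned above: instead of relying on the engine's notion of alnum, the word boundary is defined by whatever character class is known to cover the scripts in the data.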