Re: Scadinavian characters in regular expressions
From | Søren Vainio |
---|---|
Subject | Re: Scadinavian characters in regular expressions |
Date | |
Msg-id | 910513A5A944D5118BE900C04F67CB5A1F82C7@MAIL |
In response to | Scadinavian characters in regular expressions (Søren Vainio <sva@Netpointers.com>) |
Responses | Re: Scadinavian characters in regular expressions |
List | pgsql-sql |
There is obviously a problem with the special characters. The query SELECT 'oneå two three' ~ '^[^ ]+[ ][^ ]+$'; produced FALSE on a database with ENCODING = 'LATIN1' and TRUE on a database with ENCODING = 'UNICODE'.

Do you have a suggestion as to how I can find the count of two-word strings with ENCODING = 'UNICODE'?

Thank you
Søren Vainio

> -----Original Message-----
> From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
> Sent: 9 April 2002 15:34
> To: Søren Vainio
> Cc: 'Andreas Joseph Krogh'; 'pgsql-sql@postgresql.org'
> Subject: Re: [SQL] Scadinavian characters in regular expressions
>
> Søren Vainio <sva@Netpointers.com> writes:
> > Using \s does produce FALSE for SELECT 'oneå two three' ~
> > '^[^\s]+[\s][^\s]+$';
> > But it also produces FALSE for any two-word string, e.g.
> > SELECT 'one two' ~ '^[^\s]+[\s][^\s]+$'; where I would expect TRUE???
> > (I am using PostgreSQL 7.1.3)
>
> I do not believe that Postgres' regular expression engine recognizes \s
> as meaning anything except "s". See
> http://www.ca.postgresql.org/users-lounge/docs/7.2/postgres/functions-matching.html
>
> In the above, it's even worse: the backslashes were eaten by the
> string-literal parser, so what arrived at the RE engine was just
> ^[^s]+[s][^s]+$ ... not likely to produce what you wanted.
>
> As for the original issue, I wonder whether you are storing the string
> as UTF-8 or Latin1 encoding. I have a suspicion that the å (a-ring) is
> actually a multibyte sequence inside the database and for some reason
> Postgres isn't configured to recognize it as a single logical character.
>
> regards, tom lane
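Tom's point about encodings can be checked outside the database. The sketch below uses Python (our choice; nothing in the thread uses it) to show that 'å' is a single byte in Latin-1 but a two-byte sequence in UTF-8, and to count two-word strings with the thread's pattern matched character by character. The helper names `is_two_words` and `count_two_word_strings` are hypothetical. This does not reproduce the PostgreSQL 7.1 regex engine or explain its TRUE result; it only illustrates the behaviour one would expect when 'å' is treated as one logical character.

```python
import re
from typing import Iterable

# 'å' (U+00E5, LATIN SMALL LETTER A WITH RING ABOVE) encoded two ways.
A_RING = "\u00e5"
latin1_bytes = A_RING.encode("latin-1")  # one byte: 0xE5
utf8_bytes = A_RING.encode("utf-8")      # two bytes: 0xC3 0xA5

# Pattern from the thread: non-spaces, exactly one space, non-spaces.
TWO_WORDS = re.compile(r"^[^ ]+[ ][^ ]+$")


def is_two_words(text: str) -> bool:
    """Hypothetical helper: True if text is exactly two space-separated words.

    Matching is done on a Python str, i.e. on characters, so 'å' counts
    as one character regardless of how it would be stored as bytes.
    """
    return TWO_WORDS.match(text) is not None


def count_two_word_strings(values: Iterable[str]) -> int:
    """Hypothetical helper: count how many strings are two-word strings."""
    return sum(1 for v in values if is_two_words(v))


if __name__ == "__main__":
    print(len(latin1_bytes), latin1_bytes)  # 1 b'\xe5'
    print(len(utf8_bytes), utf8_bytes)      # 2 b'\xc3\xa5'
    samples = ["one\u00e5 two three", "one two", "one\u00e5 two"]
    print(count_two_word_strings(samples))  # 2
```

Inside the database, the equivalent would be a count(*) over rows matching the same pattern; whether that gives the character-level answer depends on the server recognising the stored encoding, which is exactly Tom's open question.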