Re: Case Insensitive Queries
От | tjk@tksoft.com |
---|---|
Тема | Re: Case Insensitive Queries |
Дата | |
Msg-id | 200105301437.HAA30076@smtp3.tksoft.com обсуждение исходный текст |
Ответ на | Re: Case Insensitive Queries (Tom Lane <tgl@sss.pgh.pa.us>) |
Список | pgsql-sql |
I use a few of them, and in my opinion there is a distinct group of characters at last in the 8859-1 character set which have a lower and upper case instance. The ranges are 0xC0 to 0xDD for upper case and 0xE0 to 0xFD for upper (with the exception of 0xD0, 0xD7, 0xF0, and 0xF7). I haven't examined all the relevant docs, so I might be wrong. The lists are based on my own observations of the characters in question. There is probably no harm in sending a few extra bytes, so I am appending a related function below. If someone finds a flaw with the function, please tell me; that would be greatly appreciated. I am also including a list of related characters. This email is going out with 8859-1 as the charset, so I hope you are able to view them. UPPER CASE: 192: À (0xc0) 193: Á (0xc1) 194: Â (0xc2) 195: Ã (0xc3) 196: Ä (0xc4) 197: Å (0xc5) 198: Æ (0xc6) 199: Ç (0xc7) 200: È (0xc8) 201: É (0xc9) 202: Ê (0xca) 203: Ë (0xcb) 204: Ì (0xcc) 205: Í (0xcd) 206: Î (0xce) 207: Ï (0xcf) 209: Ñ (0xd1) 210: Ò (0xd2) 211: Ó (0xd3) 212: Ô (0xd4) 213: Õ (0xd5) 214: Ö (0xd6) 216: Ø (0xd8) 217: Ù (0xd9) 218: Ú (0xda) 219: Û (0xdb) 220: Ü (0xdc) 221: Ý (0xdd) LOWER CASE: 224: à (0xe0) 225: á (0xe1) 226: â (0xe2) 227: ã (0xe3) 228: ä (0xe4) 229: å (0xe5) 230: æ (0xe6) 231: ç (0xe7) 232: è (0xe8) 233: é (0xe9) 234: ê (0xea) 235: ë (0xeb) 236: ì (0xec) 237: í (0xed) 238: î (0xee) 239: ï (0xef) 241: ñ (0xf1) 242: ò (0xf2) 243: ó (0xf3) 244: ô (0xf4) 245: õ (0xf5) 246: ö (0xf6) 248: ø (0xf8) 249: ù (0xf9) 250: ú (0xfa) 251: û (0xfb) 252: ü (0xfc) 253: ý (0xfd) SKIPPED 208: Ð (0xd0) 215: × (0xd7) 222: Þ (0xde) 240: ð (0xf0) 247: ÷ (0xf7) 254: þ (0xfe) CREATE FUNCTION lower8859_1 (text) RETURNS text AS '/usr/include/pgsql/lib/str8859_1.so' LANGUAGE 'C'; /* No warranty of any kind, use at your own risk. Use freely. */ text * lower8859_1 (text * str1) { text * result; int32 len1 = 0, i; unsigned char * p, * p2, c; unsigned char upper_min= 0xC0; unsigned char upper_max = 0xDD; len1 = VARSIZE(str1) - VARHDRSZ; if (len1 <= 0) return str1; result = (text *) palloc (len1 + 2 + VARHDRSZ); if (! result) return str1; memset (result, 0, len1 + 2 + VARHDRSZ); p = VARDATA(result); p2 = VARDATA(str1); for (i=0; i < len1; i++) { c = p2[i]; if (isupper(c) || (c >= upper_min && c <= upper_max && c != 0xD0 && c !=0xD7)) p[i] = c + 0x20; else p[i] = c; } VARSIZE(result) = len1 + VARHDRSZ; return result; } Troy > "tjk@tksoft.com" <tjk@tksoft.com> writes: > > If upper() and lower() operate on characters in 8859-1 and other character > > sets when the appropriate locale is set, then a difference in the behavior > > of upper() and lower() would seem like a bug. > > Au contraire ... upper() and lower() are not symmetric operations in > quite a few non-English locales. I'll let those who regularly work with > them give specific details, but handling of accents, German esstet (sp?), > etc are the gotchas that I recall. > > regards, tom lane > > ---------------------------(end of broadcast)--------------------------- > TIP 5: Have you checked our extensive FAQ? > > http://www.postgresql.org/users-lounge/docs/faq.html >
В списке pgsql-sql по дате отправления: