Re: Case Insensitive Queries

Поиск
Список
Период
Сортировка
От tjk@tksoft.com
Тема Re: Case Insensitive Queries
Дата
Msg-id 200105301437.HAA30076@smtp3.tksoft.com
обсуждение исходный текст
Ответ на Re: Case Insensitive Queries  (Tom Lane <tgl@sss.pgh.pa.us>)
Список pgsql-sql
I use a few of them, and in my opinion there is a distinct group of
characters at last in the 8859-1 character set which have a lower and
upper case instance. The ranges are 0xC0 to 0xDD for upper case and 0xE0
to 0xFD for upper (with the exception of 0xD0, 0xD7, 0xF0, and 0xF7).
I haven't examined all the relevant docs, so I might be wrong.  The lists
are based on my own observations of the characters in question.

There is probably no harm in sending a few extra bytes, so I am appending
a related function below. If someone finds a flaw with the function,
please tell me; that would be greatly appreciated.

I am also including a list of related characters.  This email is going
out with 8859-1 as the charset, so I hope you are able to view them.




UPPER CASE:
192: À (0xc0)  193: Á (0xc1)  194: Â (0xc2)  195: Ã (0xc3)
196: Ä (0xc4)  197: Å (0xc5)  198: Æ (0xc6)  199: Ç (0xc7)
200: È (0xc8)  201: É (0xc9)  202: Ê (0xca)  203: Ë (0xcb)
204: Ì (0xcc)  205: Í (0xcd)  206: Î (0xce)  207: Ï (0xcf)              209: Ñ (0xd1)  210: Ò (0xd2)  211: Ó (0xd3)
212: Ô (0xd4)  213: Õ (0xd5)  214: Ö (0xd6)
216: Ø (0xd8)  217: Ù (0xd9)  218: Ú (0xda)  219: Û (0xdb)
220: Ü (0xdc)  221: Ý (0xdd)  

LOWER CASE:
224: à (0xe0)  225: á (0xe1)  226: â (0xe2)  227: ã (0xe3)
228: ä (0xe4)  229: å (0xe5)  230: æ (0xe6)  231: ç (0xe7)
232: è (0xe8)  233: é (0xe9)  234: ê (0xea)  235: ë (0xeb)
236: ì (0xec)  237: í (0xed)  238: î (0xee)  239: ï (0xef)              241: ñ (0xf1)  242: ò (0xf2)  243: ó (0xf3)
244: ô (0xf4)  245: õ (0xf5)  246: ö (0xf6)
248: ø (0xf8)  249: ù (0xf9)  250: ú (0xfa)  251: û (0xfb)
252: ü (0xfc)  253: ý (0xfd)

SKIPPED
208: Ð (0xd0) 
215: × (0xd7)
222: Þ (0xde)
240: ð (0xf0)  
247: ÷ (0xf7)
254: þ (0xfe)



CREATE FUNCTION lower8859_1 (text) RETURNS text  AS '/usr/include/pgsql/lib/str8859_1.so'  LANGUAGE 'C';




/* No warranty of any kind, use at your own risk. Use freely. */

text * lower8859_1 (text * str1) {  text * result;  int32 len1  = 0, i;  unsigned char * p, * p2, c;  unsigned char
upper_min= 0xC0;  unsigned char upper_max = 0xDD;
 
  len1 = VARSIZE(str1) - VARHDRSZ;
  if (len1 <= 0)     return str1;
  result = (text *) palloc (len1 + 2 + VARHDRSZ);  if (! result)     return str1;
  memset (result, 0, len1 + 2 + VARHDRSZ);
  p = VARDATA(result);  p2 = VARDATA(str1);
  for (i=0; i < len1; i++) {     c = p2[i];     if (isupper(c) || (c >= upper_min && c <= upper_max && c != 0xD0 && c
!=0xD7))        p[i] = c + 0x20;     else        p[i] = c;  }
 
  VARSIZE(result) = len1 + VARHDRSZ;
  return result;
}




Troy








> "tjk@tksoft.com" <tjk@tksoft.com> writes:
> > If upper() and lower() operate on characters in 8859-1 and other character
> > sets when the appropriate locale is set, then a difference in the behavior
> > of upper() and lower() would seem like a bug.
> 
> Au contraire ... upper() and lower() are not symmetric operations in
> quite a few non-English locales.  I'll let those who regularly work with
> them give specific details, but handling of accents, German esstet (sp?),
> etc are the gotchas that I recall.
> 
>             regards, tom lane
> 
> ---------------------------(end of broadcast)---------------------------
> TIP 5: Have you checked our extensive FAQ?
> 
> http://www.postgresql.org/users-lounge/docs/faq.html
> 



В списке pgsql-sql по дате отправления:

Предыдущее
От: Mark
Дата:
Сообщение: bpchar compares (was Re: Case Insensitive Queries)
Следующее
От: "Koen Antonissen"
Дата:
Сообщение: RE: primary key scans in sequence