LIKE fixed(?) for non-ASCII collation orders

From	Tom Lane
Subject	LIKE fixed(?) for non-ASCII collation orders
Date	31 December 1999, 00:54:35
Msg-id	17785.946619646@sss.pgh.pa.us
List	pgsql-hackers


I have just committed what I hope is the final solution for the problem
of LIKE index optimization in non-ASCII locales.  indxpath.c now
generates both a lower and upper indexqual in all locales.  For example,
        x LIKE 'foo%t'
will create indexqual conditions
        x >= 'foo' AND x < 'fop'
The "<" condition is omitted only if the code is unable to produce a
string greater than the pattern's constant prefix.
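
To make the example concrete, here is a minimal standalone sketch in C of how
a pattern such as 'foo%t' yields the bounds 'foo' and 'fop'.  It is not the
indxpath.c code: like_fixed_prefix() and naive_upper_bound() are hypothetical
names invented for illustration, backslash escapes in the pattern are ignored,
and the upper bound is only trustworthy where collation follows byte order
(the C locale).

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: copy the literal prefix of a LIKE pattern, i.e.
 * everything before the first wildcard ('%' or '_').  Escape handling is
 * deliberately omitted.  The caller frees the result.
 */
static char *
like_fixed_prefix(const char *pattern)
{
    size_t      n;
    char       *prefix;

    assert(pattern != NULL);
    n = strcspn(pattern, "%_");
    prefix = malloc(n + 1);
    if (prefix == NULL)
        return NULL;
    memcpy(prefix, pattern, n);
    prefix[n] = '\0';
    return prefix;
}

/*
 * Hypothetical helper: build an upper bound by incrementing the last byte
 * of the prefix.  Returns NULL if the prefix is empty or its last byte is
 * already 255.  Assumes byte order equals collation order.
 */
static char *
naive_upper_bound(const char *prefix)
{
    size_t      len;
    char       *upper;

    assert(prefix != NULL);
    len = strlen(prefix);
    if (len == 0 || (unsigned char) prefix[len - 1] == 255)
        return NULL;
    upper = malloc(len + 1);
    if (upper == NULL)
        return NULL;
    memcpy(upper, prefix, len + 1);
    upper[len - 1] = (char) ((unsigned char) upper[len - 1] + 1);
    return upper;
}
```

With these, like_fixed_prefix("foo%t") gives "foo" and naive_upper_bound("foo")
gives "fop", matching the indexqual x >= 'foo' AND x < 'fop'.  When no upper
bound can be produced, only the ">=" condition would be used.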

Locale-specific variations in collation order are handled by the
cut-and-try method I suggested a while ago:

> The approach I was considering for fixing the problem was to use a
> loop that would repeatedly try to generate a string greater than the
> prefix string.  The basic loop step would increment the rightmost
> byte as Goran has done (or, if it's already up to the limit, chop
> it off and increment the next character position).  Then test to
> see whether the '<' operator actually believes the result is
> greater than the given prefix, and repeat if not.

Although I believe that the code will work in non-ASCII locales and
MULTIBYTE character sets, I'm not set up to try it easily.  Could
some folks try it out and report back?

The critical subroutine is attached, so if you prefer to eyeball
the code, here it is...
        regards, tom lane

/*
 * Try to generate a string greater than the given string or any string it is
 * a prefix of.  If successful, return a palloc'd string; else return NULL.
 *
 * To work correctly in non-ASCII locales with weird collation orders,
 * we cannot simply increment "foo" to "fop" --- we have to check whether
 * we actually produced a string greater than the given one.  If not,
 * increment the righthand byte again and repeat.  If we max out the righthand
 * byte, truncate off the last character and start incrementing the next.
 * For example, if "z" were the last character in the sort order, then we
 * could produce "foo" as a string greater than "fonz".
 *
 * This could be rather slow in the worst case, but in most cases we won't
 * have to try more than one or two strings before succeeding.
 *
 * XXX in a sufficiently weird locale, this might produce incorrect results?
 * For example, in German I believe "ss" is treated specially --- if we are
 * given "foos" and return "foot", will this actually be greater than "fooss"?
 */
static char *
make_greater_string(const char * str, Oid datatype)
{
    char       *workstr;
    int         len;

    /* Make a modifiable copy, which will be our return value if successful */
    workstr = pstrdup((char *) str);

    while ((len = strlen(workstr)) > 0)
    {
        unsigned char  *lastchar = (unsigned char *) (workstr + len - 1);

        /*
         * Try to generate a larger string by incrementing the last byte.
         */
        while (*lastchar < (unsigned char) 255)
        {
            (*lastchar)++;
            if (string_lessthan(str, workstr, datatype))
                return workstr;         /* Success! */
        }

        /*
         * Truncate off the last character, which might be more than 1 byte
         * in MULTIBYTE case.
         */
#ifdef MULTIBYTE
        len = pg_mbcliplen((const unsigned char *) workstr, len, len-1);
        workstr[len] = '\0';
#else
        *lastchar = '\0';
#endif
    }

    /* Failed... */
    pfree(workstr);
    return NULL;
}
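
For anyone who wants to experiment with the cut-and-try loop outside the
backend, the following is a rough standalone approximation, not the committed
code.  It assumes strcoll() can stand in for string_lessthan(), replaces
pstrdup/pfree with malloc/free, and handles single-byte encodings only (there
is no equivalent of the MULTIBYTE pg_mbcliplen() step).  The names
collates_before() and make_greater_string_sketch() are made up for this sketch.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Assumed stand-in for string_lessthan(): true if a sorts strictly before b
 * under the current LC_COLLATE setting.
 */
static int
collates_before(const char *a, const char *b)
{
    return strcoll(a, b) < 0;
}

/*
 * Return a malloc'd string that collates after str, or NULL on failure.
 * Single-byte encodings only.  The caller frees the result.
 */
static char *
make_greater_string_sketch(const char *str)
{
    size_t      len;
    char       *workstr;

    assert(str != NULL);
    len = strlen(str);
    workstr = malloc(len + 1);
    if (workstr == NULL)
        return NULL;
    memcpy(workstr, str, len + 1);

    while ((len = strlen(workstr)) > 0)
    {
        unsigned char *lastchar = (unsigned char *) (workstr + len - 1);

        /* Keep bumping the last byte until the result really sorts higher */
        while (*lastchar < 255)
        {
            (*lastchar)++;
            if (collates_before(str, workstr))
                return workstr;
        }

        /* Byte maxed out: drop it and retry with the preceding byte */
        *lastchar = '\0';
    }

    free(workstr);
    return NULL;
}
```

A program that never calls setlocale() runs in the C locale, where the first
increment already succeeds ("foo" becomes "fop").  To exercise the retry and
truncation paths, call setlocale(LC_COLLATE, "") from <locale.h> first and run
under the locale of interest.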
