LIKE fixed(?) for non-ASCII collation orders
От | Tom Lane |
---|---|
Тема | LIKE fixed(?) for non-ASCII collation orders |
Дата | |
Msg-id | 17785.946619646@sss.pgh.pa.us обсуждение исходный текст |
Список | pgsql-hackers |
I have just committed what I hope is the final solution for the problem of LIKE index optimization in non-ASCII locales. indxpath.c now generates both a lower and upper indexqual in all locales. For example,x LIKE 'foo%t' will create indexqual conditionsx >= 'foo' AND x < 'fop' The "<" condition is omitted only if the code is unable to produce a string greater than the pattern's constant prefix. Locale-specific variations in collation order are handled by the cut-and-try method I suggested a while ago: > The approach I was considering for fixing the problem was to use a > loop that would repeatedly try to generate a string greater than the > prefix string. The basic loop step would increment the rightmost > byte as Goran has done (or, if it's already up to the limit, chop > it off and increment the next character position). Then test to > see whether the '<' operator actually believes the result is > greater than the given prefix, and repeat if not. Although I believe that the code will work in non-ASCII locales and MULTIBYTE character sets, I'm not set up to try it easily. Could some folks try it out and report back? The critical subroutine is attached, so if you prefer to eyeball the code, here it is... regards, tom lane /** Try to generate a string greater than the given string or any string it is* a prefix of. If successful, return a palloc'dstring; else return NULL.** To work correctly in non-ASCII locales with weird collation orders,* we cannot simplyincrement "foo" to "fop" --- we have to check whether* we actually produced a string greater than the given one. Ifnot,* increment the righthand byte again and repeat. If we max out the righthand* byte, truncate off the last characterand start incrementing the next.* For example, if "z" were the last character in the sort order, then we* couldproduce "foo" as a string greater than "fonz".** This could be rather slow in the worst case, but in most cases we won't*have to try more than one or two strings before succeeding.** XXX in a sufficiently weird locale, this might produceincorrect results?* For example, in German I believe "ss" is treated specially --- if we are* given "foos" and return"foot", will this actually be greater than "fooss"?*/ static char * make_greater_string(const char * str, Oid datatype) { char *workstr; int len; /* Make a modifiable copy, which will be our return value if successful */ workstr = pstrdup((char *) str); while ((len = strlen(workstr)) > 0) { unsigned char *lastchar = (unsigned char *) (workstr + len - 1); /* * Try to generate a larger string by incrementing the last byte. */ while (*lastchar < (unsignedchar) 255) { (*lastchar)++; if (string_lessthan(str, workstr, datatype)) return workstr; /* Success! */ } /* * Truncate off the last character, which might be morethan 1 byte * in MULTIBYTE case. */ #ifdef MULTIBYTE len = pg_mbcliplen((const unsigned char *) workstr, len, len-1); workstr[len] = '\0'; #else *lastchar = '\0'; #endif } /* Failed... */ pfree(workstr); return NULL; }
В списке pgsql-hackers по дате отправления:
Предыдущее
От: Bruce MomjianДата:
Сообщение: Re: [HACKERS] Is backend-libpq's "PQexec/PQfn/portal" code dead?