Re: Character expansion with ICU collations

From Finnerty, Jim
Subject Re: Character expansion with ICU collations
Date
Msg-id 9EC3C20F-0721-415A-BE68-CB7240B06A26@amazon.com
In reply to Character expansion with ICU collations  ("Finnerty, Jim" <jfinnert@amazon.com>)
Responses Re: Character expansion with ICU collations  ("Finnerty, Jim" <jfinnert@amazon.com>)
List pgsql-hackers
Re: 
    >> Can a CI collation be ordered upper case first, or is this a limitation of ICU?

    > I don't know the authoritative answer to that, but to me it doesn't make
    > sense, since the effect of a case-insensitive collation is to throw away
    > the third-level weights, so there is nothing left for "upper case first"
    > to operate on.

It wouldn't make sense for the ICU sort key of a CI collation itself, because the sort keys need to be binary equal, but
what the collation of interest does is equivalent to adding a secondary "C"-collated expression to the ORDER BY clause.
For example:
 

SELECT ... ORDER BY expr COLLATE ci_as;

Is ordered as if the query had been written:

SELECT ... ORDER BY expr COLLATE ci_as, expr COLLATE "C";
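To make the intended ordering concrete, here is a minimal C sketch of that two-level comparison. It is an illustration only, not how ICU or PostgreSQL implement it: compare_ci() is a hypothetical ASCII-only stand-in for the ci_as collation, compare_c() stands in for the "C" collation, and all function names are invented for this example.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for "expr COLLATE ci_as": ASCII-only,
 * case-insensitive comparison. A real ICU collation would compare
 * sort keys; this only imitates "case differences are ignored". */
int compare_ci(const char *a, const char *b)
{
    while (*a && *b) {
        int ca = tolower((unsigned char) *a);
        int cb = tolower((unsigned char) *b);
        if (ca != cb)
            return (ca > cb) - (ca < cb);
        a++;
        b++;
    }
    /* The shorter string sorts first; equal length means a tie. */
    return (*a != '\0') - (*b != '\0');
}

/* Stand-in for "expr COLLATE "C"": plain byte-wise comparison. */
int compare_c(const char *a, const char *b)
{
    int r = strcmp(a, b);
    return (r > 0) - (r < 0);
}

/* ORDER BY expr COLLATE ci_as, expr COLLATE "C":
 * the "C" comparison is consulted only to break ties. */
int compare_ci_then_c(const char *a, const char *b)
{
    int r = compare_ci(a, b);
    return r != 0 ? r : compare_c(a, b);
}

/* qsort() passes pointers to the array elements. */
int qsort_adapter(const void *pa, const void *pb)
{
    const char *a = *(const char *const *) pa;
    const char *b = *(const char *const *) pb;
    return compare_ci_then_c(a, b);
}

void sort_ci_then_c(const char **items, size_t n)
{
    assert(items != NULL);
    qsort(items, n, sizeof items[0], qsort_adapter);
}
```

With ASCII input, the tie-break puts upper case before lower case among strings that the case-insensitive comparison treats as equal, which is the "upper case first" effect being asked about.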

Re: 
    > tailoring rules
    >> yes

It looks like the relevant API call is ucol_openRules().
    Interface documented here: https://unicode-org.github.io/icu-docs/apidoc/dev/icu4c/ucol_8h.html
    Example usage from C here: https://android.googlesource.com/platform/external/icu/+/db20b09/source/test/cintltst/citertst.c

for example:

    /* Test with an expanding character sequence */
    u_uastrcpy(rule, "&a < b < c/abd < d");
    c2 = ucol_openRules(rule, u_strlen(rule), UCOL_OFF, UCOL_DEFAULT_STRENGTH, NULL, &status);

and a reordering rule test:

    u_uastrcpy(rule, "&z < AB");
    coll = ucol_openRules(rule, u_strlen(rule), UCOL_OFF, UCOL_DEFAULT_STRENGTH, NULL, &status);

That looks encouraging.  It returns a UCollator object, like ucol_open(const char *localeString, ...), so it's an
alternative to ucol_open().  One of the parameters is the equivalent of colStrength, so then the question would be, how
are the other keyword/value pairs like colCaseFirst, colAlternate, etc. specified via the rules argument?  In the same
way, with the exception of colStrength?
 

e.g. is "colAlternate=shifted;&z < AB" a valid rules string?

The ICU documentation says simply:

" rules    A string describing the collation rules. For the syntax of the rules please see users guide."

Transform rules are documented here: http://userguide.icu-project.org/transforms/general/rules

But there are no examples of using the keyword/value pairs that may appear in a locale string with the transform rules,
and there's no locale argument on ucol_openRules.  How can the keyword/value pairs that may appear in the locale string
be applied in combination with tailoring rules (with the exception of colStrength)?
 





