Re: Support LIKE with nondeterministic collations

From Peter Eisentraut
Subject Re: Support LIKE with nondeterministic collations
Date
Msg-id b32cefe2-b9e2-499e-b919-fe8f21c5bc22@eisentraut.org
In response to Re: Support LIKE with nondeterministic collations  ("Daniel Verite" <daniel@manitou-mail.org>)
List pgsql-hackers
On 03.05.24 16:58, Daniel Verite wrote:
>     * Generating bounds for a sort key (prefix matching)
> 
>     Having sort keys for strings allows for easy creation of bounds -
>     sort keys that are guaranteed to be smaller or larger than any sort
>     key from a given range. For example, if bounds are produced for a
>     sortkey of string “smith”, strings between upper and lower bounds
>     with one level would include “Smith”, “SMITH”, “sMiTh”. Two kinds
>     of upper bounds can be generated - the first one will match only
>     strings of equal length, while the second one will match all the
>     strings with the same initial prefix.
> 
>     CLDR 1.9/ICU 4.6 and later map U+FFFF to a collation element with
>     the maximum primary weight, so that for example the string
>     “smith\uFFFF” can be used as the upper bound rather than modifying
>     the sort key for “smith”.
> 
> In other words it says that
> 
>    col LIKE 'smith%' collate "nd"
> 
> is equivalent to:
> 
>    col >= 'smith' collate "nd" AND col < U&'smith\ffff' collate "nd"
> 
> which could be obtained from an index scan, assuming a btree
> index on "col" collate "nd".
> 
> U+FFFF is a valid code point but a "non-character" [1] so it's
> not supposed to be present in normal strings.

Thanks, this could be very useful!
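
To make the quoted equivalence concrete, here is a minimal sketch of the range rewrite in Python. It is only a toy model: str.casefold() stands in for a case-insensitive sort key, and the helper names (sort_key, like_prefix, in_prefix_range) are hypothetical, not anything from PostgreSQL or ICU. Python compares strings by code point, so the upper bound only holds here for text in the Basic Multilingual Plane; ICU instead gives U+FFFF the maximum primary weight, as described in the quoted text.

```python
# Toy model only: str.casefold() stands in for a case-insensitive sort key.
# A real nondeterministic collation would compare ICU sort keys instead.

def sort_key(s: str) -> str:
    """Hypothetical stand-in for a collation sort key (case-insensitive)."""
    return s.casefold()

def like_prefix(value: str, prefix: str) -> bool:
    """Reference semantics: value LIKE prefix || '%' under the toy collation."""
    return sort_key(value).startswith(sort_key(prefix))

def in_prefix_range(value: str, prefix: str) -> bool:
    """Range form: value >= prefix AND value < prefix || U+FFFF."""
    lower = sort_key(prefix)
    upper = sort_key(prefix + "\uffff")  # U+FFFF sorts after any BMP character
    return lower <= sort_key(value) < upper

words = ["Smith", "SMITH", "sMiTh", "smithson", "Smithers",
         "smit", "smiti", "Smyth", "jones"]

# Rows that the range condition would return for the prefix 'smith'.
matches = [w for w in words if in_prefix_range(w, "smith")]
print(matches)
```

In this toy model the range test and the prefix test agree on every word in the list, which is the property an index scan on a btree index would rely on.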



