Re: [HACKERS] strcmp() tie-breaker for identical ICU-collated strings

Поиск
Список
Период
Сортировка
От Amit Khandekar
Тема Re: [HACKERS] strcmp() tie-breaker for identical ICU-collated strings
Дата
Msg-id CAJ3gD9dYK42pEFWhLfeGny5fnY9PysR-ywZ=MWKtiewP=tPK-Q@mail.gmail.com
обсуждение исходный текст
Ответ на Re: [HACKERS] strcmp() tie-breaker for identical ICU-collated strings  (Peter Geoghegan <pg@bowt.ie>)
Список pgsql-hackers
On 2 June 2017 at 23:52, Peter Geoghegan <pg@bowt.ie> wrote:
> On Fri, Jun 2, 2017 at 10:34 AM, Amit Khandekar <amitdkhan.pg@gmail.com> wrote:
>> Ok. I was thinking we are doing the tie-breaker because specifically
>> strcoll_l() was unexpectedly returning 0 for some cases. Now I get it,
>> that we do that to be compatible with texteq().
>
> Both of these explanations are correct, in a way. See commit 656beff.
>
>> Secondly, I was also considering if ICU especially has a way to
>> customize an ICU locale by setting some attributes which dictate
>> comparison or sorting rules for a set of characters. I mean, if there
>> is such customized ICU locale defined in the system, and we use that
>> to create PG collation, I thought we might have to strictly follow
>> those rules without a tie-breaker, so as to be 100% conformant to ICU.
>> I can't come up with an example, or may there isn't one, but , say ,
>> there is a locale which is supposed to sort only by lowest comparison
>> strength (de@strength=1 ?? ). In that case, there might be many
>> characters considered equal, but PG < operator or > operator would
>> still return true for those chars.
>
> In the terminology of the Unicode collation algorithm, PostgreSQL
> "forces deterministic comparisons" [1]. There is a lot of information
> on the details of that within the UCA spec.
>
> If we ever wanted to offer a case insensitive collation feature, then
> we wouldn't necessarily have to do the equivalent of a full strxfrm()
> when hashing, at least with collations controlled by ICU. Perhaps we
> could instead use a collator whose UCOL_STRENGTH is only UCOL_PRIMARY
> to build binary sort keys, and leave the rest to a ucol_equal() call
> (within texteq()) that has the usual UCOL_STRENGTH for the underlying
> PostgreSQL collation.
>
> I don't think it would be possible to implement case insensitive
> collations by using some pre-existing ICU collation that is case
> insensitive. Instead, an implementation might directly vary collation
> strength of any given collation to achieve case insensitivity.
> PostgreSQL would know that this collation was case insensitive, so
> regular collations wouldn't need to change their
> behavior/implementation (to use ucol_equal() within texteq(), and so
> on).

Ah ok. Understood, thanks.


Thanks,
-Amit Khandekar
EnterpriseDB Corporation
The Postgres Database Company



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Michael Paquier
Дата:
Сообщение: [HACKERS] Re: BUG #14680: startup process on standby encounter a deadlock ofTwoPhaseStateLock when redo 2PC xlog
Следующее
От: Heikki Linnakangas
Дата:
Сообщение: Re: [HACKERS] How to refer to resource files from UDFs written in C