Re: [HACKERS] strcmp() tie-breaker for identical ICU-collated strings

From: Robert Haas
Subject: Re: [HACKERS] strcmp() tie-breaker for identical ICU-collated strings
Date:
Msg-id CA+TgmoZCxoCvC9OrKhnNQu+cC=e2rUd2KuZqNj5Yv+_NJuV54w@mail.gmail.com
In reply to: Re: [HACKERS] strcmp() tie-breaker for identical ICU-collated strings  (Peter Geoghegan <pg@bowt.ie>)
Replies: Re: [HACKERS] strcmp() tie-breaker for identical ICU-collated strings  (Peter Eisentraut <peter.eisentraut@2ndquadrant.com>)
Re: [HACKERS] strcmp() tie-breaker for identical ICU-collated strings  (Tom Lane <tgl@sss.pgh.pa.us>)
List: pgsql-hackers
On Fri, Jun 2, 2017 at 2:22 PM, Peter Geoghegan <pg@bowt.ie> wrote:
> On Fri, Jun 2, 2017 at 10:34 AM, Amit Khandekar <amitdkhan.pg@gmail.com> wrote:
>> Ok. I was thinking we are doing the tie-breaker because specifically
>> strcoll_l() was unexpectedly returning 0 for some cases. Now I get it,
>> that we do that to be compatible with texteq().
>
> Both of these explanations are correct, in a way. See commit 656beff.

I have to admit that I'm still a little confused about what's actually
going on here.  The commit message says that it "fixes inconsistent behavior under
glibc's hu_HU locale", but it doesn't say what sort of inconsistent
behavior it fixes.  It added a comment - which remains to this day -
saying this:

+         * In some locales strcoll() can claim that nonidentical strings are
+         * equal.  Believing that would be bad news for a number of reasons,
+         * so we follow Perl's lead and sort "equal" strings according to
+         * strcmp().

Again, however, the reasons why believing it would be bad news are not
enumerated.  It is merely asserted that there is more than one such
reason.

Now, it is obviously not true in general that a comparison operator
can never deem two values which are not byte-for-byte identical as
equal, because citext does exactly that (indeed, that's the point).  I
thought maybe citext could get away with it because it lacked indexing
support but, nope, it has indexing support.  Also, the in-core numeric
data type has the same property ('1.0'::numeric = '1'::numeric, but
scale() reveals that they are not byte-for-byte identical).

So, what's special about text that it can never report two values
that are not byte-for-byte identical as equal?  And could we consider changing
that, so that users can select an ICU collator and get exactly the
behavior ICU delivers, without the extra tiebreak?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


