Re: [HACKERS] strcmp() tie-breaker for identical ICU-collated strings

Поиск
Список
Период
Сортировка
От Robert Haas
Тема Re: [HACKERS] strcmp() tie-breaker for identical ICU-collated strings
Дата
Msg-id CA+TgmoaRTar_j6SjP6c-ZbMdL6X0U52yJgn-=yEyW1qc17BAkA@mail.gmail.com
обсуждение исходный текст
Ответ на Re: [HACKERS] strcmp() tie-breaker for identical ICU-collated strings  (Tom Lane <tgl@sss.pgh.pa.us>)
Ответы Re: [HACKERS] strcmp() tie-breaker for identical ICU-collated strings  (Peter Geoghegan <pg@bowt.ie>)
Re: [HACKERS] strcmp() tie-breaker for identical ICU-collated strings  (Peter Eisentraut <peter.eisentraut@2ndquadrant.com>)
Список pgsql-hackers
On Fri, Jun 9, 2017 at 11:46 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Peter Eisentraut <peter.eisentraut@2ndquadrant.com> writes:
>> On 6/9/17 11:12, Tom Lane wrote:
>>> https://www.postgresql.org/message-id/27064.1134753128@sss.pgh.pa.us
>
>> Good to know.  That just says that if we were to go with the strcoll()
>> result only, things would work correctly.
>
> There's still the hashing problem.

Tom, that mailing list discussions is very illuminating.  Thanks for
digging it up.

Regarding the question of hashing, one way to support that would be if
we had some sort of canonicalization function.  IOW, suppose there
were a collation API call distill() which had the property that
strcmp(distill(X), distill(Y)) == 0 iff X and Y are considered equal
under that collation.  Then, you could define your hash function as
hash_any(distill(X)).  Alternatively, if the collation library
provided its own hashing function, that would be fine too, and
probably faster.

On the other hand, is there any rule that says we have to support
hashing?  Certainly, if we defined a new datatype collated_text, it
could have a btree opfamily and no hash opfamily.  It's trickier with
only one datatype, but possibly we could come up with a way for an
opfamily to be consulted about whether it is available for a given
choice of collation.  I'm not exactly sure what is possible or
desirable, but I would not be too surprised to hear complaints about
the observed behavior different from the "pure" ICU behavior because
of the tiebreak, and at least some users might even find it worth
giving up hashing in order to get the exact sort order they need.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Tom Lane
Дата:
Сообщение: Re: [HACKERS] partial aggregation with internal state type
Следующее
От: Peter Geoghegan
Дата:
Сообщение: Re: [HACKERS] strcmp() tie-breaker for identical ICU-collated strings