Thread: Some TODO items for collations

Some TODO items for collations

From
Tom Lane
Date:
I think the collations patch has now gone about as far as it's going to
get for 9.1.  There are a couple of areas that ought to be on the TODO
list for future versions, though:

* Integrating collations with text search configurations.  There are
several places in the tsearch code that currently have hard-wired uses
of DEFAULT_COLLATION_OID to control case-folding and character
classification behavior.  The most obvious way to generalize that would
be to have the tsearch operators/functions respond to COLLATE, but it
seems to me that that's likely a bad idea --- the appropriate collation
to use for these behaviors needs to be tied to the active text search
dictionary or configuration, probably.  Or maybe we should reverse that
and extend the notion of a collation object to include a reference to a
text search configuration.  It needs thought.  One other point is that
we can't easily put a collation selection into tsearch configurations
so long as collation names are platform-specific.  Should we have a TODO
item to find a way of providing platform-independent collation names?

* Integrating collations with to_char() and related functions.  Their
current behavior is a bit schizophrenic, in that things like TMMonth
will do case-folding according to the function's input COLLATE property,
but the month name itself is determined according to the LC_TIME GUC.
Not sure if we should extend the notion of a collation to cover all
of the LC_foo categories --- if we do, we'll have to think about the
interaction with the legacy GUC variables.
        regards, tom lane


Re: Some TODO items for collations

From
Peter Eisentraut
Date:
On lör, 2011-04-23 at 13:17 -0400, Tom Lane wrote:
> Should we have a TODO item to find a way of providing
> platform-independent collation names?

This is a multifold problem.

One issue is, if I'm looking for a locale for, say, "English, Canada", I
will find it under "en_CA", if it exists at all.  This is particularly
important for users and application developers.  I think we can do
pretty well on that, since we're only supporting 3 platforms at the
moment.  Linux and Mac OS X, we understand, and Windows Vista and later
also have locale names like "en-CA" (dash instead of underscore).  (We
haven't implemented support for that in initdb, and I don't have access
to a sufficiently new Windows environment.)
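
To make the first issue concrete, here is a minimal sketch of mapping the
underscore and dash spellings onto one platform-independent name. The
helper canonical_locale_name and its regular expression are hypothetical
and are not part of PostgreSQL or initdb. The sketch assumes names of the
form language[_-]TERRITORY[.encoding][@modifier] and rejects anything else.

```python
import re

# Hypothetical pattern for POSIX-style ("en_CA.UTF-8") and
# Windows Vista-style ("en-CA") locale names.
_LOCALE_RE = re.compile(
    r"^(?P<lang>[A-Za-z]{2,3})"         # language code, e.g. "en"
    r"(?:[_-](?P<terr>[A-Za-z]{2}))?"   # optional territory: "_CA" or "-CA"
    r"(?:\.(?P<enc>[^@]+))?"            # optional encoding, e.g. ".UTF-8"
    r"(?:@(?P<mod>.+))?$"               # optional modifier, e.g. "@euro"
)


def canonical_locale_name(name):
    """Return 'll_CC' (or 'll') for a recognized locale name, else None.

    Encoding and modifier are dropped, so alternative sort orders
    expressed through a modifier are not preserved by this sketch.
    """
    m = _LOCALE_RE.match(name)
    if m is None:
        return None
    lang = m.group("lang").lower()
    terr = m.group("terr")
    return f"{lang}_{terr.upper()}" if terr else lang
```

Under these assumptions both "en_CA.UTF-8" and "en-CA" come out as "en_CA",
while names such as "C" or "English_Canada.1252" are not recognized and
would need separate handling.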

The other issue is, if I have a locale, I know what language it is for.
We can do that for locales with standard names, as per above, and indeed
initdb already does a little bit of that when picking the default text
search configuration.  I guess if we develop the text search/collation
interaction further, we can just write down how we expect locales to be
named, and if you violate that, you get some stupid basic text search
behavior.  A more elaborate way would be to register the language in the
pg_collation catalog separately, which would be almost equivalent to
having a collation contain a reference to a text search configuration.
(This could have other specialized applications: If I create a locale
that sorts file names or XML specially, I might like a text search setup
that also treats file names or XML specially.)
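
The "write down how we expect locales to be named" approach could look
roughly like the sketch below. It is an illustration only, not initdb's
actual code: the function guess_ts_config and the mapping table are
hypothetical, and the table lists just a few of the configuration names
assumed to exist in a stock installation. A locale whose name does not
follow the expected pattern falls back to the "simple" configuration.

```python
# Assumed mapping from language codes to text search configuration names.
# Deliberately incomplete; a real table would cover more languages.
TS_CONFIG_BY_LANGUAGE = {
    "da": "danish",
    "de": "german",
    "en": "english",
    "es": "spanish",
    "fr": "french",
    "it": "italian",
    "ru": "russian",
    "sv": "swedish",
}


def guess_ts_config(locale_name):
    """Guess a text search configuration from a locale name.

    Takes the part of the name before the first '_', '-', '.' or '@'
    as the language code. Unknown or oddly named locales get "simple".
    """
    lang = locale_name.replace("-", "_")
    for sep in ("_", ".", "@"):
        lang = lang.split(sep, 1)[0]
    return TS_CONFIG_BY_LANGUAGE.get(lang.lower(), "simple")
```

With this table, "en_CA.UTF-8" and "en-CA" both map to "english", and a
custom locale named "my_sort_order" maps to "simple". Registering the
language in pg_collation instead would replace the name parsing with a
catalog lookup.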

The third issue is what naming scheme to recommend or to expect for
alternative sort orders within a language.  But I don't think we can
really do much about that, nor do we need to.