Thread: suitable text search configuration

suitable text search configuration

From
Alvaro Herrera
Date:
Hi,

Is initdb supposed to pick up reasonable TS configurations in general?

If so, it's failing for me:

initdb: could not find suitable text search configuration for locale fr_CA.UTF-8
The default text search configuration will be set to "simple".

It fails for es_CL as well.

... oh, I see there's a table in initdb.c

Are we supposed to add entries to it, one for each country?  I'm
wondering if we should try to match the part before the _ using just the
language, if the complete match fails.  (i.e. match "es_CL" using just
"es", "fr_CA" using just "fr", etc).

-- 
Alvaro Herrera                               http://www.PlanetPostgreSQL.org/
"When the proper man does nothing (wu-wei),
his thought is felt ten thousand miles." (Lao Tse)


Re: suitable text search configuration

From
Tom Lane
Date:
Alvaro Herrera <alvherre@commandprompt.com> writes:
> ... oh, I see there's a table in initdb.c

> Are we supposed to add entries to it, one for each country?  I'm
> wondering if we should try to match the part before the _ using just the
> language, if the complete match fails.  (i.e. match "es_CL" using just
> "es", "fr_CA" using just "fr", etc).

Actually, looking at the examples so far, I'm thinking we should just
consider the string up to the first _, period.

An alternative is to try to match the full locale (es_ES) and then try
the language (es) if that wasn't found.  That would leave room to put
country-by-country exceptions in, but for the moment we'd not have any.
        regards, tom lane
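
The two options described in this message can be sketched in C roughly as follows. This is a minimal illustration, not the code in initdb.c: the table contents and the names `ts_config_table`, `lookup_key`, `ts_config_by_language`, and `ts_config_with_fallback` are assumptions made for the example. Stopping at `.` and `@` as well as `_` is also an added assumption, so that a codeset suffix such as `.UTF-8` does not defeat the match.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical mapping; the real table in initdb.c may look different. */
struct ts_config_entry
{
    const char *key;        /* language code or full locale name */
    const char *tsconfname; /* text search configuration name */
};

static const struct ts_config_entry ts_config_table[] = {
    {"es", "spanish"},
    {"fr", "french"},
    {"pt", "portuguese"},
    {NULL, NULL}
};

/* Exact lookup of the first len bytes of name; NULL if there is no entry. */
static const char *
lookup_key(const char *name, size_t len)
{
    for (int i = 0; ts_config_table[i].key != NULL; i++)
    {
        const char *key = ts_config_table[i].key;

        if (strlen(key) == len && strncmp(key, name, len) == 0)
            return ts_config_table[i].tsconfname;
    }
    return NULL;
}

/* Option 1: look only at the language part, i.e. the text before the '_'. */
const char *
ts_config_by_language(const char *locale)
{
    const char *conf;

    assert(locale != NULL);
    conf = lookup_key(locale, strcspn(locale, "_.@"));
    return conf ? conf : "simple";
}

/*
 * Option 2: try the full language_COUNTRY name first, then fall back to
 * the language alone.  Country-specific rows could be added to the table
 * later without touching this function.
 */
const char *
ts_config_with_fallback(const char *locale)
{
    const char *conf;

    assert(locale != NULL);
    conf = lookup_key(locale, strcspn(locale, ".@"));
    if (conf == NULL)
        conf = lookup_key(locale, strcspn(locale, "_.@"));
    return conf ? conf : "simple";
}
```

With the table above, both functions return the same answer for every input, which is the point made in the message: option 2 only starts to differ once a country-specific row such as a hypothetical `{"pt_BR", ...}` entry exists.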


Re: suitable text search configuration

From
Andrew Dunstan
Date:

Tom Lane wrote:
> Alvaro Herrera <alvherre@commandprompt.com> writes:
>> ... oh, I see there's a table in initdb.c
>
>> Are we supposed to add entries to it, one for each country?  I'm
>> wondering if we should try to match the part before the _ using just the
>> language, if the complete match fails.  (i.e. match "es_CL" using just
>> "es", "fr_CA" using just "fr", etc).
>
> Actually, looking at the examples so far, I'm thinking we should just
> consider the string up to the first _, period.
>
> An alternative is to try to match the full locale (es_ES) and then try
> the language (es) if that wasn't found.  That would leave room to put
> country-by-country exceptions in, but for the moment we'd not have any.

Can anyone point to a real world example where country by country would 
make sense? If we need to distinguish flavors of some languages, I would 
not be at all surprised if this was not by country anyway.

cheers

andrew


Re: suitable text search configuration

From
Tom Lane
Date:
Andrew Dunstan <andrew@dunslane.net> writes:
> Tom Lane wrote:
>> Actually, looking at the examples so far, I'm thinking we should just
>> consider the string up to the first _, period.

> Can anyone point to a real world example where country by country would 
> make sense?

For the current set of built-in dictionaries it seems pretty clear that
country distinctions are useless.  If we ever did need that distinction
it would only be after adding dictionaries that aren't going to be in
8.3 ... so I'm leaning to keeping the code simple for the moment.
        regards, tom lane


Re: suitable text search configuration

From
Alvaro Herrera
Date:
Andrew Dunstan wrote:
>
> Tom Lane wrote:

>> Actually, looking at the examples so far, I'm thinking we should just
>> consider the string up to the first _, period.

I studied the standards a bit to see if they mandated that the locale
names must be in the form "language_COUNTRY", and couldn't find
anything.  Which makes me think it's mostly by (very well established)
convention.  I think trying to parse the _ should not be done on a first
attempt.
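
For reference, the conventional shape of such names on glibc-style systems is `language[_territory][.codeset][@modifier]`. The sketch below splits a name along those separators; it assumes that convention holds, and the names `locale_parts`, `copy_part`, and `split_locale_name` are made up for this example rather than taken from any library.

```c
#include <assert.h>
#include <string.h>

#define PART_MAX 32

/* Hypothetical holder for the four conventional components. */
struct locale_parts
{
    char language[PART_MAX];
    char territory[PART_MAX];
    char codeset[PART_MAX];
    char modifier[PART_MAX];
};

/* Copy len bytes of src into dst, truncating to PART_MAX - 1, and terminate. */
static void
copy_part(char *dst, const char *src, size_t len)
{
    if (len >= PART_MAX)
        len = PART_MAX - 1;
    memcpy(dst, src, len);
    dst[len] = '\0';
}

/* Split "language_TERRITORY.codeset@modifier"; missing parts become "". */
void
split_locale_name(const char *name, struct locale_parts *out)
{
    size_t n;

    assert(name != NULL && out != NULL);
    memset(out, 0, sizeof(*out));

    /* language: everything before the first '_', '.' or '@' */
    n = strcspn(name, "_.@");
    copy_part(out->language, name, n);
    name += n;

    if (*name == '_')
    {
        name++;
        n = strcspn(name, ".@");
        copy_part(out->territory, name, n);
        name += n;
    }
    if (*name == '.')
    {
        name++;
        n = strcspn(name, "@");
        copy_part(out->codeset, name, n);
        name += n;
    }
    if (*name == '@')
        copy_part(out->modifier, name + 1, strlen(name + 1));
}
```

A name that does not follow the convention, such as "C" or "POSIX", simply ends up entirely in the `language` field, so a caller can still attempt a lookup on the unparsed name first.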

>> An alternative is to try to match the full locale (es_ES) and then try
>> the language (es) if that wasn't found.  That would leave room to put
>> country-by-country exceptions in, but for the moment we'd not have any.
>
> Can anyone point to a real world example where country by country would 
> make sense? If we need to distinguish flavors of some languages, I would 
> not be at all surprised if this was not by country anyway.

pt_BR versus pt_PT.  I'm not sure if it makes a difference to a stemmer,
but maybe to a thesaurus it does ...

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.


Re: suitable text search configuration

From
Tom Lane
Date:
Alvaro Herrera <alvherre@commandprompt.com> writes:
> Andrew Dunstan wrote:
>> Can anyone point to a real world example where country by country would 
>> make sense? If we need to distinguish flavors of some languages, I would 
>> not be at all surprised if this was not by country anyway.

> pt_BR versus pt_PT.  I'm not sure if it makes a difference to a stemmer,
> but maybe to a thesaurus it does ...

Right, but only when we have built-in dictionaries that separately
address the two countries will there be any need to teach initdb about
it.  I think we should KISS for now.
        regards, tom lane


Re: suitable text search configuration

From
Alvaro Herrera
Date:
Tom Lane wrote:
> Alvaro Herrera <alvherre@commandprompt.com> writes:
> > ... oh, I see there's a table in initdb.c
> 
> > Are we supposed to add entries to it, one for each country?  I'm
> > wondering if we should try to match the part before the _ using just the
> > language, if the complete match fails.  (i.e. match "es_CL" using just
> > "es", "fr_CA" using just "fr", etc).
> 
> Actually, looking at the examples so far, I'm thinking we should just
> consider the string up to the first _, period.

I found that there is an ISO spec for "cultural elements", ISO/IEC
15897, a working draft for which can be found at
http://www.open-std.org/jtc1/sc22/open/n3586.pdf

Chapter 13 talks about naming of locales.

I think glibc is supposed to follow this standard.

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


Re: suitable text search configuration

From
Tom Lane
Date:
Have we got consensus that initdb should just look at the first
component of the locale name to choose a text search configuration
(at least for 8.3)?  If so, who's going to make the change?
I can do it but don't want to duplicate effort if someone else
was already on it.
        regards, tom lane


Re: suitable text search configuration

From
Alvaro Herrera
Date:
Tom Lane wrote:
> Have we got consensus that initdb should just look at the first
> component of the locale name to choose a text search configuration
> (at least for 8.3)?  If so, who's going to make the change?
> I can do it but don't want to duplicate effort if someone else
> was already on it.

Thanks, it works wonderfully for me now.

-- 
Alvaro Herrera                 http://www.amazon.com/gp/registry/CTMLCN8V17R4
"Not even a very great genius would get very far
if he had to draw everything from within himself" (Goethe)