Re: How does the tsearch configuration get selected?

From: Bruce Momjian
Subject: Re: How does the tsearch configuration get selected?
Msg-id: 200706151415.l5FEFVG29817@momjian.us
In reply to: Re: How does the tsearch configuration get selected?  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses: Re: How does the tsearch configuration get selected?  (Bruce Momjian <bruce@momjian.us>)
Re: How does the tsearch configuration get selected?  (Teodor Sigaev <teodor@sigaev.ru>)
List: pgsql-hackers
Tom Lane wrote:
> Bruce Momjian <bruce@momjian.us> writes:
> > First, why are we specifying the server locale here since it never
> > changes:
> 
> It's poorly described.  What it should really say is the language
> that the text-to-be-searched is in.  We can actually support multiple
> languages here today, the restriction being that there have to be
> stemmer instances for the languages with the database encoding you're
> using.  With UTF8 encoding this isn't much of a restriction.  We do need
> to put code into the dictionary stuff to enforce that you can't use a
> stemmer when the database encoding isn't compatible with it.
> 
> I would prefer that we not drive any of this stuff off the server's
> LC_xxx settings, since as you say that restricts things to just one
> locale.

The idea they had was to set the _default_ full text configuration to
match the locale, e.g. en_US.UTF-8.  This works well when we ship a
number of pre-installed full text configurations in pg_catalog.  But of
course a single encoding/locale can hold text in several languages, so
you need the ability to use other languages too, just not necessarily
by default.

> > Second, I can't figure out how to reference a non-default
> > configuration.
> 
> See the multi-argument versions of to_tsvector etc.
> 
> I do see a problem with having to_tsvector(config, text) plus
> to_tsvector(text) where the latter implicitly references a config
> selected by a GUC variable: how can you tell whether a query using the
> latter matches a particular index using the former?  There isn't
> anything in the current planner mechanisms that would make that work.
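A toy model makes the planner problem concrete. It assumes, for illustration only, that an index is matched to a query by literal comparison of expression trees (modeled here as Python tuples); `normalize` and `matches_index` are hypothetical names, not planner internals. Folding the GUC value into the one-argument form lets the comparison succeed, but only by making the answer depend on a session setting.

```python
# Toy model (hypothetical): expressions are tuples, e.g.
# ("to_tsvector", "english", "body") stands for to_tsvector('english', body)
# and ("to_tsvector", "body") for the GUC-dependent one-argument form.

def normalize(expr, current_config):
    """Rewrite a one-argument to_tsvector call into the two-argument form
    using the configuration currently selected by the (hypothetical) GUC."""
    if expr[0] == "to_tsvector" and len(expr) == 2:
        return ("to_tsvector", current_config, expr[1])
    return expr


def matches_index(query_expr, index_expr, current_config=None):
    """Literal comparison, optionally after folding in the GUC value."""
    if current_config is None:
        return query_expr == index_expr
    return normalize(query_expr, current_config) == index_expr


index_expr = ("to_tsvector", "english", "body")
query_expr = ("to_tsvector", "body")

print(matches_index(query_expr, index_expr))             # False: forms differ
print(matches_index(query_expr, index_expr, "english"))  # True
print(matches_index(query_expr, index_expr, "russian"))  # False
```

The last two calls differ only in the session setting, which is exactly the dependency the planner has no way to track.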

Well, now that I have gotten feedback, we have a few options:

1)  Require the configuration to be always specified.  The problem with
this is that casting (::tsquery) and operators (@@) have no way to
specify a configuration.

2)  Use a GUC that you can set for the configuration, and perhaps
default it to match the locale where possible.  Is the default affected
by search_path (ouch)?

How do we make sure that any index that is accessed is using the same
configuration that is being used by the query, e.g. ::tsquery?  Do we
have to store the configuration name in the index and somehow throw an
error if it doesn't match?  What about changes to the configuration
after the index has been created, e.g. new stop words or dictionaries?
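One way to answer both questions is sketched below, as a hypothetical Python model rather than anything in the tsearch code: the index records the configuration name plus a fingerprint (here a SHA-256 hash) of the configuration's contents at build time, and every access compares both against the configuration the query would use. `TextSearchConfig`, `TsIndex`, `check_index`, and `docs_fts_idx` are invented for illustration.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class TextSearchConfig:
    """Hypothetical stand-in for a full text configuration."""
    name: str
    dictionaries: tuple
    stop_words: frozenset

    def fingerprint(self):
        # Hash the contents so later edits (new stop words, dictionaries)
        # change the value even when the name stays the same.
        payload = repr((self.name, self.dictionaries, sorted(self.stop_words)))
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()


@dataclass(frozen=True)
class TsIndex:
    """Hypothetical index metadata recorded at CREATE INDEX time."""
    name: str
    config_name: str
    config_fingerprint: str


class ConfigMismatch(Exception):
    pass


def build_index(name, config):
    return TsIndex(name, config.name, config.fingerprint())


def check_index(index, config):
    """Raise if the query's configuration differs from the index's."""
    if index.config_name != config.name:
        raise ConfigMismatch(
            f"index {index.name} uses {index.config_name}, query uses {config.name}")
    if index.config_fingerprint != config.fingerprint():
        raise ConfigMismatch(
            f"configuration {config.name} changed since {index.name} was built")


en = TextSearchConfig("english", ("english_stem",), frozenset({"a", "the"}))
idx = build_index("docs_fts_idx", en)
check_index(idx, en)  # same name, same contents: passes silently

en_more_stops = TextSearchConfig("english", ("english_stem",),
                                 frozenset({"a", "the", "of"}))
try:
    check_index(idx, en_more_stops)
except ConfigMismatch as e:
    print(e)  # prints: configuration english changed since docs_fts_idx was built
```

Checking the name alone answers the first question; the fingerprint is one possible answer to the second, at the cost of refusing to use an index until it is rebuilt after any configuration change.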

The two big open issues are whether we allow a default configuration,
and whether we require the configuration name to be always specified.

My guess right now is that we use a GUC that defaults to the pg_catalog
configuration whose name matches the lc_ctype locale name, if there is
one, and that we throw an error if the GUC value in effect when an
accessed index was created doesn't match the current GUC value.
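The default-selection half of this proposal is small enough to sketch. The Python below is a toy model, not PostgreSQL code: `CATALOG_CONFIGS`, `resolve_config`, and the configuration names are assumptions for illustration, and failing when no name matches is one choice among several (falling back to a generic configuration would be another).

```python
# Hypothetical catalog contents: configuration names shipped in pg_catalog,
# named after the locales they are meant for.
CATALOG_CONFIGS = frozenset({"en_US.UTF-8", "ru_RU.UTF-8"})


def resolve_config(guc_value, lc_ctype, catalog=CATALOG_CONFIGS):
    """Return the configuration a query without an explicit one would use.

    guc_value: the (hypothetical) GUC as set by the user, or None if unset.
    lc_ctype:  the server's LC_CTYPE setting, e.g. "ru_RU.UTF-8".
    """
    if guc_value is not None:
        # An explicit setting always wins, whatever the locale.
        return guc_value
    if lc_ctype in catalog:
        # Default: the pg_catalog configuration named after lc_ctype.
        return lc_ctype
    # No match (e.g. lc_ctype "C"): make the user pick one explicitly.
    raise LookupError(f"no pg_catalog configuration matches lc_ctype {lc_ctype!r}")


print(resolve_config(None, "ru_RU.UTF-8"))           # ru_RU.UTF-8
print(resolve_config("en_US.UTF-8", "ru_RU.UTF-8"))  # en_US.UTF-8
```

Under this model, whatever `resolve_config` returns at CREATE INDEX time is the value the index would record and later compare against the current setting.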

So we would create a pg_catalog full text configuration named
en_US.UTF-8, and some others like ru_RU.UTF-8.

-- 
  Bruce Momjian  <bruce@momjian.us>          http://momjian.us
  EnterpriseDB                               http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

