Re: Clarification of the "simple" dictionary

From	Oleg Bartunov
Subject	Re: Clarification of the "simple" dictionary
Date	22 July 2010, 17:44:52
Msg-id	Pine.LNX.4.64.1007222140470.32129@sn.sai.msu.ru
In reply to	Re: Clarification of the "simple" dictionary (Andreas Joseph Krogh <andreak@officenet.no>)
Responses	Re: Clarification of the "simple" dictionary (Andreas Joseph Krogh <andreak@officenet.no>) Re: Clarification of the "simple" dictionary (John Gage <jsmgage@numericable.fr>)
List	pgsql-general

Don't guess, read the docs:
http://www.postgresql.org/docs/8.4/interactive/textsearch-dictionaries.html#TEXTSEARCH-SIMPLE-DICTIONARY

12.6.2. Simple Dictionary

The simple dictionary template operates by converting the input token to lower case and checking it against a file of
stopwords. If it is found in the file then an empty array is returned, causing the token to be discarded. If not, the
lower-cased form of the word is returned as the normalized lexeme. Alternatively, the dictionary can be configured to
report non-stop-words as unrecognized, allowing them to be passed on to the next dictionary in the list.
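
A rough sketch of the behaviour that section describes, in case it helps. The dictionary name public.simple_en is made up for illustration, and the example assumes the stock english stopword file shipped with PostgreSQL is installed:

```sql
-- Hypothetical dictionary name; built on the simple template with the
-- english stopword file (english.stop in the tsearch_data directory).
CREATE TEXT SEARCH DICTIONARY public.simple_en (
    TEMPLATE = pg_catalog.simple,
    STOPWORDS = english
);

-- A stopword is found in the file: an empty array comes back and the
-- token is discarded.
SELECT ts_lexize('public.simple_en', 'The');

-- Any other word comes back lower-cased as the normalized lexeme.
SELECT ts_lexize('public.simple_en', 'YeS');

-- With Accept = false, non-stop-words are reported as unrecognized (NULL)
-- and are passed on to the next dictionary in the list.
ALTER TEXT SEARCH DICTIONARY public.simple_en ( Accept = false );
SELECT ts_lexize('public.simple_en', 'YeS');
```

Note that Accept = false only makes sense when another dictionary follows this one in the configuration's mapping; otherwise non-stop-words would simply be dropped.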

d=# \dFd+ simple
                                           List of text search dictionaries
   Schema   |  Name  |     Template      | Init options |                        Description
------------+--------+-------------------+--------------+-----------------------------------------------------------
 pg_catalog | simple | pg_catalog.simple |              | simple dictionary: just lower case and check for stopword

By default it has no Init options, so it doesn't check for stopwords.
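
One way to check this directly, instead of trying many phrases, is to ask the dictionary itself. This is only a small sketch and the sample words are arbitrary:

```sql
-- The built-in simple dictionary has no stopword file configured, so even
-- a common English stopword is just lower-cased and returned as a lexeme.
SELECT ts_lexize('simple', 'The');

-- Look at the init options stored in the catalog; an empty (NULL) value
-- means no STOPWORDS file is set.
SELECT dictname, dictinitoption
FROM pg_ts_dict
WHERE dictname = 'simple';

-- With the built-in "simple" text search configuration, words such as
-- "the" and "and" therefore stay in the resulting tsvector.
SELECT to_tsvector('simple', 'The cat and the hat');
```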

On Thu, 22 Jul 2010, Andreas Joseph Krogh wrote:

> On 07/22/2010 06:27 PM, John Gage wrote:
>> The easiest way to look at this is to give the simple dictionary a document
>> with to_tsvector() and see if stopwords pop out.
>>
>> In my experience they do.  In my experience, the simple dictionary just
>> breaks the document down into the space etc. separated words in the
>> document.  It doesn't analyze further.
>
> That's my experience too, I just want to make sure it doesn't actually have
> any stopwords which I've missed. Trying many phrases and checking for
> stopwords isn't really proving anything.
>
> Can anybody confirm the "simple" dict. only lowercases the words and
> "uniques" them?
>
>

     Regards,
         Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83
