Re: TSearch2: Problems with compound words and stop words

From: Timo Haberkern
Subject: Re: TSearch2: Problems with compound words and stop words
Date:
Msg-id: 418B46BE.6@emedia-office.de
In reply to: Re: TSearch2: Problems with compound words and stop words (Oleg Bartunov <oleg@sai.msu.su>)
Replies: Re: TSearch2: Problems with compound words and stop words (Oleg Bartunov <oleg@sai.msu.su>)
List: pgsql-general

Oleg,

I use TSearch2 with PostgreSQL 7.4.6, and I applied the compound-word
patch yesterday. The configuration changed a little, but the result
is the same: I get no compound words. I'm using the locale de_DE with
encoding ISO8859-1 for the database.

I think ispell is working correctly, except for compound words. If I try

SELECT lexize('de_ispell', 'springt')

I get

lexize
{springen,springen}

which seems correct.


But a SELECT lexize('de_ispell', 'Autobahn')

results in

lexize
{autobahn}

I would expect {auto,bahn,autobahn}.
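To make the expected result concrete, here is a toy, dictionary-driven compound splitter. It is only a sketch of the output being asked for, not how the tsearch2 ispell dictionary decides compounds (that is driven by the .dict/.aff data). The lexicon, the minimum part length and the function names are hypothetical.

```python
from functools import lru_cache

# Hypothetical toy lexicon; a real setup would use the ispell .dict data.
LEXICON = frozenset({"auto", "bahn", "fehler", "meldung", "haus"})
MIN_PART = 3  # assumption: ignore fragments shorter than this


def split_compound(word, lexicon=LEXICON, min_part=MIN_PART):
    """Return one decomposition of `word` into lexicon entries, or None."""
    word = word.lower()

    @lru_cache(maxsize=None)
    def solve(start):
        if start == len(word):
            return ()  # consumed the whole word
        # Try the longest matching prefix first, down to min_part chars.
        for end in range(len(word), start + min_part - 1, -1):
            piece = word[start:end]
            if piece in lexicon:
                rest = solve(end)
                if rest is not None:
                    return (piece,) + rest
        return None  # no split works from this position

    return solve(0)


def lexize_compound(word, lexicon=LEXICON):
    """Return the parts plus the whole word, or just the word if no split."""
    parts = split_compound(word, lexicon)
    if parts is None or len(parts) < 2:
        return [word.lower()]
    return list(parts) + [word.lower()]
```

With this toy lexicon, `lexize_compound("Autobahn")` gives `["auto", "bahn", "autobahn"]`, matching the {auto,bahn,autobahn} expectation, while a word with no split, such as "Haus", comes back unchanged as `["haus"]`.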

The new pg_ts_dict configuration after applying the compound-word patch:


dict_name | dict_init | dict_initoption | dict_lexize | dict_comment

simple | dex_init(text) | NULL | dex_lexize(internal,internal,integer) | Simple example of dictionary.
en_stem | snb_en_init(text) | /usr/local/pgsql/share/contrib/english.stop | snb_lexize(internal,internal,integer) | English Stemmer. Snowball.
ru_stem | snb_ru_init(text) | /usr/local/pgsql/share/contrib/russian.stop | snb_lexize(internal,internal,integer) | Russian Stemmer. Snowball.
ispell_template | spell_init(text) | NULL | spell_lexize(internal,internal,integer) | ISpell interface. Must have .dict and .aff files
synonym | syn_init(text) | NULL | syn_lexize(internal,internal,integer) | Example of synonym dictionary
de_ispell | spell_init(text) | DictFile="/usr/local/pgsql/share/contrib/dictonary/german_comb.dict", AffFile="/usr/local/pgsql/share/contrib/dictonary/german_comb.aff", StopFile="/usr/local/pgsql/share/contrib/dictonary/german.stop" | spell_lexize(internal,internal,integer) | NULL
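One way to double-check which files the de_ispell entry actually points at is to split its dict_initoption string into key/value pairs and test whether each path exists on the database host. A minimal sketch, assuming the comma-separated key="value" layout shown above and values without embedded double quotes; both helper names are hypothetical, and this is not the parser tsearch2 itself uses.

```python
import os
import re

# Matches key="value" pairs such as DictFile="/path/file.dict".
PAIR_RE = re.compile(r'(\w+)\s*=\s*"([^"]*)"')


def parse_init_option(option):
    """Split a dict_initoption string into an ordered {key: value} dict."""
    return dict(PAIR_RE.findall(option))


def report_missing_files(option):
    """Return the keys whose configured file is not a regular file here."""
    return [key for key, path in parse_init_option(option).items()
            if not os.path.isfile(path)]
```

Running `report_missing_files` on the de_ispell option string from the table is a quick way to spot a mistyped path before suspecting the dictionary data itself.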



Timo


Oleg Bartunov wrote:

> Timo,
>
> please check that you applied the patch for compound-word support.
> Which version of PostgreSQL are you using?
> Does the ispell dictionary work for non-compound words?
>
>     Oleg
>
> On Fri, 5 Nov 2004, Timo Haberkern wrote:
>
>> Hi there,
>>
>> I have some trouble with my TSearch2 installation. I did the
>> installation as described in
>> http://www.sai.msu.su/~megera/oddmuse/index.cgi/Tsearch_V2_compound_words
>>
>> I used the German myspell dictionary from
>> http://lingucomponent.openoffice.org/spell_dic.html and converted it
>> with my2ispell.
>> Nearly everything is working fine so far, except for two problems:
>>
>> 1.) The stop-word file seems to be ignored. If I try
>> SELECT to_tsvector('default_german', 'ein Haus') I get 'ein':1 'haus':2
>>
>> 'ein' should be a stop word for German (and it is defined in the
>> german.stop file as well).
>>
>> 2.) The compound-word feature doesn't work either. I have tried a lot
>> of words, e.g. 'Fehlermeldung' with
>> SELECT to_tsvector('default_german', 'Fehlermeldung'),
>> but I only get 'fehlermeldung':1, whereas I would expect 'fehler' and
>> 'meldung' as separate entries. Is there anything wrong with the
>> dictionary or my configuration?
>>
>>
>> My current configuration:
>>
>> pg_ts_cfg:
>>
>> default    default    C
>> default_russian    default    ru_RU.KOI8-R
>> simple    default    NULL
>> default_german    default    de_DE.ISO8859-1
>>
>> pg_ts_cfgmap:
>>
>> default_german    host    {simple}
>> default_german    hword    {simple}
>> default_german    int    {simple}
>> default_german    nlhword    {simple}
>> default_german    nlpart_hword    {simple}
>> default_german    nlword    {simple}
>> default_german    part_hword    {simple}
>> default_german    sfloat    {simple}
>> default_german    uint    {simple}
>> default_german    uri    {simple}
>> default_german    url    {simple}
>> default_german    version    {simple}
>> default_german    word    {simple}
>> default_german    lpart_hword    {de_ispell,german_snowball}
>> default_german    lword    {de_ispell,german_snowball}
>> default_german    lhword    {de_ispell,german_snowball}
>>
>>
>> pg_ts_dict:
>>
>> de_ispell | 17166 | DictFile="/usr/local/pgsql/share/contrib/dictonary/german.dict", AffFile="/usr/local/pgsql/share/contrib/dictonary/german.aff", StopFile="/usr/local/pgsql/share/contrib/dictonary/german.stop" | 17167 | NULL
>> german_snowball | 17357 | NULL | 17162 | Snowball stemmer for german
>>
>>
>>
>> Can anyone help me?
>>
>> regards
>>
>> Timo
>>
>>
>> ---------------------------(end of broadcast)---------------------------
>> TIP 4: Don't 'kill -9' the postmaster
>>
>
>     Regards,
>         Oleg
> _____________________________________________________________
> Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
> Sternberg Astronomical Institute, Moscow University (Russia)
> Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
> phone: +007(095)939-16-83, +007(095)939-23-83
>
>
>
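The quoted pg_ts_cfgmap sends lword, lhword and lpart_hword tokens to {de_ispell,german_snowball}. Below is a minimal sketch of that chaining, assuming the tsearch2 convention that a dictionary returns NULL (here None) for an unknown word, an empty array for a stop word, and lexemes otherwise, and that positions keep counting past dropped stop words. Both dictionaries are toy stand-ins with hypothetical names. The sketch shows one way the first symptom could arise, offered as a possibility rather than a confirmed diagnosis: if de_ispell does not report 'ein' as a stop word, the snowball fallback (whose dict_initoption is NULL, so it has no stop file) accepts it.

```python
from typing import Callable, Dict, List, Optional, Set

# A dictionary returns None for an unknown word, [] for a stop word,
# or a list of normalized lexemes.
Dictionary = Callable[[str], Optional[List[str]]]


def make_ispell_like(known: Dict[str, List[str]], stop_words: Set[str]) -> Dictionary:
    """Toy stand-in for de_ispell: a fixed word table plus a stop list."""
    def lexize(word: str) -> Optional[List[str]]:
        w = word.lower()
        if w in stop_words:
            return []          # recognized as a stop word
        return known.get(w)    # None if the word is unknown
    return lexize


def snowball_like(word: str) -> Optional[List[str]]:
    """Toy stand-in for german_snowball: accepts every word, no real stemming."""
    return [word.lower()]


def toy_to_tsvector(text: str, chain: List[Dictionary]) -> Dict[str, List[int]]:
    """Map each lexeme to its word positions, trying dictionaries in order."""
    vector: Dict[str, List[int]] = {}
    for pos, token in enumerate(text.split(), start=1):
        for dictionary in chain:
            result = dictionary(token)
            if result is None:
                continue           # not recognized: ask the next dictionary
            for lexeme in result:  # an empty list emits nothing (stop word)
                vector.setdefault(lexeme, []).append(pos)
            break                  # the first dictionary that answers wins
        # A token that no dictionary recognizes is simply dropped here.
    return vector


if __name__ == "__main__":
    with_stop = make_ispell_like({"haus": ["haus"]}, {"ein"})
    without_stop = make_ispell_like({"haus": ["haus"]}, set())
    print(toy_to_tsvector("ein Haus", [with_stop, snowball_like]))     # {'haus': [2]}
    print(toy_to_tsvector("ein Haus", [without_stop, snowball_like]))  # {'ein': [1], 'haus': [2]}
```

The second call reproduces the reported 'ein':1 'haus':2, which suggests checking whether de_ispell really loads german.stop, for example with lexize('de_ispell', 'ein').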
