Re: TSearch2: Problems with compound words and stop words
From | Timo Haberkern |
---|---|
Subject | Re: TSearch2: Problems with compound words and stop words |
Date | |
Msg-id | 418B46BE.6@emedia-office.de |
In reply to | Re: TSearch2: Problems with compound words and stop words (Oleg Bartunov <oleg@sai.msu.su>) |
Replies | Re: TSearch2: Problems with compound words and stop words (Oleg Bartunov <oleg@sai.msu.su>) |
List | pgsql-general |
Oleg,

I use TSearch2 with PostgreSQL 7.4.6, and I applied the compound word patch yesterday. The configuration changed a little, but the result is the same: I get no compound words. The database uses the de_DE locale with ISO8859-1 encoding.

I think ispell is working correctly except for compound words. If I try

    SELECT lexize('de_ispell', 'springt');

I get {springen,springen}, which seems correct. But

    SELECT lexize('de_ispell', 'Autobahn');

returns {autobahn}, where I would expect {auto,bahn,autobahn}.

The new pg_ts_dict configuration after the compound word patch:

dict_name | dict_init | dict_initoption | dict_lexize | dict_comment
---|---|---|---|---
simple | dex_init(text) | NULL | dex_lexize(internal,internal,integer) | Simple example of dictionary.
en_stem | snb_en_init(text) | /usr/local/pgsql/share/contrib/english.stop | snb_lexize(internal,internal,integer) | English Stemmer. Snowball.
ru_stem | snb_ru_init(text) | /usr/local/pgsql/share/contrib/russian.stop | snb_lexize(internal,internal,integer) | Russian Stemmer. Snowball.
ispell_template | spell_init(text) | NULL | spell_lexize(internal,internal,integer) | ISpell interface. Must have .dict and .aff files
synonym | syn_init(text) | NULL | syn_lexize(internal,internal,integer) | Example of synonym dictionary
de_ispell | spell_init(text) | DictFile="/usr/local/pgsql/share/contrib/dictonary/german_comb.dict", AffFile="/usr/local/pgsql/share/contrib/dictonary/german_comb.aff", StopFile="/usr/local/pgsql/share/contrib/dictonary/german.stop" | spell_lexize(internal,internal,integer) | NULL

Timo

Oleg Bartunov wrote:
> Timo,
>
> please check that you applied the patch for compound word support.
> What version of PostgreSQL are you using?
> Does the ispell dictionary work for non-compound words?
>
> Oleg
>
> On Fri, 5 Nov 2004, Timo Haberkern wrote:
>
>> Hi there,
>>
>> I have some trouble with my TSearch2 installation.
>> I did the installation as described in
>> http://www.sai.msu.su/~megera/oddmuse/index.cgi/Tsearch_V2_compound_words
>>
>> I used the German MySpell dictionary from
>> http://lingucomponent.openoffice.org/spell_dic.html and converted it
>> with my2ispell.
>>
>> Nearly everything is working fine so far, except for two problems:
>>
>> 1.) The stop word file seems to be ignored. If I try
>> SELECT to_tsvector('default_german', 'ein Haus') I get 'ein':1 'haus':2
>>
>> "ein" should be a stop word for German (and it is defined in the
>> german.stop file as well).
>>
>> 2.) The compound word feature doesn't work either. I have tried a lot
>> of words, e.g. 'Fehlermeldung' with
>> SELECT to_tsvector('default_german', 'Fehlermeldung'), but I only get
>> 'fehlermeldung':1, whereas I would expect 'fehler' and 'meldung' as
>> separate entries. Is there anything wrong with the dictionary or my
>> configuration?
>>
>> My current configuration:
>>
>> pg_ts_cfg:
>>
>> default          default  C
>> default_russian  default  ru_RU.KOI8-R
>> simple           default  NULL
>> default_german   default  de_DE.ISO8859-1
>>
>> pg_ts_cfgmap:
>>
>> default_german  host          {simple}
>> default_german  hword         {simple}
>> default_german  int           {simple}
>> default_german  nlhword       {simple}
>> default_german  nlpart_hword  {simple}
>> default_german  nlword        {simple}
>> default_german  part_hword    {simple}
>> default_german  sfloat        {simple}
>> default_german  uint          {simple}
>> default_german  uri           {simple}
>> default_german  url           {simple}
>> default_german  version       {simple}
>> default_german  word          {simple}
>> default_german  lpart_hword   {de_ispell,german_snowball}
>> default_german  lword         {de_ispell,german_snowball}
>> default_german  lhword        {de_ispell,german_snowball}
>>
>> pg_ts_dict:
>>
>> de_ispell | 17166 |
>>   DictFile="/usr/local/pgsql/share/contrib/dictonary/german.dict",
>>   AffFile="/usr/local/pgsql/share/contrib/dictonary/german.aff",
>>   StopFile="/usr/local/pgsql/share/contrib/dictonary/german.stop" | 17167 | NULL
>> german_snowball | 17357 | NULL | 17162 | Snowball stemmer for german
>>
>> Can anyone help me?
>>
>> regards
>>
>> Timo
>
> Regards,
> Oleg
> _____________________________________________________________
> Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
> Sternberg Astronomical Institute, Moscow University (Russia)
> Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
> phone: +007(095)939-16-83, +007(095)939-23-83
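Both symptoms depend on how a token moves through the dictionary list in pg_ts_cfgmap (here {de_ispell,german_snowball}). The sketch below is a toy Python model of that chain, based on our understanding of the TSearch2 lexize() convention. In that convention, NULL means "unknown, try the next dictionary", an empty array means "stop word, drop it", and a non-empty array is the list of lexemes. The word list, the stop-word set, `make_toy_ispell` and `simple` are all hypothetical stand-ins. The naive two-part split is only an illustration of the expected {auto,bahn,autobahn} output and is not the algorithm used by the ISpell compound-word patch.

```python
from typing import Callable, Dict, List, Optional, Set

# Toy model of lexize() results (our reading of the TSearch2 convention):
#   None          -> token unknown to this dictionary; try the next one
#   []            -> token known but a stop word; drop it
#   [lexeme, ...] -> token recognised; emit these lexemes
Dictionary = Callable[[str], Optional[List[str]]]


def make_toy_ispell(words: Set[str], stop_words: Set[str]) -> Dictionary:
    """Hypothetical stand-in for de_ispell.

    Honours a stop-word set and splits a token into two known words with a
    naive search. This is NOT how the ISpell compound-word patch works.
    """
    def lexize(token: str) -> Optional[List[str]]:
        if token in stop_words:
            return []
        parts: List[str] = []
        # Try every split point; keep the first where both halves are known.
        for i in range(1, len(token)):
            head, tail = token[:i], token[i:]
            if head in words and tail in words:
                parts = [head, tail]
                break
        if parts or token in words:
            # Emit the parts plus the whole word, e.g. auto, bahn, autobahn.
            return parts + [token]
        return None
    return lexize


def simple(token: str) -> Optional[List[str]]:
    """Hypothetical catch-all: accepts every token unchanged."""
    return [token]


def to_tsvector(text: str, chain: List[Dictionary]) -> Dict[str, List[int]]:
    """Map each lexeme to its word positions, consulting `chain` in order."""
    vector: Dict[str, List[int]] = {}
    for pos, token in enumerate(text.lower().split(), start=1):
        for lexize in chain:
            result = lexize(token)
            if result is None:
                continue  # unknown here: ask the next dictionary
            for lexeme in result:
                vector.setdefault(lexeme, []).append(pos)
            break  # recognised (or dropped as a stop word): stop the chain
    return vector


if __name__ == "__main__":
    de_toy = make_toy_ispell({"auto", "bahn", "haus", "fehler", "meldung"}, {"ein"})
    chain = [de_toy, simple]
    print(to_tsvector("ein Haus", chain))  # {'haus': [2]}
    print(to_tsvector("Autobahn", chain))  # {'auto': [1], 'bahn': [1], 'autobahn': [1]}
```

Here `simple` stands in for the last entry in the chain. In the configuration above that entry is german_snowball, which stems words rather than passing them through unchanged. With `chain=[simple]` alone, 'ein' stays at position 1. That matches the shape of symptom 1: if the first dictionary never reports a word as a stop word, a catch-all later in the chain indexes it. The model does not explain why the real de_ispell behaves this way.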