Re: TSearch2: Problems with compound words and stop words
From | Oleg Bartunov |
---|---|
Subject | Re: TSearch2: Problems with compound words and stop words |
Date | |
Msg-id | Pine.GSO.4.61.0411171751010.18871@ra.sai.msu.su |
In reply to | Re: TSearch2: Problems with compound words and stop words (Timo Haberkern <thaberkern@emedia-office.de>) |
Responses | Re: TSearch2: Problems with compound words and stop words (Oleg Bartunov <oleg@sai.msu.su>) |
List | pgsql-general |
On Wed, 17 Nov 2004, Timo Haberkern wrote:
> Sorry for the late answer, I was on holiday.
>
> See my remarks below.
>
> Oleg Bartunov wrote:
>
>> On Fri, 5 Nov 2004, Timo Haberkern wrote:
>>
>>> Oleg,
>>>
>>> I use TSearch2 with PostgreSQL 7.4.6 and I applied the compound word patch
>>> yesterday. The configuration changed a little bit, but the result is the
>>> same: I get no compound words. I'm using the locale de_DE with encoding
>>> ISO8859-1 for the database.
>>>
>>> I think ispell is working correctly except for the compound words. If I try
>>>
>>> SELECT lexize('de_ispell', 'springt')
>>>
>>> I get
>>>
>>> lexize
>>> {springen,springen}
>>>
>>> which seems correct.
>>>
>>> But SELECT lexize('de_ispell', 'Autobahn')
>>>
>>> results in
>>>
>>> lexize
>>> {autobahn}
>>>
>>> I would expect {auto,bahn,autobahn}
>>
>> Hmm, have you checked 'Autobahn' in the ispell dictionary? Does the
>> dictionary you used support the 'Z' flag for compound words?
>
> Autobahn is in the ispell dictionary. What does an ispell dictionary need
> to support the Z flag???
>

Try ispell -C Autobahn and search for 'compound' in 'man ispell' for details.
The problem exists only if ispell *does* split the word correctly while
tsearch2 doesn't. You should find a correct ispell dictionary for German or
create it yourself.
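A minimal sketch of the ispell-side check Oleg suggests. It assumes ispell is installed with a compiled German hash file; the dictionary name `german` and the `ISPELL_DICT` variable are assumptions, not part of the thread. The script runs ispell in pipe mode (`-a`) with run-together compounds allowed (`-C`) and labels each result line by its leading character, following the codes described in `man ispell`: `*` found, `+` found via affix removal, `-` accepted as a compound, and `&`, `?` or `#` not found. If ispell accepts a word as a compound while `lexize('de_ispell', ...)` returns it unsplit, the problem is on the tsearch2 side; otherwise the dictionary itself is the likely cause.

```shell
#!/bin/sh
# Sketch only: ask ispell whether it accepts a word, allowing run-together
# compounds. Assumes ispell is installed and a German hash file is available
# under the (hypothetical) name given by ISPELL_DICT (default: "german").

# Translate the leading character of an "ispell -a" result line
# into a short description (codes as documented in man ispell).
classify() {
  case "$1" in
    '*'*) echo "found in dictionary" ;;
    '+'*) echo "found via affix removal" ;;
    '-'*) echo "accepted as compound" ;;
    '&'*|'?'*|'#'*) echo "not found" ;;
    *) echo "unrecognized: $1" ;;
  esac
}

# Check one word in pipe mode (-a) with compounds enabled (-C).
# The first output line is ispell's version banner, so it is skipped;
# empty lines only separate results and are ignored.
check_word() {
  printf '%s\n' "$1" \
    | ispell -a -C -d "${ISPELL_DICT:-german}" \
    | tail -n +2 \
    | while IFS= read -r line; do
        if [ -n "$line" ]; then
          printf '%s: %s\n' "$1" "$(classify "$line")"
        fi
      done
}
```

For example, run `check_word Fehlermeldung` and `check_word Autobahn`, then compare with `SELECT lexize('de_ispell', 'Fehlermeldung');` in psql. Note that ispell only tries compound formation for words it cannot find whole, so a word that is itself in the dictionary (as Timo reports for Autobahn) is expected to come back as found rather than as a compound.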
You may consult monzilla.net: http://staff.science.uva.nl/~christof/monzilla/research/project-dr.html

> Timo
>
>>
>>> The new configuration after the compound word patch:
>>
>> Seems you overestimate my capabilities :)
>>
>>> dict_name | dict_init | dict_initoption | dict_lexize | dict_comment
>>> simple | dex_init(text) | NULL | dex_lexize(internal,internal,integer) | Simple example of dictionary.
>>> en_stem | snb_en_init(text) | /usr/local/pgsql/share/contrib/english.stop | snb_lexize(internal,internal,integer) | English Stemmer. Snowball.
>>> ru_stem | snb_ru_init(text) | /usr/local/pgsql/share/contrib/russian.stop | snb_lexize(internal,internal,integer) | Russian Stemmer. Snowball.
>>> ispell_template | spell_init(text) | NULL | spell_lexize(internal,internal,integer) | ISpell interface. Must have .dict and .aff files
>>> synonym | syn_init(text) | NULL | syn_lexize(internal,internal,integer) | Example of synonym dictionary
>>> de_ispell | spell_init(text) |
>>> DictFile="/usr/local/pgsql/share/contrib/dictonary/german_comb.dict",
>>> AffFile="/usr/local/pgsql/share/contrib/dictonary/german_comb.aff",
>>> StopFile="/usr/local/pgsql/share/contrib/dictonary/german.stop"
>>> | spell_lexize(internal,internal,integer) | NULL
>>>
>>> Timo
>>>
>>> Oleg Bartunov wrote:
>>>
>>>> Timo,
>>>>
>>>> please check that you applied the patch for compound word support.
>>>> What version of PostgreSQL are you using?
>>>> Does the ispell dictionary work for non-compound words?
>>>>
>>>> Oleg
>>>>
>>>> On Fri, 5 Nov 2004, Timo Haberkern wrote:
>>>>
>>>>> Hi there,
>>>>>
>>>>> I have some trouble with my TSearch2 installation. I did the
>>>>> installation as described in
>>>>> http://www.sai.msu.su/~megera/oddmuse/index.cgi/Tsearch_V2_compound_words
>>>>> I used the German myspell dictionary from
>>>>> http://lingucomponent.openoffice.org/spell_dic.html and converted it
>>>>> with my2ispell.
>>>>>
>>>>> Nearly everything is working fine so far, except for two problems:
>>>>>
>>>>> 1.) The stopword file seems to be ignored: if I try
>>>>> SELECT to_tsvector('default_german', 'ein Haus') I get 'ein':1 'haus':2
>>>>>
>>>>> 'ein' should be a stopword for German (and is defined in the
>>>>> german.stop file as well).
>>>>>
>>>>> 2.) The compound words feature doesn't work either. I have tried a lot
>>>>> of words, e.g. 'Fehlermeldung' with
>>>>> SELECT to_tsvector('default_german', 'Fehlermeldung')
>>>>> I only get 'fehlermeldung':1, but I would expect 'fehler' and 'meldung'
>>>>> as separate entries. Is there anything wrong with the dictionary or my
>>>>> configuration?
>>>>>
>>>>> My current configuration:
>>>>>
>>>>> pg_ts_cfg:
>>>>>
>>>>> default default C
>>>>> default_russian default ru_RU.KOI8-R
>>>>> simple default NULL
>>>>> default_german default de_DE.ISO8859-1
>>>>>
>>>>> pg_ts_cfgmap:
>>>>>
>>>>> default_german host {simple}
>>>>> default_german hword {simple}
>>>>> default_german int {simple}
>>>>> default_german nlhword {simple}
>>>>> default_german nlpart_hword {simple}
>>>>> default_german nlword {simple}
>>>>> default_german part_hword {simple}
>>>>> default_german sfloat {simple}
>>>>> default_german uint {simple}
>>>>> default_german uri {simple}
>>>>> default_german url {simple}
>>>>> default_german version {simple}
>>>>> default_german word {simple}
>>>>> default_german lpart_hword {de_ispell,german_snowball}
>>>>> default_german lword {de_ispell,german_snowball}
>>>>> default_german lhword {de_ispell,german_snowball}
>>>>>
>>>>> pg_ts_dict:
>>>>>
>>>>> de_ispell | 17166 |
>>>>> DictFile="/usr/local/pgsql/share/contrib/dictonary/german.dict",
>>>>> AffFile="/usr/local/pgsql/share/contrib/dictonary/german.aff",
>>>>> StopFile="/usr/local/pgsql/share/contrib/dictonary/german.stop" |
>>>>> 17167 | NULL
>>>>> german_snowball | 17357 | NULL | 17162 | Snowball stemmer for german
>>>>>
>>>>> Can anyone help me?
>>>>>
>>>>> regards
>>>>>
>>>>> Timo
>>>>>
>>>>> ---------------------------(end of broadcast)---------------------------
>>>>> TIP 4: Don't 'kill -9' the postmaster
>>>>>
>>>>
>>>> Regards,
>>>> Oleg
>>>> _____________________________________________________________
>>>> Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
>>>> Sternberg Astronomical Institute, Moscow University (Russia)
>>>> Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
>>>> phone: +007(095)939-16-83, +007(095)939-23-83
>>>
>>
>> Regards,
>> Oleg
>
Regards,
Oleg