Re: TSearch2: Problems with compound words and stop words

From Oleg Bartunov
Subject Re: TSearch2: Problems with compound words and stop words
Msg-id Pine.GSO.4.61.0411171751010.18871@ra.sai.msu.su
In reply to Re: TSearch2: Problems with compound words and stop words  (Timo Haberkern <thaberkern@emedia-office.de>)
Responses Re: TSearch2: Problems with compound words and stop words  (Oleg Bartunov <oleg@sai.msu.su>)
List pgsql-general
On Wed, 17 Nov 2004, Timo Haberkern wrote:

> Sorry for the late answer, I was on holiday.
>
> see my remarks below
>
>
> Oleg Bartunov wrote:
>
>> On Fri, 5 Nov 2004, Timo Haberkern wrote:
>>
>>> Oleg,
>>>
>>> I use TSearch2 with PostgreSQL 7.4.6 and I applied the compound word patch
>>> yesterday. The configuration changed a little bit but the result is the
>>> same: I get no compound words. I'm using the locale de_DE with encoding
>>> ISO8859-1 for the database.
>>>
>>> I think ispell is working correctly except for the compound words. If I try
>>>
>>> SELECT lexize('de_ispell', 'springt')
>>>
>>> I get
>>>
>>> lexize
>>> {springen,springen}
>>>
>>> which seems correct.
>>>
>>>
>>> But a SELECT lexize('de_ispell', 'Autobahn')
>>>
>>> results in
>>>
>>> lexize
>>> {autobahn}
>>>
>>> I would expect {auto,bahn,autobahn}
>>
>>
>> Hmm, have you checked 'Autobahn' in the ispell dictionary? Does the dictionary
>> you used support the 'Z' flag for compound words?
>
> Autobahn is in the ispell dictionary. What does an ispell dictionary need to
> support the Z flag?
>

Try ispell -C Autobahn
and search for 'compound' in 'man ispell' for details.
There is a tsearch2 problem only if ispell *does* split the word correctly while
tsearch2 doesn't. Otherwise you should find a correct ispell dictionary for German
or create one yourself. You may consult monzilla.net:
http://staff.science.uva.nl/~christof/monzilla/research/project-dr.html
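
A minimal sketch in Python of one way to check whether a dictionary is prepared for compound splitting. It assumes the ispell-format convention described in the PostgreSQL text search documentation: the affix file names the compound flag with a "compoundwords controlled z" statement, and only words carrying that flag in the .dict file take part in compounds. The helper names and the sample strings are made up for illustration, and matching the flag by exact case is an assumption.

```python
import re


def compound_flag(aff_text):
    """Return the flag named by a 'compoundwords controlled X' line, or None."""
    for line in aff_text.splitlines():
        line = line.split("#", 1)[0]  # drop ispell-style '#' comments
        m = re.match(r"\s*compoundwords\s+controlled\s+(\S)", line, re.IGNORECASE)
        if m:
            return m.group(1)
    return None


def words_with_flag(dict_text, flag):
    """List the words in 'word/FLAGS' lines whose flags include `flag`."""
    found = []
    for line in dict_text.splitlines():
        word, sep, flags = line.strip().partition("/")
        if sep and flag in flags:
            found.append(word)
    return found


# Hypothetical file contents, not taken from any real German dictionary.
sample_aff = "compoundwords controlled z\nflag *A:\n"
sample_dict = "auto/z\nbahn/z\nautobahn\nspringen/A\n"

flag = compound_flag(sample_aff)
print(flag, words_with_flag(sample_dict, flag))
```

For real files, pass in the contents of the .aff and .dict files, read with the database encoding (ISO8859-1 in this thread, i.e. open(path, encoding="latin-1")). If compound_flag returns None, or no words carry the flag, the dictionary itself probably cannot split compounds, whatever tsearch2 does.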


>
> Timo
>
>
>
>
>
>>
>>
>>>
>>> The new configuration after the compound word patch:
>>>
>>
>> Seems you overestimate my capabilities :)
>>
>>
>>> dict_name | dict_init | dict_initoption | dict_lexize | dict_comment
>>>
>>> simple | dex_init(text) | NULL | dex_lexize(internal,internal,integer) | Simple example of dictionary.
>>> en_stem | snb_en_init(text) | /usr/local/pgsql/share/contrib/english.stop | snb_lexize(internal,internal,integer) | English Stemmer. Snowball.
>>> ru_stem | snb_ru_init(text) | /usr/local/pgsql/share/contrib/russian.stop | snb_lexize(internal,internal,integer) | Russian Stemmer. Snowball.
>>> ispell_template | spell_init(text) | NULL | spell_lexize(internal,internal,integer) | ISpell interface. Must have .dict and .aff files
>>> synonym | syn_init(text) | NULL | syn_lexize(internal,internal,integer) | Example of synonym dictionary
>>> de_ispell | spell_init(text) |
>>> DictFile="/usr/local/pgsql/share/contrib/dictonary/german_comb.dict",
>>> AffFile="/usr/local/pgsql/share/contrib/dictonary/german_comb.aff",
>>> StopFile="/usr/local/pgsql/share/contrib/dictonary/german.stop" |
>>> spell_lexize(internal,internal,integer) | NULL
>>>
>>>
>>>
>>> Timo
>>>
>>>
>>> Oleg Bartunov wrote:
>>>
>>>> Timo,
>>>>
>>>> please check that you applied the patch for compound word support.
>>>> What is your PostgreSQL version?
>>>> Does the ispell dict work for non-compound words?
>>>>
>>>>     Oleg
>>>>
>>>> On Fri, 5 Nov 2004, Timo Haberkern wrote:
>>>>
>>>>> Hi there,
>>>>>
>>>>> I have some trouble with my TSearch2 installation. I did the
>>>>> installation as described in
>>>>> http://www.sai.msu.su/~megera/oddmuse/index.cgi/Tsearch_V2_compound_words
>>>>> I used the German myspell dictionary from
>>>>> http://lingucomponent.openoffice.org/spell_dic.html and converted it
>>>>> with my2ispell.
>>>>>
>>>>> Nearly everything is working fine so far, except for two problems:
>>>>>
>>>>> 1.) The stop word file seems to be ignored. If I try
>>>>> SELECT to_tsvector('default_german', 'ein Haus')
>>>>> I get 'ein':1 'haus':2
>>>>>
>>>>> "ein" should be a stop word for German (and it is defined in the
>>>>> german.stop file as well).
>>>>>
>>>>> 2.) The compound words feature doesn't work either. I have tried a lot of
>>>>> words, e.g. 'Fehlermeldung'. With
>>>>> SELECT to_tsvector('default_german', 'Fehlermeldung')
>>>>> I only get 'fehlermeldung':1, but I would expect 'fehler' and 'meldung'
>>>>> as separate entries. Is there anything wrong with the dictionary or my
>>>>> configuration?
>>>>>
>>>>>
>>>>> My current configuration:
>>>>>
>>>>> pg_ts_cfg:
>>>>>
>>>>> default    default    C
>>>>> default_russian    default    ru_RU.KOI8-R
>>>>> simple    default    NULL
>>>>> default_german    default    de_DE.ISO8859-1
>>>>>
>>>>> pg_ts_cfgmap:
>>>>>
>>>>> default_german    host    {simple}
>>>>> default_german    hword    {simple}
>>>>> default_german    int    {simple}
>>>>> default_german    nlhword    {simple}
>>>>> default_german    nlpart_hword    {simple}
>>>>> default_german    nlword    {simple}
>>>>> default_german    part_hword    {simple}
>>>>> default_german    sfloat    {simple}
>>>>> default_german    uint    {simple}
>>>>> default_german    uri    {simple}
>>>>> default_german    url    {simple}
>>>>> default_german    version    {simple}
>>>>> default_german    word    {simple}
>>>>> default_german    lpart_hword    {de_ispell,german_snowball}
>>>>> default_german    lword    {de_ispell,german_snowball}
>>>>> default_german    lhword    {de_ispell,german_snowball}
>>>>>
>>>>>
>>>>> pg_ts_dict:
>>>>>
>>>>> de_ispell | 17166    |
>>>>> DictFile="/usr/local/pgsql/share/contrib/dictonary/german.dict",
>>>>> AffFile="/usr/local/pgsql/share/contrib/dictonary/german.aff",
>>>>> StopFile="/usr/local/pgsql/share/contrib/dictonary/german.stop"    |
>>>>> 17167 | NULL
>>>>> german_snowball    | 17357 | NULL    | 17162 | Snowball stemmer for
>>>>> german
>>>>>
>>>>>
>>>>>
>>>>> Can anyone help me?
>>>>>
>>>>> regards
>>>>>
>>>>> Timo
>>>>>
>>>>>
>>>>> ---------------------------(end of broadcast)---------------------------
>>>>> TIP 4: Don't 'kill -9' the postmaster
>>>>>
>>>>
>>>>     Regards,
>>>>         Oleg
>>>> _____________________________________________________________
>>>> Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
>>>> Sternberg Astronomical Institute, Moscow University (Russia)
>>>> Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
>>>> phone: +007(095)939-16-83, +007(095)939-23-83
>>>>
>>>> ---------------------------(end of broadcast)---------------------------
>>>> TIP 2: you can get off all lists at once with the unregister command
>>>>    (send "unregister YourEmailAddressHere" to majordomo@postgresql.org)
>>>>
>>>>
>>>
>>
>>     Regards,
>>         Oleg
>>
>>
>

     Regards,
         Oleg
_____________________________________________________________
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83
